Computational Music Synthesis
Sean Luke
Department of Computer Science
George Mason University
Zeroth Edition
Online Version 0.2
June, 2019
Copyright 2019 by Sean Luke.
Thanks to Carlotta Domeniconi, Bryan Hoyle, Lilas Dinh, Robbie Gillam, John Otten, Mark Snyder,
Adonay Henriquez Iraheta, Alison Howland, Daniel McCloskey, Stephen Yee, Jabari Byrd, Riley Keesling,
Benjamin Ngyen, Joseph Parrotta, Laura Pilkington, Pablo Turriago-Lopez, Eric Velosky, Joshua Weipert,
Joshua Westhoven, Matt Collins, Kelian Li, Thomas Tyra, Talha Mirza, Curtis Roads, Julius Smith, Brad
Ferguson, Zoran Duric, and Dan Lofaro.
Get the latest version of this document or suggest improvements here:
http://cs.gmu.edu/~sean/book/synthesis/
Cite this document as: Sean Luke, 2019, Computational Music Synthesis, zeroth edition,
available for free at http://cs.gmu.edu/~sean/book/synthesis/
Always include the URL, as this book is primarily found online. Do not include the online version numbers
unless you must, as Citeseer and Google Scholar may treat each (oft-changing) version as a different book.
BibTeX: @Book{ Luke2019Synthesis,
author = { Sean Luke },
title = { Computational Music Synthesis},
edition = { zeroth },
year = { 2019 },
note = { Available for free at http://cs.gmu.edu/$\sim$sean/book/synthesis/ } }
This document is licensed under the Creative Commons Attribution-No Derivative Works 3.0
United States License, except for those portions of the work licensed differently as described in the next
section. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/3.0/us/ or send a
letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. A summary:
• You are free to redistribute this document.
• You may not modify, transform, translate, or build upon the document except for personal use.
• You must maintain the author’s attribution with the document at all times.
• You may not use the attribution to imply that the author endorses you or your document use.
This summary is just informational: if there is a conflict between the summary and the actual license, the
actual license always takes precedence. Contact me if you need to violate the license (like do a translation).
Certain art is not mine. Those images I did not create are marked with a special endnote like this: ©14
These refer to entries in the Figure Copyright Acknowledgements, starting on page 155, where I list the
creator, original URL, and the license or public domain declaration under which the image may be used. In
some cases there is no license: instead, the creators have kindly granted me permission to use the image in
the book. In these cases you will need to contact the creators directly to obtain permission if you wish to
reuse the image outside of this book.
Contents
List of Algorithms 4
0 Preface 5
0.1 Caveats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
0.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1 Introduction 7
1.1 A Very Brief History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 The Synthesizer Player’s Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 A Typical Synthesizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Representation of Sound 13
2.1 Units of Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Digitization of Sound Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4 Additive Synthesis 31
4.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.4 Architecture Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5 Modulation 43
5.1 Low Frequency Oscillators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2 Envelopes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3 Step Sequencers and Drum Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.4 Arpeggiators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.5 Gate/CV and Modular Synthesizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.6 Modulation Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.7 Modulation via MIDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6 Subtractive Synthesis 55
6.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.3 Architecture Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7 Oscillators, Combiners, and Amplifiers 67
7.1 Oscillators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.2 Antialiasing and the Nyquist Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.3 Wave Shaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.4 Wave Folding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.5 Phase Distortion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.6 Combining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.7 Amplification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
8 Filters 81
8.1 Digital Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
8.2 Building a Digital Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.3 Transfer Functions in the Laplace Domain . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.4 Poles and Zeros in the Laplace Domain . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.5 Amplitude and Phase Response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.6 Pole and Zero Placement in the Laplace Domain . . . . . . . . . . . . . . . . . . . . . 89
8.7 The Z Domain and the Bilinear Transform . . . . . . . . . . . . . . . . . . . . . . . . . 92
8.8 Pole and Zero Placement in the Z Domain . . . . . . . . . . . . . . . . . . . . . . . . . 94
8.9 Basic Second-Order Butterworth Filters . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.10 Digital Second-Order Butterworth Filters . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.11 Formant Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
10 Sampling 113
10.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
10.2 Pulse Code Modulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
10.3 Wavetable Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
10.4 Granular Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
10.5 Resampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
10.6 Basic Real-Time Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10.7 Windowed Sinc Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
12 Controllers and MIDI 135
12.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
12.2 MIDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
12.2.1 Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
12.2.2 Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
12.2.3 CC, RPN, and NRPN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
12.2.4 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
12.2.5 MPE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
12.2.6 MIDI 2.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Sources 151
Index 159
List of Algorithms
0 Bubble Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1 The Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2 RecursiveFFT (Private Subfunction) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 The Fast Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4 The Inverse Fast Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5 Multiply by a Window Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
6 Simple Monophonic Additive Synthesizer Architecture . . . . . . . . . . . . . . . . . . . 36
7 Sine Table Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
8 Sine Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
9 Buffer Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
10 Simple Low Frequency Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
11 Random Low Frequency Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
12 Sample and Hold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
13 Simple Linear Time-based ADSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
14 Simple Exponential Rate-based ADSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
15 Simple Parameter Step Sequencer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
16 Simple Monophonic Subtractive Synthesizer Architecture . . . . . . . . . . . . . . . . . . 62
17 Pink Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
18 Initialize a Digital Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
19 Step a Digital Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
20 Delay Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
21 Multi-Tap Delay Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
22 Karplus-Strong String Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
0 Preface
This book was developed for a senior computer science course I taught in Spring of 2019. Its
objective was to teach a computer science student with some music experience how to build a
digital music synthesizer in software from the ground up. I hope it’ll be useful to others.
The text assumes an undergraduate computer science background and some basic calculus and
linear algebra. But it does not assume that the reader is particularly familiar with the history, use,
or significance of music synthesizers. To provide some appreciation of these concepts, I’ve tried to
include quite a bit of history and grounding in the text. The text also doesn’t assume that the reader
knows much about electrical engineering or digital signal processing (indeed it should be obvious
to experts in these fields that I don’t know much either!) and so tries to provide an introduction to
these concepts in a relatively gentle manner.
One of the problems with writing a book on music topics is that reading about these topics is not
enough: you have to hear the sounds being discussed, and see the instruments being manipulated,
in order to gain an intuitive understanding for the concepts being presented. The mere pages here
won’t help with that.
0.1 Caveats
While I am a computer science professor, a musician, and a synthesizer tool builder on the side,
I am by no means an expert in how to build music synthesizers. I am very familiar with certain
subjects discussed here, but many others were entirely new to me at the start of developing this
book and course. My knowledge of filters, resampling, and effects is particularly weak.
What this means is that you should take a lot of what’s discussed here with a big grain of
salt: there are likely to be a great many errors in the text, ranging from small typos to grand
misconceptions. I would very much appreciate error reports: send them to [email protected].
I may also be making significant modifications to the text over time, even rearranging entire
sections as necessary. I have also tried very hard to cite my sources and give credit where it is due.
If you feel I did not adequately cite or credit you, send me mail.
I refer to my own tools here and there. Hey, that’s my prerogative! They are all open source:
• Gizmo is an Arduino-based MIDI manipulation tool with a step sequencer, arpeggiator, note
recorder, and lots of other stuff. I refer to it in Section 5.
http://cs.gmu.edu/~sean/projects/gizmo/
• Edisyn is a synthesizer patch editor with very sophisticated tools designed to assist in
exploring the space of patches. I refer to it in Section 5.
http://cs.gmu.edu/~sean/projects/edisyn/
0.2 Algorithms
Algorithms in this book are written peculiarly and relatively informally. If an algorithm takes
parameters, they will appear first followed by a blank line. If there are no parameters, the algorithm
begins immediately. Sometimes certain shared, static global variables are defined which appear at
the beginning and are labelled global. Here is an example of a simple algorithm:
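Algorithm 0 Bubble Sort
1: v ← ⟨v1, ..., vl⟩ vector of l comparable elements to sort

2: repeat
3:     swapped ← false
4:     for i from 1 to l − 1 do
5:         if vi > vi+1 then
6:             swap vi and vi+1
7:             swapped ← true
8: until swapped is false
9: return v

(This sketch follows the conventions just described: the algorithm's single parameter appears first, followed by a blank line, and then the body.)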
1 Introduction
A music synthesizer, or synthesizer (or just synth), is a programmable device which produces
sounds in response to being played or controlled by a musician or composer. Synthesizers are
omnipresent. They’re in pop and rock songs, rap and hip hop, movie and television scores, sound
cards and video games, and — unfortunately — cell phone ringtones. Music synthesizers are used
for other purposes as well: for example, R2-D2’s sounds were generated on a music synthesizer,1
as was Deep Note, the famous trademark sound played before movies to indicate the use of
THX.2 The classic “Ahhh” bootup sound on many Macintoshes in the 90s and 00s was produced
on a Korg Wavestation, a popular commercial synthesizer from the early 1990s.
Traditionally a music synthesizer generates sounds from scratch by creating and modifying
waveforms. But that’s not the only scenario. For example samplers will sample a sound, then edit
it and store it to be played back later as individual notes. Romplers3 are similar, except that their
playback samples are fixed in ROM, and so they cannot sample in the first place.
Synthesizers also differ based on their use. For example, while many synthesizers produce
tonal notes for melody, drum machines produce synthesized or sampled drum sounds. Vocoders
take in human speech via a microphone, then use this sound source to produce a synthesized
version of the same, often creating a robot voice sound. Effects units take in sounds — vocals or
instrumentals, say — then modify them and emit the result, adding delay or reverb, for example.
Synthesizers have often been criticized for notionally replicating, and ultimately replacing, real
instruments. And indeed this is not an uncommon use case: a great many movie scores you
probably thought were performed by orchestras were actually done with synthesizers, and much
more cheaply. Certainly there are many stories in history of drum machines eliminating drummers
from bands. But more and more synthesizers have come to be seen as instruments in their own
right, with their own aesthetic and art.
Modular synthesizers had many failings. They were large
and cumbersome, required manual connections with cabling,
could only store one patch at a time (the one currently wired
up!), and usually could only produce one note at a time (that
is, they were monophonic). Modular synthesizer keyboards of
the time offered limited control and expressivity. And modular
synths were very, very expensive.
Modular synthesizers were also analog, meaning that their sounds were produced entirely via analog circuitry. Analog synthesizers would continue to dominate until the mid-1980s.

Figure 1 Moog Minimoog Model D.©2
2000s and Beyond   As personal computers became increasingly powerful, the turn of the century saw the rise of digital audio workstations or DAWs: software which could handle much of the music production environment entirely inside a laptop. This included the use of software synthesizers rather than those in hardware. The early 2000s also saw the popularization of virtual analog synthesizers, which simulated classic analog approaches in digital form.

Analog synthesizers have since seen a renaissance as musicians yearned for the warm, physical devices of the past. At the extreme end of this trend, we have since seen the reintroduction of modular synthesizers as a popular format. What goes around comes around.

Figure 4 Propellerhead Reason DAW.©5
Playing Around This is the obvious basic scenario: you own a synthesizer and want to play
on it. This is sometimes called noodling. The important item here is the possible inclusion of
an effects unit. Effects are manipulations of sound to add some kind of “texture” or “color” to
it. For example, we might add a delay or echo to the synthesizer’s sound, or some reverb or
chorus. Effects, particularly reverb, are often important to make a synthesizer’s dry sound become
more realistic or interesting sounding. Because effects are so important an item at the end of the
synthesizer’s audio chain, many modern synthesizers have effects built in as part of the synthesizer
itself.
Performance In the next scenario, you are playing a synthesizer as a solo or group live perfor-
mance involving sound reinforcement. To do this, you will need the inclusion of a mixer, a device
which sums up the sounds from multiple inputs and produces one final sound to be broadcast.
Mixers can be anything from small tabletop/desktop or rackmount devices to massive automated
mixing consoles: but they all do basically the same thing. Here, the effects unit is added as an
auxiliary module: the mixer will send a mixed sound to the effects unit, which then returns the
resulting effects. The mixer then adds the effects to the final sound and outputs it. The amount of
effects added to the final sound is known as how wet the effects are.
Production A classic synthesizer sound recording and production environment adds a recorder
(historically a multi-track tape recorder) which receives sound from the mixer or sends it back to
the mixer to be assessed and revised. The high-grade speakers used by recording engineers or
musicians to assess how the music sounds during the mixing and recording process are known
as monitors. Additionally, because synthesizers can be controlled remotely and also controlled
(Figure: signal-flow diagrams for the Playing Around, Performance, and Production scenarios, showing the musician or controller, sequencer, synthesizers, other instruments and vocals, effects, mixer, recorder, and the final sound out.)
via automated means, a musician might construct an entire song by playing multiple parts into a
sequencer, which records the event data (when a note was played or released, etc.) and then can
play multiple synthesizers simultaneously. Sequencers can be found both as computer software
and as dedicated hardware. In this scenario, a musician wouldn’t play a synthesizer directly, but
rather would play a controller, often a keyboard, to issue event data to the sequencer or to one or
more downstream synthesizers.
In-the-Box (ITB) Production The classic synthesizer production environment has given way to
one in which most of these tasks are now done in software on a computer. This is known as
In The Box or ITB production. The core software environment for ITB production is the digital
audio workstation or DAW. This software is a combination of a sequencer, mixer, recorder, and
effects unit. Most DAWs are also modular in design and can be extended via plug-ins, the most
well-known being Virtual Studio Technology plugins (or VSTs), or Audio Unit (AU) plugins.
These plugins can be used for many tasks, but are often used to add software synthesizers (or
softsynths) as additional playable instruments directly in the DAW.
A DAW interacts with the musician in several ways. First, the musician can enter note event
data directly via his controller to the computer through a MIDI interface. Second, the DAW’s
internal sequencer can also control external synthesizers via the same interface, playing them along
with its software synthesizers. Third, the musician can record audio into the DAW from those
synthesizers or from other audio sources (instruments, vocals) via an audio interface. Fourth, the
computer can use this same audio interface to play audio on monitors or headphones to be assessed
by the musician or studio engineer. Ultimately the DAW will be used to write out a final version of
the song in digital form for a CD or MP3 file, etc.
Voices The architecture for a single Prophet ’08
voice is very typical of a subtractive analog syn-
thesizer. Each of its eight voices has two oscilla-
tors, which are modules that produce sound waves.
These oscillators are then combined together to
form a single sound wave, which is then fed into a
filter. A filter is a device which modifies the tonal
qualities of a sound: in this case, the Prophet ’08 has
a low pass filter, which can tamp down high fre-
quencies in a sound wave, making it sound duller
or more mellow. The filtered sound is then fed into an amplifier which changes its volume. All of the currently sounding voices are then finally added together and the result is emitted as sound.

Figure 7 Eight-voice circuitry for the Prophet '08.
The oscillators, combiner, filter, and amplifier all have many parameters. For example, the
oscillators may be detuned relative to one another; the combiner can be set to weight one oscillator’s
volume more than another; the low-pass filter’s cutoff frequency (the point beyond which it starts
dampening sounds) may be adjusted, or the amplifier’s volume scaling may be tweaked.
Modulation The synthesizer’s many parameters can be manually set by the musician, or the
musician can attach them to modulation devices which will change the parameters automatically
over time as a note is played, to create richer sounds. For example, both the filter and the amplifier
have dedicated DADSR envelopes to change their cutoff frequency and volume scaling respec-
tively. A DADSR (Delay/Attack/Decay/Sustain/Release) envelope is a simple function which
starts at 0 when a note is played, then delays for a certain amount of time, then rises (attacks) to
some maximum value at a certain rate, then falls (decays) to a different (sustain) value at a certain
rate. It then holds at that sustain value as long as the key is held down, and when the key is released,
it dies back down to zero at a certain rate. This allows (for example) a sound to get suddenly loud
or brash initially, then die back and finally decay slowly after it has been released.
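To make the DADSR behavior concrete, here is a rough sketch in Python of a purely linear DADSR envelope evaluated as a function of time. This is only an illustration: real synthesizer envelopes (including the Prophet '08's) typically use exponential rather than linear segments, and the function and parameter names here are made up for the example.

def dadsr(t, delay, attack, decay, sustain, release, gate_time):
    """Linear DADSR envelope level (0..1) at time t, in seconds.
    gate_time is how long the key is held before being released."""
    if t < gate_time:                      # key is still held down
        if t < delay:                      # Delay: output stays at zero
            return 0.0
        t -= delay
        if t < attack:                     # Attack: rise linearly from 0 to 1
            return t / attack
        t -= attack
        if t < decay:                      # Decay: fall linearly from 1 to the sustain level
            return 1.0 + (sustain - 1.0) * (t / decay)
        return sustain                     # Sustain: hold while the key is down
    else:                                  # key has been released
        level_at_release = dadsr(gate_time, delay, attack, decay, sustain, release, gate_time + 1.0)
        t -= gate_time
        if t < release:                    # Release: fall linearly back to zero
            return level_at_release * (1.0 - t / release)
        return 0.0

# Example: print the envelope every 100 ms for a note held for one second
for step in range(20):
    t = step * 0.1
    print(round(t, 1), round(dadsr(t, delay=0.05, attack=0.1, decay=0.3,
                                   sustain=0.6, release=0.5, gate_time=1.0), 3))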
In addition to its two dedicated envelopes, the Prophet ’08 has an extra envelope which can
be assigned to many different parameters; and it also has four low frequency oscillators or LFOs
which can also be so assigned. An LFO is just a function which slowly oscillates between 0 and 1 (or
between -1 and 1). When attached to the pitch of a voice an LFO would cause vibrato, for example.
The Prophet ’08 also has a basic step sequencer which can play notes or change parameters in a
certain repeating, programmable pattern; and a simple arpeggiator which repeatedly plays notes
held down by the musician in a certain repeating pattern as well. Modulation sources can be
assigned to many different parameters via the Prophet ’08’s modulation matrix.
Patches and MIDI The parameters which collectively define the sound the Prophet ’08 is making
are called a patch. The Prophet ’08 is a stored patch synthesizer, meaning that after you have
programmed the synthesizer to produce a sound you like, you can save patches to memory; and
you can recall them later. A patch can also be transferred to or from a computer program, or another
Prophet ’08, over a MIDI cable. MIDI can also be used to play or program a Prophet ’08 remotely
from another keyboard controller, computer, or synthesizer. Because you can play a Prophet ’08
remotely via another keyboard, you don’t need the Prophet ’08’s keyboard at all, and indeed there
exists a keyboard-less standalone tabletop or desktop module version of the synthesizer.
2 Representation of Sound
Sound waves represent changes in air or water pressure as a sound arrives to our ear and, in their simplest form, they are simple one-dimensional functions of time, that
tials, since each of them is in some sense a part of the final
where the x-axis would be frequency, not time. This bar
4 In this example, they all have the same phase, since they’re all 0 at time 0.
When a sound is arbitrarily long and complex, the number of sine waves required to describe it is effectively infinite.

Figure 12 Time Domain (top) and Frequency Domain (bottom) of a bass guitar note.©8

Phase   Phase is the point in time where the sine wave begins (starts or restarts from zero). Consider Figure 13, which shows three sine waves with identical frequency and amplitude, but which differ in phase. Along with amplitude and frequency, the phase of a partial plays a critical part in the sound. Thus the frequency domain should not be thought of as a single plot of frequency versus amplitude, but rather as two separate plots, one of frequency versus amplitude, and the other of frequency versus phase. Similarly, the partials that make up a sound have two components: amplitude and phase.

Figure 13 Three identical sine waves which differ only in phase. Note that the blue and green sine waves have entirely opposite phase.

Phase is far less important to us than amplitude: humans can detect amplitude much better. In fact, while we can distinguish partials which are changing in phase, if we were presented with two sounds with identical partials except for different phases, we would not be able to distinguish between them! Because of this, some synthesis methods (such as additive synthesis) almost entirely disregard phase, though others (such as frequency modulation synthesis) rely on changes in phase very heavily.
Figure 15 Harmonic series. The fundamental is shown as a low C, and successive harmonics are shown with their
equivalent pitches. The numbers below the notes indicate the degree to which the harmonic deviates from traditional
pitches (in cents).©9
Harmonics and Pitch Identification Most sounds which we perceive as “tonal” or “musical”
have the large majority of their partials organized in a very specific way. In these sounds, there
is a specific partial called the fundamental. This is often the lowest significant partial, often the
loudest partial, and often the partial whose frequency we typically would identify as the pitch of
the sound — that is, its associated note. Partials other than the fundamental are called overtones.
Furthermore, in these “tonal” sounds, most overtones have frequencies which are integer multiples
of the fundamental. That is, most of the overtones will have frequencies of the form i × f , where f
is the fundamental frequency, and i is an integer 2, 3, .... When partials are organized this way, we
call them harmonics.
A great many instruments have partials organized largely as harmonics. This includes woodwinds, brass, strings, you name it. The reason for this is that many instruments are essentially fixed strings or tubes which can only vibrate according to certain modes. For example, consider Figure 14. Here the ends of a violin string are fixed at 0 and 2π respectively. There are only so many ways that a violin string can vibrate as long as those ends are fixed. Figure 14 shows the first four possibilities, and their frequencies correspond to the first four harmonics (f, 2f, 3f, and 4f). A woodwind is similar: vibrations in the air in its tube are essentially "fixed" at the two ends.

Figure 14 Modes corresponding to the first, second, third, and fourth harmonics in a stringed instrument.
Many harmonics are very close to the pitch of standard notes, and this has a strong effect on
our perception of the tonality of instruments and the chords they produce. For example, the second
harmonic, whose frequency is 2 f , is exactly an octave above f . The third harmonic is very nearly a
fifth above that. The fourth harmonic is two octaves above the fundamental. Figure 15 shows a
fundamental, various harmonics, the notes that they are closest to, and their degree of deviation
from those notes. A few harmonics are quite off,5 but many are tantalizingly close.
5 You might be wondering: why are they off? This is an interesting question. Classically notes have been tuned such
that an octave corresponds to a doubling in frequency. That lines up nicely with harmonics since every harmonic that is
the next power of 2 is an octave higher. But within an octave, how would one space the remaining notes? The classic
approach has been to assume that there are 12 notes, and that one spaces them such that the ratio between the frequencies
of any two successive notes (A/A♭, say, or F/E) is exactly the same. This is a fancy way of saying that the notes are laid
out not linearly in frequency but logarithmically. This tuning strategy is known as Equal Temperament. The problem with
this model is that these logarithmic frequencies don’t line up along integer multiple values like the harmonics do. Many
of them are close enough, but some are pretty off. Because integers and logs don’t match up, temperament strategies to
make notes sound more harmonious together have been a matter of debate for many centuries.
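As a quick check of the numbers in Figure 15, the sketch below computes how far each harmonic falls from the nearest equal-tempered pitch, in cents (hundredths of a semitone). The low-C fundamental of 65.41 Hz is just an example value; any fundamental gives the same deviations.

import math

fundamental = 65.41          # a low C (C2), in Hz

for h in range(2, 21):                               # harmonics 2 through 20
    freq = h * fundamental                           # harmonic frequency is an integer multiple
    semitones = 12 * math.log2(freq / fundamental)   # distance above the fundamental in semitones
    nearest = round(semitones)                       # nearest equal-tempered note
    cents = 100 * (semitones - nearest)              # deviation from that note in cents
    print(f"harmonic {h:2d}: {cents:+.0f} cents from the nearest pitch")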
Mode (0, 1) Mode (1, 1) Mode (2, 1) Mode (0, 2) Mode (3, 1) Mode (1, 2) Mode (4, 1) Mode (2, 2) Mode (0, 3)
Freq 1.0 Freq 1.59 Freq 2.14 Freq 2.30 Freq 2.65 Freq 2.92 Freq 3.16 Freq 3.50 Freq 3.60
Figure 16 First nine drum modes, ordered by frequency. Red regions vibrate up when white regions vibrate down and
vice versa. Drum vibration patterns may combine these modes. Underneath each mode is the frequency relative to the
first mode (the fundamental).
Notice that when defining the term fundamental I used the word often three times: the funda-
mental is often the lowest harmonic, often the loudest, and often what determines the pitch. This
is because sometimes one or more of those things isn’t true. For example, organs often have one
or two fairly loud partials an octave or two below the fundamental (an octave lower corresponds
to half the frequency). Similarly, bells will usually have at least one large partial lower than the
fundamental called the hum tone. In fact, the hum tone is in many ways the fundamental, but
we usually identify the pitch of the bell (its so-called strike tone) with the second harmonic (the
prime), which is also usually louder. Thus the prime is typically thought of as the fundamental.
Bells are bizarre. The next major partial up from the prime is usually the tierce,6 and it’s just a
minor third above the prime, or about 1.2 times the prime frequency.7 There are other inharmonic
partials as well. And yet we perceive bells as, more or less, tonal. The specific amplitudes of the
various partials in bells can cause us to associate the pitch with partials other than the prime. In
fact, bells may cause us to associate the pitch with a partial that doesn’t even exist in the sound.
Finally, drums have their own unique harmonic characteristics quite unlike strings or pipes.
A drum is a 2D sheet, and so its harmonic vibration patterns (or modes) are two dimensional. This
results in a complex series of partial frequencies, as shown in Figure 16, which are generally atonal.8
8 A famous mathematical question asked whether one could determine the shape of a drum solely from the set of frequencies it produced (Mark Kac, 1966, Can one hear the shape of a drum?, American Mathematical Monthly, 73(4)). It took 30 years to determine that the answer was no (Carolyn Gordon, David Webb, and S. Wolpert, 1992, Isospectral plane domains and surfaces via Riemannian orbifolds, Inventiones Mathematicae, 110(1)).
Frequency is of course closely associated with pitch: what note the sound is being played at.
Pitch goes up logarithmically with frequency. When a frequency is doubled, its perceived pitch has
gone up an octave. More precisely, if a note is a certain frequency f , and we go up n semitones
(half-steps) from that note, the new note is at frequency g = 2^(n/12) × f. Similarly, if you have gone
from frequency f to frequency g, then you have moved n = 12 log2 ( g/ f ) semitones in pitch.
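For example, in Python (a sketch; the 440 Hz starting pitch is arbitrary):

import math

def semitones_up(f, n):
    """Frequency n semitones (half-steps) above frequency f."""
    return 2 ** (n / 12) * f

def interval_in_semitones(f, g):
    """How many semitones you moved going from frequency f to frequency g."""
    return 12 * math.log2(g / f)

a4 = 440.0                                   # A above middle C
print(semitones_up(a4, 12))                  # one octave up: 880.0
print(semitones_up(a4, 3))                   # a minor third up: about 523.25 (C5)
print(interval_in_semitones(a4, 220.0))      # an octave down: -12.0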
Amplitude We’ll usually describe amplitude as the actual y value of the sound wave itself.
However the amplitude of a signal is occasionally converted into log10 and described in terms
of decibels (dB): specifically, the change in decibels is 20 log10(change in amplitude). Doubling the amplitude is
approximately an increase of 6dB. A doubling in perceived volume is often described as an increase
of 10 dB. Decibels are a relative measure. Thus you will very often see negative decibels relative to
some signal volume, to indicate sounds quieter than that signal.
Sine waves of course go both above 0 amplitude and below it: thus a sine wave may be described
informally as having a “negative amplitude” at a certain point: though technically amplitude, like
volume, is a magnitude measure and is only positive. Last, when we are multiplying an amplitude
or volume to make it louder or softer, we are often said to be modifying the gain of the signal.
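A small sketch of these conversions in Python (illustrative only):

import math

def amplitude_change_to_db(ratio):
    """Decibel change corresponding to multiplying amplitude by the given ratio."""
    return 20 * math.log10(ratio)

def db_to_amplitude_change(db):
    """Amplitude ratio (gain) corresponding to a change of the given number of decibels."""
    return 10 ** (db / 20)

print(amplitude_change_to_db(2.0))    # doubling the amplitude: about +6.02 dB
print(amplitude_change_to_db(0.5))    # halving it: about -6.02 dB
print(db_to_amplitude_change(-10))    # -10 dB: multiply the amplitude by about 0.316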
Phase Because we’re talking about sine waves, the phase of a partial is an angular measure and
so is typically expressed as a value from 0...2π (or if you like, −π...π). In Figure 13 the green and
blue sine waves are out of phase of one another by π.
The Stereo Field When sounds are in stereo, they can appear to come from left of us, in center, or
to the right of us. The angular position from which a sound appears to originate is known as the
pan position of the sound.
Figure 17 A sine wave discretized and shown in grid form (left) and as a lollipop graph (right).

It is tempting to think of it this way, because it implies that when the wave is played, it takes the form of a blocky function with all horizontal and vertical lines (shown in red) and right angles. This is not really what happens.
Instead, to play a digital sound, it is first fed to a Digital-Analog Converter or DAC.9 This
device changes its output voltage abruptly to match each new sample as it is being played: in this
sense it resembles the blocky function. But then this voltage is fed into a filter, often a capacitor,
which converts the abrupt voltage change into one which slides smoothly from one sample value
to the next, producing a curvy, smooth output.
Thus it might be best to think of a digitized wave as a lollipop graph, as in Figure 17 (right
subfigure). This helps remind you that the samples are not a bunch of blocky lines, but are in fact
just numbers sampled from a real-valued function (the original sound), at very specific and precise
times, and from which another real-valued function can be produced.
The Nyquist Limit and Aliasing The highest possible frequency that can be faithfully repre-
sented in a digitized sound of sampling rate n is exactly n/2. This is known as the Nyquist limit.
However it is possible to draw digital waves which contain within them higher frequency partials
than the Nyquist limit: these waves do not present themselves as proper partials of given frequencies, but instead create unusual artifacts known as aliasing (or foldover).10 To prevent aliasing,
audio devices apply a low pass filter to strip out frequencies higher than Nyquist before reducing
a sound wave to a given sampling rate. For more on this (and it’s pretty important), see Section 7.2.
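Here is a tiny sketch of aliasing in Python: a 30 kHz sine wave sampled at 44.1 kHz produces exactly the same samples as a (negative-frequency) 14.1 kHz sine wave, so 14.1 kHz is what we would hear. The specific frequencies are just an example.

import math

rate = 44100.0                     # sampling rate in Hz
f = 30000.0                        # a partial well above the Nyquist limit (22050 Hz)
alias = f - rate                   # -14100 Hz: the alias, heard as 14100 Hz

worst = 0.0
for n in range(1000):              # compare the first 1000 samples
    a = math.sin(2 * math.pi * f * n / rate)
    b = math.sin(2 * math.pi * alias * n / rate)
    worst = max(worst, abs(a - b))

print(worst)                       # prints a value near zero: the two sample streams are identical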
Sampling Rates When a sound is sampled, the speed of the sampling is known as the sampling
rate. A sampling rate of n kHz means that one sample is done every 1/(n × 1000) of a second. One
common sampling rate is 44.1 kHz (that is, one sample every 1/44,100 of a second): this is the
sampling rate of a compact disc, and is a common rate produced by many early digital synthesizers.
Another popular rate is 48 kHz (one sample every 1/48,000 of a second): this is a common rate
in sound production: it was the sampling rate of Digital Audio Tape and had long been used in
laboratory settings. A third popular rate in sound production is 96 kHz.
Why these values? 44.1 kHz was chosen by Sony in 1979 for the Compact Disc for a very specific
reason. The maximum frequency that humans (typically teenagers) can hear is approximately
20 kHz. Thus a reasonable sampling rate for human-perceptible sound would be one which can
accommodate at least 20 kHz. However to prevent aliasing, a recording application would need to
apply a low-pass filter at 20 kHz. Low-pass filters cannot cut off frequencies precisely: they need
some degree of wiggle-room in frequency. It turns out that 2.05 kHz is adequate wiggle-room. This
means that the sampling rate would need to handle a grand total of 22.05 kHz. If you recall from
the Nyquist Limit, the sampling rate is twice the maximum frequency: hence 44.1 kHz.
48 kHz seems more reasonable: it too is sufficient to cover 20 kHz, with even more wiggle room
for the low-pass filter, and it’s divisible by many different integers. 96 kHz is simply twice 48 kHz.
Bit Depth The sampling rate defines the x axis of the digitized signal: the bit depth defines the
y axis. Bit depth is essentially the resolution of the amplitude of the wave. The most common bit
9 A DAC outputs sound. What device would do sound input, or sampling? That would be, naturally, an Analog-to-Digital Converter, or ADC.
could draw the wave as a bitmap, and found that if you drew a perfect sawtooth wave (a 45-degree angle going up to
the top, then a sharp vertical line going down, and repeating) it created strange artifacts. This was because it’s possible
to store a sawtooth wave as a digital representation, but this in fact was stuffing in some partials above the Nyquist limit
which created bad aliasing problems.
depth is 16 bits: that is, each sample is a 16-bit unsigned integer.11 This implies that a sample can
be any one of 216 = 65536 possible values. The notional center position is half this (32768); this is
the canonical 0-amplitude position. A sine wave would oscillate up above, then down below, the
center position.
You’d think that small bit depths result in a “low resolution” sound in some sense, but this
isn’t the effect. Rather, bit depth largely defines the dynamic range of the sound: the distance in
amplitude between the loudest possible sound representable and the quietest sound before the
sound is overwhelmed by hiss. The point at which you can’t hear quiet sounds any more because
there’s too much hiss is called the noise floor. This is also closely associated with the signal to
noise ratio of a medium. A higher bit depth largely translates into more dynamic range. Since
this is a difference in amplitudes, it’s measured in dB: a bit depth of n yields a difference in dB of
roughly 6n.
Viewed this way, even analog recording media can be thought of as having an effective “bit
depth” based on its dynamic range. A vinyl record has at most a “bit depth”, so to speak, of 10–11
bits (that is, 60–72 dB). A typical cassette tape is between 6–9 bits. Some very high end reel-to-reel
tapes might be able to achieve upwards of 13–14 bits. These are all quite inferior to CDs, at 16 bits.
And DVDs support a bit depth of 24 bits!
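A quick sketch of the relationship between bit depth and dynamic range (the exact figure is 20 log10(2^n), or about 6.02 dB per bit):

import math

def dynamic_range_db(bits):
    """Approximate dynamic range, in dB, of a signal quantized to the given number of bits."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 10, 16, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")
# 8 bits: 48.2 dB   10 bits: 60.2 dB   16 bits (CD): 96.3 dB   24 bits (DVD): 144.5 dB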
Compression Schemes Compression won’t come into play much in the development of a music
synthesizer, but it's worth mentioning. The human auditory system is rife with unusual charac-
teristics which can be exploited to remove, modify, or simplify a sound without us being able to
tell. One simple strategy used in many early (and current!) sound formats is companding. This
capitalizes on the fact that humans can distinguish between different soft- or medium-volume
sounds more easily than different high-volume sounds. Thus we might use the bits in our sample
to encode logarithmically: quiet sounds get higher resolution than loud sounds. Early techniques
which applied this often used either the µ-law or a-law companding algorithms.12
More famous nowadays are lossy compression schemes such as MP3, which take advantage
of a variety of eccentricities in human hearing to strip away portions of a sound without being
detected. For example, humans are bad at hearing sounds if there are other, louder sounds near
them in frequency. MP3 will remove the quieter sound under the (usually correct) assumption that
we wouldn’t notice. MP3 generally has a fixed bitrate, meaning the number of bits MP3 uses up to
record a second of audio. But if some sound has a lot of redundancy in it (as an extreme example:
total silence), some compression schemes take advantage of this to compress different parts of a
sound stream at different bitrates as necessary. This is known as a variable bitrate scheme.
Channels Another final factor in the size of audio is the number of channels it consumes. A
channel is a single sound wave. Stereo audio will consist of two parallel sound waves, that is,
two channels. Quadraphonic sound, which was designed to be played all around the listener, has
four channels. Channels may serve different functions as well: for example in a movie theater one
channel, largely for voice, is routed directly behind the screen, while two or more channels provide
a stereo field on both sides of the viewer, and an additional channel underneath the viewer drives
the subwoofer. Similar multi-channel formats have made their way into home theaters, such as 5.1
surround sound, which requires six channels.
11 You could certainly use a 2’s-complement signed representation with 0 at the center instead.
12 If you think about it, these are in some sense a way of representing your sound as floating-point.
3 The Fourier Transform

As discussed earlier, any sound wave can be represented as a series of sine waves which differ in frequency, amplitude, and phase. In Section 4, we will see how to take advantage of this to produce sound through additive synthesis. Here we will consider the subject somewhat formally, and also discuss useful algorithms which we'll use later.

For any sound wave s(t), where t is the time, we have some function S(f) describing the frequency spectrum. This function, with some massaging, provides us with the amplitude and phase of each sine wave of frequency f participating in forming the sound wave s(t). As it turns out, both S(f) and s(t) are functions which yield complex numbers, though when used for sounds, the imaginary portion of s(t) is ignored (both the imaginary and real portions of S(f) are used to compute phase and magnitude).

Figure 18 Euler's Formula on the complex plane. The X axis is the real axis, and Y is the imaginary axis.©10
We can convert from s(t) to S( f ) using the Fourier Transform.13 The Inverse Fourier Trans-
form does the opposite: it converts S( f ) into s(t). The two transform functions are so similar that,
as we'll see, they're practically the same procedure. It's useful to first see those sines and cosines
being constructed to form a sound wave. So let’s look at the Inverse Fourier Transform initially, to
get an intuitive feel for this:
s(t) = ∫_{−∞}^{∞} S(iω) (cos(ωt) + i sin(ωt)) dω
Note that ω is the angular frequency of a sine wave. So what this is doing is, for every possible
frequency (including negative ones!), we’re computing the sine wave in both its real- and imaginary
components, multiplied by our (complex) spectral value at that frequency, S(iω ), which includes
both the amplitude and the phase of the sine wave in question. Add all of these sine waves up and
you get the final wave.
This isn’t the classic way to describe the Inverse Fourier Transform. Instead, we’d use Euler’s
Formula,14 e^{iθ} = cos(θ) + i sin(θ), to cast the cos and sin into an exponential. This results in:
s(t) = ∫_{−∞}^{∞} S(iω) (cos(ωt) + i sin(ωt)) dω

     = ∫_{−∞}^{∞} S(iω) e^{iωt} dω
13 The Fourier Transform is named after Joseph Fourier, who in 1822 showed that arbitrary waves could be represented as sums of sine waves.
14 Euler's Formula traces out the unit circle in the complex plane, as shown in Figure 18. But maybe it's easier to just explain with Taylor series. Here are three classic Taylor series expansion identities:

cos(θ) = 1 − θ²/2! + θ⁴/4! − θ⁶/6! + ···        sin(θ) = θ − θ³/3! + θ⁵/5! − θ⁷/7! + ···        e^θ = 1 + θ + θ²/2! + θ³/3! + θ⁴/4! + θ⁵/5! + θ⁶/6! + θ⁷/7! + ···
As it turns out, the (forward) Fourier Transform is eerily similar to the inverse. Note that the
big difference is a minus sign:
S(iω) = ∫_{−∞}^{∞} s(t) (cos(ωt) − i sin(ωt)) dt

      = ∫_{−∞}^{∞} s(t) e^{−iωt} dt
Yes, that’s going from negative infinity to positive infinity in time. The Fourier Transform
reaches into the far past and the far future and sums all of it.
Note that because our sampled sound is no longer infinite in length, we now have a notion of a
maximal wavelength: the biggest sine wave we can use in our sound is one whose period is T.
You could see it in sine/cosine form by applying Euler’s Formula16 again: remember it’s
e^{iθ} = cos(θ) + i sin(θ). This yields:
15 Some people instead prefer to split the 1/N among the two transformations as 1/√N, which results in nearly identical equations:

S(f) = (1/√N) ∑_{t=0}^{N−1} s(t) e^{−i2π f t/N}        s(t) = (1/√N) ∑_{f=0}^{N−1} S(f) e^{i2π f t/N}        (t, f = 0, 1, ..., N − 1)

16 By the way, a degenerate case of this formula is one of the most spectacular results in all of mathematics. Specifically, if we set θ = π, then we have e^{iθ} = e^{iπ} = cos(π) + i sin(π) = −1 + i(0) = −1. From this we have e^{πi} + 1 = 0, an amazing equation containing exactly the five primary constants in mathematics.
S(f) = ∑_{t=0}^{N−1} s(t) ( cos(−2π f t/N) + i sin(−2π f t/N) )

s(t) = (1/N) ∑_{f=0}^{N−1} S(f) ( cos(2π f t/N) + i sin(2π f t/N) )        (t, f = 0, 1, ..., N − 1)
Notice that these two transforming equations are identical except for a minus sign and 1/N.
This allows us to create a unified algorithm for them called the Discrete Fourier Transform or DFT
(and the Inverse Discrete Fourier Transform or IDFT).
Algorithm 1 The Discrete Fourier Transform
1: Xr ← ⟨Xr0 ... XrN−1⟩ array of N elements representing the real values of the input
2: Xi ← ⟨Xi0 ... XiN−1⟩ array of N elements representing the imaginary values of the input
3: forward ← true if performing the forward DFT, false if performing the inverse DFT

4: Yr ← ⟨Yr0 ... YrN−1⟩ array of N elements representing the real values of the output
5: Yi ← ⟨Yi0 ... YiN−1⟩ array of N elements representing the imaginary values of the output
6: for n from 0 to N − 1 do
7:     Yrn ← 0
8:     Yin ← 0
9:     for m from 0 to N − 1 do
10:        if forward then
11:            z ← −2πmn/N        ▷ The only difference in the equations is the minus sign
12:        else
13:            z ← 2πmn/N
14:        Yrn ← Yrn + Xrm cos(z) − Xim sin(z)        ▷ This is just the e... stuff
15:        Yin ← Yin + Xim cos(z) + Xrm sin(z)        ▷ and multiplying complex numbers
16:    if not forward then
17:        Yrn ← Yrn / N
18:        Yin ← Yin / N
19: return Yr and Yi
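Here is a sketch of Algorithm 1 in Python, using parallel lists of real and imaginary values just as the pseudocode does. It is only meant to illustrate the algorithm: it is O(N²) and far too slow for real use.

import math

def dft(xr, xi, forward=True):
    """Discrete Fourier Transform (or inverse, if forward is False) of a complex signal
    given as parallel lists of real (xr) and imaginary (xi) values."""
    N = len(xr)
    yr = [0.0] * N
    yi = [0.0] * N
    sign = -1.0 if forward else 1.0
    for n in range(N):
        for m in range(N):
            z = sign * 2 * math.pi * m * n / N
            yr[n] += xr[m] * math.cos(z) - xi[m] * math.sin(z)
            yi[n] += xi[m] * math.cos(z) + xr[m] * math.sin(z)
        if not forward:                      # the inverse transform divides by N
            yr[n] /= N
            yi[n] /= N
    return yr, yi

# Round trip: transform a short signal and invert it again
xr = [0.0, 1.0, 0.0, -1.0]
xi = [0.0, 0.0, 0.0, 0.0]
sr, si = dft(xr, xi, forward=True)
back_r, back_i = dft(sr, si, forward=False)
print([round(v, 6) for v in back_r])         # prints values (close to) [0.0, 1.0, 0.0, -1.0]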
Figure 19 Values of interest in the Time domain s(t) and Frequency domain S( f ) arrays for a Real-Valued DFT. Gray,
Red, and Blue boxes show numerical values of interest. Boxes with 0 in them should (or will) be set to 0. The red box at 0
is the value of the DC Offset. The blue box at N/2 is the value of the Nyquist frequency bin, and is only important to
retain if one ultimately needs to reverse the process via an inverse transform; otherwise it can be ignored. The blank
white boxes are just reflected complex conjugates of the gray boxes in S( f ), and can be ignored since they are redundant.
• The amplitude of the bin is just the magnitude |Yn|, that is, √(Yrn² + Yin²). When Yn is only real-valued, the magnitude is simply its absolute value.

• The phase of the bin is tan⁻¹(Yin / Yrn).

• The frequency of the bin (for the first N/2 + 1 elements) is n/N × R, where R is the sampling rate (44.1K for example). As we'll see in the next section, we're really only interested in the first N/2 + 1 elements.
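In Python, pulling the amplitude, phase, and frequency out of bin n might look like the sketch below (the sampling rate and bin values here are placeholders):

import math

def bin_info(yr_n, yi_n, n, N, R=44100.0):
    """Amplitude, phase (radians), and frequency (Hz) of bin n of an N-point transform
    of a signal sampled at rate R."""
    amplitude = math.sqrt(yr_n * yr_n + yi_n * yi_n)
    phase = math.atan2(yi_n, yr_n)          # atan2 handles the yr_n = 0 case gracefully
    frequency = n / N * R
    return amplitude, phase, frequency

# Example: bin 10 of a 1024-point transform whose value is 3 + 4i
print(bin_info(3.0, 4.0, n=10, N=1024))     # (5.0, 0.927..., about 430.7 Hz)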
3.4 The Fast Fourier Transform
The problem with the DFT is that it is slow: its two for-loops mean that it's obviously O(N²).
But it turns out that with a few clever tricks we can come up with a version of the DFT which is
only O( N lg N )! This faster version is called the Fast Fourier Transform or FFT.17 The FFT uses a
divide-and-conquer approach to recursively call smaller and smaller FFTs.
Recall that the forward DFT looks like this:
S(f) = ∑_{t=0}^{N−1} s(t) e^{−i2π f t/N}
What if we divided the summing process into two parts: summing the even values of t and the
odd values of t separately? We could write it this way:
S(f) = ∑_{t=0}^{N−1} s(t) e^{−i2π f t/N}

     = ∑_{t=0}^{N/2−1} s(2t) e^{−i2π f (t×2)/N} + ∑_{t=0}^{N/2−1} s(2t+1) e^{−i2π f (t×2+1)/N}

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π f (t×2)/(2M)} + ∑_{t=0}^{M−1} s(2t+1) e^{−i2π f (t×2+1)/(2M)}        (M = N/2)

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π f (t×2)/(2M)} + ∑_{t=0}^{M−1} s(2t+1) e^{−i2π f (t×2)/(2M)} × e^{−i2π f/(2M)}

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π f t/M} + e^{−i2π f/(2M)} × ∑_{t=0}^{M−1} s(2t+1) e^{−i2π f t/M}

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π f t/M} + e^{−i2π f/N} × ∑_{t=0}^{M−1} s(2t+1) e^{−i2π f t/M}        (M = N/2)
Let’s call those two splits E( f ) and O( f ) for even and odd:
S(f) = E(f) + e^{−i2π f/N} × O(f)
It turns out that if we use this equation to compute S(f) for f from 0 ... N/2 − 1, we can reuse our E(f) and O(f) to compute S(f) for f from N/2 ... N − 1. That's the divide-and-conquer bit. So let's assume that the above derivation is for just the first case. We'll derive a similar equation for the second case, S(f + N/2), otherwise known as S(f + M).
To do this we take advantage of two identities. The first is that for any integer k, it's the case that e^{−i2πk} = 1. The second is that e^{−iπ} = −1:
17 The DFT has been around since 1828, reinvented in many guises. The FFT in its current form is known as the
Cooley-Tukey FFT, by James William Cooley and John Tukey circa 1965. Tukey is famous for lots of things in statistics
as well, not the least of which is the invention of the box plot. But interestingly, the FFT in fact predates the DFT: it was
actually invented by (who else?) Carl Friedrich Gauss. Gauss developed it as part of his astronomical calculations in
1822, but did not publish the results. No one noticed even when his collected works were published in 1866.
S(f + M) = ∑_{t=0}^{N−1} s(t) e^{−i2π (f+M) t/N}

     = ∑_{t=0}^{N/2−1} s(2t) e^{−i2π (f+M)(t×2)/N} + ∑_{t=0}^{N/2−1} s(2t+1) e^{−i2π (f+M)(t×2+1)/N}

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π (f+M)(t×2)/(2M)} + ∑_{t=0}^{M−1} s(2t+1) e^{−i2π (f+M)(t×2+1)/(2M)}

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π (f+M)(t×2)/(2M)} + e^{−i2π (f+M)/(2M)} × ∑_{t=0}^{M−1} s(2t+1) e^{−i2π (f+M)(t×2)/(2M)}

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π (f+M) t/M} + e^{−i2π (f+M)/(2M)} × ∑_{t=0}^{M−1} s(2t+1) e^{−i2π (f+M) t/M}

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π f t/M} e^{−i2π M t/M} + e^{−i2π (f+M)/(2M)} × ∑_{t=0}^{M−1} s(2t+1) e^{−i2π f t/M} e^{−i2π M t/M}

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π f t/M} + e^{−i2π (f+M)/(2M)} × ∑_{t=0}^{M−1} s(2t+1) e^{−i2π f t/M}        (First Identity)

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π f t/M} + e^{−i2π f/(2M)} e^{−i2π M/(2M)} × ∑_{t=0}^{M−1} s(2t+1) e^{−i2π f t/M}

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π f t/M} − e^{−i2π f/(2M)} × ∑_{t=0}^{M−1} s(2t+1) e^{−i2π f t/M}        (Second Identity)

     = ∑_{t=0}^{M−1} s(2t) e^{−i2π f t/M} − e^{−i2π f/N} × ∑_{t=0}^{M−1} s(2t+1) e^{−i2π f t/M}
Notice that once again we have the same splits E( f ) and O( f )! So we can say:
S(f + N/2) = E(f) − e^{−i2π f/N} × O(f)
What this all means is that to compute S( f ), we just need to compute O( f ) and E( f ), and then
use each of them twice. What are O( f ) and E( f )? They’re themselves Fourier Transforms on s(2t)
and s(2t + 1) respectively, and since they only go from 0...M − 1, they’re half the size of S( f )! In
short, to compute a Fourier Transform, we can compute two half-size Fourier Transforms, and then
use them twice each. This is recursive: each of them will require two quarter-size Fourier Transforms,
and so on until we get down to an array of just size 1.
Steps 3 and 4 together are N in length. Similarly when we’re inside O( f ) or E( f ), steps 3 and 4
are N/2 in length: but there’s two of them (O( f ) and E( f )). Continuing the recursion to the next
level, steps 3 and 4 are N/4 in length, but there are 4 of them, and so on, all the way down to N
individual computations of size 1. Thus at any level, we have O( N ) computations.
How many levels do we have? We start with 1 size-N computation, then 2 size-N/2 computa-
tions, then 4, then 8, ... until we get to N size-1 computations. The length of h1, 2, 4, 8, ..., N i is lg N.
So our total cost is O( N lg N ).
This divide-by-2-and-conquer strategy assumes, of course, that N is a power of 2. If your
sample count is not a power of 2, there are a number of options for handling this not discussed
here. The FFT is thus:
Algorithm 2 RecursiveFFT (Private Subfunction)
1: Xi ← ⟨Xi0 ... XiN−1⟩ array of N elements representing the imaginary values of the input
2: Xr ← ⟨Xr0 ... XrN−1⟩ array of N elements representing the real values of the input

3: N ← length of Xi
4: if N = 1 then
5:     return Xr and Xi
6: else
7:     Yi ← ⟨Yi0 ... YiN−1⟩ array of N elements representing the imaginary values of the output
8:     Yr ← ⟨Yr0 ... YrN−1⟩ array of N elements representing the real values of the output
9:     M ← N/2
10:    Ei ← ⟨Ei0 ... EiM−1⟩ even-indexed elements from Xi        ▷ ∀x : Eix = Xi2x
11:    Er ← ⟨Er0 ... ErM−1⟩ even-indexed elements from Xr        ▷ ∀x : Erx = Xr2x
12:    Oi ← ⟨Oi0 ... OiM−1⟩ odd-indexed elements from Xi         ▷ ∀x : Oix = Xi2x+1
13:    Or ← ⟨Or0 ... OrM−1⟩ odd-indexed elements from Xr         ▷ ∀x : Orx = Xr2x+1
14:    Ei, Er ← RecursiveFFT(Ei, Er)
15:    Oi, Or ← RecursiveFFT(Oi, Or)
16:    for n from 0 to M − 1 do        ▷ S(f) = E(f) + e^{−i2πf/N} O(f), where e^{−i2πf/N} = cos(2πf/N) − i sin(2πf/N)
17:        θ ← −2πn/N
18:        Yrn ← Ern + cos(θ) Orn − sin(θ) Oin
19:        Yin ← Ein + cos(θ) Oin + sin(θ) Orn
20:    for n from M to N − 1 do        ▷ S(f + M) = E(f) − e^{−i2πf/N} O(f), with f = n − M
21:        m ← n − M
22:        θ ← −2πm/N
23:        Yrn ← Erm − (cos(θ) Orm − sin(θ) Oim)
24:        Yin ← Eim − (cos(θ) Oim + sin(θ) Orm)
25: return Yr and Yi
Algorithm 3 The Fast Fourier Transform
1: Xi ← ⟨Xi0 ... XiN−1⟩ array of N elements representing the imaginary values of the input
2: Xr ← ⟨Xr0 ... XrN−1⟩ array of N elements representing the real values of the input

3: N ← length of Xi
4: Yi, Yr ← RecursiveFFT(Xi, Xr)
5: return Yr and Yi
You can also easily do the FFT in an iterative rather than recursive form, that is, as a big loop
largely relying on dynamic programming. It’s a bit faster and doesn’t use the stack, but it has no
computational complexity advantage.
There of course exists an Inverse Fast Fourier Transform or IFFT. We could change some signs
just like we did in the DFT: but instead let’s show off an alternative approach. It turns out that the
Inverse FFT is just the FFT run on the complex conjugate of the data, conjugated again and divided by N.18 That is:

IFFT(S) = conj(FFT(conj(S))) / N
...where conj(C) applies the complex conjugate to every complex number Ci ∈ C. If you have
forgotten, the conjugate of a complex number a + bi is just a − bi. So we could write it like this:
Algorithm 4 The Inverse Fast Fourier Transform
1: Xi ← ⟨Xi0 ... XiN−1⟩ array of N elements representing the imaginary values of the input
2: Xr ← ⟨Xr0 ... XrN−1⟩ array of N elements representing the real values of the input

3: for n from 0 to N − 1 do        ▷ Conjugate the input
4:     Xin ← 0 − Xin
5: Yi, Yr ← Fast Fourier Transform(Xi, Xr)
6: for n from 0 to N − 1 do        ▷ Conjugate the output and divide by N
7:     Yin ← (0 − Yin)/N
8:     Yrn ← Yrn / N
9: return Yr and Yi
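Below is a compact sketch of the same recursive FFT in Python, using Python's built-in complex numbers instead of parallel real and imaginary arrays, plus the inverse via the conjugation trick (conjugate, FFT, conjugate, divide by N). It assumes the length is a power of 2 and is meant only as an illustration.

import cmath

def fft(x):
    """Recursive Cooley-Tukey FFT of a list of complex numbers (len(x) must be a power of 2)."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])                       # FFT of the even-indexed samples, E(f)
    odd = fft(x[1::2])                        # FFT of the odd-indexed samples, O(f)
    y = [0j] * N
    for f in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * f / N) * odd[f]
        y[f] = even[f] + twiddle              # S(f)       = E(f) + e^(-i2πf/N) O(f)
        y[f + N // 2] = even[f] - twiddle     # S(f + N/2) = E(f) - e^(-i2πf/N) O(f)
    return y

def ifft(x):
    """Inverse FFT via conjugation: conj(FFT(conj(x))) / N."""
    N = len(x)
    return [v.conjugate() / N for v in fft([v.conjugate() for v in x])]

# Round trip on a short signal
signal = [0, 1, 0, -1, 0.5, 0, -0.5, 0]
spectrum = fft([complex(s) for s in signal])
recovered = ifft(spectrum)
print([round(v.real, 6) for v in recovered])  # prints (close to) the original signal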
3.5 Windows
The Fourier Transform converts a sound of length N into amplitudes and phases for N/2 frequen-
cies stored in N/2 bins. But those aren’t necessarily all the frequencies in the sound: there’s no
reason we can’t have a frequency that lies (say) half-way between two bins. Storing a frequency
like this causes it to spread, or leak, into neighboring bins.
As a result, even a pure sine wave may not show up
as a 1 in a certain bin and all 0 in the other bins. Rather,
it might look something like Figure 20. In addition to the
primary (nearest) bin, we see leakage out into other bins.
The up/down pattern of leakage forms what are known as
sidelobes. Often we’d like to reduce the sidelobe leakage as
much as possible, and have the primary lobe to be as thin
as possible, ideally fitting into a single bin.
We can’t meet this ideal, but we have ways to approximate it.
The approach is to preprocess our sampled sound with
a window before running it through the FFT. Using a
window function w(n) is very simple: you just multiply
it against each of your samples s0 , ..., sN−1 , resulting in s0 × w(0), ..., sN−1 × w(N − 1):

Figure 20 Sidelobes in an FFT (note that the Y axis is on a log scale).©11
18 This works for the Inverse DFT too. And why not? They’re effectively the same procedure.
Algorithm 5 Multiply by a Window Function
1: Xr ← h Xr0 ...Xr N −1 i array of N elements representing sound samples
2: w(n, N ) ← window function
3: for n from 0 to N − 1 do
4: Xrn ← Xrn × w(n, N )
5: return Xr

If you have no window function, then w(n) = 1 for all n. This is called the rectangular window. Most window functions are zero or near-zero at the ends and positive in the center. There are many window functions, depending on your particular needs. One popular option is the Hamming window:

w(n, N ) = 0.53836 − (1 − 0.53836) × cos(2πn/(N − 1))

Figure 21 Rectangular and Hamming Windows and their effects on the frequency domain (note that the Y axis for the frequency domain is on a log scale).©12

This is far from the only window option: there are also the Gaussian, Tukey, Planck-Taper, Slepian, Kaiser, Dolph-Chebyshev, Ultraspherical, Poisson, Lanczos, ....
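As a concrete illustration, here is a minimal Java sketch that applies the Hamming window above to an array of samples, in place, before handing them to the FFT.

    // Multiply the samples by a Hamming window (Algorithm 5, with the window
    // function given above).
    static void applyHammingWindow(double[] samples) {
        int n = samples.length;
        for (int i = 0; i < n; i++) {
            double w = 0.53836 - (1 - 0.53836) * Math.cos(2 * Math.PI * i / (n - 1));
            samples[i] *= w;
        }
    }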
3.6 Applications
The Fourier Transform isn’t just for sounds; indeed this is a very minor use. It’s used for signal processing of all kinds in everything from the study of electrical circuits to spectral analysis of stars. There are 2-D (or higher-dimensional!) versions which operate on images and video. In fact, the fastest known way to multiply two very large (like million-digit) numbers is to convert them with the FFT, perform a special operation on them, and convert them back!
Though the FFT has many applications outside of the audio realm, let’s consider a few interest-
ing cases in sound processing alone:
• Visualization This one is obvious: an FFT is great at visualization. You can easily analyze the amplitudes and phases at a variety of frequencies. If you do successive FFTs, perhaps one per tenth of a second, you could create a spectrogram, such as was shown in Figure 9 (another example is the bottom subfigure — the frequency domain — of Figure 12 on page 14).
• Filtering You can accentuate, lower, or entirely strip out partials by converting the sound to
the Fourier domain with an FFT, modifying (or zeroing out!) the amplitudes of interest, then
converting back to the time domain with an IFFT. Similarly, you could modify the phases
of various partials. In fact, it’s often much faster to do an FFT, perform modifications in the
frequency domain, then do an IFFT, than to just do the equivalent thing while in the time
domain! Section 11.4 discusses an example of this in depth, and a minimal code sketch appears just after this list.
• Pitch Scaling It used to be that pitch shifting was done by recording at a very slow speed,
then speeding it up: the Alvin and the Chipmunks effect. But the FFT can be used in a limited
fashion to pitch shift (up or down) without changing the speed. This is known as pitch scaling.
For example, to double frequency, just do an FFT, then just move each partial in the FFT array
to the slot representing twice its frequency. Then do an IFFT to go back to the original sound.
• Resynthesis A sound is sampled and analyzed, and then recreated (more or less) using a
synthesizer. One common use of resynthesis is a vocoder, which samples the human voice
and then recreates it with a vocal synthesis method. Some resynthesis techniques work
entirely in the time domain, but it’s not uncommon to perform resynthesis by pushing the
sound into the frequency domain where it’s easier to manipulate and analyze.
• Image Synthesis Here you start with a spectrogram in the frequency domain, and allow the
musician to modify it as if it were an image: indeed he might create an “image” from scratch
rather than loading a sample in the first place. This is essentially tweaking the image much
as one might do in Adobe Photoshop, then playing the result. Such tools are called image
synthesis synthesizers. Many additive synthesizer tools, such as Image-Line Software’s
Harmor, also sport image-synthesis facilities.
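As promised under Filtering above, here is a minimal Java sketch of the FFT-modify-IFFT pattern: a crude brick-wall low pass filter that zeroes every bin at or above some cutoff bin. It assumes hypothetical fft(xi, xr) and ifft(yi, yr) helpers in the style described earlier, each taking and returning { imaginary, real } array pairs.

    // Brick-wall low pass: transform, zero the unwanted bins, transform back.
    // Bin k and bin N-k describe the same frequency, so both are zeroed together.
    static double[] brickWallLowPass(double[] samples, int cutoffBin) {
        int n = samples.length;
        double[][] spectrum = fft(new double[n], samples);  // the imaginary part starts all zero
        double[] si = spectrum[0], sr = spectrum[1];
        for (int k = cutoffBin; k <= n - cutoffBin; k++) {
            si[k] = 0;
            sr[k] = 0;
        }
        double[][] out = ifft(si, sr);
        return out[1];   // the filtered sound is the real part (rescale if your FFT requires it)
    }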
4 Additive Synthesis
An additive synthesizer builds a sound by producing and modifying a set of partials, then adding
them up at the end to form the final sound wave. The partials could be added using an IFFT,
or more commonly by adding up a bunch of sine waves. Additive synthesis is one of the most
intuitive and straightforward ways of synthesizing sounds, and yet it is among the rarest due to
its high number of parameters. It’s not easy to develop an additive synthesizer that isn’t tedious
to use. The high computational cost of additive synthesizers has also restricted their availability
compared to other techniques.
4.1 History
Additive synthesis is easily the oldest form of electronic
music synthesis, and if we relaxed its definition to al-
low adding up waves beyond just sine waves, its history
stretches back in time much further than that.
Organ makers have long understood the effect of play-
ing multiple simultaneous pipes for a single note, each with
its own set of partials, to produce a final mixed sound. Pipe
organs are typically organized as sets of pipes (or stops),
one per note, which produce notes with a certain timbre.
As can be seen in Figure 23, stops are of many different shapes, and are made out of different materials, notably steel and wood. A full set of stops of a certain kind, one per note, is known as an organ rank.

Figure 23 (Left) Ranks of Organ Stops.©14 (Right) Organ Stop Knobs.©15
Good organs may have many ranks. To cause an organ
to play a stop from a rank when a note is played, a control
called a stop knob or drawknob is pulled out. Organs can
play many ranks from the same note at once by pulling out
the appropriate stop knobs; in fact some ranks are even
designed to play multiple pipes in the same rank in response
to a single note (a concept called, in organ parlance, mix-
tures). If you wanted to go all-out, playing all the ranks
at the same time, you would pull out all the stop knobs: hence the origin of the term “to pull out all the stops”.

Figure 24 Rudolf Koenig’s synthesizer.©16
Early electronic synthesizer devices were largely ad-
ditive, using tonewheels (also called alternators). A
tonewheel, originally devised by Hermann von Helmholtz
and later Rudolf Koenig (Figure 24), is a metal disk or
drum with teeth (Figure 25). The tonewheel is spun, and
an electromagnet is placed near it, and as the teeth on the
tonewheel get closer or farther from the magnet, they in-
duce a current in the magnet which produces an electronic wave.19 We can do a simple kind of additive synthesis by summing the sounds from multiple tonewheels at once.

Figure 25 Diagram of a tonewheel. As the wheel spins, its teeth alternately get closer to or farther from an electromagnet, causing the magnet to produce a wave.©17
19 This magnetic induction is essentially the same concept as an electric guitar pickup.
Figure 26 The Telharmonium. Tonewheel (“dynamo”) shown at bottom left.©18
The first significant electronic music synthesizer in history, Thaddeus Cahill’s massive Telhar-
monium, relied on summing tonewheels and was thus a kind of additive synthesizer. The idea
behind the Telharmonium was that a single performer could produce a song electronically, which
then could be broadcast over telephone lines to many remote sites at once. Figure 26 shows the
Telharmonium in all its glory, including a tonewheel diagram at bottom left.
Tonewheels later formed the sound-generation mech-
anism (along with the famous Leslie rotating speaker) of
the Hammond Organ: and it too worked using additive
synthesis. The Hammond Organ sported nine drawbars
which specified the amplitudes of nine specific partials
ranging in frequency from one octave below the fundamen-
tal to three octaves above. These drawbars were linked to
tonewheels which produced the final sound.
Figure 27 Hammond B3 Organ.©19

Most later attempts in additive synthesis were in the digital realm. In 1974 the Rocky Mount Instruments (or RMI) Harmonic Synthesizer was probably the first electronic music synthesizer to do additive synthesis using digital oscillators. The Bell Labs Digi-
tal Synthesizer, a highly influential experimental digital synthesizer, was also entirely additive.
Fairlight’s Qasar M8 generated samples by manipulating partials, and then used an IFFT to pro-
duce the final sound. Finally (and importantly) the commercially successful, but quite expensive,
New England Digital Synclavier II sported additive synthesis along with other synthesis modes
(sampling, FM), putting additive synthesis within reach of professional music studios.
During the 1980s and 1990s, Kawai was the primary
manufacturer to produce additive synthesizers. Kawai’s
K3, K5, and later its much improved K5000 series brought
additive synthesis to individual musicians. Since the 1990s,
the method has not shown up much in commercial hardware synthesizers, but it features prominently in a number of software synthesizers, including AIR Music Technology’s Loom, Native Instruments Inc.’s Razor, Image-Line Software’s Harmor and Harmless, and Camel Audio (now Apple)’s Alchemy.

Figure 28 Kawai K5000s.©20
4.2 Approach
Each timestep an additive synthesizer produces and modifies an array of partials, and once the array is sufficiently modified, the synthesizer uses it to generate and output the sound.

Figure 29 One possible additive synthesis pipeline. The musician plays a note with pitch and volume; partials generators feed partials modifiers and a combiner, then a further modifier, which generates and outputs the sound.
Figure 29 shows one possible pipeline for an additive synthesizer. This isn’t the only possibility
by far, but it serves as an example with many of the common elements:
• Partials Generators These are sources for arrays of partials. They could be anything. For
example, a generator might output one of several preset arrays of partials designed to produce
specific tones. A partials generator could also change the partials arrays it emits over time.
For example, a partials generator could emit one of 128 different arrays of partials, and the
particular array being emitted is specified by a parameter. This has a close relationship with a
technique discussed later called wavetable synthesis.
• Partials Modifiers These take arrays of partials and modify them, emitting the result. A
simple modifier might just amplify the partials by multiplying all of their amplitudes by a
constant. Or perhaps a modifier might change the frequencies of certain partials.
Another common modifier is a filter, which shapes partials by multiplying their amplitudes
against a filter function as shown in Figure 30. There are many possible filter function shapes,
though certain ones are very common. For example, a low pass filter cuts off high frequencies
after some point, whereas a high pass filter cuts off low frequencies. The filter in Figure 30
is an example of a high pass filter. There is also the band pass filter, which cuts off all
frequencies except those in a certain range, and the notch filter, which does the opposite.
Another common filter in additive synthesis is the formant filter, where the amplitudes of
partials are shaped to simulate the effect of the human vocal tract (see Section 8.11).
In other forms of synthesis which work in the time domain rather than the frequency domain,
filters can be tricky to implement: indeed all of Section 8 discusses these kinds of filters. But
with an additive synthesizer we are fortunate, because in the frequency domain a filter is
little more than a function which manipulates the arrays of partials.20
• Partials Combiners These take two or more arrays of partials and merge them somehow
to form a single array. If the partials are harmonics and both arrays contain the same
20 If you’d like a basic filter function for an additive synthesizer, try using the two-pole Butterworth filter amplitude response equations in Section 8.9. For example, if you have a desired cutoff frequency ω0 > 0 (in radians) and resonance Q > 0, then for each partial, given its frequency ω (in radians again), multiply its amplitude against 1/√((1 − ω²/ω0²)² + (ω/(ω0 Q))²) to get a basic low-pass filter (a code sketch of this appears at the end of this section). To convert a frequency from Hz to radians, just multiply by 2π. Also, resonance normally doesn’t drop below Q = 1/√2, which is generally considered the minimum “no resonance” position.
frequencies, then this could be as simple as adding together the amplitudes of the same-
frequency harmonics from both arrays: the additive version of mixing. If the partials have
arbitrary frequencies, and you need to produce a new array that is the same size as each of the
previous arrays, then you’d have to use some clever approach to cut out partials: for example,
you might throw all the partials together and then delete the highest frequency ones.
• Modulation All along the way, the parameters of the partials generators, modifiers, and
combiners can be changed in real time via automated or musician-driven modulation proce-
dures. A modulation signal typically varies from -1 to 1, or perhaps from 0 to 1. Modulators
can be used not only to change the parameters of the aforementioned modules, etc., but also
the parameters of other modulators. The two common kinds of automated modulators are:
– Low Frequency Oscillators or LFOs simply cause the signal to go up and down at a
certain rate specified by the musician.
– Envelopes vary the signal over time after a key has been pressed. For example, in
an Attack-Decay-Sustain-Release (or ADSR) envelope, when you press a note, the
envelope begins sweeping from 0 to some attack level over the course of an attack time.
When it reaches the attack level, it then starts sweeping back down to some sustain
level over the course of a decay time. When it reaches the sustain level, it stays there
until you release the key, at which point it starts sweeping back to zero over the course
of a release time. The musician specifies these values.
Modulation is absolutely critical to making realistic sounds. Consider for a moment that when
someone plays an instrument such as trumpet, we often first hear a loud and brash blare for a
moment, which then fades to a mellower tone. There are two things that are happening here.
First, the trumpet is starting loudly, then quickly dropping in volume. Second, the trumpet
is starting with lots of high-frequency harmonics, giving it a brash and buzzy sound, and
then quickly reduces to just low-frequency harmonics, resulting in a mellowing of tone. If we
wished to simulate this, we’d use a modulation procedure which, when a note was played,
made the sound louder and perhaps opened a low-pass filter to allow higher harmonics
through, and then soon thereafter quieted the sound and closed much of the filter (cutting out
the higher harmonics). This kind of modulation procedure calls for one or more envelopes.
Similarly if we wished to add tremolo (rapidly moving the volume up and down) or vibrato
(rapidly moving the pitch up and down) or another oscillating effect, we could use an LFO.
Other modulation mechanisms include Arpeggiators and Sequencers. We’ll cover all of
these in more detail in Section 5.
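As an example of a partials-modifier filter, here is a minimal Java sketch of the basic low-pass response from footnote 20 applied to an array of partials. The layout (parallel frequency and amplitude arrays, with frequencies in Hz) is just an assumption for the sketch.

    // Multiply each partial's amplitude by the two-pole low-pass amplitude response.
    // freqs[] holds absolute frequencies in Hz, amps[] the amplitudes, cutoffHz the
    // cutoff frequency, and q the resonance (normally q >= 1/sqrt(2)).
    static void lowPassPartials(double[] freqs, double[] amps, double cutoffHz, double q) {
        double w0 = 2 * Math.PI * cutoffHz;                  // cutoff in radians
        for (int i = 0; i < amps.length; i++) {
            double w = 2 * Math.PI * freqs[i];               // partial frequency in radians
            double a = 1 - (w * w) / (w0 * w0);
            double b = w / (w0 * q);
            amps[i] *= 1.0 / Math.sqrt(a * a + b * b);
        }
    }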
4.3 Implementation
An additive synthesizer can be implemented straightforwardly as a set of modules which offer
arrays of partials or individual modulation values to one another. Every so often the code would
update all of the modules in order, allowing them to extract the latest information out of other
modules so as to revise their own offered partials or modulation. A final module, Out, would
extract and hold the latest partials. Every time tick (more rapidly than the modules update) the
facility would grab the latest partials from Out and use them to update a sample, one sample per
tick. Here’s a basic monophonic additive synthesizer top-level architecture:
Algorithm 6 Simple Monophonic Additive Synthesizer Architecture
1: M ← h M1 , ..., Mm i modules
2: tick ← 0
3: counter ← 0
4: δ ← 0
5: α ← interpolation factor
6: ticksPerUpdate ← number of ticks to wait between updates . ticksPerUpdate = 32 works well
7: procedure Tick
8: if Note Released then
9: for i from 1 to m do
10: Released(Mi , pitch)
11: if Note Pressed then
12: for i from 1 to m do
13: Pressed(Mi , pitch, volume)
14: tick ← tick +1
15: counter ← counter +1
16: δ ← (1 − α ) × δ + α
17: if counter >= ticksPerUpdate then
18: counter ← 0
19: δ←0
20: for i from 1 to m do
21: Update(Mi , tick)
22: return OutputSample(tick, δ)
Note that a new note may be pressed before the previous note is released: this is known
as playing legato on a monophonic synthesizer. Some modules might respond specially to this
situation. For example, partials generators might gradually slide in pitch from the old note to the
new note, a process called portamento.
Interpolating the Array of Partials What’s the point of δ and α? The call to OutputSample(...)
in Algorithm 6 is called every tick: but the partials are only updated (via Update(...)) every
ticksPerUpdate ticks. If ticksPerUpdate > 1 then we will have a problem: even relatively small
changes in the amplitude and frequency of the partials can appear as abrupt changes in the
underlying sound waves, creating clicks.
The simplest way to fix this is to do partial interpolation. Let At−1 be the amplitudes of
the previous partials and At be the amplitudes of the current partials. Similarly, let F t−1 and
F t be their frequencies. For each partial i, we could define Ai and Fi to be the amplitude and
frequency, respectively, used to generate the next sample via Ai ← (1 − δ) × Ait−1 + δ × Ait , and
Fi ← (1 − δ) × Fit−1 + δ × Fit . Here δ is 0 when we receive a brand new set of partials, and gradually
increases to 1 immediately prior to when we receive the next new set.
In Algorithm 6 we’re passing in a δ to OutputSample(...) which can serve exactly this purpose.
Note that it’s being increased exponentially rather than linearly: I’ve found an exponential curve
to be much more effective at eliminating clicks. But you will need to set α such that, by the time
ticksPerUpdate ticks have expired, δ is within, oh, about 0.97 or so.
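A minimal Java sketch of this interpolation step, assuming parallel arrays for the previous and current partials:

    // Blend the previous and current partials by delta (0 right after an update,
    // rising toward 1 just before the next one).
    static void interpolatePartials(double[] prevAmps,  double[] currAmps,
                                    double[] prevFreqs, double[] currFreqs,
                                    double delta,
                                    double[] outAmps,   double[] outFreqs) {
        for (int i = 0; i < currAmps.length; i++) {
            outAmps[i]  = (1 - delta) * prevAmps[i]  + delta * currAmps[i];
            outFreqs[i] = (1 - delta) * prevFreqs[i] + delta * currFreqs[i];
        }
    }

With ticksPerUpdate = 32, an α of roughly 0.1 brings δ to about 0.97 by the time the next update arrives.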
Warning Imagine that Ait = 0. Then as interpolation pushes Ai towards Ait , you could find Ai mired in the denormalized21 range, and math with denormal numbers can be extremely slow in many programming languages. You need to detect that you’re getting close to the denormals and just set Ai directly to 0. For example, if Ait < s and Ai < s for a value of s somewhat above the denormals, then Ai ← 0.
Generating a Sound from an Array of Partials At the end of the day, we must take the final array
of partials and produce one sample for our sound. Let us define a partial as a tuple hi, f , a, pi:
• Each partial has a unique ID i ∈ 0...N. This indicates the sine-wave generator responsible
for outputting that partial. If a partial’s position in the array never changes, this isn’t really
necessary: you could just use the array position as the ID. However it might be useful to
rearrange the partials in the array (perhaps because you’ve changed their various frequencies
in some module, and then re-sorted the partials by frequency). Keeping track of which partial
was originally for which generator is helpful because if a generator suddenly switched to a
different partial with a different phase or frequency or amplitude, you might hear an audible
pop as the generator’s sine wave abruptly changed.
• The frequency f of the partial is relative to the base frequency of the note being created: for
example, if the note being played is an A 440, and f = 2.0, then the partial’s frequency is
440 × 2.0 = 880.
• To keep things simple, the amplitude a ≥ 0 of the partial is never negative. If you needed a
“negative” amplitude, as in a Triangle wave, you could achieve this by just shifting the phase
by π.
• The phase p of the partial could be any value, though for our own sanity, we might restrict it
to 0 ≤ p ≤ 2π.
The sound generation facility maintains an array of sine-wave generators G1 ...GN . Each genera-
tor has a current instantaneous phase xi . Let’s say that the interval between successive timesteps
is ∆t seconds: for example, 1/44100 seconds for 44.1KHz. Every timestep each generator Gi finds
its associated partial hi, f , a, pi of ID i. It then increases xi to advance it the amount that it had
changed due to the partial’s frequency:
xi(t) ← xi(t−1) + fi ∆t
Let’s assume that the period of our wave corresponded to the interval xi = 0...1. Since our
wave is periodic, it’s helpful to always keep xi in the 0...1 range. So when it gets bigger than 1,
we just subtract 1 to wrap it back into that range. One big reason why this is a good idea is that
high values of xi will start having resolution issues given the computer’s floating-point numerical
accuracy. So we could say:
xi(t) ← (xi(t−1) + fi ∆t) mod 1    (1)
21 Denormalized numbers are a quirk of the IEEE 754 floating point spec. They are a set of numbers greater than zero
but less than the lowest positive exponent. For doubles, that means they’re roughly < 2−308 . Math with them typically
isn’t handled in hardware: it has to be done in software, if your language can’t automatically set denormals to 0. As a
result it’s hundreds, sometimes thousands, of times slower. You don’t want to mess with that.
For our purposes, mod 1 is the same thing as saying “keep subtracting 1 until the value is
in the range 0...1, excluding 1.” In Java, x mod 1 (for positive x, which is our case) is easily
implemented as x = x - (int) x; Once this is done for each xi , we just adjust all the sine waves
by their phases, multiply them by their amplitudes, and add ’em up. Keep in mind that the period
of a sine wave goes 0...2π, so we need to adjust our period range accordingly. So the final sample is
defined as:
∑i sin(2π xi(t) + pi ) × ai
We might multiply the final result against a gain (a volume), but that’s basically all there is to it.
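Putting these pieces together, a minimal Java sketch of the per-tick sample generation might look like this; here freqs[] holds each partial's absolute frequency in Hz (the note's base frequency times the partial's relative frequency).

    // Advance each generator's position by its frequency, wrap it into 0...1
    // (Equation 1), then sum the amplitude-scaled, phase-shifted sine waves.
    static double nextSample(double[] x, double[] freqs, double[] amps,
                             double[] phases, double dt) {      // dt = 1.0 / 44100 at 44.1KHz
        double sample = 0;
        for (int i = 0; i < x.length; i++) {
            x[i] += freqs[i] * dt;
            x[i] -= (int) x[i];                                  // mod 1
            sample += Math.sin(2 * Math.PI * x[i] + phases[i]) * amps[i];
        }
        return sample;                                           // optionally multiply by a gain
    }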
Sine Approximation The big cost in additive synthesis is the generation and summation of sine
waves. Don’t use the built-in sin function, it’s costly. Approximate it with a fast lookup table:
1: S ← hS0 ...S2^n −1 i sine lookup table of 2^n elements
2: for i from 0 to 2^n − 1 do
3: Si ← sin(2πi/2^n )
You can make this much more accurate still by interpolating with the Catmull-Rom cubic spline (Equation 6, page 120). To map to that equation, let i = ⌊ x × 2^n /(2π) ⌋ and α = x × 2^n /(2π) − ⌊ x × 2^n /(2π) ⌋. Then let f ( x1 ) = S(i−1 mod 2^n ) , f ( x2 ) = S(i mod 2^n ) , f ( x3 ) = S(i+1 mod 2^n ) , and f ( x4 ) = S(i+2 mod 2^n ) . Slightly slower than direct table lookup, but still far faster than the built-in sin function.
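A minimal Java sketch of the lookup table, here with a power-of-two table size (16384 entries, an arbitrary choice) so that wrapping can be done with a bit mask:

    // A sine table with simple nearest-entry lookup. Catmull-Rom interpolation,
    // as described above, would also consult the neighboring entries.
    static final int SIZE = 1 << 14;
    static final double[] TABLE = new double[SIZE];
    static {
        for (int i = 0; i < SIZE; i++)
            TABLE[i] = Math.sin(2 * Math.PI * i / SIZE);
    }
    static double fastSin(double x) {                  // x in radians, x >= 0
        int i = (int) (x * SIZE / (2 * Math.PI));
        return TABLE[i & (SIZE - 1)];                  // wrap via bit mask
    }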
Buffering and Latency This is an important implementation detail you need to be aware of. In
most operating systems you will output sound by dumping data into a buffer. You can dump it in
a sample at a time, or (more efficiently) put in an array of samples all at once. The operating system
must not fully drain the buffer or it will start emitting garbage (usually zeros) as sound because it has
nothing else available. You have to keep this buffer full enough that this does not happen.
The problem is that the operating system won’t drain the buffer a byte at a time: instead it will
take chunks out of the buffer in fits and starts. This means you always have to keep the buffer filled
to more than the largest possible chunk. Different operating systems and libraries have different
chunk sizes. For example, Java on OS X (what I’m familiar with) has an archaic audio facility which
requires a buffer size of about 1.5K bytes. Low-latency versions of Linux can reduce this to 512
bytes or less.
You’d like as small a buffer as possible because keeping a buffer full of sound means that you
have that much latency in the audio. That is, if you have to keep a 1.5K buffer full, that’s going to
result in a 17ms audio delay. That’s a lot!
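(Assuming 16-bit mono samples at 44.1KHz, a 1.5K buffer holds about 768 samples, and 768/44100 ≈ 17.4 ms.)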
Timing How do you make sure that the Tick(...) method is called regularly and consistently?
There are various approaches: you could use timing code provided by the operating system, or poll
a getTime(...) method and call Tick(...) when appropriate. But there’s an easier way: just rely on
the audio output buffer. That is, if the buffer isn’t full, fill it, and each time you fill it with a sample,
you do so by calling Tick(...) once. As discussed before, the buffer gets drained in fits and starts,
and so your filling will be fitful as well: but that doesn’t matter: all that matters is that all of your
time-sensitive code is in sync with the audio output. So base it directly on the output itself! That is,
I’d call the following over and over again in a tight loop:
• Determine the current pitch of the note (this can be modulated with an LFO or envelope).
• Build an array of 63 harmonics. You can set the amplitude of each of the harmonics separately.
You can also modulate the amplitudes of harmonics, either by assigning each of the harmonics
to one of four envelopes or to an LFO. The envelopes are the important part here: they allow
different harmonics to rise and fall over time, changing the sound timbre considerably.
• Run the harmonics through some kind of filter. The filter has its own envelope and can be
modulated via a LFO.
• Run the harmonics through an amplifier. This amplifies all the harmonics as a group, much
as a sound is amplified (as opposed to earlier in the pipeline, when each harmonic could
have its amplitude changed independently). The amplifier has its own envelope and can be
modulated via a LFO.
• Run the harmonics through a formant filter. This filter can be used to adjust the harmonics
to simulate the formant properties of the human vocal tract.
• This pipeline happens twice, for two independent sets of 63 harmonics each. This can be done
in parallel to make two independent voices per note, or one set can be assigned to harmonics
1...63, while the other set is assigned to harmonics 65...127 to create a richer sound with many
higher-frequency harmonics.22
The challenge here is that even with this simple architecture, there were 751 parameters, as
every harmonic had its own amplitude and modulation options. The amplifier, filter, and pitch all
had their own 6- or 7-stage envelopes, as well as the four envelopes that the harmonics could be
assigned to: and this was for each of the two sets of harmonics. It was not easy to program the
Kawai K5.
The K5 was also not a good sounding synthesizer.23 But ten years later, Kawai tried again with
the K5000 series (Figure 31) and produced a much better design. The architecture was similar in
many respects, but with one critical difference: every harmonic now had its own independent
envelope. This allowed for much richer and more complex sounds (but even more parameters!)
the large majority of patches only employ small tweaks of standard modules. Rather than tediously
manipulate the individual partials in a sound one by one (though you can do that), Flow is instead
geared more towards pushing arrays through various manipulation and filter modules as a whole.
Flow fixes the number of partials, often to 256. It also disregards phase, and a partial only has
frequency, amplitude, and an ID. Flow can manipulate the frequency and amplitude of partials in a
wide variety of ways and can combine and morph25 partials from multiple sources.
Related tools include AIR Music Technology’s Loom, Native Instruments Inc.’s Razor, and
Image-Line Software’s Harmor. These aren’t fully modular in the sense that Flow is, but they
can organize groups of additive modules together in a linear pipeline. Such tools also may have
variable numbers of partials, and may include phase.
25 Morphing works like this. For each pair of partials, one from each incoming set, produce a new resulting partial
which is the weighted average of the two both in terms of frequency and amplitude. You’d then modulate the weight.
5 Modulation
By themselves, the audio pipeline modules will produce a constant tone: this might work okay for
an organ sound, but otherwise it’s both boring and atypical of sounds generated by real musical
instruments or physical processes. Real sounds change over time, both rapidly and slowly. To
make anything which sounds realistic, or at least interesting, requires the inclusion of mechanisms
which can change pipeline parameters over time. These are modulation sources.
Modulation signals come from two basic sources:
• The musician himself through various interface options: buttons, knobs, sliders, and so on.
Some of these are general-purpose and can be assigned as the musician prefers. These might
include the modulation wheel, pitch bend wheel, so-called expression pedals, and from
the keyboard’s velocity, release velocity, and aftertouch, among others. For definitions and
more information on these modulation interface options, see Section 12.
How a modulation signal modulates a parameter depends on the range of the parameter. Some
parameters, such as volume, are unipolar, meaning that their range is 0...N (we could just think
of this as 0...1). Other parameters, such as pitch bend, might be bipolar, meaning that their range
is − M... + M (perhaps simplified to −1/2... + 1/2 or −1... + 1). It’s trivial to map a unipolar to a
bipolar signal or vice versa, of course, and synthesizers will often do this.
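(For example, bipolar = 2 × unipolar − 1, and going the other way, unipolar = (bipolar + 1)/2.)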
Another issue is the resolution of the parameter. Some parameters are real-valued with a high
resolution; but others are very coarse-grained. And even if a parameter is high-resolution, some
modulation signals it could receive — notably those provided over MIDI (Section 12.2) — can be
very coarse, often just 7 bits (0...127). In this situation, gradually changing the modulation signal
will create a zipper effect as the parameter clicks from one discretized value to the next.
26 One
desirable property of an LFO is the ability to go high into audio rate, so as to effect a form of frequency
modulation (or FM) on audio signals. We’ll cover FM in Section 9.
Section 12.2.2). The period of an LFO in a voice is normally reset when the musician plays a new
note, unless the LFO has been set free.27
In a monophonic synthesizer a new note might be pressed before the last one was released
(known as playing legato). Some LFOs might prefer to not reset in this situation, because the new
note may be perceived by the listener essentially as a continuation of the previous one.
Once you’ve got a master clock providing ticks (see Section 4.3), implementing an LFO is pretty
straightforward: you just have to map the ticks into the current cycle position (between 0 and 1).
You could do this with division, or you could do it by incrementing and then truncating back to
between 0 and 1. Each has its own numerical issues. I’ve chosen the latter below.
Algorithm 10 Simple Low Frequency Oscillator
1: r ← rate . In cycles per tick
2: type ← LFO type
3: free ← is the LFO free-running?
4: legato ← did a legato event occur (and we care about legato)?
9: procedure Update
10: s ← s+r
11: if s ≥ 1 then
12: s ← s mod 1 . Easily done in Java as s = s - (int) s
13: if type is Square then
14: return −1 if s < 1/2, otherwise 1
15: else if type is Triangle then
16: return s × 4 − 1 if s < 1/2, otherwise 3 − 4 × s
17: else if type is Sawtooth then
18: return (1 − s) × 2 − 1
19: else if type is Ramp then
20: return s × 2 − 1
21: else . Type is Sine. See Section 4.3 for fast Sine lookup
22: return sin(s × 2π )
Random LFO Oscillators LFOs often also have a random oscillator. For example, every period it
might pick a random new target value between −1 and 1, and then over the course of the period it
would gradually interpolate from its current value to the new value, as shown in Figure 33. We
might adjust the variance in the choice of new random target locations. I’d implement it like this:
27 Unlike audio-rate oscillators, phase matters for an LFO, since we can certainly detect out-of-phase LFOs used to modulate various things.

Figure 33 (Left to right) A Ramp LFO; the same Ramp fed into 4× Sample and Hold; a Random LFO; the same Random fed into 1× Sample and Hold.
Note that we’re picking delta values from −2...2. This is so that, at maximum variance, if we’re
currently at −1, we could shift to any new target value clear up to +1 (and similarly vice versa).
With smaller and smaller variance, we’ll pick new target values closer and closer to our current
value.
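A minimal Java sketch of such a random LFO (one possible implementation, in the same per-period style as Algorithm 10):

    // Each period, pick a new target at most 2 x variance away from the current
    // one, clamped to -1...1, then interpolate toward it over the period.
    class RandomLFO {
        double current = 0, target = 0;
        double variance = 1.0;                         // 0...1
        java.util.Random random = new java.util.Random();
        void newPeriod() {                             // call once at the start of each period
            current = target;
            double delta = (random.nextDouble() * 4 - 2) * variance;   // from -2...2, scaled
            target = Math.max(-1, Math.min(1, current + delta));
        }
        double value(double s) {                       // s is the position in the period, 0...1
            return (1 - s) * current + s * target;
        }
    }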
Sample and Hold Many synthesizers also have a special function called sample and hold, or
S&H, which takes a modulation input and produces a discretized modulation output. Every period
it samples the current value of the input, and during the course of the period it outputs only that
value, ignoring all later inputs. Like an LFO, Sample and Hold may respond to free running and to
legato. Here is one simple implementation:
Sample and hold can be applied to any modulation source to produce a “discretized” version of the modulation,28 but it’s particularly common to apply it to an LFO. Discretizing sawtooth, ramp, triangle, and random LFO waves is common: in fact, sample and hold is so often applied to random LFO waves that it’s typically a full-fledged wave option in many LFO generators. The coarseness of discretization would depend on the sample and hold rate. Figure 33 shows examples of sample and hold, at two different frequencies, applied to ramp and random waves.
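A minimal Java sketch of sample and hold, in the same tick-driven style (one possible implementation):

    // At the start of each period, sample the incoming modulation value and hold it,
    // ignoring all later inputs until the next period begins.
    class SampleAndHold {
        double held = 0;
        double s = 1;                                  // position within the period, 0...1
        double rate;                                   // periods per tick
        SampleAndHold(double rate) { this.rate = rate; }
        double update(double input) {
            s += rate;
            if (s >= 1) {
                s -= (int) s;                          // wrap: a new period has begun
                held = input;                          // sample and hold
            }
            return held;
        }
    }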
5.2 Envelopes
An envelope is a time-varying function which, when triggered, starts at some initial value and then
follows the function until it is terminated. Very commonly the trigger is when the musician plays a
note. Software or hardware which produces an envelope function is called an envelope generator.
28 Yes, if the sample and hold is a high enough rate, it’ll sound just like the zipper effect discussed earlier (page 43).
ADSR By far the most common envelope used in synthesizers is the Attack-Decay-Sustain-Release envelope, or ADSR. When triggered (again, usually due to pressing a key), the envelope starts rising from a start value — usually 0 — up to an attack level (sometimes fixed to 1.0) over the course of an attack time interval. Once the interval has been exhausted, the envelope then begins to drop to a sustain level over the course of a decay time. At that point the envelope holds at the sustain level until a different trigger occurs (usually due to the musician releasing the key).

Figure: An ADSR envelope, showing the attack level, the sustain level, and the attack, decay, (sustain), and release intervals.
Algorithm 13 Simple Linear Time-based ADSR
1: r ← rate . In envelope time units per tick
2: X ← { X0 , ..., X4 } . Time when stage ends. X2 , X4 = ∞
3: Y ← {Y0 , ..., Y4 } . Target parameter value of stage. Y1 = Y2 , and Y3 = Y4 = 0
4: legato ← did a legato event occur (and we care about legato)?
This envelope changes linearly with time. To change exponentially, a time-based ADSR would
need a call to pow() to compute the exponential change, which is very costly. Instead, you could do
a call to, say, x⁴, which works pretty well. To do this I’d just replace line 26 with these two lines:
γ ← (s/Xi )
γ ← γ×γ×γ×γ
To adjust the rate of attack/decay, just revise the number of times γ appears in the multiplication.
Because it is multiplying rather than adding, an exponential rate-based envelope will never
reach its target, in Zeno’s Paradox fashion. Thus we need an additional threshold variable, e, which
tells us that we’re “close enough” to the target to assume that we have finished. This value should
be pretty small, but not so small as to get into the denormals (See Footnote 21, page 37).
Algorithm 14 Simple Exponential Rate-based ADSR
1: X ← { X0 , ..., X4 } . Exponential rate for stage. X2 , X4 = 1.
2: Y ← {Y0 , ..., Y4 } . Target parameter value of stage. Y1 = Y2 , and Y3 = Y4 = 0.
3: e ←Threshold for switching to new stage. Low. . Should be large enough to avoid denormals!
4: legato ← did a legato event occur (and we care about legato)?
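To make the exponential step itself concrete, here is a minimal Java sketch of a single rate-based stage approaching a target value; the rate and threshold correspond loosely to X and e above.

    // Each tick, move a fixed fraction of the remaining distance toward the target.
    // Because this never quite arrives, snap to the target once within the threshold.
    class ExponentialStage {
        static final double EPSILON = 1e-6;            // keep comfortably above the denormals
        double value = 0;
        boolean step(double target, double rate) {     // rate in (0, 1)
            value += (target - value) * rate;
            if (Math.abs(target - value) < EPSILON) {
                value = target;
                return true;                           // this stage has finished
            }
            return false;
        }
    }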
The envelopes discussed so far are generally unipolar. But there do exist bipolar multi-stage
envelopes. There’s nothing special about these: their values can simply range anywhere from
−1... + 1 instead of from 0...1.
Gizmo is an Arduino-based device for sending and receiving MIDI with many applications.
Gizmo’s step sequencer is laid out as a 2D array, where the X dimension is the stage or step number,
and the Y dimension is the track. Gizmo supports up to 96 steps and up to 12 tracks, depending on
how you want to allocate the Arduino’s absurdly tiny memory. Each track is a sequence of either
notes or parameters, one per cell in the track row. When a track stores notes, its cells contain both
pitch and volume (per note), or can also specify rests (meaning “don’t play anything here”) or ties
(meaning “continue playing the previous note”).
After the musician has entered the relevant data into the step sequencer, it will loop through its
stages, and at each stage it will emit MIDI information corresponding to all the notes and parameter
settings at that stage. Gizmo’s step sequencer can be pulsed by an external clock, or it can run on
its own internal clock, in which case you’d need to specify its tempo.
Options Because step sequencers often deal with note or event data, they usually have a number
of options. Here are a few of Gizmo’s. First you can specify swing, that is, the degree of syncopation
with which the notes are played. Second, you can specify the length of each note before the note is
released. Tracks can have independent per-track volume (as opposed to per-note volume) and also
have a built-in fader to amplify the volume as a whole. Tracks can be muted or soloed, and you
can specify a pattern for automating muting in tracks. Finally, after some number of iterations, the
sequencer can load an entirely different sequence and start on that one: this allows you to have
multiple sections in a song rather than a simple repeating pattern the whole time.
Implementation A step sequencer can be very simple. Here is a very basic step sequencer for
modulating parameters much as is done in an envelope or LFO. And like an LFO, a step sequencer
may respond to free running and to legato.
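A minimal Java sketch of such a parameter-modulating step sequencer (one possible implementation):

    // Step through an array of stage values at a fixed rate, emitting the current
    // stage's value as the modulation signal; the sequence loops when it ends.
    class StepSequencer {
        double[] stages;                               // one modulation value per stage
        double s = 0;                                  // position within the whole sequence, 0...1
        double rate;                                   // fraction of the sequence per tick
        StepSequencer(double[] stages, double rate) { this.stages = stages; this.rate = rate; }
        double update() {
            s += rate;
            s -= (int) s;                              // wrap around and loop
            return stages[(int) (s * stages.length)];
        }
    }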
5.4 Arpeggiators
An arpeggiator is a relative of the step sequencer whose purpose is to produce arpeggios. An
arpeggio is a version of a chord where, instead of playing the entire chord all at once, its notes
are played one by one in a pattern. Arpeggiators are only used to change notes: they’re not used
to modulate parameters, and so are not formally modulation devices. The classic arpeggiator
intercepts notes played on the keyboard and sends arpeggios to the voices to play instead. As the
musician adds or removes notes from the chord being played, the arpeggiator responds by adding
or removing them from its arpeggiated note sequence.
Options An arpeggiator is usually outfitted with a note latch facility, which continues to play the
arpeggio even after you have released the keys. Only on completely releasing all the keys and then
playing a new chord does the arpeggio shift. You can usually also specify the number of octaves
an arpeggiator plays: with two octaves specified and the chord C E G, the arpeggiator might
arpeggiate C E G, then the C E G above them, before returning to the originals. Like a sequencer, an
arpeggiator might also be subject to swing, tempo, note length, and note velocity.
Arpeggiation Patterns Arpeggiators usually offer a variety of arpeggio patterns. Here are some
of Gizmo’s built-in offerings (and they are typical):
• Up-Down Repeatedly play the chord notes lowest-to-highest, then back down highest-to-
lowest. Do not play the lowest or highest notes twice in a row.
• Assign Repeatedly play the chord notes in the order in which they were struck by the
musician when he played the chord.
• Custom Repeatedly play the chord in a pattern programmed by the musician (including
ties and rests).
note on a keyboard, it would send a gate-high signal to the synthesizer to indicate that some note is
being pressed. Gates are also used as triggers from sequencers etc. to indicate new events.
Second, there are control voltage or CV signals. These are simply signals whose voltage varies
continuously within some range. CV comes in both unipolar and bipolar ranges. For example,
most envelopes are unipolar: an envelope’s CV range would be 0–5 or 0–8 volts. On the other hand,
an LFO is bipolar, and its wave would be outputted in the range ±5 volts. Note that audio is also
bipolar and in a similar range: thus audio and bipolar CV are essentially interchangeable.
In addition to a gate signal (indicating that a note was pressed), a keyboard would normally
also output a unipolar CV signal to indicate which note was being played. This would usually
be encoded as 1 volt per octave: perhaps the lowest C (note) might be 0 volts, the C one octave
above would be 1 volt, the next C would be 2 volts, and so on.33 A sequencer could similarly be
configured.
33 This is known as volt per octave. Many Korg and Yamaha synthesizers used an alternative encoding: hertz per
volt. Here, rather than increasing voltage by 1 per octave, the voltage would double for one octave. This had the nice
quality of being equivalent to frequency, which likewise doubles once per octave.
34 Modulation matrices as an alternative to cables were common in hardware, where they were known as patch
matrices. For example, the EMS VCS3 sported one, albeit with a source and destination, but no modulation amount.
See Figure 48 in Section 6.1. To my knowledge, the first software modulation matrix in a stored-program commercial
synthesizer appeared in Oberheim’s aptly named Matrix series of analog synthesizers.
5.7 Modulation via MIDI
Since the early 1980s, nearly all non-modular synthesiz-
ers (and some modular ones!) have been equipped with
MIDI, a serial protocol to enable one device (synthesizer,
computer, controller, etc.) to send messages or remotely
manipulate another one. MIDI is most often used to send
note data, but it can also be used to send modulation infor-
mation as well.
For example, consider the keyboard in Figure 40. This
keyboard makes no sound: it exists solely to send MIDI information to a remote synthesizer in order to control it.

Figure 40 Novation Remote 25 controller.©22
And it is filled with options to do so. In addition to a two-octave keyboard, it has numerous buttons
which can send on/off information (the analogue of Gate), and sliders, encoders, potentiometers, a
joystick, and even a 2D touch pad which can send one or two real-valued signals each (the analogue
of CV).
In MIDI, this kind of control data is sent via a few special kinds of messages, notably Control
Change or CC messages. Note however that CC is fairly low-resolution and slow: changes in
response to CC messages may be audibly discretized, unlike the smooth real-valued CV signals in
modular systems. We could deal with this by smoothly interpolating the discrete changes in the
incoming MIDI signal, but this is going to create quite a lot of lag.
See Section 12.2 for more information about MIDI.
6 Subtractive Synthesis
Subtractive synthesis is the most common synthesis method, and while it’s not as old as additive
synthesis, it’s still pretty old: it dates from the 1930s. The general idea of subtractive synthesis is
that you’d create a sound, then start slicing into it, removing harmonics and changing its volume,
and the parameters of these operations could change in real time via modulation as the sound
is played. Quite unlike additive synthesis, subtractive synthesis typically is done entirely within
the time domain. This can be more efficient than additive synthesis, and involves many fewer
parameters, but many things are more difficult to implement: for example, building filters in the
time domain is far more laborious than in an additive synthesizer.
Much of the defining feature of a subtractive synthesizer is its pipeline. The basic design of a
typical subtractive synthesizer (such as in Figure 64) is as follows:
• Oscillators produce waveforms (sound). In the digital case, this is one sample at a time.
• These waveforms are combined in some way to produce a final waveform.
• The waveform is then filtered. This means that it is passed through a device which removes
or dampens some of its harmonics, shaping the sound. This is why this method is called
subtractive synthesis. The most common filter is a low pass filter, which tamps down the
high-frequency harmonics, making the sound more mellow or muffled.
• The waveform is then amplified.
• All along the way, the parameters of the oscillators, combination, filters, and amplifier can be
changed in real time via automated or human-driven modulation procedures.
6.1 History
The earliest electronic music synthesizers were primarily additive, but these were eventually
eclipsed by subtractive synthesizers, mostly because subtractive synthesizers are much simpler
and less costly to build. Whereas additive synthesis has to manipulate many partials at once to
create a final sound, subtractive synthesis only has to deal with a single sound at a time, shaping
and adjusting it along the way.
Subtractive synthesis has a long and
rich history. A good place to start is the
Trautonium (Figure 41), a series of de-
vices starting around 1930 which were
played in an unusual fashion. The de-
vices had a taut wire suspended over
a metal plate: when you pressed down
on the wire so that it touched the plate,
the device measured where the wire
was pressed and this determined the pitch.35 You could slide up and down the wire, varying the pitch. The pitch drove an oscillator, which was then fed into a filter. The volume of the sound could be changed via a pedal. Versions of the Trautonium became more and more elaborate, adding many features which we would normally associate with modern synthesizers, culminating in sophisticated Trautoniums36 such as the Mixtur-Trautonium.

Figure 41 (Left) The Telefunken Trautonium, 1933.©23 (Right) Oskar Sala’s “Mixtur-Trautonium”, 1952.©24

35 The plate would pass electricity into the wire where you pressed it. The wire had high resistance, so it acted like a variable resistor, with the degree of resistance proportional to the length of the wire up to the point where it was touching the plate.
We will unfairly skip many examples and fast forward
to the RCA Mark I / II Electronic Music Synthesizers.
The Mark I (1951) was really a music composition sys-
tem: but the Mark II (1957) combined music composition
with real-time music synthesis; and this was the first time
the term “Music Synthesizer” or “Sound Synthesizer” was
used to describe a specific device. The Mark II was installed
at Princeton and was used by a number of avant-garde mu-
sic composers. The Mark II was highly influential on later
approaches to subtractive synthesis: as can be seen from
Figure 43, the Mark II’s pipeline has many elements that are commonly found in music synthesizers today.

Figure 42 RCA Mark II.©25

Figure 43 The RCA Mark II’s pipeline: a sequencer (punched paper), envelopes (“timbre control”, for the filters?), high pass and low pass filters, resonance, LFOs (for volume?), and volume controls.

Modular synthesizers consisted of separate modules whose front panels sported sockets where patch cables would be inserted to attach the output of one module to an input on another. By connecting modules via a web of patch cables, a musician could customize the synthesizer’s audio and modulation pipeline. The knob and button settings and patch wiring together formed what came to be known as a patch, a term still used today for a synthesizer’s settings.
Don Buchla primarily built machines for aca-
demics and avant-garde artists in California, no-
tably Ramon Sender and Morton Subotnick,
and so his devices tended to be much more ex-
ploratory in nature. Buchla would use uncommon
approaches: a “Low-Pass Gate” (essentially a com-
bination of a low pass filter and an amplifier), a
“Source of Uncertainty”, a “Complex Waveform
Generator” (which pioneered the use of wave
folding), and so on. Buchla also experimented
with nonstandard and unusual controllers, includ-
ing the infamous “Multi Dimensional Kinesthetic
Input Port”, shown at the bottom of Figure 45.37

Figure 45 Buchla 200e modular synthesizer.©28 “Multi Dimensional Kinesthetic Input Port” at bottom.
Moog’s and Buchla’s synthesizers were very
influential, and formed two schools of synthesizer design, traditionally called East Coast (Moog)
and West Coast (Buchla). The East Coast school, with more approachable architectures and
elements, has since largely won out. Most modern subtractive synthesizers are variants of classic
East Coast designs. However, the West Coast school has lately enjoyed a resurgence in popularity
among modern-day modular synthesizer makers.
37 Suzanne Ciani is an artist famous for using Buchla’s unusual methods to their fullest. Google for her.
38 Fun fact: the ARP 2600 is the synthesizer which produced all of R2-D2’s sounds, as well as those of the Ark of the
Covenant in Raiders of the Lost Ark. ARP is the initials of its founder, Alan R Pearlman.
39 A monster example of a patch matrix is the predecessor to the ARP 2600, the ARP 2500. Google for it. The patch
matrix (really a patch bus) appears both above and below the modules. The 2500 was featured in Close Encounters of
the Third Kind.
40 The VCS 3, and its little brother, the EMS Synthi, were often used by Pink Floyd. They produced many of the
sounds in On The Run, a famous instrumental song off of Dark Side of the Moon.
Compact (“Analog”) Synthesizers The 1970s also saw
the proliferation of synthesizers with very limited modu-
larity or with none at all. Rather, manufacturers assumed
a standard pipeline and added many hard-coded optional
routing options into the synthesizers in the hopes that
patching would not be necessary. The premiere example of
a synthesizer like this was the Moog Minimoog Model D,
widely used by pop and rock performers of the time, and popular even now.

Figure 48 (Left) The EMS VCS 3.©31 (Right) A close-up view of the VCS 3 patch matrix.©32
The Model D had a classic and
simple pipeline: three oscillators fed
into a mixer, which then was put
through a low-pass filter and an am-
plifier. The filter and amplifier each
had their own envelope, and the
third oscillator could be repurposed
to serve as a low-frequency modula-
tion oscillator. And that’s about it!
But this very simple framework, typical of Moog design, proved able to produce a wide range of melodic sounds. The Model D is shown in Figure 49, along with a popular competitor, the ARP Odyssey.

Figure 49 (Left) Moog Minimoog Model D.©33 (Right) Arp Odyssey.©34
41 There are different ways you could achieve polyphony. One is to give every key its own monophonic synthesizer
pipeline. If you have N keys, that’s essentially N synthesizers: an expensive route! Another approach would be to use a
small set of oscillators (perhaps 12 for one octave) to produce notes, and then do frequency division — shifting by one or
more octaves — to simultaneously produce multiple notes. That’s the approach the Novachord took. A third approach,
taken by most modern polyphonic synthesizers, is to have a small bank of M pipelines: when a note is pressed, a free
pipeline is temporarily assigned to it to play its voice. Even so, you’d really like to have rather more voices than fingers,
because a voice doesn’t stop when you let go of a key. It trails off with the release envelope, and so the synthesizer may
need to steal a currently playing voice in order to allocate one for a new note being played. That doesn’t sound good.
Finally, a synthesizer could be paraphonic. This usually means that it has multiple oscillators, each playing a different
note, but then they are combined and pushed through a single filter or amplifier. Many 2- or 3-oscillator monophonic
synthesizers can play in paraphonic mode. Note that this means that only one note would trigger the envelopes
controlling filters and amplifiers: this isn’t likely to achieve the sound of polyphony that you’d expect.
Polyphony really didn’t start to come into its own until the
1970s. The Oberheim 4-Voice (Figure 51) and 8-Voice were the
first commercially successful polyphonic synthesizers, and were
built up out of many small monophonic synthesizers developed
by Tom Oberheim, called his Synthesizer Expander Modules or
SEMs.42 You could play a single SEM, or two, etc., up to the huge 8-
Voice. By design these modules had to be programmed individually. This could produce impressive sounds but was tremendous work. A device to the left of the keyboard (see the Figure) made it easier to synchronize their programming for some tasks.

Figure 51 Oberheim 4-Voice.©36
The Yamaha CS series, notably the Yamaha CS-80 (Figure 52),
also offered eight voices, and emphasized expressive playing.
These machines are still legendary (and costly!), as the CS-80 was the synthesizer used by Vangelis on his legendary soundtracks (notably Blade Runner and Chariots of Fire) and its unique and easily recognized sound is difficult to replicate.

Figure 52 Yamaha CS-80.©37
The Korg PS series continued the Novachord tradition of total
polyphony: every key had its own independent note circuitry in
the extraordinary, and very expensive, Korg PS-3300 (Figure 53).
Sequential Circuits’s Dave Smith realized that as synthesizers became cheaper, musicians
would be acquiring not just one but potentially many of them. Outfitted with a CPU and the
ability to store patches in RAM, such synthesizers would benefit from communicating with one
another. This would enable synthesizers or computers to play and control other synthesizers, and
to upload and download stored patches. To this end Sequential Circuits and Roland proposed the
Musical Instrument Digital Interface, or MIDI. MIDI soon caught on, and since then essentially
all hardware synthesizers now come with it. MIDI is discussed in Section 12.2.
The Rise of Digital The early 1980s also saw the birth of the
digital synthesizer. This wave, starting with FM synthesizers,
and culminating in samplers, wavetable synthesizers, and PCM
playback synthesizers (derisively known as ROMplers), all but
eliminated analog synthesizers from the market. While many of
these synthesizers employed pipelines similar to subtractive ones, their oscillator designs were quite different. They also generally had many more parameters than analog synthesizers, but to keep costs down their design tended towards a menu system and perhaps a single data entry knob, making them very difficult to program.

Figure 58 Clavia Nord Lead 2x.©44
In 1995 Clavia introduced the Nord Lead (Figure 58), a new
kind of digital synthesizer. This synthesizer attempted to emulate
the characteristics, pipeline, modules, and style of a classic analog
subtractive synthesizer, using digital components. Clavia called
this a virtual analog synthesizer. Since the introduction of the
Nord Lead, virtual analog synthesizers have proven popular with
manufacturers, largely because they are much cheaper to produce
than analog devices. Perhaps the most famous example of this
is the Korg microKORG (Figure 59). This was an inexpensive virtual analog synthesizer with an additional microphone and vocoder, a device to sample and resynthesize the human voice (here, for the purpose of making a singer sound robotic). The microKORG is considered one of the most successful synthesizers in history: it was introduced in 2002, sold at least 100,000 units as of 2009, and is still being sold today.

Figure 59 Korg microKORG.©45
Virtual analogs are software emulations of synthesizers em-
bedded in hardware: but there is no reason that one couldn’t just
do software emulation inside a PC. Many digital synthesizers —
subtractive or not — now take the form of computer programs
commonly called software synthesizers or softsynths. These of-
ten take the form of plugins to Digital Audio Workstations using
plugin library APIs, such as Steinberg’s Virtual Studio Technol-
ogy (or VST) or Apple’s Audio Unit (or AU). Figure 60 shows two
examples: PG-8X, a softsynth inspired by Roland’s JX-8P analog
synthesizer; and OBXD, an emulation of the Oberheim OB-X or
OB-Xa.
6.2 Implementation
The classic subtractive synthesis pipeline, shown in Figure 64, is similar in some sense to the additive pipeline in Figure 29. The big difference, of course, is that the modules now form a chain: the oscillators produce harmonically rich waves which are combined, and the result is then reshaped in turn by a filter and an amplifier, rather than the final sound being built up directly from many sine waves.
Figure 64 The classic subtractive synthesis pipeline: the musician plays a note with pitch and volume; oscillators feed a combiner, then a filter, then an amplifier (each under modulation), producing the output.
1: Global M ← hM1 ...Mm i modules in the pipeline, in order
2: Global tick ← 0
3: procedure Tick
4: tick ← tick +1
5: if Note Released then
6: for i from 1 to m do
7: Released(Mi , pitch)
8: if Note Pressed then
9: for i from 1 to m do
10: Pressed(Mi , pitch, volume)
11: for i from 1 to m do
12: Update(Mi , tick)
13: return OutputSample(tick)
The function OutputSample(tick) would simply take the most recent sample it’s received and
submit it to the audio stream to be played. And as was the case for the additive version of this
algorithm, in a monophonic subtractive synthesizer a new note could be pressed before the previous
note was released (playing legato), and some modules might respond specially when this happens,
such as doing a portamento slide from the old note to the new one.
The end of Section 4.3 had some critical discussion about buffering and latency and how to
make the Tick() method consistent in timing for additive synthesizers. That discussion applies in
this case as well.
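To make the pseudocode above concrete, here is a minimal Python sketch of such a per-sample loop. The Oscillator, Amplifier, and Voice classes and their method names are inventions of mine for illustration only, and the oscillator is a naive, aliasing-prone sawtooth rather than anything you would actually ship (see Section 7.2).

class Oscillator:
    """A naive (and aliasing-prone) sawtooth oscillator module."""
    def __init__(self, rate=44100.0):
        self.rate = rate
        self.freq = 440.0
        self.phase = 0.0
        self.sample = 0.0
    def pressed(self, pitch, volume):
        self.freq = pitch                  # here "pitch" is just a frequency in Hz
    def released(self, pitch):
        pass
    def update(self, tick):
        self.phase = (self.phase + self.freq / self.rate) % 1.0
        self.sample = 2.0 * self.phase - 1.0

class Amplifier:
    """A bare amplifier module: no envelope, it just gates the sound on and off."""
    def __init__(self, source):
        self.source = source
        self.gain = 0.0
        self.sample = 0.0
    def pressed(self, pitch, volume):
        self.gain = volume
    def released(self, pitch):
        self.gain = 0.0
    def update(self, tick):
        self.sample = self.gain * self.source.sample

class Voice:
    """Runs the modules, in pipeline order, once per sample."""
    def __init__(self, modules):
        self.modules = modules
        self.tick = 0
    def do_tick(self, pressed=None, released=None):
        self.tick += 1
        if released is not None:
            for m in self.modules: m.released(released)
        if pressed is not None:
            for m in self.modules: m.pressed(*pressed)
        for m in self.modules: m.update(self.tick)
        return self.modules[-1].sample     # the OutputSample(tick) of the pseudocode

# Press a 220Hz note at half volume and generate one second of samples.
osc = Oscillator()
amp = Amplifier(osc)
voice = Voice([osc, amp])
samples = [voice.do_tick(pressed=(220.0, 0.5) if t == 0 else None) for t in range(44100)]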
Oberheim Matrix 6 The Matrix 6 is a 6-voice, polyphonic analog subtractive synthesizer with
analog but Digitally Controlled Oscillators (DCOs) produced by Oberheim between 1986 and
1988. The Matrix 6 came in three forms: a keyboard, a rack-mount version without a keyboard (the
Matrix 6R), and a small rackmount version designed largely for presets (the Matrix 1000). Like
many digital synthesizers of the time, and contrary to nearly all prior analog synthesizer tradition,
the Matrix 6 eschewed knobs and switches. It instead relied entirely on a tedious keypad entry
system to set its 100-odd patch parameters. In fact, the Matrix 1000 could not be programmed at all45
from its front panel: all you could do was select from approximately 1000 presets.
The Matrix 6 had two oscillators per voice, each of
which could produce a simultaneous square wave and a
sawtooth/triangle wave. The square wave had adjustable
pulse width, and the sawtooth/triangle wave could be
adjusted from a full sawtooth to a full triangle shape, or
something in-between. The two oscillators could be de-
tuned relative to one another, and the first oscillator could
be synced to the second. The second oscillator could also
be used to produce white noise. These two oscillators
were then mixed together to form a final sound, which was
then passed through a 4-pole resonant low pass filter, and
then finally an amplifier. The low-pass filter sported filter
FM (see Section 9), which enabled the first oscillator to
modulate the cutoff frequency of the filter at audio speeds,
creating unusual effects.
Figure 65 Oberheim Matrix 6.©49
45 You can program the Matrix 1000: but you must do so via commands sent over MIDI from a patch editor, typically
a dedicated software program.
The Matrix 6 was notable for (at the time) its broad array of modulation options. Both the filter and the amplifier had dedicated envelopes, but this was not all. As befitted its name, the Matrix 6 also had a modulation matrix.
46 This is not entirely unheard of. The entire Casio CZ series, discussed later in Section 7.5, used the exact same
synthesis engine in machines ranging from the large CZ-1 (Figure 82) for $1400 to the tiny CZ-101 for $499. The CZ-101
was the first significant synthesizer to break the $500 barrier (and was the spiritual predecessor to the microKORG, as it
was an early example of a professional synthesizer with minikeys).
Figure 69 A small Eurorack-format modular synthesizer, with Doepfer and Analogue Solutions modules.
Each timbre also contained a resonant filter (4-pole or 2-pole low pass, 2-pole band-pass, or
2-pole high pass) with its own dedicated ADSR envelope. The sound was then passed through a
stereo amplifier with its own ADSR envelope and a distortion effect. A timbre had two free LFOs
available, as well as a four-slot patch matrix with a small number of sources and destinations.
Finally, the microKORG had a built-in arpeggiator and three effects units through which the
audio was passed. The first effects unit could provide chorus, flanging, or phasing, the second
provided equalization, and the third provided some kind of delay. We’ll discuss effects in detail
in Section 11. Overall, while the microKORG (and MS2000 before it) had more filter and oscillator
options, they had much less modulation flexibility and lower polyphony than the Matrix 6.
Eurorack Modular Synthesizers Eurorack is a popular format for modern modular synthesizers.
The format was introduced in 1995 by Doepfer, and now many manufacturers produce modules
compatible with it. Like essentially all hardware modular synthesizers, Eurorack is monophonic: it
can produce only one sound at a time.
Eurorack signals normally take one of three forms: audio signals, gate signals (which indicate a
“1” or a “trigger” by moving from a low voltage to a high voltage, and a “0” by doing the opposite),
and control voltage (or CV) signals, which typically vary in voltage to indicate a real-valued
number. Gate and CV are used for modulation. All Eurorack jacks are the same regardless of
the kind of signal they carry: thus there’s no reason you couldn’t plug an audio output into a CV
input to provide very high-rate modulation of some parameter.
The small Eurorack synthesizer shown in Figure 69 is a typical specimen of the breed. It contains
all the basic modules you’d find in a subtractive synthesizer; but be warned that the Eurorack
community has produced many kinds of modules far beyond these simple ones. This synthesizer
contains the following audio modules:
• Two Voltage-Controlled Oscillators (VCOs), which produce sawtooth, square, triangle, or
sine waves.
• A suboscillator (labelled “Audio Divider” in the Figure) capable of taking an input wave
and producing a combination of square waves which are 1, 2, 3, and 4 octaves below in pitch.
• A resonant four-pole filter with options for low-pass, high-pass, band-pass, and notch. In the
figure, frequency is referred to as “F” and resonance is referred to as “Q” in the labels.
(Mostly) below these modules are modulation modules which output Gate, CV, or both:
• A two-axis joystick.
• Two Low-Frequency Oscillators (LFOs) producing triangle, sine, square, or sawtooth waves.
• A dual Sample and Hold (or S&H) module, which takes an input signal and a trigger (a gate),
and outputs the held value.
• An 8-stage step sequencer. This is often clocked by a square wave LFO, and outputs up to two
triggers and two CV values per step.
• Two ADSR envelopes, notionally for the filter and amplifier respectively.
The whole thing might be driven by an additional modulation source: a keyboard with gate
(indicating note on or note off) and CV (indicating pitch).
All of the signals in this synthesizer are analog. All of the audio modules in this synthesizer are
analog as well; though many Eurorack modules use digital means to produce their synthesized
sounds. You’ll note from the picture the presence of cables attaching modules to other modules.
These cables transmit audio, gate, or CV information.
7 Oscillators, Combiners, and Amplifiers
A subtractive synthesis pipeline typically consists of oscillators, which produce sounds, combiners
which join multiple sounds into one, and filters and amplifiers, which modify sounds, producing
new ones (plus modulation of course, discussed in Section 5). Filters are a complex topic and will
be treated separately in Section 8. In this Section we’ll cover the others.
7.1 Oscillators
An oscillator is responsible for producing a sound. Oscillators at a minimum will have a parameter
indicating the frequency (or pitch) of the desired sound: they may also have a variety of parameters
which specify the shape of the sound wave. Many early oscillators took the form of Voltage
Controlled Oscillators (or VCOs), meaning that their frequency (and thus pitch) was governed
by a voltage. This voltage could come from a keyboard, or from a musician-settable dial, or could
come from a modulation signal from some modulator. VCOs are not very stable and can drift or
wander slightly in pitch, especially as temperature changes. Later oscillators had their frequency
governed by a digital signal: these were called Digitally Controlled Oscillators or DCOs. A DCO
is still an analog oscillator, but the frequency of the analog signal is kept in check by a digital
timer.47 On the other hand, Numerically Controlled Oscillators, or NCOs, are not analog devices: they are digital oscillators which produce extremely high-resolution, faithful renditions of the analog signals typical of VCOs or DCOs.
Other oscillator designs are unabashedly digital: they can produce complex digital waveforms via a variety of synthesis methods, including wavetables, pulse code modulation (or PCM), or frequency modulation (or FM), among many other options. We'll discuss these methods later.
Early on (and even now!) the most common oscillator waveforms were the triangle, sawtooth,48 and square, shown in Figure 70. These waveforms were fairly easy to produce via analog electronics; they had a rich assortment of partials, which provided good raw material for the filters downstream to shape.
Figure 70 Common oscillator waveforms: triangle, sawtooth, 25% pulse, and 50% pulse (square).
47 DCOs are better technology than VCOs: but nostalgic musicians like the drifting nature of VCOs which, when
layered over one another, are thought to produce a more organic or warmer sound, despite their other failings.
48 Perhaps you recall these wave shapes from Section 5.1. Some synthesizers implement a sawtooth sound using a
sawtooth wave, while others produce a ramp wave . This distinction matters for LFOs used in modulation,
but not for oscillators producing audio-frequency waves, because ramp and sawtooth sound the same.
In a sawtooth wave, the amplitude of harmonic #i is 1/i. In a triangle wave, only the odd harmonics are present: the
amplitude of harmonic #i is 1/i², and the phase of every other of these harmonics is shifted by π.49 This
squared dropoff means that the sum total amplitudes of the triangle wave are much less than the
sawtooth wave: and indeed a triangle wave is much quieter.
In a square wave, the amplitude of harmonic #i is 1/i when i is odd, but 0 when even. This is
also quieter than a sawtooth wave, but louder than a triangle wave. A square wave is just a special
case of a pulse wave. As can be seen in Figure 70, pulse waves come in different shapes, dictated
by a percentage value called the pulse width or duty cycle of the waveform. The pulse width is
the percentage of time the pulse wave stays high versus low. A square wave has a pulse width
of 50%.50
You’ll notice that one very famous wave is curiously missing. Where’s the sine wave? There
are two reasons sine is not as common in audio-rate oscillators. First, it’s nontrivial to make a high
quality sine wave from an analog circuit. But second and more importantly, a sine wave consists of
a single partial. That’s almost no material for downstream filters to work with. You just can’t do
much with the audio from a sine wave.51
Similarities to Certain Musical Instruments These waves can be used as raw material for many
artificial synthesized sounds. But some of them have properties which resemble certain musical
instruments, and thus make them useful in those contexts. For example, when a bow is drawn
across a violin string, the string is snagged by the bow (due to friction) and pulled to the side until
friction cannot pull it any further, at which time it snaps back. This process then repeats. The wave
movement of a violin string thus closely resembles a sawtooth wave.
Brass instruments also have sounds produced by processes which resemble sawtooth waves.
In contrast, many reed instruments, such as clarinets or oboes, produce sounds which resemble
square waves, and flutes produce fairly pure sounds which resemble sine waves.
Noise Another common oscillator is one which produces noise (that is, hiss). Noise is simply
random waves made up of all partials over some distribution. There are certain particularly
common distributions of the spectra of noise, because they are produced by various natural or
physical processes. One of the most common is white noise, which has a uniform distribution
49 Recall again that humans can't distinguish phase, so this is of largely academic interest.
50 Generally speaking, the amplitude of harmonic #i in a pulse wave of pulse width p is sin(πip)/i. You might ask yourself what happens to this equation when the pulse width is 0% or 100% (0.0 or 1.0).
51 Note however that Low Frequency Oscillators can do a lot with sine waves, so it shows up in them all the time.
Figure 72 Frequency spectra plots of four common kinds of noise: (Left to right) White, Pink, Brown, and Blue noise.
Note that the plots are logarithmic in both directions.©52
of its partial spectra across all frequencies. Another common noise is pink noise, whose higher
frequencies taper off in amplitude (by 3dB per octave). Brown noise, so called because it is
associated with Brownian motion, tapers off even faster (6dB per octave). Finally, blue noise
increases with frequency, by 3dB per octave. There are plenty of other distributions as well. Noise is
often used to dirty up a synthesized sound. It is also used to produce explosive sounds, or sharp,
noisy sounding instruments such as snare drums.
How do you create random noise? White noise is very simple: just use a uniform random
number generator for every sample (between −1... + 1 say). Other kinds of noise can be achieved
by running white noise through a filter to cut down the higher (or in the case of Blue noise, lower)
frequencies. We’ll talk more about filters in Section 8.52
Suboscillators One trick analog synthesizers use to provide more spectral material is to offer
one or more suboscillators. A suboscillator is very simple: it’s just a circuit attached to a primary
oscillator which outputs a (nearly always square) waveform that is 1/2, 1/4, 1/8, etc. of the main
oscillator’s frequency. 1/2 the frequency would be one octave down. This isn’t a true oscillator — its
pitch is locked to the primary oscillator’s pitch — but it’s a cheap and useful way of adding more
complexity and depth to the sound.
52 Oh, so you wanted an actual algorithm? Okay, here’s a simple but not super accurate one originally from Paul
Kellet: see http://www.firstpr.com.au/dsp/pink-noise/ It generates random white noise samples and then applies an
FIR low-pass filter to them. More on filters in Section 8.
7.2 Antialiasing and the Nyquist Limit
One critical problem which occurs when an oscillator generates waves is that they can be aliased in
a digital signal. This issue must be dealt with or the sound will produce considerable undesirable
artifacts when played. Aliasing is the central challenge in digital oscillator design.
A digital sound can only store partials up to a certain frequency called the Nyquist limit. This is one half of the sampling rate of the sound. For example, the highest frequency that a 44.1KHz sound can represent is 22,050Hz. If you think about this it makes sense: to represent a sine wave, even at its crudest, you need to at least go up and then down: meaning you'll need at least two samples.
But consider a sawtooth wave for a moment. The sawtooth has partials at every multiple of its fundamental, without end. Any partial that would fall above the Nyquist limit cannot be represented: instead it is reflected back down below the limit, appearing as an inharmonic, unpleasant artifact.53
53 This is why aliasing is sometimes called foldover: the reflected partials are “folded over” the Nyquist limit.
Figure 75 Moiré patterns (left) in an image. This effect is due to the 2D version of aliasing.©53
Additive Synthesis We could build the wave by adding partials up to the Nyquist limit.
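As a sketch of this approach in Python (names and scaling are my own choices): partial k of a sawtooth gets amplitude 1/k, and we simply stop adding partials before they cross the Nyquist limit.

import math

def additive_sawtooth(freq, rate=44100.0, n=2048):
    """n samples of a bandlimited sawtooth at freq Hz, built by summing partials."""
    top = int((rate / 2.0) // freq)          # highest harmonic below the Nyquist limit
    out = []
    for i in range(n):
        t = i / rate
        s = sum(math.sin(2.0 * math.pi * k * freq * t) / k for k in range(1, top + 1))
        out.append(s * 2.0 / math.pi)        # scale so the wave peaks near +/- 1
    return out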
Filtering and Resampling This is the most common approach. We create a bandlimited wave at
a high sampling rate and store a single cycle of it as a digital sample: this is a single-cycle wave.
When we need to convert the wave to a certain frequency, this is equivalent to resampling it into a
smaller number of samples. We first apply a low pass filter to strip out any frequencies higher than
the resulting Nyquist limit for our smaller number of samples; then we perform the resampling.
This is a variation of so-called wavetable synthesis (Section 10.3). The process of resampling is
discussed in Section 10.5.
Discrete Summation Formulas (DSFs) This is a clever way of generating a band-limited wave
without having to add up lots of sine waves as is the case for additive synthesis.54 It turns out that
you can add up N + 1 sine waves of a certain useful pattern just by using the following identity:
∑_{k=0}^{N} a^k sin(θ + kβ) = (sin(θ) − a sin(θ − β) − a^(N+1) [sin(θ + (N+1)β) − a sin(θ + Nβ)]) / (1 + a² − 2a cos(β))
A BLIT is defined as:
blit(x, P) = (M/P) sinc_M[(M/P) x]                                    (2)
sinc_M(x) = 1 when M sin(πx/M) = 0, and sin(πx) / (M sin(πx/M)) otherwise
M = 2⌊P/2⌋ − 1
Here P is the period of the impulse train in samples, and M is the number of partials (harmonics)
to include. x is the xth sample. The maximum number of harmonics happens to be related to P as
shown, so we can compute it on the fly.
What can we do with this? Well, a band-limited sawtooth for one:
saw(x, P) = 0 when x ≤ 0, and α saw(x − 1, P) + blit(x, P) − 1/P otherwise          (α = 1 − 1/P seems good)
What’s going on is this: we’re just integrating (summing) the BLIT over time: this creates a
stairstep function. And then we’re bit by bit subtracting from the stairstep to get it back down to
zero. Since BLIT shoots up to 1, we need to subtract out a 1 before we get to the next period, so we
subtract out 1/P each time. I find this is reasonably scaled to 0...1 as saw(...) × 0.8 + 0.47.
This starts out very noisy but cleans up after about 6 cycles or so. The thing that cleans it up is
the α bit: this is a leaky integrator: it causes the algorithm to gradually forget its previous (initially
noisy) summation.56
To do a square wave, we need a new kind of BLIT, where the pulses alternate up and down.
We’ll call that a BPBLIT:
bpblit( x, P, D ) = blit( x, P) − blit( x − PD, P)
Here D (which ranges from 0 to 1, 1/2 being default) is the duty cycle of the BPBLIT: it’ll cause
the low pulses to move closer to immediately after the high pulses. Armed with this, we can define
a band-limited square wave as just the integration (sum) of the BPBLIT.
square(x, P, D) = 0 when x ≤ 0, and α square(x − 1, P, D) + bpblit(x, P, D) otherwise          (α = 0.999 seems good)
I’d scale this as tri(...) × 4 + 0.5. Note that Triangle’s frequency is twice what you’d expect.
If you slowly scan through frequencies, you’ll get one or two pops as even 0.9 is not enough to
overcome certain sudden jumps due to numerical instability. Instead, you might try something like
α = 1.0 − 0.1 × min(1, f /1000), where f is the frequency.
56 Theleaky integrator is a common trick not only in digital signal processing but also in machine learning, where this
pattern shows up as the learning rate in equations for reinforcement learning and neural networks.
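Here is a minimal Python sketch of the BLIT sawtooth above, following Equation 2 and the leaky-integrator recurrence. The tolerance for the sinc special case and the function names are my own choices, and nothing here handles smoothly changing frequency.

import math

def blit(x, P):
    """Band-limited impulse train of period P samples, evaluated at sample x."""
    M = 2 * (int(P) // 2) - 1                # number of harmonics to include
    u = (M / P) * x
    denom = M * math.sin(math.pi * u / M)
    if abs(denom) < 1e-12:
        return (M / P) * 1.0                 # the special case of sinc_M
    return (M / P) * math.sin(math.pi * u) / denom

def blit_sawtooth(n, P):
    """n samples of a band-limited sawtooth of period P samples."""
    alpha = 1.0 - 1.0 / P                    # the leaky integrator coefficient
    out, prev = [], 0.0                      # saw(x) = 0 for x <= 0
    for x in range(1, n + 1):
        prev = alpha * prev + blit(x, P) - 1.0 / P
        out.append(prev * 0.8 + 0.47)        # the rescaling to 0...1 suggested above
    return out

samples = blit_sawtooth(44100, 44100.0 / 440.0)   # one second of a 440Hz sawtooth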
7.3 Wave Shaping
Wave shaping is very simple: it's just mapping an incoming sound signal using a shaping function.
Waveshaping Polynomials You could use any function to shape an incoming signal if you wished,
but it’s common to use polynomials. This is because polynomials allow us to predict and control
the resulting partials in a sound. Notably, a polynomial of order n can only generate harmonics up
to n. Consider the polynomial x⁵ applied to a sine wave of frequency ω and amplitude 1, as shown in Figure 77. Our waveshaped signal w(t) would be:
w(t) = sin⁵(ωt) = 5/8 sin(ωt) − 5/16 sin(3ωt) + 1/16 sin(5ωt)
which contains partials only at ω, 3ω, and 5ω (that is, harmonics up to 5, as promised). A particularly convenient family of shaping polynomials is the Chebyshev polynomials Tn(x), which have the handy property that Tn applied to a full-amplitude cosine wave produces just the nth harmonic. The first several are:
T0 ( x ) = 1
T1 ( x ) = x
T2 ( x ) = 2x2 − 1
T3 ( x ) = 4x3 − 3x
T4 ( x ) = 8x4 − 8x2 + 1
T5 ( x ) = 16x5 − 20x3 + 5x
T6 ( x ) = 32x6 − 48x4 + 18x2 − 1
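Since Tn applied to a full-amplitude cosine wave yields just the nth harmonic, a weighted sum of these polynomials lets you dial harmonics in directly. Here is a small Python sketch; the weights and frequency are arbitrary choices of mine.

import math

# T_0 ... T_6, exactly as listed above.
CHEBYSHEV = [
    lambda x: 1.0,
    lambda x: x,
    lambda x: 2*x**2 - 1,
    lambda x: 4*x**3 - 3*x,
    lambda x: 8*x**4 - 8*x**2 + 1,
    lambda x: 16*x**5 - 20*x**3 + 5*x,
    lambda x: 32*x**6 - 48*x**4 + 18*x**2 - 1,
]

def shape(x, weights):
    """Waveshape one sample x by a weighted sum of Chebyshev polynomials."""
    return sum(w * CHEBYSHEV[k](x) for k, w in enumerate(weights))

# A 440Hz cosine shaped to contain the fundamental plus a half-strength 2nd
# harmonic and a quarter-strength 3rd harmonic.
rate, freq = 44100.0, 440.0
wave = [shape(math.cos(2*math.pi*freq*i/rate), [0.0, 1.0, 0.5, 0.25]) for i in range(2048)]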
7.4 Wave Folding
Related to wave shaping is wave folding, made popular by Don Buchla and west-coast synthesis: when the sound exceeds 1.0 (or drops below −1.0), it is folded, that is, reflected back into the −1...1 range.
You could create even more inharmonic distortion using a related method called wrapping: here, if the sound exceeds 1.0, it's toroidally wrapped around to −1 (and vice versa). That is:
Wrap(x) = (x mod 1) − 1 when x > 1,   1 − (−x mod 1) when x < −1,   and x otherwise
Figure 79 Wrapping the function 1.5 sin(t).
This definition is carefully written such that u mod 1 is only ever applied to a positive u.
Finally, there remains the possibility of clipping: here, the sound is simply clamped so it can never exceed 1 or fall below −1. That is:
Clip(x) = 1 when x > 1,   −1 when x < −1,   and x otherwise
Figure 80 Clipping the function 1.5 sin(t).
Dealing with Aliasing It shouldn’t surprise you that these methods can alias like crazy. Much
of the problem is due to the hard discontinuities that occur when these waves hit the 1.0 or -1.0
boundary. One cheap way to counter this, to some degree, is to round off these corners. For example,
you could compress the amplitude as it got closer to the boundary, with a function something like:
Compress(x) = 1 − (1 − x)^a when x ≥ 0, and (1 + x)^a − 1 when x < 0          (a = 2 seems reasonable)
You’ll still get plenty of aliasing, but it’s better than nothing.57
57 For a proper treatment of how to deal with antialiasing in wave folding, see Fabian Esqueda, Henri Pöntynen, Julian
Parker, and Stefan Bilbao, 2017, Virtual analog models of the Lockhart and Serge wavefolders, Applied Sciences, 7, 1328.
58 This example, minus the triangle windowing, is more or less the example provided in the CZ series user manuals.
Notice that we’re passing α ∈ [0, 1) into g(...). This lets us specify degree of phase distortion we
want, generally modulated by an envelope.59 How g(...) maps the distortion varies from function
to function. The CZ series provided a range of g(...) functions and w(...) window options.60
Figure 81 shows a sinusoid being gradually distorted by a g(...), changing according to α.
Eventually the sinusoid is distorted into a pseudo-sawtooth. Multiplying this by a triangle window
function produces an interesting final wave. Without an optional window function, the distortion
function acts as a kind of quasi-filter, stripping out the high frequencies of the sawtooth until we’re
left with a simple one-harmonic sinusoid. We’ll get to true filters in Section 8. Phase distortion’s
quasi-filters in combination with windowing can also provide some degree of resonance, as shown
in another example, Figure 83.
7.6 Combining
A subtractive synthesizer often has several oscillators per voice. The output of these oscillators is
combined, and the resulting sound is then run through various shaping methods. There are
many different ways that these oscillators could be combined: we’ll cover a few of them next.
Mixing The most straightforward way to combine two or more sounds would be to mix them:
that is, to add them together, multiplied by some weights. For example, if we had n sounds
f 1 (t)... f n (t), each with a weight α1 ...αn (all αi ≥ 0), our mixer might be simply m(t) = ∑i αi f i (t).
It’s also common to play the weights off each other so as not to exceed the maximum volume and
to allow for an easy modulation parameter. For example, we might cross fade one sound into a
second with a single α value as m(t) = α f_1(t) + (1 − α) f_2(t). More generally, a cross-fading mixer for some n sounds might be m(t) = (∑_i α_i f_i(t)) / (∑_i α_i). Of course, here not all α_i can be zero.
59 On the CZ series, this was an elaborate eight-stage envelope which could do really nifty things.
60 Casio seemingly went to great lengths to obscure how all this worked in their synth interfaces. Windowing was not
discussed at all, and the front panel had only a limited set of options. To get the full range of combinations of wave and
window functions required undocumented MIDI sysex commands. See http://www.kasploosh.com/projects/CZ/11800-
spelunking/ for an extended discussion of how to do this.
61 This is more or less the example provided in Figures 18–20 of Casio’s patent on PD. Masanori Ishibashi, 1987,
Electronic musical instrument, US Patent 4,658,691, Casio Computer Co., Ltd., Assignee.
Ring and Amplitude Modulation Other combination
mechanisms produce complex sounds resulting from com-
bining two incoming sources. Whereas in mixing, the two
sound sources were essentially added, in ring modula-
tion,62 the two sources are multiplied against one another.
That is, the resulting sound is:
r(t) = f(t) × g(t)
Figure 84 A ring modulation circuit.©55
In amplitude modulation, the resulting sound also retains the original carrier signal f(t):
a(t) = f(t) × (g(t) + 1)
Figure 85 compares the two. Let's consider ring and amplitude modulation in a very simple case, using sine waves as our sound sources, with a_f and a_g as amplitudes and ω_f and ω_g as frequencies respectively. Then we have f(t) = a_f sin(ω_f t) and g(t) = a_g sin(ω_g t).
Figure 85 The effects of ring modulation, r(t) = f(t) × g(t), and amplitude modulation, a(t) = f(t) × (g(t) + 1), when given two sine wave signals, f(t) and g(t).
The ring-modulated signal r(t) would look like this:
r(t) = f(t) × g(t) = a_f a_g sin(ω_f t) sin(ω_g t) = 1/2 a_f a_g cos((ω_f − ω_g)t) − 1/2 a_f a_g cos((ω_f + ω_g)t)
using the identity sin A sin B = (cos(A − B) − cos(A + B))/2, which follows from
cos(A − B) − cos(A + B) = (cos A cos B + sin A sin B) − (cos A cos B − sin A sin B) = 2 sin A sin B
Recall that the original signals were sine waves, and thus had one partial each at frequencies ω f
and ω g respectively. The new combined sound also consists of two partials (the two cosines63 ), one
at frequency ω f − ω g and one at frequency ω f + ω g . The original partials are gone. These two new
partials are called sidebands. What happens if ω f − ω g < 0? Then the partial is “reflected” back:
so we just get |ω f − ω g |.
Now, let’s consider amplitude modulation. Here, f (t) will be our primary signal (the carrier),
and g(t) will be the modulator, the signal that changes the amplitude of f (t). Using the same tricks
as we did for ring modulation, we have:
a(t) = f(t) × (g(t) + 1)
     = a_f sin(ω_f t) (a_g sin(ω_g t) + 1)
     = a_f a_g sin(ω_f t) sin(ω_g t) + a_f sin(ω_f t)
     = 1/2 a_f a_g (cos(ω_f t − ω_g t) − cos(ω_f t + ω_g t)) + a_f sin(ω_f t)
     = 1/2 a_f a_g cos((ω_f − ω_g)t) − 1/2 a_f a_g cos((ω_f + ω_g)t) + a_f sin(ω_f t)
Here a(t) consists of not two but three partials: the same ω f − ω g and ω f + ω g sidebands as
ring modulation, plus the original carrier partial ω f . So amplitude modulation is in some sense
just ring modulation, mixed with the original carrier to form the final sound.64
Keep in mind that this is fine for just two sine waves: but as soon as the mixed sounds become
more sophisticated, the resulting sounds can get more complex.
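A quick Python sketch of the two operations (frequencies chosen arbitrarily): the ring-modulated signal contains only the 430Hz and 570Hz sidebands, while the amplitude-modulated signal also keeps the 500Hz carrier. Running either list through a Fourier transform will show exactly those partials.

import math

rate = 44100.0
f_c, f_m = 500.0, 70.0                      # carrier and modulator frequencies in Hz

def f(t): return math.sin(2.0 * math.pi * f_c * t)   # carrier, amplitude 1
def g(t): return math.sin(2.0 * math.pi * f_m * t)   # modulator, amplitude 1

n = 8192
ring   = [f(i / rate) * g(i / rate) for i in range(n)]           # sidebands at 430Hz and 570Hz
ampmod = [f(i / rate) * (g(i / rate) + 1.0) for i in range(n)]   # sidebands plus the 500Hz carrier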
Frequency Modulation While amplitude modulation allows a modulating signal to change the
amplitude of a carrier signal, Frequency Modulation (or FM) allows a modulating signal to change
the frequency of a carrier signal. Frequency modulation is an important subject and has spawned
an entire family of synthesis methods all on its own. It’ll be discussed in detail in Section 9.
Sync Another way to combine two oscillators is to sync one to the other. In hard sync, whenever the first oscillator completes its period, it forces the second oscillator to simply restart its period, that is, reset to position zero. This adds strong harmonics and has a distinctive sound. Reversing soft sync causes the second oscillator to reverse direction instead of resetting.
Sync will introduce high degrees of aliasing, and so it is not easy to deal with. If you are
interested, there exists a method called minBLEP65 which can be used to implement sync in a
bandlimited fashion. You can also use this technique to generate a variety of other bandlimited
waveforms, including square and sawtooth.
7.7 Amplification
There’s really nothing special to say about amplification: it’s just multiplying the amplitude
(volume) of the signal f (t) by some constant a ≥ 0, resulting in g(t) = a × f (t). The value a can be
anything, even a < 1, and so amplification is kind of a misnomer: an amplifier can certainly make
the sound quieter. In fact, an amplifier can be used to shut off a sound entirely. Analog amplifiers
are classically controlled via a voltage level, and so they are often known as voltage controlled
amplifiers or VCAs.
If an amplifier is so boring, why is it a module in a subtractive synthesis chain? Because you
can make a sound more realistic or interesting by changing its amplitude in real time. For example
many musical instruments start with a loud splash, then quiet down for their steady state, and then
fade away when the musician stops playing the note. Thus an amplifier module is important in
conjunction with time-varying modulation.
There are two common modulation mechanisms used for amplification. First, the musician may
specify the amplification of a voice through the velocity with which he strikes the key. Velocity
sensitive keyboards are discussed in Section 12.1. Second, a VCA is normally paired with an
envelope, often ADSR, which defines how the volume changes over time. This is so common and
useful that a great many synthesizers have dedicated ADSR envelopes solely for this purpose.
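As a sketch of that pairing in Python: a bare-bones linear ADSR envelope driving the amplifier's gain. Real synthesizer envelopes are usually exponential rather than linear, and the parameter values here are arbitrary.

import math

def adsr(t, released_at, a=0.01, d=0.1, s=0.7, r=0.3):
    """Linear ADSR envelope level at time t (seconds since note-on).
    released_at is the time the note was released, or None if still held."""
    if released_at is not None and t >= released_at:
        level = adsr(released_at, None, a, d, s, r)         # level at release time
        return max(0.0, level * (1.0 - (t - released_at) / r))
    if t < a:
        return t / a                                        # attack
    if t < a + d:
        return 1.0 - (1.0 - s) * (t - a) / d                # decay
    return s                                                # sustain

rate = 44100
# A 440Hz tone, held for half a second, amplified by the envelope.
tone = [math.sin(2.0 * math.pi * 440.0 * i / rate) * adsr(i / rate, 0.5) for i in range(rate)]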
65 Eli Brandt, 2001, Hard sync without aliasing, in International Computer Music Conference. BTW, for additional
techniques for bandlimited wave manipulation, Vesa Välimäki is a name which crops up a lot; you might google for
him.
8 Filters
Filters put the “subtractive” in subtractive synthesis, and so they are absolutely critical to the
behavior of a subtractive synthesizer. Unfortunately they are also by far the most complex element
in a subtractive synthesis pipeline. Filters have a rich and detailed mathematical theory, and we
will only just touch on it here.
A filter takes a sound and modifies its partials, outputting the modified sound. It could modify
them in two ways: (1) it could adjust the amplitude of certain partials, or (2) it could adjust
their phase. The degree to which a filter adjusts the amplitude or phase of partials depends
on their frequency, and so the overall behavior of the filter on the signal is known as its
frequency response. This is, not surprisingly, broken into two behaviors, the amplitude response
and the phase response of the filter. Because humans can’t hear differences in phase, we’re usually
interested in the amplitude response and will focus on it here; but there are interesting uses for the
phase response which we will come to later starting in Section 11.4.
A filter can have many different amplitude (and phase) responses, but there are
certain very common ones:
• A low pass (LP) filter doesn’t modify partials below a certain cutoff frequency, but beyond
that cutoff it begins to decrease their amplitude. This drop-off is logarithmic, so if you see it
on a log scale it looks like a straight line: see Figure 87 (A). A low pass filter is by far the most
common filter in synthesizers: so much so that many synthesizers only have a low pass filter.
• A high pass (HP) filter is exactly the opposite: it only decreases the amplitude of partials if
they’re below the cutoff frequency. See Figure 87 (B).
• A band pass (BP) filter is in some sense a combination of low pass and high pass: it decreases
the amplitude of partials if they’re on either side of the cutoff frequency: thus it’s “picking
out” that frequency and shutting off the others. See Figure 87 (C).66
• A notch filter is the opposite of a band pass filter: it decreases the amplitude of partials if
they’re at the cutoff frequency. See Figure 87 (D).67
(A) Low Pass (LP) (B) High Pass (HP) (C) Band Pass (BP) (D) Notch
Figure 87 Amplitude response by frequency of four common filters, with a cutoff at 100. The axes are on a log scale.
Thus the notch is really more or less an inverted band pass, but looks quite different because of the log scaling.
66 This term is also used more broadly to describe a filter which passes through a range of frequencies rather than just
one. Unlike for a notch filter, I don’t think there are different terms to distinguish these two cases.
67 A notch filter is a degenerate case of a band reject filter, which rejects a certain range of frequencies rather than a
specific one.
These aren’t all the filters you can build, not by a long shot. Another common filter is the comb
filter, discussed at length in Section 11. Another important filter is the notional brick wall low pass
filter: essentially a very, very steep low pass frequencies above Nyquist.
Phase The filters above are largely distinguished by how they modify the amplitude of the
partials: but this says nothing about how they modify phase. In fact, usually when building
the four filters above our goal would be to not modify the phase at all,68 or at the very least, to shift
the phase by the same amount for all partials. Filters for the second case are called linear phase
filters. But there do exist filters designed to adjust phase of partials in different ways. The most
common subclass of filters of this type are the strangely-named all pass (AP) filters. As their name
would suggest, these don’t modify the amplitude at all; their purpose is solely to shift the phase.
We’ll see all pass filters more in Section 11.
Gain In modifying the amplitude or phase of a sound, filters often will inadvertently amplify
the overall volume of the sound as well. The degree of amplification is called the gain of the filter.
It’s not a big deal that a filter has a gain if we know what it is: we could just amplify the signal
back to its original volume after the fact. But it’s convenient to start with a filter that makes no
modification to the volume, that is, its gain is 1. We call this a unity gain filter.
Filters are also often described in terms of the number of poles and zeros they have, which in turn determines their order. We'll get back to what these are in a while, but for now it's helpful to know two facts. First, the number of poles can determine how steep the dropoff is: this effect is called the roll-off of the filter. In the synthesizer world you'll see filters, particularly low pass filters, described in terms of their roll-off either by the number of poles or by the loss in dB per octave: each pole contributes roughly 6dB per octave, so a two-pole filter rolls off at about 12dB per octave and a four-pole filter at about 24dB per octave.
68 If
your filter is working in real-time, as is the case for a synthesizer, it’s not possible to avoid modifying the phase:
so you need to fall back to a linear phase filter.
Second or higher order filters can be constructed to exhibit a curious behavior: just before the
cutoff point, they will increase the amplitude of partials. This is resonance, as shown in Figure 88,
and creates a squelching sound. The degree of resonance is known as the quality factor of the filter
and is defined by the value Q. A value of Q = 1/√2 ≈ 0.707 has no resonance, and higher values
of Q create more resonance. A first-order (“one pole”) filter cannot have resonance.
1/2. This is averaging each sample with the sample before it. If your sound was just a low-frequency sine wave, this averaging would change it very little; but a high-frequency wave would be substantially smoothed down in amplitude (see Figure 94). Figures 92 and 93 diagram the filters themselves as boxes which take an input at time n, multiply by b or −a, delay by one time step, add, and emit an output at time n.
(A) Original signal (B) First Order FIR, b0 = 1/2, b1 = 1/2 (C) First Order IIR, b0 = 1/2, a1 = 1/2
Figure 94 Effects of the first-order digital filters from Figures 92 and 93 on a signal consisting of two sine waves:
f ( x ) = sin( x/100) + sin( x × 10). The high-frequency sine wave is tamped down, but the low-frequency wave is
preserved.
71 If
you increase the length of the delay, this simple filter becomes a feedback comb filter. We’ll discuss that more in
Section 11.
The full Infinite Impulse Response (IIR) filter consists of both the FIR and the basic IIR filters shown so far. The general pattern for a second order IIR filter is shown in Figure 96 and is known as the Direct Form I of a digital IIR filter.72 This diagram corresponds to the equation
y(n) = b0 x(n) + b1 x(n − 1) + b2 x(n − 2) − a1 y(n − 1) − a2 y(n − 2)
Here's one reason why the ai values are defined negatively: because it allows us to rearrange the equation so that all the y elements are on one side and all the x elements are on the other.
3: Global y ← hy1 ...y N i array of N real values, initially all zero . Note: 1-based array
4: Global x ← h x1 ...x M i array of M real values, initially all zero . Note: 1-based array
Algorithm 19 Step a Digital Filter
1: a ← h a1 ...a N i array of N real values . Note: 1-based array
2: b ← hb1 ...b M i array of M real values . Note: 1-based array
3: b0 ← real value . This is “b0 ”, by default 1
4: x0 ← real value . Current input
5: Global y ← hy1 ...y N i array of N real values, initially all zero . Note: 1-based array
6: Global x ← h x1 ...x M i array of M real values, initially all zero . Note: 1-based array
7: sum ← x0 × b0
8: for n from 1 to N do
9: sum ← sum − an × yn
10: for m from 1 to M do
11: sum ← sum + bm × xm
12: for n from N down to 2 do . Note backwards
13: y n ← y n −1
14: y1 ← sum
15: for m from M down to 2 do . Note backwards
16: x m ← x m −1
17: x1 ← x0
18: return sum
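Here is the same procedure as a small Python sketch (not a library routine): it returns a stateful step function which you call once per incoming sample, exactly mirroring the pseudocode above.

def make_filter(a, b, b0=1.0):
    """a holds a_1...a_N, b holds b_1...b_M, and b0 is the "b_0" coefficient."""
    y = [0.0] * len(a)               # y_1 ... y_N, most recent output first
    x = [0.0] * len(b)               # x_1 ... x_M, most recent input first
    def step(x0):
        s = x0 * b0
        for an, yn in zip(a, y):
            s -= an * yn
        for bm, xm in zip(b, x):
            s += bm * xm
        if y:                        # shift the output history
            y.insert(0, s)
            y.pop()
        if x:                        # shift the input history
            x.insert(0, x0)
            x.pop()
        return s
    return step

# Example: the simple averaging filter y(n) = 0.5 x(n) + 0.5 x(n-1)
average = make_filter(a=[], b=[0.5], b0=0.5)
print([average(v) for v in [1.0, 0.0, 1.0, 0.0]])    # [0.5, 0.5, 0.5, 0.5]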
1. First we determine the behavior of the filter we want. Though we’re building a digital filter,
we’ll start by cooking up the requirements in the continuous realm, as if we were planning
on building an analog filter.
2. We’ll then choose the so-called poles and zeros of the analog filter in the Laplace domain, a
complex-number space, which will achieve this behavior. The poles and zeros collectively
define the transfer function of the filter.
3. We can verify the behavior pretty easily by using the poles and zeros to directly plot the
amplitude and phase response. This is typically plotted using a Bode plot.
4. There is no exact conversion from a continuous (analog) filter to a digital filter: rather we will
do an approximation. To do this, we start by first mapping the transfer function from the
Laplace domain to a different complex-number space, the Z domain. The Z domain makes it
easy to build a digital filter, but there is no bijection from Laplace coordinates to Z coordinates.
Instead we’ll use a popular conversion called the bilinear transform which will be good
enough for our purposes.
5. Once the transfer function is in the Z domain, it’s simple to extract from it the coefficients
with which we will build the digital filter in software.
6. Alternatively you could skip the Laplace domain and just define the poles and zeros in the Z
domain (and in fact designers do this as a matter of course). We’ll also discuss that strategy.
7. We’ll derive the transfer functions (in the Laplace and Z domains) of a popular Butterworth
filter design which can be used in a basic subtractive synthesizer.
For example, to ask what our filter H does to a partial at f = 1000 Hz, we would evaluate the transfer function at the angular frequency ω = 2πf:
H(s) = H(iω) = H(i 2πf) = H(i 2π × 1000) ≈ H(i 6283.185) ≈ Y(i 6283.185) / X(i 6283.185)
The output of H is a complex number which describes both the change in phase and in amplitude of the given angular frequency. Importantly, if we wanted to know the amplitude response of the filter, that is, how our filter would amplify a partial at a given angular frequency ω, we compute the magnitude75 |H(iω)| = |Y(iω)/X(iω)| = |Y(iω)| / |X(iω)|.
Example. If H(s) = (s² − 4)/(s² + s + 2), and we wanted to know the amplitude change at frequency 1/π Hz (I picked that to make it easy: 1/π Hz is ω = 2), we could do:
|H(s)| = |H(iω)| = |H(2i)| = |(2i)² − 4| / |(2i)² + 2i + 2| = |−4 − 4| / |−4 + 2i + 2| = |−8| / |−2 + 2i| = 8 / √((−2)² + 2²) = √8
73 The Laplace domain is called a domain for a reason: it’s closely related to the Fourier domain. But don’t let that
8.4 Poles and Zeros in the Laplace Domain
Given our transfer function H(s) = Y(s)/X(s), we can determine the behavior of the filter from the roots
of the equations X (s) = 0 and Y (s) = 0 respectively. The roots of Y are called the zeros of the
transfer function, because if s was equal to any of the roots of Y (s) = 0, all of H (s) would be
equal to zero. Similarly, if s was a root of X (s) = 0, then H (s) would be a fraction with zero in
the denominator and thus go to infinity. These roots are called the poles of the transfer function,
because they make the equation surface rise up towards infinity like a tent with a tent pole under it.
Example. Let’s try extracting the poles and zeros. We factor the numerator and denominator
of the following transfer function:
H(s) = Y(s)/X(s) = (2s² + 2s + 1) / (s² + 5s + 6) = 2 (s + 1/2 + 1/2 i)(s + 1/2 − 1/2 i) / ((s + 3)(s + 2))
From this we can see that the roots of Y(s) = 0 are −1/2 − 1/2 i and −1/2 + 1/2 i respectively, and the roots of X(s) = 0 are −3 and −2 respectively. The factoring process looks like magic, but it's just the result of the quadratic formula, which you no doubt learned in grade school: for a polynomial of the form ax² + bx + c = 0 the roots are (−b ± √(b² − 4ac)) / (2a).
Example. Let’s try another example:
Y (s) 1 1
H (s) = = =
X (s) 5s − 3 5(s − 35 )
Thus there are no roots of Y (s), and the single root for X (s) is 53 .
Finding roots gets hard for higher-order polynomials, but thankfully we won’t have to do it!
Instead, to design a filter we’d often start with the roots we want — the zeros and poles based on the
desired filter behavior — and then just multiply them to create the transfer function polynomials.
Given the zeros z_j and poles p_k, the amplitude response is the product |H(iω)| = ∏_j |iω − z_j| / ∏_k |iω − p_k|. We can also compute the phase response — how much the phase shifts by — as ∠H(iω) = ∑_j ∠(iω − z_j) − ∑_k ∠(iω − p_k).
Remember that the magnitude of a complex number |a + bi| is √(a² + b²) and its angle ∠(a + bi) is tan⁻¹(b/a).
Figure 99 (Left) relationship between a pole p, the current frequency iω, and its impact on the magnitude of the
amplitude at that frequency. (Right) two poles and a zero and their respective magnitudes. In this example, the
amplitude at iω is |iω − z1| / (|iω − p1| × |iω − p2|).
Example. Given our previous poles and zeros, the magnitude of H(2i) is:
|H(2i)| = (|2i − (−1/2 − 1/2 i)| × |2i − (−1/2 + 1/2 i)|) / (|2i − (−3)| × |2i − (−2)|)
        = (|(1 + 5i)/2| × |(1 + 3i)/2|) / (|3 + 2i| × |2 + 2i|)
        = ((√26/2) × (√10/2)) / (√13 × √8)
        = √10/8
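This calculation is easy to automate. Below is a small Python sketch which computes |H(iω)| directly from lists of zeros and poles given as complex numbers (any leading gain coefficient can be passed separately); the last line reproduces the example above.

def amplitude_response(omega, zeros, poles, gain=1.0):
    """|H(i*omega)| computed from the zeros and poles of an analog filter."""
    h = gain
    for z in zeros:
        h *= abs(1j * omega - z)
    for p in poles:
        h /= abs(1j * omega - p)
    return h

print(amplitude_response(2.0, [-0.5 - 0.5j, -0.5 + 0.5j], [-3.0, -2.0]))   # ~0.3953, that is, sqrt(10)/8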
In fact, we can easily plot the magnitude of the filter for any value of ω, as shown in Figure 98. The plots shown are a classic Bode plot of the amplitude and phase response. Note that the x axis is on a log scale, and (for the amplitude plot up top) the y axis is also on a log scale. Thus if we know the poles and zeros of our filter, we can compute the amplitude change and the phase change for any frequency in the signal to which the filter is applied.
Figure 98 Bode plot of the amplitude response (top) and (awful) phase response (bottom) of a filter with zeros −1/2 − 1/2 i and −1/2 + 1/2 i, and with poles −3 and −2.
8.6 Pole and Zero Placement in the Laplace Domain
How do you select poles and zeros that create a desired effect? This is a complex subject: here we
will only touch on a tiny bit of it to give you a bit of intuitive feel for the nature and complexity of
the problem. First, some rules:
Figure 101 (Left) Position of single pole at − p. (Center) Bode plot of the amplitude response of the filter. Boldface
line represents idealized filter (and useful as a rule of thumb) while the dotted line represents the actual filter. At the
idealized cutoff frequency (p) the real filter has dropped 3db. This being a one-pole filter, at the limit the roll-off is a
consistent 6db per octave. (Right) Bode plot of the phase response of the filter. Again, boldface line represents a useful
rule-of-thumb approximation of the filter behavior, whereas the curved dotted line represents the actual behavior. Phase
begins to significantly change approximately between p/10 and p × 10.
• Poles either come in pairs or singles (by themselves). A single pole is only permitted if it lies
on the real axis: that is, its imaginary portion is zero. Alternatively poles can exist in complex
conjugate pairs: that is, if pole p1 = a + bi, then pole p2 = a − bi. The same goes for zeros.
This should make sense given that poles and zeros are just roots of a polynomial equation,
and roots are either real or are complex conjugate pairs.
• Zeros follow the same rule: they also must either come in complex conjugate pairs, or may be
singles if they lie on the real axis.
• Poles must be on the left hand side of the complex plane: that is, they must have zero or
negative real parts. Otherwise the filter will be unstable. This rule does not hold for zeros.
One simple intuitive idea to keep in mind is that a pole generally will cause the slope of the amplitude portion of the Bode plot to go down, while adding a zero generally will cause it to go up. We can use this heuristic to figure out how to build a filter whose amplitude characteristics are what we want.
Let's start with a single pole lying at −p on the real (x) axis, as shown in Figure 101. As revealed in this Figure, a pole causes the amplitude to drop with increasing frequency. Since this is a single pole, the roll-off will be 6db per octave (recall that Bode plots are in log scale in frequency and in amplitude). The amplitude response of the ideal filter would look like the boldface line in the Figure (center), but that's not possible. Rather, the filter will drop off such that there is a 3db drop between the idealized filter and the actual filter at the cutoff frequency, which is at .... p!
Figure 100 Two poles as a complex conjugate pair. The value p is the same as in Figure 101. The value r is related to the degree of resonance in the filter.
A filter will also change the phase of partials in the signal. A typical
phase response is shown in Figure 101 (right). Again, the boldface line shows the idealized (or in
this case more like fanciful) response.
Figure 102 (Left) positions of two poles and two zeros on the real axis. (Right) Approximate Bode plot showing impact
of each pole and zero in turn. Bold line shows final combined impact: a band reject filter of sorts. Gray bold lines are
the roll-offs of each filter starting their respective cutoff frequency points. Note that because the figure at right is in log
frequency, to produce the effect at right would require that the poles and zeros be spaced exponentially, not linearly as
shown; for example, p1 = 1, z1 = 10, z2 = 100, p2 = 1000.
Now consider two poles. If the poles are not on the real axis, they must be complex conjugates,
as shown in Figure 100. Note that the distance r from the real axis is associated with the degree of
resonance in the filter. If all the poles are on the real axis, then r = 0 and the filter has no resonance.
If you think about it this means that a one pole filter cannot resonate since its sole pole must lie on the
real axis. Second order (and higher) filters can resonate because they have two poles and thus can
have complex conjugate pairs. Additionally, if you have two poles, either as a complex conjugate
pair, or stacked up on top of one another on the real axis, they essentially double the roll-off at p.
Thus the roll-off is now 12db.76
We’ve seen that the presence of a pole will cause the amplitude response to drop over time.
Correspondingly, the presence of a zero will cause the amplitude to rise by the same amount.
Furthermore, the distance p of the pole or zero from the imaginary axis (its negative real value)
roughly corresponds to when the pole or zero starts having significant effect: that is, p corresponds
to the cutoff frequency for that pole or zero.
We can use this to cause poles and zeros to approximately act against one another. Consider
the two-pole, two-zero filter shown in Figure 102. At p1 the first pole comes into effect, and begins
to pull the amplitude down. Then at z1 the first zero comes into effect, and begins to pull the
amplitude up: at this point p1 and z1 effectively cancel each other out, so the amplitude response
stays flat. Then at z2 the second zero comes into effect: combined with z1 it overwhelms p1 and
begins to pull the response up again. Finally at p2 the final pole takes effect and things even out
again. Behold, a band reject filter.77
Gain As discussed before, these filters can also change the overall amplitude, or gain, of the
signal. We’d like to avoid having a change at all (that is, we’d want a unity gain filter), or at least
be able to control the gain. Here we’ll just cover some basics. In general a first-order low-pass filter
with a gain of K has a transfer function of the form:
76 That should sound familiar: in the synthesizer world, 2-pole filters are also (somewhat incorrectly) referred to
as “12db” filters. At this point, you might be able to surmise why 4-pole filters are also often referred to (even more
incorrectly) as “24db” filters.
77 Notice that I’m not discussing the phase response: since it’s not very important for us, I’m omitting it here. Consider
yourself fortunate.
H(s) = K × 1 / (τs + 1)
Now consider a low-pass filter with a single pole − p1 . It has a transfer function
H(s) = 1 / (s + p1) = (1/p1) × 1 / (s/p1 + 1)
So K = 1/p1 . We’d like K = 1, so we need to multiply by p1 , resulting in
H(s) = p1 / (s + p1) = 1 / (s/p1 + 1)
In general, for a multi-pole low pass filter − p1 , ..., − pi , we need to have p1 × ... × pi in the
numerator to make the filter have unity gain. Thus we have:
H(s) = (p1 × · · · × pi) / ((s + p1) × · · · × (s + pi)) = 1 / ((s/p1 + 1) × · · · × (s/pi + 1))
Just for fun, let’s consider the two-pole low pass case, with p1 = p2 = p. This implies that the
two poles are stacked on top of each other and thus must be on the real axis.
H(s) = 1 / ((s/p + 1) × (s/p + 1)) = 1 / (s²/p² + 2s/p + 1)
Compare this equation to Equation 3 on page 96. This is effectively a special case of the low
pass unity-gain second-order Butterworth filter discussed in Section 8.9. You might try working
out what happens when p1 and p2 are complex conjugates, and its relationship to Equation 3.
Yuck. I have no idea how to simplify that. Fortunately, that’s what Mathematica is for:
H(z) = (4z² + 4z) / (3z² − 10z + 3)
Now we’ll do two more steps. First we want all the z exponents to be negative:
H(z) = (4z² + 4z) / (3z² − 10z + 3) × z⁻²/z⁻² = (4 + 4z⁻¹) / (3 − 10z⁻¹ + 3z⁻²)
Last we want a 1 in the denominator:
H(z) = (4 + 4z⁻¹) / (3 − 10z⁻¹ + 3z⁻²) × (1/3)/(1/3) = (4/3 + 4/3 z⁻¹) / (1 − 10/3 z⁻¹ + z⁻²)
The Payoff These are the coefficients for our digital filter! Specifically if you have a digital filter
of the form
H(z) = (b0 + b1 z⁻¹ + · · · + b_M z^(−M)) / (1 + a1 z⁻¹ + · · · + a_N z^(−N))
then the b and a values are exactly the coefficients of the digital filter (as used in Algorithm 19).
Example. Let’s continue where we had left off. We had
H(z) = (4/3 + 4/3 z⁻¹) / (1 − 10/3 z⁻¹ + z⁻²)
Thus we have a second order digital filter with b0 = 4/3, b1 = 4/3, a1 = −10/3, a2 = 1.
Delay Notation Notice that a coefficient corresponding to a delay of length n appears alongside
z with the exponent form z−n . For this reason it is common in the digital signal processing world
to refer to an n-step delay as z−n and thus the one-step Delay element in our diagrams would be
commonly written as z⁻¹.
Frequency Warping One item to be aware of is that because of peculiarities in the Bilinear
Transform’s mapping, a frequency of ωZ in the Z domain doesn’t linearly correspond to a frequency
of ω L in the Laplace domain due to the phenomenon of frequency warping. We often want to
design digital filters with certain cutoff frequencies, and to do this we need to know what the
equivalent “warped” cutoff frequency should be in the Laplace domain to achieve that. It turns out
that the equations for warping frequencies between the Z and Laplace domains are:
ωZ = (2/T) tan⁻¹(ωL T / 2)        ωL = (2/T) tan(ωZ T / 2)
where T is the sampling interval (that is, 1/sampling rate).
The second equation is the more important one: we figure out what our desired cutoff frequency
is (that’s ωZ ), then compute the frequency ω L to use in our equations for building the filter in the
Laplace domain.
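In code the prewarping is a one-liner. The sketch below assumes the relationship just given, with T the sampling interval in seconds.

import math

def prewarp(omega_z, T):
    """Given a desired digital cutoff (angular frequency omega_z), return the
    Laplace-domain frequency to use when designing the analog prototype."""
    return (2.0 / T) * math.tan(omega_z * T / 2.0)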
Figure 103 Relationships between the Laplace Domain (left) and the Z Domain (right). Notice that the entire (infinite) left
half of Laplace plane is mapped inside the unit circle in the Z plane. Whereas the cutoff frequency ω in the Laplace plane
goes up along the imaginary axis from 0 to ∞ as iω, in the Z domain it runs along the unit circle as eiω , corresponding to
going from 0 to the Nyquist frequency. Note how the example poles and zeros are warped in the mapping.79
79 This diagram is largely a rip-off, with permission, of Figure 33-2 (p. 609) of Steven Smith, 1997, The Scientist & Engi-
neer’s Guide to Digital Signal Processing, California Technical Publishing, available online at https://www.dspguide.com/.
Figure 104 Difference in Magnitude (Amplitude) Response between a Low-pass Butterworth filter in the Laplace
Domain and one converted to the Z Domain (44.1KHz) via a Bilinear Transform, for different cutoff frequencies.
Resonance is set high (Q = 2) to make things obvious. At 500 and 1000 Hz the two are very nearly identical (Laplace is
directly on top of Z). By 16,000 Hz the divergence is significant.
In the Z domain, you’d more or less do the same thing, but with eiω (which, if you recall from
Section 3, just means cos(ω ) + i sin(ω )):
|H(z)| = |H(e^iω)| = |Y(e^iω)| / |X(e^iω)| = ∏_j |e^iω − z_j| / ∏_k |e^iω − p_k| = ∏_j |cos(ω) + i sin(ω) − z_j| / ∏_k |cos(ω) + i sin(ω) − p_k|
Similarly, the phase response would be:
∠H(z) = ∠H(e^iω) = ∑_j ∠(cos(ω) + i sin(ω) − z_j) − ∑_k ∠(cos(ω) + i sin(ω) − p_k)
Though the Z domain maps frequencies from 0 to Nyquist about the unit circle from 0 to π,
this isn’t quite what the Bilinear Transform does. Rather, the Bilinear Transform squishes the entire
imaginary axis into the unit circle. That is, it maps all values of iω, from 0 to positive imaginary
infinity, to the unit circle from 0 to π: the infinite is mapped to the finite. As iω gets larger, its
mapping gets more and more compressed as it approaches π on the unit circle in Z. This is why
you’d need to do frequency warping in Laplace: to get the right frequency values in Z, you need
unusual equivalent frequency values in Laplace due to the nonlinearity.
For the same reason, the Bilinear Transform doesn’t produce exactly the same filter in the Z
domain: the filter frequencies are warped to some degree. The good news is that this warping
is much more pronounced in the higher frequencies, where we don’t care about the disparity so
much for audio purposes: at lower frequencies (under 1/4 Nyquist, say) the two are very similar.
Figure 104 illustrates this for different cutoff frequencies.
The Bilinear Transform is a useful approximation, and we’ll take advantage of it in the next two
Sections (8.9 and 8.10). But defining poles and zeros directly in the Z domain has its merits: you
can avoid a lot of math once you get the hang of the impact of their placement.80
80 The MicroModeler DSP is a great online tool for building filters in the Z domain directly from poles and zeros.
http://www.micromodeler.com/dsp/
On the other hand, Vadim Zavalishin’s The Art of VA Filter Design (VA as in “virtual analog”) goes into depth on histori-
cal filters and how to replicate them digitally: but the text largely stays in Laplace, with a section on how to convert the re-
sult to the Z domain via the Bilinear Transform and other methods. https://www.discodsp.net/VAFilterDesign 2.1.0.pdf
8.9 Basic Second-Order Butterworth Filters
Filter design is largely about compromise. There are many kinds of filter approaches with different and contradictory characteristics. Consider the concepts in Figure 105 at right. Often we might want a very rapid dropoff in the transition from passband to stopband, but this typically comes at the cost of ripple in the passband and poorer stopband attenuation. There are costs in the time domain too: when the incoming sound changes abruptly, the filter output may lag behind the sound, may overshoot the goal amplitude, and may oscillate considerably before settling down (so-called ringing).
Figure 105 Filter terms (frequency domain).
Figure 106 Filter terms (time domain).
In this Section we'll consider one popular filter family in the audio world (and elsewhere), the Butterworth filter. As shown in comparison with other common families in Figure 107, Butterworth filters are simple, have a smooth (if not rapid) transition, and have no ripple, though you can augment them with resonance.
They do have downsides. Butterworth filters can cause considerable deviations in phase response, and they typically have significant overshoot and ringing. These effects are illustrated in Figure 106.
Each of the four basic second-order Butterworth filters below shares the same denominator, with a numerator N(s) that depends on the kind of filter: H(s) = N(s) / (s²/ω0² + s/(ω0 Q) + 1), where ω0 is the (angular) cutoff frequency and Q is the resonance.
Low Pass    For a low pass Butterworth filter at unity gain, N(s) = 1. Thus

H(s) = \frac{1}{\frac{s^2}{\omega_0^2} + \frac{s}{\omega_0 Q} + 1} \qquad (3)
The two poles are \left(-\frac{1}{2Q} \pm \sqrt{\left(\frac{1}{2Q}\right)^2 - 1}\right) \times \omega_0 and there are (of course) no zeros.81 To get the amplitude response, we have:

H(i\omega) = \frac{1}{\frac{(i\omega)^2}{\omega_0^2} + \frac{i\omega}{\omega_0 Q} + 1} = \frac{1}{-\frac{\omega^2}{\omega_0^2} + \frac{i\omega}{\omega_0 Q} + 1}

LP = |H(i\omega)| = \left|\frac{1}{-\frac{\omega^2}{\omega_0^2} + \frac{i\omega}{\omega_0 Q} + 1}\right| = \frac{1}{\left|\left(1 - \frac{\omega^2}{\omega_0^2}\right) + i\frac{\omega}{\omega_0 Q}\right|} = \frac{1}{\sqrt{\left(1 - \frac{\omega^2}{\omega_0^2}\right)^2 + \left(\frac{\omega}{\omega_0 Q}\right)^2}}
High Pass    A high pass Butterworth filter at unity gain has N(s) = \frac{s^2}{\omega_0^2}. So

H(s) = \frac{\frac{s^2}{\omega_0^2}}{\frac{s^2}{\omega_0^2} + \frac{s}{\omega_0 Q} + 1}

The poles are the same as the low pass filter of course. The two zeros are simple: 0 and 0. To get the amplitude response, we have:

H(i\omega) = \frac{\frac{(i\omega)^2}{\omega_0^2}}{-\frac{\omega^2}{\omega_0^2} + \frac{i\omega}{\omega_0 Q} + 1} = \frac{-\frac{\omega^2}{\omega_0^2}}{-\frac{\omega^2}{\omega_0^2} + \frac{i\omega}{\omega_0 Q} + 1}

HP = |H(i\omega)| = \left|\frac{-\frac{\omega^2}{\omega_0^2}}{-\frac{\omega^2}{\omega_0^2} + \frac{i\omega}{\omega_0 Q} + 1}\right| = \frac{\omega^2}{\omega_0^2} \times LP
Band Pass    A band pass Butterworth filter at unity gain has N(s) = \frac{s}{\omega_0 Q}. So

H(s) = \frac{\frac{s}{\omega_0 Q}}{\frac{s^2}{\omega_0^2} + \frac{s}{\omega_0 Q} + 1}

The poles are again the same as the low pass filter. The sole zero is just 0. To get the amplitude response, we have:

H(i\omega) = \frac{\frac{i\omega}{\omega_0 Q}}{-\frac{\omega^2}{\omega_0^2} + \frac{i\omega}{\omega_0 Q} + 1}

BP = |H(i\omega)| = \left|\frac{\frac{i\omega}{\omega_0 Q}}{-\frac{\omega^2}{\omega_0^2} + \frac{i\omega}{\omega_0 Q} + 1}\right| = \frac{\omega}{\omega_0 Q} \times LP
81 You can work this out from the quadratic formula followed by some rearranging: for a polynomial of the form ax^2 + bx + c = 0, the roots are \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. In our case, a = \frac{1}{\omega_0^2}, b = \frac{1}{\omega_0 Q}, and c = 1.
Notch    Finally, a notch Butterworth filter at unity gain has N(s) = 1 + \frac{s^2}{\omega_0^2}. So

H(s) = \frac{1 + \frac{s^2}{\omega_0^2}}{\frac{s^2}{\omega_0^2} + \frac{s}{\omega_0 Q} + 1}

The poles are again the same. The two zeros are \pm i\omega_0. The amplitude response is:

H(i\omega) = \frac{1 + \frac{(i\omega)^2}{\omega_0^2}}{-\frac{\omega^2}{\omega_0^2} + \frac{i\omega}{\omega_0 Q} + 1} = \frac{1 - \frac{\omega^2}{\omega_0^2}}{-\frac{\omega^2}{\omega_0^2} + \frac{i\omega}{\omega_0 Q} + 1}

Notch = |H(i\omega)| = \left|\frac{1 - \frac{\omega^2}{\omega_0^2}}{-\frac{\omega^2}{\omega_0^2} + \frac{i\omega}{\omega_0 Q} + 1}\right| = \left|1 - \frac{\omega^2}{\omega_0^2}\right| \times LP
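To make these four amplitude responses concrete, here is a small Python sketch (my own, not the text's) that evaluates LP, HP, BP, and Notch at a given frequency; the function name and argument order are my own choices.

import math

def butterworth_responses(omega, omega0, Q):
    """Amplitude responses of the second-order low pass, high pass, band pass,
       and notch filters at angular frequency omega, per the formulas above."""
    r = omega / omega0                       # normalized frequency
    lp = 1.0 / math.sqrt((1 - r * r) ** 2 + (r / Q) ** 2)
    hp = (r * r) * lp
    bp = (r / Q) * lp
    notch = abs(1 - r * r) * lp
    return lp, hp, bp, notch

# Example: responses one octave above the cutoff, with Q = 0.707 (the maximally flat choice)
print(butterworth_responses(2.0, 1.0, 0.707))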
Low Pass

H(z) = \frac{1}{\frac{1}{\omega_0^2}\left(\frac{2}{T}\,\frac{z-1}{z+1}\right)^2 + \frac{1}{\omega_0 Q}\left(\frac{2}{T}\,\frac{z-1}{z+1}\right) + 1}

= \frac{\omega_0^2 Q T^2 (1+z)^2}{2\omega_0 T(z^2 - 1) + Q(4(z-1)^2 + \omega_0^2 T^2 (1+z)^2)} \qquad \text{Behold the magic of Mathematica}

= \frac{\omega_0^2 Q T^2 + 2\omega_0^2 Q T^2 z + \omega_0^2 Q T^2 z^2}{(4Q - 2\omega_0 T + \omega_0^2 Q T^2) + (-8Q + 2\omega_0^2 Q T^2)z + (4Q + 2\omega_0 T + \omega_0^2 Q T^2)z^2}
Thus we have the following coefficients for our digital filter:
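Reading coefficients off of H(z) above (dividing numerator and denominator by z^2 to get powers of z^{-1}), a minimal Python sketch of the resulting digital low pass filter might look like the following. The normalization by the leading denominator coefficient and the function names are my own conventions, not necessarily the text's; the high pass, band pass, and notch coefficients follow the same pattern from their own H(z) numerators.

import math

def butterworth_lp_coeffs(f0, Q, sample_rate):
    """Digital low pass coefficients from H(z), for a difference equation of the form
       y[n] = (b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]) / a0."""
    T = 1.0 / sample_rate
    w0 = 2 * math.pi * f0            # cutoff in radians per second
                                     # (one could pre-warp w0, per the earlier warping discussion)
    b0 = b2 = w0 * w0 * Q * T * T    # numerator: w0^2 Q T^2 (1 + 2z + z^2)
    b1 = 2 * b0
    a0 = 4 * Q + 2 * w0 * T + w0 * w0 * Q * T * T     # z^2 term of the denominator
    a1 = -8 * Q + 2 * w0 * w0 * Q * T * T
    a2 = 4 * Q - 2 * w0 * T + w0 * w0 * Q * T * T
    return b0, b1, b2, a0, a1, a2

def filter_sample(x, state, coeffs):
    """One step of the difference equation.  state = [x[n-1], x[n-2], y[n-1], y[n-2]]."""
    b0, b1, b2, a0, a1, a2 = coeffs
    y = (b0 * x + b1 * state[0] + b2 * state[1] - a1 * state[2] - a2 * state[3]) / a0
    state[1], state[0], state[3], state[2] = state[0], x, state[2], y
    return y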
High Pass

H(z) = \frac{\frac{1}{\omega_0^2}\left(\frac{2}{T}\,\frac{z-1}{z+1}\right)^2}{\frac{1}{\omega_0^2}\left(\frac{2}{T}\,\frac{z-1}{z+1}\right)^2 + \frac{1}{\omega_0 Q}\left(\frac{2}{T}\,\frac{z-1}{z+1}\right) + 1}

= \frac{4Q(z-1)^2}{2\omega_0 T(z^2 - 1) + Q(4(z-1)^2 + \omega_0^2 T^2 (1+z)^2)} \qquad \text{Once again Mathematica}

= \frac{4Q - 8Qz + 4Qz^2}{(4Q - 2\omega_0 T + \omega_0^2 Q T^2) + (-8Q + 2\omega_0^2 Q T^2)z + (4Q + 2\omega_0 T + \omega_0^2 Q T^2)z^2}
Band Pass

H(z) = \frac{\frac{1}{\omega_0 Q}\left(\frac{2}{T}\,\frac{z-1}{z+1}\right)}{\frac{1}{\omega_0^2}\left(\frac{2}{T}\,\frac{z-1}{z+1}\right)^2 + \frac{1}{\omega_0 Q}\left(\frac{2}{T}\,\frac{z-1}{z+1}\right) + 1}

= \frac{2\omega_0 T(z-1)(z+1)}{2\omega_0 T(z^2 - 1) + Q(4(z-1)^2 + \omega_0^2 T^2 (1+z)^2)} \qquad \text{Once again Mathematica}

= \frac{-2\omega_0 T + 2\omega_0 T z^2}{(4Q - 2\omega_0 T + \omega_0^2 Q T^2) + (-8Q + 2\omega_0^2 Q T^2)z + (4Q + 2\omega_0 T + \omega_0^2 Q T^2)z^2}
Notch

H(z) = \frac{1 + \frac{1}{\omega_0^2}\left(\frac{2}{T}\,\frac{z-1}{z+1}\right)^2}{\frac{1}{\omega_0^2}\left(\frac{2}{T}\,\frac{z-1}{z+1}\right)^2 + \frac{1}{\omega_0 Q}\left(\frac{2}{T}\,\frac{z-1}{z+1}\right) + 1}
Thus we have the following coefficients for our digital filter:

and similarly the two amplitudes and bandwidths. To do this right, the input to the filters might be made by a digital waveguide model (Section 11.6), but a square wave, perhaps with a bit of triangle or sine thrown in, works well in a pinch.

Figure 109 Simulating formants with multiple resonant band pass filters, multiplied by gains and then summed.
82 You'll notice that I'm not providing a table of formants. Amazingly these tables vary quite considerably from one another across the Internet. You might try Table III ("Formant Values") in the back of the Csound Manual, http://www.csounds.com/manual/html/MiscFormants.html
83 A diphthong is a sound made by combining two vowels. For example, the sound "ay" (as in "hay") isn't really a vowel — it's actually the vowel "eh" followed by the vowel "ee".
9 Frequency Modulation Synthesis
In 1967, using a computer at Stanford’s Artificial Intelligence
Laboratory, composer John Chowning experimented with vi-
brato where one sine wave oscillator slowly (and linearly)
changed the frequency of a second oscillator, whose sound
was then recorded. As he increased the frequency of the first
oscillator, the resulting sound shifted from vibrato into some-
thing else entirely: a tone consisting of a broad spectrum of
partials. He then attached an envelope to the first oscillator and
discovered that he could reproduce various timbres, including
(difficult at the time) brass instrument-like sounds. This was
the birth of frequency modulation or FM synthesis.84

Figure 110 Yamaha DX7.©58 (Repeat of Figure 3).

FM, or more specifically its more easily controllable version
linear FM, is not easy to implement in analog, and so did not
come into its own until the onset of the digital synthesizer age.
But when it did, it was so popular that it almost singlehandedly
eliminated the analog synthesizer market.
Yamaha had obtained an exclusive license to FM for mu-
sic synthesis from Stanford in 1973 (Stanford later patented it
in 1975), and began selling FM synthesizers in 1980. In 1983
Yamaha hit pay dirt with the Yamaha DX7, one of the, if not
the, most successful music synthesizers in history. The DX7
marked the start of a long line of FM synthesizers, largely from
Yamaha, which defined much of the sound of pop music in the 1980s and 1990s.

Figure 111 Yamaha YM3812 chip.©59

Among those, the Yamaha TX81Z rackmount
synthesizer particularly found its way onto a great many pop songs due to its ubiquity in music
studios.
FM synthesis then entered the mainstream with the inclusion of the Yamaha YM3812 chip
(Figure 111) on many early PC sound cards, such as the Creative Labs Sound Blaster series. From
there, the technique has since found its way into a myriad of video game consoles, cell phones, etc.
because it is so easy to implement in software or in digital hardware.
84 Thestory of the birth of FM synthesis has been told many times. Here’s a video of Chowning himself telling it.
https://www.youtube.com/watch?v=w4g92vX1YF4
85 Plus nobody’s ever heard of “phase modulation” or “PM” outside of music synthesis. When was the last time you
Phase Modulation    Let's consider the output of a single sine-wave oscillator, called the carrier, with amplitude a_c and frequency f_c, and which started at timestep t = 0 at phase φ_c:

y(t) = a_c sin(φ_c + f_c t)

The value φ_c + f_c t is the oscillator's instantaneous phase, that is, where we are in the sine wave
at time t. Let's say we wanted to modulate this phase position over time. We could do this:

y(t) = a_c sin(φ_c + f_c t + m(t))        (4)

The modulator function m(t) is doing phase modulation or PM. The instantaneous frequency of this sine wave is the frequency of the sine wave at any given timestep t. It's simply the first derivative of the instantaneous phase, that is, it's d/dt (φ_c + f_c t + m(t)) = f_c + m′(t). Thus by changing the phase of the sine wave in real time via m(t), we're also effectively changing its frequency in real time via m′(t).
Frequency Modulation Now let’s say we wanted to directly change the frequency in real time with
a function rather than indirectly via its derivative. That is, we want the instantaneous frequency to
be f c + m(t). Since we arrived at the instantaneous frequency in the first place by differentiating
over t, to get back to y(t), we integrate over t, and so we have:
y(t) = a_c sin(φ_c + ∫₀ᵗ (f_c + m(x)) dx)

     = a_c sin(φ_c + f_c t + ∫₀ᵗ m(x) dx)
Here, instead of adding m( x ) to the phase, we’re effectively folding in more and more of it over
time. This direct modulation of frequency is called, not surprisingly, frequency modulation or FM.
To change the frequency by m(t), we just need to change the phase by some other function — in this case, by ∫₀ᵗ m(x) dx. Either way, regardless of whether we use phase modulation or frequency modulation, we're changing the frequency by changing the phase (and vice versa).
Phase and Frequency Modulation are Very Similar To hammer home just how similar phase
and frequency modulation are, let’s consider the situation where we are using a sine wave for
m(...).87 In PM, we'd have

y(t) = a_c sin(φ_c + f_c t + a_m sin(φ_m + f_m t))

In FM, let's again modify the instantaneous frequency using sine, that is, f_c + m(t) = f_c + a_m sin(φ_m + f_m t). Integrating this over t, we get

∫₀ᵗ (f_c + a_m sin(φ_m + f_m x)) dx = f_c t + (a_m / f_m)(cos(φ_m) − cos(φ_m + f_m t))

                                    = f_c t + (a_m / f_m) cos(φ_m) − (a_m / f_m) cos(φ_m + f_m t)
87 In fact, this is a very common scenario in most FM synthesizers, so it’s hardly far fetched!
(a_m / f_m) cos(φ_m) is just a constant. Let's call it D. So we have

y(t) = a_c sin(φ_c + f_c t + D − (a_m / f_m) cos(φ_m + f_m t))        (5)

     = a_c sin(φ_c + D + f_c t + (a_m / f_m) sin(φ_m − π/2 + f_m t))

Note how similar this equation is to the phase modulation equation, Equation 4. They differ in just a constant phase (φ_c versus φ_c + D, and φ_m versus φ_m − π/2), and an amplitude factor (a_m versus a_m / f_m). The phases are typically disregarded anyway, so we might ignore them. The amplitude factor (which is called the index of modulation later) will matter, but it's just a constant change. The take-home lesson here is: phase modulation and frequency modulation are not the same equation (one is in part the first derivative of the other) but they can be used to produce the same result.
Linear and Exponential FM Analog subtractive synthesizers have been capable of doing fre-
quency modulation forever: just plug the output of a sine-wave oscillator module into the frequency
control of another sine-wave oscillator module, and you’re good to go. So why wasn’t FM common
until the 1980s?
There is a problem. The frequency control of oscillators in analog synthesizers is historically
exponential. Recall that most analog synthesizers were organized in volt per octave, meaning that
an increase in one volt in a signal controlling pitch would correspond to an increase in one
octave, which is a doubling of frequency.88 Consider a sine wave going from −1 to +1 being used to
modulate the frequency of our oscillator. The oscillator has a base frequency of, say, 440 Hz. At
−1 the sine wave has cut that down by one octave to 220 Hz. At +1 it has pushed it up by one
octave to 880 Hz. But 440 is not half-way between 220 and 880: the frequency modulation is not
symmetric about 440, and the effect is distorted.
This kind of FM is called exponential FM, and it’s not all that usable. It wasn’t until the advent
of digital synthesizers, which could easily control frequency linearly, that we saw the arrival of FM
as discussed here, linear FM. With linear FM our sine wave would shift the frequency between
440 − N and 440 + N, and so the modulation would be symmetric about 440.
Figure 112 Change and spread of sidebands with increasing index of modulation (I). In all four of these graphs, f c = 440
Hz and f m = 40 Hz. As I increases, the spread (hence bandwidth) of sidebands does as well; and the pattern of sideband
amplitude, including the carrier at the center, changes. Negative amplitudes just mean a positive amplitude but shifted
in phase by π.
Bandwidth and Aliasing One aspect of the index of modulation is its effect on the dropoff in
amplitude of the sidebands, and thus the bandwidth we have to deal with. The sidebands go
on forever, but a heuristic called Carson’s rule says that, for frequency modulation, 99% of all of
the power of the signal is contained in the range f c ± f m × ( I + 1), and there are I + 1 significant
sidebands on each side.89 Recall that for PM, I = a_m, but for FM, I = a_m / f_m.
Let's say that I = 1, and we're playing a very high note (about 4000 Hz), and f_m = 16 × f_c.
Then we will have sidebands out to 4000 + (4000 × 16) × (1 + 1) = 132,000 Hz. Yuck. What to do?
We have a couple of options.
• We could have a high sample rate, and then downsample, probably via windowed sinc
interpolation (Section 10.7). As an extreme example, imagine that we were sampling at 441
KHz (!). With a Nyquist frequency of 220,500 Hz, this is big enough to handle a sideband
at 132,000 Hz. Downsampling would automatically apply a low pass filter to eliminate all
frequencies higher than 22,050 Hz.
• We have to figure out how to prevent the wide bandwidth in the first place. One strategy
would be to limit the legal values of I and f m , or at least reduce the maximum value of I
when f m and f c are high.
Bessel Functions    The index of modulation also comes into play in determining the amplitude of the carrier and of each sideband, via Bessel functions.
89 Carson's Rule is not exact. Consider when I = 0. Then y(t) = a_c sin(f_c t + I × sin(f_m t)) = a_c sin(f_c t) and there is no frequency modulation at all — we just have a single partial at f_c — yet Carson's Rule implies that the bandwidth is f_c ± f_m × (I + 1) = f_c ± f_m, rather than 0 as it should be.
Here's how it works. Given index of modulation I, then J_0(I) is the amplitude of the carrier, that is, the partial at f_c. Furthermore, J_α(I) is the amplitude of sideband numbers ±α, located at f_c ± α f_m. These can get complicated fast. Figure 114 shows the amplitude of various sidebands, and the carrier (sideband 0), for different modulation index (I) values. Figure 112 shows four cutaways from this graph for I values of 1, 2, 4, and 8. Some things to notice from these figures. First, with a small modulation index, the spectrum of the sound is just a few sidebands (indeed when I = 0, it's just the carrier alone), but as the index increases, the number of affected sidebands increases rapidly. Second, as the modulation index increases, some sidebands, including the carrier, can drop in amplitude, or go negative.90

Figure 114 Sideband amplitudes by modulation index. Sideband numbers are integers, and shown as colored stripes. Note that this surface drops below zero in places.

f_c and the sidebands to its right. Thus all the sidebands will
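To get a feel for these amplitudes, here is a small Python sketch (my own, not from the text) that computes J_α(I) for integer α by numerically integrating the standard Bessel-function integral, then prints the carrier and first few sideband amplitudes for one index of modulation:

import math

def bessel_j(alpha, x, steps=2000):
    """J_alpha(x) for integer alpha, via (1/pi) * integral over [0, pi] of cos(alpha*t - x*sin(t)) dt."""
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * math.pi / steps      # midpoint rule over [0, pi]
        total += math.cos(alpha * t - x * math.sin(t))
    return total / steps

# Carrier (alpha = 0) and sideband amplitudes for an index of modulation of 2
I = 2.0
for alpha in range(0, 6):
    print(alpha, round(bessel_j(alpha, I), 4))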
9.3 Operators and Algorithms
We typically want the effect of a modulator on a carrier to change over time; otherwise the sound
would be static and boring. The most common thing to change over time is the amplitude of
each oscillator: this is typically done with its own dedicated envelope. Envelopes would thus
affect indexes of modulation as well as the volume of the final outputted sound. The pairing of an
oscillator with the envelope controlling its amplitude is together known as an operator. Thus we
often don’t refer to oscillators modulating one another but to operators modulating one another.
In the following examples, we’ll stick to phase modulation as the equations are simpler. We’ll
simplify Equation 4 to describe operators as functions being modulated by other operators, that is,
the output yi (t) of operator i is a function of the output of a modulating operator y j (t). And we’ll
ignore phase from now on. Accompanying this equation we can make a little diagram with the
modulator operator on top and the carrier operator on bottom:
y_i(t) = a_i(t) sin(f_i t + y_j(t))
So far we’ve just discussed a single carrier and a single modulator. But a modulator could easily
modulate several carriers. Imagine that the oscillators are called i, j, and k. We could have:
y_i(t) = a_i(t) sin(f_i t + y_j(t))
y_k(t) = a_k(t) sin(f_k t + y_j(t))
Now, there’s no reason that a carrier couldn’t be modified by several modulators at once, with
their modulations added up:
y_k(t) = a_k(t) sin(f_k t + y_i(t) + y_j(t))
... or for an operator to modulate another, while being itself modulated by yet another operator.
y_j(t) = a_j(t) sin(f_j t + y_k(t))
y_i(t) = a_i(t) sin(f_i t + y_j(t))
And of course there's no reason why an operator has to be modulated by anyone else, that is, it can simply stand alone as an unmodulated oscillator.
Figure 116 Operator modulation graphs (so-called “algorithms”) for the Yamaha DX7. Operators on the bottom layer
(which have bare lines coming out from below them) are mixed together to produce the final sound. Other operators
serve only as modulators. Many algorithms sport self-modulating operators, and in a few cases (Algorithms 4 and 6)
larger modulation cycles.
The point is: the modulation mechanism in a patch is just a graph structure among some N op-
erators. Some FM synthesizer software allows fairly complex graphs (for example, Figure 117). But
many FM synths have followed an unfortunate tradition set by the Yamaha DX7: only allowing the
musician to choose between some M predefined graph structures. Yamaha called these algorithms.
The DX7 had six operators, each of which had a sine
wave oscillator and an envelope to control its amplitude.
There were 32 preset algorithms using these six operators,
as shown in Figure 116. Note that in an algorithm, some
operators are designated to provide the final sound, while
others are solely used to do modulation. In only three
algorithms (4, 6, and 32) did an operator do both tasks.
Operators designated to provide sounds ultimately have
their outputs summed together, weighted by their operator
amplitudes, to provide the final sound.
A few FM synthesizers, such as the Yamaha FS1R, had up to eight operators; but the vast majority of FM synths have had just four, with a very limited set of algorithms. However, many of Yamaha's 4-operator FM synthesizers somewhat made up for their limitation by offering oscillators which could produce more than just sine waves.

Figure 117 OXE 8-operator FM software synthesizer. Note the "modulation matrix" at right, whose lower-diagonal structure implies a full DAG is possible but not a cyclic graph except for self-modulating operators.
Perhaps the most famous of these was the 4-operator, 8-algorithm, 8-waveform Yamaha TX81Z.
Figure 118 shows the TX81Z’s eight algorithms and its eight possible waveforms. 4-operator syn-
thesizers have since become ubiquitous, having made their way into numerous PC soundcards,
toy musical instruments, cell phones, and so on.
Figure 118 Algorithms (left) and waveforms (right) of the Yamaha TX81Z. Operators on the bottom layer (which have
bare lines coming out from below them) are mixed together to produce the final sound: other operators serve only as
modulators. Several algorithms sport self-modulating operators. Note that the waveforms are largely constructed out of
pieces of the sine wave. Six waveforms are silent (zero amplitude) for half of their period.
9.4 Implementation
FM is a perfect match for software. But how would you implement it? Recall Equation 1 in the
Additive Section, page 37. There we were maintaining the current sine wave phase for some
oscillator i as:
x_i^(t) ← x_i^(t−1) + f_i Δt   (mod 1)
...where ∆t was the sampling interval in seconds: for example, 1/44100 seconds for 44.1KHz. The
final output of this sine wave oscillator was:
y_i^(t) ← sin(2π x_i^(t)) × a_i^(t)
Let’s say that this oscillator i is being modulated by the output of one or more oscillators, whose
set is called Mods(i). Then for phase modulation we could update the state of the oscillator xi and
its final output yi as:
x_i^(t) ← x_i^(t−1) + f_i Δt   (mod 1)

y_i^(t) ← sin(2π × (x_i^(t) + b_i × ∑_{j∈Mods(i)} y_j^(t−1))) × a_i^(t)
Keep in mind that you’re also probably modifying ai over time via the oscillator’s accompanying
envelope, and so yi is an operator. Notice the bi snuck into the equation above. This is just a useful
opportunity to specify the degree to which all the incoming modulation signals affect the operator.
Without it (or something like it), the index of modulation is largely defined by the ai envelopes of
the modulators, and so if some modulator is modulating different carriers, it will do so with the
same index of modulation: you can’t differentiate them.92 Anyway, if you don’t care about this,
just set bi = 1.
92 Traditional Yamaha-style FM synthesizers don't have a b_i. The index of modulation is entirely controlled by the modulators' envelopes. However certain other FM synthesizers have b_i included, notably the PreenFM2 shown in Figure 119.
So how about frequency modulation? Here we’re repeatedly summing the modulation into the
updated state (that's the integration). Note again the optional b_i:

x_i^(t) ← x_i^(t−1) + f_i Δt + b_i × ∑_{j∈Mods(i)} y_j^(t−1)   (mod 1)

y_i^(t) ← sin(2π x_i^(t)) × a_i^(t)
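As a concrete sketch (my own, with invented class and field names), the two update rules might be implemented per sample like this; a_i and f_i would normally be driven by the operator's envelope and the played note:

import math

class Operator:
    """One sine-wave operator.  mods is the list of operators modulating this one."""
    def __init__(self, freq, amp=1.0, b=1.0, mods=None):
        self.f = freq          # frequency in Hz
        self.a = amp           # amplitude (normally driven by an envelope)
        self.b = b             # how strongly incoming modulation affects this operator
        self.mods = mods or []
        self.x = 0.0           # phase state, 0..1
        self.y = 0.0           # last output sample

    def step_pm(self, dt):
        """Phase modulation: modulation is added to the phase only when computing the output."""
        self.x = (self.x + self.f * dt) % 1.0
        mod = sum(m.y for m in self.mods)
        self.y = math.sin(2 * math.pi * (self.x + self.b * mod)) * self.a
        return self.y

    def step_fm(self, dt):
        """Frequency modulation: modulation is summed (integrated) into the phase state itself."""
        mod = sum(m.y for m in self.mods)
        self.x = (self.x + self.f * dt + self.b * mod) % 1.0
        self.y = math.sin(2 * math.pi * self.x) * self.a
        return self.y

# Example: a 110 Hz modulator phase-modulating a 440 Hz carrier at 44.1 kHz
dt = 1.0 / 44100
mod = Operator(110.0, amp=0.8)
car = Operator(440.0, mods=[mod])
samples = []
for _ in range(44100):
    samples.append(car.step_pm(dt))   # the carrier reads the modulator's previous-sample output
    mod.step_pm(dt)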
Of course these don't have to be sine waves: they can be any wave you deem appropriate. But
sine has a strong tradition and theory regarding the resulting sidebands (and what antialiasing
they will require). Most FM synthesizers aren’t much more than this. Neither the DX7 nor TX81Z,
nor most other Yamaha-style FM synths, had a filter or a VCA envelope.93 They had a single LFO
which could modify pitch and volume, plus a few other minor gizmos.
Advantages of Phase Modulation FM and PM have the same computational complexity and are
both easy to implement. There are some differences to think about though. For example, imagine
that y j was a positive constant: it never changed. Then phase modulation would have no effect
on the output of yi . However frequency modulation would have an effect: yi would have a higher
pitch due to the added integration. Along these same lines, phase modulation can make it a bit easier to get an operator to modulate itself as y_i^(t) ← sin(x_i^(t) + y_i^(t−1)) × a_i, or to do similar cyclic modulations, without changing the fundamental pitch of y_i.
Overall, phase modulation seems to be somewhat easier to
work with, and it is likely for this reason that Yamaha chose phase
modulation over frequency modulation for their FM (or, er, PM)
synthesizers. Yamaha’s synths offered self-modulation as an
option, though in truth self-modulation tends to create fairly
noisy and chaotic sounds. Partly because of these advantages,
and partly because of Yamaha’s influence, very few synthesizers
in history have chosen FM over PM: one notable exception is
the open-design PreenFM2 (Figure 119).

Figure 119 PreenFM2.
Filter FM Last but not least: you can use audio-rate oscillators to modulate many other syn-
thesizer parameters beyond just the frequency or phase of another oscillator. Ever since there
were modular synthesizers, musicians have attached the output of oscillators to the modulation
input of any number of modules to varying degrees of effect. One particularly common method
worth mentioning here is filter FM, where an audio-rate oscillator is used to modulate the cutoff
frequency of a filter through which an audio signal is being run. This can be used to create a wide
range of musical or strongly discordant sounds.
93 There are exceptions. For example, the Elektron Digitone has both FM synthesis and filters, as do certain virtual
analog synths with FM options.
10 Sampling
The synthesizers discussed so far have largely generated sounds algorithmically via oscillators:
sawtooth waves, etc. But increases in computer power and (critically) memory capacity have made
possible sampling sounds directly from the environment. The synthesizer’s algorithmic oscillator
is replaced in a sampler with an "oscillator", so to speak, which plays back the sampled sound. The
other portions of the subtractive synthesizer architecture remain.
This approach is now very widely used in the music industry. Major film scores are produced
entirely using sampled instruments rather than a live orchestra. Stage pianos are often little more
than sample playback devices. Sampling in hip hop has caused all manner of copyright headaches
for artists and producers. Some sampled clips, such as the Funky Drummer or the Amen Break,
have spawned entire musical subgenres of their own. It is even common to sample the output
of analog synthesizers, such as the Roland TR-808 drum machine, in lieu of using the original
instrument.
10.1 History
Sampling and sample playback devices originated with early optical
and tape-replay devices, the most well known example being the
Streetly Electronics Mellotron series. These keyboards played a
tape loop on which a sample of an instrument had been recorded.94
Digital sampling existed as early as the 1960s, but sampling did not
come into its own commercially until the late 1970s. Some notable
early polyphonic examples were the Fairlight CMI and New Eng-
land Digital Synclavier, both sampling and synthesis workstations.
Digital samples use up significant memory, and sample manipu-
lation is computationally costly, so many improvements in samplers
are a direct result of the exponential improvement in computer chip
performance and capacity over time. This has included better bit
depth and sampling rates (eventually reaching CD quality or better),
more memory and disk storage capacity, better DACs and ADCs,
and improved sample editing facilities. Firms like E-Mu Systems
and Ensoniq rose to prominence by offering less expensive sam-
plers for the common musician, and were joined by many common brands from the synthesizer industry.

Figure 120 A Mellotron Mk VI, circa 1999.©60
Many samplers emphasized polyphony and the ability to pitch
shift or pitch scale samples to match played notes. But samplers
were also increasingly used to record drums and percussion: these
samplers did not need to vary in pitch in real time, but they did
need to play many different samples simultaneously (drum sets,
for example). This gave rise to a market for phrase samplers and
sampling drum machines which specialized entirely in one-shot
sample playback. Notable in this market was the Akai MPC series,
which was prominent throughout hip-hop.

Figure 121 Akai MPC Renaissance sampling drum machine.©61
94 In most cases you could not record your own samples, thus these were more akin to romplers than samplers.
Romplers The late 1980s saw the rise of romplers. These synthesizers played samples just as
samplers did: but they were not samplers as they could not sample the sounds in the first place.
Instead, a rompler would hold a large bank of digital samples in memory (in ROM — hence the
derisive term “rompler”) which it played with its “oscillators”. Romplers were omnipresent
throughout the 1990s, and were used in a great many songs. Romplers were very often rackmount
units (as were most later samplers) and sported extensive multitimbral features, meaning that they
not only had high voice polyphony, but that those voices could play different sounds from one
another. This made it possible to construct an entire multi-instrumental song from a single rompler
controlled by a computer and keyboard. Romplers generally had poor programming interfaces, as
most of them were meant to fill a market demand for preset sound devices.
As computer hardware became cheaper and more capable, samplers and romplers were largely
displaced by digital audio workstations which could do the same exact software routines in a
more standard environment (the laptop).
arithmetic synthesis, the basis of a very successful line of synthesizers, such as the Roland D-50.
96 The basic PCM sounds in the Prophet VS were single-cycle waves, whereas the basic PCM sounds in the Korg
Wavestation could be looping, single-cycle, or one-shot. Because the VS had single-cycle waves, some people incorrectly
classify it as a wavetable synthesizer (Section 10.3), but it’s not.
10.3 Wavetable Synthesis
Another rather different use of single-cycle waves is in
the form of wavetables.97 A wavetable is nothing more
than an array W = ⟨w_1, w_2, ..., w_n⟩ of digitized single cycle waves. Figure 122 shows a wavetable of 64 such waves. A wavetable oscillator selects a particular single cycle wave w_i from this table and constantly plays it. The idea is that you can modulate which wave is currently playing via a parameter, much as you could modulate the pulse width of a square wave. As a modulation signal (from an envelope, say) moves from 0.0 to 1.0, the oscillator changes which wave it's playing from 0 to 63. This is more than just cross-fading between two waves, since in the process of going from wave 0 to wave 63 we might pass through any number of unusual waves. Depending on the speed of modulation, this could create quite complex sounds.

Figure 122 Wavetable #31 of the PPG Wave synthesizer, with 64 single-cycle waves. Most waves move smoothly from one to another, but the last four do not: these are triangle, pulse, square, and sawtooth, and appear in PPG and (minus pulse) Waldorf wavetables for programming convenience.©62
It’s not surprising that many early wavetable synthesiz-
ers sported a host of sophisticated modulation options to
sweep through those wavetables in interesting ways. For
example, the Waldorf Microwave series had an eight-stage
“wave envelope” with a variety of looping options, plus an
additional four-stage bipolar (signed) “free envelope”, in
addition to the usual ADSR options. Figure 123 shows the front panel of the Microwave XT and its envelope controls.

Figure 123 Waldorf Microwave XT (rare "Shadow" version: most are safety orange!). Note the bottom right quadrant of the knob array, devoted entirely to envelopes.

It might interest you to know that wavetables have historically been stored in one of two forms. As memory is plentiful nowadays, wavetables are normally stored as

held a large bank of available single-cycle waves, and each wavetable was a sparse array whose slots were either refer-
97 Note that many in the music synthesis community, myself included, use the term wavetable differently than its later
usage in digital signal processing. In the music synthesis world, a wavetable is an array of digitized single cycle waves, a
usage popularized early on by Wolfgang Palm. But in the DSP community, a wavetable has since come to mean a single
digitized wave in and of itself! What the music synthesizer world typically calls wavetable synthesis, the DSP world might
call multiple wavetable synthesis. To make matters worse, in the 1980s Creative Labs often incorrectly used the term
“wavetable” to describe PCM samples generated from their Sound Blaster sound card.
Though it now appears in many synthesizers worldwide, wavetable synthesis is strongly linked with Germany: it
is often attributed to Wolfgang Palm and his wavetable synthesizer, the PPG Wave. Palm later consulted for Waldorf
Music, which in its various incarnations has produced wavetable synthesizers for over two decades.
Wavetables are nearly always bounded one-dimensional arrays. But the waves could instead
be organized as an n-dimensional array.98 The array needn’t be bounded either: for example, it
could be toroidal (wrap-around). Of course, an LFO or envelope can easily specify the index of the
wave in the one-dimensional bounded case, but how would you do it in higher dimensions? One
possibility is to define a parametric equation, that is, a collection of functions, one per dimension,
in terms of the modulation value m. For example, if we had a two-dimensional space, we could
define our wave index in that space as i(m) = ⟨cos(2πm), sin(2πm)⟩. Thus as the modulation went
from 0 to 1, the index would trace out a circle in the space. If i (0) = i (1), as was the case in this
example, we could further use a sawtooth LFO to repeatedly trace out this path forever as an orbit.
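As an illustrative sketch (mine, with invented names, not the text's implementation), a basic one-dimensional wavetable oscillator might pick the playing wave from the modulation signal and crossfade between neighboring waves to avoid stepping artifacts:

import math

def wavetable_sample(wavetable, phase, m):
    """One output sample from a wavetable oscillator.
       wavetable: list of single-cycle waves (lists of samples, all the same length)
       phase:     position within the single cycle, 0.0..1.0
       m:         modulation signal, 0.0..1.0, selecting which wave is playing"""
    pos = m * (len(wavetable) - 1)          # real-valued wave index, e.g. 0..63
    lo = int(pos)
    hi = min(lo + 1, len(wavetable) - 1)
    frac = pos - lo                          # crossfade amount between the two neighboring waves
    n = len(wavetable[0])
    idx = int(phase * n) % n
    return (1 - frac) * wavetable[lo][idx] + frac * wavetable[hi][idx]

# Example: a tiny two-wave table (sine and a crude square), swept halfway between them
n = 64
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]
square = [1.0 if k < n // 2 else -1.0 for k in range(n)]
print(wavetable_sample([sine, square], phase=0.25, m=0.5))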
10.4 Granular Synthesis

Granular synthesis builds sounds out of streams of very short sound snippets (as short as 1 ms) called grains, commonly formed by cutting up a sampled PCM sound into little pieces. Each grain is then associated with a window function (see Section 3.5) so that it starts and ends at zero and ramps smoothly to full volume in the middle. Without the window function, you'd likely hear a lot of glitches and pops as grains came and went. In granular synthesis, the window function is known as a grain envelope.

Figure 125 Hann Window.
Early granular synthesis experiments used simple triangular (ramp up, then ramp down) or
trapezoidal (ramp up, hold steady, then ramp down) windows, but as computer power increased,
more elaborate windows became possible. One popular window is the Hann window, which is
little more than a cosine. That is, applied to a grain of length M, the Hann window ranges from
[− M/2...M/2] and is defined as Hann( x ) = 1/2 cos(2πx/M) + 1/2. See Figure 125.
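As a small illustration (my own sketch, not the text's), windowing a grain cut from a sample buffer with the Hann function defined above might look like this; the function names are invented:

import math

def hann(x, M):
    """Hann window as defined above, over x in [-M/2, M/2]."""
    return 0.5 * math.cos(2 * math.pi * x / M) + 0.5

def make_grain(samples, start, length):
    """Cut a grain of the given length out of a sample buffer and apply the grain envelope."""
    grain = []
    for k in range(length):
        x = k - length / 2.0            # center the window over the grain
        grain.append(samples[start + k] * hann(x, length))
    return grain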
Because defining a stream of grains can require a very high number of parameters, granular
synthesis methods usually simplify things in one of two ways.99 First, synchronous granular
methods repeat one or more grains in a pattern. These could be used for a variety of purposes:
• If the grains are interspersed with silence, you’ll hear beating or rhythmic effects.
• If the grains come one right after the other (or are crossfaded into one another)100 they could
be used to compose new sounds out of their concatenation.
• You could also repeat the same grain over and over, perhaps with crossfading, to lengthen
a portion of a sound. This can form the basis of stretching the length of a sample without
changing its pitch, a form of time stretching.
98 A two-dimensional array of waves is known as a wave terrain, a term coined by Rich Gold in John Bischoff, Rich
Gold, and Jim Horton, 1978, Music for an interactive network of microcomputers, Computer Music Journal, 2(3).
99 These categories are co-opted out of the five categories described by Curtis Roads in Curtis Roads, 2004, Microsound, MIT Press.
At the other end of the granular spectrum are asynchronous granular methods, which produce
a stream of randomly or pseudo-randomly chosen grains. These grains may vary randomly or
deterministically in many ways, such as choice of grain, grain length, amplitude, window, grain
density (how many of them appear in a given time interval), degree of overlap, location in the
sound source, and pitch. A blob of grains in this form is often called a grain cloud.
The length of a grain has a significant impact on how
grains are perceived. Very short grains may simply sound
like pops or hiss. As grain length increases beyond 1ms or
so we can start to perceive the pitch of the waves embedded
in each grain, and this increases as grains grow to about
50ms. The density of the grains — that is, how much of the
sound interval is occupied by grains — also has a signifi-
cant impact. Very sparse sounds will produce beating or
rhythmic patterns; denser grain sequences result in a single
continuous sound; and very dense grains could have a high degree of overlap, producing a wall of sound.

Figure 126 Tasty Chips Electronics GR-1.©64
Granular synthesis is uncommon. Most granular synthesizers are software; hardware granular
synths are very rare, especially polyphonic ones. One of the very few exceptions is the Tasty Chips
Electronics GR-1, an asynchronous granular synth shown in Figure 126.
10.5 Resampling
The primary computational concern in sampling, and the other techniques discussed so far, is
changing the pitch of a sampled sound. For example, if we have a sample of a trumpet played at A♭,
and the musician plays a D, we must shift the sample so it sounds like a D. There are two ways we
could do this. The basic approach would be to perform pitch shifting, whereby we adjust the pitch
of the sound but allow it to become shorter or longer. This is like playing a record or tape faster:
the person speaking on the tape is pitched higher but speaks much faster. The much more difficult
alternative (without introducing noticeable artifacts in the sound) is pitch scaling, where the pitch
is adjusted but the length is kept the same. Many samplers and romplers do pitch shifting.
The basic way to do pitch shifting is based on resampling. Resampling is the process of
changing the sample rate of a sound: for example, converting a sound from 44.1KHz to 96KHz.
We can hijack this process to do pitch shifting as follows. To shift a sound to twice its pitch (for
example), we just need to squeeze the sound into half the time. To do this, we could resample the
sound to half the sampling rate (cutting it to half the number of samples), then treat the resulting
half-sized array as if it were a sound in the original sampling rate. Similarly, to shift the sound to
half its pitch, we’d resample the sound to twice the sampling rate (generating twice the samples),
and again treat the result as if it were in the original rate.
Figure 127 Resampling a sound to 3/4 its previous sampling rate. The sound is first stuffed with two zeros per sample. The result is smoothed over with a low-pass filter to interpolate the zeros between the original samples (and to prevent frequencies over the final Nyquist limit). Then the sound is decimated to remove all but every fourth sample.
This all works because one consequence of the Nyquist-Shannon sampling theorem is that a
continuous signal bandlimited to contain partials no higher than a frequency F uniquely passes
through a set of discrete samples spaced 1/(2F) apart from one another. We're removing samples but
the ones we retain still define the same basic sound, albeit at a lower rate.
Upsampling To resample to a higher sampling rate is called upsampling. If the new sampling
rate is an integer multiple of the original rate (for example, we’re upsampling to twice or three
times the rate), then we need to insert new samples in-between the original samples, a process
known as interpolation. Let’s say we wanted to upsample to four times the original rate. Then
we’d insert three dummy samples in-between each pair of the original samples. These dummy
samples would initially have zero amplitude. To get them to smoothly interpolate between the
originals, we could apply a low pass filter (yet again!), to smooth the whole thing. Note that this
will likely reduce the gain of the sound, so we may need to amplify it again.
Resampling by Rational Values    Now let's say that you needed to resample by a rational value. For example, you wished to shift from a sample rate of X to (a/b)X, where both a and b are positive integers. To do this, you'd first upsample by a factor of a, then downsample the result by a factor of b. Figure 127 shows this two-step process.

The problem is that small pitch shifts will require fractions a/b with large values of a or b or both, costing lots of memory and computational time. For example, if you wanted to shift up from C to C♯, this is an increase of 2^{1/12} ≈ 89/84. That's a very rough approximation, and yet it would require upsampling to 89 times the sampling rate, then decimating by 84! Now imagine an even smaller pitch shift, such as via a slight tap of a pitch bend wheel: you could see even closer fractions. A common workaround is to figure out some way to break the fraction into a product of smaller fractions, and then do up/downsampling on each.101 For example, you could break up 56/45 = 7/5 × 8/9, then do upsample(7), downsample(5), upsample(8), downsample(9). Still very costly.
This technique is also inconvenient to use in real-time scenarios which demand rapid, dynamic
changes in the sampling rate — such as someone moving the pitch bend wheel. We need a method
which can do interpolation in floating point, so we can change sample rates dynamically, in real
time, and without computing large fractions.
101 Obviously you couldn't do that with 89/84, because 89 is prime.
10.6 Basic Real-Time Interpolation
Consider the following very simple approach. Given a sound A = ⟨a_0, ..., a_{s−1}⟩ at our desired sampling rate, but of frequency (pitch) P_A, we want to change its pitch to P_{A′}. To do this, instead of moving forward through A one step at a time, we'll move forward P_{A′}/P_A (real-valued) "steps" at a time. Specifically, at timestep t we have a current real-valued position x^t in the sound, and to step forward, we set x^{t+1} ← x^t + P_{A′}/P_A. If we have a single-cycle or other looping wave, when x^t exceeds the number of samples s in the wave, set x^t ← x^t mod s to wrap around to the beginning. At any rate, we return the sample a_{⌊x^t⌋}. If we are downsampling, we ought to first apply a low pass filter to the original sound to remove frequencies above Nyquist for the new effective sampling rate. This is the same as removing frequencies above (F_A/2) × min(P_{A′}, P_A)/P_{A′} in the original sound (where F_A is the original sound's sampling rate).

The problem with this method is that P_{A′}/P_A may not be an integer, so this is a rough approximation at best: we're just returning the nearest sample. We could do a bit better by rounding to the nearest sample rather than taking the floor, that is, returning a_n where n = round(x^t). All this might work in a pinch, particularly if we are shifting the pitch up, so P_{A′}/P_A is large. But what if it's very small? We'd be returning the same value a_n over and over again (a kind of sample and hold). We need some way to guess what certain values would be between the two adjoining samples a_{⌊x^t⌋} and a_{⌈x^t⌉}.
We need to do some kind of real-time interpolation.
Recall that for a given set of digital samples there exists exactly one band-limited real-valued
function (that is, one with no frequencies above Nyquist) which passes through all of them. Let’s
say that this unknown band-limited function is f ( x ). What the sampling and interpolation task is
really asking us to do is to find the value of f ( x ) for any needed value x given our known samples
h a0 = f ( x0 ), a1 = f ( x1 ), ..., an = f ( xn )i at sample positions x1 , x2 , ..., xn .
The simplest approach would be to do linear interpolation. Let's rename the low and high bracketing values of x^t to x_l = ⌊x^t⌋ and x_h = ⌈x^t⌉ respectively. Using similar triangles, we know

\frac{x - x_l}{x_h - x_l} = \frac{f(x) - f(x_l)}{f(x_h) - f(x_l)}

and from this we get

f(x) = \frac{(x - x_l)(f(x_h) - f(x_l))}{x_h - x_l} + f(x_l)
This is just finding the value f ( x ) on the line between the points h xl , f ( xl )i and h xh , f ( xh )i.
Linear interpolation isn’t great: its first derivative is discontinuous at the sample points, as is the
case for its generalization to higher polynomials, Lagrange interpolation.102
An alternative is to interpolate with a spline: a chunk of a polynomial bounded between two
points. Splines are often smoothly differentiable at the transition points from spline to spline, and
102 Named after the Italian mathematician Joseph-Louis Lagrange, 1736–1813, though he did not invent it. The goal
is to produce a Lagrange polynomial which passes exactly through n points: you can then use that polynomial to
find other points smoothly between them. To start, note that with a little elbow grease we can rearrange the linear
interpolation equation to f(x) = f(x_h)\frac{x - x_l}{x_h - x_l} + f(x_l)\frac{x - x_h}{x_l - x_h}. It so happens that we can add a third sample f_m to the mix like this: f(x) = f(x_h)\frac{(x - x_l)(x - x_m)}{(x_h - x_l)(x_h - x_m)} + f(x_l)\frac{(x - x_h)(x - x_m)}{(x_l - x_h)(x_l - x_m)} + f(x_m)\frac{(x - x_l)(x - x_h)}{(x_m - x_l)(x_m - x_h)}.
Notice the pattern? In general if you have samples x_1 ... x_n available, then f(x) = \sum_{i=1}^{n} f(x_i) \prod_{j=1, j \ne i}^{n} \frac{x - x_j}{x_i - x_j}.
As mentioned, one problem with Lagrange interpolation is that it’s not continuously differentiable at the sample
points. If you have four sample points x1 , ..., x4 and you’re interpolating from x2 to x3 everything looks great. But once
you’ve reached x3 and want to start interpolating to x4 , you’d likely drop x1 and add a new sample x5 . But now the
polynomial has changed, so it’ll immediately launch off in a new direction: hence a discontinuity at x3 .
they avoid another problem with Lagrange interpolation, namely unwanted oscillation. One simple
spline approach is cubic interpolation. Let’s say we had four points h x1 , f ( x1 )i, ..., h x4 , f ( x4 )i
where the four xi are evenly spaced from each other and increasing in value. That’s certainly the
case for our audio samples. We’re trying to find f ( x ) for a value x between x2 and x3 . Let α be how
far x is relative to x2 and x3 , that is, α = ( x − x2 )/( x3 − x2 ). Then
f(x) = α^3(−f(x_1) + f(x_2) − f(x_3) + f(x_4))
     + α^2(2f(x_1) − 2f(x_2) + f(x_3) − f(x_4))
     + α(−f(x_1) + f(x_3))
     + f(x_2)
A better variation, based on the Catmull-Rom cubic spline, uses successive differences in f (...)
to estimate the first derivative for a potentially smoother interpolation.
f(x) = α^3(−1/2 f(x_1) + 3/2 f(x_2) − 3/2 f(x_3) + 1/2 f(x_4))
     + α^2(f(x_1) − 5/2 f(x_2) + 2 f(x_3) − 1/2 f(x_4))        (6)
     + α(−1/2 f(x_1) + 1/2 f(x_3))
     + f(x_2)
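Here is a small Python sketch (mine, not the text's) of a real-time resampling loop using the Catmull-Rom interpolator of Equation 6; the names are invented, and downsampling would still need a low pass filter beforehand:

def catmull_rom(f1, f2, f3, f4, alpha):
    """Catmull-Rom interpolation (Equation 6) between f2 and f3, with 0 <= alpha <= 1."""
    return (alpha**3 * (-0.5*f1 + 1.5*f2 - 1.5*f3 + 0.5*f4)
            + alpha**2 * (f1 - 2.5*f2 + 2.0*f3 - 0.5*f4)
            + alpha * (-0.5*f1 + 0.5*f3)
            + f2)

def pitch_shift(samples, ratio):
    """Step through a sample array by ratio = P_A' / P_A, interpolating between samples."""
    out = []
    x = 1.0                                  # start where four bracketing samples exist
    while x < len(samples) - 2:
        i = int(x)
        alpha = x - i
        out.append(catmull_rom(samples[i-1], samples[i], samples[i+1], samples[i+2], alpha))
        x += ratio
    return out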
We'd still have to filter beforehand when downsampling to eliminate aliasing.103 But it turns out that there exists a method which will, at its limit, interpolate along the actual band-limited function, and acts as a built-in brick wall antialiasing filter to boot. That method is windowed sinc interpolation.

Figure 128 Basic cubic spline interpolation and Catmull-Rom interpolation. Note that Catmull-Rom is less "bouncy".
103 There are lots of ways to optimize these polynomial interpolators to improve their sound quality. You might check
out http://yehar.com/blog/wp-content/uploads/2009/08/deip.pdf
104 Sinc is pronounced "sink", and is a contraction of sinus cardinalis (cardinal sine). There are two definitions of sinc,
with and without the appearance of π. Sampling uses the one with π (the normalized sinc function) because its integral
equals 1. Note that we define sinc to be 1 when x = 0 because the function divides by zero at that point otherwise. Does
all this ring a bell? Look back at the variant of sinc used in Equation 2.
Interpolation with Sinc Recall that there is exactly one
bandlimited continuous signal which passes through the
points in our digital signal. Sinc is nicknamed the sam-
pling function because, applying the Whittaker-Shannon
interpolation formula, you can use sinc to reconstruct this
continuous signal from your digital samples.
This is convolving the sinc function against A, as shown in Figure 130. But notice that, because
sinc is symmetric around zero, sinc( FA × (t − k/FA )) = sinc( FA × (k/FA − t)). This means we
could instead write things as a correlation rather than convolution procedure:
C(t) = \sum_{k=-\infty}^{\infty} sinc(F_A × (k/F_A − t)) × a_k

Now we can identify the new sample positions in A′ and use this equation to compute them
one by one.
When downsampling we need to make sure that the original signal contains no frequencies
above the Nyquist limit for the new sampling rate. How can we do this? It so happens that
convolution with sinc isn’t just an interpolation function: it’s also a perfect brick-wall low-pass
filter (in theory at least, when we’re summing from −∞ to ∞). This is because convolution of two
signals in the time domain does the same thing as multiplying the two signals in the frequency
domain. And sinc’s Fourier transform just so happens to be the (brick-wall) rectangle function:
rectangle(x) = 1 if −0.5 ≤ x ≤ 0.5, and 0 otherwise
To change the cutoff frequency, all we need to do is adjust the width of our sinc function. At present the multiplier F_A in Equation 7 ensures a filter cutoff at F_A/2, that is, the Nyquist limit for the original sound. But if we're downsampling, we need it to cut off at the (lower) Nyquist limit for the new sound. We do this by replacing F_A with min(F_A, F_{A′}), like this:

a′_j = \sum_{k=-\infty}^{\infty} sinc(min(F_A, F_{A′}) × (k/F_A − j/F_{A′})) × a_k
This will also change the overall volume, so to keep it a unity gain filter, we need to scale it
back again by min(1, F_{A′}/F_A):

a′_j = min(1, F_{A′}/F_A) × \sum_{k=-\infty}^{\infty} sinc(min(F_A, F_{A′}) × (k/F_A − j/F_{A′})) × a_k
Now let's define J = F_A × j/F_{A′}. That is, J is the real-valued location of the new sample a′_j in the coordinate system of the original samples in A. This is the spot about which the sinc function is centered (for example, J = 6 2/3 in Figure 131), as is obvious when we substitute J into the equation:

a′_j = min(1, F_{A′}/F_A) × \sum_{k=-\infty}^{\infty} sinc\left(\frac{min(F_A, F_{A′})}{F_A} × (k − J)\right) × a_k
Figure 132 Blackman Window and its effects on the frequency domain (note that the Y axis for the FFT is on a log scale). For more windows, see Section 3.5.©65

We need some way to shorten the sinc function without just truncating it (which would sound awful). To do this, we can multiply it against a window. Windows were introduced in Section 3.5. We'd like a
window which dropped completely to zero, such as the Hann window. But there’s a somewhat
better choice for our purposes: the Blackman window, as shown in Figure 132.105 The Blackman
window is applied over a region of length N. It is a function over n ∈ 0...N−1:

w(n, N) = 0.42 − \frac{1}{2}\cos\left(\frac{2\pi n}{N-1}\right) + 0.08\cos\left(\frac{4\pi n}{N-1}\right)

N is normally an odd number. Armed with a window, we could now replace the sinc with a windowed sinc which tapers off at ±(N−1)/2 using the window centered at J like sinc was (plus an offset of (N−1)/2 because Blackman isn't centered around 0):

a′_j = min(1, F_{A′}/F_A) × \sum_{k=-\infty}^{\infty} sinc\left(\frac{min(F_A, F_{A′})}{F_A} × (k − J)\right) × w\left(k − J + \frac{N-1}{2}, N\right) × a_k
Because all values in the sum outside the window region are 0, we can now make the sum finite.
So what should our upper and lower bounds be? They should be the outermost sample positions just inside the window taper region. That is, k_low = ⌈J − (N−1)/2⌉ and k_high = ⌊J + (N−1)/2⌋, thus:

a′_j = min(1, F_{A′}/F_A) × \sum_{k=k_{low}}^{k_{high}} sinc\left(\frac{min(F_A, F_{A′})}{F_A} × (k − J)\right) × w\left(k − J + \frac{N-1}{2}, N\right) × a_k
And we're done! Hint: if you're trying to shift the pitch from P_A to P_{A′}, then F_{A′} = F_A × P_A / P_{A′}.
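Putting the pieces together, here is a minimal Python sketch (my own, not the book's reference implementation) of one output sample of windowed sinc interpolation using the Blackman window; names, argument order, and the clamping to the array bounds are my own choices:

import math

def sinc(x):
    """Normalized sinc, defined as 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def blackman(n, N):
    """Blackman window over n in 0..N-1."""
    return 0.42 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) + 0.08 * math.cos(4 * math.pi * n / (N - 1))

def resample_one(a, j, FA, FAnew, N=45):
    """Output sample j of A resampled from rate FA to FAnew via windowed sinc (N odd)."""
    J = FA * j / FAnew                      # location of the new sample in A's coordinates
    cutoff = min(FA, FAnew) / FA            # lowered cutoff when downsampling
    gain = min(1.0, FAnew / FA)             # keep unity gain
    klow = math.ceil(J - (N - 1) / 2)
    khigh = math.floor(J + (N - 1) / 2)
    total = 0.0
    for k in range(max(klow, 0), min(khigh, len(a) - 1) + 1):
        total += sinc(cutoff * (k - J)) * blackman(k - J + (N - 1) / 2, N) * a[k]
    return gain * total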
The quality of this approach will largely turn on the size of N, which in turn impacts directly on
computational power.106 It also will impact on the latency of the algorithm: because sinc is not a
causal filter, we must know some of the incoming future sound samples. For much sampling this is
probably not an issue, as we probably already have the entire PCM sample or the wavetable available
to us. But if you were using this method to do (say) pitch-shifting of incoming sounds, you should
be aware of this. For example, if you were using 44 sinc coefficients per side on a 44.1KHz sound,
the delay would be about 44/44,100 × 1000 ≈ 1 millisecond.
105 Inthe literature you’ll see much better windows still, notably the Kaiser window, but they are difficult to describe
and even tougher to implement (Kaiser is based on Bessel functions).
106 CCRMA has a more efficient table-driven windowed sinc interpolation algorithm you would do well to check
11 Effects and Physical Modeling
Most of this text is concerned with the creation or sampled playback of sounds. But another important
aspect is algorithms meant to add some effect to a sound to enhance it. The sound being fed into
an effect doesn’t have to come from a synthesizer or sampler: in fact it’s often a live sound like
vocals or an instrument. The goal of an effect is to make the sound feel better or different somehow.
Some of the algorithms we’ve covered so far qualify as effects in and of themselves, and can
be found in guitar pedals and other devices: for example, filters, ring modulation, clipping, and
other distortion mechanisms. But many popular effects rely on some kind of time
delay to do their magic. These are the bulk of the effects covered in this Section. In some of these
effects (delay, reverb) the delays are long and so are perceived as shifts in time; but in other effects
(chorusing, flanging) the delays are very short and are instead perceived as changes in timbre.
The Section concludes with a short introduction to physical modeling synthesis, an array of
techniques for modeling the acoustic and physical properties of certain instruments. Physical
modeling is lumped in with effects in this Section because its methods often apply similar delay-
based techniques.
11.1 Delays

One of the simplest time-based effects is the humble delay. Here, the sound is augmented with a copy of itself from some m timesteps before. A one-shot delay is quite easy to implement: it's essentially the extension of an FIR filter, with a long delay portion (Figure 133).

Note from Figure 133 that you can cut down the amplitude of both the original and delayed signal. The degree to which you cut down one or the other defines how dry or wet the signal is. A fully dry signal is one which has no effect at all (the delay is cut out entirely). A fully wet signal is one which has only the effect.

Figure 134 Repeated delay, augmented with two additional cut gains to control wetness. Compare to the basic IIR filter in Figure 93.
What if you wanted a repeating delay? This is also easy: you just need the equivalent of an
extended feedback (that is, IIR) filter. The cut-down is particularly important, because if we don’t
cut down enough, the recurrent nature of this delay will cause it to spiral out of control. Figure 134
shows this delay core, augmented with two outer cut-downs to make it easy to control wetness.
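As a sketch (mine, with invented names), a feedback delay with a dry/wet control along the lines of Figure 134 might look like this; the feedback gain must stay below 1 so the repeats die away:

class FeedbackDelay:
    """A repeating delay: a circular buffer of m samples fed back on itself."""
    def __init__(self, m, feedback=0.5, wet=0.5):
        self.buffer = [0.0] * m     # m timesteps of delay
        self.pos = 0
        self.feedback = feedback    # cut-down on the recirculating signal (keep < 1!)
        self.wet = wet              # 0 = fully dry, 1 = fully wet

    def process(self, x):
        delayed = self.buffer[self.pos]                  # sample from m timesteps ago
        self.buffer[self.pos] = x + self.feedback * delayed
        self.pos = (self.pos + 1) % len(self.buffer)
        return (1.0 - self.wet) * x + self.wet * delayed

# Example: a quarter-second delay at 44.1 kHz
delay = FeedbackDelay(m=11025, feedback=0.4, wet=0.3)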
There are lots of variations on delays: you could ping-pong the delay back and forth in stereo,
or sync the delay length to the MIDI clock so the delays come in at the right time. Perhaps you
might pitch-shift the delay or repeatedly run it through a low-pass filter.
11.2 Flangers

While delay effects involve long delays, other effects involve rather short delays which are perceived not as delays but as changes in the spectrum of the sound. A classic example of this is the flanger. This is an effect whose characteristic sound is due to a signal being mixed with a very short delayed version of itself, where the degree of delay is modulated over time via an LFO, perhaps between 1
Comb Filters    When a delay line is very short, as is the case in a flanger, we don't hear a delay any more. Rather we hear the effect of a comb filter. One kind of comb filter, a forward comb filter, is a simple extension of the classic FIR filter with a longer delay: it takes the form

y(n) = b_0 x(n) + b_1 x(n − m)

where m is the length of the delay in samples. We'll assume that b_0 = 1. Notice the repeated lobes in the comb filter in Figure 136. A larger value of m will result in more of these lobes.107 You can also see how setting b_1 to different values changes the wetness of the filter.

Figure 136 Forward comb filter with b_0 = 1 and b_1 set to -0.9, -0.5, and -0.2.

A comb filter is most easily described in the Z Domain where, with b_0 = 1, its transfer function is

H(z) = 1 + b_1 z^{-m} = \frac{z^m + b_1}{z^m}

From this you can see that the filter will have m poles and m zeros. The poles all pile up at the origin, while the zeros are spaced evenly just inside the unit circle.108 It is this even spacing which creates the lobes in the magnitude response:

|H(e^{i\omega})| = \sqrt{(1 - b_1)^2 + 2b_1(1 + \cos(\omega m))}

Figure 137 Poles and zeros of a forward comb filter, m = 8, b_1 = −0.5, in the Z domain.
107 Indeed, if m = 1, then we have a standard low-pass or high-pass filter.
108 You might ask yourself what a comb filter would look like in the continuous (Laplace) domain. Since this domain
can go to infinity in frequency, a proper comb filter would wind up with an infinite number of poles and zeros. That’s
probably not reasonable to implement.
The feedback comb filter, which is the extended equivalent to a basic IIR filter, is just as simple. It takes the form

y(n) = b_0 x(n) + a_1 y(n − m)

Again, we may assume that b_0 = 1, and so the transfer function, in the Z Domain, is just

H(z) = \frac{1}{1 - a_1 z^{-m}} = \frac{z^m}{z^m - a_1}

Notice how close this is to an inverse of the forward version. It wouldn't surprise you, then, to find that the feedback comb filter has its zeros all at the origin and its poles spaced evenly just inside the unit circle, the exact opposite of the forward comb filter. The magnitude response is:

|H(e^{i\omega})| = \frac{1}{\sqrt{(1 - a_1)^2 + 2a_1(1 - \cos(\omega m))}}

Figure 138 Feedback comb filter.
Fractional Delays So far we’ve described m as being an integer. But a flanger’s LFO must
smoothly change the length of the delay, and so m would benefit greatly from being a floating-point
value. This means we need a delay which interpolates between two sample positions.
A simple way to do this is linear interpolation. Let α = m − bmc. That is, α is a value between
0 and 1 which describes where m is with respect to the integers on either side of it. Now we could
modify Equation 8 to roll in a bit of each of the samples on either side of m, that is:

y(n) = x(n) + b_1 ((1 − α) x(n − ⌊m⌋) + α x(n − ⌈m⌉))

Linear interpolation isn't very accurate, and it's particularly bad at delaying high frequencies.
There exist more sophisticated interpolation options, as discussed in Section 10.6. Or you could
hook a time-varying all pass filter to the end of your delay line. We’ll discuss all pass filters coming
up, in Section 11.4.
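As a sketch (my own, with invented names), a basic flanger is then an LFO-swept fractional delay, implemented here with the linear interpolation just described, mixed back in with the dry signal:

import math

class Flanger:
    """Dry signal mixed with a short, LFO-modulated, linearly interpolated delay."""
    def __init__(self, sample_rate, max_delay_ms=10.0, lfo_hz=0.25, depth=0.9):
        self.n = int(sample_rate * max_delay_ms / 1000) + 2
        self.buffer = [0.0] * self.n
        self.pos = 0
        self.sample_rate = sample_rate
        self.lfo_hz = lfo_hz
        self.depth = depth          # b1: how much of the delayed signal to mix in
        self.t = 0

    def process(self, x):
        self.buffer[self.pos] = x
        # the LFO sweeps the delay length m between about 1 sample and the buffer size
        lfo = 0.5 + 0.5 * math.sin(2 * math.pi * self.lfo_hz * self.t / self.sample_rate)
        m = 1.0 + lfo * (self.n - 3)
        alpha = m - int(m)
        i0 = (self.pos - int(m)) % self.n          # floor(m) samples back
        i1 = (self.pos - int(m) - 1) % self.n      # ceil(m) samples back
        delayed = (1 - alpha) * self.buffer[i0] + alpha * self.buffer[i1]
        self.pos = (self.pos + 1) % self.n
        self.t += 1
        return x + self.depth * delayed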
11.3 Chorus
Chorus is another short-delay effect which sounds like many copies of the same sound mixed
together. And that is basically what it is: the copies are varied in pitch, amplitude, and delay. One
easy way to implement this effect is with a multi-tap delay line. This is a delay which outputs
several different positions in the buffer. It’s pretty straightforward:
Algorithm 21 Multi-Tap Delay Line
1: x ← incoming sample
2: A ← ⟨a0, ..., aq−1⟩ tap positions, each from 0 to n − 1    ▷ q ≤ n
Like flanging, chorusing likewise would benefit from an interpolated delay line so the tap
positions don’t have to be integers. It’s not difficult to modify the previous algorithm to provide
that.
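For instance, here is a small Python sketch of a chorus built on an interpolated multi-tap delay line. This is my own illustration: the tap delays, LFO rate, depth, and mix are arbitrary illustrative values, not settings from the text, and the interpolation is the linear scheme described earlier.

import numpy as np

def chorus(x, sample_rate, taps_ms=(12.0, 19.0, 27.0), depth_ms=2.0, rate_hz=0.5):
    # Mix the dry signal with several delayed copies ("taps"), each tap's
    # delay slowly wobbled by its own LFO phase.
    n = len(x)
    max_delay = int((max(taps_ms) + depth_ms) / 1000.0 * sample_rate) + 2
    buf = np.zeros(max_delay)
    y = np.zeros(n)
    phases = np.linspace(0.0, np.pi, num=len(taps_ms), endpoint=False)
    pos = 0
    for i in range(n):
        buf[pos] = x[i]
        wet = 0.0
        for t, tap in enumerate(taps_ms):
            lfo = np.sin(2 * np.pi * rate_hz * i / sample_rate + phases[t])
            m = (tap + depth_ms * lfo) / 1000.0 * sample_rate    # fractional delay in samples
            whole, alpha = int(m), m - int(m)
            i0 = (pos - whole) % max_delay
            i1 = (pos - whole - 1) % max_delay
            wet += (1 - alpha) * buf[i0] + alpha * buf[i1]
        y[i] = 0.5 * x[i] + 0.5 * wet / len(taps_ms)
        pos = (pos + 1) % max_delay
    return y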
11.4 Reverb
Reverb, or more properly reverberation, attempts to replicate the natural echoes which occur in
an enclosed space. These aren’t simple delay echoes: there are very, very many of them and they
are affected by the nature of the surfaces involved and the distance from the listener. Furthermore,
echoes may bounce off of many surfaces before arriving at the listener’s ear.
It’s common to model a reverb as follows. For some n timesteps after a sound has been produced,
there are no echoes heard at all: sound is slow and hasn’t travelled the distance yet. Then come
a small collection of early reflections which have bounced directly off of surfaces and returned
to the listener. Following this comes a large, smeared set of late reflections which result from the sound bouncing off of many surfaces before returning.
Early reflections are more or less multiple arbitrarily-spaced delays and hence are straight-
forwardly implemented with a multi-tap delay line. Late reflections are more complex: if you
implemented them with very short delays (comb filters, say), the result would sound artificial.
Better would be to find a way to have different delay lengths for frequencies in the sound, to create
a smearing effect. Enter the all pass filter.
All-Pass Filters   An all-pass filter has a strange name: why would we want a filter which doesn’t change the amplitude of our partials? The reason is simple: the amplitude is left alone, but the phase is altered in some significant way. And altering the phase is just adding a small, real-valued delay to the signal. Importantly, this delay can be very small, even less than a single sample, and different frequencies can be (and are) delayed by different amounts.
There are many ways to achieve an all-pass filter, but perhaps the simplest is to intertwine two comb filters, as shown in Figure 140 (a simple all-pass filter consisting of two intertwined comb filters; note that the coefficients are the negatives of one another):

y(n) = b0 x(n) + x(n − m) − b0 y(n − m)

This has the Z Domain transfer function110

H(z) = (b0 + z^(−m)) / (1 + b0 z^(−m))

This transfer function has an even, 1.0 magnitude response. All-pass filters can also be nested inside one another, as shown in Figure 141 (one all-pass filter nested inside another).
110 This all assumes that b0 is a real value, which would be the case in audio applications. If not, then the equation is y(n) = b0 x(n) + x(n − m) − b0* y(n − m), with H(z) = (b0 + z^(−m)) / (1 + b0* z^(−m)), where b0* is the complex conjugate of b0.
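As a sketch (mine, not the book’s), the difference equation above translates directly into code; b0 is assumed real, per the footnote:

def allpass(x, m, b0):
    # All-pass filter: y(n) = b0*x(n) + x(n - m) - b0*y(n - m).
    y = [0.0] * len(x)
    for n in range(len(x)):
        xm = x[n - m] if n >= m else 0.0
        ym = y[n - m] if n >= m else 0.0
        y[n] = b0 * x[n] + xm - b0 * ym
    return y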
(Figure: reverberator block diagram: x(n) feeds a bank of parallel low-pass feedback comb filters, and their summed output passes through several all-pass filters in series to produce y(n).)
Putting it Together Armed with multi-tap delay lines, comb-filters, and all-pass filters, we have
enough material to string together to form a reverberation algorithm. This algorithmic approach is
often called Schroeder reverberation after Manfred Schroeder, an early pioneer of the technique.
There are lots of ways to arrange these elements, but here’s one example architecture.
Freeverb is a popular open source reverb implementation by Jeremy “Jezar” Wakefield.111
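Here is a minimal Schroeder-style sketch in Python. It is my own illustration, not Freeverb: it uses plain feedback combs rather than low-pass feedback combs, and the delay lengths, gains, and wet/dry mix are arbitrary illustrative choices. Several feedback comb filters run in parallel, their sum passes through all-pass filters in series, and the result is mixed with the dry signal.

import numpy as np

def feedback_comb(x, m, g):
    # y(n) = x(n) + g * y(n - m)
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - m] if n >= m else 0.0)
    return y

def allpass(x, m, b0):
    # y(n) = b0*x(n) + x(n - m) - b0*y(n - m)
    y = np.zeros(len(x))
    for n in range(len(x)):
        xm = x[n - m] if n >= m else 0.0
        ym = y[n - m] if n >= m else 0.0
        y[n] = b0 * x[n] + xm - b0 * ym
    return y

def schroeder_reverb(x, comb_delays=(1557, 1617, 1491, 1422), comb_gain=0.84,
                     allpass_delays=(225, 556), allpass_gain=0.5, wet=0.3):
    x = np.asarray(x, dtype=float)
    late = sum(feedback_comb(x, m, comb_gain) for m in comb_delays) / len(comb_delays)
    for m in allpass_delays:
        late = allpass(late, m, allpass_gain)
    return (1.0 - wet) * x + wet * late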
Convolution Reverb A popular alternative approach to the algorithmic reverbs shown so far is
to directly sample the reverberation pattern of an environment and apply it to the sound. This
approach is called a convolution reverb. The idea behind this is actually surprisingly simple. We
first sample the echoes resulting from an impulse (a single loud, extremely short sound, such as
a balloon popping), and then apply these echoes over and over again in response to every single
sample in our sound. These echoes are known as the impulse response of the environment.
If we treated the impulse as a single sample of maximum volume, then the echoes from the
impulse could be thought of as the effect that this single input sample has on the volumes of future
output samples in the final sound. But if we reversed the impulse, we could think of it as the echoed
effects of all previous input samples on the current output sample. For example, let e(k ), 0 ≤ k ≤ N
be the impulse response, where e(0) is the original impulse and e(k ), k > 0 are future echoes. By
reversing this, we can gather the echoes from earlier samples and sum them into the current sound:
y(n) = Σ_{k=0}^{N} x(n − k) × e(k)
...where x (n) is the sound input and y(n) is the resulting output with reverb. Obviously we
should zero pad: if n − k < 0 then x (n − k ) × e(k ) = 0. This equation should look very similar to
the convolution equations found in Section 8.1: indeed the impulse response is essentially being
used as a very long finite impulse response filter.112
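In code, the equation above is a straightforward (if slow) loop; a naive sketch of mine, with x as the dry sound and e as the impulse response:

def convolution_reverb(x, e):
    # y(n) = sum over k of x(n - k) * e(k), zero-padded where n - k < 0.
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = sum(x[n - k] * e[k] for k in range(len(e)) if n - k >= 0)
    return y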
111 I’m not providing the details of these parameters here: but you can examine them, and other architectures, at
https://ccrma.stanford.edu/∼jos/Reverb/Reverb.html
112 Now finally it should make sense why FIR is called a finite impulse response filter.
This sampling approach cannot be tuned with a variety of parameters like the Schroeder
algorithmic approach can. However, it has the advantage of providing a nearly exact reproduction
of the reverberation qualities of a chosen environment. Convolution reverb is expensive, however.
If the impulse response sound is N samples long, then adding reverb to M samples of the original
sound is O( MN ) for long reverbs.
There’s a faster way to do it: we can use the FFT! A critical feature of the FFT is that convolution
in the time domain is exactly the same thing as multiplication in the frequency domain. To start, let’s
zero-pad the impulse response to be as long as the sound, that is, we set things up so that M = N.
Let’s call the impulse response e(t) and the sound s(t). We take the FFT of the original sound to
produce S( f ), and similarly the FFT of the reversed impulse response to produce E( f ). Next, we
multiply the two, that is, for each value f , the result R( f ) = S( f ) × E( f ). Finally, we take the
inverse FFT of R( f ) to produce the final resulting sound r (t).
So let’s count up the costs: an FFT is O( N lg N ), and so is an IFFT. On top of that, we’re doing
N multiplies. Overall, this is O( N lg N ), which is likely much smaller than the O( MN ) required by
direct convolution. Clever! But this means we have to apply reverb in bulk to the entire sample.
That won’t do.
Instead, if M ≪ N, we could perform the Short Time Fourier Transform or STFT. This is little more than breaking a sound into chunks and then doing an FFT on each chunk. To get things right, our chunks ought to overlap by 50%, like bricks in a brick wall. That is, if our first chunk of our sound is 0 ... M − 1, then our next chunk is 1/2M ... 3/2M − 1, the next chunk is M ... 2M − 1, and so on. The general approach would be to break into these overlapping chunks of size M, perform the FFT and IFFT trick with the reversed impulse response on each of them in turn, then reassemble them by adding them together using the Overlap-Add Method as shown in Figure 144 (the Overlap-Add procedure: overlapping sound chunks are processed separately and then summed into the result).
You’d think that adding overlapped chunks in this way would cause smearing, but it will reassemble cleanly as long as we have first windowed each chunk with a window function whose 50%-overlapped copies sum to a constant, such as a Triangular (Bartlett) or Hann window, among others.113 (Figure 145: the Triangular or Bartlett window, and its Fourier transform.©66)
But even this is not enough, because even
though M ≪ N, it’s still quite large: to process a sample would require an O(M) delay. Thus
convolution reverb algorithms typically break both the sound and the impulse response into even
smaller chunks to reduce the delay. By using variable length chunks and other clever optimizations
the delay can be largely dealt with.114
113 I’m not saying the Triangular and Hann windows are the best choices: simply that they are easy examples.
114 For a practical introduction, see https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/convolution.html
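Here is a minimal sketch of the FFT-based chunked approach in Python (my own illustration, not a production convolution engine): each 50%-overlapping, windowed chunk is convolved with the impulse response by multiplying spectra, and the results are summed back together, overlap-add style. It uses NumPy's FFT and a Hann window, and pads each FFT so the convolution does not wrap around.

import numpy as np

def stft_convolution_reverb(x, impulse, chunk=4096):
    hop = chunk // 2                          # 50% overlap
    window = np.hanning(chunk)                # overlapped Hann windows sum to roughly 1
    fft_len = 1
    while fft_len < chunk + len(impulse) - 1:
        fft_len *= 2                          # pad the FFT so convolution doesn't wrap
    E = np.fft.rfft(impulse, fft_len)         # spectrum of the impulse response
    y = np.zeros(len(x) + fft_len)
    for start in range(0, len(x), hop):
        seg = np.asarray(x[start:start + chunk], dtype=float)
        seg = np.pad(seg, (0, chunk - len(seg)))          # zero-pad the final short chunk
        S = np.fft.rfft(seg * window, fft_len)            # window the chunk, then FFT
        y[start:start + fft_len] += np.fft.irfft(S * E, fft_len)   # multiply spectra, overlap-add
    return y[:len(x) + len(impulse) - 1]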
11.5 Phasers
A phaser is typically implemented with a long string of all-pass filters with different sizes tuned to provide the phaser’s various peaks and troughs when remixed with the original sound (Figure 146 shows the phaser’s amplitude effect). Figure 147 shows one possible implementation.
While an all-pass filter only modifies the phase of its signal (and we generally can’t detect that unless it is extreme), this creates interference patterns when added back into the original signal, and if carefully tuned, can produce phaser and other lobe patterns. (Figure 147: a phaser implementation: x(n) passes through a chain of all-pass filters and a Cut stage, and is then summed with the original signal to produce y(n).)
Typically two all-pass filters are needed per lobe, and this may result in the need for quite a number of them altogether.115
4: y ← bp
5: bp ← 1/2 y + 1/2 y′
6: p ← p + 1
7: if p ≥ m then
8:     p ← 0
9: y′ ← y
10: return y
This should look very familiar: it’s closely related to the basic digital delay line (Algorithm 20). But unlike a delay line, Karplus-Strong’s delay buffer starts filled with random noise. Furthermore, as the buffer is drained it is refilled: each value read out of the buffer is replaced with the average (1/2 and 1/2) of the two most recent outputs, as diagrammed in Figure 148 (a delay line of length N, initially filled with random noise, whose output is averaged with the previous output and fed back in to produce y(n)). Very rapidly this sound loses its high frequencies until, at the end, it’s largely a sine wave. The high frequencies are lost due to the averaging: notice that the averaging is basically a one-pole low-pass filter. Figure 149 shows this effect (the Karplus-Strong buffer, size 100, after various iterations of the algorithm; note the gradual impact of the low-pass filter).

There are some issues. First, N is an integer, and this will restrict which pitches can be produced exactly; we can permit any frequency through (what else?) the judicious use of all-pass filters. Second, high frequency sounds will decay faster than low-frequency ones because the buffers are smaller and so all the samples pass through the filter more often. Adjusting this per-frequency can be challenging.
One can shorten the die-off very easily by replacing the 1/2
in the equation y(n) = 1/2 y(n − N ) + 1/2 y(n − N − 1)
with some smaller fraction. Lengthening is more complex.
Note that making any adjustments at all may be unneces-
sary: in real plucked instruments it’s naturally the case for
high frequency notes to decay faster anyway.116
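A minimal Karplus-Strong sketch in Python (my own illustration; the function name and parameters are assumptions). The buffer length sets the pitch, and the decay parameter plays the role of the 1/2 just discussed:

import numpy as np

def karplus_strong(sample_rate, frequency, seconds, decay=0.5):
    n = int(sample_rate / frequency)        # buffer length sets the pitch (~sample_rate/n Hz)
    buf = np.random.uniform(-1.0, 1.0, n)   # start with random noise: the "pluck"
    out = np.zeros(int(sample_rate * seconds))
    prev = 0.0
    pos = 0
    for i in range(len(out)):
        y = buf[pos]                        # drain a sample from the buffer...
        buf[pos] = decay * (y + prev)       # ...and refill it; 0.5 reproduces the averaging
        out[i] = y
        prev = y
        pos = (pos + 1) % n
    return out

For example, karplus_strong(44100, 220, 2.0) yields two seconds of a decaying pluck near 220 Hz; lowering decay below 0.5 shortens the die-off as described above.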
Traveling Waves   One interpretation of Karplus-Strong’s delay line is as a poor man’s simulation of a traveling wave in a plucked string. When a string is plucked, its wave doesn’t stay put but rather moves up and down the string; and indeed there are two waves moving back and forth, as shown in Figure 150 (traveling waves in a plucked string: the string is initially plucked, pulled away; the wave then separates into two traveling waves going in opposite directions, towards the endpoints; after reaching the endpoints, the waves invert and reflect back, then reflect back again when reaching the opposite endpoint, and so on). Karplus-Strong might be viewed as a model of one of these waves as it decays. But
viewed as a model of one of these waves as it decays. But reaching the opposite endpoint, and so on).
116 For hints on how to deal with both of these issues, see David A. Jaffe and Julius O. Smith, 1983, Extensions of the
Karplus-Strong plucked-string algorithm, Computer Music Journal, 7(2). This paper also suggests improving the basic
algorithm by adding a variety of low-pass, comb, and all-pass filters in the chain.
more sophisticated models of strings use two waves as part of a waveguide network. Traveling
waves don’t just appear in strings: they also occur in the air in tubes or pipes, such as woodwinds,
brass, organs, and even the human vocal tract. Modeling waves with waveguide networks has
given rise to a form of physical modeling synthesis known as digital waveguide synthesis, where
elaborate models of waveguides can be used to closely simulate plucked or bowed strings, blown
flutes or reed instruments, voices, and even electric guitars.
A bidirectional digital waveguide can be simulated with two multi-tap delay lines, as shown in Figure 151. Here’s the general idea. Each delay line represents a traveling wave in one direction. When sound exits one delay line, it is low-pass filtered and inverted (modeling loss and reflection at the string’s endpoint) and fed into the other delay line, which carries the reflected wave back in the opposite direction; x(n) is injected into this loop, and y(n) is tapped back out of it.117
117 A good source of advanced techniques in this area, as well as delay-based effects, is Physical Audio Signal Processing
by Julius Smith, available online at https://ccrma.stanford.edu/∼jos/pasp/
12 Controllers and MIDI
A controller is a device which enables a human to control the notes or parameters of a synthesizer in a useful way. Early synthesizer designs incorporated controllers such as keyboards and pedals as part of the system. However, with the advent of the Musical Instrument Digital Interface or MIDI,
which enabled one device to remotely control another one, the keyboard and the synthesizer began
to part ways. Many synthesizers became simple rackmount devices intended to be manipulated by
a controller of one’s choosing; and the market began to see controller devices which produced no
sound at all, but rather sent MIDI signals intended for a downstream synthesizer. We’ll discuss
MIDI in Section 12.2.
With the advent of the computer and the Digital Audio Workstation we have seen another
sea change: controllers which do not send MIDI to a synthesizer, but rather directly to computer
software which then either records it or routes it to a software or hardware synthesizer. Indeed
many of the cheap controllers found on the market nowadays are outfitted only with USB jacks
rather than traditional 5-pin MIDI jacks, and intended solely for this purpose.
Controllers are essentially the user interface of synthesizer systems, and so it is critical that they
be designed well. A primary function of a good user interface is to help the musician achieve his
goals or tasks as easily, accurately, and rapidly as possible. Playing music is an operation involving
changing multiple parameters (pitch, volume, many elements of timbre, polyphony) in real-time,
and significant effort in musical interface design has been focused on new ways or paradigms to
enable a musician to control this complex, high-dimensional environment intuitively with minimal
cognitive load.
12.1 History
Keyboards Among the earliest controllers have undoubtedly been keyboards. The modern
keyboard is perhaps five hundred years old, dating largely from organs, and later migrating to
harpsichords and clavichords.118 These instruments all shared something in common: their keys
were essentially switches. No matter how hard you struck a key, it always played the same note at
the same volume. A major evolution in the keyboard came about with the pianoforte,119 nowadays
shortened to piano. This instrument hit strings with a felt hammer when a key was played, and
critically the velocity with which the key was struck translated into the force with which the string
was hit, and thus the volume with which the note was sounded.
This critical difference caused piano keyboards to deviate from organ keyboards in their action.
The action is the mechanics of a keyboard which cause it to respond physically to being played.
Early on, Bartolomeo Cristofori (the inventor of the pianoforte) developed an action which resisted
being played because playing a key required lifting a weight (the hammer). Because the key didn’t
just give way immediately on being struck, it formed a kind of force-feedback which helped the
performer to “dial in” the amount of volume with which he wanted a note to play. As pianos
developed more and more dynamic range120 this resistive weighted action became more and more
detailed in order to serve the more sophisticated needs of professional pianists. Organs never
adopted a weighted action because they didn’t need to: organ keyboards have no dynamic range.
Typical organ actions are unweighted: the keys give way almost immediately upon being struck.
118 In case you were wondering what the difference was between the two: when a note was struck, a harpsichord
plucked a string, while a clavichord would hit it with a small metal tangent.
119 Italian for “soft-loud”.
120 The difference between the loudest possible note and the softest.
Modern synthesizer keyboards traditionally have unweighted actions because early synthesiz-
ers, like organs, had no dynamic range; but these unweighted keyboards perhaps stuck around in
the synth world because unweighted actions made for cheaper synthesizers. This is a strange fit
because modern synthesizers, like pianos, are largely velocity sensitive and so have a significant
dynamic range. Even more unfortunate is the recent popularity of cheap mini key keyboards (see
Figure 154) whose travel (the distance the key moves) is significantly reduced, or membrane or
capacitive “keyboards” with no travel at all. Such keyboards make it even more difficult, if not
impossible, to dial in precise volume, much less play notes accurately. There are other synthe-
sizer keyboards, known as weighted keyboards, which simulate the weighted action of a piano.
Like pianos, such keyboards vary in how much resistive weight they impart, depending on the
performer’s tastes.
121 On an organ the expression pedal is called a swell pedal, as early versions controlled the swell box, a set of blinds
between the organ’s pipes (stops) and the audience which could be opened or closed to varying degrees to change the
amount of sound reaching the audience.
122 Named after its inventor, Léon Theremin. The theremin remains the only significant musical instrument that is
played without touching it. You’ve heard the theremin: it’s the eerie space-ship sounding instrument on Good Vibrations
by the Beach Boys. And now for a fun fact. Léon Theremin was a Russian who developed the instrument based on
his Soviet-funded research into radio-based distance sensors. He traveled the world promoting his instrument and
popularizing its use in concert halls, classical and popular music, and so on. He was then kidnapped in his New York
City apartment by Soviet agents and taken back to a Siberian prison-laboratory and forced to design spy devices for
Stalin for 30 years. It was there that he invented an incredible device called The Thing. This was a passive (powerless)
microphone listening device embedded in a Great Seal of the United States given to the U.S. Ambassador to Russia and
which hung in his office for almost a decade before being discovered. Look it up. It’s an amazing story.
123 A theremin-inspired controller found on some synthesizers in the 1990s was Roland’s D-Beam, which measured
the distance of one’s hand with an infrared beam sensor. It’s often, and I think unfairly, ridiculed.
Another approach, popularized by the Trautonium and similar
devices (see Section 6.1), was to control pitch by pressing on a
wire at a certain position; this also allowed sliding up and down
the wire to bend the pitch. Variants of this found their way into
synthesizers, including the pitch ribbon, a touch-sensitive strip
on the Yamaha CS-80 (Figure 52 on page 59). This strip was put
to heavy use by Vangelis for his soundtracks (page 59). Touch
strips are found here and there on modern synthesizers, but more
common are sliding wheels such as the ubiquitous pitch bend wheel and modulation wheel found next to almost all modern synthesizer keyboards (Figure 154: Akai MPK mini MIDI controller, with a self-centering joystick in red, drum pads, assignable knobs, and velocity sensitive mini keys.©69). The pitch bend wheel, which shifts the pitch of the keyboard, is self-centering, meaning that when the performer lets go of it, it springs back to its mid-way position. The
modulation wheel, which can often be programmed to control a variety of parameters, stays put
much like an expression pedal. A similar effect (in two simultaneous directions) can be achieved
with a joystick.124
since been designed, and drum pads have been reduced in size where they can be played with
fingers and used to augment controller keyboards (such as in Figures 154 and 155).
Drums are not the only option. Software can be added to pick-
ups for guitars and other stringed instruments, converting their
audio into MIDI event signals (via guitar processors). Wind con-
trollers have been devised in the shape of woodwind instruments
(see Figure 157). Wind controllers can control more parameters
than you might imagine, including finger pressure, bite pressure,
breath speed, and other embouchure manipulations. Related is the
breath controller, where the musician simply blows at a certain
rate to maintain a certain parameter value.
Figure 157 Wind Controller.©71
Grid Controllers The 2000s saw a significant degree of influ-
ence on the synthesizer industry by the DJ market. One partic-
ularly popular digital audio workstation, Ableton Live, capital-
ized on this with a GUI consisting of a grid of buttons tied to
samples triggered when the corresponding button was pressed.
To support this and similar DAW UIs came a new kind of MIDI
controller, the grid controller, which provided hardware buttons
corresponding to the software ones in Ableton. The first major
controller of this type was Novation’s Launchpad (Figure 158).
A grid controller is simply an array of buttons or velocity sen-
sitive drum pads with a few additional auxiliary buttons or dials.
These are not complex devices: their grids can be configured for many performance tasks, but most commonly they are used as buttons which trigger sound samples, and which light up while the sample is playing. (Figure 158: Novation Launchpad.©72)
Many keyboards implement channel aftertouch (a MIDI term:
see Section 12.2), whereby the keyboard can detect that some key
is being pressed harder and by what amount. This only adds one
global parameter, like the mod wheel or pitch bend, rather than
per-note parameters. It is much more expensive for a keyboard
to implement polyphonic aftertouch (again, a MIDI term), where
the keyboard can report independent aftertouch values for every
key being pressed. Polyphonic aftertouch is rare: only a few synthe-
sizers and controller keyboards have historically implemented it.
Figure 159 shows an Ensoniq SQ80, one synthesizer which had polyphonic aftertouch. Finally, when the musician releases a key, some keyboards report the release velocity. (Figure 160: Haken Continuum Fingerboard, top,©74 and Roger Linn Design LinnStrument, bottom.©75)
Recent controllers have made possible even more simultaneous
parameters. The first controller in this category was the Haken
Continuum Fingerboard; others include the ROLI Seaboard and
the Roger Linn Design LinnStrument (Figures 160 and 161).
These devices all take the form of flexible sheets which the
musician plays by hitting with his fingers. When the musician
touches the sheet with a finger, it registers the location touched
and the velocity with which the finger hit the sheet: these translate
into note pitch and velocity (volume) respectively. (Figure 161: ROLI Seaboard.©76) The musician can then move his finger about the sheet, which causes the device
to report the new pressure with which the finger is touching it, as
well as its new X and Y locations. These translate into aftertouch, pitch bend (for the X dimension)
and a third parameter of the musician’s choice for the Y dimension. Finally, as the musician releases
his finger, the sheet reports the release velocity. Critically, this information is reported for multiple
fingers simultaneously and independently.127 Related is the Eigenlabs Eigenharp, which combines a
multidimensional touch-sensitive keyboard, a controller strip, and a breath controller.
12.2 MIDI
In 1978 Dave Smith (of Sequential Circuits, Inc.) released the popular and
influential Prophet 5 synthesizer. The Prophet 5 was the first synthesizer to
be able to store multiple patches in memory, and to do this, it relied on a
CPU and RAM. Smith realized that as synthesizers began to be outfitted with
processors and memory, it would be useful for them to be able to talk to one
another. With this ability, a performer could use one synthesizer keyboard
to play another synthesizer, or a computer could play multiple synthesizers
at once to create a song. So in 1983 he worked with Ikutaro Kakehashi
(the founder of Roland) to propose what would later become the Musical
Instrument Digital Interface, or MIDI. MIDI has since established itself as
one of the stablest, and oldest, computer protocols in history.
MIDI is just a one-way serial port connection between two synthesizers,
allowing one synthesizer to send information to the other. MIDI was designed for very slow devices and to pack a lot of information into a small space. (Figure 162: Eigenharp.©77)
127 Yes, this means, among other things, that these devices effectively have polyphonic aftertouch.
MIDI runs at exactly 31,250 bits per second. This is a strange and nonstandard serial baud rate:
why was it chosen? For the simple reason that 31250 × 32 = 1,000,000. Thus a CPU running at N MHz could be set up to read or write a MIDI bit every 32N clock cycles, making life easier for early synthesizer manufacturers.
MIDI bytes are sent (in serial port parlance) with 1 start bit, 8 data bits, and 1 stop bit. This
means that a single byte requires 10 bits, and thus MIDI is effectively transmitted at 3125 bytes
per second. This isn’t very fast: many MIDI messages require three bytes, and so a typical MIDI
message, such as ”play this note”, requires about 1 millisecond to transmit. Keep in mind that
humans can detect audio delays of about 3 milliseconds. Pile up a few MIDI messages to indicate
a large chord, and the delay could be detectable by ordinary ears. Thus a number of tricks are
employed, both in MIDI and by manufacturers after the fact, to maximize throughput.
12.2.1 Routing
MIDI is designed to enable one device to control up to 16
other devices. In its original incarnation, MIDI ran over
a simple 5-pin DIN serial cable, and a MIDI device had a
MIDI in port, a MIDI out port, and a MIDI thru port, as
shown in Figure 163. MIDI In received data from other
devices, MIDI Out sent data to other devices, and MIDI
Thru just forwarded the data received at MIDI In.
To send MIDI data from Synthesizer A to Synthesizer
B, you’d just connect a MIDI cable from A’s Out port to B’s
In port. If you wanted to send MIDI data from Synthesizer A to Synthesizers B and C, you could connect a cable from A’s Out to B’s In, then connect another cable from B’s Thru to C’s In (and repeat to connect to D, etc.) (Figure 163: 5-Pin DIN MIDI cable and its In, Out, and Thru ports.©78)
An alternative would be to connect A to a device B, and then have B send data of its own to another device. Just connect a MIDI cable from A’s Out to B’s In, and likewise another from B’s Thru to C’s In. Then you connect B’s Out to D’s In. Device A wouldn’t send data to D: but B could. (Figure 164: MIDI routing options.)
Note that while MIDI is designed to allow one sender connect to multiple receivers, it is not
designed to allow multiple senders to send to the same receiver. To enable such magic would
require a special gizmo called a MIDI merge device, and some wizardry would be involved.
MIDI over USB MIDI has since been run over Ethernet, Firewire, wireless, Bluetooth, fiber-optic,
you name it. But critically MIDI is now very often run over USB, as an alternative to the old 5-Pin
DIN cables, often to connect a synthesizer or controller to a computer. Given that USB also allows
one device to connect to many, and is much faster than old MIDI serial specs, you’d think this was
a good fit. But it’s not.
The first problem is that USB connects a host (your computer) with a client (your printer, say),
and indeed they have different shaped ports to enforce which is which. USB devices generally
can’t be both hosts and clients without separate USB busses. This means that, in almost all cases,
traffic has to be routed through the host — your laptop — even if you just want a controller to control a synthesizer. Most USB MIDI devices, which lack their own host ports, have lost the peer-to-peer capability which made MIDI so useful. USB is great for connecting mice to your computer. Not so much for networking synthesizers with other synthesizers.
Another more serious problem is that USB is not electrically isolated. When two devices are
attached over USB, they are directly electrically connected, and this often creates problematic
electronic noise issues — including the infamous “ground loop”, a 50Hz or 60Hz hum produced
when two audio devices are connected which have different grounds. MIDI was originally expressly
designed to avoid these issues: its circuitry specification requires an optoisolator: essentially a
little light bulb and light detector in a small package which, when embedded in the MIDI circuitry,
allows two devices to talk to one another without actually being electrically connected at all.
Nonetheless, with the advent of the Digital Audio Workstation, more and more music studios
are computer-centric, with all the synthesizers and similar devices flowing into a single computer.
The popularity of MIDI over USB only promotes this, as USB is highly PC-centric.
12.2.2 Messages
MIDI messages are just strings of bytes. The first byte in the sequence, called the status byte, has
its high bit set to 1. The remaining data bytes in the sequence have their high bits set to 0. The
status byte indicates the type of message. Thus MIDI can only transfer 7 useful bits in a byte, as the
first bit is used to distinguish the head of a message from the body. For this reason, you’ll find that
the numbers 128 (2^7) and 16384 (2^14) show up a lot in MIDI, but 256 rarely does. Indeed, 7-bit
strings in MIDI are so prevalent that they are often referred to as “bytes”.
MIDI is organized so that the most time-sensitive messages are the shortest:
• Single byte messages are largely timing messages. These messages are so time critical that
they can in fact be legally sent in the middle of other messages.
• Two- and Three- byte messages usually signify events such as “play a note”, “release a note”,
“change a control parameter to a certain value”, etc.
• There is a single type of variable-length message: a system exclusive (or sysex) message. This
is essentially an escape mechanism to allow devices to send custom data to one another, often
in large dumps: perhaps transferring a synthesizer patch from a computer to a synthesizer,
for example.
Sysex 0xF0 id... data... 0xF7 Sysex messages are manufacturer-specific, but they are required
to have a certain pattern. First comes the status byte 0xF0. Next comes a stream of data bytes. The
first few data bytes must be the ID of the manufacturer of the synthesizer for which the message
is crafted. Manufacturer IDs are unique and registered with the MIDI Association. This allows
synthesizers to ignore Sysex messages that they don’t recognize. At the end of the stream of data
bytes is another status byte, 0xF7, indicating the end of the message.
Channels Some messages (timing messages, sysex, etc.) are broadcast to any and all devices
listening. Other messages (like note information) are sent on one of 16 channels 0...15. The 3 bits
indicating the channel are part of the status byte. A synthesizer can be set up to respond to only
messages on a specific channel: that way you can have up to 16 different synthesizers responding
to messages from the master. There’s no reason a synthesizer can’t respond to different channels
for different purposes (this is common); and there’s no reason you can’t set up several synthesizers
to respond to the same channel (this is unusual). Finally, many synthesizers are set up by default
to respond to messages on any channel for simplicity. In MIDI parlance this is called the omni
channel.
Running Status It takes three bytes (about 1 ms!) just to tell a synthesizer to start playing a note.
But recognizing that very often the same kind of message will appear many times in sequence,
MIDI has a little compression routine: if message A is of a certain type (say “Note On”), and the
very next message B is the same kind of message and on the same channel, then B’s status byte
may be omitted. If the very next message C is again the same message type and channel, its status
byte may be omitted as well, and so on. This allows a stream of (say) Note On messages to start
with a 3-byte message, followed by many 2-byte messages.
Channel Voice Messages Most MIDI messages are of this type: they indicate events such as notes
being played or released, the pitch bend wheel being changed, etc. All of these messages have
associated channels. The channel is specified by the lower four bits of the status byte (denoted
ch below): thus 0x86 means a status byte for Note Off (the “8”) on channel 6 (the “6”).
• Note On 0x9ch note velocity tells a synthesizer that a note should be played. This
message comes with two data values, both 0...127: the note in question (middle C is 60, that
is, 0x3c), and the velocity (how fast the key was struck), which usually translates to the note’s
volume. Some keyboards may not detect velocity, in which case 64 (0x40) should be used. A
velocity of 0 has a special meaning, discussed next.
• Note Off 0x8ch note release velocity tells a synthesizer that a note should stop being
played. This message comes with two data values, both 0...127: the note in question (middle
C is 60 or 0x3c), and the release velocity (how fast the key was released).
Many keyboards cannot detect release velocity, in which case 64 (0x40) should be used. If we
didn’t care about release velocity, then instead of sending a Note Off, it is very common to
instead send a Note On with a velocity of 0, which is specially interpreted as a Note Off of
velocity 64. This allows a controller to never have to send a Note Off message, just a string of
Note On messages, and so take better advantage of Running Status.
• Polyphonic Key Pressure or Polyphonic Aftertouch 0xAch note pressure tells a synthe-
sizer that a key, currently being held down, is now being pressed harder (or softer). This
message comes with two data values, both 0...127: the note in question (middle C is 60 or
0x3c), and the pressure level. Polyphonic key pressure is difficult to implement in a keyboard
and so it’s not very common, and this is probably good because it tends to flood MIDI with
lots of messages.
• Channel Pressure or Channel Aftertouch 0xDch pressure tells a synthesizer that the key-
board as a whole is now being pressed harder (or softer). This message comes with a single data
value (0...127): the pressure level. Many keyboards implement channel pressure. A synthesizer
won’t implement both channel and polyphonic key pressure at the same time.
• Program Change or PC 0xCch patch asks the synthesizer to change to some new patch
(0...127). Many synthesizers have more than 128 patches available, so it’s not uncommon for
patches to be arranged in banks of up to 128 patches, and so a PC message may be preceded by
a bank change request, discussed later. This message is rarely real-time: many synthesizers
take quite a bit of time (milliseconds to seconds) to change to a new patch.
• Pitch Bend 0xEch LSB MSB tells a synthesizer that the pitch bend value has been changed.128 Pitch Bend is a high resolution 14-bit value from -8192...+8191. The two data bytes (LSB and MSB) are both 0...127, and the bend value is computed as MSB × 128 + LSB − 8192.
• Control Change or CC 0xBch parameter value tells a synthesizer that some parameter
(0...127) has been adjusted to some value (0...127). You can think of this as informing a
synthesizer that a musician wants to tweak some knob on it. The meaning of CC parameters
and their respective values varies from synthesizer to synthesizer, and there’s some complexity
to it, discussed in Section 12.2.3. Also, 0...127 is not particularly fine-grained: also discussed
in Section 12.2.3 are options for sending more precise information.
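To make the byte layout concrete, here is a small Python sketch (my own illustration, not part of the MIDI specification text) that packs a few of the channel voice messages described above into raw MIDI bytes:

def note_on(channel, note, velocity):
    # Status byte 0x9n (Note On, channel n), then two 7-bit data bytes.
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel, note, release_velocity=64):
    # Status byte 0x8n (Note Off); 64 is the conventional "don't care" release velocity.
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, release_velocity & 0x7F])

def pitch_bend(channel, bend):
    # bend is -8192...+8191, sent as a 14-bit value split into LSB and MSB.
    value = bend + 8192
    return bytes([0xE0 | (channel & 0x0F), value & 0x7F, (value >> 7) & 0x7F])

# Middle C at a moderate velocity on channel 0, then released:
msg = note_on(0, 60, 96) + note_off(0, 60)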
Clock Messages Many music devices, such as drum machines, can play songs or beat patterns
all on their own. It’s common to want to synchronize several of them so they play their songs or
beats at the same time. MIDI has a mechanism to allow a controller to send clock synchronization
messages to every listener. MIDI defines a clock pulse as 1/24 of a quarter note. This is a useful
value, since lots of things (sixteenth notes, triplets, etc.) are multiples of it. A controller can send
out clock pulses at whatever rate it likes, like a conductor, and listening devices will do their best to
keep up.
To send clock pulses, a device must first send a Start message. It then sends out a stream of Clock
Pulse messages.129 It may conclude by sending a Stop message. If it wished to start up where it left
off, it could then send a Continue message and keep going with pulses. Alternatively, if it wished to
restart from the beginning, it could send another Start message after the Stop and continue pulsing.
128 MSB stands for Most Significant Byte and LSB stands for Least Significant Byte, even though neither of them is a full byte: each holds only 7 useful bits.
129 Note that a listener can’t determine the clock rate until two clock pulses have been received, and you have to wing it until then.
• Clock Pulse or Timing Clock 0xF8 Sends a pulse.
• Start 0xFA Informs all devices to reset themselves and to prepare to begin playing on
receiving pulses.
• Song Select 0xF3 song Informs all devices to prepare to start playing a given song (drum-
beat pattern, whatnot) 0...127. This is not often used.
• Song Position Pointer 0xF2 LSB MSB Informs all devices to prepare to begin playing the current song at the given position MSB × 128 + LSB. The position is defined in “MIDI Beats”: one MIDI Beat is 6 clock pulses, that is, one sixteenth note. Position 0 is the start of the song.
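For instance (my own arithmetic, not from the text), a device sending clock at a given tempo spaces its pulses like this:

def clock_pulse_interval(bpm):
    # MIDI sends 24 clock pulses per quarter note, so at a tempo of bpm
    # quarter notes per minute, the time between pulses in seconds is:
    return 60.0 / (bpm * 24)

# At 120 BPM a Clock Pulse (0xF8) goes out roughly every 20.8 milliseconds.
print(clock_pulse_interval(120))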
Other Stuff There are several other non-channel messages, none particularly important:
• MIDI Time Code Quarter Frame 0xF1 data byte A sequence of these messages collec-
tively send an SMPTE130 time code stamp. This is an absolute time value (frames, seconds,
minutes, etc.) and is used to synchronize MIDI with video etc. These messages won’t be
discussed further here.
• Tune Request 0xF6 Asks all devices to tune themselves. No, seriously. MIDI was created
when synthesizers were primitive.
• Active Sensing 0xFE An optional heartbeat message which assures downstream devices
that the controller hasn’t been disconnected. It can be ignored.
• System Reset 0xFF Asks synthesizers to completely reset themselves as if they had just
been powered up. Again, MIDI is old.
12.2.3 Control Change, RPN, and NRPN
Control Change (CC) messages (of the form 0xBch parameter value ) are meant to allow a con-
troller to manipulate a synthesizer’s parameters, whatever they may be. Synthesizers are free to
interpret various control change messages however they deem appropriate, though there are some
conventions. Here are a few common ones:
• Parameter 0 often selects the patch bank. Thus (for example) a synthesizer might have up to
128 banks, each containing 128 patches (selected with Program Change).
• Parameter 2 often specifies the value of a breath controller device.
• Parameter 11 often specifies the value of an expression controller (this value is usually
multiplied against the global overall instrument volume to set its temporary volume).
• Parameter 64 often specifies whether the sustain pedal is down (1) or not (0).
• Parameter 74 often specifies the “Third Dimension Controller” specified by MIDI poly-
phonic expression (or MPE), discussed later.
• Parameters 6, 38, 96, 97, 98, 99, 100, and 101 are often reserved for NRPN and RPN (see later).
• Parameters 120–123 are reserved for standardized functions called MIDI channel mode
messages:
– Parameter 120 (with value 0) is the all sound off message. This tells a synthesizer to
immediately cut all sound.
– Parameter 121 (with value 0) is the reset all controllers message. This tells a synthesizer
to reset all its parameters to their default settings.
– Parameter 122 is the local switch message. This tells a synthesizer to turn on (127) or
off (0) “local mode”. When in local mode, the synthesizer’s own keyboard can send
notes to the synthesizer. When not in local mode, this connection is disconnected, but
the keyboard can still send messages out MIDI, and the synthesizer can still respond to
MIDI.
– Parameter 123 (with value 0) is the all notes off message. This tells a synthesizer to
effectively send a Note Off message to all its currently played notes. This does not
immediately cut all sound, as notes may have a long release time in response.
• Parameters 124–127 are reserved for additional, now-obsolete standardized MIDI channel
mode messages which control so-called omni mode and mono vs. poly modes. An instrument
in omni mode responds to any channel. An instrument in mono mode is monophonic, and
an instrument in poly mode is polyphonic. While these modes and messages are obsolete,
this region is nonetheless still (unfortunately) reserved.131
CC messages have two serious problems. The first problem is that there are only 120 of them
(disregarding the MIDI channel mode region). But a synthesizer often has hundreds, sometimes
thousands, of parameters! The second problem is that the value can only be 0...127. This is a very
coarse resolution: if you turned a controller knob which sent CC messages to a synthesizer to (say)
change its filter cutoff, the stepping would be very evident — it wouldn’t be smooth.
131 Why 124 and 125 aren’t merged, and similarly 126 and 127, I have no idea.
Early on, the MIDI spec tried to deal with the second problem by reserving CC parameters 32–63
to be the Least Significant Byte (LSB) corresponding to the parameters 0–31 (the Most Significant
Byte or MSB). The idea was that you could send a CC for parameter 4 as the MSB, then send
a CC for parameter 36 (32+4) as the LSB, and the synthesizer would interpret this as a higher
resolution 14-bit value 0...16383, that is, MSB × 128 + LSB. This would work okay, except that there
were only 32 high-resolution CC parameters, and this scheme reduced the total number of CC
parameters — already scarce — by 32. Thus many early synthesizers simply disregarded CC for
their advanced parameters and relied on custom messages via the Sysex facility (unfortunately).
But in fact MIDI has a different and better scheme to handle both of these two problems: Registered Parameter Numbers (RPN) and Non-Registered Parameter Numbers (NRPN). The RPN and NRPN schemes each permit 16384 different parameters, and those parameters can all have values 0...16383. RPN parameters are reserved for the MIDI Association to define officially, and NRPN parameters are available for synthesizers to do with as they wish.
RPN and NRPN work as follows. For NRPN, a controller begins by sending CC Parameter 99 and CC Parameter 98, which define the MSB and LSB respectively of the NRPN parameter number being sent. Thus if a controller wished to send an NRPN 259 message, it’d send 2 for Parameter 99 and 3 for Parameter 98 (2 × 128 + 3 = 259). For RPN, these CC parameters would be 101 and 100 respectively. Next, the controller would send the MSB and LSB of the value of the NRPN (or RPN) message as CC Parameters 6 and 38 respectively. The MSB and LSB of the value can come in any order and either may be omitted, unfortunately complicating matters. The controller could alternatively send an “increment” or “decrement” message (CC 96 and 97 respectively). For example, a CC 96 with a value of 5 would mean that the parameter should be incremented by 5.
Inspired by Running Status, a stream of these value-messages (6, 38, 96, or 97) could be sent
without having to send the parameter CC messages again, as long as the NRPN or RPN parameter
remained the same. To shut off this running-status-ish stream (perhaps to prevent any further
inadvertent NRPN value messages from corrupting things), one could send the RPN Null message.
This is RPN parameter 16383 — that is, MSB 127 and LSB 127 — with any value.
The problem with RPN and NRPN is that they are slow: to update the value of a new parameter
requires 4 CC messages, or 12 bytes. Another problem with RPN and NRPN is that only some
synthesizers implement them, and even more problematically, some lazy Digital Audio Workstation
manufacturers do not bother to include them as options.
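As a sketch of the CC sequence involved (my own illustration, following the scheme above), setting NRPN parameter 259 to the 14-bit value 1000 takes four CC messages:

def cc(channel, parameter, value):
    # A single Control Change message: status 0xBn, then parameter and value.
    return bytes([0xB0 | (channel & 0x0F), parameter & 0x7F, value & 0x7F])

def nrpn(channel, parameter, value):
    # Four CC messages: NRPN parameter MSB (CC 99) and LSB (CC 98),
    # then the value's MSB (CC 6) and LSB (CC 38).
    return (cc(channel, 99, parameter >> 7) + cc(channel, 98, parameter & 0x7F) +
            cc(channel, 6, value >> 7) + cc(channel, 38, value & 0x7F))

msg = nrpn(0, 259, 1000)   # twelve bytes on the wire (ignoring running status)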
12.2.4 Challenges
MIDI has been remarkably stable since it was invented in 1983: indeed, the spec is still technically
fixed at 1.0!132 But MIDI was designed in the age of synthesizer keyboards, and it was not meant to
be extended to elaborate multidimensional controllers which manipulate many parameters at once,
nor to complex routing scenarios involving software. This produces a number of problems:
• Many MIDI parameters are per-instrument, not per-voice. MIDI can support many pa-
rameters, but it has only a few defined parameters which are per note: pitch, attack
velocity, release velocity, and polyphonic aftertouch. Other parameters are global to the whole
instrument, whether appropriate or not (often not).
132 This is really kind of a lie. MIDI 1.0 in 1983 is fairly different from the MIDI 1.0 of the 2000s. But the MIDI
Association never updated the version number. That’s finally changing soon though, with MIDI 2.0.
• MIDI is slow. MIDI was fixed to 31,250 bits per second in order to support early synthesizers
with 1MHz CPUs. This is not fast enough to guarantee transitions so smooth that they are beyond the ability of humans to detect.
• MIDI is low resolution. Only two standard parameters (pitch bend and song position
pointer) are 14-bit: the rest are 7-bit, which is very coarse resolution. There exist two kinds of
14-bit extensions to some parameters (14-bit CC and RPN/NRPN), but they come at the cost
of making MIDI up to 3× slower. One solution to this is not to use MIDI at all, but rather to fall
back to traditional CV/Gate control used by modular synthesizers. CV/Gate is real-valued
and so can be arbitrarily high resolution (in theory) and fast (in theory). A number of current
keyboards provide both MIDI and CV/Gate for modular synthesizers such as Eurorack.133
• MIDI was designed as a one-direction protocol. There’s no standard way to query devices
for capabilities and get results back, to negotiate to use a more advanced version of the
protocol, etc.
But this situation will be changing soon with many new MIDI protocol features. The first of these is dealt with by a new extension to MIDI called MIDI polyphonic expression or MPE. The remaining three problems will be tackled by an upcoming version of MIDI called MIDI 2.0. We
discuss these below.
12.2.5 MPE
MIDI was designed with the idea that people would by and large
use keyboards as controllers. Keyboards are essentially a collection
of levers, and the performer is restricted in the number of param-
eters he can control for each note. In MIDI, a performer can
specify at most the note of the key, the velocity with which the key
is struck, the velocity with which it is released, and (using poly-
phonic key pressure) the pressure with which a key is currently
being pressed. All other parameters (CC, PC, channel pressure,
NRPN, etc.) are global to the whole instrument. Particularly prob-
lematic: pitch bend is global.
(Figure 165: Futuresonus Parva, the first hardware synthesizer134 to support MPE.©79)
But many instruments are more expressive than this: for example, a guitarist can specify the volume and pitch bend of each
string independently. A woodwind musician controls all sorts of
timbre parameters with his mouth (the embouchure with which he plays the instrument). And so
on. Many current advanced MIDI controllers seek to enable changing a variety of independent,
per-note parameters in real time. But MIDI doesn’t permit this.
133 CV/Gate works as follows. A gate signal is an analog signal which goes from 0 volts to some positive voltage (or,
for some systems, from positive to 0) to indicate that a key has been struck. The opposite occurs to indicate that a key has
been released. Accompanying this is a control voltage or CV signal, which indicates the pitch of the note. Recall from
Footnote 33 (page 53) that CV is either encoded in volt per octave, where each volt means one more octave, or hertz
per volt, where voltage doubles with each octave. These are analog signals, and so they are as fast and as precise as
necessary. Additional signals could be added to indicate velocity and other parameters.
134 The Parva was also the first hardware synthesizer to support USB host for MIDI. This means that a USB MIDI
keyboard controller can be plugged directly into the Parva in order to play it. As mentioned in Section 12.2.1, normally
you’d have to control a synthesizer from a USB controller by attaching both to a laptop. The Parva is a rare exception.
To deal with this situation, the high-parameter controller manufacturers (ROLI, Haken, Roger
Linn Design, etc.) support a new MIDI standard which provides “five-dimensional” control (attack
velocity and note pitch, channel aftertouch,135 pitch bend, release velocity, and “Y” dimensional
movement, all per-note). This is known as MIDI polyphonic expression136 or MPE.
MPE works by hijacking MIDI’s 16 channels: rather than assign each channel to a different
synthesizer (a less common need nowadays), MPE uses them for different notes currently being held
down on a single synthesizer. MPE divides the 16 channels into up to two zones. If there is only
one zone, then one channel is designated its master channel, for global parameter messages from
the controller, and the other 15 channels are assigned to 15 different notes (voices) played through
that controller. That way each voice can have its own unique CC messages and its own pitch bend.
A special CC parameter, number 74, is by convention reserved as a dedicated standard “third
dimension” parameter (beyond pressure and pitch bend). If there are two zones — notionally to
allow two instruments to be played by the controller — then each has its own master channel, and
the remaining 14 channels may be divvied up among the two zones (perhaps one instrument could
be allocated 4 voices and the other 10, say).
The two zones are known as the upper zone and lower zone. The lower zone uses MIDI
channel 1 as its master channel, and has some number of additional channels 2, 3, ... assigned
to individual notes. The upper zone has MIDI channel 16 as its master channel and additional
channels 15, 14, ... assigned to individual notes. If there is only one zone — by far the most common
scenario — it can take up all 15 available channels beyond the master channel, and the controller
may choose to use the upper or the lower zone as this sole zone.
MPE zones are either preconfigured in the instrument, or may be specified by the controller
using RPN command #6 sent on either channel 1 (to configure the lower zone) or 16 (to con-
figure the upper zone), with a parameter value of 0 (turning off that zone) or 1...15 (to as-
sign up to 15 channels to the zone). All told the RPN message consists of three CC messages:
0xBn 0x64 0x06 0xBn 0x65 0x00 0xBn 0x06 0x0m with n being the zone (0 or F for lower or
upper zone), and m being the number of channels 0...F.
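Here is a small Python sketch of that configuration message and a per-note gesture (my own illustration of the convention, not code from the MPE specification): it sets up a lower zone with 7 member channels, then plays a note on one member channel and bends just that note.

def cc(channel, parameter, value):
    return bytes([0xB0 | (channel & 0x0F), parameter & 0x7F, value & 0x7F])

def mpe_configure_lower_zone(num_channels):
    # RPN 6 (MPE Configuration) on channel 1 (status nibble 0): RPN LSB = 6,
    # RPN MSB = 0, then the number of member channels as the data value.
    return cc(0, 100, 6) + cc(0, 101, 0) + cc(0, 6, num_channels)

def note_on(channel, note, velocity):
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def pitch_bend(channel, bend):
    value = bend + 8192
    return bytes([0xE0 | (channel & 0x0F), value & 0x7F, (value >> 7) & 0x7F])

setup = mpe_configure_lower_zone(7)                  # channels 2-8 become member channels
gesture = note_on(1, 60, 100) + pitch_bend(1, 2048)  # a note on member channel 2, bent alone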
Thereafter, when a note is played on the controller, it assigns a channel to the note and sends
Note On, Note Off, and all other note-related information on that channel only. This potentially
includes pitch bend, aftertouch, and CC, NRPN, or RPN commands special to just that note.
Additionally, the controller can make changes to all the notes under its control by issuing commands
on the master channel. There are a lot of subtleties involved in allocating (and reallocating) notes to
channels for which suggestions, but not requirements, are made in the MPE specification.137
Note that MPE doesn’t extend MIDI in any way: it’s just a convention as to how MIDI channels
are allocated and used for a special purpose. There’s no reason you couldn’t (for some reason) use
channels 1...14 for a lower MPE zone, and then use channels 14 and 15 to control standard MIDI
instruments in the conventional way, for example.
135 These devices almost always support polyphonic aftertouch too, but if we’re doing MPE, there’s no reason for it:
each note is on its own channel and so the aftertouch is already uniquely assigned to each note. Besides, polyphonic
aftertouch requires an additional byte.
136 The original name, which I much prefer, was multidimensional polyphonic expression, but the MIDI Association
changed the name prior to its inclusion in the MIDI spec. I don’t know why.
137 MIDI is an open protocol. The MPE specification, as well as other MIDI specifications and documents, are available from the MIDI Association at https://www.midi.org.
12.2.6 MIDI 2.0
As of this writing, MIDI 2.0 is not quite released: so we don’t know everything about it. But MIDI
2.0 is designed to deal with a number of difficulties in MIDI, not the least of which are its speed,138
low resolution, and unidirectionality.
MIDI Capability Inquiry (MIDI-CI) MIDI 2.0 is bidirectional. One consequence of this is that
MIDI 2.0 devices can query one another, trade data, and negotiate the protocol to be used.
• Profile Configuration A device can tell another device what kinds of capabilities it has.
For example, a drum machine, in response to a query, may respond indicating that it has a
certain profile typical of drum machines. This informs the listening device that it is capable
of responding to a certain set of directives covered by that profile.
• Property Exchange Devices can query data from one another, or set data, in a standardized
format: this might mean patches, sample or wavetable data, version numbers, vendor and
device names, and so on. Perhaps this might spell the end of custom and proprietary sysex
formats.
• Protocol Negotiation Devices can agree on using a newer protocol than MIDI 1.0, such as
MIDI 2.0. The MIDI 2.0 protocol has a number of important improvements over 1.0, including
higher resolution velocity, pressure, pitch bend, RPN, NRPN, and CC messages; new kinds of
articulated event data (more elaborate Note On / Note Off messages, for example); additional
high-resolution controllers and special messages on a per-note basis; and up to 256 channels.
MIDI 2.0 tries hard to be backward compatible with 1.0 when possible. If either device fails to
respond to a profile configuration request, property exchange, or protocol negotiation, then the
other device falls back to MIDI 1.0, at least for that element.
138 In fact, I do not know how MIDI 2.0 tackles speed yet, but I assume it does.
Sources
In building these lecture notes I relied on a large number of texts, nearly all of them online. I list the
major ones below. I would like to point out four critical sources, however, which proved invaluable:
• Steven Smith’s free online text, The Scientist & Engineer’s Guide to Digital Signal Processing,139
is extraordinary both in its clarity and coverage. I cannot recommend it highly enough.
• Julius Smith (CCRMA, Stanford) has published a large number of online books, courses, and
other materials in digital signal processing for music and audio. He’s considered among the
very foremost researchers in the field, and several of the algorithms in this text are derivatives
of those in his publications. https://ccrma.stanford.edu/∼jos/
• Curtis Roads’s book, The Computer Music Tutorial.140 Roads is a famous figure in the field: in
addition to being a prolific author and composer, he is also a founder of the International
Computer Music Association and a long-time editor for Computer Music Journal.
Representation of Sound
http://www.hibberts.co.uk/index.htm
https://en.wikipedia.org/wiki/Vibrations_of_a_circular_membrane
139 Steven Smith, 1997, The Scientist & Engineer’s Guide to Digital Signal Processing, California Technical Publishing, available online at https://www.dspguide.com.
Additive Synthesis
https://en.wikipedia.org/wiki/Additive_synthesis
http://www.doc.gold.ac.uk/∼mas01rf/is52020b2013-14/2013-14/slides13.pdf
Subtractive Synthesis
https://en.wikipedia.org/wiki/Analog_synthesizer
https://en.wikipedia.org/wiki/Trautonium
http://120years.net
Filters
https://www.dspguide.com/
https://ccrma.stanford.edu/∼jos/filters/
http://keep.uniza.sk/kvesnew/dokumenty/DREP/Filters/SecondOrderFilters.pdf
(“Second Order Filters”, J. McNames, Portland State University, with permission)
https://www.oreilly.com/library/view/signals-and-systems/9789332515147/xhtml/ch12 12-2.xhtml
https://en.wikipedia.org/wiki/Butterworth_filter
https://en.wikipedia.org/wiki/Resonance
https://en.wikipedia.org/wiki/Comb_filter
http://www.eecs.umich.edu/courses/eecs206/archive/spring02/lab.dir/Lab9/lab9_v3_0_release.pdf
https://www.staff.ncl.ac.uk/oliver.hinton/eee305/Chapter5.pdf
http://web.mit.edu/2.14/www/Handouts/PoleZero.pdf
https://www.discodsp.net/VAFilterDesign_2.1.0.pdf
http://www.micromodeler.com/dsp/
Sampling
https://www.dspguide.com/ch16.htm
https://ccrma.stanford.edu/∼jos/resample/resample.pdf
http://www.jean-lucsinclair.com/s/Granular-Synthesis.pdf
http://www.nicholson.com/rhn/dsp.html
http://paulbourke.net/miscellaneous/interpolation/
http://msp.ucsd.edu/techniques/v0.11/book.pdf
http://yehar.com/blog/wp-content/uploads/2009/08/deip.pdf
Controllers
https://www.midi.org/specifications-old/item/the-midi-1-0-specification
https://www.midi.org/articles-old/midi-polyphonic-expression-mpe
Figure Copyright Acknowledgments
©1
kpr2, CC0, https://commons.wikimedia.org/w/index.php?curid=57571949
©2
Andrew Russeth, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=26604703
©3
JR (Flickr), CC-BY-2.0, https://www.flickr.com/photos/103707855@N05/29583857506/
©4
Steve Sims, Public Domain, https://commons.wikimedia.org/w/index.php?curid=12379187
©5
Reprinted with permission by Ed James (Clyne Media).
©6
Bernd Sieker (Flickr), CC-BY-2.0, https://www.flickr.com/photos/pink_dispatcher/13804312464
©7
Aquegg, Public Domain, https://commons.wikimedia.org/wiki/File:Spectrogram-19thC.png
©8
Fourier1789, Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=41344802
©9
This image is mine, but is inspired by a prior original image by MusicMaker5376, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=17413304
©10
Gunther, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=821342
©11
Bob K, Public Domain, https://commons.wikimedia.org/w/index.php?curid=648112
©12
Olli Niemitalo, CC0, https://commons.wikimedia.org/w/index.php?curid=24627501
©13
Dvortygirl, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=2524720
©14
Ken Heaton, Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=48242081
©15
Paulo Ordoveza, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=29694963
©16
Jane023, CC BY-SA 3.0 nl, https://commons.wikimedia.org/w/index.php?curid=19732582
©17
Public Domain, https://commons.wikimedia.org/w/index.php?curid=391692
©18
Scientific American, 1907, Public Domain, https://commons.wikimedia.org/w/index.php?curid=49752962
©19
JacoTen, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=24356990
©20
Julien Lozelli, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=7767415
©21
Brandon Daniel, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=33527250
©22
Allangothic (Wikipedia), CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=2875045
©23
Museumsinsulaner, Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=12503701
©24
User:MatthiasKabel, Own work, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=1265010
©25
Finnianhughes101, CC0, https://commons.wikimedia.org/w/index.php?curid=39547075
©26
Surka, Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=8345595
©28
GeschnittenBrot, Flickr: DSC 0117, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=14746344
©29
Kimi95, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=7708499
©30
I, Zinnmann, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=2441310
©31
Museumsinsulaner, Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=12501904
©32
FallingOutside, Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=21742056
©33
Andrew Russeth, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=26604703
©34
Brandon Daniel, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=19964001
©35
Hollow Sun, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=8863073
©36
Robert Brook, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=7509844
©37
Pete Brown, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=38092002
©38
Mojosynths, Own work, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=3815497
©39
Reproduced with permission from Richard Lawson at RL Music. Thanks to Sequential LLC for assistance.
©40
CPRdave, Public Domain, https://commons.wikimedia.org/w/index.php?curid=7633923
©41
Allert Aalders, CC BY-NC-SA 2.0, https://www.flickr.com/photos/allert/6809874296/
©42
Ed Uthman, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=74691297
©43
Jdmt, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=29340041
©44
Candyman777, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=19810850
©45
F J Degenaar, Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=1671111
©46
Reproduced with permission from Sequential LLC.
©47
Paul Anthony, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=67924982
©48
Reprinted with permission from Make Noise, Inc., via CC BY-NC-ND. Photo credit to Eric “Rodent” Cheslak. 0-Coast
developed by Tony Rolando.
©49
Joshua Schnable, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=26553228
©50
John Athayde, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=7530494
©51
F J Degenaar, Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=1671111
©52
Warrakkk, https://commons.wikimedia.org/w/index.php?curid=19273415.
©53
BillyBob CornCob, CC0, https://commons.wikimedia.org/w/index.php?curid=76760547
©54
Cameron Parkins, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=7810459
©55
Gablin, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=1377602
©56
Spinningspark, CC BY-SA 3.0, https://en.wikipedia.org/w/index.php?curid=23488139
©57
Geek3, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=53958748
©58
Steve Sims, Public Domain, https://commons.wikimedia.org/w/index.php?curid=12379187
©59
Stonda (assumed) (Wikimedia contributor), CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=649793.
©60
Buzz Andersen, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=2801145
©61
Matt Vanacoro, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=39425190
©62
Public Domain, from https://waveeditonline.com/
©63
Modified version of a figure by James Maier, reproduced with permission, http://www.carbon111.com. This
particular image is cropped from http://www.carbon111.com/table002.png
©64
Reprinted with permission from Tasty Chips Electronics.
©65
Olli Niemitalo, CC0, https://commons.wikimedia.org/w/index.php?curid=24627501
©66
Olli Niemitalo, CC0, https://commons.wikimedia.org/w/index.php?curid=77818389
©67
Reprinted with permission from Livi Lets.
©68
What’s On the Air Company, Public Domain, https://commons.wikimedia.org/w/index.php?curid=56188934
©69
Matt Vanacoro, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=39425186
©70
Ben Franscke, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=7633823
©71
Vlad Spears, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=2930845
©72
LeoKliesen00, Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=48087446
©73
DeepSonic, CC BY-SA 2.0, https://www.flickr.com/photos/deepsonic/6600702361
©74
Lippold Haken and Edmund Eagan, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=3710610
©75
Reprinted with permission from Roger Linn.
©76
Klaus P. Rausch, CC0, https://commons.wikimedia.org/w/index.php?curid=60729226
©77
F7oor, CC BY-SA 2.0, https://www.flickr.com/photos/f7oor/3992788445/
©78
Pretzelpaws, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=142551
©79
Reprinted with permission from Brad Ferguson.
Index
µ-law, 19 Audio Modeling, 134
5.1 surround sound, 19 audio rate, 43
Audio Unit, 11
a-law, 19 automated modulation, 43
Ableton Live, 138
action, 135 band limited wave, 70, 118
ADC, 18, 113 band pass filter, 34, 65, 81, 101
additive synthesis, 8, 14 band reject filter, 81
ADSR envelope, 35, 47, 66, 79 bandwidth, 101, 106
aftertouch, 43, 138 Bartlett window, 131
AHDSR envelope, 47 Beach Boys, 136
AHR envelope, 47 Behringer BCR2600, 137
AIR Music Technology Loom, 33, 41 Bell Labs Digital Synthesizer, 33
Akai MPC series, 113 Bessel function of the first kind, 106
algorithms, 109 bilinear transform, 86, 92
aliasing, 18, 70 bin, 23
all notes off, 145 bipolar, 43, 53
all pass filter, 82, 127, 129 bipolar envelope, 50
all sound off, 145 bitrate, 19
alternators, 31 Blackman window, 123
Amen Break, 113 Blade Runner, 59
amplifier, 12, 34, 55, 57, 62, 63, 67 BLIT, 71
amplitude, 13, 24, 81 blue noise, 69
amplitude modulation, 77 Bode plot, 86, 89
amplitude response, 81, 87, 88 Bode, Harold, 56
analog filter, 86 breath controller, 138, 145
analog synthesizer, 8, 11 brick wall filter, 82
Analog-Digital Converter, 18, 113 brown noise, 69
Analogue Solutions, 65 Brownian motion, 69
angle modulation, 103 Buchla, Don, 56, 74
angular frequency, 16, 87 buffer, 38
AR envelope, 47 Butterworth filter, 87, 92, 96, 98
ARP 2500, 57
ARP Instruments 2600, 57 Camel Audio Alchemy, 33
ARP Odyssey, 58 capacitive keyboard, 136
arpeggiator, 12, 35, 52, 65 cardinal sine function, 120
arpeggio, 52 carillon, 16
asynchronous granular, 117 Carlos, Wendy, 56
attack level, 12, 35, 47 carrier, 78, 104
attack rate, 47 Carson’s rule, 106
attack time, 12, 35, 47 Casio CZ series, 75
attenuation, 96 Casio CZ-1, 75
AU, 11, 61 Casio CZ-101, 75
audio interface, 11 Catmull-Rom, 38, 120
causal filter, 123 DAW, 9, 11
CC, 54 DC offset, 107
channel aftertouch, 139 DC Offset Bin, 24
channels, 19 DCO, 63, 67
Chariots of Fire, 59 decay rate, 47
Chebyshev Polynomials of the First Kind, 73 decay time, 35, 47
Chebyshev, Pafnuty, 73 decibel, 17
chorus, 9, 65 decimation, 117
Chowning, John, 103 Deep Note, 7
Ciani, Suzanne, 57 delay, 9, 65, 125
Clavia Nord Lead, 60 delay time, 12, 47
clipping, 74 denormals, 37
clock, 51 desktop synthesizer, 8, 12
clock pulse, 143 detune, 12, 63
Clockwork Orange, 56 DFT, 23
Close Encounters of the Third Kind, 57 digital audio workstation, 9, 11, 61, 114, 135,
coefficients, 87 141
COLA, 131 digital delay line, 125, 133
comb filter, 82, 126 digital filter, 86
combiner, 12, 55, 62, 67, 76 digital synthesizer, 5, 8, 60
companding, 19 digital waveguide synthesis, 134
complex conjugate, 28 Digital-Analog Converter, 18, 113
compress, 74 Digitally Controlled Oscillator, 63, 67
Constant Overlap-Add, 131 diphthong, 101
Control Change, 54 Direct Form I, 85
control surface, 137 Direct Form II, 85
control voltage, 53, 65, 147 Discrete Fourier Transform, 23
controller, 8, 11, 60, 135 Doepfer Musikelektronik, 61, 65
convolution, 30, 83, 121, 130 Doppler effect, 128
convolution reverb, 130 downsampling, 106, 117
Cooley, James William, 25 drawbars, 33
Cooley-Tukey FFT, 25 drawknob, 31
correlation, 121 drum computer, 50
Creative Labs Sound Blaster, 103, 115 drum machine, 7, 8, 50, 113
cross fade, 76, 114 drum pad, 138
cubic interpolation, 120 drum synthesizer, 50
cutoff frequency, 12, 81, 93 dry, 125
CV, 53, 61, 65, 147 duophonic, 134
CV/Gate, 65, 147 duty cycle, 68
dynamic range, 19, 135
D-Beam, 136
DAC, 18, 113 E-Mu Systems, 113
DADSR envelope, 12, 47, 64 early reflections, 129
Dark Side of the Moon, 57 East Coast synthesis approach, 57
Dave Smith Instruments Prophet ’08, 11, 63 echo, 9
Dave Smith Instruments Prophet 6, 61 Edisyn, 5, 53
effect, 65, 125 Fourier Series, 21
effects unit, 9 Fourier Transform, 14, 21
Eigenlabs Eigenharp, 139 free, 46, 51
electrical isolation, 141 free LFO, 44
Elektron Digitone, 111 Freeverb, 130
ELP, 56 frequency, 13, 24
embouchure, 147 frequency division, 58
Emerson, Keith, 56 frequency domain, 13
EMS Synthi, 57 frequency modulation synthesis, 8, 14, 43, 67,
EMS VCS 3, 57 75, 78, 103, 104, 128
Emu Morpheus, 114 frequency response, 81, 87
Emu UltraProteus, 114 frequency warping, 93
Ensoniq, 113 fully diminished, 16
envelope, 35, 46, 79 fundamental, 15, 107
envelope generator, 46 Funky Drummer, 113
equalization, 65
Euler’s Formula, 21 gain, 17, 82, 84, 91
Eurorack, 61, 63, 65 gate, 52, 61, 65, 147
exponential FM, 105 Gauss, Carl Friedrich, 25
expression controller, 145 Gizmo, 5, 50, 52
expression pedal, 43, 136 Gold, Rich, 116
expressivity, 138 Good Vibrations, 136
grain cloud, 117
fader, 51 grain envelope, 116
Fairlight CMI, 113 grains, 116
Fairlight Qasar M8, 33 granular synthesis, 116
Fast Fourier Transform, 25 grid controller, 138
feedback comb filter, 84, 127 guitar processor, 138
feedforward comb filter, 83
FFT, 25, 131 Haken Continuum Fingerboard, 139
filter, 12, 18, 34, 39, 55, 62, 66, 67, 69, 76 Hamming window, 29
filter FM, 63, 111 Hammond Novachord, 58
filtering, 29 Hammond Organ, 33, 128
finite impulse response filter, 83, 130 Hann window, 116, 123, 131
FIR, 69, 83 hard sync, 78
first-order filter, 83 harmonics, 15, 39, 67, 107
flanger, 65, 126 hertz per volt, 53, 147
Flow, 5, 40 high pass filter, 34, 65, 81
FM synthesis, 8, 43, 60, 67, 75, 78, 103, 104, hold stage, 47
128 hum tone, 16
foldover, 18, 70 human factors, 138
foot controller, 145
formant, 101 IDFT, 23
formant filter, 34, 40, 101 IFFT, 28
forward comb filter, 126 IIR, 84
four pole filter, 63, 65, 82 image synthesis, 30
Image-Line Harmless, 33 legato, 36, 44, 46, 47, 51, 63
Image-Line Harmor, 30, 33, 41 Leslie, 33
impulse, 130 Leslie speaker, 128
impulse response, 130 LFO, 12, 35, 43, 64, 66
In The Box, 11 linear arithmetic synthesis, 114
index of modulation, 105 linear FM, 103, 105
infinite impulse response filter, 84 linear interpolation, 119, 127
inharmonic, 107 linear phase filter, 82
instantaneous frequency, 104 local switch, 145
instantaneous phase, 37, 75, 104 lossy compression, 19
interpolation, 118, 119 low frequency oscillator, 12, 35, 43, 66
Inverse Discrete Fourier Transform, 23 low pass filter, 12, 18, 34, 55, 57, 63, 65, 71, 81,
Inverse Fast Fourier Transform, 28 117, 118
Inverse Fourier Transform, 14, 21 low-pass feedback comb filter, 130
ITB, 11 lower zone, 148
jack, 56 magnitude, 87
Jankó keyboard, 138 Make Noise 0-Coast, 61
joystick, 66 matrix destination, 57
Jump, 59 matrix source, 57
membrane, 136
Kaiser window, 123 MIDI, 8, 12, 54, 60, 135, 139
Kakehashi, Ikutaro, 139 MIDI 2.0, 147
Karplus, Kevin, 132 MIDI 2.0 profile, 149
Karplus-Strong, 132 MIDI channel mode messages, 145
Kawai K3, 33 MIDI data byte, 141
Kawai K4, 53 MIDI in, 140
Kawai K5, 33, 39 MIDI interface, 11
Kawai K5000, 33 MIDI merge, 140
keyboard, 135 MIDI out, 140
Korg microKORG, 60, 63, 64 MIDI patchbay, 140
Korg MS10, 57 MIDI polyphonic expression, 145, 147, 148
Korg MS20, 57 MIDI router, 140
Korg MS2000, 64 MIDI status byte, 141
Korg PS-3300, 59 MIDI thru, 140
Korg Wavestation, 7, 114 minBLEP, 79
mini key, 136
lag, 54 mixer, 9, 66
Lagrange interpolation, 119 mixing, 35, 63, 76
Lagrange polynomial, 119 mixing console, 9
Lagrange, Joseph-Louis, 119 mixture, 31
Laplace domain, 86 mode, 15
late reflections, 129 modifier function, 53, 64
latency, 38, 123 modular synthesizer, 7, 56, 65
leak, 28 modulation, 12, 35, 55, 62, 67
leaky integrator, 72 modulation amount, 53
modulation destination, 53 Nyquist Frequency Bin, 24
modulation matrix, 12, 53, 57, 59, 64 Nyquist limit, 18, 24, 70, 117
modulation source, 53 Nyquist-Shannon sampling theorem, 118
modulation wheel, 43, 137, 144
modulator, 78, 104 Oberheim 4-Voice, 59
module, 56 Oberheim 8-Voice, 59
Moiré patterns, 70 Oberheim Matrix 1000, 63
monitor, 9 Oberheim Matrix 6, 63
monophonic, 8, 58 Oberheim Matrix 6R, 63
Moog MemoryMoog, 59 Oberheim Matrix series, 53, 60
Moog Minimoog Model D, 8, 58 Oberheim OB-X, 59, 61
Moog, Robert, 56 Oberheim OB-Xa, 59, 61
morph, 41 Oberheim, Tom, 59
MP3, 19 OBXD, 61
MPE, 145, 147, 148 octave, 52
MPE master channel, 148 omni channel, 142
MPE zone, 148 On The Run, 57
multi-stage envelope, 50 one pole filter, 82
multi-track sequencer, 50 one-shot envelope, 47
multi-track tape recorder, 9 one-shot waves, 114
multidimensional polyphonic expression, operator, 108
148 optoisolator, 141
multiple wavetable synthesis, 115 orbit, 116
multitimbral, 114 order, 82, 106
Musical Instrument Digital Interface, 60, 135, oscillator, 12, 55, 62, 67
139 Overlap-Add Method, 131
multi-tap delay line, 127 overshoot, 96
overtone, 15
Native Instruments Razor, 33, 41 OXE FM synthesizer, 109
NCO, 67
New England Digital Synclavier II, 33 pad, 49
noise floor, 19 Palm, Wolfgang, 115
Non-Reserved Parameter Numbers, 145, 146 pan, 17
noodling, 9 parametric equation, 116
normal form, 107 paraphonic, 58
normalized sinc function, 120 partial, 13
notch filter, 34, 81 passband, 96
note, 15 patch, 7, 12, 56
note latch, 52 patch bank, 144
note length, 51, 52 patch cable, 7, 56
note velocity, 52 patch editor, 63
Novation Launchpad, 138 patch matrix, 53, 57
Novation Remote Zero, 137 PCM, 114
NRPN, 145, 146 PCM synthesis, 60, 67
nth order filter, 84 Pearlman, Alan R, 57
Numerically Controlled Oscillator, 67 period, 16, 37, 78
PG-8X, 61 rectangle function, 122
phase, 24, 81 rectangular window, 29
phase distortion synthesis, 75 release rate, 47
phase modulation synthesis, 75, 103, 104 release time, 12, 35, 47
phase response, 81, 88, 96 release velocity, 43, 139
phaser, 65, 132 repeating waves, 114
phrase sampler, 113 resampling, 71, 117
physical modeling synthesis, 125, 132 Reserved Parameter Numbers, 145, 146
piano, 135 reset all controllers, 145
pianoforte, 135 resolution, 43
Pink Floyd, 57 resonance, 63, 76, 83
pink noise, 69 rest, 51
pitch, 15, 17 resynthesis, 30
pitch bend, 136 return, 9
pitch bend wheel, 43, 137 reverb, 9, 129
pitch ribbon, 137 reverberation, 129
pitch scaling, 30, 113, 117 reversing soft sync, 78
pitch shifting, 113, 117 ring buffer, 125
PM synthesis, 75, 103, 104 ring modulation, 64, 77
pole, 82, 86, 88 ringing, 96
Pollard Industries Syndrum, 137 ripple, 96
polyphonic, 8, 11, 58 RMI Harmonic Synthesizer, 33
polyphonic aftertouch, 139 Roads, Curtis, 116
portamento, 36, 63, 136 Roger Linn Design LinnStrument, 139
PPG Wave, 115 Roland, 139
PreenFM2, 110, 111 Roland D-50, 114
pressure, 138 Roland Jupiter 8, 59, 60
prime, 16 Roland JX-8P, 61
pulse code modulation, 114 Roland MKS-80 Super Jupiter, 60
pulse code modulation synthesis, 67 Roland TR-808, 50, 113
pulse wave, 68 ROLI Seaboard, 139
pulse width, 63, 68 roll-off, 82
ROM, 7
quadraphonic, 13, 19 rompler, 7, 8, 60, 114
quality factor, 83 rotary speaker, 128
RPN, 145, 146
R2-D2, 7, 57 RPN Null, 146
rackmount synthesizer, 8
Raiders of the Lost Ark, 57 S&H, 46, 66
ramp wave, 43, 67 sample and hold, 46, 66, 119
random wave, 44 sampler, 7, 8, 60
rank, 31 sampling, 113
rate-based envelope, 47 sampling function, 120
RCA Mark I / II Electronic Music sampling rate, 18
Synthesizers, 56 sawtooth wave, 43, 63, 67
recorder, 9 Schroeder reverberation, 130
Schroeder, Manfred, 130 swing, 51, 52
second order filter, 84 Switched-On Bach, 56
sections, 51 sync, 63, 64, 78
self-centering, 137 synchronous granular, 116
SEM, 59 synth, 7
semi-modular synthesizer, 57, 61 synthesizer, 7
send, 9 sysex, 141
Sender, Ramon, 57 system exclusive, 141
sequence, 50
sequencer, 8, 11, 35, 50 table-based waveshaper, 73
Sequential Circuits Prophet 5, 59, 61, 139 tabletop synthesizer, 8, 12
Sequential Circuits Prophet VS, 114 tangent, 135
Short Time Fourier Transform, 30, 131 tape-replay, 113
sidebands, 78, 105 Tasty Chips Electronics GR-1, 117
signal to noise ratio, 19 Taylor series, 21
Telharmonium, 33
Simmons SDS-5, 137
tempo, 51, 52
sinc function, 120
The Shining, 56
sine wave, 43, 68
The Thing, 136
single-cycle wave, 64, 71
theremin, 136
Smith, Dave, 114, 139
Theremin, Léon, 136
Smith, Julius, 134
third dimension controller, 145
soft sync, 78
THX, 7
softsynth, 11, 61
tie, 51
software synthesizer, 5, 9, 11, 61
tierce, 16
spectrogram, 13, 29
time domain, 13
spline, 119
time stretching, 116
square wave, 43, 63, 67
time-based envelope, 47
state-variable filter, 59
tonewheels, 31
step sequencer, 12, 50, 66
track mute, 51
stereo, 13
track solo, 51
STFT, 30, 131 tracking generator, 64
stop knob, 31 transfer function, 86, 87
stopband, 96 transition band, 96
stops, 31 Trautonium, 55, 137
stored patch synthesizer, 12 travel, 136
Streetly Electronics Mellotron series, 113 traveling wave, 133
strike tone, 16 tremolo, 35, 43, 128, 136
Strong, Alexander, 132 triangle wave, 43, 63, 67
suboscillator, 66, 69 triangular window, 131
Subotnick, Morton, 57 Tron, 56
subtractive synthesis, 11, 55 Tukey, John, 25
sustain level, 12, 35, 47 two pole filter, 65, 82
SWAM, 134
swell box, 136 unipolar, 43, 53
swell pedal, 136 unity gain filter, 82, 84, 91, 122
unweighted, 135 wave folding, 57, 74
upper zone, 148 wave sequence, 114, 116
upsampling, 118 wave shaping, 73
USB host for MIDI, 147 wave terrain, 116
waveguide, 134
Välimäki, Vesa, 79 wavetable, 114, 115
Van Halen, 59 wavetable synthesis, 8, 34, 60, 67, 71, 115
Vangelis, 59, 137 weighted action, 135
variable bitrate, 19 weighted keyboard, 136
variance, 44 West Coast synthesis approach, 57
VCA, 66, 79 wet, 9, 125
VCO, 66, 67 white noise, 63, 68
vector synthesis, 8, 114 Whittaker-Shannon interpolation formula,
velocity, 43, 79, 135 121
velocity sensitivity, 79, 136 wind controller, 138
vibrato, 12, 35, 43, 103, 128, 136 window, 28, 75, 116, 122
virtual analog synthesizer, 9, 60, 64 windowed sinc interpolation, 106, 120
Virtual Studio Technology, 11 wrapping, 74
visualization, 29
Xenakis, Iannis, 116
vocoder, 7, 30, 60, 64
voice, 11, 58 Yamaha CS-80, 59
voice stealing, 58 Yamaha DX7, 8, 103, 109
volt per octave, 53, 105, 147 Yamaha FS1R, 109
Voltage Controlled Amplifier, 66, 79 Yamaha TX81Z, 103, 109
Voltage Controlled Oscillator, 66, 67 Yamaha VL1, 134
VST, 11, 61
Z domain, 86, 92
Wakefield, Jeremy “Jezar”, 130 zero, 82, 86, 88
Waldorf Microwave, 115 zero padding, 85, 130
Waldorf Music, 115 zipper effect, 43, 46