Build AI-Enhanced Audio Plugins with C++
Build AI-Enhanced Audio Plugins with C++ explains how to embed artificial intelligence technology inside tools that can be used by audio and music professionals,
through worked examples using Python, C++ and audio APIs which demonstrate
how to combine technologies to produce professional, AI-enhanced creative tools.
Alongside a freely accessible source code repository created by the author that accompanies the book for readers to reference, each chapter is supported by complete
example applications and projects, including an autonomous music improviser, a
neural network-based synthesizer meta-programmer and a neural audio effects
processor. Detailed instructions on how to build each example are also provided,
including source code extracts, diagrams and background theory.
This is an essential guide for software developers and programmers of all levels
looking to integrate AI into their systems, as well as educators and students of
audio programming, machine learning and software development.
and by Routledge
605 Third Avenue, New York, NY 10017
The right of Matthew John Yee-King to be identified as author of this work has
been asserted in accordance with sections 77 and 78 of the Copyright, Designs and
Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised
in any form or by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying and recording, or in any information
storage or retrieval system, without permission in writing from the publishers.
DOI: 10.4324/9781003365495
Publisher’s Note
This book has been prepared from camera-ready copy provided by the author.
For Sakie, Otoné, and my family. And of course, Asuka
the beagle.
Contents
Foreword x
List of figures xi
I Getting started 1
1 Introduction to the book 2
3 Installing JUCE 23
5 Set up libtorch 44
9 FM synthesizer plugin 72
12 The meta-controller 98
13 Linear interpolating superknob 103
28 Convolution 220
30 Waveshapers 241
Bibliography 335
Index 340
Foreword
List of figures
2.1 Many component parts are needed to build AI-enhanced audio software. 11
2.2 The components involved in AI-enhanced audio application development. 12
2.3 Setting up your development environment involves complex machinery and lots of steps. 18
2.4 Creating, building and running a C++ console program in Visual Studio. 19
2.5 C++-related packages that you should install. 19
2.6 Creating, building and running a C++ console program in Xcode. 20
2.7 My development setup showing an M1 Mac running macOS hidden on the left, a ThinkPad running Ubuntu 22.04 in front of the monitor and a Gigabyte Aero running Windows 11 on the right. 22
8.1 dphase depends on the sample rate (the space between the samples) and the frequency (how fast you need to get through the sine wave). 64
8.2 A synthesizer plugin loads data into the incoming blocks. 65
8.3 Printing descriptions of MIDI messages coming into a plugin in Standalone mode from a USB controller keyboard (left) and MIDI coming from an on-screen piano keyboard in AudioPluginHost (right). 70
9.1 Simple FM plugin with sliders for frequency, modulation index and modulation depth. 75
9.2 Showing plugin parameters for the Surge XT synthesiser using AudioPluginHost. 76
9.3 Showing plugin parameters for the FM plugin using AudioPluginHost. 77
9.4 Showing the custom UI for the FM plugin (right), the auto-generated parameter UI (middle) and AudioPluginHost (left). 78
11.1 Simple single layer network with one input and one output. 91
11.2 More complex single layer with more inputs and outputs. Now we apply a weight to each input as it goes to each output. We then sum the weighted inputs and apply a bias to each output. 93
11.3 The optimiser adjusts the network weights. 95
12.1 The meta-controller uses a neural network for new methods of synthesizer sound exploration. 98
12.2 The Wekinator workflow: data collection, training, inference then back to data collection. 101
16.1 Hosting plugins allows you to control more advanced synthesizers. 129
19.1 User interface for the Dexed DX-7 emulator. It has 155 parameters. 151
19.2 Time for a more complex neural network. 153
20.1 How far are modern sequencers from steam-powered pianos? 158
20.2 My own experience interacting with AI improvisers. Left panel: playing with Alex McLean in Canute, with an AI improviser adding even more percussion. Right panel: livecoding an AI improviser in a performance with musician Finn Peters. 159
20.3 Visualisation of a two-state model on the left and the state transition probability table on the right. 164
20.4 Visualisation of a variable order Markov model containing first and second order states. 166
21.1 Example of the Markov model generated by some simple code. 170
23.1 Note duration is the length the note plays for. Inter-onset interval is the time that elapses between the start of consecutive notes. 187
23.2 Measuring inter-onset intervals. The IOI is the number of samples between the start and end sample. elapsedSamples is the absolute number of elapsed samples since the program started and is updated every time processBlock is called; message.getTimestamp() is the offset of the message in samples within the current block. 188
24.1 Measuring note duration has to cope with notes that fall across multiple calls to processBlock. 194
24.2 Testing the getTimestamp function on note-on messages – the timestamp is always between zero and the block size of 2048. 196
25.1 If notes start close enough in time, they are chords. If the start times fall outside a threshold, they are single notes. This allows for human playing where notes in chords do not happen all at the same time. 202
27.1 The impulse signal and the impulse responses of a one-pole, two-pole and three-pole system. 216
28.1 Original drum loop spectrum on the left, filtered version on the right. High frequencies have been attenuated in the filtered spectrum. 224
32.1 What does our simple, random LSTM do to a sine wave? It changes the shape of the wave and introduces extra frequencies. 266
32.2 The steps taken to process a WAV file with a neural network through various shapes and data formats. 270
32.3 Time taken to process 44,100 samples. Anything below the 1000ms line can potentially run in real-time. Linux seems very fast with low hidden units, but Windows and macOS catch up at 128 units. 272
36.1 The training loop. Data is processed in batches with updates to the network parameters between batches. Between epochs, checks are done on whether to save the model and exit. 310
36.2 Comparison of training runs with different sized LSTM networks. At the top you can see the input signal and the target output signal recorded from Blackstar HT-1 valve guitar amplifier. The descending graphs on the left show the validation loss over time for three LSTM network sizes. The waveforms show outputs from the networks before and after training. 314
I Getting started

1 Introduction to the book
Welcome to ‘Build AI-Enhanced Audio Plugins with C++’! You are about
to embark on a journey into a world of advanced technology, which will be an
essential part of the next generation of audio software. In this chapter, I will
introduce the general area of AI-music technology and then set out the book’s
main aims. I will identify different types of readers, such as audio developers, student programmers, machine learning engineers and educators, and explain how each group can get the best out of the book. I will explain that you can use the large amount of code I have written for the book however you like. I will
also explain the dual licensing model used by the JUCE library. I will finish with
a straightforward, working definition of artificial intelligence.
One problem with many AI-music systems I mentioned is that they only exist as
descriptions in research papers. The research papers aim to describe systems and
their performance to other AI-music experts using a very limited number of pages.
They are not tutorials and do not necessarily explain the nuts and bolts needed
to build a complete working system.
Sometimes the researchers who write these research papers provide source code repositories, but my experience has been that it can be challenging to operationalise the software from these repositories. Making the code run likely involves having a particular combination of versions of other components installed, and often a particular operating system. The challenge is even more significant if there is no source code repository. I have watched excellent PhD students labour for weeks to re-implement systems described in research papers, only to discover that vital details were missing from the paper, or to run into other technical issues.
These problems mean that AI-enhanced music systems are not easily accessible
to musicians wishing to use them and probably not to programmers wanting to
integrate them into innovative music software. This is where this book comes in.
This book will show you several examples of how to build complete working AI-
music systems. All source code written by me is provided in a repository and is
covered by a permissive open-source license, allowing you to re-use it however you see
fit. The book also uses a consistent technical setup, allowing you to easily access
and “wrench on” the examples. So my first aim is really to make AI-enhanced
music technology available and transparent for you.
My second aim is to show you how to construct the technology in a way
that makes it accessible to musicians and audio professionals. Knowing how to
build and run AI-enhanced music systems on your machine is one thing. A quirky
Python script hacked together to work on your setup is acceptable for research
and experimentation but you will have trouble getting musicians to use it. What
is the ideal method for sharing software in a form that musicians and other audio
professionals can actually use? The next aim of this book is to answer that question
and to apply the answer to the design of the examples in the book. For many
years I have worked as a researcher/engineer on research projects where one of
the aims has been to get new technology into the hands of users so we can evaluate
and improve that technology. There are many approaches to achieving this aim:
running workshops with pre-configured equipment, making the technology run in
the web browser and so on. All of these approaches have their merits and are
appropriate in different circumstances. But none of them is quite suitable for our
purposes here. Here we are aiming to write software that can be used by musicians,
producers and sound engineers with minimal effort on their part.
How can you make your software usable by as many audio professionals as possible, and how can this book help you? Firstly, making software that integrates with existing creative workflows is crucial. The simplest
1: Getting started. You will set up your system for the development work
in the book and build some example plugins and other test programs.
2: ML-powered plugin control: the meta-controller. You will build the first
large example in the book, a plugin that hosts and controls another
plugin using a neural network.
3: The autonomous music improviser. The second large example is a plugin
that can learn in real-time from incoming MIDI data and improvise its
own interpretation of what it has learned.
4: Neural audio effects. The third large example is a plugin that models
the non-linear signal processing of guitar amplifiers and effects pedals
using neural networks.
Each part of the book contains detailed instructions on how to build each example, including source code extracts, diagrams and background theory. I have also included some brief historical and other context for the examples. The examples in parts 2, 3, and 4 are independent, so you can jump to any of those parts
once you have completed part 1.
I have created a freely accessible source code repository, currently on Github,
which provides each of the examples above in various states of development. As you
work through each stage of developing each example, you can pull up a working
version for that stage from the repository, in case you get stuck. The repository
contains releases of each final product with compiled binaries and installers.
But you will miss out on a lot of learning if you only do that! To fully exploit
the content in the book, you will need to start by working through part 1, which
explains how to set up your system for the development work described in the
book. You will find ‘Progress check’ sections at the end of each chapter, which
clarify what you should have achieved before continuing to the next chapter.
After part 1, there are three detailed example projects, each providing very
different functionality. These three projects are independent, so you can choose
which order you study them in or only study some and not others.
As you work through the parts of the book, each program you are developing
will increase in complexity until it is completely functional at the end of the book
part. To work through the examples, you can type in all the code you see in the
book and build the complete example by hand, or you can read the code and
download the step-by-step versions of the projects from the repository. I find that
people sometimes get really stuck working through these larger projects, where
they cannot make it compile or work properly. So, I have provided staged versions
of the programs in the code repository. If you reach the end of a chapter and
cannot figure out why your program does not work, you can just pick up from a
working stage in the code repository and continue. Of course, there is much to be
gained by spending hours looking for that missing bracket, so do try and debug
your problems before grabbing the working version from the repo.
At the end of some chapters and at the end of all parts, I suggest challenges
and extensions. These are extra features you can add to the plugins, allowing you
to reinforce and increase your understanding of the principles and techniques. If
you are using the book for teaching, these challenges and extensions are things
you could set students for coursework.
very helpful as, in my experience, students spend a lot of time struggling to get
their software development environment set up and working correctly. The book
also provides information about general audio processing techniques and how they
can be adapted using AI technology.
1.5.4 Educator
If you are an educator planning to use this book as part of your teaching, that is an
excellent idea! The most obvious way to use the book in your teaching is to split the
content between lectures and lab classes. In the lectures, you can introduce the AI
theory. You can go as far as you like with the AI theory, depending on the level and
focus of your course. I cover enough in the book to enable the reader to carry out
training and inference and to integrate the AI system into a working application.
I also explain some characteristics of the particular machine learning techniques
used. You can go much further than that, depending on your requirements. For
your lab classes, your students can work through the practical implementation of
the applications. The book contains detailed instructions on how to build each of
the example applications. I have battle-tested and iterated these instructions with
my students.
projects but if you want to use the complete examples in a closed-source manner
you should ensure you understand the JUCE library’s dual licensing model.
Open-source has various definitions. For our purposes, open-source means that
the source code for a piece of software is available. Open-source code generally
comes with a license that dictates what the code’s author wants you to do with
that code. Permissive licenses, such as the MIT license, place few limitations on
the use of that code. Users are free to use that code as they see fit. Users of
MIT-licensed code can adapt the code and even include it in commercial projects
without needing to release their adapted code. All the code written by me for this
book is MIT licensed.
The GPL licenses take a stronger philosophical position concerning freedom
and are designed to encourage further sharing of source code. You can adapt and
use GPL’d code in your project, even if it is a commercial project, but you will
be required to release your code. You are also obliged to make your source code
open-source with a GPL license if your code links to GPL libraries.
Releasing source code with multiple licenses is possible if they are compatible.
The code in this book that I have written is released under a dual MIT / GPL
license. I will explain why below. If you use my code in your project, the MIT
license applies. The GPL license applies if you use my complete examples, which
also link to GPL’d code.
The reason I have dual-licensed the code is because of the JUCE library. The
JUCE library carries a dual license. If you build against the JUCE library, you
can either GPL your whole project or apply for a JUCE license, and then, you do
not need to GPL your code.
1.7 User-readiness
I will show you how to get the software to a point where it will run on your
machine and, with some fiddling, on other people’s machines. I do not cover the
creation of installers or the process of signing / notarising software. Some great
resources online tell you what to do with installers, signing etc., once your amazing
AI-powered plugin is ready for the world.
In the next few chapters, I will explain how you can set up your development
environment for audio software development. You need this setup to work on and
run the example programs in the book. After working through these chapters, you
should be able to build and run a simple C++ program linked to the JUCE audio
library and the libtorch machine learning library using an integrated development
environment (IDE). The chapters should also familiarise you with the CMake tool
which will allow you to create cross-platform projects with which you can create
native applications and plugins for Windows, macOS and Linux systems. You can
use these setup chapters how you like – read them, scribble on them, etc. but I
recommend working through the material with a computer available. Expect to
install software on the computer, run commands in its command shell and execute
programs.
FIGURE 2.2
The components involved in AI-enhanced audio application development.
to be able to build your software for multiple platforms using different Integrated
Development Environments (IDEs). We will use the CMake1 build tool to help
us generate projects for various IDEs. This will make building the software for
different hardware and OS platforms possible.
CMake
CMake is the build tool we will use in this book. With CMake, you write a single
configuration file then you can use it to generate projects for different IDEs. In
the configuration file, you can specify different targets for the build, such as a test
program and a main program. You can specify associations between your project
and external libraries. You can specify actions to be taken, such as copying files.
This makes it a valuable tool for audio application developers wishing to support
various operating systems, as you can maintain a single CMake configuration and
codebase and use it to build for multiple platforms.
Codebase
The codebase is the set of source code files in a project. The build tool and some
handy macros will allow us to have a single codebase for all platforms.
Native program
Some of the programs you encounter in the book will be compiled into native programs in machine code that run on particular CPU hardware. We will write these programs using the C++ language. Native programs are most appropriate when developers wish to integrate directly with plugin and Digital Audio Workstation (DAW) technology. For example, VST3 plugins are native programs. Native programs generally run faster than interpreted programs, and that is important for realtime audio applications.
1 https://CMake.org/
Interpreted program
Some of the programs we write will be interpreted as opposed to compiled. Interpreted programs are converted to machine code on the fly instead of being converted into machine code before running. We will write interpreted programs in the Python language. Interpreted programs are more suited to the kind of experimentation one needs to do when developing machine learning models. It is common for AI researchers to provide Python code along with their research papers to allow other people to explore their work more easily. It is less common for researchers to provide C++ code, but there has been a trend towards this in AI-music research in the last few years.
Trained model
A trained model is a machine learning model that has learnt something valuable.
The model defines the structure of the machine learning component. Training
teaches that component to process data in a particular way. The ability to self-
configure through learning is the essence of machine learning. Trained models
provided by a third party are often called pre-trained models. In case you have
heard of OpenAI’s infamous GPT model, the ‘P’ stands for pre-trained.
Inference
Inference is the process of using a trained model to generate an output. Inference
does not change the internal configuration of the machine learning model; it just
passes data through it. Once a model is trained, you will use it for inference.
Generative model
A generative model is a machine learning model that can generate something
interesting. For example, instead of detecting cats in images, it might generate
images of cats. The ‘G’ in GPT stands for generative, as GPT generates text.
Audio library
An audio library is a set of components that makes it easier to construct audio
applications. Typical components are audio file readers and writers, audio device
management and sound synthesis routines. We will use the JUCE audio library
as it allows us to construct cross-platform audio applications and plugins. If you
are not keen on using JUCE, you should be able to convert the applications we
make to work in other audio libraries. For example, it is possible to create VST
plugins directly using Steinberg’s library, or you could use IPlug2. There are a few
reasons I have chosen to use JUCE: it provides a very consistent experience on
different platforms, it includes a set of cross-platform user interface components;
it is compatible with the CMake build tool, and it can export plugins in several
formats such as VST3, AudioUnit and so on. JUCE also provides components to
build applications and plugins that can host other plugins, a capability we will
make use of in some of the examples in the book.
the audio library) is included inside your program’s binary. Dynamic linking means
the library exists as a separate file, and your application stores a reference to it.
Depending on your platform, dynamically linked libraries are also called DLLs,
shared objects and dylibs. When you run an application with dynamic links, those
libraries must be located on the computer running the application. Depending on
the operating system, the process of locating linked libraries varies. I will explain
in more detail how to deal with this when you encounter it in the book.
Plugin
A plugin is a software component that works inside a larger program. We will
create plugins that work inside DAWs. The plugins we create will include machine
learning capabilities.
Plugin host
A plugin host is any software that can load and use external plugins. DAWs are
plugin hosts, but we will also use a simpler plugin host in some of our examples
to help test our software and to allow our software to host and control plugins.
FIGURE 2.4
Creating, building and running a C++ console program in Visual Studio.
FIGURE 2.6
Creating, building and running a C++ console program in Xcode.
the Xcode download. This download is more reliable, and you can also select older
versions of Xcode in case you have an older version of macOS. Once you have
installed Xcode, you must install the command line tools. You can do this by
running the following command in the Terminal app:
sudo xcode-select --install
Once all that is ready, you can create, build and run a C++ command line
program in Xcode. Figure 2.6 contains screenshots showing those three stages.
to install the C++ build tools and CMake. For example, on Ubuntu, I run the
following command:
sudo apt install build-essential cmake
Let’s see if you can create, compile and run a C++ program in Linux. Put this
into your text editor of choice:
#include <iostream>
int main() {
    std::cout << "Hello, AI-enhanced audio!" << std::endl;
}
Save it as ‘main.cpp’ and run this command to compile and link it:
g++ main.cpp -o myprogram
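Assuming compilation succeeds, you can then run the resulting binary (myprogram is simply the name given to the -o option above):

./myprogram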
FIGURE 2.7
My development setup showing an M1 Mac running macOS hidden on the left,
a ThinkPad running Ubuntu 22.04 in front of the monitor and a Gigabyte Aero
running Windows 11 on the right.
This chapter will teach you about the JUCE library and associated tools. JUCE
allows you to create C++ software projects which can be exported for multiple
platforms from the same code-base. I will talk you through your first plugin build
using Projucer and an IDE, covering Windows, macOS and Linux. I will also show
you how you can test your plugins using the AudioPluginHost tool, a plugin host
that comes with the JUCE library. At the end of the chapter, you will be building,
running and testing plugins on your machine.
3.3 Projucer
Projucer is a program that comes with the JUCE distribution. Projucer is a central hub for managing your JUCE projects. It can generate different types of projects for different OS and IDE combinations. Figure 3.1 shows the kinds of applications for which Projucer can generate templates. The most interesting one for our purposes is Basic Plugin.

FIGURE 3.1
The available application types for Projucer.

Go ahead – run Projucer and create a new ‘Basic Plugin’ project. You will be prompted to name it and save it somewhere. You should end up on the Projucer project screen, as shown in figure 3.2. If you click on the Modules tab on the left, you can check if your JUCE system is configured correctly – the modules should not be highlighted in red. If they are red, check that the ‘Global paths’ options are set correctly from the file menu to point to where you have unzipped the JUCE folder. Clicking on the Exporters tab lets you add new exporters by clicking the add button. Make sure there is an exporter for your system.
Now you are ready to save the project for use in your IDE. Xcode and Visual
Studio users can select the exporter they want from the dropdown at the top of the
Projucer menu and click the icon to the right of it. This will launch the project in
1 https://www.juce.com
2 https://github.com/juce-framework/JUCE/releases
FIGURE 3.2
Projucer project view. The exporter panel is exposed on the left, module configuration panel is on the right. Note that my modules are all set to Global.
your IDE. Linux users will need to execute the Makefile from the terminal. More
on that shortly. Saving the project should generate a folder hierarchy like the one
shown below.
|-- Builds
|   |-- LinuxMakefile
|   |-- MacOSX
|   `-- VisualStudio2022
|-- JuceLibraryCode
|-- MyNewPlugin.jucer
`-- Source
    |-- PluginEditor.cpp
    |-- PluginEditor.h
    |-- PluginProcessor.cpp
    `-- PluginProcessor.h
The Builds folder contains the IDE project files. The Source folder contains
the source code for your project that you will end up editing. The .jucer file is the
Projucer configuration file.
FIGURE 3.3
Building the Standalone solution in Visual Studio Community 2022.
FIGURE 3.4
Enabling console output for a JUCE project in Visual Studio/Windows. At the top: redirect text output to the immediate window and open the immediate window. At the bottom: a program running with DBG output showing in the immediate window.
FIGURE 3.5
Running a JUCE plugin project in Xcode – make sure you select Standalone.
This will build your project and store it in a build folder. To run it:
./build/YourProjectName
If this seems a bit ‘manual’, do not worry – we will see a more powerful way
to build and run JUCE projects on Linux once we install CMake.
FIGURE 3.6
The JUCE AudioPluginHost application, which comes with the JUCE distribution. One of its built-in plugins, a sine synth, is wired to the MIDI input and the audio output.
The plugin host is part of the download for JUCE. It is located in the JUCE/extras/AudioPluginHost folder. You will find a .jucer file there that you can open
with Projucer. Save the project from Projucer, then build and run it as usual.
You should see an interface like figure 3.6. The AudioPluginHost application is a
graph-based environment which allows you to insert plugins into a graph and then
connect them to MIDI data inputs and outputs and audio inputs and outputs. The
AudioPluginHost application has some ready-made plugins that you can wire into
the graph.
How can you wire your plugin to the graph? The first thing to know is that
audio plugins are installed to standard locations on your system. When a plugin
host such as Reaper or Cubase starts up, it scans these locations for new or updated
plugins. Then the host knows which plugins are available. The AudioPluginHost
can also scan these folders to find plugins. The standard locations for VST3 plugins
are:
Windows:
C:\Program Files\Common Files\VST3
and
<home directory>\AppData\Local\Programs\Common\VST3
Mac:
/Library/Audio/Plug-Ins/VST3
and
<home directory>/Library/Audio/Plug-Ins/VST3
Linux:
/usr/lib/vst3/
and
<home directory>/.vst3/
Now that everything is working, we are going to go back a step and break everything again(!). We will swap out the Projucer application for an alternative project generator tool called CMake. This chapter shows you how to get up and running with JUCE and CMake. CMake is a more general-purpose tool than Projucer, and it is particularly good at integrating libraries into projects if they also use CMake. The machine learning library libtorch and the neural network library RTNeural are examples of libraries we’ll use later that support CMake. You may wonder why we bothered with Projucer instead of going straight for CMake. Three reasons: 1) I wanted to lead you in gently, 2) Projucer is helpful for running and building example projects from JUCE, and it provides a more straightforward method to verify that you have an IDE with C++ capabilities and JUCE, and 3) putting CMake in the mix too early would provide too many opportunities for frustrating configuration problems. Don’t worry – setting up CMake should be straightforward if you are here and you have a working build system.
FIGURE 4.1
CMake running in the Windows Powershell.
2 https://brew.sh/
Your project folder should now have a file hierarchy like this:
.
|-- CMakeLists.txt
`-- src
    `-- main.cpp
Now we are ready to use CMake to generate an IDE project. Fire up your
preferred shell—for me, Terminal.app on macOS, Powershell.exe on Windows or
the regular Terminal program on Ubuntu. Change the directory inside the command shell to the project folder you just created. That will probably involve the cd command. Once in the directory, run this CMake command:

cmake -G
This will list all the available project generators that CMake can use. Remember that the idea is to configure the build using CMake then to generate projects
for the various IDEs. This is conceptually similar to how we used Projucer. We
are swapping out Projucer for CMake. The command output should also show you
which generator is the default. The default generator on my Windows 10 system
is ‘Visual Studio 17 2022’. It also lists ‘Visual Studio 16 2019’ as an option. The
default on my macOS and Ubuntu systems is ‘Unix Makefiles’, but the Mac also
lists ‘Xcode’. To generate the project for a specific target, specify that target using
the G option:
# Windows / Visual Studio version:
cmake -G "Visual Studio 17 2022" -B build .
# macOS / Xcode version
cmake -G "Xcode" -B build .
# Linux / Makefile version
cmake -G "Unix Makefiles" -B build .
The B option tells CMake to create a folder called build and to output its
generated project into that folder. The full stop at the end (‘.’) tells it that the
CMake configuration file is in the current folder. So this command should create
a folder called build containing your IDE project.
Those commands will generate a folder called ‘build’ and then build the project
into that folder. Whilst that is the simplest possible command sequence to build
a CMake project, CMake actually generates multiple forms of the build. For the
debug build (no compiler optimisations, debug flags set, assertions asserted):
cmake --build build --config Debug
Now to carry out the release build (compiler optimisations on, no debug flags,
assertions not asserted):
cmake --build build --config Release
Note that for this simple project, running the second build command will wipe
out the previously built debug-mode executable.
FIGURE 4.2
A CMake project viewed in VSCode.
menu and run the ‘CMake: Scan for Kits’ command. Here is some of the output
from that command on my Windows 10 machine:
[kit] Found Kit: Visual Studio Community 2022 Release - x86
[kit] Found Kit: Visual Studio Community 2022 Release - x86_amd64
[kit] Found Kit: Visual Studio Community 2022 Release - amd64_x86
[kit] Found Kit: Visual Studio Community 2022 Release - amd64
...
[kit] Found Kit: Clang 15.0.1 (MSVC CLI) for MSVC 17.5.33414.496 (Visual Studio Community 2022 Release - x86)
[kit] Found Kit: Clang 15.0.1 (GNU CLI) for MSVC 17.5.33414.496 (Visual Studio Community 2022 Release - x86)
...
What is the difference between the Visual Studio 2022 options: x86, x86_amd64, amd64_x86 and amd64? The first part is the CPU architecture for the machine used to build the binary, and the second part is the CPU architecture for the machine(s) that will run the binary. If there is only one part (amd64), the builder and target are the same. amd64 is a 64-bit architecture, and x86 is a 32-bit architecture. If you are on a 64-bit machine running 64-bit Windows, choosing amd64 is fine.
What about macOS? Here is the output I see on my M1 macOS machine (only
one kit was found):
[kit] Found Kit: Clang 14.0.0 arm64-apple-darwin22.3.0
Here is some of the output I see on my Ubuntu Linux machine. I had to open
the output area from the View menu and then select CMake/ build from the list
of possible sources to see the list:
...
[kit] Found Kit: GCC 11.3.0 x86_64-linux-gnu
[kit] Found Kit: Clang 14.0.0 x86_64-pc-linux-gnu
[kit] Found Kit: Clang 11.1.0 x86_64-pc-linux-gnu
...
The naming convention for build and target machine described for Windows
does not apply to Linux – x86_64-linux-gnu builds on 64-bit and targets 64-bit.
Once you have selected your build kit, click the ‘build’ button at the bottom
of VSCode to compile and link, then the play icon to run the program. You should
see the output of the hello-cmake program in VSCode’s built-in Terminal.
FIGURE 4.3
A CMake project viewed in Visual Studio Community 2022.
Studio’s built-in CMake support. I recommend the first option as I have encountered issues with the second approach, though it does produce neater-looking
JUCE projects. The following instructions relate to projects generated from the
CMake command.
After running the CMake command with the -G option described above, you
should see a file called hello-cmake.sln in the build folder. Find this file in Windows Explorer – the easy way is to run the command explorer.exe . in Powershell when in the build folder. Then open the .sln file in Visual Studio. A .sln file is a VS solution file, a collection of ‘products’. Each product within the solution has a separate Visual Studio project file. Figure 4.3 shows the hello-cmake project in
Visual Studio, highlighting the list of targets, console and run button. The project
will start with ALL_BUILD as the target for the play button, so you should right-click on hello-cmake in the solution explorer and select the ‘Set as Startup Project’
option. Clicking the play button now builds and runs hello-cmake.
FIGURE 4.4
A CMake project viewed in Xcode.
That will generate an Xcode project in the build folder which you can open
with Xcode. After opening the project in Xcode, hit the play button, and it will
build and run the program. Figure 4.4 illustrates what you will see when you open
a CMake project in Xcode.
to use any Microsoft tools. In that latter case, please refer to the command line
instructions above, or read the documentation for the CMake support in your
preferred IDE.
You can obtain some template PluginEditor and PluginProcessor files by generating a Basic Plugin Project in Projucer and copying over the contents of the
Source folder. Or you could start from the CMake version of the project described
in the repository guide section 39.2.2.
The CMakeLists.txt file should contain the following lines. Do not worry too
much about what everything here is doing yet. Several of these commands are
custom macros provided by the JUCE library’s CMake API. You can find more
information about the JUCE CMake API on the GitHub repository3. These macros automatically become available to CMake because the JUCE library gets added to your project with the add_subdirectory command. I will explain the essential
parts which you need to change below and the others when we need them. Put
the following text into your CMakeLists.txt file (or access the example from the
repository):
3 https://github.com/juce-framework/JUCE/blob/master/docs/CMake API.md
cmake_minimum_required(VERSION 3.15)
project(minimal_plugin VERSION 0.0.1)
# where is your JUCE folder? ../../JUCE for me
add_subdirectory(../../JUCE ./JUCE)
juce_add_plugin(minimal_plugin
    COMPANY_NAME Yee-King
    # set to false if you want audio input
    IS_SYNTH TRUE
    NEEDS_MIDI_INPUT TRUE
    IS_MIDI_EFFECT FALSE
    NEEDS_MIDI_OUTPUT TRUE
    COPY_PLUGIN_AFTER_BUILD TRUE
    PLUGIN_MANUFACTURER_CODE Yeek
    # should change for each plugin
    PLUGIN_CODE Abc1
    FORMATS AU VST3 Standalone
    # should change for each plugin
    PRODUCT_NAME "minimal_plugin")

juce_generate_juce_header(minimal_plugin)

target_sources(minimal_plugin
    PRIVATE
    src/PluginEditor.cpp
    src/PluginProcessor.cpp)

target_compile_definitions(minimal_plugin
    PUBLIC
    JUCE_ALSA=1
    JUCE_DIRECTSOUND=1
    JUCE_DISABLE_CAUTIOUS_PARAMETER_ID_CHECKING=1
    JUCE_USE_OGGVORBIS=1
    JUCE_WEB_BROWSER=0
    JUCE_USE_CURL=0
    JUCE_VST3_CAN_REPLACE_VST2=0)

target_link_libraries(minimal_plugin
    PRIVATE
    juce::juce_audio_utils
    PUBLIC
    juce::juce_recommended_config_flags
    juce::juce_recommended_lto_flags
    juce::juce_recommended_warning_flags)
Note that setting the COPY_PLUGIN_AFTER_BUILD property in the juce_add_plugin block to TRUE is equivalent to editing the Projucer settings to
enable the copy plugin step.
You might add the header files to the target_sources command as well, if that
is appropriate for your setup. I have seen people adding header files when working
in Visual Studio Community as it makes them easier to find in the project.
The first ‘../../JUCE’ tells CMake where JUCE can be found. I keep JUCE
two folders up from my current project directory, so I specify ‘../../’. You can put
any path you like there. The second part, ‘./JUCE’, defines where JUCE will be
copied to in the build folder. You can leave that as it is.
Next, the project title. Everywhere you see ‘minimal_plugin’ in the CMakeLists.txt file, replace it with your chosen project title. This can be any string compatible with CMake’s project naming conventions. You can use find and replace in your code editor to do it. I found the following commands need to be edited to implement a new project name:

project(minimal_plugin -> your-plugin
juce_add_plugin(minimal_plugin -> your-plugin
juce_generate_juce_header(minimal_plugin) -> etc.
target_sources(minimal_plugin
target_compile_definitions(minimal_plugin
target_link_libraries(minimal_plugin
Now you need to change the properties the plugin will report to your host.
These are the company name and plugin identifiers your DAW displays when you
search for plugins. In the section ‘juce_add_plugin’, change the following properties:

COMPANY_NAME Yee-King  # change to your company name
PLUGIN_MANUFACTURER_CODE Yeek  # make one up!
PLUGIN_CODE Abc1  # each of your plugins needs a unique code
PRODUCT_NAME "minimal_plugin"  # make one up!
This chapter explains how to install and use the libtorch machine learning library.
You will start by examining a minimal example of a libtorch C++ program. Then
you will dig into the libtorch installation process and see how to integrate libtorch
into a simple CMake build. Following that, you will see how to combine libtorch
and JUCE in a single project. At the end of the chapter, you will be ready to
experiment with libtorch and neural networks in C++. Please note that these
instructions are correct and tested on Windows, macOS and Linux at the time of
writing the book, but there are a few moving parts here that are subject to change.
Please refer to the repo guide example 39.2.10 which will provide the latest and
most correct CMake configuration.
This is an elementary program, but if you can build and run it, you are ready
to use a state-of-the-art machine-learning library in your own native applications.
If you have some experience working in C++, you should understand most of what
is going on. For clarity, I will step through the program line-by-line and explain
it. Firstly:
#include <torch/torch.h>
#include <iostream>
These lines include the header file for the libtorch library and the C++ standard library iostream header. Remember that in C++, header files contain declarations of functions, constants and such. They tell your IDE and compiler which functions are available. When you include a header, you are making the things declared in that header available for use in your program. So we are making the libtorch and iostream libraries available for use in our program.
Note that the included files are in angle brackets (< ... >). That impacts
where the IDE searches for these headers for its autocompletion and code-checking
functionality and where the compiler searches when it pulls in the headers during
the compile phase. Using angle brackets means they are located in a standard or,
at least, specified place on your system. So there needs to be a folder somewhere
called torch with a file called torch.h, and the IDE and compiler need to know
where that folder is. We will use CMake to tell the IDE and compiler where to
look later. Next, we have the main function:
int main() {
    torch::Tensor tensor = torch::rand({2, 3});
    std::cout << tensor << std::endl;
    return 0;
}
The main function is the entry point for the execution of your program. In
other words, it is the first function to automatically be called when your program
runs. This main function starts by creating an object of type Tensor from the
torch namespace. That is something declared in the torch.h header. We assign
that Tensor object to a variable called tensor. Remember that objects in C++
contain data and functions relating to that data. That means our tensor object
contains some data and some Tensor-related functions. Tensors are the core data
structures in neural network-based machine learning, and we will encounter lots
of them in this book.
We created the Tensor object using a function from the libtorch library called
rand. rand takes an array as an argument, in this case, containing a 2 and a 3. I
am assuming you know what an array is. We then use the cout function from the
standard library to print out the contents of the tensor variable.
Here is the expected output of the program:
0.9660 0.2080 0.5723
0.6885 0.2689 0.9441
[ CPUFloatType{2,3} ]
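If you want to experiment a little further before moving on, here is a small extension that is not part of the book's example; it multiplies the random tensor by a second tensor using only standard libtorch calls:

torch::Tensor other = torch::ones({3, 2});
torch::Tensor product = torch::matmul(tensor, other); // (2x3) x (3x2) -> (2x2)
std::cout << product.sizes() << std::endl;            // prints the shape, [2, 2]
std::cout << product << std::endl;                    // prints the values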
called src in my user account home folder. Later we will need to tell CMake where
that libtorch folder is to make those includes work. I will explain how to do that
shortly.
1 https://github.com/mlverse/libtorch-mac-m1
2 https://github.com/pytorch/pytorch/blob/main/docs/libtorch.rst
Next, the find_package command tells CMake to verify we have the Torch package installed. When you run CMake, it will look for libtorch in the standard places and any other places you added using the set(CMAKE_PREFIX_PATH ...) command. set(CMAKE_CXX_FLAGS ...) tells CMake to add the torch flags to the compile command. Finally, target_link_libraries tells CMake to link the program against the torch library.
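The CMakeLists.txt being described here is not reproduced in this excerpt. As a rough sketch only, based on the libtorch documentation and the commands described above, it looks something like this; the libtorch path and the target name hello-torch are placeholders for your own values:

cmake_minimum_required(VERSION 3.15)
project(hello-torch)

# tell CMake where the unzipped libtorch folder lives (placeholder path)
set(CMAKE_PREFIX_PATH "/Users/you/src/libtorch" ${CMAKE_PREFIX_PATH})
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_executable(hello-torch src/main.cpp)
target_link_libraries(hello-torch "${TORCH_LIBRARIES}")

# on Windows, copy the libtorch DLLs next to the built .exe
if (MSVC)
    file(GLOB TORCH_DLLS "${TORCH_INSTALL_PREFIX}/lib/*.dll")
    add_custom_command(TARGET hello-torch
        POST_BUILD
        COMMAND ${CMAKE_COMMAND} -E copy_if_different
        ${TORCH_DLLS}
        $<TARGET_FILE_DIR:hello-torch>)
endif (MSVC)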
This instructs the build system to copy the libtorch shared libraries into the
same folder as the compiled application (.exe). I recommend that you check the
final version of the CMakeLists.txt file from the repo guide example 39.2.10
rather than typing it by hand as these instructions might have changed since the
book was written.
For other platforms: the build you create on your machine should link to
libtorch in the place it is located on your machine. That means that if you want
the software to run on other people’s machines, you will probably need to have
libtorch in the same place. So you should probably look into creating an installer
if you want to share your software with other people. Unfortunately the details of
creating installers are beyond the scope of this book. Please refer to the GitHub
repository where I will actively maintain information that readers of the book
request such as this.
You can put that above the target_link_libraries command in CMakeLists.txt. Add a line to the target_link_libraries command as follows:
target_link_libraries(fm-torchknob
    PRIVATE
    juce::juce_audio_utils

    # ### add this line ###
    "${TORCH_LIBRARIES}"

    PUBLIC
    juce::juce_recommended_config_flags
    juce::juce_recommended_lto_flags
    juce::juce_recommended_warning_flags)
That instructs CMake to link your project against the libtorch library. Then
if you are on Windows, you need to add this:
if (MSVC)
    file(GLOB TORCH_DLLS "${TORCH_INSTALL_PREFIX}/lib/*.dll")
    add_custom_command(TARGET fm-torchknob
        POST_BUILD
        COMMAND ${CMAKE_COMMAND} -E copy_if_different
        ${TORCH_DLLS}
        $<TARGET_FILE_DIR:fm-torchknob>)
endif (MSVC)
Ensure you set ‘fm-torchknob’ to the same value as your project name (which
you specify in the target_link_libraries command).
At this point, attempt a build on your project to verify you have those commands set correctly. CMake should run without errors, and you should be able to build and run the resulting project. Since you have not changed the code, merely adding libtorch to the build, the plugin will be functionally identical to the FM superknob plugin.
An important thing to note is that you can no longer use the JuceHeader.h as
an include. The reason is that at the time of writing, the JuceHeader.h file calls
using namespace juce, which places many things in the global namespace. Torch
also puts many things in the global namespace, some of which have the same name
as the ones placed there by JUCE. This causes compiler problems. So instead of
including the entire JuceHeader.h you should include the individual headers from
the JUCE modules folder. For example, here is how I have my includes set up in
the PluginEditor.h file:
// not this:
// #include <JuceHeader.h>
// But this:
#include <juce_gui_basics/juce_gui_basics.h>
#include <juce_audio_utils/juce_audio_utils.h>
You will need to figure out which headers to include to access certain parts of
the JUCE API. The JUCE documentation tells you where each class is defined.
Note that I am using std::cout since JUCE’s DBG macro cannot handle printing tensors like this. You should see some output like this:
JUCE and torch 0.1046
[ CPUFloatType{1,1} ]
In this chapter, I will provide instructions on how to set up your Python environment for the work in later chapters. We will only use Python for training neural networks and some analytical work. We will use C++ to build applications. When Python code is needed, I will not explain it in such great detail as the C++ code because machine learning in Python is very well covered in other books, unlike the C++ content. I will explain the principles behind the code and provide working example code in the code repository. Only some of the examples in the book require Python code.
If you see the message printed out, you are ready to go.
If, for some reason, you do not have Python installed on your Mac, you can
install it with homebrew, as recommended for the CMake install previously. You
probably do not need to do this if Python is installed. You can even select partic
ular versions:
myk@Matthews-Mini ~ % brew install python@3.10
myk@Matthews-Mini ~ % /opt/homebrew/bin/python3.10
Python 3.10.13 (main, Aug 24 2023, 22:36:46) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Lets make some brains")
learning and audio processing, you must install additional packages to your machine. Python packages are equivalent to C++ libraries. I will show you how to
install Python packages after I explain how to set up a virtual environment in the
next section.
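As a preview of that section, which is not included in this excerpt (so the book's exact commands may differ), a typical sequence on macOS or Linux is to create a virtual environment, activate it, and then install packages with pip:

python3 -m venv venv
source venv/bin/activate
pip install torch numpy jupyter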
FIGURE 6.1
A Jupyter notebook in action. There are cells containing Python code which you
can execute. If you trigger a plot command, the plot will be embedded in the
worksheet.
You can access documentation within the shell. Go and find yourself an IPython tutorial to learn some more tricks.
Jupyter Notebook is a browser-based Python environment. It fires up a web
application you can interact with via your web browser. Many people like to work
in the notebook environment as it allows you to create blocks of code, display
graphs inline, etc. You can even add markdown blocks to your notebook. To run
the system, assuming you have installed Jupyter with pip:
jupyter-notebook
This will print out a load of messages, and then it will probably open your web
browser, pointed at the URL: http://localhost:8888/tree. If it does not open the
web browser, just open it yourself and visit that address.
You will see a listing of the folder you were in in your shell when you ran the
command. You might want to quit it and restart it from your working folder to
make it easier to find your files. You can create a new notebook and then start
entering and running code. See figure 6.1 for an example of a notebook in action.
This section provides solutions for common problems people encounter when setting up their development environment and building applications.
Problem The IDE does not recognise anything about JUCE when I generate
a project from Projucer.
Solution This can happen if you generate a project with Projucer and later
move the JUCE files to a different location. To verify your setup, try generating
a new project from Projucer, checking the module setup and then building it. If
that works, you can return to your non-working project in Projucer and determine
what differs from the newly generated project.
Problem Visual Studio complains about the v143 (or higher) toolkit not being
installed when you try to build a project.
Solution Re-run the Visual Studio Installer and select the C++ packages with
143 (or higher) in their titles. Or you can choose a different platform toolset from
Projucer’s Visual Studio exporter configuration.
Problem The IDE build completely fails on a newly created Projucer project.
Solution This can happen if you do not have the appropriate components
installed with your IDE. For Visual Studio, you need the components shown in
the instructions earlier in this section. For Xcode, you need to install the Xcode
command line tools to work with C++. For Linux, you need the build-essential
(Debian) or equivalent for your distribution.
Problem Building with the libtorch / JUCE combination complains that a reference to ‘nullopt’ is ambiguous.
Solution This is caused by including the complete JuceHeader. JuceHeader
calls ‘using namespace juce’, which puts all JUCE definitions and such in the
‘global’ namespace. libtorch also puts some things there that then clash with the
JUCE ones. The solution is to include the individual JUCE headers you need
instead of the complete JuceHeader.h file.
Problem The compiler does not seem to understand basic code syntax, such
as initialiser lists on constructors.
Solution I have observed this when building projects on a Mac using Visual
Studio Code. It can be caused by the default C++ version being pre-C++11.
Before C++11, initialiser lists and other syntax were not part of the standard.
The solution is to tell the build tools which C++ version you want to use. Add
the following to your CMakeLists.txt file:
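(The exact snippet is not reproduced in this excerpt; one common way to do it, assuming you are happy to require C++17, is shown below.)

target_compile_features(project-name PRIVATE cxx_std_17)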
You should change project-name to the project name you set in the CMake
project command.
8 Basic plugin development
In this chapter, you will build a simple sine wave synthesizer plugin. This process
will show you some essential aspects of plugin development. You will learn about
the processBlock function and how to fill up a buffer with numbers representing
the signal you want your plugin to generate. You will learn about user interface
widgets and how to edit your plugin’s user interface. This will involve implementing
event listeners and layout code. At the end of the chapter, you will have a simple
plugin up and running.
Now add a function prototype (the specification for a function) to the private
section of the class defined in PluginProcessor.h:
...
private :
double getDPhase ( double freq , double sampleRate ) ;
...
Open up PluginProcessor.cpp and find the constructor. You will initialise the
phase, dphase and frequency variables here:
TestPluginAudioProcessor::TestPluginAudioProcessor
...
{
phase = 0;
dphase = 0;
frequency = 440;
}
...
Or, if you want to use a more modern C++ style, you can use an initialiser
list:
TestPluginAudioProcessor::TestPluginAudioProcessor
...
) // at the end of the brackets
  // put the initialiser list
: phase{0}, dphase{0}, frequency{440}
{
    // then remove the phase = 0 etc.
    // things from inside the constructor
}
...
Then the implementation of the getDPhase function tells us how much the
phase of the sine generator changes per sample and allows us to synthesize a sine
tone with the correct frequency for the current audio system configuration.
1 double TestPluginAudioProcessor::getDPhase(double freq, double sampleRate)
2 {
3     double two_pi = 3.1415927 * 2;
4     return (two_pi / sampleRate) * freq;
5 }
Let’s clarify what that code is doing. The sine wave gets through its complete
cycle in 2π. So if you compute the output of the sine function with values rising
from 0 to 2π (about 6.3) and plot it, you will see the complete sine wave. The
input to the sine function is called the phase. The top half of figure 8.1 illustrates
how the sine function varies with phase.
But digital audio systems are discrete – they go in steps. In fact, we have
‘sampleRate’ steps in one second, e.g. 44,100 steps. So to complete the sine wave
in one second, we need 44,100 values. So we slice 2π into 44,100 steps, which is how
much the phase changes in each step. For a 1Hz sine wave with the audio system
FIGURE 8.1
dphase depends on the sample rate (the space between the samples) and the
frequency (how fast you need to get through the sine wave).
sample rate set to 44,100Hz, dphase is 2π/44100. That is the first part of line 4 in the
getDPhase function. The lower half of figure 8.1 illustrates a discrete version of
the sine function.
What happens if the frequency is higher? Higher frequency means the sine wave
goes through its cycle faster so dphase is higher. Double the frequency, double the
dphase. So that is why we scale the 1Hz dphase by the frequency in the second
part of line 4 in the getDPhase function.
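For example, at 440Hz with a 44,100Hz sample rate, dphase = (2π / 44100) × 440 ≈ 0.0627 radians per sample.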
You can see that it receives an audio buffer and a MIDI buffer. In the case of
a synthesizer, the audio buffer provides a place to write the audio output of the
synthesizer. The MIDI buffer provides a place to receive MIDI messages that the
synthesizer needs to process.
For the sine synthesizer, the processBlock function needs to write a sine tone
into the buffer. Remember that this function will automatically be called repeatedly
by the audio host so that it can receive audio from your plugin. For now, we
will use the std::sin function to generate the sine wave. There are more efficient
ways to do it than this one, but it will be efficient enough for now. We will start
with a one-channel sine tone – noting that the plugin is probably running in stereo.
Usually, the default processBlock code that Projucer generates for a blank
project looks something like this:
1 juce::ScopedNoDenormals noDenormals;
2 auto totalNumInputChannels = getTotalNumInputChannels();
3 auto totalNumOutputChannels = getTotalNumOutputChannels();
4 for (auto i = totalNumInputChannels; i < totalNumOutputChannels; ++i) {
5     buffer.clear(i, 0, buffer.getNumSamples());
6 }
Line 1 looks a little odd, but essentially it prevents extremely small floating-point
values (denormals) from causing problems. Then the code iterates over the
channels in the sent buffer and clears them out so they are silent. Add the following
code after those lines in the default processBlock code:
1 for (int channel = 0; channel < totalNumOutputChannels; ++channel)
2 {
3     if (channel == 0) {
4         auto* channelData = buffer.getWritePointer(channel);
5         int numSamples = buffer.getNumSamples();
6         for (int sInd = 0; sInd < numSamples; ++sInd) {
7             channelData[sInd] = (float)(std::sin(phase) * 0.25);
8             phase += dphase;
9         }
10     }
11 }
Read that code carefully – what do you think it is doing? Can you make the
sine tone louder? Be careful if you are wearing headphones. Try running your
plugin in the JUCE AudioPluginHost.
Challenge: can you make the sine tone work in stereo? You could use an array
of phases, one for each channel, or can you do it with a single phase variable?
When you are done experimenting, keep a copy of the code with the constantly
playing sine tone, and prepare a new copy where you can continue working on it.
You will now make the sine tone interactive by adding a GUI control for the
frequency. There are two parts to this: adding a widget to the user interface and
then adding the code to respond to the widget. Adding a widget is a three-stage
process:
1. Add a variable to the private area of the PluginEditor class in PluginEditor.h:
private :
...
juce :: Slider freqControl ;
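Steps 2 and 3 are not reproduced here; as a rough sketch (the range and bounds values are placeholders), they amount to making the slider visible in the PluginEditor constructor and positioning it in resized():
// in the PluginEditor constructor
addAndMakeVisible(freqControl);
freqControl.setRange(100.0, 1000.0);

// in PluginEditor::resized()
freqControl.setBounds(10, 10, getWidth() - 20, 30);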
Carry out these steps, then build and run it. You should see the slider displayed
on the user interface. The JUCE Slider class reference can be found here1 . Now
you have the slider displaying, you need to respond to it. You must implement
the SliderListener interface in your PluginEditor to respond to a slider. This is a
four-step process:
1. Specify that your class inherits from the Slider::Listener. In PluginEdi
tor.h:
class TestPluginAudioProcessorEditor :
    public juce::AudioProcessorEditor,
    public Slider::Listener
1 https://docs.juce.com/master/classSlider.html
2 https://docs.juce.com/master/classSlider_1_1Listener.html
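Steps 2 and 3, which are not shown here, declare and implement the callback that Slider::Listener requires; a sketch of what they typically look like:
// 2. in PluginEditor.h, inside the class declaration
void sliderValueChanged(juce::Slider* slider) override;

// 3. in PluginEditor.cpp, a minimal implementation for now
void TestPluginAudioProcessorEditor::sliderValueChanged(juce::Slider* slider)
{
    DBG("Slider value " << slider->getValue());
}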
4. Now tell the slider that we want to listen to it: in the PluginEditor.cpp,
constructor:
freqControl . addListener ( this ) ;
If you run this code, you should see messages printed to the console when
you move the slider. The next step is to change the parameters on the synthesis
system so it changes the frequency of the sine tone according to the value set
on the slider. You must establish communication between the PluginEditor and
the PluginProcessor to achieve that. First, you need a public function on the
PluginProcessor, which the PluginEditor can call to change the frequency. Add
the following function signature to the public section of PluginProcessor.h:
void updateFrequency(double newFreq);
Can you remember the name of the variable that we need to change to make
the sine wave go faster? Can you remember how we set the initial frequency for
the sine wave? Check out the code in prepareToPlay. You need to do the same
thing in updateFrequency. So, in PluginProcessor.cpp, add an implementation for
updateFrequency:
1 void TestPluginAudioProcessor::updateFrequency(double newFreq)
2 {
3     frequency = newFreq;
4     dphase = getDPhase(frequency, getSampleRate());
5 }
We stored the updated frequency in case we needed it later and changed the
dphase variable to reflect the new frequency. The final step is to call this function
from the PluginEditor when the slider moves, so back to PluginEditor.cpp:
1 void TestPluginAudioProcessorEditor::sliderValueChanged(Slider* slider)
2 {
3     if (slider == &freqControl) {
4         // get the slider value and do something
5         DBG("Slider value " << slider->getValue());
6         audioProcessor.updateFrequency(slider->getValue());
7     }
8 }
So, instead of simply returning true or false, it checks a property set in the
CMakeLists.txt file. Look in the CMakeLists.txt file for this line:
NEEDS_MIDI_INPUT TRUE
You can simply return true or false, but the polite JUCE way is to edit CMakeLists.txt.
If you update CMakeLists.txt, you might need to regenerate your IDE
project. The IDE should auto-detect the change and ask if you want to reload.
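For reference, the function being discussed is the acceptsMidi function in the generated PluginProcessor.cpp; in a typical JUCE template it looks roughly like this (your generated file may differ):
bool TestPluginAudioProcessor::acceptsMidi() const
{
   #if JucePlugin_WantsMidiInput
    return true;   // set via NEEDS_MIDI_INPUT TRUE in CMakeLists.txt
   #else
    return false;
   #endif
}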
Now your plugin is ready to receive MIDI. Let’s take a look at the processBlock
function signature:
void processBlock ( juce :: AudioBuffer < float >& , juce :: MidiBuffer &)
override ;
You can see it receives a MidiBuffer object. This will contain any incoming
MIDI data your plugin might want to process. We shall start by just printing out
the messages. Add this code, which uses a foreach loop to iterate over the incoming
MIDI messages, to the end of your processBlock function:
1 for (const auto metadata : midiMessages) {
2     auto message = metadata.getMessage();
3     DBG("processBlock:: Got message " << message.getDescription());
4 }
FIGURE 8.3
Printing descriptions of MIDI messages coming into a plugin in Standalone mode
from a USB controller keyboard (left) and MIDI coming from an on-screen piano
keyboard in AudioPluginHost (right).
Now you need to send it some MIDI. If you have a MIDI input device such as
a USB controller keyboard, you can run the plugin in Standalone mode, select the
USB device from the options and check if messages are printed out when you press
the keys. Figure 8.3 illustrates this scenario. You can also use the AudioPluginHost
to send MIDI to your plugin if you do not have a USB MIDI controller. Create
the graph shown in figure 8.3, where the MIDI input block in the host is wired
to the MIDI input of the plugin. You might need to scan for new plugins to add
your plugin to the list. Click on the buttons on the on-screen piano keyboard.
Now we have MIDI coming into the plugin, we need to decide how to respond
to it. Going back to our MIDI parsing code, let’s put some logic in there to update
the frequency in response to a note-on message:
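The listing itself did not survive in this extract; a sketch of what it could look like, numbered so that line 8 matches the description below (the book's actual listing may differ in detail):
1 for (const auto metadata : midiMessages) {
2     auto message = metadata.getMessage();
3     DBG("processBlock:: Got message " << message.getDescription());
4     if (message.isNoteOn()) {
5         // convert the MIDI note number (0-127) to a frequency in Hz
6         frequency = juce::MidiMessage::getMidiNoteInHertz(
7             message.getNoteNumber());
8         dphase = getDPhase(frequency, getSampleRate());
9     }
10 }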
Read the code carefully. You will see that it uses the isNoteOn function to check
for MIDI note-on messages. It uses the getMidiNoteInHertz function to convert
the MIDI note from the range 0–127 to a frequency value. On line 8, it updates
the dphase value using the frequency extracted from the note-on message.
9
FM synthesizer plugin
In this chapter, you will develop a plugin with a more advanced sound synthesis
algorithm: frequency modulation synthesis (FM). This FM synthesis plugin will be
helpful for later work in the book, where you will use machine-learning techniques
to control the plugin. I will show you how to implement a two-oscillator FM
synthesis algorithm with a modulator and a carrier oscillator. I will explain how
the modulation index and depth variables change the timbre. You will add a slider-
based user interface to the plugin, allowing you to control the synthesis engine.
I will then show you how to expose plugin parameters to the host environment,
which is one way that plugins integrate more deeply with plugin hosts. Then, we
will see how to use those plugin parameters in the synthesis algorithm and how
to control them via the user interface. At the end of the chapter, you will have a
useful monophonic FM synthesis plugin.
Then initialise them in the initialiser list, which you can find just above the
constructor body in PluginProcessor.cpp – I am setting the index and depth as I
did in the expressions above:
1 ... mod_phase {0} , mod_dphase {0} , mod_index {0.5} , mod_depth {100}
2 // constructor body follows
3 {}
Now to set mod_dphase based on the frequency (f0) and mod_index (mi) variables.
In PluginProcessor.cpp’s updateFrequency function, you will see lines that
set up the original carrier oscillator, so add some more for the modulator oscillator:
1 frequency = newFreq ;
2 dphase = getDPhase ( frequency , getSampleRate () ) ;
3 // the mod_index is a multiplier on the base frequency :
4 mod_dphase = getDPhase ( frequency * mod_index , getSampleRate () ) ;
We are about to achieve rich FM tones. The next step is to change the synthesis
algorithm in PluginProcessor.cpp’s processBlock function. You should see
something like this, which writes the output of a sine function to the channel data
buffer:
1 channelData [ sInd ] = ( float ) ( std :: sin ( phase ) * amp ) ;
2 phase += dphase ;
Change that block to this:
1 channelData [ sInd ] = ( float ) ( std :: sin ( phase ) * amp ) ;
2 mod = std :: sin ( mod_phase ) ;
3 mod *= mod_depth ;
4 dphase = getDPhase ( frequency + mod , getSampleRate () ) ;
5 phase += dphase ;
6 mod_phase += mod_dphase ;
Now build and run – you should hear a richer, bassy tone. This is because the
modulation index is 0.5, so the modulator runs at half the base frequency. You can
experiment with different values for mod_index and mod_depth now, or you can
move on to the next section, wherein we add slider controls for those parameters.
FIGURE 9.1
Simple FM plugin with sliders for frequency, modulation index and modulation
depth.
If you managed to get this all working, you should see a user interface like
FIGURE 9.2
Showing plugin parameters for the Surge XT synthesiser using AudioPluginHost.
figure 9.1. The repo guide provides the code for this working FM plugin in example
39.2.7.
FIGURE 9.3
Showing plugin parameters for the FM plugin using AudioPluginHost.
You should see these parameters appearing when you look at the plugin pa
rameters in AudioPluginHost, as shown in figure 9.3. But how do you make those
parameters influence the synthesis algorithm? Add the following to processBlock
in PluginProcessor.cpp before the loop that fills up the channel buffer:
1 mod_dphase = getDPhase ( frequency * static_cast < double >(* modIndexParam
) , getSampleRate () ) ;
2 mod_depth = static_cast < double >(* modDepthParam ) ;
FIGURE 9.4
Showing the custom UI for the FM plugin (right), the auto-generated parameter
UI (middle) and AudioPluginHost (left).
Remember the mod_dphase is the speed the modulator oscillator goes at (so it
dictates its frequency). We are reading in the modIndexParam, using some fancy
syntax to convert it to a double, and then using it to update mod_dphase.
Following that, we set up the mod_depth variable using the modDepthParam
parameter. This works because mod_depth can be used directly from the modDepthParam
parameter instead of being calculated from it, as mod_dphase is.
Note that the original GUI sliders we set up do not influence the algorithm any
more. You can change the two setters we created earlier in PluginProcessor.cpp
(setModIndex and setModDepth), so they change the parameters instead of the
variables.
1 void TestPluginAudioProcessor::setModIndex(double newIndex)
2 {
3     *modIndexParam = (float)newIndex;
4 }
5 void TestPluginAudioProcessor::setModDepth(double newDepth)
6 {
7     *modDepthParam = (float)newDepth;
8 }
Figure 9.4 shows the result, which is that the plugin has two possible interfaces:
1) a custom user interface, defined in PluginEditor.h, which has the original set
of controls we created and 2) a generic interface that the host auto-generates by
querying the parameters on the plugin. You can find the completed FM plugin in
example 39.2.8 in the repo guide. I will make one comment here about efficiency.
Calling std::sin might not be the most efficient way to generate a sine signal –
JUCE has classes in its library that provide better oscillators. But for now, with
two oscillators running like that, it is efficient enough and makes the code more
straightforward.
FIGURE 10.2
Linear regression finds the straight line that best fits some data. Important features
of the line are the point at which it intercepts the y-axis and the slope gradient.
network. The aim for this chapter therefore is to explain what regression is and
how to do it using neural networks.
The logical next question is – how do you work out where to put the line?
First, you can consider that you can specify a line as follows:
y = mx + b
y is the y coordinate, x is the x coordinate, m is the gradient (slope) and
b is the intercept on the y axis. These values are illustrated in Figure 10.2. So the
main job of a linear regression function is to compute m and b. If you have m
and b, you can calculate y for any x you like. The best line will have the smallest
total error between itself and the set of known data points. If the points are not
on a straight line, the best line will not go through all the points, so the linear
regression function has to minimise the error between the points and the line.
Once you have estimated the intercept and gradient values b and m that define
the line, you can estimate any y given an x value. Simple!
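To make ‘minimise the error’ concrete, here is a small, self-contained sketch (not from the book) that computes m and b for a set of points using the closed-form least-squares formulas; the neural network approach later in the book arrives at similar values iteratively.
#include <iostream>
#include <utility>
#include <vector>

int main() {
    // some noisy points roughly on the line y = 2x + 1
    std::vector<std::pair<float, float>> xys{
        {0, 1.1f}, {1, 2.9f}, {2, 5.2f}, {3, 6.8f}, {4, 9.1f}};
    float sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    float n = (float)xys.size();
    for (const auto& xy : xys) {
        sumX += xy.first;
        sumY += xy.second;
        sumXY += xy.first * xy.second;
        sumXX += xy.first * xy.first;
    }
    // closed-form least squares for a straight line
    float m = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    float b = (sumY - m * sumX) / n;
    std::cout << "m: " << m << " b: " << b << std::endl; // close to 2 and 1
    return 0;
}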
FIGURE 10.3
Linear regression with two lines, allowing the estimation of two parameters given
a single ‘meta-controller’ input control. The x-axis represents the control input
and the y-axis shows the settings for the two parameters the control is mapped
to.
Now you have some feel for the way a neural network might model a linear relationship
between inputs and outputs, it is time to dive into an implementation
of this using libtorch. This chapter provides instructions for implementing linear
regression with a simple neural network using libtorch. It assumes that the development
environment is set up, including an IDE, JUCE, CMake and libtorch. The
implementation involves creating a new project, generating a dataset, adding noise
to the dataset, creating a neural network model, converting the dataset to tensors,
computing the error, learning from the error using the optimiser, and running the
training loop. So you are going to be busy. The result of working through this will
be a trained neural network that can predict y values from x values with a low
error rate. The instructions also include challenges to extend the implementation
to higher-dimensional datasets and non-linear mappings.
Your main.cpp file should be the same as the minimal libtorch example:
1 // I have added some extra includes that we will need later
2 #include <vector>
3 #include <iostream>
4 #include <random>
5 #include <torch/torch.h>
6
7 int main() {
8     torch::Tensor tensor = torch::rand({2, 3});
9     std::cout << tensor << std::endl; // print the tensor to check libtorch works
10     return 0;
11 }
Note the extra includes I added at the top. We will need those shortly. I will
not repeat the CMakeLists.txt file here for brevity, but make sure you have the
extra lines for libtorch as seen in listing 39.2.10 in the repo guide.
std::vector is a growable array. std::pair is a pair of values. std::vector<
std::pair<float, float>> specifies a growable array containing pairs of floats.
In other words, our x,y values.
Now the implementation:
1 float y {0} , dX {0};
2 std :: vector < std :: pair < float , float > > xys ;
3 dX = ( endX - startX ) / count ; // change in x
4 for ( float x = startX ; x < endX ; x += dX ) {
5 y = (m * x) + b;
6 xys . push_back ({ x , y }) ;
7 }
8 return xys ;
Read it carefully. Can you write some tests to verify that it generates the
numbers as expected? Here is an example of a test wherein I ask for 10 values with
x in the range of 0 to 10. Notice how I access the first x value with xys[0].first.
To access the y value, you can use xys[0].second. This is because each item in the
array is a std::pair, and we access the values in the pair with ‘.first’ and ‘.second’.
So to the test – if the first value is not 0, print an error:
1 std :: vector < std :: pair < float , float > > xys ;
2 xys = getLine (5.0 , 0.5 , 0.0 , 10.0 , 10.0) ;
3
4 if ( xys [0]. first != 0.0) {
5 std :: cout << " start value was not zero as expected " < < std :: endl ;
6 }
I like to use informal unit testing when developing number-processing programs.
You will see more of that as we work through it.
For less experienced C++ programmers, note that we are sending in the ‘xys’
parameter, which is a vector of pairs. Also, I added the ampersand character &
after the type to use ‘pass by reference’. This means that C++ will pass in a
reference to the xy values instead of making a copy and passing that. Using the
ampersand avoids the costly copy operation, but we must remember that the
function now has direct access to the original version of the data sent to it.
Here is an implementation using the random functions available in the standard
library:
1 std::default_random_engine generator;
2 std::uniform_real_distribution<float> distribution(low, high);
3 auto rand = std::bind(distribution, generator);
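The remainder of the listing is not included in this extract; a sketch of the whole function, assuming (from the call in the next listing) that it is named addNoiseToYValues and modifies the y values in place:
// needs <random> and <functional>
void addNoiseToYValues(std::vector<std::pair<float, float>>& xys,
                       float low, float high)
{
    std::default_random_engine generator;
    std::uniform_real_distribution<float> distribution(low, high);
    auto rand = std::bind(distribution, generator);
    for (auto& xy : xys) {
        xy.second += rand(); // nudge the y value only
    }
}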
Creating a new random number generator every time the function gets called
is probably inefficient, but it keeps things simple for now. Try your function out
in combination with the printXYs function that you hopefully wrote earlier:
1 int main()
2 {
3     std::vector<std::pair<float, float>> xys;
4     xys = getLine(5.0, 0.5, 0.0, 10.0, 5);
5     printXYs(xys);
6     addNoiseToYValues(xys, -0.1, 0.1);
7     printXYs(xys);
8 }
I see output like this with my version of the printXYs function:
1 x: 0, y: 5
2 x: 2, y: 6
3 x: 4, y: 7
4 x: 6, y: 8
5 x: 8, y: 9
6 x: 0, y: 4.9
7 x: 2, y: 5.92631
8 x: 4, y: 7.05112
9 x: 6, y: 7.99173
10 x: 8, y: 9.00655
1. The neural network model that will learn to predict y values from x
values.
2. Code to convert our x,y data into tensors so the network can understand
it.
3. The error metric that will compute the error over the dataset.
4. The optimiser that will use the computed error to update the neural
network weight and bias.
5. The training loop that will iteratively compute and learn from errors.
1 https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
If you are working on Windows, you must ensure you are building in the same
mode as the libtorch library you are using. If you are using the Debug version of
libtorch, build in Debug mode, and the same for Release mode.
So we have two parameters – 0.3090 and -0.2448. Earlier I said that a single
‘node’ in a feed-forward neural network scales the input by a weight w then adds
a bias b. Let’s pass some input to the network to verify that is indeed what it
is doing. The following code passes a single value through the neural network by
calling it as if it were a function. Add the code after your parameter printing code:
1 auto input = torch :: empty ({1 , 1}) ;
2 input [0][0] = 0.5;
3 auto output = net ( input ) ;
4 std :: cout << " Passing 0.5 in ... this came out :\ n "
5 << output << std :: endl ;
You might be wondering what empty is and what the data type of ‘input’ is.
You cannot pass simple floats directly into a libtorch neural network – you have to
wrap the floats inside a data structure called a tensor. Empty allows you to create
an empty tensor of a certain shape (1x1 in this case). Then you can put data into
the tensor using array-like syntax – check line 2. More on tensors in a bit but let’s
just see what comes out of the other end of the network. On my machine I see
output like this, noting that the parameters are different every time I run it:
1 0.8999
2 [ CPUFloatType {1 ,1} ]
3 0.3680
4 [ CPUFloatType {1} ]
5 Passing 0.5 in ...
6 0.8179
7 [ CPUFloatType {1 ,1} ]
Let’s verify we know what is going on. The input x was 0.5; the output y was
0.8179; the weight parameter w was 0.8999 and the bias parameter b was 0.3680.
Use a calculator to confirm that y = wx + b.
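Working it through: 0.8999 × 0.5 + 0.3680 = 0.81795, which matches the 0.8179 printed above once rounded.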
The first iteration of the parameter printing loop prints six numbers. This is
because each of the two inputs is connected to each of the three outputs, requiring
six weights. Six connections, six weights. Each of those three outputs will sum its
weighted inputs. The second iteration prints a further three numbers. Those are
the biases applied to the outputs.
dimensional matrix containing elements of a single data type.”. If you were to put
data types in order of increasing complexity, you would start with a single float,
then a vector of floats, then a matrix and finally, a tensor.
The following lines create a tensor suitable for storing your vector of x,y pairs:
1 torch :: Tensor in_t ;
2 in_t = torch :: empty ({( long ) xys . size () , 1}) ;
Note that I specified the data type this time, instead of using auto. That makes
it clear we are working with a Tensor variable. Then copy the x coordinates into
the input tensor:
1 for ( int i =0; i < xys . size () ; ++ i ) {
2 in_t [ i ][0] = xys [ i ]. first ; // first is x
3 }
Now get ready to be impressed by how flexible neural networks are – we can
pass the entire input dataset of x values into the network in one go, and it will
compute the entire output dataset of its estimated y values:
1 torch :: Tensor out_t = net ( in_t ) ;
2 std :: cout << " output : " << out_t << std :: endl ;
We call our ‘net’ variable as if it is a function, passing it the input tensor. ‘net’
returns another tensor containing the network’s guesses as to what the outputs
should be. Here is an example of the output I see from the above code:
1 output : -0.7382
2 0.6678
3 2.0737
4 3.4796
5 4.8856
6 [ CPUFloatType {5 ,1} ]
Notice that the shape (5 by 1) and type (CPUFloat) of the tensor and the
data it contains are printed.
torch::mse_loss is how we calculate the loss. Try calculating the loss and then
looking at the difference between the correct outputs and the outputs from the
network. Can you figure out how MSE is being computed? Clue: MSE stands for
mean squared error.
Note how we pass the network parameters to the optimiser when we create it.
To use the optimiser to train the network, we can use the following code:
1 // compute output
2 out_t = net ( in_t ) ;
3 // compute loss
4 torch :: Tensor loss_t = torch :: mse_loss ( out_t , correct_t ) ;
5 // reset the previous network parameter changes
6 optimizer . zero_grad () ;
7 // compute changes required to network parameters
8 loss_t . backward () ;
9 // update network parameters
10 optimizer . step () ;
This code represents training for a single epoch. Training for one epoch means
you have calculated the error over the entire dataset. With larger datasets, the
dataset is split into batches and the network is trained on one batch at a time.
This makes it possible to train on larger datasets than you can store in memory.
The linear regression dataset is very small so you can process the entire thing in
a single batch.
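For completeness, here is a rough sketch of what a batched version of the loop above could look like, reusing the in_t, correct_t, net and optimizer names; it is not needed for this small dataset:
// a sketch only -- the tiny linear regression dataset fits in one batch
// (std::min needs <algorithm>)
int epochs = 1000;
int64_t batchSize = 16;
for (int epoch = 0; epoch < epochs; ++epoch) {
    for (int64_t start = 0; start < in_t.size(0); start += batchSize) {
        int64_t end = std::min(start + batchSize, in_t.size(0));
        torch::Tensor inBatch = in_t.slice(0, start, end);
        torch::Tensor correctBatch = correct_t.slice(0, start, end);
        optimizer.zero_grad();
        torch::Tensor loss_t = torch::mse_loss(net(inBatch), correctBatch);
        loss_t.backward();
        optimizer.step();
    }
}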
3 https://pytorch.org/docs/stable/generated/torch.range.html
4 https://pytorch.org/docs/stable/generated/torch.rand.html
12
The meta-controller
This chapter introduces the AI-music system called the meta-controller, which you
will be building for the next few chapters. The meta-controller system allows you
to control many parameters on one or more instruments and effects simultaneously
using a set of powerful ‘meta-controls’. The meta-controller is a variant of
Rebecca Fiebrink’s Wekinator, built using C++ and the libtorch machine learning
library; the difference is that the meta-controller is specialised for controlling
standard plugin instruments and effects. This chapter describes the original Wekinator
system and its workflow, as well as describing some previous variants that
people have created such as the learner.js library. At the end of the chapter you
should have a clearer idea about Wekinator and the meta-controller’s interactive
machine-learning workflows.
plugin at a time, providing new modes of interaction with your plugins and breaking
free from the limitations of graphical user interfaces and standard controllers.
for folks developing innovative musical instruments. A few days before our performance,
we discovered that our stream would be fed into a real Shanghai club
containing real people we could not even see! Luckily nothing crashed too badly
despite having three people madly editing the same file in real-time.
Returning to the central theme, we will build a version of Wekinator using
C++ and the libtorch machine learning library. Our version will not be as general
as Wekinator, and instead, it will be specialised for control of standard plugin
instruments and effects. So it will be a plugin host that can learn how to control
the plugins in response to different input types.
FIGURE 12.2
The Wekinator workflow: data collection, training, inference then back to data
collection.
lable happens when the user manipulates the controller inputs, creative work can
happen.
13
Linear interpolating superknob
In this chapter, you will get started on the meta-controller system. The first step
is implementing something similar to the ‘superknob’ found on Yamaha synthesizers
such as the MODX+. The superknob described here allows you to control
two synthesizer parameters on a simple FM synthesizer using a single knob. It
does so using simple interpolation. This will lay the groundwork for the meta-controller,
which allows you to control all parameters on any plugin using a single
user interface control.
13.1 FM synthesizer
You should start with the two-parameter FM synthesizer shown in figure 13.1,
which can be found in repo-guide 39.2.9. To recap the features of this synthesizer:
1. Two sliders control modulation index and modulation depth
2. A toggle switch enables either constant playing or enveloped playing
3. A piano keyboard allows playing notes on the synthesizer
FIGURE 13.1
User interface for the simple two-parameter FM synthesizer. The toggle switch
switches between drone and envelope mode, the two sliders control modulation
depth and index and the piano keyboard allows you to play notes on the synthesizer.
Make a copy of the project and edit the CMakeLists.txt file so it has your own
‘company name’ and plugin name/code. Build and run it in standalone mode to
verify it works correctly.
FIGURE 13.2
Superknob UI on the left. On the right is a closer view of a range slider. Small
triangles above and below the line allow the user to constrain the range of the
main slider control.
I made the superknob quite large for my user interface and placed it at the
top of the interface, as shown in figure 13.2. You can design it however you like.
Now you have the superknob widget displayed on the user interface, you need
to implement the listener code to respond to it and to cause a change to the two
parameters. In the PluginEditor.cpp code, locate the sliderValueChanged function.
Add an if statement to check if the incoming slider value is equal to the address
of your superknob widget. In my case, the widget’s variable is called superKnob,
so I have some code like this:
1 if ( slider == & superKnob ) {
2 // The user adjusted the superknob
3 }
Now your job is to take the value from the superknob widget in the range 0-1
and map it to the range specified by each of the sliders. This is just like the equation
for a line: y = mx + b, where y is the value for the slider (i.e. modulation index
or modulation depth), and x is the value of the superknob widget. To calculate
m (also known as the weight ), you need to work out the range for the target
slider by subtracting its lowest allowed value from its highest permitted value. To
calculate b (also known as the bias), just use the getMinValue function:
1 double high = modIndexSlider.getMaxValue();
2 double low = modIndexSlider.getMinValue();
3 double weight = high - low; // m
4 double bias = low; // b
getMaxValue tells you where the user placed the high constraint on the three-
point slider. getMinValue tells you where the user placed the low constraint. Now,
you can plug weight and bias into the equation and send the value to the target
slider:
1 double superV = superKnob.getValue(); // x
2 double newV = (superV * weight) + bias; // y
3 modIndexSlider.setValue(newV);
Repeat the code for the other slider, called modDepthSlider, in the FM synthesizer
example code. You should now be able to move the superknob dial and
see the two sliders moving in their constrained ranges. You now have your own
superknob.
controls and linear interpolation. You should have an FM synthesizer plugin with
a working superknob feature that controls two parameters at the same time in
ranges that the user can specify.
14
Untrained torchknob
FIGURE 14.1
The torchknob system architecture.
1 // create a 1 x1 tensor
2 torch :: Tensor in = torch :: empty ({1 , 1}) ;
3 // Copy the superknob value into the tensor
4 in [0][0] = superKnob . getValue () ;
5 // Pass the tensor through the linear layer
6 torch :: Tensor out = linear ( in ) ;
7 // print the result ( DBG does not work )
8 std :: cout << out << std :: endl ;
9 // Extract the result from our tensor
10 // with out [0][1]. item < float >()
11 // which specifies the position and
12 // data type you want to convert the tensor
13 // data into
14 modDepthSlider.setValue(out[0][0].item<float>());
15 modIndexSlider.setValue(out[0][1].item<float>());
Try moving the superknob around. You should see the two synthesis parameter
sliders moving. I say ‘should’ because the network starts off with random weights
and biases, so the two sliders might not move much or at all because the values
coming out of the network are out of range of the slider. Each time you run
the program, the network weights and biases (parameters) will be different. Keep
re-running until you get some movement.
1 # pragma once
2
3 # include < torch / torch .h >
4
5 class NeuralNetwork : torch :: nn :: Module {
6 public :
7 NeuralNetwork ( int64_t n_inputs , int64_t n_outputs ) ;
8 std :: vector < float > forward ( const std :: vector < float >& inputs ) ;
9     void addTrainingData(
10 std :: vector < float > inputs ,
11 std :: vector < float > outputs ) ;
12 void runTraining ( int epochs ) ;
13 private :
14 int64_t n_inputs ;
15 int64_t n_outputs ;
16 torch :: nn :: Linear linear { nullptr };
17 torch :: Tensor forward ( const torch :: Tensor & input ) ;
18
19 };
You should now write comments above each function to explain what it does.
Having written your documentation, it is time to turn your attention to the
NeuralNetwork.cpp file. This is where we make good on the promises in the header
file. You need to add NeuralNetwork.cpp to the CMakeLists.txt file in the
target_sources command so that the new class is included in the build. Here is an
example of a CMake target_sources command which adds the NeuralNetwork.cpp
file to the build for a target called fm-torchknob:
1 target_sources(fm-torchknob
2     PRIVATE
3     src/PluginEditor.cpp
4     src/PluginProcessor.cpp
5     src/NeuralNetwork.cpp)
You might add the header files to the list as well, if that is appropriate for
your setup. I have seen people adding header files when working in Visual Studio
Community as it makes them easier to find in the project. Now back to the
NeuralNetwork.cpp file, here is an implementation for the NeuralNetwork constructor
which takes two 64-bit integers for the input and output size and uses them to
initialise a Linear layer:
1 NeuralNetwork :: NeuralNetwork ( int64_t _n_inputs , int64_t _n_outputs )
2 : n_inputs { _n_inputs } , n_outputs { _n_outputs }
3 {
4     linear = register_module(
5 " linear " ,
6 torch :: nn :: Linear ( n_inputs , n_outputs )
7 );
8 }
You can see here that I am calling register_module, which is a function inherited
from the torch Module class. This allows the Module parent class to keep track
of the layers (modules) we are adding to the network. Later you might use the
functions provided by Module to do various useful operations on the network as
a whole, which is why you need to register the modules.
Now the rest of the functions – here is some placeholder code, for now, just to
get it to compile:
1 std :: vector < float > NeuralNetwork :: forward (
2 const std :: vector < float >& inputs )
3 {
4 std :: vector < float > out = {0 , 0};
5 return out ;
6 }
7
8 void NeuralNetwork::addTrainingData(
9 std :: vector < float > inputs ,
10 std :: vector < float > outputs )
11 {
12
13 }
14
15 void NeuralNetwork :: runTraining ( int epochs )
16 {
17
18 }
19
20 torch :: Tensor NeuralNetwork :: forward ( const torch :: Tensor & input )
21 {
22 torch :: Tensor out = linear ( input ) ;
23 return out ;
24 }
To make sure compiling and linking are working, add a NeuralNetwork to the
private area of the PluginEditor.h file:
1 NeuralNetwork nn {1 , 2};
Then in the sliderValueChanged function, put this in the block that responds
to the superknob:
1 // call forward on NeuralNetwork
2 std :: vector < float > nn_outs = nn . forward (
3 // pass it a vector containing the superknob value
4 std :: vector < float >{( float ) superKnob . getValue () }) ;
5
6 // use the return data to set the slider values
7 modDepthSlider.setValue(nn_outs[0]);
8 modIndexSlider.setValue(nn_outs[1]);
This code calls forward on our new NeuralNetwork class, passing it the superknob
value and uses the output to set the values on the other two sliders.
Build and run to check for any mistakes. Note that the public forward function on
NeuralNetwork currently returns two zeroes – it does not actually use the neural
network model to infer the slider values.
Add a new file to your src folder called test_nn.cpp. Put the following in that
file:
1 # include " NeuralNetwork . h "
2 # include < iostream >
3
4 int main ()
5 {
6 NeuralNetwork nn {2 , 2};
7 std :: cout << nn . forward ({0.1 , 0.5}) << std :: endl ;
8 return 0;
9 }
As you can see, we are just creating a NeuralNetwork object and then calling
forward on it, passing in some values. To add this to your CMake project, add the
following lines to the end of CMakeLists.txt:
1 add_executable(test_nn
2     src/NeuralNetwork.cpp
3     src/test_nn.cpp)
4 target_link_libraries(test_nn "${TORCH_LIBRARIES}")
5 set_property(TARGET test_nn PROPERTY CXX_STANDARD 14)
When you re-run CMake to regenerate the project, you should have another
executable target you can build. In VSCode, this is accessible from the dropdown
menus for the build and run buttons. You can now choose if you want to do quick
tests of the NeuralNetwork class by editing test_nn.cpp or if you want to test the
full integration with the FM synthesizer. You can add a series of test functions to
the test_nn.cpp file.
1. Essentially, it makes a tensor for the input data, copies the vector data
to the tensor, passes the tensor to the network, copies the output of the
network back out to a vector and returns it.
2. The code is generalised, so it can deal with any number of inputs and
any number of outputs.
3. It uses some dynamic memory allocation to create the tensors and vec
tors. This is definitely not the most efficient way to implement this code
but this code will run at ‘GUI’ speed as opposed to audio speed so it
does not need to be highly optimised yet.
4. A more efficient way would be to create the vectors and tensors as class
data members and re-use them, but that would complicate the code,
and I want to keep it as clear as possible for now.
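The implementation of this public forward function fell outside this extract; a sketch that matches the four points above (the book's version may differ in detail):
std::vector<float> NeuralNetwork::forward(const std::vector<float>& inputs)
{
    // make a tensor and copy the input vector into it
    torch::Tensor inT = torch::empty({1, n_inputs});
    for (int64_t i = 0; i < n_inputs; ++i) {
        inT[0][i] = inputs[(unsigned long)i];
    }
    // pass it through the private, tensor-based forward function
    torch::Tensor outT = forward(inT);
    // copy the network's outputs back into a plain vector
    std::vector<float> outputs;
    for (int64_t i = 0; i < n_outputs; ++i) {
        outputs.push_back(outT[0][i].item<float>());
    }
    return outputs;
}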
Next, the private forward function:
1 torch :: Tensor NeuralNetwork :: forward ( const torch :: Tensor & input )
2 {
3 torch :: Tensor out = linear ( input ) ;
4 return out ;
5 }
FIGURE 14.2
Basic architecture where a linear layer passes into a softmax layer. The numbers
in the brackets indicate input and output shape. The linear layer input (2,1) goes
from 1 value to 2 nodes, then output (2,2) goes from 2 nodes to 2 outputs.
Now try re-running the test_nn program a few times. You should see different
outputs each time. Can you figure out why the outputs differ each time? Are these
outputs in suitable ranges for the sliders?
Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)     (14.1)
In other words, the softmax value of output x_i is the exponential of that output
divided by the sum of the exponents of all outputs. Let’s run through a calculation
to make it really clear. Say we had two outputs with values 10.2 and 1.5. We take
the exponent of those values: exp(10.2) ≈ 26903.2 and exp(1.5) ≈ 4.48. So the
softmax of 10.2 is:
exp(10.2) / (exp(10.2) + exp(1.5)) = 26903.2 / 26907.7 ≈ 0.9998     (14.5)
and the softmax of 1.5:
exp(1.5) / (exp(10.2) + exp(1.5)) = 4.48 / 26907.7 ≈ 0.0002     (14.6)
Since the sum of the softmax of all outputs is 1, softmax is typically used
to convert output layer values into probabilities. This is useful for classification,
which is a common task for neural networks. For example, does a given image
contain a dog or a cat? In classification tasks, the outputs represent the network’s
estimation of the probability of an input belonging to a given class, and softmax
is a handy layer that converts outputs in any range to outputs in the range 0-1,
totalling 1. So you can use softmax as a convenient processing layer to put the
outputs in predictable ranges. With the NeuralNetwork class, it is fairly easy to
add new layers. Here are the steps to add a new layer:
Step 1: Add a field of the appropriate type for your layer to the private section
of the NeuralNetwork class in NeuralNetwork.h. For a softmax layer:
torch::nn::Softmax softmax{nullptr};
Step 2: Call register module in the constructor of NeuralNetwork:
1 softmax = register_module(
2 " softmax " , // name it
3 torch :: nn :: Softmax (1) // configure it
4 );
Note that softmax takes one argument to its constructor, and a ‘1’ is appropriate
for our purposes. According to the PyTorch documentation, this constructor
argument ‘dim’ is ‘A dimension along which softmax will be computed (so every
slice along dim will sum to 1).’
Step 3: Pass the output of the linear layer through the softmax layer in the
NeuralNetwork’s private forward function. Here is the new, complete forward
function:
1 torch :: Tensor NeuralNetwork :: forward ( const torch :: Tensor & input )
2 {
3 std :: cout << " forward input " << input << std :: endl ;
4 torch :: Tensor out = linear ( input ) ;
5 std :: cout << " forward after linear " << out << std :: endl ;
6 out = softmax ( out ) ;
7 std :: cout << " forward after softmax " << out << std :: endl ;
8 return out ;
9 }
Note that I have added some print statements to the code so we can check how the
data flows through the network. You can see a visualization of the neural network
architecture in figure 14.2.
You can now return to your JUCE synthesizer and test the new neural network.
Remember that the network outputs are in the range [0..1], so you should scale
them into the correct ranges for the sliders.
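A sketch of that scaling, assuming the same slider and network variable names as before and using each slider's own range via getMinimum/getMaximum:
std::vector<float> nn_outs = nn.forward(
    std::vector<float>{(float)superKnob.getValue()});
// map each 0..1 network output into the slider's range
double low = modDepthSlider.getMinimum();
double high = modDepthSlider.getMaximum();
modDepthSlider.setValue(low + nn_outs[0] * (high - low));
low = modIndexSlider.getMinimum();
high = modIndexSlider.getMaximum();
modIndexSlider.setValue(low + nn_outs[1] * (high - low));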
15
Training the torchknob
The chapter begins by explaining how to train the neural network that powers
the torchknob. First, the focus is on preparing the user interface for training by
adding data input knobs and function buttons. Then, the process of gathering
a training dataset by capturing input and output values from the user interface
is demonstrated. After that, the chapter describes how the training data is con
verted to tensor format and stored. The implementation of the training loop is
then discussed, wherein a Stochastic Gradient Descent optimiser is used to up
date the neural network parameters. Additionally, the chapter reconsiders using
the softmax layer for output normalization. It suggests replacing it with a sig
moid activation function, the output normalization method utilised in the original
Wekinator and learner.js systems.
FIGURE 15.2
User interface mockup for trainable superknob system (left). We have an additional
knob to specify training input without triggering the movement of the sliders.
Actual user interface prototype (right).
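The first lines of the listing below fell outside this extract; they create the input tensor in the same way as the output tensor, roughly like this, before the listing resumes at line 9:
// inside NeuralNetwork::addTrainingData(std::vector<float> inputs,
//                                       std::vector<float> outputs)
torch::Tensor inT = torch::from_blob(
    (float*)(inputs.data()),
    inputs.size()
// ... the original listing continues below with ).clone();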
9 ) . clone () ;
10 // same process for the outputs
11 torch :: Tensor outT = torch :: from_blob (
12 ( float *) ( outputs . data () ) ,
13 outputs . size ()
14 ) . clone () ;
15 trainInputs . push_back ( inT ) ;
16 trainOutputs . push_back ( outT ) ;
The from_blob function allows you to create a tensor directly from the data
in the vector, which is a bit more efficient in lines of code and memory allocation
than manually copying the vector’s data into the tensor. The clone function copies
the data from the vector to ensure that data belongs to the trainInputs vectors
(otherwise, the memory storing the data can be reassigned). As before, drop down
to your test_nn.cpp file to quickly test that things are working as expected. Use
printouts to verify the data from the sliders ends up in the training tensors. The
test file is handy because JUCE’s DBG function cannot handle printing tensors.
When I was debugging this code, it helped me to verify data coming into the
network.
Another useful technique for testing and managing errors is assertion. To verify
that the data being sent to the addTrainingData function has the correct shape,
you can add the following two lines to the addTrainingData function in
NeuralNetwork.cpp:
1 assert ( inputs . size () == ( unsigned long ) n_inputs ) ;
2 assert ( outputs . size () == ( unsigned long ) n_outputs ) ;
That code will trigger a crash if the vectors have the wrong size. Better than
limping on and crashing out later with an intractable bug. Note that I had to cast
the n_inputs variable from its initial type of int64_t into an unsigned long to avoid
a compiler warning. Tensor sizes are int64_t but vector sizes are unsigned longs.
Note that I am using a unique_ptr here. This allows me to decide which parameters
the optimiser receives from its constructor after I have created the neural
network model. In fact, I only know what the parameters are once I create the
model by calling the register_module function in the neural network’s constructor.
So I have to dynamically create the SGD. I cannot use this style as I did for the
network layers:
1 torch :: optim :: SGD optimiser { nullptr };
SGD, unfortunately, does not have a constructor with that signature. If you
are not very experienced with C++, you may not be familiar with unique_ptr or
even pointers. What you need to know now is that pointers were inherited by
C++ from the C language, and they allow programs to request memory from
the operating system when they are running, as opposed to requesting all needed
memory at the start. If you use a pointer to allocate memory, you must also
return that memory to the system. Otherwise, you have a memory leak, where
your program takes memory but does not return it. Pointers are notorious for
striking fear and confusion into the hearts of novice C++ programmers, and they
can cause all kinds of problems. Unique_ptr and other so-called ‘smart pointers’
were invented in a later version of C++ to help solve some of these problems.
Smart pointers automatically free the memory they use once they go out of scope.
In other words, they behave more like regular variables with the added capability
of runtime memory allocation. You will see smart pointers such as unique_ptr and
shared_ptr popping up in many of the JUCE example applications that come with
the JUCE developer kit.
I am using a smart pointer here as it allows me to create an SGD variable
without assigning an actual SGD object to it. This is helpful as I can only create
an SGD object if I know the parameters for the neural network it will optimise.
So with my smart pointer, I can create the object after I have created the neural
network model in the NeuralNetwork class constructor. Here is some code that uses
the std::make_unique function to create an SGD object wrapped in a unique_ptr.
Put the code at the end of the NeuralNetwork constructor in NeuralNetwork.cpp:
1 optimiser = std :: make_unique < torch :: optim :: SGD >(
2 this - > parameters () , .01) ;// params , learning rate
Note how I call the parameters function on ‘this’. Remember that our
NeuralNetwork class inherits from the torch Module class. This means it inherits a
set of functions from Module. One of those functions is ‘parameters’, and it returns
the parameters of the registered modules in a form that the optimiser can
understand. ‘this’ refers to the current NeuralNetwork object. I could just have
called ‘parameters’ without ‘this’, but I prefer this more explicit syntax as it tells
you parameters is a function inside the current object, not something global or
otherwise in the namespace somewhere.
Next, the actual runTraining function. This is where things get somewhat
complex as we first convert the training data into a large tensor, then use it in
combination with the optimiser to improve the weights:
1 // Push inputs to one big tensor
2 torch :: Tensor inputs = torch :: cat ( trainInputs )
3 . reshape ({( signed long ) trainInputs . size () , trainInputs [0]. sizes ()
[0]}) ;
4 // Push outputs to one big tensor
5 torch :: Tensor outputs = torch :: cat ( trainOutputs )
6 . reshape ({( signed long ) trainOutputs . size () , trainOutputs [0]. sizes ()
[0]}) ;
7 // Run the training loop
8 for ( int i =0; i < epochs ; ++ i ) {
9 // Clear out the optimizer
10 this - > optimiser - > zero_grad () ;
11 auto loss_result = torch :: mse_loss ( forward ( inputs ) , outputs ) ;
12 float loss = loss_result . item < float >() ;
13 std :: cout << " iter : " << i << " loss " << loss << std :: endl ;
14 // Backward pass
15 loss_result . backward () ;
16 // Apply gradients
17 this - > optimiser - > step () ;
18 }
Again, I recommend dropping to your test_nn.cpp file and trying a simple test
program. Here is an example of a simple testing program:
1 int main ()
2 {
3 NeuralNetwork nn {1 , 2};
4 for ( float i =0; i <10; ++ i ) {
5     nn.addTrainingData({i/10}, {i/5, i/3});
6 }
7 nn . runTraining (10) ;
8 return 0;
9 }
Satisfy yourself that the data is flowing around the system correctly with prints
and tests in the test function, then return to the JUCE application and verify that
it also works correctly.
FIGURE 15.3
Example of an experiment you can carry out. First, set the training slider to its
lowest value, the same for the modulation controls. Add a training point. Then
move to the middle positions, and add a training point. Finally, move to the
highest positions, and add a training point.
FIGURE 15.4
The learner.js/ Wekinator regression architecture (top). The simpler architecture
we used previously (bottom).
small amount of data. In other examples in the book, I will demonstrate the full
and proper process of training a neural network with a non-interactive workflow.
How many epochs does it take you to train in the test program? My test
program takes around 5–6,000 epochs, and the main torchknob program takes
about the same number. However, this depends on how many data points you
have to feed it. Try adding more layers to the network or increasing the ‘hidden
layer’ size. Adding a reset button that deletes the training data and randomises
the network weights would be helpful.
16
Plugin meta-controller
This chapter covers the steps involved in adapting the torchknob FM synthesizer
into a plugin host. This means we can move from the torchknob concept, wherein
a neural network learns to control a fixed synthesizer’s parameters, to the meta-controller
concept, wherein a neural network can control any plugin synthesizer
available. The steps covered are removing the FM synthesizer UI and DSP code,
adding code to load a VST plugin synthesizer, and hooking the plugin into the
processBlock function so it can generate audio and process MIDI.
1 JUCE_PLUGINHOST_VST3=1
That will enable plugin hosting capability in your application. Now generate
the project, build and run to verify things are ready.
If it all goes terribly wrong, you can refer to the project 39.3.5.
1 pluginFormatManager.addDefaultFormats();
2 int currInd{0}, vstInd{0};
3 for (const juce::AudioPluginFormat* f : pluginFormatManager.getFormats()) {
4     if (f->getName() == "VST3") {
5         vstFormatInd = currInd;
6         break;
7     }
8     currInd++;
9 }
Now everything is set for the loadPlugin function. Here is a minimal implementation
of loading a plugin:
1 // do not call processBlock when loading a plugin
2 suspendProcessing(true);
3
4 // remove any previously read descs
5 // so we can just use index 0 to find our desired one later
6 pluginDescriptions.clear();
7 bool added = knownPluginList.scanAndAddFile(
8     pluginFile.getFullPathName(),
9     true,
10     pluginDescriptions,
11     *pluginFormatManager.getFormat(vstFormatInd));
12
13 juce::String errorMsg{""};
14 pluginInstance = pluginFormatManager.createPluginInstance(
15     *pluginDescriptions[0], // 0 since we emptied the list at the top
16     getSampleRate(), getBlockSize(), errorMsg);
17
18 // get the plugin ready to play
19 pluginInstance->enableAllBuses();
20 pluginInstance->prepareToPlay(getSampleRate(), getBlockSize());
21 // re-enable processBlock
22 suspendProcessing(false);
If the variable ‘errorMsg’ is empty after you call createPluginInstance, the plugin
should be ready to use.
To test out loadPlugin, locate a plugin on your system. There are default locations
for plugins and these vary depending on your operating system. You can
query the default location using the JUCE VST3PluginFormat’s getDefaultLocationsToSearch
function1. Find a VST3 plugin using your operating system’s file
browser or terminal and get the full file path. Then call loadPlugin in
AudioProcessor.cpp’s prepareToPlay function, as that is called when the audio system is
ready to go. If you call it somewhere else, e.g. the constructor, you might not know
the sample rate or the block size yet, and you need those to create the plugin.
Your plugin should load, and the appropriate messages you coded earlier should
appear.
This code is a good start – you can now host a plugin and hear its output.
Code example 39.3.6 in the repo guide provides a fully working example.
1 https://docs.juce.com/develop/classVST3PluginFormat.html
now. You can find AudioPluginHost in the extras folder of the JUCE distribution.
Build it and run it. Figure 17.1 presents a simple example of a graph you
can create with the AudioProcessorGraph. In that example are MIDI in and out
nodes and audio in and out nodes. Two plugin nodes are embedded in the graph:
a synthesizer and an effects unit.
Both plugins receive MIDI from the MIDI-in node, allowing them to be controlled.
The MIDI-in is also wired to the MIDI-out – a kind of MIDI-thru for
those familiar with outboard synthesizers. The audio-in is not wired to anything.
Audio from the synthesizer node passes to the effects unit node and then to the
audio-out.
The AudioPluginInstance variables are unique_ptrs and the graph nodes are
Node::Ptrs. More on that shortly.
The code created a set of unique_ptr types, then called std::move to hand
over the unique_ptrs to the nodes. We are getting into some gnarly C++ smart
pointer territory here. I mentioned earlier that smart pointers manage dynamically
allocated memory to ensure it is returned to the operating system when it is no
longer needed. Unique_ptrs do this by only allowing one part (or one scope) of the
program to access the unique_ptr at any time. Here, only the class we defined the
variables in can access them. They cannot be passed around. If you want another
part of the program to access a unique_ptr, you can pass it over with the move
function. That is happening here: we create the unique_ptrs, then hand them over
to the node objects, which take over managing them. After calling move, this part
of the program can no longer access the unique ptr variables.
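To make the hand-over concrete, here is a minimal, self-contained sketch (not from the host project) of what std::move does with a unique_ptr:

#include <memory>
#include <utility>

struct Widget {};

int main() {
    std::unique_ptr<Widget> owner1 = std::make_unique<Widget>();
    // hand ownership over to owner2
    std::unique_ptr<Widget> owner2 = std::move(owner1);
    // owner1 is now empty (nullptr); owner2 deletes the Widget when it goes out of scope
    return 0;
}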
Going back to the host project code, read it carefully – can you spot any variables we need to initialise? Did you spot that audioProcGraph still needs to be initialised? Add this to the initialiser list on the constructor:
// e.g. just before #endif
, audioProcGraph{ new juce::AudioProcessorGraph() }
To complete the discussion of unique ptr, can you see the two ways we have instantiated the data stored in unique ptrs? First, we instantiated the AudioPluginInstance variables, such as midiOutputProc, by calling std::make_unique. Then, we instantiated the AudioProcessorGraph using an initialiser list. In other words, we called the constructor of the AudioProcessorGraph’s unique ptr and passed it a pointer to an AudioProcessorGraph. Sheesh! The following short program illustrates these two ways of instantiating unique ptrs, in case you re-encounter them:
#include <memory>

class Test {
public:
    Test() {}
};

int main() {
    // method 1: instantiate by passing a 'real pointer'
    // to the constructor
    std::unique_ptr<Test> myTest{ new Test() };
    // method 2: instantiate by calling make_unique:
    std::unique_ptr<Test> myTest2;
    myTest2 = std::make_unique<Test>();
}
Note how I used the pointer syntax to call the addNode function: instead of audioProcGraph.addNode it is audioProcGraph->addNode. So far so good – we have a node sitting in the graph. Now connect the node to the appropriate input and output nodes by adding this code to the end of the addPluginToGraph function:
// connect the node to the output
audioProcGraph->addConnection({
    {pluginNode->nodeID, 0},
    {outputNode->nodeID, 0}});
// This will crash if it's mono... check if you want!
audioProcGraph->addConnection({
    {pluginNode->nodeID, 1},
    {outputNode->nodeID, 1}}); // right channel
You should now be able to load the test plugin and hear it via the graph.
To make it work, you need to add a private variable to the PluginEditor.h file:
juce::FileChooser fChooser{"Select a plugin."};
The code uses an anonymous function as an asynchronous callback. If you are familiar with JavaScript, you will recognise this pattern. The idea is that the GUI thread is not blocked while the user browses for a file, as the file browsing happens asynchronously. Once the user has selected a file, the code prints out the file path and then attempts to load it using the AudioProcessor’s loadPlugin function. The fileChooserFlags specify what the user is allowed to select. The problem is that plugins do not present themselves as files on all platforms. On some platforms they present themselves as directories.
If you want your application to be cross-platform, you need to specify different flags for different platforms. macOS and Linux plugins are directories. Windows
plugins are files. So you need to set the flags using some JUCE macros to check
which platform you are on.
#ifdef JUCE_LINUX
auto fileChooserFlags = juce::FileBrowserComponent::openMode |
                        juce::FileBrowserComponent::canSelectDirectories;
#endif
#ifdef JUCE_MAC
auto fileChooserFlags = juce::FileBrowserComponent::openMode |
                        juce::FileBrowserComponent::canSelectFiles;
#endif
#ifdef JUCE_WINDOWS
auto fileChooserFlags = juce::FileBrowserComponent::openMode |
                        juce::FileBrowserComponent::canSelectFiles;
#endif
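With the flags in place, the asynchronous call itself might look something like the following sketch – fChooser and fileChooserFlags come from the snippets above, and I am assuming here that loadPlugin accepts the chosen juce::File:

fChooser.launchAsync(fileChooserFlags,
    [this](const juce::FileChooser& chooser)
    {
        juce::File chosen = chooser.getResult();
        DBG("User selected: " << chosen.getFullPathName());
        audioProcessor.loadPlugin(chosen);
    });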
The final functional change we will make to our plugin host before we switch back
to working on the machine learning side is to display the user interface for the
plugin. This chapter explains how to display the user interface for any plugin.
One of the features of the various plugin APIs (VST3, AudioUnit, etc.) is that
they allow the plugin to have a custom user interface. This is the interface you
see when you load the plugin into your DAW. As you should know by now, JUCE
allows you to write a custom GUI for your plugins using the JUCE GUI widgets.
Commercial and other plugins have their own ways of implementing the GUI,
including animated graphics, 3D, etc. How can we show this user interface using
JUCE? The solution presented here is to pop up a free-floating window containing
the user interface. To achieve that, you are going to explore the JUCE API a little
more and define a new class.
FIGURE 18.2
User interface for the host with the Surge XT plugin user interface showing in a
separate window.
is also a plugin. This is why they all have processBlock, prepareToPlay and so on
– they inherited them from AudioProcessor.
Another thing that AudioProcessor classes have is a set of functions for retrieving the user interface. Check out your PluginProcessor.h – there is a function override called createEditor. Look at the implementation in PluginProcessor.cpp – it creates and returns an instance of the editor class defined in your PluginEditor files. In my PluginProcessor.cpp file, the createEditor function contains one line like this:
return new PluginHostEditor(*this);
If I switch to my PluginEditor.h, I can see that it defines a PluginHostEditor class whose constructor takes a reference to the processor:
PluginHostEditor(PluginHostProcessor&);
So whenever a plugin host wants to see the user interface of my plugin, somehow
it calls createEditor and createEditor calls the constructor of the user interface
class and returns it.
What is the base class of the editor? In my PluginEditor.h file, the class inheritance is defined as follows:
class PluginHostEditor : public juce::AudioProcessorEditor,
                         public juce::Button::Listener
So in fact it has multiple base classes. The one we are interested in here is AudioProcessorEditor. If you look at the class hierarchy for AudioProcessorEditor, you will find it inherits from Component. It is a Component, just like any other Component in JUCE, such as a TextButton or a Slider, so you could display it directly on your user interface. Similarly, if you called createEditor on any external plugin you have loaded, you would receive an AudioProcessorEditor object and it too would be a Component. So if you want to display the user interface for a plugin, you could just treat it as a Component and place it directly into your interface window.
But embedding the plugin UI in your main window is not a good idea: the
lifetime of plugin UI components is a little odd, so you do not want to rely on it
as a part of your main UI. The plugin UI can have a completely different aesthetic
to your UI, so it does not really make sense to simply place it in your UI. And the
plugin’s user interface will have its own preferred size. In the following section,
you will find out how to create a separate window for the plugin UI.
To show the hosted plugin’s UI, you first need to retrieve the plugin’s AudioProcessor from the graph. Once you have access to an AudioProcessor representing the plugin, you can call its createEditor function to gain access to the plugin UI.
So in PluginProcessor.h, add a new public function which will ultimately return
an AudioProcessorEditor object representing the plugin’s UI:
juce::AudioProcessorEditor* getHostedPluginEditor();
Then the implementation in PluginProcessor.cpp:
1 juce::AudioProcessorEditor* PluginHostProcessor::getHostedPluginEditor()
2 {
3   if (pluginNode) {
4     return pluginNode->getProcessor()->createEditorIfNeeded();
5   }
6   else {
7     return nullptr;
8   }
9 }
On line 3 the code checks if a pluginNode is available. This will only be the case if you have already loaded a plugin via the loadPlugin button. If the pluginNode is available, it calls getProcessor on it, which will return a pointer to the AudioProcessor inside the node, or in other words, the plugin. This is odd since earlier you wrapped the plugin in a smart unique ptr and now it is being converted back to a regular bare pointer. I think at this point you just need to be a good citizen and not call delete on that pointer. So how do you get the UI from an AudioProcessor? As mentioned earlier, you call createEditor, or better, createEditorIfNeeded, which will only call createEditor if necessary. Right, so you have access to the AudioProcessorEditor for your hosted plugin. Now you need to get that displayed in a window.
Now you should be able to use the PluginWindow class to open a window
showing the plugin user interface. In PluginEditor.h, add this to the private fields:
std::unique_ptr<PluginWindow> pluginWindow;
Then in PluginEditor.cpp, implement the code to create the window when the
user presses the show UI button:
if (btn == &showPluginUIButton) {
    if (!audioProcessor.getHostedPluginEditor()) {
        DBG("showPluginUIButton no plugin");
        return;
    }
    if (pluginWindow) {
        DBG("showPluginUIButton window already open");
        return;
    }
    DBG("showPluginUIButton creating new gui");
    pluginWindow =
        std::make_unique<PluginWindow>
            (audioProcessor.getHostedPluginEditor());
}
Note that the code deals with various possible scenarios that can occur: there
is no plugin loaded yet, the plugin is loaded but the window already exists and
the plugin is loaded and the window does not exist.
Try it out! You should see something like figure 18.2, which shows the Surge
XT user interface rendered into a PluginWindow alongside the host user interface.
Of course, you will see the user interface for your plugin of choice.
Declaring a function as pure virtual is how a C++ base class forces you to implement things. For example, the Button::Listener that you have been inheriting from has its buttonClicked function declared as pure virtual (= 0), forcing you to implement it in your listener.
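The PluginWindowListener class itself is not shown in this extract; a minimal sketch of such an interface, using the pure virtual pattern just described, might look like this:

// PluginWindowListener.h – a sketch of a listener interface with a single
// 'close button clicked' notification
class PluginWindowListener
{
public:
    virtual ~PluginWindowListener() = default;
    // pure virtual: any class inheriting this must implement it
    virtual void pluginCloseButtonClicked() = 0;
};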
Back to the task at hand, add a private variable to the PluginWindow class:
PluginWindowListener* listener{nullptr}; // initialise to nullptr so the check below is safe
And a function to assign a listener (you can put this into PluginWindow.h):
void addPluginWindowListener(
    PluginWindowListener* _listener)
{
    listener = _listener;
}
Then notify the listener when the PluginWindow’s close button is pressed in
PluginWindow.cpp:
void PluginWindow::closeButtonPressed()
{
    if (listener) {
        listener->pluginCloseButtonClicked();
    }
}
Now you need to create a Listener that can listen to the close event and do
the things necessary to remove the window. Inherit from PluginWindowListener
in the PluginEditor:
class PluginHostEditor : public juce::AudioProcessorEditor,
                         public PluginWindowListener,

... in the public section:
void pluginCloseButtonClicked() override;
...
Add the implementation in PluginEditor.cpp:
void PluginHostEditor::pluginCloseButtonClicked()
{
    // trigger delete of unique_ptr-owned object
    pluginWindow.reset();
}
The last step is to register the PluginHostEditor as a listener. In PluginEditor.cpp’s buttonClicked function, after you create the window, register as a listener:
pluginWindow = std::make_unique<PluginWindow>(audioProcessor.getHostedPluginEditor());
pluginWindow->addPluginWindowListener(this);
This works because the plugin wrapper in JUCE, the AudioProcessor class1, is plugin-format agnostic. In other words, it uses the same set of parameter control functions for multiple plugin formats. Some formats have slightly more complex ways of arranging parameters, such as groups of related parameters. That makes the plugin querying code a little more complicated.
The following code prints a list of plugin parameters and their values. Add it
to PluginProcessor.h and cpp as a private function:
void PluginHostProcessor::printPluginParameters()
{
    if (pluginNode) {

        juce::Array<juce::AudioProcessorParameter*> params =
            pluginNode->getProcessor()->getParameters();

        for (const juce::AudioProcessorParameter* p : params) {
            DBG("p: " << p->getName(100) << " : " << p->getValue());
        }
    }
}
The ‘100’ parameter for getName tells it to automatically crop the name to 100
characters. You have to pass this parameter. Add the code to your PluginProcessor
and call printPluginParameters after loading a plugin. Here is an extract of the
output I see when I load the Dexed plugin:
p: Cutoff : 1
p: Resonance : 0
p: Output : 1
p: MASTER TUNE ADJ : 0.5
p: ALGORITHM : 1
p: FEEDBACK : 1
p: OSC KEY SYNC : 1
p: LFO SPEED : 0.353535
p: LFO DELAY : 0
p: LFO PM DEPTH : 0
p: LFO AM DEPTH : 0
p: LFO KEY SYNC : 1
p: LFO WAVE : 0
p: TRANSPOSE : 0.5
p: P MODE SENS. : 0.428571
p: PITCH EG RATE 1 : 1
p: PITCH EG RATE 2 : 1
At the end of the output, I see many parameters I did not expect, for example:
p: MIDI CC 15|125 : 0
p: MIDI CC 15|126 : 0
p: MIDI CC 15|127 : 0
p: MIDI CC 15|128 : 0
1 https://docs.juce.com/master/classAudioProcessor.html
If you develop a plugin using JUCE, JUCE adds extra parameters by default. I
believe the purpose of these parameters is to expose the plugin to MIDI CC control
somehow. But for now, we need to ignore them as they are not the core parameters
we wish to manipulate. You can simply add a filter to the print function to remove
these special parameters:
for (const juce::AudioProcessorParameter* p : params) {
    if (!p->getName(100).contains("MIDI CC")) {
        DBG("p: " << p->getName(100) << " : " << p->getValue());
    }
}
That creates a neural network with one input and as many outputs as the plugin has parameters. You will need a variable called nn, of type NeuralNetwork, in PluginEditor.h’s private section. Verify that your code builds and runs. You cannot train the network yet, as you must also create training data from the plugin.
Write your own implementation, referring to the parameter querying code you
Write your own implementation, referring to the parameter querying code you
can find in the printPluginParameters function above.
When the user clicks the add button, call addTrainingData on the NeuralNet
work. Here is the signature for the addTrainingData function:
void addTrainingData(
    std::vector<float> inputs,
    std::vector<float> outputs);
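As a rough sketch of what your implementation might look like – torchKnob and getHostedPluginParameters are placeholder names I am assuming here, not necessarily what the book project uses – you could read the torchknob value as the input and the hosted plugin’s parameter values as the outputs:

// collect one training example when the user clicks the add button
std::vector<float> inputs{(float) torchKnob.getValue()};
std::vector<float> outputs{};
// getHostedPluginParameters is a hypothetical helper on the processor that
// exposes the hosted plugin's parameter list to the editor
for (const juce::AudioProcessorParameter* p :
         audioProcessor.getHostedPluginParameters()) {
    if (!p->getName(100).contains("MIDI CC")) {
        outputs.push_back(p->getValue());
    }
}
nn.addTrainingData(inputs, outputs);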
You should now be able to adjust the torchknob, adjust the parameters on
the plugin and add training data. The effects of this will all be invisible until you
activate the other dial on the right-hand side of the interface.
When the user moves the torchknob, you will need to take the network’s outputs and send them to the plugin’s parameters. Create a public function in PluginProcessor.h:

void setHostedPluginParamValues(std::vector<float> values);
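One possible implementation sketch follows – the book’s version may differ, for example in how it skips JUCE’s extra MIDI CC parameters:

void PluginHostProcessor::setHostedPluginParamValues(std::vector<float> values)
{
    if (!pluginNode) return;
    auto params = pluginNode->getProcessor()->getParameters();
    size_t vInd = 0;
    for (juce::AudioProcessorParameter* p : params) {
        // skip the extra parameters JUCE adds to its own plugins
        if (p->getName(100).contains("MIDI CC")) continue;
        if (vInd >= values.size()) break;
        p->setValueNotifyingHost(values[vInd]);
        ++vInd;
    }
}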
In this part of the book, you will build an autonomous music improviser. The music
improviser is a plugin that receives MIDI data, models the data and generates
MIDI data in the style of the previously received input data. You can think of
interacting with the improviser as having a musical conversation with yourself.
Before we get into the details of building the improviser, in this chapter, I will
provide some background information on the field of algorithmic composition and
its related offshoot, musical agents. You will discover a vast treasure trove from
decades of research into analysing and generating music. I will then explain the
algorithm you will use to implement the improviser: the variable order Markov
model. I will compare the model to others, such as deep learning systems, and
explain why I have chosen this technique.
FIGURE 20.2
My own experience interacting with AI improvisers. Left panel: playing with Alex
McLean in Canute, with an AI improviser adding even more percussion. Right
panel: livecoding an AI improviser in a performance with musician Finn Peters.
continues until the present day, though many researchers are switching to deep
neural network methods. Pachet et al. have done some of the most sophisticated
work with Markov models, using them to carry out musical style transfer in the
Flow Machines project in the 2010s[13].
The 1990s saw the use of dynamical systems techniques, chaos and cellular
automata, e.g.[27]. The 1990s also saw algorithmic composition researchers using
evolutionary computation, the emerging trend in computational problem-solving
at the time. For example, Biles’ GenJam jazz improvisation generator[3]. As an
aside, I began experimenting with evolutionary techniques applied to sound synthesiser control during my master’s degree in the late 1990s. I reimplemented some of that work using the Web Audio API in the mid 2010s1. I then went
on to carry out my doctoral research using evolutionary techniques to program
synthesisers[48], eventually switching to deep neural network techniques in the late
2010s[49]. Automated sound synthesizer programming is not exactly algorithmic
composition, though!
Returning to the main thread, we have reached the early 2000s, which saw
the culmination of techniques such as musical grammars, genetic algorithms and
Markov modelling. This is very much a potted history – I do not want to get
too bogged down in a literature review here. For further reading, there are many
review articles available in open access, e.g. [15], and Nierhaus’s 2009 book provides plenty of detail on techniques through to the deep learning era[30]. Speaking
of deep learning, many areas of applied computer science, including algorithmic
composition, entered the deep learning era in the 2010s, and there has been a lot
of work on algorithmic composition since then. In fact, it has been an ‘endless
stream’ according to Ji et al. in their 2020 comprehensive review of deep music
generation[21]. As well as that review paper, Briot’s article series reviewing deep
learning techniques for music generation is an excellent starting point for further
reading[4].
In summary, there have been decades of work in algorithmic composition utilising a range of different computational techniques. Now that I have given you some idea of the depth and range of that work, I will zoom in on music improvisers, which are our area of interest for the next few chapters.
neural networks are technically better sequence modellers than Markov models –
this is why GPT and Bard are neural networks, not Markov models!
However, the problem with deep neural networks, and the reason I have yet to
start using them to build improvisers, is that they need a lot of data and time to
train, and they cannot take on new data in real-time. They can respond to your
input in real-time, but the underlying model will not change or learn in real-time.
To change the underlying model requires a training phase. If you consider what
the training data actually is, you might encounter some problems. For example, do
you want your improviser to improvise using a massive dataset of music written
by other people? Perhaps you do, maybe you do not, but with deep networks, you
do not have a lot of choice.
Markov models can generate interesting patterns from data gathered from
scratch in real-time from a single performance if you want, or you can feed them
larger datasets. The improviser we will build is computationally efficient and can
learn on a small dataset in real-time. Musicians I have worked with clearly feel their
own presence in the music generated by these kinds of models and enjoy interacting
with them. I am not saying you cannot achieve this with neural networks, but you
can certainly accomplish this with Markov models, so that is what we shall use.
A, A, B, A, B, A, B, B, B (20.1)
To create a Markov model, we start by calling each of those letters a state. So, at
steps 1 and 2 in the sequence, the state is ‘A’; at step 3, the state is ‘B’. Then we
talk about state transitions. From step 1 to step 2, the state transitions from A
to A. From steps 2 to 3, the state goes from A to B. It is true that A to A is not
strictly a transition, as it looks like nothing changed, but imagine you are playing
musical notes. You play an A, then another A, then a B and so on. There are 8
transitions in that sequence:
A → A, A → B, B → A, A → B, B → A, A → B, B → B, B → B (20.2)
We can refer to the state after the transition as an observation.
FIGURE 20.3
Visualisation of a two-state model on the left and the state transition probability
table on the right.
From those transitions, we can compute the probabilities for an observation given the current
state. So, state A is followed by either observation A or B. The observation A
follows state A once. State A goes to observation B three times. Therefore, the
A → A transition has a probability of 25%, and the A → B has a probability
of 75%. We can compile all of that into a state transition matrix showing the
probabilities of transition between all observed states:
      A     B
A     0.25  0.75
B     0.5   0.5          (20.3)
Figure 20.3 illustrates this in the form of a graph, with the state transition
matrix alongside for reference.
A, A, B, A, B, A, B, B, B (20.4)
Second-order modelling looks at the observations following states that are made
from two letters:
A, A → B
A, B → A
B, A → B
A, B → A
B, A → B
A, B → B
B, B → B
        A      B
A,A     0      1
A,B     0.666  0.333
B,A     0      1
B,B     0      1         (20.5)
To generate from this second-order model, you would typically choose a two-letter state from the left column, then sample from its possible next states. The next state you choose is then pushed on to the end of the previous state, and the older symbol is dropped, so you retain a two-letter state. In terms of data structures, this is a fixed-length FIFO (first in, first out) queue. With the probabilities in the matrix above, generated sequences would not be particularly interesting, as you would usually end up generating lists of Bs. Sequence modellers getting stuck in a repetitive state like this in a musical context is not desirable. Well, maybe for techno fans. I will explain how I work around that problem shortly.
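To make the generation loop concrete, here is a minimal, self-contained sketch (not using the book’s Markov library) of sampling from the second-order table with a sliding two-symbol state:

#include <deque>
#include <iostream>
#include <map>
#include <random>
#include <string>
#include <vector>

int main() {
    // second-order transition table built from the example sequence
    std::map<std::string, std::vector<std::string>> table{
        {"A,A", {"B"}},
        {"A,B", {"A", "A", "B"}},
        {"B,A", {"B", "B"}},
        {"B,B", {"B", "B"}}};
    std::mt19937 rng{std::random_device{}()};
    std::deque<std::string> state{"A", "B"}; // the two-symbol FIFO state
    for (int i = 0; i < 8; ++i) {
        const auto& options = table[state[0] + "," + state[1]];
        std::uniform_int_distribution<size_t> pick{0, options.size() - 1};
        std::string next = options[pick(rng)];
        std::cout << next << " ";
        state.pop_front();     // drop the oldest symbol
        state.push_back(next); // push the newest: still a two-symbol state
    }
    std::cout << std::endl;
}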
You can operate in whatever order you like, up to the length of your sequence minus one.
FIGURE 20.4
Visualisation of a variable order Markov model containing first and second order
states.
As the order increases, the number of observations decreases. For example, if you have nine letters in your sequence like we did, you can only make one eighth-order observation. Higher order means your model knows more about the
history of events before an event, so it might make more sensible musical decisions.
To clarify that with a musical example, imagine you were doing a listening test
where you listen to a fragment of music, then you have to guess what the next
note or chord is. Do you think you would find it easier to guess the next note with
a longer or shorter fragment?
However, higher order also means fewer observations, meaning the model has
less variation in its output. How can you make sure the model has both the
stateful, musical awareness provided by higher orders and the variation that makes
for interesting sequences? In the next section, I will describe the variable order
Markov model, which takes advantage of both the richness of lower-order models
and the understanding of higher-order models.
        A      B
A       0.25   0.75
B       0.5    0.5
A,A     0      1
A,B     0.666  0.333
B,A     0      1
B,B     0      1         (20.6)
As you can see, with the same sequence, we have the richness of the first-order
model (more observations) with the information of the second-order model (longer
memory). Figure 20.4 presents a graph-based visualisation of this model. Whilst the model looks more complex, it is just a combination of the first- and second-order models. Generation is more complicated – the algorithm starts from the highest-order state available in its generation memory and samples an observation from the matching row of the transition matrix. Consider the second-order rows again:
        A      B
A,A     0      1
A,B     0.666  0.333
B,A     0      1
B,B     0      1         (20.7)
To prevent a sequence made from only Bs, I add an extra logical test to the
algorithm: if there is only one observation in the matrix for this state, or there
are no observations at all, reduce the order and sample again. This is the trick to
avoiding repetitive loops. You only sample from rows in the matrix with at least
two options.
I think that is enough abstract thinking about Markov models for now. I am
keen to proceed with the implementation. If you want to read some analysis about
sequence prediction performance and such for musical and other sequences using
variable order Markov models, and you are ready for a technical and mathematical
deep-dive, I refer you to [2].
In this chapter, I will show you how to use a variable-order Markov model library
I have created. The library is written in C++. We will work on various command
line experiments, feeding the model different sequences and investigating how it
behaves.
FIGURE 21.1
Example of the Markov model generated by some simple code.
number,state,:number,observations.
The first number signifies the order of the state. The state is a comma-separated
list of the symbols representing the state. The second number is the number of
observations seen to follow that state. The observations are a comma-separated list
of observations. Try adding more events to the model and checking how the model
string changes. Figure 21.1 illustrates the model as it stands after running that
code. See if you can connect that diagram to your interpretation of the model’s
string representation.
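If you want to experiment along these lines yourself, a minimal command line program using the library’s putEvent and getModelAsString functions might look like this (the exact symbols you feed in are up to you):

#include <iostream>
#include "MarkovManager.h" // from the book's Markov model library

int main() {
    MarkovManager mm{};
    // feed in a short sequence of symbols
    mm.putEvent("A");
    mm.putEvent("B");
    mm.putEvent("A");
    mm.putEvent("C");
    // one line per state: order,state,:count,observations
    std::cout << mm.getModelAsString() << std::endl;
    return 0;
}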
You can pass any strings you like as the states for the model. Here is the
signature for the putEvent function from MarkovManager.h:
void putEvent(state_single symbol);
state_single is the type of the argument. If you ctrl-click on that type in your IDE, you should see this from MarkovMode.h:

typedef std::string state_single;

This means that state_single is just a renamed std::string type. So you can send in any strings you like for your symbols. How about something musical? We can add ‘#’ to the letters to specify sharps and use ‘-’ to represent a rest:
mm.putEvent("A");
mm.putEvent("B");
mm.putEvent("C#");
mm.putEvent("A");
mm.putEvent("B");
mm.putEvent("A");
mm.putEvent("C#");
mm.putEvent("-");
mm.putEvent("G#");
You might see different outputs on different runs. This is a variable Markov
model, which means it tries to find the highest-order output with an available
observation following it. You can query the model for the order of the previous
event using the getOrderOfLastEvent function:
for (auto i = 0; i < 5; ++i) {
    state_single next = mm.getEvent();
    int order = mm.getOrderOfLastEvent();
    std::cout << "Next state " << next << " order " << order << std::endl;
}
What is going on here? Why does the order increase up to 3? The MarkovManager object maintains a memory of the state as the symbols are fed in. This allows
it to build the state transition matrix. When it generates, it maintains a separate
memory, which is the generation state, so the state it will use to query the state
transition matrix. As the generation proceeds, the memory grows and can query
with increasing order. You can try sending in the needChoices flag to getEvent.
In the previous chapter, I explained how the Markov model can become repetitive
or uninteresting if you allow it to sample from transition matrix rows with only
one choice. If you set needChoices to false, it will be allowed to sample rows with
only one observation. You will achieve higher order but less varied output.
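For example, a quick comparison of the two modes might look like this sketch, assuming getEvent accepts the needChoices flag as an argument:

// needChoices true: only sample rows with more than one observation,
// trading some order for more varied output
state_single varied = mm.getEvent(true);
// needChoices false: single-observation rows are allowed, so the model can
// stay at a higher order but may get stuck in repetitive patterns
state_single strict = mm.getEvent(false);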
In this chapter, you will start building the Markov improviser plugin, which models
incoming MIDI data and generates MIDI data back out from that model in a
similar but ‘interestingly different’ style. The chapter starts with an overview of
the plugin’s main components, then explains how to prepare the JUCE project for
building the plugin. Following that we dig into pitch modelling and find out how
to access MIDI sent into the plugin, then to store it into a Markov model. You will
see how you can extract data from the MIDI messages sent to the processBlock
function, then convert them into a format that is suitable for the Markov model
library we are using. At the end of the chapter you will have a basic monophonic
improviser which can model then generate streams of notes.
[Figure: the improviser plugin contains pitch, duration, onset and velocity models, with MIDI flowing in and out.]
FIGURE 22.1
Overview of the autonomous improviser plugin. Yes, a keytar.
The project pulls the JUCE library from a folder two levels above, which will
work if you download the whole code pack. This line in CMakeLists.txt specifies
the location for JUCE:
add_subdirectory(../../JUCE ./JUCE)
The first reference to JUCE in that line should point to the location of the
full JUCE distribution on your system. The CMakeLists.txt file also specifies a
command line test project wherein you can quickly try things out with the Markov
model library. These lines set it up:
add_executable(markov-expts src/MarkovExperiments.cpp)
FIGURE 22.2
The user interface for the basic JUCE MIDI processing plugin.
Compiling the whole plugin project every time to try something out with the
underlying Markov library slows my development cycle. The markov-expts target
builds very quickly as it is separated from the build for the Markov model library
and the JUCE code, so you can use it to test and prototype things that do not
require the complete JUCE program.
Going back to the JUCE target in this project, here are some more things that
are useful to know:
1. The project is a JUCE plugin project, but you can also run it in Standalone mode
2. The plugin target has IS_MIDI_EFFECT, NEEDS_MIDI_INPUT and NEEDS_MIDI_OUTPUT set to TRUE as it will be a MIDI effect-type plugin
3. You should change the settings for COMPANY_NAME, PLUGIN_MANUFACTURER_CODE and PRODUCT_NAME as you see fit
Once you have the starter project downloaded, unpacked and configured, generate the IDE project in your usual way, open it up in your usual IDE and build it.
keyboard. Figure 22.2 illustrates the user interface for the starter project. The on-
screen keyboard can generate MIDI messages, which you can use to create data
for the Markov models.
Take a look at the user interface code file PluginEditor.cpp. You will see two functions there called handleNoteOn and handleNoteOff. In terms of C++ techniques, the MidiMarkovEditor class extends the juce::MidiKeyboardState::Listener class. That class is abstract – it has no implementation for those two functions. Therefore, any class that extends it must implement those two functions. Those functions will be called by the piano keyboard widget when the user clicks and releases the notes. So, if you want to print out the notes as they come in, you can edit the handleNoteOn function as follows:
juce::MidiMessage msg1 =
    juce::MidiMessage::noteOn(midiChannel,
                              midiNoteNumber,
                              velocity);
// add this line
DBG(msg1.getNoteNumber());
To see debug output on Windows/Visual Studio, you must enable the Immediate mode panel as described in the set-up section 3.5.1. Build and run the plugin in Standalone mode, then click on the keyboard. You should see output something like this:
77
78
79
80
81
82
The sample offset is zero for now – that is passed in from the user interface
when it calls addMidi. So now the audio processor is stashing the MIDI messages
away for later processing.
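The addMidi function itself is not shown in this extract; a minimal sketch of what it might look like, assuming it simply stores the message in the midiToProcess buffer used below, is:

// a sketch of an addMidi helper: stash a message sent from the GUI so that
// processBlock can pick it up later
void MidiMarkovProcessor::addMidi(juce::MidiMessage msg, int sampleOffset)
{
    midiToProcess.addEvent(msg, sampleOffset);
}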
We will now add any notes we have received to the pitchModel object using its
putEvent function. In the MidiMarkovProcessor::processBlock function, add the
following code:
if (midiToProcess.getNumEvents() > 0) {
    midiMessages.addEvents(midiToProcess,
                           midiToProcess.getFirstEventTime(),
                           midiToProcess.getLastEventTime() + 1, 0);
    midiToProcess.clear();
}
This code checks if the MIDI buffer midiToProcess has any events (which it
will if the user clicks on the on-screen MIDI keyboard). If it does, it grabs the
events using a time window spanning the earliest to the latest event. Then it
calls addEvents on midiMessages, which is the buffer of MIDI messages sent to
processBlock – remember it has this function prototype:
1 void processBlock ( juce :: AudioBuffer < float >& audioBuffer ,
2 juce :: MidiBuffer & midiMessages ) override ;
The second argument will contain any MIDI messages sent from the plugin host
(if you run the plugin in a DAW or similar). So, we are adding all the messages
collected from the on-screen MIDI keyboard to the messages received from the
host. All the MIDI we need to deal with is now in one place.
After all that, we remove the messages stored from the on-screen keyboard
in midiToProcess by calling clear. We are now ready to push the notes to the
Markov model. In processBlock, iterate over the midiMessages and add all note-
on messages to the Markov model:
for (const auto metadata : midiMessages) {
    auto message = metadata.getMessage();
    if (message.isNoteOn()) {
        pitchModel.putEvent(std::to_string(message.getNoteNumber()));
        DBG("Markov model : " << pitchModel.getModelAsString());
    }
}
This code pulls any MIDI note on messages out of the buffer and adds them
to the Markov model. Then, it prints out the full state of the model. Try running
this and pressing the keys on the on-screen piano keyboard. You should see output
like this, showing that your model is rapidly growing:
9,96,96,95,77,79,81,89,91,93,:1,95,
9,96,96,95,93,91,89,86,84,82,:1,80,
If you have a MIDI controller keyboard handy, you can run the application in
Standalone mode and configure the keyboard as an input device. You can then
put notes into your model by playing on the keys.
FIGURE 22.3
The MIDI Markov plugin running in AudioPluginHost. Note how it receives MIDI
and then passes it on to the Dexed synthesiser.
If you call that directly in processBlock, you will send a message every time processBlock is called. That is probably a bad idea as processBlock gets called sampleRate/blocksize times per second,
FIGURE 22.4
Using AudioPluginHost’s MIDI Logger plugin to observe the MIDI coming out of
the Markov plugin.
for example 43 times a second for a typical 44.1k/1024 set-up. So it is time to make a design decision – when should
the music improviser play a note? Come up with your own idea and implement
it. The most straightforward idea is to play a note when it receives a note. To do
that, you could add this code to the end of processBlock after you have added any
incoming notes to the Markov model:
if (midiMessages.getNumEvents() > 0) {
    int note = std::stoi(pitchModel.getEvent(true));
    juce::MidiMessage nOn = juce::MidiMessage::noteOn(1,
                                                      note,
                                                      0.5f);
    midiMessages.addEvent(nOn, 0);
}
Try it out – you should see notes coming out of the plugin when it receives
notes. You will find that the plugin generates notes when it receives note-on and
note-off messages. The on-screen keyboard generates a note-off message when you
release the mouse after pressing one of the piano keys. Can you think of a way to
block the double note trigger?
You can test the plugin with the AudioPluginHost, as shown in figure 22.4.
This figure also shows how you can wire up the MIDI Logger plugin to print
out any MIDI messages you generate from the Markov plugin. The MIDI Logger
plugin comes as a built-in plugin with AudioPluginHost.
The unsigned long type is appropriate as the number of elapsed samples might become quite large. Add these to the private section in PluginProcessor.h:
unsigned long noteOffTimes[127];
unsigned long elapsedSamples;
Then, initialise the variables in the constructor:
elapsedSamples = 0; // or if you are a real engineer,
                    // do this C++11-style in the initialiser list
for (auto i = 0; i < 127; ++i) {
    noteOffTimes[i] = 0;
}
First of all, let’s deal with managing elapsedSamples. At the end of the processBlock function, just add the length of the block to elapsedSamples:

elapsedSamples += buffer.getNumSamples();
That means every time processBlock is called, elapsedSamples increases by the
number of samples in the block, e.g. 1024. When you want to remember to play a
note off for later, store the time at which the note off should be triggered in the
noteOffTimes. In the processBlock’s note-generating code:
...
int note = std::stoi(pitchModel.getEvent(true));
juce::MidiMessage nOn = juce::MidiMessage::noteOn(1, note, 0.5f);
generatedMessages.addEvent(nOn, 0);
// now the note off
noteOffTimes[note] = elapsedSamples + getSampleRate();
(When you test in AudioPluginHost, you want it to load the latest version of your plugin. If you save the layout, it should reload automatically next time, making things easier.)
This is not sample-accurate timing as you do not check the timings for every
sample in the buffer, just once per buffer. Feel free to work up a more accurate
timing system, but this will work for now, and the worst timing will be a buffer
length out.
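The code that actually emits the pending note-offs is not shown in this extract; a once-per-block sketch consistent with the description above might look like this, placed in processBlock (I am adding to midiMessages here; the project may use its generatedMessages buffer instead):

// send note-offs whose trigger time has passed, once per block
for (int note = 0; note < 127; ++note) {
    if (noteOffTimes[note] > 0 && noteOffTimes[note] < elapsedSamples) {
        juce::MidiMessage nOff = juce::MidiMessage::noteOff(1, note, 0.0f);
        midiMessages.addEvent(nOff, 0);
        noteOffTimes[note] = 0; // mark as sent
    }
}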
Compile and verify you can see your button. Then verify that you have implemented the button listener interface in PluginEditor.h, which the template project already does. Then add a function prototype to the class in PluginProcessor.h to allow the UI to reset the model:

void resetMarkovModel();
Then, an implementation in PluginProcessor.cpp:
void MidiMarkovProcessor::resetMarkovModel()
{
    pitchModel.reset();
}
Finally, put some code in PluginEditor.cpp to deal with the reset:
void MidiMarkovEditor::buttonClicked(juce::Button* btn)
{
    if (btn == &resetButton) {
        audioProcessor.resetMarkovModel();
    }
}
In this chapter, you will add time modelling to the improviser. You will do this
using two separate Markov chains: one for note duration and one for ‘inter-onset
interval’, which is the time that elapses between the start of notes. You will learn how the processBlock function provides you with MIDI messages with sample-accurate
timing data. You will use this timing data to measure note durations and inter-
onset intervals. You can then create models of the two aspects of time, which you
can use to control the behaviour of the note-generating model.
FIGURE 23.1
Note duration is the length the note plays for. Inter-onset interval is the time that
elapses between the start of consecutive notes.
FIGURE 23.2
Measuring inter-onset-intervals. The IOI is the number of samples between the
start and end sample. elapsedSamples is the absolute number of elapsed samples
since the program started and is updated every time processBlock is called; message.getTimestamp() is the offset of the message in samples within the current block.
You can continue working on the plugin code you had at the end of the last chapter,
or if you want a clean start, you can start with the project from section 39.4.3 in
the repo guide.
Make sure you call reset on that model in your resetMarkovModel function in PluginProcessor.cpp. Then initialise lastNoteOnTime to zero in the initialiser list and set up the iOIModel:
, pitchModel{}, iOIModel{}, lastNoteOnTime{0}, elapsedSamples{0}
// constructor body here
{
I see output like this from that code – make sure you see something similar
when you run the app in standalone mode and click on the on-screen keyboard:
Note on at : 55296
Note on at : 82944
Note on at : 110592
Note on at : 152064
Note on at : 177664
Before things get any more complex, you should modularise the code. You can
add a function called analyseIoI to PluginProcessor.cpp/h
// in the header, private section:
void analyseIoI(const juce::MidiBuffer& midiMessages);
// in the cpp, add the code from above:
void MidiMarkovProcessor::analyseIoI(const juce::MidiBuffer& midiMessages)
{
    for (const auto metadata : midiMessages)
    {
        ...
    }
}
// then, in processBlock, call the function:
analyseIoI(midiMessages);
Note how I used const and &. const means we will not modify the sent data (the
MidiBuffer) – we are just reading it. The ampersand means the data will be sent
by reference instead of sending in a copy. This makes the code more efficient as it
does not need to copy the MidiBuffer every time that function gets called. Whilst
we are being good engineers, you can also modularise the pitch modelling code
from processBlock into its own function if you like. I call my pitch management
function analysePitches:
// in the header, private section:
void analysePitches(const juce::MidiBuffer& midiMessages);
// in the cpp, add the pitch modelling code:
void MidiMarkovProcessor::analysePitches(const juce::MidiBuffer& midiMessages)
{
    for (const auto metadata : midiMessages) {
        auto message = metadata.getMessage();
        if (message.isNoteOn()) {
            pitchModel.putEvent(std::to_string(message.getNoteNumber()));
        }
    }
}
Now you need to update the analyseIoI function, so it computes the IOI and
adds it to the IOI Markov model:
unsigned long exactNoteOnTime = elapsedSamples + message.getTimeStamp();
// compute the IOI
unsigned long iOI = exactNoteOnTime - lastNoteOnTime;
// add it to the model
iOIModel.putEvent(std::to_string(iOI));
lastNoteOnTime = exactNoteOnTime;
DBG("Note on at : " << exactNoteOnTime << " IOI " << iOI);
Now build and run in Standalone mode. Tap keys on the on-screen keyboard
and monitor the IOIs detected. Tapping at approximately one-second intervals, I
see the following output (don’t tell anyone that my timing is this bad – I sometimes
claim to be a drummer!):
Note on at : 778752 IOI 45568
Note on at : 820224 IOI 41472
Note on at : 860672 IOI 40448
My experience with improvisers is that you do not want them to model excessively long pauses. This can lead to long pauses in the output, and you will be wondering if things are working correctly if that happens during your performance. With some checking code, you can ensure you only add IOIs below a certain length to the model. Something like this will do the trick – only add IOIs less than 2 seconds and more than some short time to avoid re-triggers:
if (iOI < getSampleRate() * 2 &&
    iOI > getSampleRate() * 0.05) {
    // only IOIs in this range get added to the model
    iOIModel.putEvent(std::to_string(iOI));
}
Build and run to check things are working as you expect. If you run the plugin
inside AudioPluginHost, you should see MIDI coming from the plugin’s output
when you send notes into it, as before.
lot of logic and refactoring in this section. Another aspect you can work on for
yourself is perfecting the timing of sending the note-on and note-off messages.
At the moment, the messages are always triggered at the start of the first audio
block occurring after their trigger time. This means a note could be ‘block-length’
samples late. The solution is to find all notes that should be triggered between
elapsedSamples and elapsedSamples + block size. Then, set timestamp offsets on
those messages according to where they should fall in the block. I leave it to you
to figure that out.
In this chapter, you will complete the modelling of time and rhythm in the improviser by implementing a note duration model. The note duration model remembers when different notes start and when they end. It then computes the duration of the notes and models this using a separate Markov model. At the end of the chapter, you should have a reasonable plugin working that can improvise with monophonic MIDI sequences.
When the note-off for that note arrives, we can compute the number of samples that have elapsed since the stored note-on time. That will be the duration of the note.
Figure 24.1 illustrates the concept of measuring the number of samples between
note-on and note-off. It does not show the array, but there would be a separate
‘start sample’ for each possible note. It does show how notes can fall across calls
to processBlock.
Right, it’s time to edit some code. Add a new private data member in PluginProcessor.h:

unsigned long noteOnTimes[127];
Initialise all values to zero in the constructor, as you already did for the noteOffTimes array; in fact, just use the same loop:
for (auto i = 0; i < 127; ++i) {
    noteOffTimes[i] = 0;
    noteOnTimes[i] = 0;
}
Then, when you detect a note in your analysePitches function, log elapsed time
into the array for that note:
...
for (const auto metadata : midiMessages) {
    auto message = metadata.getMessage();
    if (message.isNoteOn()) {
        DBG("Msg timestamp " << message.getTimeStamp());
        pitchModel.putEvent(std::to_string(message.getNoteNumber()));
        noteOnTimes[message.getNoteNumber()]
            = elapsedSamples + message.getTimeStamp();

...
Notice that I did not just use elapsedSamples – elapsedSamples only updates
once per call to processBlock, but the note might have occurred within that block.
On my system running with a block size of 2048, I see output like this from that code when I run the plugin in the AudioPluginHost, as shown in figure 24.2.
Msg timestamp 1107
Msg timestamp 593
Msg timestamp 1133
Msg timestamp 399
Msg timestamp 1775
Msg timestamp 102
Msg timestamp 623
So now you know when any incoming notes started. The next step is to detect
when the notes end and then compute their length. Going back to the note iterator
in analysePitches, add a new block to deal with note-offs:
if (message.isNoteOff()) {
    unsigned long noteOffTime = elapsedSamples + message.getTimeStamp();
    unsigned long noteLength = noteOffTime - noteOnTimes[message.getNoteNumber()];
    // print the computed length so you can check it in the console
    DBG("Note length: " << noteLength);
}
FIGURE 24.2
Testing the getTimestamp function on note–on messages – the timestamp is always
between zero and the block size of 2048.
Try holding down the notes for longer and verifying that the lengths of the
notes look correct in the printout. Now, you need a Markov model to model
these note durations. You can simply add another MarkovManager object to your
PluginProcessor.h private fields:
MarkovManager noteDurationModel;
Now call putEvent on that model in your note iterator code in analysePitches:
if (message.isNoteOff()) {
    unsigned long noteOffTime = elapsedSamples + message.getTimeStamp();
    unsigned long noteLength = noteOffTime
                               - noteOnTimes[message.getNoteNumber()];
    noteDurationModel.putEvent(std::to_string(noteLength));

}
Engineers like modular code; analysePitches is supposed to deal with note pitches, not durations. Let’s create a separate function called analyseDuration that contains the note duration code, leaving the pitch code in analysePitches. Something like this:
void MidiMarkovProcessor::analysePitches(
    const juce::MidiBuffer& midiMessages)
{
    for (const auto metadata : midiMessages)
    {
        auto message = metadata.getMessage();
        if (message.isNoteOn())
        {
            pitchModel.putEvent(std::to_string(message.getNoteNumber()));
        }
    }
}
And this:
void MidiMarkovProcessor::analyseDuration(
    const juce::MidiBuffer& midiMessages)
{
    for (const auto metadata : midiMessages)
    {
        auto message = metadata.getMessage();
        if (message.isNoteOn())
        {
            noteOnTimes[message.getNoteNumber()] = elapsedSamples + message.getTimeStamp();
        }
        if (message.isNoteOff()) {
            unsigned long noteOffTime =
                elapsedSamples + message.getTimeStamp();
            unsigned long noteLength = noteOffTime
                - noteOnTimes[message.getNoteNumber()];
            noteDurationModel.putEvent(std::to_string(noteLength));
        }
    }
}
Then your processBlock can contain three neat function calls like this:
analysePitches(midiMessages);
analyseDuration(midiMessages);
analyseIoI(midiMessages);
That feels better. Nothing worse than poor modularisation. Remember to reset the new model in the resetMarkovModel function on your PluginProcessor.
The last line dictates the length of the note, which is hardcoded to one second.
Instead, try this line, which queries the note duration model for a length:
unsigned int duration = std::stoi(noteDurationModel.getEvent(true));
noteOffTimes[note] = elapsedSamples + duration;
Experiment with the plugin – send sequences of long and short notes to see if
it is correctly modelling note duration. Use the AudioPluginHost MIDI Logger to
check the raw MIDI output. You can see a fully working version of the plugin with
monophonic pitch, IOI and duration modelling in the repo guide section 39.4.5.
24.3 Quantisation
If you send the improviser MIDI data from your DAW, it will probably already
be quantised, meaning that the timing for the notes snaps to a timing grid. The
improviser will also use this quantised timing for its models. But the timing will
not be quantised if you are playing freely into the plugin. You need to quantise
the timing to avoid ending up with a primarily low-order model. The reason is
that each IOI and note duration will be different because it can fall at any point,
sample-wise.
Can you work out a way to quantise the note durations and IOIs? If you quan
tise them, you will get a richer (higher-order) model as there will be more instances
of each duration. Of course, increasing quantisation also makes the timing more
‘robotic’, and you might not want robotic timing. I leave it to you to experiment
with timing.
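One simple approach, sketched below, is to snap each value to the nearest multiple of a grid size in samples before passing it to putEvent; the grid size here is an arbitrary choice:

// snap a duration or IOI (in samples) to a grid before modelling it;
// e.g. at 44.1kHz a grid of about 5512 samples is roughly a 16th note at 120 bpm
unsigned long quantise(unsigned long samples, unsigned long gridSize)
{
    unsigned long gridSteps = (samples + gridSize / 2) / gridSize; // round to nearest
    if (gridSteps == 0) gridSteps = 1; // never collapse to a zero-length value
    return gridSteps * gridSize;
}

You would then call putEvent with the quantised value instead of the raw one, and experiment with different grid sizes to balance richness against robotic timing.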
In this chapter, you will develop the Markov modelling improviser plugin further
so that it can model polyphonic MIDI sequences. The basic principle here is that
polyphonic states are just another type of state. Our first job will be to figure out
how to pass polyphonic states to the Markov model. Then, we must work out how
to process those states when generating from the model. You will also deal with
some practicalities relating to humans playing chords on keyboards, where the
note-ons do not happen at precisely the same time. The solution is a ChordDetector class, which provides you with chords according to some constraints about note-on timings.
std::string MidiMarkovProcessor::notesToMarkovState(
    const std::vector<int>& notesVec)
{
    std::string state{""};
    for (const int& note : notesVec) {
        state += std::to_string(note) + "-";
    }
    return state;
}

std::vector<int> MidiMarkovProcessor::markovStateToNotes(
    const std::string& notesStr)
{
    std::vector<int> notes{};
    if (notesStr == "0") return notes;
    for (const std::string& note :
             MarkovChain::tokenise(notesStr, '-')) {
        notes.push_back(std::stoi(note));
    }
    return notes;
}
Essentially, you create the state for a vector of notes by concatenating them
with hyphens. Then, to get back to the vector from the state, tokenise on the
hyphen. The code uses appropriate const and & syntax to make it clear the sent
data is not edited and to ensure it is not unnecessarily copied.
FIGURE 25.1
If notes start close enough in time, they are chords. If the start times fall outside
a threshold, they are single notes. This allows for human playing where notes in
chords do not happen all at the same time.
Here is some code to check how far apart in time the note-on messages are when I play a chord, which you can put at the top of your processBlock function:
for (const auto metadata : midiMessages) {
    auto message = metadata.getMessage();
    if (message.isNoteOn()) {
        DBG("note "
            << message.getNoteNumber()
            << " ts: "
            << message.getTimeStamp());
    }
}
Running on a machine with a sample rate of 96kHz and buffer size of 256 I see
the following output when I play a chord:
1 note 76 ts : 147
2 note 72 ts : 113
3 note 74 ts : 112
The time stamps are not the same – they are up to 30–40 samples or about
0.3ms apart at worst, and they potentially arrive in different calls to processBlock,
so you need a way to remember notes across calls to processBlock. Another test
is to play a fast monophonic sequence and to check the intervals between notes.
That will tell you when you should treat things as monophonic.
Once you have added it to your project, you can add a ChordDetector object
to your PluginProcessor.h’s private fields:
1 # include " ChordDetector . h "
2 // ... in the private section :
3 ChordDetector chordDetect ;
Then, set the max interval on chordDetect when prepareToPlay is called since
that is when you can calculate the number of samples you want to allow for the
chord threshold interval. In prepareToPlay:
double maxIntervalInSamples = sampleRate * 0.05; // 50ms
chordDetect = ChordDetector((unsigned long) maxIntervalInSamples);
To detect chords and send them to the Markov model appropriately formatted,
replace the note-on section of your MIDI processing loop in analysePitches with
this:
if (message.isNoteOn()) {
    chordDetect.addNote(
        message.getNoteNumber(),
        // add the offset within this buffer
        elapsedSamples + message.getTimeStamp()
    );
    if (chordDetect.hasChord()) {
        std::string notes =
            MidiMarkovProcessor::notesToMarkovState(
                chordDetect.getChord()
            );
        DBG("Got notes from detector " << notes);
        pitchModel.putEvent(notes);
    }
    ...
}
In that code, you call addNote to tell the chord detector about the note, query
if it is ready to give you a chord with hasChord, and retrieve the chord with
getChord. Then, you use notesToMarkovState to convert the notes vector into
a suitable string for the Markov model. You might get a single note back from
the chord detector if there is no chord – that is fine, as it should also be able to
detect single notes. You can test this in Standalone mode. Click repeatedly on the
on-screen keyboard to verify that you receive single notes. When you do that, you
will see output something like this:
ChordDetector::addNote 2205 n: 98 : 201216
Got notes from detector 86
ChordDetector::addNote 2205 n: 88 : 223232
Got notes from detector 98
Now try holding the mouse button down whilst sliding up and down the on-
screen keyboard. Let go occasionally, and you will see chords being detected, e.g.:
Got notes from detector 76-77-79-81
You should comment out the lines where you pull notes from the pitchModel
when doing this test, as that code cannot cope with polyphonic states like this
currently.
Note that I used the same duration for all the notes in the chord, and I set the velocity to 0.5f (we do not have a velocity model yet, but we will add one in the next section). If you try to pull a separate duration for each note, you will
over-sample the model, causing the durations generated to not reflect the durations
received by the model. You can try pulling multiple durations to see what happens.
Then don’t forget to call reset on the model in the resetModel function in
PluginProcessor.cpp. That’s it for the velocity modelling part.
That is it. In the repo guide, you can find the fully working model with polyphonic note modelling in example 39.4.8. I will now suggest experiments and
extensions you can work on.
Welcome to Neural Effects! In this part of the book, I will explain how you can use
a neural network as an effects unit. Before presenting the full details of creating a
neural audio effect later, I will provide some background information to help you
understand the neural effects techniques. The background starts in this chapter,
which presents a brief history of audio effects from the use of reverberant buildings
through to neural networks emulating guitar amplifiers. Following this chapter, I
will present further background material about what I call the ‘DSP Trinity’,
namely finite and infinite impulse response filters and waveshapers. When we
reach the coverage of neural effects, I will show you the full details of preparing
training data, designing the network architecture and implementing the training
in Python. Then, we will switch to C++ to develop the plugin. We will compare
two methods for running neural networks: TorchScript and RTNeural. At the end
of this part of the book, you will have a fully working neural network emulation
of a guitar amplifier running in a plugin.
Much of this work relies upon established neural network architectures such
as recurrent and convolutional models and you will learn all about these archi
tectures in the upcoming chapters. But another stream of work in neural effects
involves differentiable digital signal processing (DDSP), which allows more tra
ditional signal processing components to be embedded within neural networks.
The DSP components’ parameters are then adjusted using neural network train
ing methods. The idea is that the neural network gets a head start in learning an
audio transformation as it starts with components which are designed to be able to
carry out that kind of transformation, given the correct parameters. An example
of this work came from Kuznetsov, Parker and Esqueda, who worked for Native
Instruments[22]. They showed how to create efficient non-linear effects models for
EQ and distortion using DDSP techniques.
With that, I will draw this short history of audio processing to a close. For
further reading, the Wilmering et al. article is a good starting point for audio
effects history, but there are also some fascinating articles online covering individ
ual company and brand histories, such as Eventide and Boss. For neural effects,
Vanhatalo et al. presented a survey in 2022 covering much of the work on neural
guitar amp modelling, though we are going to implement one of those models for
ourselves through the next few chapters[43]. If you are keen to read some more
technical detail about state-of-the-art neural effects, you will find some very tech
nical information about neural reverberators and some useful references in[24].
In this chapter, I will introduce some important digital signal processing (DSP)
concepts. It is beyond the scope of this book to take a deep dive into the details of
traditional digital audio effects design, but I am going to provide theoretical and
code examples. For a detailed treatise on audio effects, I recommend Will Pirkle’s
book[33]. Here, I will cover the three main techniques used in digital audio effects
and connect these to concepts found in neural network architectures. I refer to
the three techniques as the ‘DSP trinity’: Finite Impulse Response Filters (FIR),
Infinite Impulse Response Filters (IIR) and waveshapers. In the following sections,
I will lead you through the basics of digital signal processing, trying to make it
as accessible as possible by presenting the information in code, visualisations, and
more traditional mathematical expressions.
We can call that system a ‘one-pole filter’. One-pole because the output calculation only considers a single input. Here is a two-pole filter which considers two inputs – the input now x[n] and the input at the previous step x[n − 1]. This time, instead of a single coefficient b we have two coefficients b0 and b1:
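In the same notation as before, that is:

$$y[n] = b_0\,x[n] + b_1\,x[n-1]$$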
Some people prefer to unpack the equation like this, which is less elegant but
maybe more readable:
FIGURE 27.1
The impulse signal and the impulse responses of a one-pole, two-pole and three-
pole system.
characteristics, i.e., it is linear and time-invariant (more on that later), you can
calculate the system’s output in response to any signal you like.
Why can you calculate the output of a system for any signal, if you have the
impulse response? The answer is that there is a direct relationship between the
coefficients of a system (b0 , b1 etc.) and the impulse response. In fact, impulse
responses are the coefficients, at least for the types of systems we have seen so
far. Think back to the 2-pole averaging filter we discussed earlier. In the equation
and code we defined the coefficients as 0.5 and 0.5:
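In the notation above, that averaging filter is:

$$y[n] = 0.5\,x[n] + 0.5\,x[n-1]$$

Feed the impulse (1, 0, 0, . . . ) into it and the output is 0.5, 0.5, 0, 0, . . . – the impulse response is literally the list of coefficients.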
What about time invariance? This means that passing a signal in with a delay does not change the output signal in any way aside from delaying it. The behaviour of the system does not change over time. So the 1930s Leslie speaker we discussed earlier, which has a spinning treble horn, is not time-invariant. Here is an example of time invariance where two signals x1 and x2, with x2 the same as x1 but with a single sample delay, are passed through the simple system y[n] = b0 x[n]:
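(A small worked illustration with made-up values: take b0 = 0.5, x1 = [1, 0.4, 0.2] and x2 = [0, 1, 0.4, 0.2]. Then y1 = [0.5, 0.2, 0.1] and y2 = [0, 0.5, 0.2, 0.1] – the second output is exactly the first output delayed by one sample, which is what time invariance requires.)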
the possibilities for audio processing and evolutionary algorithms. Since then, I
lost touch with Martin, but when preparing this chapter, I needed a simple way to
load audio files into a C++ program with minimal dependencies. The WAV format
is perhaps the most common non-compressed audio file format, but it can be a
bit tricky to work with because it has several ways to format its data. I found the
tinywav library on Github by chance and was delighted to see who had written it –
my old collaborator Martin Roth. Tinywav does precisely what I wanted for these
convolution examples – it loads a WAV file into a list of floats. It also provides
functions to save floats back out to WAV format.
Right, back to the main task. Here is a program that uses tinywav to load a
WAV file into memory:
1 # include "../../ tinywav / myk_tiny . h "
2 # include < string >
3 # include < vector >
4 # include < iostream >
5
6 int main () {
7 std :: vector < float > audio = myk_tiny :: loadWav (" audio / drums_16bit .
wav ") ;
8 }
This code is described in the repository section 39.5.1. The myk tiny namespace
also contains a function saveWav:
static void myk_tiny::saveWav(
    std::vector<float>& buffer,
    const int channels,
    const int sampleRate,
    const std::string& filename);
Go ahead and build the project. See if you can load and save some audio files.
Verify the files are saved correctly by loading them into an audio editor such as
Audacity. I have only tested it with mono files with a sample rate of 44,100Hz,
as those are the files I am interested in for now. Note that the saveWav function uses the TW_INT16 format, a two-byte signed integer. Two-byte signed integers fall in the range of −32768 to 32767. The loadWav function does not specify the format, so it converts the samples into the default format for tinywav, a 32-bit/four-byte IEEE float. The float values will be from −1 to 1 once they appear
in your program. You can verify this by scanning through the vector and checking
the lowest and highest values. Do some experiments to confirm that things are
loaded and saved correctly and that the various number formats yield values in
the ranges you expect.
For clarity, in that equation, x is the input signal, y is the output signal, n is the current time step, N is the number of coefficients, i is the current coefficient index and b0 . . . bN is the set of coefficients. If you look carefully, you will notice that the code does more than that equation, as the code also iterates over the whole of x, so to fully represent the code we need another sum:
$$y = \sum_{i=0}^{N} \sum_{n=0}^{M} b_i\, x[n-i] \qquad (28.2)$$
Now we are iterating over the input signal from position 0 to position M, which is the length of the signal, or xs.size() on line 6 of the code above.
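The author's conv function is in the repository; purely as a sketch, a time-domain implementation consistent with equation 28.2 looks something like this (the function and variable names here are my own, not necessarily those used in the repo):

#include <vector>

// time-domain convolution: for each output sample n, sum b[i] * x[n - i]
std::vector<float> conv(std::vector<float>& xs, std::vector<float>& bs) {
    std::vector<float> ys(xs.size(), 0.0f);
    for (size_t n = 0; n < xs.size(); ++n) {      // iterate over the signal x
        for (size_t i = 0; i < bs.size(); ++i) {  // iterate over the coefficients b
            if (n >= i) {                         // ignore samples before the start
                ys[n] += bs[i] * xs[n - i];
            }
        }
    }
    return ys;
}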
Experiment with the code. First, try creating the impulse response vector ‘b’
with some simple coefficients, e.g.:
void experiment1() {
    std::string xfile = "../audio/drums_16bit.wav";
    std::string yfile = "../audio/drums_expt1.wav";
    std::vector<float> x = myk_tiny::loadWav(xfile);
    // simple moving average low pass filter
    std::vector<float> b = {0.5, 0.5, 0.5};
    std::vector<float> y = conv(x, b);
    myk_tiny::saveWav(y, 1, 44100, yfile);
}
You can find example code in the repo guide section 39.5.3. With my example drum sample ‘drums_16bit’, which you can find in the audio folder for this part of the book, I find that the resulting signal is somewhat clipped – after the convolution, some of the values are > 1. When you write these to a file, it ‘ceils’ them to 1. To fix the clipping, I divided all values by the sum of the coefficients. This is the maximum achievable value for a sample in the output signal. Here is a simple amp function to apply an in-place scale to a vector of floats:
void amp(std::vector<float>& xs, float amp) {
    for (int n = 0; n < xs.size(); ++n) { // iterate over signal x
        xs[n] = xs[n] * amp;
    }
}
... then in main ...
std::vector<float> y = conv(x, b);
amp(y, 1/1.5); // scale by reciprocal of sum of 0.5, 0.5, 0.5
myk_tiny::saveWav(y, 1, 44100, yfile);
You could actually go a step further and implement a function to sum the
coefficients:
float sumCoeffs(std::vector<float>& coeffs) {
    float sum = 0.0;
    for (float& f : coeffs) sum += f;
    return sum;
}
Swap it out for the hard-coded value 1.5 and it will work for any set of coefficients:
...
// instead of
// amp(y, 1/1.5);
// do this:
amp(y, 1/sumCoeffs(b));
...
Earlier, I mentioned that this moving average filter causes a low pass effect
– it removes some of the high frequencies. We can verify that by looking at the
spectrum of the sound before and after. You can use Audacity to plot a spectrum
– use plot spectrum from the analyse menu. I used a Python script, which you can
find in the repository section 39.5.4. The result can be seen in figure 28.1. Note
FIGURE 28.1
Original drum loop spectrum on the left, filtered version on the right. High frequencies have been attenuated in the filtered spectrum.
how the image on the right shows a dent starting at around 15,000Hz caused by the
moving average filter. Experiment with different coefficients and see what variation
you get in the filtered spectrum. Try adding more coefficients, negative coefficients
and so on. Can you create an echo effect somehow? The take-home message is
that you can make different filters and delay effects using various combinations of
coefficients.
It is beyond the scope of this book to fully explore all the possibilities convolution can offer, but we will continue a little further. Firstly, we will see a
faster implementation of convolution based around the FFT. Then we will see a
real-time, JUCE plugin implementation of convolution based on DSP components
available in the JUCE library.
how long a reverb might last, e.g. if you clap your hands in a cave, 4.5s is not an unreasonable impulse response.
std::string xfile = "../../audio/drums_16bit.wav";
std::string yfile = "../../audio/drums_expt1.wav";
std::vector<float> x = myk_tiny::loadWav(xfile);
// how many seconds in the file
float file_len_seconds = x.size() / 44100.0f;
// simple moving average low pass filter
for (float i = 1000; i < 200000; i += 1000) {
    std::vector<float> b{};
    for (int j = 0; j < i; ++j) {
        b.push_back(0.5);
    }
    auto start = std::chrono::high_resolution_clock::now();
    std::vector<float> y = conv(x, b);
    auto stop = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    // duration is how long it took to process the entire audio file
    float dur_per_second = (duration.count() / 1000.0f) / file_len_seconds;
    float dur_per_coefficient = dur_per_second / b.size();
    std::cout << "With IR len " << b.size()
              << " conved " << file_len_seconds
              << " s file in " << (duration.count() / 1000.0f)
              << " s " << std::endl;
}
I see output like this on my 10th gen Intel CPU Linux machine, using a ‘Release’ version of the build, processing a 2.8s drum loop with varying length impulse responses:
1 With IR len 1000 conved 2.82782 s file in 0.142 s
2 With IR len 2000 conved 2.82782 s file in 0.234 s
3 With IR len 3000 conved 2.82782 s file in 0.362 s
4 With IR len 4000 conved 2.82782 s file in 0.46 s
5 ...
6 With IR len 25000 conved 2.82782 s file in 2.675 s
7 With IR len 26000 conved 2.82782 s file in 2.885 s
8 With IR len 27000 conved 2.82782 s file in 2.889 s
9 With IR len 28000 conved 2.82782 s file in 2.929 s
This tells me that I can potentially run up to 25,000 coefficients in real time,
as it takes less time than the length of the file to process 25,000 coefficients. A
quick test on my basic 2020-model M1 Mac Mini showed that it can run about
27,000 coefficients (an impulse response of about 0.5s) in real-time. Not bad! But
a typical reverb effect can reverberate for much longer than 0.5s, so how to achieve
greater performance?
as the time domain convolution. Try convolving signals with both techniques and
comparing the results by listening to them.
At this point, you should attempt to build the plugin. Depending on which IDE
you are using, take the appropriate steps to generate and build the IDE project.
This gives you access to the actual processor from the effects chain so that
you can interact with it. You need to interact with it so you can tell it to load an
impulse response:
juce::File impFile {"/fullpath/to/an/impulseresponse/here"};
convolution.loadImpulseResponse(impFile,
                                juce::dsp::Convolution::Stereo::yes,
                                juce::dsp::Convolution::Trim::no,
                                0);
Make sure you set impFile to a path that exists. For now, we will hard-code it. The second line calls loadImpulseResponse. The final option, ‘0’, tells it to keep the complete impulse response. If you do not have an impulse response file, you can create one by playing a loud click from your speaker and recording the result. Impulse responses are also available from online sources such as the University of York’s excellent OpenAIR repository3.
This just passes on the reset message to the convolution processor. The final
3 https://openairlib.net/
That code informs the convolution processor about the sample rate, block size
and channel count. Without digging into the JUCE convolution implementation, I
assume this will set up the FFT block size and such. Next, put this in processBlock
in PluginProcessor.cpp, removing anything else that is there:
ScopedNoDenormals noDenormals;
// How many channels?
const auto numChannels = jmax (getTotalNumInputChannels(),
                               getTotalNumOutputChannels());

// Convert the buffer we were sent by the plugin host (buffer) into
// an AudioBlock object that the convolution processor can understand:
auto inoutBlock = dsp::AudioBlock<float>(buffer)
                      .getSubsetChannelBlock (0, (size_t) numChannels);

// carry out the convolution 'in place'
processorChain.process (
    dsp::ProcessContextReplacing<float> (inoutBlock));
Now compile, debug your errors and run it in your plugin host. This setup is
most useful for long impulses like reverberations. Once everything is running you
can try out some extensions and experiments. For a start, the GUI is essentially
non-existent. Can you add some controls? For example, a dry/wet mix is a common
control to find on a reverb unit. Can you add an option to load an impulse response
chosen by the user? Then how about controls to adjust the impulse response? For
example, you could have a ‘length’ control that somehow generates permutations
of the impulse response that are shorter or longer. Think about how you might
make a shorter version of a given impulse response – you could fade it out and
chop off the remainder, for example, or you could create a new version by skipping
every other sample. How about a longer version? Over to you!
In this chapter, you will find out about infinite impulse responses and how they
differ from the finite impulse responses seen in the previous two chapters. We
will start with a low-level C++ implementation and then move towards a real-
time plugin implementation using JUCE. You will learn about filter coefficients
and frequency responses. You will see how you can use JUCE’s DSP module
functionality to implement IIR filters efficiently with preset coefficients for different
types of filters.
FIGURE 29.2
Pole for pole, IIR filters generate much richer impulse responses. The left panel shows a two-pole FIR. The right panel shows a two-pole IIR.
effects in music production during the 1970s. Yoganathan and Chapman wrote an
interesting article relating this music production practice to electroacoustic music
practice occurring at a similar point in time but in a very different context[50].
Returning to the topic of digital signal processing, the thing to know about
IIR and feedback is that compared to FIR and convolution, it allows you to create
more drastic filtering and delay effects with far fewer coefficients. Figure 29.2
illustrates this, showing finite and infinite two-pole impulse responses, where the
infinite two-pole is far more complex. This means IIR filters are more efficient to
run.
As an example, in the early days of DSP-based effects for music production, it
was impossible to convolve complex, finite impulse responses such as room reverb
responses with signals in real time. Part of this was computational complexity and
part was the cost of the memory needed to store the impulse responses.
Instead of FIR filters, early real-time digital reverbs, such as 1979’s famous Model 224 from Lexicon, used IIR techniques extensively. Convolution and FIR were only available for non-real-time use. It was not until the late 1990s that DSP hardware was sufficiently fast and RAM sufficiently cheap to allow for convolutional reverbs. The Sony DRE-S777 was a £20,000, 2U rack-mount unit which could load impulse responses from a CD-ROM and which Sound on Sound magazine described as ‘Bulky, heavy and hot’ but with ‘Stunningly believable room signatures’1.
IIR filters are also used extensively in digital sound synthesisers where filters with musical characteristics, such as strong resonance peaks, are desirable, as IIR filters are excellent for this kind of application. This is especially true when
1 https://www.soundonsound.com/reviews/sony-dre-s777
FIGURE 29.3
Comparison of two pole FIR filter (left) and IIR filter (right). The IIR filter has
a more drastic response.
modelling classical analogue synthesiser filters, which are often based on feedback
circuits themselves. Figure 29.3 aptly illustrates the power of IIR filters compared
to FIR filters with a similar number of poles. You can see that the frequency
response of a simple two-pole IIR filter is far more drastic than that of a two-pole
FIR filter.
Music technologists should be familiar with and possibly live in fear of feedback
and its power. You only need to work with music technology for a short time before
you inflict painful feedback upon yourself or others. I personally ripped a pair of
headphones off my head just last week after a granular looping sampler effect
combined badly with a reverb effect in a SuperCollider patch I was working on.
Now, let’s adapt the code to continue computing the output array y for longer
to see the effect of the 0.9 feedback coefficient. In the following code, I computed
100 values of y, using zero padding for input x:
#include <cstdio>

int main() {
    float a = -0.9; // feedback coefficient
    float b = 0.5;  // input gain
    float x[] = {0.1, -0.1, 0.2, 0.5, 0.25, 0}; // signal
    float y[100]; // output - we'll run it for 100 samples
    y[0] = 0;
    for (int n = 1; n < 100; ++n) {
        float xn = 0; // zero pad
        if (n < 6) xn = x[n]; // unless x has a value for n
        y[n] = a * y[n-1] + b * xn;
        printf("x[%i]=%f y[%i]=%f\n", n, xn, n, y[n]);
    }
}
To avoid creating a large, zero-padded input array x, I just decided if I should
obtain xn from x or set it to zero.
1 x [1]= -0.100000 y [1]= -0.050000
2 x [2]=0.200000 y [2]=0.145000
3 x [3]=0.500000 y [3]=0.119500
4 ....
5 x [26]=0.000000 y [26]=0.001718
6 x [27]=0.000000 y [27]= -0.001547
7 ....
8 x [97]=0.000000 y [97]= -0.000001
9 x [98]=0.000000 y [98]=0.000001
10 x [99]=0.000000 y [99]= -0.000001
Even with one pole and only a handful of non-zero input samples, you can see that the signal continues to register after 100 steps. The working code for this program is described
in the repo guide section 39.5.8. Experiment with the code – what happens with
increasing values for coefficient a, for example?
You can experiment with this code yourself, for example, computing a longer
output signal. You could attempt to integrate the code with the tinywav example
code described in the code repo section 39.5.1 so you can process an entire audio
file with your IIR filter.
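Written out in its general form – with the b coefficients acting on the current and previous inputs and the a coefficients feeding back previous outputs, matching the code below – the filter is:

$$y[n] = \sum_{i=0}^{N} b_i\,x[n-i] \;-\; \sum_{j=1}^{M} a_j\,y[n-j]$$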
This equation looks beastly, but it expands from the one- and two-pole examples above. Sometimes, a normalisation component is added, which is the reciprocal of the sum of all the a and b coefficients. This is similar to how we normalised the convolution before. Here is an implementation of the general IIR filter, which you can find in the repo guide section 39.5.9:
std::vector<float> x = myk_tiny::loadWav(
    "../../audio/drums_16bit.wav");
std::vector<float> as = {0.5, 0.1, 0.2};
std::vector<float> bs = {0.1, -0.7, 0.9};
std::vector<float> y (x.size(), 0.0f);
for (auto n = as.size(); n < x.size(); ++n) {
    float yn = 0;
    // weighted sum of previous inputs
    for (auto bn = 0; bn < bs.size(); ++bn) {
        yn += bs[bn] * x[n - bn]; // acting on input x
    }
    // weighted sum of previous outputs
    for (auto an = 0; an < as.size(); ++an) {
        yn -= as[an] * y[n - an]; // acting on output y
    }
    y[n] = yn;
}
myk_tiny::saveWav(y, 1, 44100, "../../audio/iir_test.wav");
FIGURE 29.4
Two types of IIR filter and their frequency responses. IIR filter design is a compromise.
For example, say you want a band-pass filter. Band-pass filters have frequency
responses that permit a band of middle frequencies through, blocking low and
high frequencies. Once you have identified the frequency response you desire for
your filter, you must use filter design techniques to approximate that frequency
response. There are families of IIR filters, such as Chebyshev and Elliptical, and
they approximate the desired frequency response in different ways. Figure 29.4
presents two real IIR filter frequency responses. You can see how the Chebyshev
and Elliptical filters approximate the ideal band-pass filter in different ways – the
Chebyshev has more ‘ripple’ in the pass band with a flatter high and low response.
The Elliptical filter has a flat pass band response, with more ripple in the low and
high frequencies. You would choose an approach depending on your requirements.
That brief consideration of filter design concludes our high-level introduction to IIR filters. In the following section, I will show you some simple filter design techniques based on the IIRCoefficients class in JUCE.
process is the same as for the JUCE convolution implementation from earlier. I
will describe that process again here, with the adjustments needed to create the
IIR plugin.
At this point, you should attempt to build the plugin. Depending on your IDE,
take the appropriate steps to generate and build the IDE project.
That code informs the processor chain about the sample rate, block size and
channel count. Then, it generates some coefficients for a low pass filter using
the statically callable makeLowPass function on the ‘Coefficients’ class. Statically
callable means we can call the function directly without instantiating a Coefficients
object first.
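As a rough sketch of what that looks like (assuming a ProcessorChain member called processorChain whose first element is the IIR filter wrapped in a ProcessorDuplicator, and an arbitrary 1kHz cutoff – your member names and settings may differ):

// in prepareToPlay:
const auto channels = jmax (getTotalNumInputChannels(),
                            getTotalNumOutputChannels());
processorChain.prepare ({ sampleRate,
                          (uint32) samplesPerBlock,
                          (uint32) channels });

// makeLowPass is statically callable: no Coefficients object needed first
auto& filter = processorChain.template get<0>();
filter.state = juce::dsp::IIR::Coefficients<float>::makeLowPass (sampleRate, 1000.0f);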
Now, put this into processBlock:
ScopedNoDenormals noDenormals;
// How many channels?
const auto numChannels = jmax (getTotalNumInputChannels(),
                               getTotalNumOutputChannels());

// convert the buffer we were sent by the plugin host
// (buffer) into an AudioBlock object that the
// processor chain can understand:
auto inoutBlock = dsp::AudioBlock<float>(buffer)
                      .getSubsetChannelBlock (0, (size_t) numChannels);

// carry out the filtering 'in place'
processorChain.process (
    dsp::ProcessContextReplacing<float> (inoutBlock));
Now compile, debug your errors and run it in your plugin host. The most
obvious feature to add is a GUI to control the filter’s features. You should be able
to make calls from the GUI (PluginEditor) to the PluginProcessor to pass updated
filter parameters. You could then store the updated parameters somewhere and
then update the coefficients the next time processBlock is called. That way, you
should not have a problem with the filter changing in the middle of a block being
processed. See if you can implement it.
In this chapter, I will guide you through the process of creating distortion effects
using waveshaping. You will start by implementing simple clip, relu, and sigmoid
waveshapers available in libtorch as command-line programs to demonstrate the
basic concepts. Then, you will create a basic JUCE waveshaping plugin that allows
you to reshape waveforms in real-time. At the end of the chapter, you will see how
you can combine waveshaping with the other DSP techniques seen in previous
chapters to construct a complete guitar amplifier emulator as a JUCE plugin.
FIGURE 30.2
The effect of different waveshaper transfer functions on a sinusoidal signal. Top
row: transfer functions, middle row: sine wave signal after waveshaping, bottom
row: spectrum of waveshaped sine wave.
}

// clip pushes any values above a
// threshold to a maximum value
float clip(float input, float clip_value) {
    if (std::abs(input) > clip_value) {
        return (input < 0) ? -1.0f : 1.0f;
    }
    return input;
}
Put together a program that uses the loadWav and saveWav functions we saw
earlier to load a WAV, pass it through one of these waveshaping transfer functions
and save it back out to disk as a new WAV file. Add the name of the function you
applied to the name of the file. Here is a main function to get you started.
int main() {
    std::vector<float> audio =
        myk_tiny::loadWav("../../audio/drums_16bit.wav");
    for (auto i = 0; i < audio.size(); ++i) {
        audio[i] = relu(audio[i]);
    }
    myk_tiny::saveWav(audio, 1, 44100,
                      "../../audio/waveshaped.wav");
}
You can find a working version of this waveshaper in the code repository section 39.5.12. Experiment with the different waveshapers and some different types of audio signals. How would you characterise the sound of the different transfer functions? For a challenge, can you think of a suitable transfer function to implement a rudimentary dynamic range compressor? Can you work out how to make the transfer function stateful so your compressor has an attack and a release time?
Go ahead and replace the ... with the implementations of the transfer functions. The final step is to use the transfer functions to process the audio in the PluginProcessor's processBlock function. Here is some minimal code to apply the clip waveshaper to the incoming samples. Put this into processBlock in PluginProcessor.cpp:
juce::ScopedNoDenormals noDenormals;
auto inChannels = getTotalNumInputChannels();
auto outChannels = getTotalNumOutputChannels();

for (int c = 0; c < inChannels; ++c)
{
    auto* cData = buffer.getWritePointer(c);
    for (auto s = 0; s < buffer.getNumSamples(); ++s) {
        cData[s] = clip(cData[s], 0.1);
    }
}
You can find a working version of this plugin in the code repository section 39.5.13. Experiment by adding parameter controls to the user interface, for example, to vary the clip parameter. You can also add user interface controls to switch between different waveshaping transfer functions.
process but this method has the advantage of easier integration with other JUCE
DSP processor chain components such as IIR and FIR filters.
Here is an overview of the steps we will work through:
1. Prepare the plugin project, adding the JUCE DSP module to linker
instructions
2. Add a juce::dsp::ProcessorChain to the plugin
3. Set up the ProcessorChain with a waveshaping function in the plugin
constructor
4. Implement reset and prepareToPlay functions
5. Finally, pass the audio buffer to the ProcessorChain in the processBlock
function of the plugin
Verify that the default project builds and debug any issues as necessary.
This adds a ProcessorChain to the plugin which it can use to carry out the
waveshaping. Note that the code specifies the float data type for the chain. Now
you need to configure the WaveShaper processor so it calls a particular transfer
function. JUCE WaveShapers use lambda functions to carry out their processing.
Lambda functions are similar to anonymous functions in Javascript – essentially
they allow you to provide a block of code to execute when needed. The following
code will set up the waveshaper with a transfer function that simply returns the
input:
// setup the wave shaper
auto& waveshaper = processorChain.template get<0>();
waveshaper.functionToUse = [](float x) {
    return x;
};
If you have not seen a lambda (in C++) before, here is a quick breakdown of
the parts:
[]        // the [] allows you to pass in a
          // 'scope' for the lambda, e.g. you
          // can pass in 'this'
          // and the lambda can
          // see the PluginProcessor instance
(float x) // the next part specifies
          // the parameters, in this case a float
{
    // then the body of the
    // function - the code that will execute
    return x;
};        // don't forget the semicolon
Over to you – can you take the code from the body of the clip function seen
above and use it to replace the ‘return x’ statement from the lambda? You might
wonder how to get access to a control parameter for the clip code, given the lambda
only has a single argument. For now, hard-code it, and we will come back to that
once we have the whole setup working.
void TestPluginAudioProcessor::prepareToPlay
    (double sampleRate, int samplesPerBlock)
{
    // work out how many channels
    const auto channels = jmax (getTotalNumInputChannels(),
                                getTotalNumOutputChannels());
    // send a special 'struct' to the
    // processorChain with the necessary config:
    processorChain.prepare ({ sampleRate,
                              (uint32) samplesPerBlock,
                              (uint32) channels });
}
Note how there is a slight oddness where you have to organise the parameters
received by the prepareToPlay function and the channel information into a struct
with three properties since processorChain.prepare expects it.
FIGURE 30.3
Automatically generated generic UI for the waveshaper plugin.
Now in the constructor, instantiate the parameter and register it to the plugin
with ‘addParameter’:
addParameter (clipThreshold = new juce::AudioParameterFloat (
    "clipThresh",
    "Clip threshold",
    0.0f,
    1.0f,
    0.1f
));
FIGURE 30.4
A sine wave passing through a series of blocks that emulate in a simplified way
the processing done by a guitar amplifier.
emulation? In a 2013 white paper, Fractal Audio Systems describe the processing blocks involved in a guitar amplifier and how to model them using DSP[42]. Fractal, the makers of the Axe-Fx series of high-end guitar amp emulators, describe guitar amps as a series of filter and distortion blocks. Typically, the guitar goes into a pre-amp which causes fixed EQ-like filtering and non-linear distortion. Which of the three DSP techniques IIR, FIR and waveshaping would you use to model these blocks? Hopefully you answered IIR then waveshaper. Next, the signal goes into the tone section which is typically an adjustable EQ – more IIR filtering. Following that, the signal is distorted again as it is power-amplified – waveshaping. Finally the signal is converted to sound waves via a speaker and speaker cabinet. This last stage can be modelled by capturing an impulse response from the speaker and cabinet and using FIR and convolution techniques.
Implementing a full model of a guitar amp like the one described by Fractal Audio Systems is beyond the scope of this part of the book. This part of the book is intended to be a short introduction to DSP techniques in support of the later chapters about neural DSP. But frankly, building amp models is a lot of fun, so here follow some instructions on how to build a simple version of a multi-stage amp model. It shows you how to construct a chain of JUCE DSP processors to carry out waveshaping, IIR filtering then convolution to model the pre-amp, tone stage and speaker cabinet, respectively.
Note that I have broken it apart in the layout of the code so you can see that
the processor chain presently consists of a single WaveShaper. Let’s add some more
processors. I am going to create the sequence of processors shown in figure 30.4.
These blocks consist of a waveshaper to emulate an overdriven pre-amp, a high
pass filter to emulate a simple tone effect and a convolution processor to emulate
the behaviour of the speaker cabinet.
// set up types for filter module
using Filter = juce::dsp::IIR::Filter<float>;
using FilterCoefs = juce::dsp::IIR::Coefficients<float>;

juce::dsp::ProcessorChain
<
    // stage 1: pre-amp distortion waveshaper
    juce::dsp::WaveShaper
        <float, std::function<float(float)>>,

    // stage 2: tone control IIR filter
    // wrapped in a duplicator to make
    // it stereo
    juce::dsp::ProcessorDuplicator<Filter, FilterCoefs>,

    // stage 3: speaker cab emulation
    // FIR convolution
    juce::dsp::Convolution
>
processorChain;
The code specifying the IIR filter is longer than the other two modules because the IIR filter cannot operate in stereo. So the IIR filter is wrapped in a ProcessorDuplicator which converts it to stereo.
We will need to be able to access the three items in the chain later and this is done via the get function and a numerical index. We saw this in the earlier code where we called get<0>() on the processorChain to gain access to the waveshaper, but in that code the waveshaper was the only module. A convenient syntax for accessing the modules in the chain is to add this enum to the private section of PluginProcessor.h:
1 enum {
2 ws_index ,
3 iir_index ,
4 conv_index
5 };
That code creates a thing called ws_index with a value of 0, iir_index with a value of 1 and conv_index with a value of 2. You will be using those names in the next step. Now let's configure the processing blocks. In the constructor in PluginProcessor.cpp, add this code, which sets up the waveshaper and the FIR convolver. Note that it does not set up the IIR filter yet because we need to know the sample rate to configure the coefficients on the IIR filter, and we do not know the sample rate until prepareToPlay is called:
// setup the wave shaper
auto& waveshaper = processorChain.template get<ws_index>();
waveshaper.functionToUse = [this](float x) {
    float clip = this->clipThreshold->get();
    if (std::abs(x) > clip) {
        return (x < 0) ? -1.0f : 1.0f;
    }
    return x;
};

// load the impulse response into the convolver.
auto& convolution = processorChain.template get<conv_index>();
juce::File impFile {"path/to/your/wav/here"};
convolution.loadImpulseResponse(
    impFile,
    juce::dsp::Convolution::Stereo::yes,
    juce::dsp::Convolution::Trim::no,
    0);
For brevity I removed some file checking code from the example above, but you
can see it in full in the code repository section 39.5.15. To set up the IIR filter,
add the following code to the prepareToPlay function in PluginProcessor.cpp:
// setup the IIR filter - in this case, a 200Hz high pass
auto& filter = processorChain.template get<iir_index>();
filter.state = FilterCoefs::makeHighPass(
    getSampleRate(), 200.0f);

// Tell the processor about the audio setup
const auto channels = jmax (getTotalNumInputChannels(),
                            getTotalNumOutputChannels());
processorChain.prepare ({ sampleRate,
                          (uint32) samplesPerBlock,
                          (uint32) channels });
The first part configures the coefficients for a high pass ‘tone’ filter. The second
part informs the processor chain how the audio is configured. Build and test.
ScopedNoDenormals noDenormals;
const auto numChannels = jmax (getTotalNumInputChannels(),
                               getTotalNumOutputChannels());
auto inoutBlock = dsp::AudioBlock<float>(buffer)
                      .getSubsetChannelBlock (0, (size_t) numChannels);
processorChain.process (
    dsp::ProcessContextReplacing<float> (inoutBlock));
That code figures out the number of channels then uses the result to configure
the audio data block for the processor chain, eventually calling process on the
chain.
FIGURE 30.5
Capture an impulse response for the convolutional cabinet simulator.
Now build the project and enjoy playing your favourite instruments through it. There are all kinds of extensions you can carry out here. The first thing I did was to capture a real impulse response from a little ‘Orange’ guitar amp I had in my studio. To do this, I connected the output of my sound card to the aux input of the amp. This is the closest you can get to passing the signal cleanly through the speaker on this amp. I played a repeating click into the amp and recorded the output (the click playing through the amp) back into my sound card using a mic. You can see the setup in figure 30.5. Next I edited the recorded impulse response down to a single click and the response in an audio editor. I faded out the signal at the end. I then loaded the
impulse response into the convolution module in the plugin. I was quite pleased
with the result when I played a guitar through the setup. The impulse response
is in the audio folder in the code repository for you to try. You can then explore
the other components; for example, you could convert the high pass IIR filter into
a peak filter and add some plugin parameters to control it, just as we did for the
clip control. Following that, you can add more modules, for example, a FIR-based
reverb at the end of the chain.
In this chapter, I will give you a high-level overview of the process of creating a
guitar amplifier emulation using a neural network. Through the process of training,
neural networks can learn to emulate the complex non-linear behaviours of tube
amplifiers and other guitar gear. I will explain key concepts like how training data
guides the network parameters and how back-propagation enables the network to
learn. At the end of the chapter, you will have developed a greater understanding
of training a neural network to model a guitar amplifier, ready for the practical
implementation in the following chapter.
late a specific, high-level part of a real amplifier, e.g. the pre-amp or the speaker
cabinet.
There are established techniques for white, grey and black-box modelling, but
they have limitations which neural networks can overcome. Typical black-box
models are not necessarily able to cope with the highly non-linear behaviour of
certain circuits found in guitar amplifiers, especially when they include valves.
Low-level white-box circuit modelling, whilst it can model complex, non-linear
signal processing behaviours, requires knowledge and understanding of the circuits in a given device, making its implementation complex and highly skilled.
The resulting models might also be computationally expensive and challenging to
deploy in a music technology context. Grey-box models are more manageable than
white-box models to design as they abstract some of the details of the low-level
circuitry but still require complex signal processing chains. Grey-box models can
also suffer from the limitations of their black-box sub-components.
Neural networks are powerful black-box models that can model complex non-linear behaviours of many different circuits. So, they capture some of the advantages of white and black-box models. A fundamental feature of neural networks is that they learn to emulate circuits via a process called training. Given an input, such as a clean guitar signal, and an output, such as a distorted guitar signal recorded from an amplifier, a neural network can learn the transformation from input to output and generalise it to any input. This is conceptually similar to the idea of convolution seen before, but it goes further because it can model a much greater range of audio transformations than simple convolution can.
FIGURE 31.2
Four stages to train a neural network. 1: send the test input through the device
(e.g. amp) you want to model, 2: send the test input through the neural network, 3:
compute the error between the output of the network and amp, 4: update network
parameters to reduce error using back-propagation. Back to stage 2.
1. Send the test input through the device you want to model and capture
the output
2. Send the test input through the neural network
3. Compute the error between the output of the network and device you
are modelling
4. Update the network parameters to reduce the error using back-
propagation
5. Back to 2.
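To make the loop concrete, here is a minimal PyTorch sketch of stages 2–4. The network, loss function, optimiser and the random stand-in data are all placeholders, not the setup used later in the book:

import torch

# placeholder network and stand-in data - not the real amp dataset
model = torch.nn.LSTM(1, 1, 1)
x = torch.rand(1024, 1)   # stage 1 would give us the clean input signal...
y = torch.rand(1024, 1)   # ...and the recorded amp output as the target

loss_fn = torch.nn.MSELoss()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    out, _ = model(x)       # stage 2: pass the input through the network
    loss = loss_fn(out, y)  # stage 3: compute the error with a loss function
    optimiser.zero_grad()
    loss.backward()         # stage 4: back-propagate the error...
    optimiser.step()        # ...and adjust the network parameters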
When training neural networks to process audio in specific ways, e.g. to emulate
a guitar amp, we start with stage one from figure 31.2, preparing examples of
inputs and the resulting outputs we would like. This is called the training data.
Again, we are not too far from classical DSP here – capturing the input and output
is similar to capturing an impulse response for an FIR filter.
Figure 31.1 shows the setup I used to capture training data from a small guitar
amplifier I wanted to model. I played a particular test signal through the amp
consisting of about a minute of varied guitar playing. I then captured the output
either from the line-out of the amp (for pre-amp and tone modelling) or using
the mic (for pre-amp, tone and speaker cab modelling). The test signal becomes
the training input, and the recorded output becomes the training target. I want
the neural network to process audio in the same way as the amp does. But wait
– I should be careful what I say here. In fact, this is a black-box model, so I do
not care how the amp goes about processing the audio. I only care about how the
audio changes due to passing through the amp.
Stage two of the four-stage training process from figure 31.2 passes the training
data inputs to the network and stores the outputs. Stage three compares them to
the training data outputs and computes the error between the two. The training
program uses a loss function to calculate the error between the output from the
network and the correct output. There are many ways to measure the loss; for
example, in audio applications, you might compare the spectrum of the desired
output to the achieved output and calculate how different they are.
Once you have computed the ‘error’ using the loss function, stage four involves
a famous algorithm called back-propagation. ‘Back-prop’ adjusts the parameter
settings of the neural network – the loss is back-propagated through the network.
Then, the training program returns to stage 2, passing the test signal into the
network again. In this way, the network learns to approximate the transformation
from the input to the output signal of the original amplifier.
Usually, the dataset is broken into batches and the network parameters are
updated after each batch is passed through the network. So, as the training program works through the dataset, feeding it through the network and measuring
the loss, it makes multiple adjustments to the neural network parameters. Once
it has worked its way through the entire dataset, e.g. the 1 minute of audio in
my guitar amp modelling example, we say that it has completed a training epoch.
Training typically runs for several epochs.
As a side note, the idea of batch processing in parallel was one of the big ideas
that kick-started the deep learning revolution in the noughties. Parallel batch
processing involves running multiple instances of the neural network in parallel,
passing a different input through each instance, and simultaneously computing
the errors. Graphics Processors are very good at computing these kinds of parallel
jobs. This parallel processing is what enables training over large datasets on large
computing clusters. Luckily, amplifier modelling does not need a large dataset –
just one or two minutes of audio. So we can train our networks on more easily
accessible hardware.
In this chapter, you will put the previous chapter’s theory into practice and start
work on an LSTM model that can process audio signals. You will begin by setting
up a Python environment suitable for creating and testing LSTMs. You will make
some LSTMs and learn to pass audio signals through them in Python. Then, you
will learn how to use TorchScript to export models from Python in a format that
can be imported into a C++ program via libtorch. Once your LSTM is working in
C++, you will see how to use it to process an audio signal read from a WAV file. At
the end of the chapter, you will experiment with the performance of different-sized
models to see which can be used in real-time scenarios.
If you use Anaconda, run the appropriate commands or use the UI to configure and activate your Anaconda virtual environment. Now, inside the virtual environment, install some packages – here are my preferred starter packages:
1 pip install torch librosa scipy numpy ipython jupyter
Go ahead and run the script. Here is what I see when I run it:
That number is the output from the LSTM in response to the 1.0 we sent in.
What about the other variable that came back, hx? That is the hidden state of
the LSTM after we passed in the 1.0. Think of the LSTM like an audio delay
effect – after you pass in a signal, the delay effect holds on to that signal and
repeats it. So, the delay effect changes its state after you pass a value in, and so
does the LSTM. The PyTorch LSTM layer is designed so you can choose to reset
it or keep its previous state each time you pass in a number. So, if you hold on to
that hidden state value hx, you can pass it into the network next time, and it will
remember what state it was in after receiving the first value.
We will come back to that feature later. Right now, I want you to see what
happens when you pass a longer signal into the network. You can actually pass
a sequence of values in and receive the output sequence all at once. Try this in a
fresh script:
import torch
torch.manual_seed(10)
my_lstm = torch.nn.LSTM(1, 1, 1)

# make an input
in_t = torch.tensor([[1.0], [0], [0], [0], [0], [0]])
# pass it through the LSTM layer
out_t, hx = my_lstm.forward(in_t)
# print the results
print(out_t)
Do you remember what the name is for that particular signal? Hopefully, you
said ‘the impulse signal’. I see the following result:
1 [[0.0318] ,
2 [0.1224] ,
3 [0.1459] ,
4 [0.1563] ,
5 [0.1615] ,
6 [0.1642]]
When I run that script, I see the output pegging at 0.1678 after about 20 zeroes
have gone through the network. To me, this looks like a DC offset. The network
responds to the signal and eventually settles down, but with a constant DC offset.
Try commenting out the call to manual_seed so the network is random each time
you run the script. Run the script a few times. The network always settles on a
constant, but each time it is different.
$$
0.5 \rightarrow \text{scalar}
\qquad
\begin{bmatrix} 0.5 & 0.1 & 0.25 \end{bmatrix} \rightarrow \text{vector}
\qquad
\begin{bmatrix} 0.5 & 0.1 & 0.25 \\ 0.05 & 0.9 & 0.4 \end{bmatrix} \rightarrow \text{matrix}
$$
$$
\begin{bmatrix}
\begin{bmatrix} 0.5 & 0.1 & 0.25 \\ 0.05 & 0.9 & 0.4 \end{bmatrix} &
\begin{bmatrix} 0.5 & 0.1 & 0.25 \\ 0.05 & 0.9 & 0.4 \end{bmatrix}
\end{bmatrix} \rightarrow \text{tensor}
\qquad (32.1)
$$
Most of the work in that script was synthesising the sine tone (lines 8–13) and
reshaping it for input to the LSTM (lines 17–19). You can see the complete script
in the code repository section 39.5.16.
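Condensed, those two steps look roughly like this (a sketch: the sample rate and the one-second duration are assumptions):

import math
import torch

sample_rate, freq = 44100, 400
# synthesise one second of a 400Hz sine tone
t = torch.arange(0, 1.0, 1.0 / sample_rate)
sine = torch.sin(2 * math.pi * freq * t)

# reshape to (sequence_length, input_size) so the LSTM will accept it
lstm_input = sine.reshape(-1, 1)

my_lstm = torch.nn.LSTM(1, 1, 1)
out_t, hx = my_lstm(lstm_input)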
Now, let’s analyse what the LSTM did to the sine wave regarding the time and
frequency domain of the signal. Figure 32.1 shows the 400Hz sine tone before and
after it passed through the LSTM in the time and frequency domains. You can see
that the sinusoidal shape has been ‘pinched’ at its low point and widened at its
high point. This results in the creation of additional harmonics. So you can see that
FIGURE 32.1
What does our simple, random LSTM do to a sine wave? It changes the shape of
the wave and introduces extra frequencies.
even with one LSTM unit, we are already creating the kind of harmonic distortion
that guitar players wax lyrical about when discussing their valve amplifier setups.
2 [ -0.3582] ,
3 [ -0.0454] ,
4 [ 0.2963]])
5 tensor ([[ -0.9843] ,
6 [ -0.2911] ,
7 [ 0.1443] ,
8 [ 0.1551]])
9 tensor ([ -0.9655 , 0.1192 , -0.1923 , 0.2199])
10 tensor ([ 0.6780 , 0.5214 , -0.7319 , -0.6145])
I counted four distinct parameter tensors there, and each tensor contains four numbers. So, for an LSTM layer with a single unit, we have 16 trainable
parameters. As noted above, the LSTM has four distinct components, and those
four sets of numbers dictate the behaviour of each of those components, eventually
resulting in the LSTM subtly remembering some things and forgetting others. As
the number of units in the layer increases, the number of parameters increases,
and not in a linear way, as each internal parameter interacts with several of the
others:
Units: 1 params: 16
Units: 2 params: 40
Units: 3 params: 72
Units: 4 params: 112
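Those counts are easy to verify for yourself; for example:

import torch

# count the trainable parameters for different LSTM layer sizes
for units in [1, 2, 3, 4]:
    lstm = torch.nn.LSTM(1, units, 1)
    n_params = sum(p.numel() for p in lstm.parameters())
    print("Units:", units, "params:", n_params)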
At this stage, it is not necessary to go into more detail about how LSTMs work;
having a concept of LSTMs as clever memory devices with adjustable parameters is
sufficient. Now you should experiment with passing signals through different-sized
LSTM networks. If you use the seed setting capability you can restore networks on
subsequent runs if you find any that do interesting things to the signal. You will
find a helpful notebook I provided – check out section 39.5.17 in the repo guide.
That’s pretty straightforward. We have to pass some data through the network
using the trace function, and then we receive a traced network, which we can save
to a TorchScript (.pt) file.
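On the C++ side, loading the traced model and passing the impulse through it looks roughly like this (a sketch: the file name, the six-sample impulse and the printing are my choices, not necessarily the author's exact code):

#include <torch/script.h>
#include <iostream>
#include <vector>

int main() {
    // load the traced model exported from Python
    torch::jit::script::Module my_lstm = torch::jit::load("lstm.pt");

    // the impulse signal: 1.0 followed by zeroes, shaped (length, 1)
    std::vector<float> impulse {1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
    torch::Tensor in_t = torch::from_blob(impulse.data(), {6, 1}, torch::kFloat32);

    // TorchScript models take their arguments as a vector of IValues
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(in_t);

    // forward returns a single IValue: for an LSTM, a tuple of (output, hidden state)
    torch::jit::IValue out_ival = my_lstm.forward(inputs);
    torch::Tensor out_t = out_ival.toTuple()->elements()[0].toTensor();
    std::cout << out_t << std::endl;
    return 0;
}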
This code replicates the Python example we saw earlier, which passes the impulse signal through the network. One oddity of imported TorchScript models, which makes them different from directly defined models, is the need to pass in the tensor as an IValue vector. IValues provide a generic way to pass data into TorchScript models. The models also return IValue data, but as a single IValue rather than a vector of IValues. Here is the first part of the output on my machine:
1 ( 0.0318
2 0.1224
3 0.1459
4 ...
This is great as it looks very similar to the output received from the original
model running in Python. The complete, working code is in the repo guide section
39.5.18.
Then edit the includes in your main.cpp to include the tinywav header:
1 # include "../../ tinywav / myk_tiny . h "
Run the build process to verify your build is working. Then, add these lines to
your main function to 1) load a WAV file, 2) convert it to a tensor, 3) reshape it,
and 4) convert it to an IValue vector.
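Those four steps look something like this (a sketch: the reshaping follows the pattern used elsewhere in this part of the book, but the exact variable names and file path are assumptions):

// 1) load the WAV file into a float vector
std::vector<float> signal = myk_tiny::loadWav("../../audio/sine_400_16bit.wav");

// 2) wrap the vector in a tensor without copying the sample data
torch::Tensor in_t = torch::from_blob(signal.data(),
                                      {(int64_t) signal.size()},
                                      torch::kFloat32);

// 3) reshape to (sequence_length, input_size), which is what the LSTM expects
in_t = in_t.reshape({(int64_t) signal.size(), 1});

// 4) wrap the tensor in the IValue vector that TorchScript models take
std::vector<torch::jit::IValue> inputs;
inputs.push_back(in_t);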
FIGURE 32.2
The steps taken to process a WAV file with a neural network through various
shapes and data formats.
Refer back to figure 32.2 for the steps leading up to entry to the LSTM to
clarify what form the data is in each stage. Run that code; if it works, you have
prepared the data for processing by the network. The most likely problem you will
encounter is having the wrong filename for the WAV file or even the model – try
using an absolute file path instead of a relative one.
Now, to feed the data to the neural network and save the result to a new WAV
file:
// feed the inputs to the network
torch::jit::IValue out_ival = my_lstm.forward(inputs);

// convert the return to a tuple, then extract its elements
auto out_elements = out_ival.toTuple()->elements();
Again, refer back to figure 32.2 to see how the data is transformed at each
step. If you are an experienced C++ developer, your C++ ‘spidey-senses’ might
be telling you to worry about the memory allocation that might be going on during
this process. You will see from my comments in the code that the main place I
make an effort with memory allocation is the conversion to and from ‘libtorch
world’. So, I avoid copying the vector when I convert it to a tensor at the start,
and I avoid copying the tensor when I convert it to a vector for saving at the end.
Memory efficiency is a challenging and deep subject when working with libtorch.
For example, how do you know that those internal processes in the LSTM model
are memory efficient when the model was initially written in Python and is opaque
from the C++ side? This is quite a deep topic, and we will see how to use a more
efficient alternative to libtorch called RTNeural later. My take on optimisation
is that it generally makes the code more complex and difficult to maintain and
understand. Therefore, I try not to do anything silly in my code that really slows
things down, but I prefer simple, understandable code. I work on the basis that
optimisation is only necessary when the software does not run fast enough for the
task at hand. So, let’s test how fast the code runs.
FIGURE 32.3
Time taken to process 44,100 samples. Anything below the 1000ms line can potentially run in real-time. Linux seems very fast with low hidden units, but Windows and macOS catch up at 128 units.
Then, there is a C++ program defined in main.cpp in the example that runs a
one-second test signal through each of the exported models and times it using the
std::chrono library. I will not go into the details of the code here as it is very similar
to the previous code example. You should open up main.cpp and take a look. You
will see that it times the complete process, starting from having a signal in a float
vector through to receiving a vector of floats representing the signal after passing
through the LSTM via the processes shown in figure 32.2. Figure 32.3 shows the
results of running the program on my development machines: an Intel 10th gen
Linux machine, an Intel 10th gen Windows machine and an M1 Mac Mini. The
Linux machine can run up to 256 LSTM units in real-time. The Mac Mini can run
up to 128 units in real-time, but it seems to have a heavier overhead on all sizes
than the Intel CPU machine – even one LSTM unit takes 658ms on the Mac. The
Mac is running a self-compiled libtorch 2.1. In a real-world scenario, there would
be more processing going on here, but this simple program gives us an insight into
the kind of networks we can expect to use in a real-time audio application. We
will revisit the question of performance later in this part of the book when we
deploy trained LSTMs in plugins using libtorch and RTNeural.
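If you want to time your own version, the measurement itself only needs std::chrono; a minimal sketch, where processSignal stands in for whatever function runs your float vector through the model (that helper name is only an illustration):

#include <chrono>
#include <iostream>

// time how long the model takes to process one second of audio
auto start = std::chrono::high_resolution_clock::now();
processSignal(my_lstm, signal, outSignal);
auto end = std::chrono::high_resolution_clock::now();
auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
std::cout << "Processed " << signal.size() << " samples in " << ms << "ms" << std::endl;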
33
JUCE LSTM plugin
In this chapter, you will build on your work with LSTMs in the previous chapter,
eventually wrapping up the LSTM model in a JUCE plugin that can pass audio
from the plugin host through the LSTM in real-time. First, you will convert your
WAV processing C++ program to a block-based model. You will learn that block-
based processing causes a problem wherein the state of the LSTM is reset, leading
to audible glitches in the sound. You will learn how to solve the problem by
retaining the state of the LSTM between blocks. This will involve returning to the
Python code to trace the model with more parameters and learning which data
structure to use to represent the LSTM state in your C++ program. Once the
command-line WAV processing program works without generating artefacts, you
will convert that program into a JUCE plugin and see how you can process audio
received from a plugin host through an LSTM model.
// load WAV
std::vector<float> signal
    = myk_tiny::loadWav("../../audio/sine_400_16bit.wav");
// setup vector to store processed signal
std::vector<float> outSignal(signal.size());

// setup blocks for processing
int blockSize = 1024;
std::vector<float> inBlock(blockSize);
std::vector<float> outBlock(blockSize);
// loop through, jumping a block at a time
for (auto s = 0; s + blockSize < signal.size(); s += blockSize) {
    // copy signal into inBlock
    std::copy(signal.begin() + s,
              signal.begin() + s + blockSize,
              inBlock.begin());
    processBlock(my_lstm, inBlock, outBlock);
    // copy outBlock to outSignal
    // (won't need to do that in a real-time situation)
    std::copy(outBlock.begin(),
              outBlock.end(),
              outSignal.begin() + s);
}
// save to WAV
myk_tiny::saveWav(outSignal, 1, 44100, "test.wav");
Can you test the program with some audio files? I am testing it with a 400Hz
sine wave tone – I suggest you do the same and listen to the output carefully.
You can find a fully working version of the block-based program in the repo guide
section 39.5.21.
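The processBlock helper called inside that loop is provided in the repo version; a minimal sketch of a stateless version, reusing the tensor conversions from earlier, might look like this:

void processBlock(torch::jit::script::Module& model,
                  std::vector<float>& inBlock,
                  std::vector<float>& outBlock) {
    // wrap the block in a tensor and reshape to one row per sample
    torch::Tensor in_t = torch::from_blob(inBlock.data(),
                            {static_cast<int64_t>(inBlock.size())});
    in_t = in_t.view({-1, 1});
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(in_t);
    // run the block through the network
    torch::jit::IValue out_ival = model.forward(inputs);
    auto out_elements = out_ival.toTuple()->elements();
    torch::Tensor out_t = out_elements[0].toTensor().view({-1});
    // copy the processed samples into the output block
    float* data_ptr = out_t.data_ptr<float>();
    std::copy(data_ptr, data_ptr + inBlock.size(), outBlock.begin());
}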
FIGURE 33.1
Block-based processing leads to unwanted artefacts in the audio. The left panel
shows the output of the network if the complete signal is processed in one block.
The right panel shows what happens if the signal is passed through the network
in several blocks. The solution is to retain the state of the LSTM between blocks.
0.5 → 0.0575
0.1 → 0.0836
[0.5, 0.1] → [0.0575, 0.1191] (33.1)
The crucial thing to notice is that 0.1 produces a different output when it
comes after 0.5 in a single block. This tells you that the LSTM is stateful – when
you pass in a block, it runs through each number in turn but remembers its state
between them. At the end of the block, it resets its state. Luckily, it is possible
to store the state and then send it back the next time you call ‘forward’. Notice how I
store the second value returned from the call to forward and then pass it into the
second call to forward in the following code:
input = torch.tensor([[0.5]])
output, state = my_lstm.forward(input)
print(output)

input = torch.tensor([[0.1]])
output, state = my_lstm.forward(input, state)
print(output)

input = torch.tensor([[0.5], [0.1]])
output, _ = my_lstm.forward(input)
print(output)
This time, the result from the individually passed values is the same as the
values passed in a block:
0.5 → 0.0575
0.1 → 0.1191
[0.5, 0.1] → [0.0575, 0.1191] (33.2)
So, the secret to retaining the state between calls to forward is to store the
returned state and send it back to the next call to forward. We shall see shortly
how to create the state data in the appropriate form in C++, which is not a
well-documented operation in libtorch for C++ but is well described in the LSTM
examples for Python. Before we do that, though, we need to fix the TorchScript
model.
When we trace the model, we simply pass it an input, not any state data. Will
the resulting TorchScript model accept state data? Let’s try it:
torch.manual_seed(21)
my_lstm = torch.nn.LSTM(1, 1, 1)
In this case, I generated random data for inputs and state. Then, I used those
values in the trace. I now have a TorchScript model that is ready to receive state
data. Debugging models like this in Python is much easier than in C++, as the
errors are a little clearer, and the debugging cycle is faster. I know this because
when I was developing the code for this part of the book, I debugged in C++,
which was slower and more difficult than Python would have been.

FIGURE 33.2
Breakdown of the data type used to store LSTM state.
The LSTM networks we will work with only have one layer, but we will experiment
with more hidden units, as we did when testing performance.
1  LSTMState processBlockState(
2      torch::jit::script::Module& model,
3      const LSTMState& state,
4      std::vector<float>& inBlock,
5      std::vector<float>& outBlock, int numSamples) {
6
7      torch::Tensor in_t = torch::from_blob(inBlock.data(),
           {static_cast<int64_t>(numSamples)});
8      in_t = in_t.view({-1, 1});
9      std::vector<torch::jit::IValue> inputs;
10     inputs.push_back(in_t);
11     inputs.push_back(state);
12
13     torch::jit::IValue out_ival = model.forward(inputs);
14     // copy to the outBlock
15     auto out_elements = out_ival.toTuple()->elements();
16     torch::Tensor out_t = out_elements[0].toTensor();
17     out_t = out_t.view({-1});
18     float* data_ptr = out_t.data_ptr<float>();
19     std::copy(data_ptr, data_ptr + inBlock.size(), outBlock.begin());
20     // now retain the state
21     return out_elements[1].toTuple();
22 }
The main thing to note in that function is line 11, where we simply add the
state value to the IValue vector after adding the input. TorchScript will unpack
that vector into a list of arguments for the forward function. Then, on the last
line in the function, we extract the state after processing the input and return it.
When I run this code, the artefacts in the signal at the start of each block are
gone. Over to you – can you get this working in your program? Experiment with
different network architectures – does it still work with more hidden units? Can
you compare performance between the block-based and one-shot versions of the
code? Any bottlenecks here?
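As a starting point, here is a sketch of the block loop with the state carried across blocks (assuming a one-layer LSTM with 32 hidden units and the getRandomStartState helper discussed below):

// create an initial state for a 1-layer LSTM with 32 hidden units
LSTMState state = getRandomStartState(1, 32);
for (auto s = 0; s + blockSize < signal.size(); s += blockSize) {
    std::copy(signal.begin() + s, signal.begin() + s + blockSize, inBlock.begin());
    // process the block and keep the returned state for the next block
    state = processBlockState(my_lstm, state, inBlock, outBlock, blockSize);
    std::copy(outBlock.begin(), outBlock.end(), outSignal.begin() + s);
}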
Add these variables for the LSTM model and its state to the private section
of PluginProcessor.h:
torch::jit::script::Module lstmModel;
LSTMState lstmState;
Compile and run to verify you can build against the TorchScript components.
Now add the prototype for the getRandomStartState function from the command-line
program to the private section of your PluginProcessor.h file:
LSTMState getRandomStartState(int numLayers, int hiddenSize);
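The body of getRandomStartState is not shown here; based on figure 33.2 and the way the state is used in processBlockState, a sketch along these lines should work, treating LSTMState as an intrusive pointer to an IValue tuple holding the hidden and cell tensors (the exact shapes depend on how you traced the model):

// assumed definition of the state type
using LSTMState = c10::intrusive_ptr<c10::ivalue::Tuple>;

LSTMState getRandomStartState(int numLayers, int hiddenSize) {
    // the LSTM state is a tuple of (hidden state, cell state), each of
    // shape [numLayers, hiddenSize] for a single, unbatched sequence
    torch::Tensor h = torch::randn({numLayers, hiddenSize});
    torch::Tensor c = torch::randn({numLayers, hiddenSize});
    std::vector<torch::jit::IValue> stateParts{h, c};
    return c10::ivalue::Tuple::create(stateParts);
}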
FIGURE 33.3
The LSTM plugin running in the AudioPluginHost test environment, with an oscilloscope
showing a sine wave test tone before and after LSTM processing.
So the trick will be converting that juce::AudioBuffer into a format that the
processBlockState function can accept. The good news is that we already have the
two torch-related arguments, model and state. We just need an input vector of
floats and an output vector of floats. We need to pay some attention to memory
allocation here – we do not want to be allocating memory for buffers of floats
in the middle of the audio loop if we can avoid it (noting the limited control we
have over the innards of the TorchScript model).
My solution is to create the input and output blocks when prepareToPlay is
called – we will know the size of the buffer then. Then, I will use std::copy to
copy data from the JUCE AudioBuffer to and from my vector buffers. So – in
PluginProcessor.h, add the following vectors to the private section:
std::vector<float> inBuffer;
std::vector<float> outBuffer;
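A sketch of the additions to the template's existing prepareToPlay, sizing the vectors to the host's block size so no allocation happens later inside the audio callback (the processor class name here is the default one from the JUCE template and may differ in your project):

void AudioPluginAudioProcessor::prepareToPlay(double sampleRate, int samplesPerBlock) {
    juce::ignoreUnused(sampleRate);
    // allocate the staging buffers up front, outside the audio loop
    inBuffer.resize(samplesPerBlock);
    outBuffer.resize(samplesPerBlock);
}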
Compile and run to check for any mistakes. Now, to get the data from JUCE's
processBlock via these vectors into our processBlockState function. In
PluginProcessor.cpp's processBlock function, add the following code, which will copy the
samples from the incoming block to the inBuffer vector:
for (int channel = 0; channel < totalNumInputChannels; ++channel)
{
    auto* input = buffer.getReadPointer(channel);
    std::copy(input, input + inBuffer.size(), inBuffer.begin());
    // process the block, keeping the returned state for the next block
    lstmState = processBlockState(lstmModel, lstmState,
                                  inBuffer, outBuffer,
                                  buffer.getNumSamples());
    // copy the processed samples back to the plugin's output
    auto* output = buffer.getWritePointer(channel);
    std::copy(outBuffer.begin(), outBuffer.end(), output);
}
Now, compile and run the plugin inside the test host. Try passing a sine signal
through it and listening to the result. Figure 33.3 shows the plugin running in
the AudioPluginHost test environment. In that figure, I am feeding a test signal
through the LSTM and using an oscilloscope plugin from the Linux Studio Plugins
project1 to display the signal before and after processing through the LSTM.
You can make all kinds of extensions to the plugin, but you might want to
hold off until you have found out how to train the model to process the sound in
a particular way, which you will learn in the next chapter.
1 https://lsp-plug.in
34
Training the amp emulator: dataset
In this chapter, I will lead you through the components of the Python training
script I have created to train LSTM models to emulate distortion effects and
amplifier circuits. The script is based partly on open-source code by Alec Wright,
who worked extensively on guitar amplifier modelling with neural networks in
2019–2020. I have rewritten much of Wright’s code to align it with current practice
in PyTorch programming. My approach in this chapter is to review an existing
code base instead of showing you how to create the complete script line-by-line as
we did for the C++ code. The main components are the LSTM model, the data
loader and the training loop. Along the way, you will find out how to monitor
training progress using tensorboard, how to save models for further training later
and how to manage training on CPU and GPU devices. At the end of the chapter,
I will show you how to import the trained model into a JUCE plugin.
FIGURE 34.2
Four stages to train a neural network. 1: send the test input through the device
(e.g. amp) you want to model, 2: send the test input through the neural network, 3:
compute the error between the output of the network and amp, 4: update network
parameters to reduce error using back-propagation. Back to stage 2.
In step 3, you compute the error between the network output and the amplifier's
output. But how will you compare the two output signals? How can you make
sure that the network is improving in the right way over time and that it does not
learn any strange ‘error-avoiding’ solutions? Seasoned machine learning engineers
reading this will be familiar with the problem of error function exploitation, where
the neural network learns to achieve a low error without actually doing what you
want it to do! For some entertaining examples of systems (in this case, evolutionary
algorithms) exploiting error functions by developing unwanted behaviours, I
recommend looking at Lehman et al.'s paper[25].
Then, there are stages 4 and 5, where you will use the error to update the
network parameters before returning to stage 2. There are various parameters and
settings which control the behaviour of the training process, e.g. how much you
should adjust the network parameters each time (learning rate), how many inputs
the network should process between parameter updates (batch size) and so on.
In the following sections, I will revisit these questions and explain how the
training script addresses them.
At the time of writing, I am using the versions of those packages shown in the
list below.
If you install these packages and you have some problems running the script,
I recommend that you report these problems as issues on the book’s GitHub
repository. If you want to install a specific version of a package, you can use a pip
command like this:
pip install torch==2.1.0
The files for the Python training program should be in a ‘python’ sub-folder
in the example code. In that folder, you should see the following files:
When you run the script, if you see error messages, read them carefully and
fix the problems. Common errors are ModuleNotFoundError, meaning you have not
installed a Python module, and assertion errors relating to the location of the
audio files. The default script assumes there is a folder two levels up with a data
and audio_ht1 sub-folder, as per line 2 in the output above. That is where the
training data is located, and you should have received an audio_ht1 folder with
the correct data in it when you downloaded the code for the book. I will go into
more detail about preparing training data shortly. At this stage, you should be
able to run the train.py script and see an output similar to that shown above.
Read any errors carefully and work to resolve them before continuing.
34.2.1 Tensorboard
The training program is set up to use tensorboard. Figure 34.3 shows tensorboard
in action. It is a machine learning dashboard that lets you observe the progress of
training runs using your web browser. If you run the training script, you will see
it creates a folder called ‘runs’. Each time you run the script, it will create a new
sub-folder in the runs folder. Here is an example of the file structure made in the
runs folder after running train.py twice:
|-- Oct24_16-22-50_yogurt52 ht1 LSTM model with 32 hidden units
|   |-- events.out.tfevents.1698160970.yogurt52.1869590.0
|   `-- saved_models
|       |-- 1.wav
|       |-- 3.wav
|       |-- lstm_size_32_epoch_1_loss_0.7031.pth
|       |-- lstm_size_32_epoch_3_loss_0.6968.pth
|       `-- rtneural_model_lstm_32.json
`-- Oct24_16-23-03_yogurt52 ht1 LSTM model with 32 hidden units
    |-- events.out.tfevents.1698160983.yogurt52.1869961.0
    `-- saved_models
        |-- 1.wav
        |-- lstm_size_32_epoch_1_loss_0.756.pth
        `-- rtneural_model_lstm_32.json

FIGURE 34.3
Tensorboard is a web-based machine learning dashboard. Here, you can see a list
of training runs (1) and graphs showing training progress in terms of training (2)
and validation (3) errors on two separate runs.
If you run tensorboard as follows in the folder with the runs sub-folder in it:
tensorboard --logdir runs
You will see a URL printed to the console, such as http://localhost:6006/. If
you open that URL in your browser, you should see something like the user inter
face shown in figure 34.3. Try running the train.py script in one terminal window
and then running the tensorboard command in another. Watch the tensorboard
dashboard in your web browser, and you should see the progress of the training
run. I will explain more about the files in the runs folder shortly. Still, as a quick
insight, the pth file is a snapshot of the trained model at a certain point in time,
the JSON file is an exported model in RTNeural format (more on RTNeural later),
and the WAV files are examples of test data being run through the network at the
time the pth file was saved.
FIGURE 34.4
Spectrogram of the Atkins training signal 'v2_0_0.wav'. You can see the signal is
quite varied and dynamic.
modulation applied. You can use this file as your training input. You can see a
spectrogram of the file in figure 34.4.
As noted above, you then need a recording of that file after it has passed
through some sort of signal path. Figure 34.5 illustrates the concept of re-amping.
In the example, I am playing the clean training signal out of a sound card and
into a small guitar amplifier. Then, I record the output of the amplifier’s speaker
back into the sound card using a microphone. The recording of the mic would be
the training output signal. In practice, I achieved more satisfying results when
I recorded directly from the output jack of the amplifier instead of via a mic.
Those familiar with sound engineering will know this as ‘DI’ing’ or direct input.
DI’d signal paths are easier to model than fully re-amped signals, as DI’d signals
only contain the pre-amplifier and tone circuit. Fully re-amped and mic’d signals
include the pre-amplifier circuit, the tone circuit, the power amplifier circuit, the
speaker itself and the mic, including the additional reflected signal from the room.
You could model such a signal path, but as with more traditional approaches to
modelling guitar signal paths, you might need a more complex model to achieve good
results.
If you use longer recordings for your training data, the model will take longer to
train, but you might obtain a more thorough model of the effects signal path you
are capturing. The thoroughness of the model depends not just on the amount of
training data but also on the variety of timbres in the training signal. The training
process needs to see what happens to various signals when they pass through your
effects chain. Training data with a wide range of timbres will allow the trainer
to learn a more dynamic model since the training data will show how the effects
respond to different signals.
FIGURE 34.6
Clean signal (top) and re-amped signal (bottom) in Reaper.
able to you. For example, you could pass the signal through an amp modelling
plugin and see if you can replicate it with the LSTM. Of course, you will be
modelling something that might not be very realistic. It is better (and more fun) to
pass the signal through an actual, physical pedal or amplifier.
34.3.3 Latency
When you pass a signal out of your computer and then capture it back in via an
effects chain, latency is added to the signal due to the buffer used in the audio
hardware. Depending on how you capture the signal, the latency may or may not
be an issue. If you play the signal from a DAW and record it back to the DAW
via the effects you are modelling, you will probably be okay. This is because the
DAW will automatically apply latency compensation based on the buffer size of
the audio device. If you use some other means to carry out the playback and
capture, then you may need to align the captured signal with the clean signal.
If you are using some external digital effects in your effects chain, these will add
more latency on top of that added by the sound card input and output. Your
DAW will not be able to compensate automatically for latency caused by external
digital effects. Luckily, the test file 'v2_0_0.wav' has some very short impulses at
the start, allowing you to align the recording with the original more easily. If you
do not align the files, the training might not work correctly.
focus on LSTM models as it is easy to understand and test them, and they will
run in real-time in a plugin format.

35
Data shapes, LSTM models and loss functions
In this chapter, I will explain in detail the shape of the data and how it is organised
for training. My experience has been that figuring out exactly what form the
data structure takes can help a lot in understanding what the neural network and
training scripts are doing. After that, I will explain why you need to add additional
layers to the LSTM model so the ‘multichannel’ output of the LSTM can be mixed
down to a single channel. We will finish the chapter with an examination of the
loss function designed to guide the training script to adjust the parameters of the
neural network correctly to achieve a usable trained model.
1. The input data folder. This should contain one or more WAVs. These
are the clean audio signals. Essentially, you can just put the test signal
WAV file in there.
2. The output data folder. This should contain the processed version of the
input WAVs. As for the input, one file will be enough for our purposes.
3. The frag or sequence length, which is the length of the sequence sent
into the LSTM each time we call ‘forward’ during training.
The input and output folder arguments are pretty straightforward, but what
exactly does the sequence length do here? Regarding the dataset preparation, the
audio files are concatenated into two long sequences of samples, one for the input
and one for the output. Then, they are chopped into sub-sequences with the length
you specify in the sequence length parameter.
For example, imagine you have 180 seconds of audio at 44,100Hz in the input
folder with a matching, processed signal in the output folder. If you ask for a
sequence length of 0.5s, you will receive 360 sequences with a length of 22,050 for
the input and the same for the output. The sequence length you choose then has
an effect on the training process. The longer the sequence, the more information
the trainer will have about how the system you are modelling behaves over time.
To put that into more precise terms, the purpose of the LSTM model is to
produce the correct output sample for the given input sample. If you have a
sequence length of one sample, when you are training, the network only gets to
see one sample at a time, and it has to guess what the output should be. This might
be okay for a waveshaper-type effect – waveshapers take one input and produce
one output; they do not know about any older inputs. But what about some of the
delay and filter effects we have seen in previous chapters? They often take account
of several previous inputs. Then consider modelling complex analogue circuits
such as valve amplifiers – valves are quite ‘stateful’, which is partly what provides
their lively tone. As is often the way with neural networks, experimentation can
lead you to the best setting. A good starting point for sequence length is 0.5s,
meaning the neural network sees 0.5s worth of input samples and has to predict
the following output sample.
FIGURE 35.1
Sequence length and batch size.
not receive anything else. So, when the network runs in a plugin, it will probably
ignore anything older than 0.5s.
the ideal sequence length and batch sizes for the problem of guitar amplifier
modelling, and those settings provide a good starting point. The defaults for the script
are a sequence length of 0.5s and a batch size of 50.
Numpy arrays must also have a regular shape. Consider the following code,
which will crash:
import numpy as np
data = [[[1], [2], [3]], [[4], [5]]]
data_np = np.array(data)
The conversion to a tensor is necessary because tensors are the data structures
that torch will accept, as discussed in an earlier chapter.
input tensor:
[
  [[x_1], [x_2], [x_3], [x_4]],
  [[x_5], [x_6], [x_7], [x_8]],
  [[x_9], [x_10], [x_11], [x_12]]
]

output tensor:
[
  [[y_1], [y_2], [y_3], [y_4]],
  [[y_5], [y_6], [y_7], [y_8]],
  [[y_9], [y_10], [y_11], [y_12]]
]
To give you some hands-on experience with the shape of the data and how to
select things from it, here is some Python code that takes a simple sequence from
0-11 in a 1D array and reshapes it:
import numpy as np
total_len = 12
a = np.arange(0, total_len, 1)
print(a)
channels = 1  # mono
seq_len = 2   # 2 samples per sequence
num_seqs = int(len(a) / seq_len / channels)
b = a.reshape((num_seqs, seq_len, channels))
print("1st seq:", b[0])
print("3rd seq:", b[2])
# first seq, first sample, first channel
print(b[0][0][0])
# third seq, second sample, first channel
print(b[2][1][0])
In the above, a batch would consist of multiple sequences, e.g. b[0:2]. Selecting
within lists and arrays is a powerful feature of the Python ecosystem. b[0:4] selects
indexes 0, 1, 2 and 3 from b. b[:, 1] selects all 'rows' from b, then selects the second
column from each row (note that this slice syntax works on numpy arrays rather
than plain lists). Experiment with the selection syntax yourself.
The get_train_valid_test_datasets function has a 'splits' parameter, which dictates
how the data is split into three subsets, and it defaults to splits=[0.8, 0.1,
0.1]. This means the training data is 80%, and the validation and test data are
10% each. The purpose of these three subsets of data is as follows:
• Training data: fed through the network during training to find errors. Errors are
fed back to the network to update parameters.
• Validation data: fed through the network during training to check how training
is progressing. Validation data errors are not fed back into the adjustment of
parameters, so it is considered ‘unseen’ data. Therefore, it tests how the network
performs with data it has not learned from.
• Test data: used to test the network on unseen data at the end of training. This
differs from validation data as it is used to compare across training runs, where
the training parameters (sequence length, batch size, learning rate) may have
been adjusted.
the validation data, we are observing how well the network generalises. If, during
training, the training data performance is getting better but the validation data
performance is getting worse, the network is over-fitting the training data and not
generalising well. That is when you should stop training.
You can see that it either selects the cpu device or the cuda device. CUDA
stands for Compute Unified Device Architecture, and it is a framework that
PyTorch uses to carry out computations on Nvidia graphics accelerators. It should
have been installed on your machine when you installed PyTorch. It will only be
available if you have an appropriate type of GPU (basically a recent Nvidia one)
and the correct drivers. You can run this training script on a CPU, but it will take
longer to complete training.

FIGURE 35.2
An LSTM network with a four hidden unit LSTM layer and a densely connected
unit which 'mixes down' the signal to a single channel.
Once you have the device, the next step is to prepare the data loaders. The
data loaders present an interface on the dataset that allows batch processing,
shuffling and other features. It is easy to convert a TensorDataset object into a
DataLoader:
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
This model, which is illustrated in figure 35.2, produces a single output channel:
torch.Size([1, 1])
FIGURE 35.3
What does loss mean? The top two plots show extracts from the target output.
The middle two plots show the output of an untrained (left) and trained network
(right). The right-hand side is much closer to the target. The bottom plots show
a simple error between each point in the two plots above. The sum of these values
could be a simple loss function.
Some things to note: the LSTM layer is created in batch_first mode, meaning
the first dimension of the input is the sequence selection index, which is how
we've been working. The two layers are assigned to class variables called 'lstm'
and ‘dense’. The forward function passes the data through the LSTM and then
the dense layer. You can test the class as follows:
model = SimpleLSTM()
model.forward(torch.zeros((1, 1)))
The complete class is defined in the file myk_models.py. If you examine
the code, you will see it has some extra functions: zero_on_next_forward and
save_for_rtneural. zero_on_next_forward causes a set of zeroes to be passed in for
the LSTM state on the next call to forward. It is used to reset the network when
necessary during training. save_for_rtneural saves the current model parameters
into a JSON file that can be imported into the RTNeural library, allowing for
faster inference than is possible with TorchScript. More on that later!
data. The loss function needs to capture pertinent information about the signal
to guide the training towards the correct settings for the network. Figure 35.3
illustrates an example of a loss that can be easily calculated. In this case, the loss
is the distance between the target signal and the signal received from the network.
This is an example of perhaps the simplest possible loss function – the Euclidean
distance. However, it is not the most appropriate for training models to emulate
guitar amplifiers.
Designing loss functions appropriate for your problem is something of an art
form. Alec Wright describes a quite complex loss function in his paper about
non-linear guitar amp emulation, which we are re-implementing here[45]. Wright's loss
function consists of three stages, which he found effective in guiding the training
process. Let’s consider each of those in turn, but before we do, I will point out
that all the loss functions are written using torch functions, as that means they
can be computed using torch’s parallelisation capabilities where the hardware is
available.
as larger losses on loud signals, preventing the training from over-focusing on the
louder parts of the signal. The following code shows the implementation of ESR
loss:
def forward(self, output, target):
    self.epsilon = 0.00001
    loss = torch.add(target, -output)
    loss = torch.pow(loss, 2)
    loss = torch.mean(loss)
    energy = torch.mean(torch.pow(target, 2)) + self.epsilon
    loss = torch.div(loss, energy)
    return loss
The implementation is located in myk_loss.py. You can use the following code
to experiment with the ESR loss. I recommend that you run IPython from inside
the 'python' folder in project 39.5.23 from the repo guide:
import myk_loss
import torch
esr = myk_loss.ESRLoss()
target = torch.randn((10, 1))
output = target * 0.1  # quite different
print(esr.forward(output, target))
output = target * 0.99  # much closer
print(esr.forward(output, target))
Experiment with some different signals to see how the ESR loss comes out.
Here is some code to experiment with DC offset loss. It generates a signal and
then adds different types of DC offset:
import myk_loss
import torch
dc = myk_loss.DCLoss()
The loss function has weightings for each stage, which dictate how much that
loss stage contributes to the final loss. By default, ESR loss is weighted at 75% and
DC offset at 25%.

36
The LSTM training loop
In this chapter, I will take you through the LSTM training loop. The training loop
is the block of code that carries out the actual training of the neural network, where
its parameters are gradually adjusted until it processes the signal correctly. This
training loop has some interesting features that work around some of the problems
encountered by researchers working on training LSTM networks, for example,
using a warm-up step and the exotic-sounding truncated backpropagation through
time.
The script next goes into the main flow of the training loop, which is shown
graphically in figure 36.1. In this diagram, you will see how an epoch breaks
the data into batches, computing loss for each batch and updating the network
parameters. There is a function called myk_train.train_epoch_interval, which does
all the work involved in a training epoch. More on that below.
After the epoch completes, the losses are logged for display in tensorboard.
Then various checks occur: save the model weights if the validation loss is a new
record, exit if there has been no new record for too long, and exit if there have
been more than 'max_epochs' epochs. If the exit tests do not end the script, it
runs another epoch.
FIGURE 36.1
The training loop. Data is processed in batches with updates to the network
parameters between batches. Between epochs, checks are done on whether to save
the model and exit.
The optimiser’s job is to update the network parameters according to the loss.
The loss is converted into a set of gradients with respect to the network's
parameters using backpropagation (loss.backward()). Intuitively, the gradients show
each parameter’s influence on the error, so if you follow the gradient by adjusting
the parameters, you will reduce the error. The learning rate dictates how far the
parameter will be adjusted along that gradient. If the learning rate is too high,
the adjustment will be too much, and you will overshoot the ideal setting for the
parameter. A high learning rate leads to the loss swinging up and down, never
settling on the optimum. If the learning rate is too low, optimisation proceeds too
slowly.
This is where the scheduler comes in – it automatically adjusts the learning
rate to stop the oscillatory behaviour once you are close to the optimum setting
for a parameter.
has gone through, e.g. every 2046 samples. The state of the network is retained
as this proceeds. This technique has the fancy name of truncated backpropagation
through time. It is used because the shorter sequence lengths allow for more
frequent updates to the parameters with less computational complexity than for
longer sequences, which, according to Wright, leads to better training[46].
The train_epoch_interval function ultimately returns the mean loss across all
batches. The main loop then logs this training loss as well as computing and
logging the validation set loss.
FIGURE 36.2
Comparison of training runs with different sized LSTM networks. At the top, you
can see the input signal and the target output signal recorded from a Blackstar
HT-1 valve guitar amplifier. The descending graphs on the left show the validation
loss over time for three LSTM network sizes. The waveforms show outputs from
the networks before and after training.
37
Operationalising the model in a plugin
In this chapter, I will show you how to operationalise your trained LSTM models
in a plugin. You have already seen how to run a TorchScript model in a JUCE
plugin project in a previous chapter, but here, I will update that code to cope with
the final version of the model. This model version is more straightforward because
it maintains its state internally instead of relying on you managing it externally.
At the end of the chapter, you should have a fully trained LSTM model running
inside a plugin in your plugin host.
There is a file called 'lstm_size_32_epoch_3_loss_0.6968.pth'. That is the file you
want to convert into a TorchScript model file. The following Python code will load
a model from a pth file and export it using TorchScript:
import torch
import myk_models  # make sure this file is in the same folder

# set this to the name of your actual pth file
# possibly with its full path
saved_pth_path = 'lstm_size_32_epoch_3_loss_0.6968.pth'
export_pt_path = 'dist_32.ts'
# load from pth
model = torch.load(saved_pth_path)
model.eval()
# save
scripted_model = torch.jit.script(model)
torch.jit.save(scripted_model, export_pt_path)
lstmModel will store the model loaded with TorchScript. You have seen the
inBuffer and outBuffer idea before when we were working with the simple, random
LSTM model – they are used to pass the audio data from the JUCE processBlock
function to the processBlockNN function, which will pass the data through the
model.
Now, over to PluginProcessor.cpp in the constructor. Set the audio channels
to mono:
.withInput("Input", juce::AudioChannelSet::mono(),
...
.withOutput("Output", juce::AudioChannelSet::mono(),
...
Then, in the constructor, load the model. We shall hard code the path to the
model for now. In the following example, I am loading from C:\temp\models on
Windows, and I have added some code to double-check that the file exists and to
crash if not:
// might need this at the top of the file
#include <filesystem>
...
std::string modelFolder{"C:\\temp\\models\\"};
std::string fp{modelFolder + "dist_32.ts"};
if (! std::filesystem::exists(fp)) {
    DBG("File " << fp << " not found");
    throw std::exception();
}
DBG("Loading model from " << fp);
lstmModel = torch::jit::load(fp);
Critical note for Windows users: remember you must build in the same mode
as the version of libtorch you are using, e.g. Release or Debug mode, or your
program will silently and un-debuggably crash on the torch::jit::load statement.
This is very similar to the code you saw earlier when working with the random
LSTM networks in JUCE. The difference is that we are no longer managing the
state as the state is now internally managed by the model, and the input tensor is
3D instead of 2D. To make the 3D tensor from the 1D vector holding the signal buffer,
we call view(1, -1, 1) instead of view(-1, 1) to reshape the tensor. The 3D tensor
is needed as the model class SimpleLSTM has been set up to process batches of
sequences instead of single sequences. So, we just create a batch containing one
sequence. Now, build the project and verify that execution gets past the call to
torch::jit::load in the constructor. As mentioned above, you must build in release
mode on Windows unless using a debug build of libtorch.
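Here is a sketch of processBlockNN, following the changes just described, and assuming the scripted SimpleLSTM returns a single output tensor (member and class names follow the JUCE template and the code above; the rest is illustrative):

void AudioPluginAudioProcessor::processBlockNN(std::vector<float>& inBlock,
                                               std::vector<float>& outBlock,
                                               int numSamples) {
    // wrap the incoming samples in a 3D tensor: one batch, one sequence, one channel
    torch::Tensor in_t = torch::from_blob(inBlock.data(),
                            {static_cast<int64_t>(numSamples)});
    in_t = in_t.view({1, -1, 1});
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(in_t);
    // the model manages its LSTM state internally, so no state is passed or returned
    torch::Tensor out_t = lstmModel.forward(inputs).toTensor().view({-1});
    // copy the processed samples back into the output block
    float* data_ptr = out_t.data_ptr<float>();
    std::copy(data_ptr, data_ptr + numSamples, outBlock.begin());
}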
That does the work of ferrying the audio to the neural network and back again.
Compile and test. If you are lucky or running on Linux, you might hear the neural
network working its magic on your signal and emulating whatever you trained
it to emulate. But it is more likely that you will hit the performance limit of
TorchScript, and you will hear the audio glitching. For example, with a 32-unit
LSTM layer running on my M1 Mac Mini (2020 model), the plugin runs ok in the
AudioPluginHost, but in Reaper, I can hear each block being processed, followed
by a short silence. If I render the audio from Reaper, it renders at 0.2x real-time,
but the result sounds correct. The plugin runs surprisingly well on my Linux Intel
10th Gen i7 machine. This shows that the best-case performance figures we looked
at when we timed the LSTM earlier (figure 32.3) are far from reality.

38
Faster LSTM using RTNeural
In this chapter, you will learn how to make your neural network process audio
faster using the RTNeural inferencing engine. You will find out why RTNeural
exists and who created it. Then, you will learn how to export your model’s weights
to a JSON file, which can be read back in and used to set up an RTNeural network.
You will see a comparison of RTNeural and TorchScript, showing that models with
the same weights and architecture in both systems output the same values. You
will see that RTNeural models run two or three times faster than TorchScript
models. Finally, you will see how you can deploy RTNeural in a JUCE plugin,
which is the conclusion of our technical work with neural effects.
That is quite a complex type specifier – let’s break it apart. This line sets up a
new type of model called MyLSTMType. You can call it whatever you want. The
new model will process floats and have one input and one output.
using MyLSTMType = RTNeural::ModelT<float, 1, 1,
The following two lines specify the layers: an LSTM layer with one input and
'lstm_units' units, followed by a Dense layer with lstm_units inputs and one output.
RTNeural::LSTMLayerT<float, 1, lstm_units>,
RTNeural::DenseT<float, lstm_units, 1>
Once you have this type defined, you can create an instance of a model of that
type, and then you load weights into it layer by layer:
// create a model using the new type
MyLSTMType model;

// create a json object from a file
std::ifstream jsonStream("lstm_weights.json", std::ifstream::binary);
nlohmann::json modelJson;
jsonStream >> modelJson;

// get the lstm layer from the model
auto& lstm = model.get<0>();
// write the weights from the json object into the layer
RTNeural::torch_helpers::loadLSTM<float>(modelJson, "lstm.", lstm);

// get the dense layer
auto& dense = model.get<1>();
// load the weights from the JSON object into the layer
RTNeural::torch_helpers::loadDense<float>(modelJson, "dense.", dense);
In the example code in main_rtneural_basic.cpp, you can see this in action.
I added some sanity-checking code that verifies that the number of units in the
model is the same as the number of units in the exported JSON. The program
creates a model, loads in weights, and then passes some numbers through the
model.
This assumes you have a folder above your project's folder called RTNeural
containing a clone of the RTNeural GitHub repository. It should already be
there if you are working with the code pack for the book. Next, change the
PLUGIN_CODE and PLUGIN_NAME properties in CMakeLists.txt as you see fit. You
should also ensure that IS_SYNTH is set to FALSE so your plugin can be an
effects processor. You can attempt a test build to verify you have your RTNeural
and JUCE paths set correctly.
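The model specification itself is not reproduced above; based on the MyLSTMType pattern from the console example, a 32-unit version declared near the top of PluginProcessor.h might look like this (a sketch, not necessarily identical to the repo code):

// float samples, 1 input, 1 output: an LSTM layer with 32 units
// followed by a dense layer that mixes down to a single channel
using RTLSTMModel32 = RTNeural::ModelT<float, 1, 1,
    RTNeural::LSTMLayerT<float, 1, 32>,
    RTNeural::DenseT<float, 32, 1>>;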
Notice that I have given the model specification the name ‘RTLSTMModel32’,
which will make it clear (to me later!) that it is a 32-unit model. Next, we can
add an object of that type to the private section of PluginProcessor.h:
RTLSTMModel32 lstmModel;
FIGURE 38.1
A sinusoidal test signal passing through an RTNeural LSTM distortion effect in
AudioPluginHost. The sinusoidal wave is the original signal, the clipped out wave
is the LSTM-processed signal.
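The setupModel helper is essentially the weight-loading code from the console example wrapped in a function; a sketch, matching the call shown below (the function body is an assumption):

void setupModel(RTLSTMModel32& model, const std::string& jsonPath) {
    // read the exported weights file into a JSON object
    std::ifstream jsonStream(jsonPath, std::ifstream::binary);
    nlohmann::json modelJson;
    jsonStream >> modelJson;
    // load the LSTM and dense layer weights, as in the console example
    auto& lstm = model.get<0>();
    RTNeural::torch_helpers::loadLSTM<float>(modelJson, "lstm.", lstm);
    auto& dense = model.get<1>();
    RTNeural::torch_helpers::loadDense<float>(modelJson, "dense.", dense);
    // put the model into a clean state before processing audio
    model.reset();
}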
Then you can call that code from your PluginProcessor constructor in Plugin
Processor.cpp:
setupModel(lstmModel, "path-to-json-file");
Then implement processBlock such that it sends each incoming sample through
the LSTM network – in PluginProcessor.cpp:
for (int channel = 0; channel < totalNumInputChannels; ++channel) {
    auto* outData = buffer.getWritePointer(channel);
    auto* inData = buffer.getReadPointer(channel);
    for (auto i = 0; i < buffer.getNumSamples(); ++i) {
        // as simple as this!
        outData[i] = lstmModel.forward(&inData[i]);
    }
}
At this point, you probably think this is a lot simpler than TorchScript, especially
when preparing the data to be sent to the network. You may be correct, but
the bespoke weight-loading code is tricky – TorchScript does this with one line
of code. Example 39.5.26 in the repo guide contains working code for JUCE and
RTNeural.
values. You will need to adjust the network architecture so it can take two values
instead of one at its input. Eventually, you will run the trained model in a plugin.
The difference here will be that you need to pass in two values: the audio signal
and the control parameter setting. Good luck!
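In RTNeural terms, taking two values per sample just means widening the input dimension of the model type and packing each audio sample together with the current parameter value; a sketch of the idea (names and the 32-unit size are illustrative):

// two inputs per frame: the audio sample and a control parameter value
using RTLSTMModel2In = RTNeural::ModelT<float, 2, 1,
    RTNeural::LSTMLayerT<float, 2, 32>,
    RTNeural::DenseT<float, 32, 1>>;

// inside processBlock: pack one frame and pass it through the model
float frame[2] = { inData[i], paramValue };
outData[i] = conditionedModel.forward(frame);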
38.9 What other effects can you model with neural networks?
Neural networks can model many other types of effects. I do not have space to
cover any more effect types here, but the general principles of training and
deployment are undoubtedly transferable. As a bonus, I have provided some starter and
experimental code for a reverberator based on Christian Steinmetz's paper[39]. It
is described in the code repo section 39.5.27. An area of rapid growth in neural
effects is differentiable effects. You can find lots of information about those in Lee
et al.'s review paper[24].

39
Guide to the projects in the repository
In this chapter, I will tell you where you can find the projects referred to throughout
the book in the accompanying GitHub source code repository.
39.2.2 002-cmake-juce

|-- Part1_GettingStarted
|   |-- 002_minimal_plugin_cmake

39.2.8 FM plugin with proper parameters

|-- Part3_Improviser
|   |-- 020d_midi_markov_ioi

|-- Part3_Improviser
|   |-- 020h_midi_markov_vel
[1] Andrew Maz. Music Technology Essentials: A Home Studio Guide. Focal
Press, 2023.
[2] Ron Begleiter, Ran El-Yaniv, and Golan Yona. On prediction using variable
order Markov models. Journal of Artificial Intelligence Research, 22:385–421,
2004.
[3] John A. Biles. Life with GenJam: Interacting with a musical IGA. In
IEEE SMC’99 Conference Proceedings. 1999 IEEE International Conference
on Systems, Man, and Cybernetics (Cat. No. 99CH37028), volume 3, pages
652–656. IEEE, 1999.
[4] Jean-Pierre Briot, Gaëtan Hadjeres, and François-David Pachet. Deep
Learning Techniques for Music Generation – A Survey, August 2019.
arXiv:1709.01620 [cs].
[5] Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit
poker: Libratus beats top professionals. Science, 359(6374):418–424, 2018.
Publisher: American Association for the Advancement of Science.
[6] Jatin Chowdhury. RTNeural: Fast Neural Inferencing for Real-Time Systems.
arXiv preprint arXiv:2106.03037, 2021.
[7] Nick Collins, Vit Ruzicka, and Mick Grierson. Remixing AIs: mind swaps,
hybrainity, and splicing musical models. In Proceedings of the 1st Joint Conference
on AI Music Creativity, Sweden, 2020.
[8] Darrell Conklin. Music generation from statistical models. In Proceedings
of the AISB 2003 Symposium on Artificial Intelligence and Creativity in the
Arts and Sciences, pages 30–35. Citeseer, 2003.
[9] John Covert and David L. Livingston. A vacuum-tube guitar amplifier model
using a recurrent neural network. In 2013 Proceedings of IEEE Southeastcon,
pages 1–5. IEEE, 2013.
[10] Giovanni De Sanctis and Augusto Sarti. Virtual analog modeling in the
wave-digital domain. IEEE Transactions on Audio, Speech, and Language Processing,
18(4):715–727, 2009. Publisher: IEEE.
[11] Nina Düvel, Reinhard Kopiez, Anna Wolf, and Peter Weihe. Confusingly
Similar: Discerning between Hardware Guitar Amplifier Sounds and Simulations
with the Kemper Profiling Amp. Music & Science, 3:205920432090195,
January 2020.
[12] Jesse Engel, Chenjie Gu, Adam Roberts, and others. DDSP: Differentiable
Digital Signal Processing. In International Conference on Learning Representations,
2019.
[13] Fiammetta Ghedini, François Pachet, and Pierre Roy. Creating Music and
Texts with Flow Machines. In Giovanni Emanuele Corazza and Sergio Agnoli,
editors, Multidisciplinary Contributions to the Science of Creative Thinking,
pages 325–343. Springer Singapore, Singapore, 2016. Series Title: Creativity
in the Twenty First Century.
[14] Gaëtan Hadjeres, François Pachet, and Frank Nielsen. DeepBach: a Steerable
Model for Bach chorales generation. arXiv preprint arXiv:1612.01010, 2016.
[15] Dorien Herremans, Ching-Hua Chuan, and Elaine Chew. A Functional Taxonomy
of Music Generation Systems. ACM Computing Surveys, 50(5):1–30,
September 2018.
[16] Lejaren A. Hiller Jr and Leonard M. Isaacson. Musical composition with
a high speed digital computer. In Audio engineering society convention 9.
Audio Engineering Society, 1957.
[17] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural
computation, 9(8):1735–1780, 1997. Publisher: MIT press.
[18] Geoffrey Holmes, Andrew Donkin, and Ian H. Witten. Weka: A machine
learning workbench. In Proceedings of ANZIIS'94 – Australian New Zealand
Intelligent Information Systems Conference, pages 357–361. IEEE, 1994.
[19] S. R. Holtzman. Using generative grammars for music composition. Computer
music journal, 5(1):51–64, 1981. Publisher: JSTOR.
[20] Feng-hsiung Hsu. IBM’s deep blue chess grandmaster chips. IEEE micro,
19(2):70–81, 1999. Publisher: IEEE.
[21] Shulei Ji, Jing Luo, and Xinyu Yang. A comprehensive survey on deep music
generation: Multi-level representations, algorithms, evaluations, and future
directions. arXiv preprint arXiv:2011.06801, 2020.
[22] Boris Kuznetsov, Julian Parker, and Fabian Esqueda. Differentiable IIR
filters for machine learning applications. In Proc. Int. Conf. Digital Audio
Effects (eDAFx-20), 2020.
[23] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature,
521(7553):436–444, 2015.
[24] Sungho Lee, Hyeong-Seok Choi, and Kyogu Lee. Differentiable artificial
reverberation. IEEE/ACM Transactions on Audio, Speech, and Language
Processing, 30:2541–2556, 2022. Publisher: IEEE.
[25] Joel Lehman, Jeff Clune, Dusan Misevic, Christoph Adami, Lee Altenberg,
Julie Beaulieu, Peter J. Bentley, Samuel Bernard, Guillaume Beslon, and
David M. Bryson. The surprising creativity of digital evolution: A collection
of anecdotes from the evolutionary computation and artificial life research
communities. Artificial Life, 26(2):274–306, 2020. Publisher: MIT Press.
[26] Louis McCallum and Mick S. Grierson. Supporting Interactive Machine Learning
Approaches to Building Musical Instruments in the Browser. In Proceedings
of the International Conference on New Interfaces for Musical Expression,
pages 271–272. Birmingham City University, Birmingham, UK, 2020.
[27] Eduardo Reck Miranda. Cellular Automata Music: An Interdisciplinary
Project. Interface, 22(1):3–21, January 1993.
[28] David Moffat. AI Music Mixing Systems. In Handbook of Artificial Intelligence
for Music, pages 345–375. Springer, 2021.
[29] Hans Moravec. When will computer hardware match the human brain. Journal
of Evolution and Technology, 1(1):10, 1998.
[30] Gerhard Nierhaus. Algorithmic composition: paradigms of automated music
generation. Springer Science & Business Media, 2009.
[31] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan,
Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray
Kavukcuoglu. Wavenet: A generative model for raw audio. arXiv preprint
arXiv:1609.03499, 2016.
[32] Andrew Pickering. Cybernetics and the mangle: Ashby, Beer and Pask. Social
studies of science, 32(3):413–437, 2002. Publisher: Sage Publications London.
[33] Will Pirkle. Designing audio effect plugins in C++: for AAX, AU, and VST3
with DSP theory. Routledge, 2019.
[34] Nicola Plant, Clarice Hilton, Marco Gillies, Rebecca Fiebrink, Phoenix Perry,
Carlos González Dı́az, Ruth Gibson, Bruno Martelli, and Michael Zbyszynski.
Interactive Machine Learning for Embodied Interaction Design: A tool and
[46] Alec Wright, Eero-Pekka Damskägg, and Vesa Välimäki. Real-time black-box
modelling with recurrent neural networks. In 22nd international conference
on digital audio effects (DAFx-19), pages 1–8, 2019.
[49] Matthew John Yee-King, Leon Fedden, and Mark d'Inverno. Automatic
programming of VST sound synthesizers using deep networks and other techniques.
IEEE Transactions on Emerging Topics in Computational Intelligence,
2(2):150–159, 2018. Publisher: IEEE.
[50] Nimalan Yoganathan and Owen Chapman. Sounding riddims: King Tubby's
dub in the context of soundscape composition. Organised Sound, 23(1):91–100,
2018. Publisher: Cambridge University Press.