The Csound Book
All rights reserved. No part of this book may be reproduced in any form by any electronic or
mechanical means (including photocopying, recording, or information storage and retrieval)
without permission in writing from the publisher.
This book was set in Times Roman by Graphic Composition, Inc., on the Miles 33
typesetting system.
Foreword xxvii
Barry Vercoe
Preface xxxi
Max Mathews
Acknowledgments xxxiii
Introduction xxxvii
How to Use This Book xxxix
Software Synthesis
Csound Fundamentals
1. Introduction to Sound Design in Csound 5
Richard Boulanger
2. Understanding and Using Csound’s GEN Routines 65
Jon Christopher Nelson
3. What Happens When You Run Csound 99
John ffitch
Imitative Synthesis
6. Designing Acoustically Viable Instruments in Csound 155
Stephen David Beck
7. Designing Legato Instruments in Csound 171
Richard Dobson
8. Contiguous-Group Wavetable Synthesis of the French Horn
in Csound 187
Andrew Horner and Lydia Ayers
9. FM Synthesis and Morphing in Csound: from Percussion
to Brass 197
Brian Evans
10. Modeling “Classic” Electronic Keyboards in Csound 207
Hans Mikelson
Algorithmic Synthesis
11. A Survey of Classic Synthesis Techniques in Csound 223
Rajmil Fischman
12. A Guide to FM Implementation in Csound 261
Russell Pinkston
13. A Guide to Granular Synthesis in Csound 281
Allan S. C. Lee
14. A Guide to FOF and FOG Synthesis in Csound 293
Michael Clarke
15. Processing Samples with Csound’s FOF Opcode 307
Per Byrne Villez
Mathematical Models
16. A Look at Random Numbers, Noise, and Chaos with Csound 321
John ffitch
17. Constrained Random Event Generation and Retriggering
in Csound 339
Russell Pinkston
Signal Processing
Programming
Adding Opcodes
31. Extending Csound 599
John ffitch
32. Adding New Unit Generators to Csound 613
Marc Resibois
Appendixes
Index 727
CD-ROM Contents
CD-ROM Chapters
Algorithmic Composition
7. An Introduction to Cscore
Archer Endrich
8. Algorithmic Score Generators
Michael Gogins
Interface Design
9. Creating and Using a Platform-Independent GUI for Csound in Java
Michael Gogins
10. Improving a Composer’s Interface: Recent Developments to Csound for
the Power Macintosh Computer
Matt Ingalls
Sonification
13. Audification of Heart Rhythms in Csound
Mark Ballora, Bruce Pennycook and Leon Glass
14. Some “Golden Rules” for Designing Auditory Displays
Stephen Barrass
15. Using Csound for Sonification
David Rossiter
26. Terrain-Mapping
Hans Mikelson
27. Three Modeling Approaches to Instrument Design
Eduardo Reck Miranda
28. The Design of Equalizers and Compressors for Studio Use
Erez Webman
CD-ROM Tutorials
An Introduction to Csound
Eric L. Singer
A Beginner Tutorial
Barry Vercoe
CD-ROM Music
Hiway 70
Bill Alves
Shadowland
Steve Antosca
Swarm
Tim Barrass
Voices = Wind
Mike Berry
Intract
Noel Bush
BlueCube( )
Kim Cascone
Cymbolic and Perks
Robert L. Cooper
Otis
Sean Costello
Leap Frog
Steven Curtin
Drums and Different Canons
John ffitch
For Fofs
Dan Gutwein
Acid Bach Suite
Jeff Harrington
Direct X Csound
Gabriel Maldonado
Active X Csound and Java Csound
Michael Gogins
The Canonical Csound Sources and Executables
John ffitch
Cecilia
Alexandre Burton and Jean Piché
Cmask
Andre Bartetzki
CsEdit
Roger Klaveness
Csounder
Dustin Barlow and Tim Mielak
Csound MIDI Fader
Young Jun Choi
CsRef
Mike Berry
DrawSound
Brian Fudge
General MIDI to Csound
Young Jun Choi
Grainmaker
Jon Christopher Nelson
Hydra and Hydra Java
Malte Steiner
Csound Studio Lite
Young Jun Choi
CMaxSound
John Burkhardt
MIDI2CS
Rudiger Boormann
Pv2Pict
Roger Klaveness
RhythGranTest
David W. Hainsworth
ScorePlot
Fabio P. Bertolotti
Silence
Michael Gogins
SoftSamp
Dustin Barlow and Tim Mielak
Space
Richard Furse
Spliner
Jeff Bellsey
Visual Orchestra
David Perry
WCshell
Riccardo Bianchini
WebSynth
Eric Lyon
WinHelp
Rasmus Ekman
Csound Links
Csound Music
The Csound FAQ and Mailing List Archives
Csound FrontEnds and Launchers
Getting Csound
Csound Tutorials
Csound Utilities
Computer Music Organizations
Samples and Impulses
Foreword
Barry Vercoe
It is indeed a pleasure to peruse this volume, to see so many composers and authors
joined in a similar purpose of making their insights and experiences public and to
feel that the computer music community will surely benefit from such broad-based
disclosure. It is never easy to have more than one living composer present at a single
concert of their collected works, and to the extent that these contributions represent
composing time lost and thoughts and insights given away free, the richness and even-
ness of this volume suggests that these composer/authors must all have been ensemble
performers first. Of course, every ensemble has its taskmaster, and I stand in awe of
what Richard Boulanger has done to bring this one to the concert stage.
This field has always benefited most from the spirit of sharing. It was Max Ma-
thews’s willingness to give copies of Music 4 to both Princeton and Stanford in the
early 1960s that got me started. At Princeton it had fallen into the fertile hands of
Hubert Howe and the late Godfrey Winham, who as composers imbued it with con-
trollable envelope onsets (envlp) while they also worked to have it consume less
IBM 7094 time by writing large parts in a BEFAP assembler (Music4B). Looking
on was Ken Steiglitz, an engineer who had recently discovered that analog feedback
filters could be represented with digital samples. By the time I first saw Music4B
code (1966–1967) it had a reson filter—and the age of subtractive digital sound de-
sign was already underway.
During 1967–68 I wrote a large work (for double chorus, band, string orchestra,
soloists and computer-generated sounds), whose Seattle Opera House performance
convinced me that this was a medium with a future. But on my arrival back at Prince-
ton I encountered a major problem: the 7094 was to be replaced by a new machine
called a 360 and the BEFAP code would no longer run. Although Godfrey responded
by writing a Fortran version (Music4BF, slower but eternally portable), I took a
gamble that IBM would not change its assembler language again soon, and wrote
Music 360. Like Max Mathews, I then gave this away as fast as I could, and its
super efficiency enabled a new generation of composers with limited budgets to see
computer music as an affordable medium.
But we were still at an arm’s length from our instrument. Punched cards and batch
processing at a central campus facility were no way to interact with any device, and
on my move to the Massachusetts Institute of Technology in 1971 I set about design-
ing the first comprehensive real-time digital sound synthesizer, to bring the best of
Music 360’s audio processing into the realm of live interactive performance. After
two years and a design complete, its imminent construction was distracted by a gift
from Digital Equipment Corporation of their latest creation, a PDP-11. Now, with a
whole computer devoted exclusively to music, we could have both real-time pro-
cessing and software flexibility, and Music 11 was the result.
There were many innovations in this rewrite. First, since my earlier hardware de-
sign had introduced the concept of control-rate signals for things like vibrato pitch
motion, filter motion, amplitude motion and certain envelopes, this idea was carried
into the first 1973 version of Music 11 as k-rate signals (familiar now to Csound
users). Second, envelopes became more natural with multi-controllable exponential
decays. Indeed, in 1976 while writing my Synapse, for Viola and computer, I found
I could not match the articulation of my soloist unless I made the steady-state decay
rate of each note in a phrase be a functional inverse of the note length. (In this regard
string and wind players are different from pianists, who can articulate only by early
release. Up to this time we had all been thinking like pianists, that is, no better than
MIDI.) My envlpx opcode fixed that.
This had been my second gamble that a particular machine would be sufficiently
common and long-lived to warrant assembler coding, and Music 11’s efficiency and
availability sustained a decade of even more affordable and widespread computer
music. Moreover, although the exported code was not real-time, our in-house experi-
ments were: Stephen Haflich connected an old organ keyboard so that we could play
the computer in real-time; if you played something reasonably metric, the computer
would print out the score when you finished; if you entered your score via our graphi-
cal score editor, the machine would play it back in real-time (I made extensive use
of this while writing Synapse); if you created your orchestra graphically using Rich
Steiger’s OEDIT, Music 11 would use those instruments. Later, in 1980, student
Miller Puckette connected a light-sensing diode to one end of the PDP-11, and an
array-processing accelerator to the other, enabling one-dimensional conducting of a
real-time performance. Haflich responded with a two-dimensional conducting sen-
sor, using two sonar cells from a Polaroid camera. This was an exciting time for real-
time experiments, and the attendees at our annual MIT Summer Workshops got to
try many of these.
Meanwhile, my interest had shifted to tracking live instruments. At IRCAM in
Paris in 1982, flutist Larry Beauregard had connected his flute to DiGiugno’s 4X
Extended Csound. This is the first stage of an orderly progression towards multi-
processor fully-interactive performance. In the current version, Csound is divided
between two processors, a host PC and a DSP-based soundcard. The host does all
compiling and translation, disk I/O, and graphical-user-interface (GUI) processing,
such as Patchwork (editing) and Cakewalk (sequencing). The DSP does all the signal
processing, with sole access to the audio I/O ports; it also traps all MIDI input with
an on-chip MIDI manager, such that each MIDI note-on results in an activated instru-
ment instance in less than one control period.
The tightly-coupled multi-processor performance of Extended Csound has in-
duced a flurry of new opcodes, many of them tailored to the internal code of the
DSP I am using (a floating-point SHARC™ 21060 from Analog Devices). The new
opcodes extend the power of Csound in areas such as real-time pitch-shifting,
compressor-limiting, effects processing, mixing, sampling synthesis and MIDI pro-
cessing and control. Since this volume is not the place for details, the curious can
look at my paper in the 1996 ICMC Proceedings. I expect to be active in this area
for some time.
Meanwhile, the fully portable part of Csound is enjoying widespread use, and this
volume is a testament to the ingenuity of its users. Some are familiar names (Stephen
Beck, Richard Dobson, Brian Evans, Michael Clarke, John ffitch, Richard Karpen,
Jon Nelson, Jean Piché and Russell Pinkston) whom we have come to lean on out of
both habit and dependency; some are newer lights (Elijah Breder, Per Byrne Villez,
Michael Gogins, Andrew Horner, Alan Lee, Dave Madole, David McIntyre, Hans
Mikelson, Michael Pocino, Marc Resibois, and Erik Spjut) we are attending to with
growing interest; others are fresh faces (Bill Alves, Mike Berry, Martin Dupras, Raj-
mil Fischman, Matt Ingalls, Eric Lyon, Gabriel Maldonado, and Paris Smaragdis)
likely to leave long-term images. The range of topics is broad, from sound design
using various synthesis and modeling methods to mathematical and signal process-
ing constructs we can learn from, and from compositional strategies using existing
Csound resources to some powerful lessons on what to do when what you need is
not already there. The whole is amply supported by CD-ROM working examples of
ports to other machines, user-interfaces, personal approaches to composing, and
most importantly, compositions that speak louder than words.
I am impressed and somewhat humbled by the range of thought and invention that
Csound has induced. Gathering all this energy into one place creates a singularity
that defies real measurement, but I am certain the effects of this volume will be
much felt over both distance and time. On behalf of the readers, my thanks to all the
contributors. Finally, my hat off to Richard Boulanger for all that composing time
lost (you should have known better), and my best wishes to Csound users as you
continue to redefine your own voice.
Preface
Max V. Mathews
The synthesis of music by a computer was born in 1957 when an IBM704 in New
York City slowly computed a 17-second “piece” composed in what was probably the
first audible example of a new musical scale with my program, Music 1. The pro-
gram was limited, but many limitations were overcome in a sequence of successor
programs culminating in Music 5, which was finished in the 1960s. Subsequently,
others have written programs that both added powerful new features and ran on the
many new computers that rapidly developed. Today, Csound, written by Barry Vercoe,
is, in my opinion, the best and most popular software synthesis program. In addi-
tion to new features, it is written in the C language, which is universally available
and likely to remain so in the future. Thus, music written in Csound can be expected
to have a long technical lifetime in the sense that it can be played on future
computers.
Until recently, general-purpose music programs all had one major restriction—
they could not be used for live performance because computers were not fast enough
to synthesize interesting music in real-time, that is to say it took more than one sec-
ond to synthesize a second of sound. Special purpose computers called digital syn-
thesizers overcame this limitation. But real-time synthesizers brought with them a
new major problem—rapid obsolescence. The commercial lifetime of a new synthe-
sizer is only a few years and therefore music written for such machines cannot be
expected to be playable in a decade.
Live performance with a general purpose program like Csound is now possible,
either running native on a fast PC, or when ported to a DSP such as Vercoe’s Ex-
tended Csound—a superbly efficient version that makes rich complex music in
real-time on today’s fast processor chips and that can be expected to run even faster
on tomorrow’s chips. Because of the universality of the C language, not only can
Csound be compiled and run on general purpose processors such as a Macintosh G3
or a Pentium II, but it can be run on digital-signal-processing (DSP) chips that have
C compilers. Thus a parallel processor using a multiplicity of DSP chips can provide
almost unlimited musical power for a reasonable cost.
One of the crucial factors contributing to the success of the Music 5 language was
my book, The Technology of Computer Music, which described the program in detail
so it could be used and modified; it was intended both as a tutorial and as a reference.
The Csound Book, conceived of and edited by Richard Boulanger, fulfills these same
needs for Csound.
A measure of the growth and richness of today’s computer music is the sizes of
these two books. The Technology of Computer Music contained only 137 pages and
had only one author. The Csound Book has more than 700 pages that were contrib-
uted by over 50 authors. It is my belief that this book, plus the Csound programs, in-
struments and utilities on the accompanying CD-ROM, provide the essential tools for
the next generation of composers and performers of computer music.
Acknowledgments
In one way or another, I have been working on The Csound Book for over fifteen
years (including over 9000 email messages between collaborators, contributors, my
student assistants and myself). Without the help and support of the following people,
this book would have never become a reality. So . . .
Thanks to my mentors Tom Piggott, Robert Perry, and Hugo Norden for showing
me the way and starting me on the path. Thanks to my computer music teachers,
collaborators and friends—Bruce Pennycook, Dexter Morrill, F. Richard Moore,
Mark Dolson, Hal Chamberlain, and Max Mathews—each of you opened my eyes,
ears, and mind a little wider. And thanks to my artistic soulmates Marek Choloniew-
ski and Lee Ray—you have been a constant source of friendship and opened my
heart a little wider.
Thanks to my colleagues in the Music Synthesis Department at the Berklee Col-
lege of Music, but most importantly to Berklee vice president Dave Mash who taught
me how to teach music through technology. Thanks especially to my Csound stu-
dents at Berklee, whose enthusiasm and creativity continue to be a source of en-
ergy—their technical questions and their musical answers keep me growing and
keep me going.
In particular, I would like to recognize my fourteen most dedicated student assis-
tants: David Bax, Brian Cass, Luigi Castelli, Young Jun Choi, Flavio Gaete, Nate
Jenkins, Jacob Joaquin, Juno Kang, Andy Koss, Samara Krugman, Bobby Pietrusko,
Sandro Rebel, Yevgen Stupka, and Chuck Van Haecke.
Yevgen, all the authors owe you a debt of gratitude for the months of dedicated
work you did on the block diagrams. They add a level of consistency to the entire
manuscript and conceptual clarity to the concepts presented. Flavio, thank you for
the hundreds of hours you put into the initial layout and editing—especially the or-
chestras. Samara, thanks for straightening out the references, and for converting and
reformatting the entire book and reference manual. Bobby, thanks for the final read
and then for spending the last weeks of summer (two years in a row) fixing ALL the
“little” things you found. Chuck, thanks for rendering and formatting the hundreds
of orchestras for the CD-ROM; and Brian thanks for taking over and finishing when
Chuck left to tour with a Blues band. Andy, thanks for entering the figure captions
and for your recommendations and suggestions on the preliminary CD-ROM and
cover designs; and David, thanks for getting the CD-ROM off to such a good start.
Sandro, thanks for running all the scores. Nate, thanks for the help proofing and
correcting the figure text. Juno and Luigi, thanks for all your help with the instrument
anthologies; and again Bobby, thanks for debugging and fixing all the busted ones.
Young, thanks for all the help finishing the CD-ROM.
Finally, Jacob, thank you for all your work on The Csound CD-ROM and The
Csound FrontPage. After graduating from Berklee, the fact that you took a year off
before graduate school to work solely on this project is something for which I and
the entire Csound community owe you a great deal. You single-handedly transformed
a huge collection of “Csound stuff” into a coherent and beautiful interactive experi-
ence—a true “on-line” extension of The Csound Book. I would not have made it
without you. In fact, without the dedicated and generous contributions of all these
students from the Berklee College of Music, this book and CD-ROM would have
taken forever to finish and would not have been half as beautiful.
Thanks to all the authors, developers, and composers who so generously contrib-
uted their knowledge and expertise to this project. I have learned so much from each
of you and am so happy and honored to help you share your discoveries and insights
with the world. Thank you Dustin Barlow and Tim Mielak for Csounder and Soft-
Samp; Young Jun Choi for Instant Csound; Russell Pinkston, and Keith Lent for
Patchwork; Dave Perry for Visual Orchestra; Michael Gogins for Silence, ActiveX
Csound, and JCsound; Alexandre Burton and Jean Piché for Cecilia; John Gauther
for The Amsterdam Catalog; Gabriel Maldonado for your wonderful Direct X
Csound and VMCI; Mike Berry, Dave Madole, and especially Matt Ingalls for your
Macintosh PPCsound port and support; Dave Phillips for all your work on the
LINUX port; and finally Hans Mikelson for your fabulous Csound E-Zine and
your incredible Instrument Anthology. Thank you to these and all the other develop-
ers whose Csound “launchers,” front-ends, and utilities have made the language so
much more teachable, learnable, and usable.
I would like to thank the Csound documentation team led initially by Jean Piché
and including John ffitch, Rasmus Ekman, Tolve, Matt Ingalls, Mike Berry, Servando
Valero, Gabriel Maldonado, Jacob Joaquin, and especially David Boothe. David, you
have worked to produce and, more important, to maintain, the definitive reference
document that is printable, portable, linked and beautifully indexed. Your Canonical
Public Csound Reference Manual in both PDF and HTML formats has become the
de facto standard.
Thanks, of course, to all the Csound opcode developers. I doubt that Barry Vercoe
himself would recognize the 1999 version of Public Csound with all the cool new
stuff that YOU have put in there! Most important, on behalf of the entire Csound
community, I would like to acknowledge and thank my dear friend John ffitch for
maintaining, improving and extending Csound and for so generously sharing his
code, his time, and his expertise with virtually anyone who posts a question to the
Csound mailing list. While people like me speculate on the many ways of improv-
ing Csound, John ffitch just does it. In so many ways, Public Csound is truly John
ffitch’s program.
Thanks to Douglas Sery of the MIT Press for his encouragement, advice and sup-
port. Several years ago he contacted me about doing a Csound book and along the
way, whenever I would fumble or drop the ball, he was there to pick it up and toss it
back. His faith in me and belief in this project are largely responsible for its success-
ful completion. Also at MIT Press, thanks to Chryseis Fox for such a beautiful layout
and design. Thank you to Michael Sims for all his editorial input and guidance. And
thank you Ori Kometani for your cover art, which at a glance captures the essence of
Csound—a language that transforms letters and numbers into beautiful sonically
animated waveforms!
Thanks to Mike Haidar for initiating and shepherding the Extended Csound Proj-
ect at Analog Devices that resulted in my working closely with Scotty Vercoe, Sree-
nivasa Inukoti, and Barry Vercoe from 1995–1997 on a DSP-based version of
Csound—and loving every minute of it! Thanks also to Bill Verplank and the Interval
Research Corporation for supporting my research in alternate controllers and sound
design at Berklee that resulted in my working closely with you, Perry Cook, Rob
Shaw, Paris Smaragdis, and Max Mathews—another computer music dream team I
got to play with!
Finally, thank you Barry Vercoe . . . for Csound. As a gifted young composer, you
chose to put aside your manuscript paper and pencil to develop a tool that gave voice
to a world of young and aspiring composers—including me. On behalf of every
musician and teacher who has discovered the science of sound and the mathematical
basis of music through Csound, I thank you.
At home here, I thank my parents for their love, faith, and support; my three sons
Adam, Philip, and Adrien for keeping me in touch with reality and bringing me back
to earth . . . and to music. And last, but not least, I extend my most heartfelt gratitude
and everlasting love to my wife, my song, my Susan—who understands me deeply
take you along the paths that they have traveled. They will guide you and help you
understand and explore the world of Csound; and through their algorithms, sounds,
and music, they will point out many new and wonderful places along the way—both
technically and artistically.
As you will soon discover, the world of Csound is vast and ever expanding. I have
found it to be a constant source of wonder, enlightenment, and fulfillment. On behalf
of all of the authors, I welcome you to the world of Csound and I wish you the joy
of endless discovery . . .
The Csound Book is not meant to be “read” from cover to cover. Although I have
organized the chapters into thematic sections such as Software Synthesis, Signal Pro-
cessing, Programming, Composition, MIDI, and Real-time, the material in each sec-
tion does not necessarily progress from beginning to advanced. The best way to
“read” The Csound Book will depend on your level of experience with synthesizers,
signal processors, and computers. I assure you that there is something for both the
beginner and the expert in every section, but given the breadth and scope of what is
covered here, it is quite easy for beginners to lose their way. Below, I have outlined
several of the paths I take with my students at Berklee. I hope you will find these
suggestions helpful as you begin your journey into the world of Csound.
In my Csound class at Berklee I typically assign the following:
that touches upon medicine, perception, and even mind models, I introduce Barrass,
Batista, Rossiter, Ballora, and Pennycook. And for those interested in 3D and In-
ternet Audio I assign the Casey, Furse, McIntyre, and Breder chapters.
Another approach entirely is to simply study the instruments. In this case you
could jump directly into the chapters focused on specific synthesis techniques or
you could choose to study and modify the Csound instruments in the Amsterdam,
Comajuncosas, Lyon, Pinkston, Mikelson, Risset, and Smaragdis anthologies. These
inspiring and diverse collections represent a wealth of knowledge and hold all of
Csound’s secrets. Once you begin to check out the instruments on the CD-ROM, you
will soon discover that Csound is capable of making the most amazing sounds of any
commercial synthesizer or signal processor. And although this book and CD-ROM
seem to emphasize sound design, do remember that what this noisy world needs
most . . . is music. Above all else, turn your Csound into Cmusic.
Software Synthesis
Csound Fundamentals
1 Introduction to Sound Design in Csound
Richard Boulanger
Csound is a powerful and versatile software synthesis program. Drawing from a tool-
kit of over 450 signal processing modules, one can use Csound to model virtually
any commercial synthesizer or multieffect processor. Csound transforms a personal
computer into a high-end digital audio workstation—an environment in which the
worlds of sound-design, acoustic research, digital audio production, and computer
music composition all join together in the ultimate expressive instrument. As with
every musical instrument, however, true virtuosity is the product of both talent and
dedication. You will soon discover that Csound is the ultimate musical instrument.
But you must practice! In return, it will reward your commitment by producing some
of the richest textures and most beautiful timbres that you have ever heard. In the
audio world of Csound, knowledge and experience are the key . . . and your imagina-
tion the only limitation.
The goal of this chapter is to get you started on Csound’s road of discovery and
artistry. Along the way we’ll survey a wide range of synthesis and signal processing
techniques and we’ll see how they’re implemented in Csound. By the end we’ll have
explored a good number of Csound’s many possibilities. I encourage you to render,
listen to, study and modify each of my simple tutorial instruments. In so doing, you’ll
acquire a clear understanding and appreciation for the language, while laying down
a solid foundation upon which to build your own personal library of original and
modified instruments. Furthermore, working through the basics covered here will pre-
pare you to better understand, appreciate and apply the more advanced synthesis and
signal processing models that are presented by my colleagues and friends in the sub-
sequent chapters of this book.
Now there are thousands of Csound instruments and hundreds of Csound compo-
sitions on the CD-ROM that accompanies this text. Each opens a doorway into one
of Csound’s many worlds. In fact, it would take a lifetime to explore them all. Clearly,
one way to go would be to render all the orchestras on the CD-ROM, select the ones
that sound most interesting to you and merely sample them for use in your own
compositions. This library of presets might be just the collection of unique sounds
you were searching for and your journey would be over.
I believe, however, it would be better to read, render, listen to, and then study the
synthesis and signal processing techniques that fascinate you most by modifying
existing Csound orchestras that employ them. Afterward you should express this un-
derstanding through your own compositions—your own timbre-based soundscapes
and sound collages. Through this active discovery process you will begin to develop
your own personal Csound library and ultimately your own voice.
To follow the path I propose, you’ll need to understand the structure and syntax
of the Csound language. But I am confident that with this knowledge, you’ll be able
to translate your personal audio and synthesis experience into original and beautiful
Csound-based synthetic instruments and some unique and vivid sound sculptures.
To that end, we’ll begin by learning the structure and syntax of Csound’s text-
based orchestra and score language. Then we’ll move on to explore a variety of
synthesis algorithms and Csound programming techniques. Finally we’ll advance
to some signal processing examples. Along the way, we’ll cover some basic digital
audio concepts and learn some software synthesis programming tricks. To better un-
derstand the algorithms and the signal flow, we’ll block-diagram most of our Csound
instruments. Also, I’ll assign a number of exercises that will help you to fully under-
stand the many ways in which you can actually work with the program.
Don’t skip the exercises. And don’t just read them—do them! They are the key to
developing real fluency with the language. In fact, you may be surprised to discover
that these exercises teach you more about how to work with Csound than any of the
descriptions that precede them. In the end, you should have a good strong foundation
upon which to build your own library of Csounds and you will have paved the way
to a deeper understanding of the chapters that follow.
So, follow the instructions on the CD-ROM; install the Csound program on your
computer; render and listen to a few of the test orchestras to make sure everything is
working properly; and then let’s get started.
The Csound orchestra file consists of two parts: the header section and the instru-
ment section.
In the header section you define the sample and control rates at which the instruments
will be rendered and you specify the number of channels in the output. The
orchestral header that we will use throughout the text is shown in figure 1.1.
sr     = 44100
kr     = 4410
ksmps  = 10
nchnls = 1
The code in this header sets the sample rate (sr) to 44.1K (44100), the control
rate (kr) to 4410 and ksmps to 10 (ksmps = sr/kr). The header also indicates that
this orchestra should render a mono soundfile by setting the number of channels
(nchnls) to 1. (If we wanted to render a stereo sound file, we would simply set nchnls
to 2).
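As a quick illustration (not one of the book's figures), the same header set up for stereo output would simply change the last line; a stereo instrument would then write its signals with the outs opcode rather than out:

sr     = 44100
kr     = 4410
ksmps  = 10
nchnls = 2        ; STEREO OUTPUT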
In the Csound orchestra file, the syntax of a generic opcode statement is:
Output Opcode Arguments Comment (optional)
In the case of the oscil opcode, this translates into the following syntax:
output oscil amplitude, frequency, function # ; COMMENT
a1 oscil 10000, 440, 1 ; OSCILLATOR
In our first orchestra file, instr 101 uses a table-lookup oscillator opcode, oscil, to
compute a 440 Hz sine tone with an amplitude of 10000. A block diagram of instr
101 is shown in figure 1.2. The actual Csound orchestra code for this instrument is
shown in figure 1.3.
The block diagram of instr 101 clearly shows how the output of the oscillator,
labeled a1, is “patched” to the input of the out opcode that writes the signal to the
hard-disk.
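The orchestra code of figure 1.3 is not reproduced in this text; a minimal sketch consistent with the description above (and assuming the header of figure 1.1 and a sine wave stored in f 1) would be:

        instr 101                      ; SIMPLE TABLE-LOOKUP OSCILLATOR
a1      oscil     10000, 440, 1        ; AMPLITUDE 10000, 440 Hz, READS f 1
        out       a1                   ; WRITE THE SIGNAL TO THE OUTPUT
        endin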
Csound renders instruments line-by-line, from top to bottom. Input arguments are
on the right of the opcode name. Outputs are on the left. Words that follow a semico-
lon ( ; ) are considered to be comments and are ignored.
In instr 101, as shown in figure 1.3, the input arguments to the oscillator are set at
10000 (amplitude), 440 (frequency) and 1 (for the function number of the waveshape
template that the oscillator reads). The oscillator opcode renders the sound 44100
times a second with these settings and writes the result into the variable named a1.
The sample values in the local-variable a1 can then be read as inputs by subsequent
opcodes, such as the out opcode. In this way, variable names are similar to patch
cords on a traditional analog synthesizer. Audio and control signals can be routed
virtually anywhere in an instrument and used to: set a parameter to a new value,
Figure 1.2 Block diagram of instr 101, a simple fixed frequency and amplitude table-lookup
oscillator instrument.
Figure 1.3 Orchestra code for instr 101, a fixed frequency and amplitude instrument using
Csound’s table-lookup oscillator opcode, oscil.
dynamically control a parameter (like turning a knob), or serve as an audio input into
some processing opcode.
In figure 1.4, you can see that instr 102—instr 106 use the same simple instrument
design as instr 101 (one signal generator writing to the out opcode), but replace
the oscil opcode with more powerful synthesis opcodes such as: foscil—a simple
2-oscillator FM synthesizer, buzz—an additive set of harmonically-related cosines,
pluck—a simple waveguide synthesizer based on the Karplus-Strong algorithm,
grain—an asynchronous granular synthesizer and loscil—a sample-based wave-
table synthesizer with looping.
Clearly the single signal-generator structure of these instruments is identical. But
once you render them you will hear that their sounds are quite different. Even though
they each play with a frequency of 440 Hz and an amplitude of 10000, the underlying
synthesis algorithm embodied in each opcode is fundamentally different—requiring
the specification of a unique set of parameters. In fact, these six signal generating
opcodes (oscil, foscil, buzz, pluck, grain, and loscil) represent the core synthesis
technology behind many of today’s most popular commercial synthesizers. One
might say that in Csound, a single opcode is an entire synthesizer. Well . . . maybe
not an exciting or versatile synthesizer, but, in combination with other opcodes,
Csound can, and will, take you far beyond any commercial implementation.
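The orchestra code of figure 1.4 is likewise not reproduced here. As a hedged sketch, two of these one-opcode instruments might read as follows; the foscil arguments (440, 1, 2, 3, 1) follow the block diagram, while the number of harmonics given to buzz (10) is an assumption made for this example:

        instr 102                          ; SIMPLE 2-OSCILLATOR FM
a1      foscil    10000, 440, 1, 2, 3, 1   ; CARRIER=1, MODULATOR=2, INDEX=3, f 1 SINE
        out       a1
        endin

        instr 103                          ; ADDITIVE SET OF HARMONIC COSINES
a1      buzz      10000, 440, 10, 1        ; 10 HARMONICS (ASSUMED), f 1 SINE
        out       a1
        endin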
Now let’s look at the Csound score file that performs this orchestra of instruments.
Like the orchestra file, the score file has two parts: tables and notes. In the first part,
we use Csound’s mathematical function-drawing subroutines (GENS) to generate
function-tables ( f-tables) and/or fill them by reading in soundfiles from the hard-
disk. In the second part, we type in the note-statements. These note-events perform
the instruments and pass them performance parameters such as frequency-settings,
amplitude-levels, vibrato-rates, and attack-times.
Csound’s function generating subroutines are called GENS. Each of these (more than
20) subroutines is optimized to compute a specific class of functions or wavetables.
For example, the GEN5 and GEN7 subroutines construct functions from segments
of exponential curves or straight lines; the GEN9 and GEN10 subroutines generate
composite waveforms made up of weighted sums of simple sinusoids; the GEN20
subroutine generates standard window functions, such as the Hamming window and
Figure 1.4 Block diagrams and orchestra code for instr 102–instr 106, a collection of fixed
frequency and amplitude instruments that use different synthesis methods to produce a single
note with the same amplitude (10000) and frequency (440).
the Kaiser window, that are typically used for spectrum analysis and grain envelopes;
the GEN21 subroutine computes tables with different random distributions such as
Gaussian, Cauchy and Poisson; and the GEN1 subroutine will transfer data from a
prerecorded soundfile into a function-table for processing by one of Csound’s op-
codes such as the looping-oscillator loscil.
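By way of illustration (these statements are not from the book's scores, and the table numbers and breakpoints are chosen arbitrarily here), a linear and an exponential envelope function built with GEN7 and GEN5 might look like:

f 90 0 1024 7 0 256 1 512 1 256 0          ; LINEAR RISE-SUSTAIN-FALL (GEN7)
f 91 0 1024 5 .001 256 1 512 1 256 .001    ; EXPONENTIAL VERSION (GEN5, NO ZERO VALUES)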
Which function tables are required, and how they are used by the instruments in
your orchestra, is totally up to you—the sound designer. This can be a matter of
common sense, preference or habit. For instance, since instr 106 used the sample-
based looping oscillator, loscil, I needed to load a sample into the orchestra. I chose
GEN1 to do it. Whereas in instr 102, since I was using the foscil opcode I could
have chosen to frequency modulate any two waveforms, but decided on the tradi-
tional approach and modulated two sinewaves as defined by GEN10.
F-Statements
In the score file, the syntax of the Csound function statement ( f-statement) is:
f number loadtime table-size GEN Routine parameter1 parameter... ; COMMENT
For example, given the statement f 111 0 16 10 1, the resulting f-table ( f 111) would contain the 16-point sine function shown in figure 1.5.
address   value
00         0
01         0.3827
02         0.7071
03         0.9239
04         1
05         0.9239
06         0.7071
07         0.3827
08         0
09        -0.3827
10        -0.7071
11        -0.9239
12        -1
13        -0.9239
14        -0.7071
15        -0.3827
16         0
Figure 1.5 A 16 point sine function defined by GEN10 with the arguments: f 111 0 16 10 1.
f 1 0 4096 10 1
f 2 0 4096 10 1 .5 .333 .25 .2 .166 .142 .125 .111 .1 .09 .083 .076 .071 .066 .062
f 3 0 4097 20 2 1
f 4 0 0 1 "sing.aif" 0 4 0
As you can see, a sinewave drawn with 16 points of resolution is not particularly
smooth. Most functions must be a power-of-2 in length (64, 128, 256, 512, 1024,
2048, 4096, 8192), and so, for wavetables, we typically specify function table sizes
between 512 and 8192. In our first score, etude1.sco, we define the following func-
tions using GEN10, GEN20, and GEN1 as shown in figure 1.6.
All four functions are loaded at time 0. Both f 1 and f 2 use GEN10 to fill 4K
tables (4096 values) with one cycle of a sinewave ( f 1) and with the first 16 harmon-
ics of a sawtooth wave ( f 2). GEN20 is used to fill a 4K table ( f 3) with a Hanning
window for use by the grain opcode. Finally, f 4 uses GEN1 to fill a table with a
44.1K mono 16-bit AIFF format soundfile of a male vocalist singing the word la
at the pitch A440 for 3 seconds. This sample is used by the loscil opcode. (Note
that the length of f 4 is specified as 0. This tells the GEN1 subroutine to read the
actual length of the file from the header of the soundfile “sing.aif.” In this specific
case the length would be 132300 samples—44100 samples-per-second * 3 seconds.)
In the second part of the Csound score file we write the notes. As in the orchestra
file, each note-statement in the score file occupies a single line. Note-statements (or
i-statements) call for an instrument to be made active at a specific time and for a
specific duration. Further, each note-statement can be used to pass along a virtually
unlimited number of unique parameter settings to the instrument and these parame-
ters can be changed on a note-by-note basis.
Like the orchestra, which renders a sound line-by-line, the score file is read line-
by-line, note-by-note. Notes, however, can have the same start-times and thus be
performed simultaneously. In Csound one must always be aware of the fact that
whenever two or more notes are performed simultaneously, or whenever they over-
lap, their amplitudes are added. This can frequently result in samples-out-of-range
or clipping. (We will discuss this in detail shortly.)
You may have noticed that in the orchestra an opcode’s arguments were separated
by commas. Here in the score both the f-table arguments and i-statement parameter
fields (or p-fields) are separated by any number of spaces or tabs. Commas are not
used.
In order to keep things organized and clear sound designers often use tab-stops
to separate their p-fields. This practice keeps p-fields aligned in straight columns
and facilitates both reading and debugging. This is not required—just highly
recommended.
In all note-statements, the meaning of the first three p-fields (or columns) is reserved.
The first three p-fields (as in fig. 1.7 on page 15) specify the instrument number, the
start-time and the duration.
; P1 P2 P3
i INSTRUMENT # START-TIME DURATION
The function of all other p-fields is determined by you—the sound designer. Typ-
ically, p4 is reserved for amplitude and p5 is reserved for frequency. This conven-
tion has been adopted in this chapter and throughout this text. In our first score,
etude1.sco, as shown in figure 1.7, a single note with a duration of 3 seconds is
played consecutively on instr 101—instr 106. Because the start-times of each note
are spaced 4 seconds apart there will be a second of silence between each audio
event.
; P1 P2 P3
; INSTRUMENT # START-TIME DURATION
i 101 0 3
i 102 4 3
i 103 8 3
i 104 12 3
i 105 16 3
i 106 20 3
Figure 1.7 Simple score used to play instruments 101 through 106 as shown in figure 1.2
and 1.4.
• Render the Csound orchestra and score files: etude1.orc & etude1.sco.
• Play and listen to the different sound qualities of each instrument.
• Modify the score file and change the duration of each note.
• Make all the notes start at the same time.
• Comment out several of the notes so that they do not play at all.
• Cut and Paste multiple copies of the notes. Then, change the start times ( p2) and
durations ( p3) of the copies to make the same instruments start and end at differ-
ent times.
• Create a canon at the unison with instr 106.
• Look up and read about the opcodes used in instr 101–106 in the Csound Refer-
ence Manual.
k/ar oscil k/amp, k/cps, ifn [, iphs]
ar grain xamp, xpitch, xdens, kampoff, kpitchoff, kgdur, igfn, iwfn, imgdur [, igrnd]
ar loscil xamp, kcps, ifn [, ibas [, imod1, ibeg1, iend1 [, imod2, ibeg2, iend2]]]
• In the orchestra file, modify the frequency and amplitude arguments of each
instrument.
• Change the frequency ratios of the carrier and modulator in the foscil instrument.
• Change the number of harmonics in the buzz instrument.
• Change the initial function for the pluck instrument.
Figure 1.8 A time domain (a) and frequency domain (b) representation of a square wave.
In our second orchestra, we modify instr 101–106 so that they can be updated and
altered from the score file. Rather than setting each of the opcode’s arguments to a
fixed value in the orchestra, as we did in etude1.orc, we set them to “p” values that
correspond to the p-fields (or column numbers) in the score. Thus each argument
can be sent a completely different setting from each note-statement.
In instr 107, for example, p-fields are applied to each of the oscil arguments: am-
plitude ( p4), frequency ( p5) and wavetable ( p6) as shown in figure 1.10. Thus, from
the score file in figure 1.12, we are able to re-use the same instrument to play a
sequence of three descending octaves followed by an A major arpeggio.
In our next p-field example, shown in figures 1.13, 1.14, and 1.15, our rather lim-
ited instr 102 has been transformed into a more musically versatile instr 108—an
instrument capable of a rich and varied set of tone colors.
In the score excerpt shown in figure 1.15, each of the foscil arguments has been
assigned to a unique p-field and can thus be altered on a note-by-note basis. In this
case p4 = amplitude, p5 = frequency, p6 = carrier ratio, p7 = modulator ratio, p8
= modulation index and p9 = wavetable. Thus, starting 7 seconds into etude2.sco,
instr 108 plays six consecutive notes. All six notes use f 1 (a sine wave in p9). The
first two notes are an octave apart ( p5 = 440 and 220) but have different c:m ratios
( p7 = 2 and .5) and different modulation indexes ( p8 = 3 and 8), resulting in two
different timbres. Clearly, p-fields in the orchestra allow us to get a wide variety of
pitches and timbres from even the simplest of instruments.
Figure 1.10 Block diagram of instr 107, a simple oscillator instrument with p-field
substitutions.
Figure 1.11 Orchestra code for instr 107, a simple oscillator instrument with p-field
arguments.
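Figure 1.11's code is not reproduced in this text; following the block diagram of figure 1.10, a sketch of instr 107 would be:

        instr 107                  ; OSCILLATOR WITH P-FIELD ARGUMENTS
a1      oscil     p4, p5, p6       ; AMPLITUDE=p4, FREQUENCY=p5, WAVETABLE=p6
        out       a1
        endin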
; P1 P2 P3 P4 P5 P6
; INS STRT DUR AMP FREQ WAVESHAPE
i 107 0 1 10000 440 1
i 107 1.5 1 20000 220 2
i 107 3 3 10000 110 2
i 107 3.5 2.5 10000 138.6 2
i 107 4 2 5000 329.6 2
i 107 4.5 1.5 6000 440 2
Figure 1.12 Note-list for instr 107, which uses p-fields to “perform” six notes (some over-
lapping) with different frequencies, amplitudes, and waveshapes.
Figure 1.13 Block diagram of instr 108, a simple FM instrument with p-fields for each
parameter.
Figure 1.14 Orchestra code for instr 108, a simple FM instrument with p-field substitutions.
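Again, the code of figure 1.14 is not reproduced here; a sketch consistent with the block diagram of figure 1.13 and the p-field assignments described above would be:

        instr 108                          ; FM WITH P-FIELD ARGUMENTS
a1      foscil    p4, p5, p6, p7, p8, p9   ; AMP, FREQ, CARRIER, MODULATOR, INDEX, WAVETABLE
        out       a1
        endin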
; P1 P2 P3 P4 P5 P6 P7 P8 P9
; INS STRT DUR AMP FREQ C M INDEX WAVESHAPE
i 108 7 1 10000 440 1 2 3 1
i 108 8.5 1 20000 220 1 .5 8 1
i 108 10 3 10000 110 1 1 13 1
i 108 10.5 2.5 10000 130.8 1 2.001 8 1
i 108 11 2 5000 329.6 1 3.003 5 1
i 108 11.5 1.5 6000 440 1 5.005 3 1
Figure 1.15 Note-list for instr 108 in which nine p-fields are used to “play” an FM synthe-
sizer with different start-times, durations, amplitudes, frequencies, frequency-ratios, and mod-
ulation indices.
• Using instr 110, experiment with the various pluck methods. (See the Csound
Reference Manual for additional arguments).
• Using instr 110, experiment with different initialization table functions—f 1 and
f 2. Also try initializing with noise and compare the timbre.
• Explore the various parameters of the grain opcode.
• Create a set of short etudes for each of the instruments alone.
• Create a set of short etudes for several of the instruments in combination. Remem-
ber to adjust your amplitude levels so that you do not have any samples-out-of-
range.
• Lower the sample-rate and the control rate in the header. Recompile some of your
modified instruments. Do you notice any difference in sound quality? Do you no-
tice a change in brightness? Do you notice any noise artifacts? Do you notice any
aliasing? (We will discuss the theory behind these phenomena a little later.)
As stated previously, if you have a 16-bit converter in your computer system (which
is still quite common in 1999), then you can express 2¹⁶ possible raw amplitude
values (i.e., 65536 in the range -32768 to +32767). This translates to an amplitude
range of over 90 dB (typically you get about 6 dB of range per bit of resolution). If
you have been doing the exercises you have probably noticed that note amplitudes in
Csound are additive. This means that if an instrument has an amplitude set to 20000
and you simultaneously play two notes on that instrument, you are asking your con-
verter to produce a signal with an amplitude of ±40000. The problem is that your
16-bit converter can only represent values up to about 32000 and therefore your
Csound job will report that there are samples-out-of-range and the resulting sound-
file will be clipped as shown in figure 1.16.
Dealing with amplitudes is one of the most problematic aspects of working with
Csound. There is no easy answer. The problem lies in the fact that Csound amplitudes
are a simple mathematical representation of the signal. These measurements take no
account of the acoustical or perceptual nature of the sound.
Simply put, two times the linear displacement of amplitude will not necessarily
be perceived as two times as loud. A good book on acoustics will help you appreciate
the complexity of this problem. In the Csound world, remember that whenever two
or more notes are sounding their amplitudes are added. If the numbers add up to
anything greater than 32000 your signal will be clipped. Csound has some opcodes
and tools that will help you deal with this samples-out-of-range problem but none of
the current opcodes or value converters truly solve it. Most of the time you will just
have to set the levels lower and render the file again (and again and again) until
you get the amplitudes into a range that your system can handle, with the mix your
ears desire.
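As one hedged illustration of such a tool (the instrument number and the idea of specifying p4 in decibels are this example's own, not the book's), Csound's ampdb value converter lets you think in decibels and leave the conversion to a raw amplitude to the program:

        instr 199                   ; HYPOTHETICAL INSTRUMENT
iamp    =         ampdb(p4)         ; p4 IN dB: 84 dB IS ROUGHLY 15800, 90 dB ROUGHLY 31600
a1      oscil     iamp, p5, 1
        out       a1
        endin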
Data Rates
As you have seen in the first two etude orchestras, we can set and update parameters
(arguments) with floating-point constants either directly in the orchestra or remotely
through parameter-fields. But the real power of Csound is derived from the fact that
one can update parameters with variables at any of three update or data-rates: i-rate,
k-rate, or a-rate, where:
• i-rate variables are changed and updated at the note-rate.
• k-rate variables are changed and updated at the control-rate (kr).
• a-rate variables are changed and updated at the audio-rate (sr).
Both i-rate and k-rate variables are scalars. Essentially, they take on one value at
a given time. The i-rate variables are primarily used for setting parameter values
and note durations. They are evaluated at initialization-time and remain constant
throughout the duration of the note-event.
The k-rate variables are primarily used for storing and updating envelopes and
sub-audio control signals. These variables are recomputed at the control-rate (4410
times per second) as defined by kr in our orchestra header. The a-rate variables are
arrays or vectors of information. These variables are used to store and update data
such as the output signals of oscillators and filters that change at the audio sampling-
rate (44100 times per second) as defined by sr in our orchestra header.
One can determine and identify the rate at which a variable will be updated by the
first letter of the variable name. For example, the only difference between the two
Figure 1.17 Two oscil opcodes with asig versus ksig outputs.
oscillators shown in figure 1.17 is that one is computed at the audio-rate and the
other at the control-rate. Both use the same opcode, oscil and both have the same
arguments. What is different, then, is the sample resolution (precision) of the out-
put signal.
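The two lines of figure 1.17 are not shown above; they would be essentially the following (the amplitude of 10000 is assumed for this sketch):

asig    oscil     10000, 1000, 1      ; COMPUTED AT THE AUDIO RATE (sr)
ksig    oscil     10000, 1000, 1      ; COMPUTED AT THE CONTROL RATE (kr)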
Given our default header settings of sr = 44100 and kr = 4410, the output ksig
would be rendered at a sample-rate of 4K and the output asig will be rendered at a
sample-rate of 44.1K. In this case, the resulting audio would sound quite similar
because both would have enough sample resolution to accurately compute the 1000
Hz sinewave. If, however, the arguments were different and the waveforms had addi-
tional harmonics, such as the sawtooth wave defined by f 2 in figure 1.18, the k-rate
setting of 4410 samples-per-second would not accurately represent the waveform
and aliasing would result. (We will cover this in more detail later.)
You should note that it is left up to you, the sound designer, to decide the most
appropriate, efficient and effective rate at which to render your opcodes. For ex-
ample, you could render all of your low frequency oscillators (LFOs) and envelopes
at the audio-rate, but it would take longer to compute the signals and the additional
resolution would, in most cases, be imperceptible.
Variable Names
In our instrument designs so far we have been using a1, asig, k1 and ksig—in many
cases interchangeably. What’s with these different names for the same thing? Csound
is difficult enough. Why not be consistent?
Well, when it comes to naming variables, Csound only requires that the variable
names you use begin with the letter i, k, or a. This is so that the program can deter-
mine at which rate to render that specific line of code. Anything goes after that.
For instance, you could name the output of the loscil opcode below a1, asig, asam-
ple, or acoolsound. Each variable name would be recognized by Csound and would
run without error. In fact, given that the lines of code each have the same parameter
settings they would all sound exactly the same when rendered—no matter what
name you gave them. Therefore, it is up to you, the sound designer, to decide on a
variable naming scheme that is clear, consistent and informative.
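As a sketch of the point (the argument values of 10000, 440 and table 4 are assumed here), the following three lines differ only in the name given to the output variable and would render identically:

a1          loscil    10000, 440, 4
asample     loscil    10000, 440, 4
acoolsound  loscil    10000, 440, 4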
Let’s review a bit more theory before getting into our more advanced instrument
designs. As we stated earlier, the undersampled sawtooth (ksig) in figure 1.18 is an
example of aliasing and a proof of the Sampling Theorem. Simply put, the Sampling
Theorem states that, in the digital domain, to accurately reconstruct (plot, draw, or
reproduce) a waveshape at a given frequency you need twice as many samples as the
highest frequency you are trying to render. This hard upper limit at 1/2 the sampling
rate is known as the Nyquist frequency. With an audio-rate of 44100 Hz you can
accurately render tones with frequencies (and partials) up to 22050 Hz—arguably
far above the human range of hearing. And with a control-rate of 4410 Hz you can
accurately render tones up to 2205 Hz. This would be an extremely fast LFO and
seems a bit high for slowly changing control signals, but you should realize that
certain segments of amplitude envelopes change extremely rapidly and high-
resolution controllers reduce the zipper-noise sometimes resulting from these rapid
transitions.
Figure 1.19 graphically illustrates the phenomenon known as aliasing. Because a
frequency is undersampled an alias or alternate frequency results. In this specific
case our original sinewave is at 5 Hz. We are sampling this wave at 4 Hz (remember
that the minimum for accurate reproduction would be 10 Hz–2 times the highest
frequency component). What results is a 1 Hz tone. As you can see from the figure,
the values that were returned from the sampling process trace the outline of a 1 Hz
sinewave, not a 5 Hz one. The actual aliased frequency is the difference between the
sampling frequency and the frequency of the sample (or its partials).
To totally understand and experience the results of this phenomenon it would be
informative to go back to the earlier instruments in this chapter and experiment with
different rate variables (I would recommend that you duplicate and renumber all the
instruments and then change all the asig and a1 variables to ksig and k1 variables.
Figure 1.18 An “under-sampled” sawtooth wave (given kr = 4410 and a frequency setting
of 1000), resulting in an “aliased” ksig output.
Figure 1.19 Aliasing. A 5Hz sine (a) is “under-sampled” at 4 times a second (b) resulting
in the incorrect reproduction of a 1Hz sine (c).
You’ll be surprised and even pleased with some of the lo-fi results.) For now let’s
move on.
Figure 1.20 Block diagram of instr 113, an example of one opcode’s output controlling the
input argument of another. In this case we are realizing dynamic amplitude control by modi-
fying the amplitude argument of the oscil opcode with the output of another—the linen.
Figure 1.21 Orchestra code for instr 113, a simple oscillator instrument with amplitude en-
velope control.
and engaging is the subtle, dynamic and interdependent behavior of its three main
parameters—pitch, timbre, and loudness. What makes Csound a truly powerful soft-
ware synthesis language is the fact that one can literally patch the output of any
opcode into virtually any input argument of another opcode—thereby achieving an
unsurpassed degree of dynamic parameter control. By subtly (or dramatically) modi-
fying each of your opcode’s input arguments, your synthetic, computerized sounds
will spring to life.
Up to this point we have essentially gated our Csound instruments—simply turn-
ing them on at full volume. I’m not sure that any acoustic instrument works like that.
Clearly, applying some form of overall envelope control to these instruments would
go a long way toward making them more musical. And by adding other dynamic
parameter controls we’ll render sounds that are ever more enticing.
In instr 113, shown in figures 1.20 and 1.21, Csound's linen opcode is used to
dynamically control the amplitude argument of the oscillator, thus functioning as a
typical attack-release (AR) envelope generator.
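The code of figure 1.21 is not reproduced in this text; a sketch that follows the block diagram of figure 1.20 would be:

        instr 113                      ; OSCILLATOR WITH A LINEN AMPLITUDE ENVELOPE
k1      linen     p4, p7, p3, p8       ; AR ENVELOPE: PEAK=p4, RISE=p7, DURATION=p3, DECAY=p8
a1      oscil     k1, p5, p6           ; THE ENVELOPE DRIVES THE AMPLITUDE INPUT
        out       a1
        endin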
In instr 115, shown in figures 1.22 and 1.23, a linen is again used to apply a
dynamic amplitude envelope. This time the enveloping is done by multiplying the
output of the linen opcode (k1) with the output of the buzz opcode (a1). In fact, the
multiplication is done in the input argument of the out opcode (k1 * a1). Here we
Figure 1.22 Block diagram of instr 115 showing amplitude control by multiplying two out-
puts and dynamic control of an argument.
Figure 1.23 Orchestra code for instr 115, an instrument with dynamic amplitude and har-
monic control.
Figure 1.24 Block diagram of instr 117 showing amplitude control by passing the signal
(a1) through an a-rate envelope (a2).
Figure 1.25 Orchestra code for instr 117, a granular synthesis instrument with dynamic
control of many parameters. Note that the output of grain (a1) is “patched” into the amplitude
argument of an a-rate linen to shape the sound with an overall amplitude envelope.
Envelopes
I have to admit that as a young student of electronic music I was always confused by
the use of the term envelope in audio and synthesis. I thought of envelopes as thin
paper packages in which you could enclose a letter to a friend or a check to the phone
company and could never quite make the connection. But the algorithm used in instr
117 makes the metaphor clear. Here we see that the linen opcode completely pack-
ages or folds the signal into this odd-shaped AR container and then sends it to the
output. Figure 1.26 is another way to visualize the process. First, we see the raw
bipolar audio signal. Then, we see the unipolar attack-decay-sustain-release (ADSR)
amplitude envelope. Next, we see the envelope applied to the audio signal. In the
final stage we see the bipolar audio signal whose amplitude has been proportionally
modified by the contour of the ADSR.
Another way of looking at figure 1.26 would be that our bipolar signal is scaled
(multiplied) by a unipolar ADSR (attack-decay-sustain-release) envelope that sym-
metrically contours the bipolar signal. The result is that the bipolar signal is envel-
oped in the ADSR package. Let’s apply this new level of understanding in another
instrument design.
In instr 118, shown in figures 1.27 and 1.28, we illustrate yet another way of ap-
plying an envelope to a signal in Csound. In this case, we are using an oscillator
whose frequency argument is set to 1/p3. Let’s plug in some numbers and figure out
how this simple expression computes the correct sub-audio frequency
that transforms our periodic oscillator into an aperiodic envelope generator.
For example, if the duration of the note were 10 seconds and the frequency of our
oscillator were set to 1/10 Hz, it would take the full 10 seconds to read 1 cycle of the
function table found in p7. Thus, setting the frequency of an oscillator to 1 divided by
the note-duration, or 1/p3, guarantees that this periodic signal generator will compute
Figure 1.27 Block diagram of instr 118, an instrument with an oscillator for an envelope
generator.
Figure 1.28 Orchestra code for instr 118, a sample-playback instrument with an oscillator-
based envelope and dynamic pitch modulation.
Figure 1.29 Linear and exponential envelope functions using GEN5 and GEN7.
Figure 1.30 Orchestra code for instr 119, an FM instrument with an oscillator envelope in
which p8 determines the “retriggering” frequency.
only 1 period, or read only 1 complete cycle of its f-table during the course of each
note event.
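Reduced to its essentials (and substituting a plain oscillator for the loscil of figure 1.28), the technique needs only two lines of code; here f 6 is assumed to hold one of the unipolar envelope shapes and f 1 a sine wave:

          instr 118
k1        oscil    p4, 1/p3, 6         ; the envelope table is read exactly once over the note
a1        oscil    k1, p5, 1
          out      a1
          endin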
In instr 118 the envelope functions called by p7 ( f 6, f 7 and f 8) use GEN7 and
GEN5 to draw a variety of unipolar linear and exponential contours. It is important
to note that it is illegal to use a value of 0 in any exponential function such as those
computed by the GEN5 subroutine or by the expseg opcode. You will notice there-
fore, that f 8, which uses GEN5, begins and ends with a value of .001 rather than 0.
The enveloping technique employed in instr 118 (using an oscillator as an envelope
generator) has several advantages. First, you can create an entire library of preset
envelope shapes and change them on a note-by-note basis. Second, since the enve-
lope generator is in fact an oscillator, you can have the envelope loop or retrigger
during the course of the note event to create interesting LFO-based amplitude-gating
effects. In instr 119, shown in figure 1.30, p8 determines the number of repetitions
that will occur during the course of the note. If p8 is set to 10 and p3 is 5 seconds,
the instrument will retrigger the envelope 2 times per second, whereas if the duration
of the note were 1 second (p3 = 1), the envelope would be retriggered 10 times
per second.
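In code, the only change from the 1/p3 design is a scaling of the envelope oscillator's frequency, roughly:

k1        oscil    p4, p8/p3, 6        ; the envelope now repeats p8 times over the course of the note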
Figure 1.32 Two normalized functions ( f 1 and f 2) and two non-normalized functions ( f 3
and f 4).
䡲 Render the third Csound orchestra and score: etude3.orc & etude3.sco.
䡲 Play and listen to the different sound qualities and envelope shapes of each note
and each instrument.
䡲 Modify the orchestra file and change the variable names to more meaningful ones.
For instance, rename all a1 variables asig1 and k1 variables kenv1.
䡲 In the Csound Reference Manual, look up the new opcodes featured in instr
113–119:
k/ar linen k/xamp, irise, idur, idec
k/ar line ia, idur1, ib
k/ar expon ia, idur1, ib
k/ar linseg ia, idur1, ib[, idur2, ic[...]]
k/ar expseg ia, idur1, ib[, idur2, ic[...]]
䡲 Modify the attack time ( p7) and release times ( p8) of the linen opcodes in instr
113–117.
䡲 Add a pitch envelope to instr 113, 114, and 115 by adding a linseg to each instru-
ment and adding its output to p5.
䡲 Experiment with the dynamic controls of the grain parameters found in instr 117.
䡲 Substitute oscil-based envelopes for the linen-based envelopes in instr 113–117.
䡲 Use GEN5 and GEN7 to design several additional envelope functions. Try to imi-
tate the attack characteristics of a piano—f 9, a mandolin—f 10, a tuba—f 11, a
violin—f 12 and a male voice singing “la”—f 13. Apply these envelopes to your
newly designed versions of instr 113–117.
䡲 Following the examples of the figures you have studied so far, draw block diagrams
for instr 112, 113, 114, and 119.
Next, we improve the quality of our instruments by first mixing and detuning our
oscillators to create a fat “chorused” effect. Then, we crossfade opcodes to create
a hybrid synthesis algorithm unlike anything offered commercially. (The synthesis
approach is sometimes referred to as modal synthesis.) Finally, we animate our in-
struments by introducing sub-audio rate and audio rate amplitude and frequency
modulation (AM and FM). We also employ several of Csound’s display opcodes to
visualize these more complex temporal and spectral envelopes. And we’ll learn a
little more about the language as we go.
In instr 120, shown in figures 1.33 and 1.34, we mix together three detuned oscilla-
tors that all use the same envlpx opcode for an amplitude envelope. Using the dis-
play opcode this envelope is plotted on the screen with a resolution set to trace the
envelope shape over the entire duration of the note ( p3) and thus display the com-
plete contour.
Although instr 120 is still rather simple in design, it does serve as a model of the
way that more complex instruments are typically laid out and organized in Csound.
In figure 1.34 you can see that variables are initialized at the top of the instrument
and given names that help us to identify their function (resulting in a self-comment-
ing coding style). Clearly you can read that the attack time is assigned to iatk from
the score value given in p7 (iatk = p7) and that the release time is assigned to irel
from the score value given in p9 (irel = p9). Most importantly, by looking at where
they are patched in the envlpx opcode you can see and remember which arguments
correspond with which particular parameters, thereby making the opcode easier to
read.
You should note that in Csound the equals sign ( = ) is the assignment operator.
It is in fact an opcode. Assigning plain-English mnemonics and abbreviated names
to variables at i-time at the top of your instrument, the initialization block, makes an
instrument much easier to read and is highly recommended.
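The complete instrument is given in figure 1.34. A condensed sketch of the same layout (the detune ratios, f-table numbers and remaining envlpx arguments are my own choices) shows how the initialization block, the shared envelope, the display and the three-oscillator mix fit together:

          instr 120
iamp      =        p4
ifrq      =        p5
ifun      =        p6
iatk      =        p7
irel      =        p9
kenv      envlpx   iamp, iatk, p3, irel, 5, 1, .01     ; f 5 assumed to hold the rise shape
a1        oscili   kenv, ifrq, ifun
a2        oscili   kenv, ifrq * 1.003, ifun            ; slightly sharp copy
a3        oscili   kenv, ifrq * .997, ifun             ; slightly flat copy
amix      =        (a1 + a2 + a3) * .333
          display  kenv, p3                            ; one display frame spanning the whole note
          out      amix
          endin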
Spectral Fusion
Next we will look at instr 122, as shown in figures 1.35 and 1.36. This instrument
uses independent expon opcodes to crossfade between a foscil and a buzz opcode
that are both fused (morphed/mixed/transfigured) with a pluck attack creating a
beautiful hybrid timbre. This instrument employs Csound’s dispfft opcode to com-
pute and display a 512 point Fast Fourier Transform (FFT) of the composite signal
updated every 250 milliseconds. Although the display and dispfft opcodes are a
wonderful way to look into the behavior of your instrument, it is important to note
that when you are using your instruments to make music you should always remem-
ber to comment out these display and print opcodes. They significantly impact the
performance of your system. These opcodes are informative and educational but re-
ally function as debugging tools and you should think of them as such.
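Stripped to its essentials, the crossfading idea of instr 122 (figure 1.36 holds the real thing; the p-fields, ratios and sweep shapes below are my simplifications) looks something like this:

          instr 122
iamp      =        p4
ifrq      =        p5
kfade     expon    1, p3, .001                         ; fades from 1 toward 0 over the note
amod      foscil   iamp * kfade, ifrq, 1, 2, 3, 1      ; FM voice (f 1 assumed to be a sine)
abuz      buzz     iamp * (1 - kfade), ifrq, 9, 1      ; buzz voice fades in as the FM voice fades out
aplk      pluck    iamp, ifrq, ifrq, 0, 1              ; plucked attack fused onto the mix
amix      =        (amod + abuz + aplk) * .333
          dispfft  amix, .25, 512                      ; 512-point FFT updated every 250 milliseconds
          out      amix
          endin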
Rather than simply mixing or crossfading opcodes as we have done in instr 120
and instr 122, another popular approach is to modulate one audio opcode with the
frequency and amplitude of another. In instr 124, as shown in figures 1.37 and 1.38
Figure 1.33 Block diagram of instr 120 illustrating three chorusing oscillators with a com-
mon envelope and display.
Figure 1.34 Orchestra code for instr 120, a chorusing instrument in which p-fields are given
i-time variable names. Also an envlpx, which is displayed, is used as a common envelope.
Figure 1.35 Block diagram of instr 122 illustrating the FFT display of 3 mixed (fused) and
crossfaded (morphed) opcodes.
Figure 1.37 Block diagram of instr 124, a dynamic amplitude modulation instrument.
Figure 1.38 Orchestra code for instr 124, an amplitude modulation instrument with inde-
pendent amplitude envelope and variable LFO.
Figure 1.39 Block diagram of instr 126, an additive instrument with delayed vibrato.
for example, an a-rate oscil (asig) is amplitude modulated with the output of a dy-
namically swept a-rate oscil (alfo) whose frequency is dynamically altered with a
line opcode and whose amplitude is controlled by an expon. This simple oscillator
combination can produce a wide range of dynamically evolving harmonic and inhar-
monic timbres.
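A pared-down sketch of that amplitude modulation scheme (the real instrument is in figure 1.38; my p-fields and sweep ranges are only illustrative) would be:

          instr 124
kenv      linen    p4, .1, p3, .2              ; overall amplitude envelope (the figure uses an envlpx)
kmodfrq   line     2, p3, 200                  ; modulator sweeps from sub-audio into the audio range
kmodpth   expon    .1, p3, 1                   ; modulation depth grows over the note
alfo      oscili   kmodpth, kmodfrq, 1         ; f 1 assumed to be a sine wave
asig      oscili   kenv * alfo, p5, 1          ; carrier amplitude-modulated by the swept oscillator
          out      asig
          endin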
Next, in instr 126, shown in figures 1.39 and 1.40, we present a simple vibrato
instrument that uses a linseg opcode to delay the onset of the modulation resulting
in a more natural vibrato effect. Even the relatively simple designs of instr 124 and
instr 126 can lend themselves to an extraordinarily diverse and rich palette of colors.
Take time to explore and modify them.
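The delayed-vibrato trick itself can be sketched in a handful of lines (figure 1.40 has the real code; the vibrato rate, depth and delay below are my own values):

          instr 126
kvibenv   linseg   0, p3 * .3, 0, p3 * .2, 1, p3 * .5, 1   ; no vibrato for the first third of the note
klfo      oscili   kvibenv * 6, 5.5, 1                     ; 5.5 Hz vibrato, up to 6 Hz deep
kenv      linen    p4, .1, p3, .3
asig      buzz     kenv, p5 + klfo, 9, 1                   ; f 1 assumed to be a sine wave
          out      asig
          endin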
Value Converters
In the initialization block of instr 120, shown in figure 1.34 (and in all the instru-
ments in this etude for that matter), you might have noticed that two of Csound’s
value converters ampdb and cpspch were used (iamp = ampdb(p4) and ifrq =
cpspch(p5)). These allow us to express frequency and amplitude data in a more
familiar and intuitive format than having to use the straight Hz and linear amplitudes
we have used thus far.
The cpspch value converter will read a number in octave-point-pitch-class nota-
tion and convert it to Hz (e.g., 8.09 = A4 = 440 Hz). Octave-point-pitch-class is a
Figure 1.40 Orchestra code for instr 126, a buzz instrument with delayed vibrato.
notation shorthand system in which octaves are represented as whole numbers (8.00
= Middle C or C4, 9.00 = C5, 10.00 = C6, etc.) and the 12 equal-tempered pitch
classes are numbered as the two decimal digits that follow the octave (8.01 = C#4,
8.02 = D4, 8.03 = D#4, etc.). The scale shown in figure 1.41 should make the
system extremely clear to you. And by adding more decimal places, it is also possible
to specify microtones as shown in figure 1.42.
As you can see, cpspch converts from the pch (octave point pitch-class) represen-
tation to a cps (cycles-per-second) representation. If you are writing tonal or micro-
tonal music with Csound you might find this value converter particularly useful.
Similarly, the ampdb value converter will read a decibel value and convert it to a
raw amplitude value as shown in figure 1.43.
You should note that although the logarithmic dB or decibel scale is linear in per-
ception, Csound doesn’t really use dB. The ampdb converter is a direct conversion
with no scaling. Regrettably, you will still have to spend a great deal of time adjust-
ing, normalizing, and scaling your amplitude levels, even if you are using Csound’s
ampdb converter, because the conversion is done prior to the rendering.
䡲 Render the fourth Csound orchestra and score: etude4.orc & etude4.sco.
䡲 Play and listen to the dynamic timbre and modulation effects of each note and
each instrument.
NOTE   CPS (Hz)   PCH    MIDI
C4     261.626    8.00   60
C#4    277.183    8.01   61
D4     293.665    8.02   62
D#4    311.127    8.03   63
E4     329.628    8.04   64
F4     349.228    8.05   65
F#4    369.994    8.06   66
G4     391.995    8.07   67
G#4    415.305    8.08   68
A4     440.000    8.09   69
A#4    466.164    8.10   70
B4     493.883    8.11   71
C5     523.251    9.00   72
Figure 1.41 A chromatic scale beginning at middle C specified by using Csound’s cpspch
value converter.
NOTE    CPSPCH
C4      8.00
C4+     8.005
C#4     8.01
C#4+    8.015
D4      8.02
D4+     8.025
D#4     8.03
D#4+    8.035
E4      8.04
E4+     8.045
F4      8.05
F4+     8.055
F#4     8.06
F#4+    8.065
G4      8.07
G4+     8.075
G#4     8.08
G#4+    8.085
A4      8.09
A4+     8.095
A#4     8.10
A#4+    8.105
B4      8.11
B4+     8.115
C5      9.00
Figure 1.42 An octave of equal-tempered quartertones specified using the cpspch value
converter.
ampdb(42) = 125.9
ampdb(48) = 251.2
ampdb(54) = 501.2
ampdb(60) = 1000
ampdb(66) = 1995.3
ampdb(72) = 3981.1
ampdb(78) = 7943.3
ampdb(84) = 15848.9
ampdb(90) = 31622.8
ampdb(96) = 63095.7 ; WARNING: SAMPLES OUT OF RANGE!!!
䡲 Modify instr 120 so that you are chorusing three foscil opcodes instead of three
oscil opcodes.
䡲 As shown in instr 126, add a delayed vibrato to your three foscil version of instr
120.
䡲 Block diagram instr 121 (which was not discussed in this section) and add delayed
vibrato plus some of your own samples to this wavetable synthesizer.
䡲 Modify instr 122 so that you are creating different hybrid synthesizers. Perhaps
you could add a grain of loscil?
䡲 Block diagram instr 123 (which was not discussed in this section) and change the
rhythms and pitches. Try audio-rate modulation. Finally, make and use your own
set of amplitude modulation functions.
䡲 Modify instr 124 so that you are not sweeping so radically. Add chorusing and
delayed vibrato.
䡲 Block diagram instr 125 (which was not discussed in this section). Change the
modulation frequency and depth using the existing functions. Modulate some of
your own samples.
䡲 Modify instr 126 so that these synthetic voices sing microtonal melodies and
harmonies.
䡲 Block diagram instr 127 (which was not discussed in this section). Have fun modi-
fying it with any of the techniques and bits of code you have developed and mas-
tered so far.
䡲 In the Csound Reference Manual, look up the new opcodes featured in instr
120–127:
k/ar envlpx k/xamp, irise, idur, idec, ifn, iatss, iatdec [, ixmod]
print iarg [, iarg, ...]
display xsig, iprd [, inprds] [, iwtflg]
dispfft xsig, iprd, iwsiz [, iwtyp] [, idbout] [, iwtflg]
䡲 Create a new set of attack functions for envlpx and use them in all the instruments.
䡲 Add print, display, and dispfft opcodes to instr 123–127. (But do remember to
comment them out when you are making production runs with your instruments.)
The next set of instruments employ a number of Csound signal modifiers in various
parallel and serial configurations to shape and transform noise and wavetables. In
instr 128, shown in figures 1.45 and 1.46, we dynamically filter white noise produced
by Csound’s rand opcode. Separate expon and line opcodes are used to indepen-
dently modify the cutoff frequency and bandwidth of Csound’s two-pole reson
(bandpass) filter. Also, an expseg amplitude envelope is used and displayed.
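Using the p-field layout suggested by the score excerpt in figure 1.56 (my reading of it, at any rate), a sketch of instr 128 would be:

          instr 128
anoise    rand     p5                               ; white noise, amplitude taken from p5
kcf       expon    p8, p3, p9                       ; center frequency glides from CF1 to CF2
kbw       line     p10, p3, p11                     ; bandwidth glides from BW1 to BW2
afilt     reson    anoise, kcf, kbw, 2              ; two-pole bandpass with RMS scaling
kenv      expseg   .001, p6, p4, p3 - p6 - p7, p4, p7, .001
          display  kenv, p3
          out      afilt * kenv
          endin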
In instr 129 through 132, shown in figure 1.47, a white noise source (rand) is passed
through a series of one-pole lowpass filters (tone). The significant contribution made
Figure 1.44 A source signal (a) modified by the four basic filters: tone—a one-pole lowpass
(b), atone—a one-pole highpass (c), reson—a two-pole bandpass (d), and areson—a two-
pole bandreject (e).
Figure 1.46 Orchestra code for instr 128, a bandpass filtered noise instrument with variable
cutoff frequency and bandwidth.
by each additional pole should be quite apparent from this set of examples. In fact,
each pole increases the slope or steepness of a filter by adding an additional 6 dB
per octave of roll-off at the cutoff frequency. A cascade filter design such as this
makes the slope proportionally steeper with each added tone thus resulting in a more
“effective” filter. Thus, in our cascade design instr 129 would have a slope corre-
sponding to an attenuation of 6 dB per octave; instr 130 would have a slope of 12 dB
per octave; instr 131 would have a slope of 18 dB per octave; and instr 132 would
have a slope of 24 dB per octave. The dispfft opcode should clearly show the pro-
gressive effect on the spectrum of the noise source in each instrument.
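The cascade itself is simply the same opcode repeated, each stage feeding the next; a sketch of the four-pole version (instr 132), with the cutoff frequency assumed to arrive in p5, is:

          instr 132
anoise    rand     p4
a1        tone     anoise, p5          ; 6 dB per octave
a2        tone     a1, p5              ; 12 dB per octave
a3        tone     a2, p5              ; 18 dB per octave
a4        tone     a3, p5              ; 24 dB per octave
          dispfft  a4, .25, 1024
          out      a4
          endin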
Figure 1.47 Orchestra code excerpts from instr 129–132, which pass white noise through a
cascade of one-pole lowpass filters.
Displays
For several examples now, we have been using Csound’s display and dispfft opcodes
to look at signals. But what exactly is being displayed? And how are these opcodes
different?
As you know, signals can be represented in either the time or the frequency do-
main. In fact, these are complementary representations illustrating how the signal
varies in either amplitude or frequency over time. Csound’s display opcode plots
signals in the time domain as an amplitude versus time graph whereas the dispfft
opcode plots signals in the frequency domain using the Fast Fourier Transform
method. Both allow us to specify how often to update the display and thereby provide
the means of watching a time or frequency domain signal evolve over the course of
a note. We used display in instr 128 to look at the shape of the expseg amplitude
envelope and see the way that the amplitude varied over the entire duration of the
note.
In instr 129–132 we used the dispfft opcode to look at the way that the frequencies
were attenuated by our filter network. By specifying that the FFT be 4096 points we divided
the spectrum below the Nyquist frequency into 2048 linearly spaced frequency bins of about
10.8 Hz each (44100/4096 = 10.77), but we could have divided it anywhere from 8
bands (each about 2756 Hz wide) to 2048 bands (each about 10.8 Hz wide). We will continue
using these opcodes to look into the time domain and frequency domain characteris-
tics of the sounds that our instruments produce. In particular, the dispfft opcode will
help us better understand the effect that Csound’s different filters are having on the
signals we put into them.
In the early days of analog synthesis the filters defined the sound of these rare and
now coveted “classic” instruments. The tone and reson filters we have used thus far
were some of Csound’s first filters. They are noted for their efficiency (they run fast)
and equally noted for their instability (they blow up). In fact it has always been good
advice to patch the output of these filters into Csound’s balance opcode in order to
keep the samples-out-of-range under control.
Over the years, however, many new filters have been added to the Csound lan-
guage. Today in Csound the Butterworth family of filters (butterlp, butterhp, but-
terbp and butterbr) sound great and are becoming more common in virtually all
instrument designs. This is due in part to the fact that the Butterworth filters have
more poles (they’re steeper and more effective at filtering), have a flatter frequency re-
sponse in the passband (they are smoother and cleaner sounding) and are sig-
nificantly more stable (meaning that you do not have to worry so much about
samples-out-of-range). In instr 133, shown in figures 1.48 and 1.49, we use a parallel
configuration comprised of a 4-pole butterbp and a 4-pole butterlp filter pair as a
way to model the classic resonant-lowpass filter commonly found in first generation
analog synthesizers.
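A sketch of that parallel design (the real code is in figure 1.49; here I stack two 2-pole Butterworth opcodes to get each 4-pole response, and the sweep, bandwidth and gain values are my own) might read:

          instr 133
iresgn    =        p6                               ; gain on the resonant (bandpass) branch
aplk      pluck    p4, p5, p5, 0, 1                 ; plucked-string excitation
kcut      expon    p7, p3, p8                       ; swept cutoff/center frequency
abpf      butterbp aplk, kcut, kcut * .2
abpf      butterbp abpf, kcut, kcut * .2            ; two butterbp in series make a 4-pole bandpass
alpf      butterlp aplk, kcut
alpf      butterlp alpf, kcut                       ; two butterlp in series make a 4-pole lowpass
amix      =        alpf + (abpf * iresgn)           ; resonant peak added to the lowpass output
          dispfft  amix, .25, 1024
          out      amix
          endin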
As you can see and hear from the previous set of examples, dynamic parametric
control of Csound’s filter opcodes, combined in various parallel and serial configura-
tions, opens the door to a wide world of subtractive sound design.
An Echo-Resonator
Let’s shift our focus now to another set of Csound’s signal modifiers—comb and
vdelay.
A comb filter is essentially a delay line with feedback as illustrated in figure 1.50.
As you can see, the signal enters the delay line and its output is delayed by the length
of the line (25 milliseconds later in this case). When it reaches the output, it is fed
back to the input after being multiplied by a gain factor.
Figure 1.48 Block diagram of instr 133, a parallel butterbp and butterlp configuration
resulting in a “classic” resonant-lowpass design.
Figure 1.49 Orchestra code for instr 133, a “classic” resonant-lowpass filter design.
Figure 1.50 A comb filter: a 25 millisecond delay line whose output is fed back to the input
through a gain of .5, producing an impulse response that decays by half on each pass (.5, .25,
.125, .0625 at 25, 50, 75, 100 and 125 ms).
The time it takes for the signal to circulate back to the input is called the loop-
time. As demonstrated in instr 135, shown in figures 1.51, 1.52, and 1.53, a diskin
opcode is used to read (in both forward and reverse) and transpose samples directly
from disk into a comb filter. When the loop-time is long, we will perceive discrete
echoes; but, when the loop-time is short, the comb filter will function more like
a resonator.
As shown in figure 1.50, the impulse response of a comb filter is a train of im-
pulses spaced equally in time at the interval of the loop-time. In fact, the resonant
frequency of this filter is 1/loop-time. In instr 135 this is specified in milliseconds.
In the score comments you will see where I have converted the period of the loop,
specified in milliseconds, into the frequency of the resonator, specified in Hz.
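A sketch of the echo-resonator follows (figure 1.52 has the actual code; the soundfile name comes from figure 1.51, the envelope values are mine, and a mono soundfile is assumed):

          instr 135
iloopt    =        p6 * .001                        ; looptime arrives in milliseconds; comb wants seconds
irvt      =        p7                               ; decay (reverb) time of the comb
ain       diskin   "hellorcb.aif", p5               ; read and transpose the soundfile straight from disk
kenv      linen    1, .01, p3, .05
acomb     comb     ain * kenv, irvt, iloopt
          out      acomb
          endin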
Although we can vary the loop-time in instr 135 on a note-by-note basis, the comb
opcode will not allow you to vary this parameter dynamically during the course of a
note event; the vdelay opcode will.
Variable delay lines are the key to designing one of the more popular studio effects—
a flanger.
In instr 136, as shown in figures 1.54 and 1.55, noise cascades through a series of
variable delay lines to make a flanger. By patching the output from one vdelay op-
code into the input of another, the strength and focus of the characteristic resonance
is emphasized (just as in our tone example in instr 132 as shown in figure 1.47
above). Furthermore, this resonant peak is swept across the frequency spectrum un-
der the control of a variable rate LFO whose frequency is dynamically modified by
the line opcode.
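A reduced two-stage version of the flanger (the full five-stage instrument is in figure 1.55; the LFO depth, delay range and output mix are my own choices, and f 1 is assumed to be a sine) can be sketched as:

          instr 136
imsdel    =        p6                               ; nominal delay time in milliseconds
anoise    rand     p4
krate     line     p7, p3, p8                       ; LFO rate sweep
alfo      oscili   imsdel * .4, krate, 1            ; sweeps the delay time around its center value
adel1     vdelay   anoise, (imsdel * .5) + alfo, imsdel
adel2     vdelay   adel1, (imsdel * .5) + alfo, imsdel   ; cascading sharpens the resonant peak
          dispfft  adel2, .25, 1024
          out      (anoise + adel2) * .5
          endin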
Figure 1.51 Block diagram of instr 135, a soundfile delay/resonator instrument using a dis-
kin to read directly from disk without the use of an f-table and a comb to delay or resonate.
Figure 1.52 Orchestra code for instr 135, an echo-resonator instrument using the comb
opcode.
Figure 1.53 Score code for instr 135, the looptime ( p6) sets the period and resonant fre-
quency of this recirculating delay line.
Figure 1.54 Block diagram of instr 136: a rand noise source cascaded through five vdelay
opcodes whose delay times are swept by an oscil LFO, the rate of which is controlled by a
line opcode; the mix is monitored with dispfft.
Admittedly, creating and editing text-based note-lists is not fun. Granted, the note-
list does offer you the most exacting and direct control over the behavior of your
instruments, but it is still one of the most unmusical and tedious aspects of working
with Csound.
As stated at the beginning of this chapter, Csound does read MIDI files and this
may prove a more intuitive way of generating notes and playing your Csound instru-
Figure 1.55 Orchestra code for instr 136, a variable delay line “flanger” instrument.
ments. However, Csound instruments must be designed to work with MIDI. You
will need to adapt your traditional Csound instruments before they will work with
MIDI devices.
Although not covered in the text of The Csound Book, there are a number of chap-
ters on the CD-ROM dedicated to controlling Csound from MIDI keyboards and
MIDI files.
Still, without resorting to MIDI, Csound does feature a collection of Score State-
ments and Score Symbols (text-based shortcuts) that were created to simplify the
process of creating and editing note-lists. Like f-statements, these score commands
begin with a specific letter and are sometimes followed by a set of arguments. I
employ many score statements in etude5.sco.
The first score statement I employ in etude5.sco is the Advance statement—a,
shown in figure 1.56. The advance statement allows the beat-count of a score to be
advanced without generating any sound samples. Here it is used to skip over the first
two notes of the score and begin rendering 10 seconds into the piece. The advance
statement can be particularly useful when you are working on a long and complex
composition and you are interested in fine-tuning something about a sound in the
middle or at the end of the piece. Rather than waiting for the entire work to render
just to hear the last note, you can advance to the end and only render that section—
saving yourself hours of work. The syntax of the advance statement is shown in the
comments in figure 1.56.
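As a quick illustration (my own values, not the ones in etude5.sco), an advance statement that skips the first 10 beats of a score would read:

a 0 0 10   ; advance: at beat 0, skip ahead 10 beats without rendering any samples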
; INS ST DUR AMP FRQ ATK REL CF1 CF2 BW1 BW2
i 128 1 5 .5 20000 .5 2 8000 200 800 30
i 128 6 5 .5 20000 .25 1 200 12000 10 200
Figure 1.56 Score excerpt from etude5.sco featuring the advance statement.
Figure 1.57 Cut-and-paste repeated score excerpt from etude5.sco featuring the section
statement.
s
f 0 2 ; DUMMY F 0 STATEMENT - 2 SECONDS OF SILENCE BETWEEN SECTIONS
s
; INS ST DUR AMP FRQ ATK REL CF1 CF2 BW1 BW2
i 128 0 5 .5 20000 .5 2 8000 200 800 30
i 128 4 5 .5 20000 .25 1 200 12000 10 200
i 128 8 3 .5 20000 .15 .1 800 300 300 40
i 128 10 11 .5 20000 1 1 40 90 10 40
s
Figure 1.58 A score excerpt from etude5.sco featuring the “dummy” f 0 statement used to
insert 2 seconds of silence between two score sections.
Figure 1.59 A score excerpt from etude5.sco featuring the carry ( . ), p2 increment ( + )
and ramp ( < ) symbols.
Figure 1.60 Another view of the etude5.sco excerpt shown in figure 1.59 in which the ramp
( < ), carry ( . ) and + symbols are replaced by the actual numerical values they represent.
The tempo statement ( t ) lets you change this default value of 60 beats per minute in both a constant and a variable fashion.
Figure 1.61 illustrates both uses.
The statement t 0 120 will set a constant tempo of 120 beats per minute. Given
this setting, the internal beat-clock will run twice as fast and therefore all time values
in the score file will be cut in half.
The statement t 0 120 1 30 is used to set a variable tempo. In this case the tempo
is set to 120 at time 0 (twice as fast as indicated in the score) and takes 1 second to
gradually move to a new tempo of 30 (twice as slow as indicated in the score). Obvi-
ously, using a variable tempo can make your scores less mechanical and more
musical.
Figure 1.61 An excerpt from the end of etude5.sco in which the tempo statement is used
in fixed and variable mode.
Working with Csound’s text-based score language can be laborious. In fact it has
inspired many a student to learn C programming in order to generate their note-lists
algorithmically. Real-time and MIDI are both solutions. But, taking advantage of
Csound’s score shortcuts can make your work a lot easier and your sound gestures,
phrases and textures a lot more expressive.
䡲 In instr 128 substitute a loscil opcode for the rand opcode and dynamically filter some
of your own samples.
䡲 In instr 128 substitute a butterbp for the reson and listen to the difference in
quality.
䡲 Substitute butterlp filters for the tone filters in instr 129–132. Compare the
effectiveness.
䡲 Transform instr 133 into a resonant highpass filter instrument.
䡲 Make an instrument that combines the serial filter design of instr 132 with the
parallel filter design of instr 133.
䡲 Block diagram instr 134, a delay line instrument (not covered in the text).
䡲 By adding more delay opcodes transform instr 134 into a multitap delay line
instrument.
䡲 Modify instr 135 to make a multiband resonator.
䡲 Add more combs and vdelays to instr 135 and create a multitap delay with feed-
back/multiband-resonator super-flanger.
䡲 Using the score statements covered in this section, return to etudes 3 and 4. In
them, repeat some section, insert some silences, vary the tempo during sections,
advance around a bit and ramp through some parameters to better explore the
range of possibilities these instruments have to offer.
䡲 In instr 136, substitute a diskin opcode for the rand opcode and flange your
samples.
䡲 In instr 136, add and explore the dynamic frequency and amplitude modification
of the control oscillator.
䡲 Change the waveforms of the control oscillator in instr 136. (Try randh.)
䡲 Add a resonant-lowpass filter to your modified flanger instrument.
䡲 Go for a walk and listen to your world.
Global Variables
Until now, we have used only local i, k and a variables—those that are associated
with the specific instrument. Local variables are great because you could use the
same variable name in separate instruments without ever worrying about the asig or
amix data getting corrupted or signals bleeding-through from one instrument to an-
other. In fact, the instr and endin delimiters truly do isolate the signal processing
blocks from one another—even if they have the same exact labels and argument
names.
There are times, however, when you would like to be able to communicate across
instruments. This would make it possible to pass the signal from a synthesis instru-
ment to a reverb instrument, similar to the way one routes the signals on a mixing
console to the effects units, using “aux sends” and “aux returns.” In Csound this
same operation is achieved using global variables. Global variables are variables that
are accessible by all instruments. And like local variables, global variables are up-
dated at three basic rates, gi, gk, and ga, where:
gi-rate variables are changed and updated at the note rate.
gk-rate variables are changed and updated at the control rate.
ga-rate variables are changed and updated at the audio rate.
Because global variables belong both to all instruments and to none, they must be
initialized. A global variable is typically initialized in instrument 0 and filled from
within a local instrument. Where is this mysterious instrument 0? Well, instrument 0
consists of the lines in the orchestra file immediately following the header section
and before the declaration of the first instr. Thus, in figure 1.62, immediately after
the header, in instrument 0, the gacmb and garvb variables (our 2 global FX buses)
are cleared and initialized to 0.
sr     = 44100
kr     = 4410
ksmps  = 10
nchnls = 1
gacmb init 0
garvb init 0
Figure 1.62 Global variables gacmb and garvb are initialized in instrument 0 after the
header and before the “first” instr.
Let’s put global variables to use and add some external processing to our instruments.
From within instr 137, shown in figures 1.63 and 1.64, the dry signal from loscil is
added (mixed) to the wet signal on a separate reverb and echo bus.
asig    loscil  kenv, ifrq, ifun
        out     asig
garvb   =       garvb + (asig * irvbsnd)
gacmb   =       gacmb + (asig * icmbsnd)
Note that the dry signal is still sent out directly, using the out opcode, just as we have
from our first instrument. But in this case that same signal is also globally passed out
Figure 1.63 Block diagram for instr 137, 198, and 199, a wavetable synthesis instrument
(instr 137), and two global effects (instr 198 and 199).
Figure 1.64 Orchestra code for three instruments that work together to add reverb (instr
199) and echo (instr 198) to a looping oscillator (instr 137).
of the instrument and into two others, instr 198 (echo) and instr 199 (reverb) as
shown in figures 1.63 and 1.64.
It is important to note that in the score file (figure 1.65) all three of these instru-
ments must be turned on. In fact, to avoid transient and artifact noises, global in-
struments are typically left on for the duration of the section and the global variables
are always cleared when the receiving instrument is turned off (gacmb = 0 and garvb = 0).
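Figure 1.64 contains the actual code; a minimal version of the receiving reverb instrument, just to show the shape of a global effect, might be:

          instr 199                                 ; global reverb: processes whatever has been added to garvb
irvbtime  =        p4
arvb      nreverb  garvb, irvbtime, .5
          out      arvb
garvb     =        0                                ; clear the bus so signals do not accumulate from pass to pass
          endin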
Our next instrument, instr 138, shown in figures 1.66 and 1.67 is based on an
earlier FM design, but now the instrument has been enhanced with the ability to pan
the signal.
; INS STRT DUR AMP FRQ1 SAMPLE ATK REL RVBSND CMBSND
i 137 0 2.1 70 8.09 5 .01 .01 .3 .6
i 137 1 2.1 70 8.09 5 .01 .01 .5 .6
Figure 1.65 Score file for our “global comb/nreverb loscil” instrument. The global nreverb
instrument, instr 199, is turned on at the beginning of the score and left on for the duration of
the passage. Three copies of our global comb instrument instr 198 are started simultaneously
with different looptimes. Finally, two copies of our loscil instrument, instr 137, start one
after another.
Figure 1.66 Block diagram of instr 138, a dynamic FM instrument with panning and
global reverb.
Figure 1.67 Orchestra code for instr 138, a dynamic FM instrument with vibrato, discrete
panning, and global reverb.
You should note that in instr 138 the panning is realized using a single variable
that functions like a panning knob on a traditional mixing console. How is this done?
Well, as you know by now, if we multiply a signal by a scalar in a range of 0 to 1
we effectively control the amplitude of the signal between 0 and 100%. So if we
simultaneously multiply the signal by the scalar (ipan) and one minus the same scalar
(1 - ipan), we would have two outputs whose amplitudes would be scalable between
0 and 100% but would be inversely proportional to each other.
For example, if the scalar is at 1, and that corresponds to 1 times the left output,
we would have 100% of our signal from the left and (1 - 1) or 0% signal from the
right. If, on the other hand, the amplitude scalar were set to .2, then we would have .2
times the left signal, or 20% of our signal, coming from the left and (1 - .2), or .8 times the right
signal (80%), coming from the right. This algorithm provides a simple means of us-
ing a single value to control the left/right strength of a signal and is used in instr 138
and illustrated in figures 1.66 and 1.67.
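In code the whole panning scheme reduces to a pair of multiplications; a fragment (assuming nchnls = 2 and that the pan position arrives in a p-field) looks like this:

ipan      =        p9                               ; 1 = hard left, 0 = hard right in this sketch
          outs     asig * ipan, asig * (1 - ipan)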
The file etude6.orc contains four additional globo-spatial instruments. All of these
instruments are based on those presented in previous etudes. You should recognize
the algorithms. But all have been enhanced with panning and global reverb capabili-
ties. You are encouraged to block diagram and study them. Each demonstrates a
different panning and reverb approach. You are also encouraged to go back and add
global reverb and panning to all of the instruments we have studied so far.
To end the chapter I will present instr 141, shown in figures 1.68 and 1.69, which
adapts an earlier amplitude modulation design and adds both global reverb and LFO-
based panning.
Notice here that the amplitude of the panning LFO is set to .5. This means that
this bipolar sinewave has a range of -.5 to +.5. Then, notice that I bias this bipolar
signal by adding .5 to it (kpanlfo = kpan + .5). This makes the signal unipolar. Now
the sinewave goes from 0 to 1 with its center point at .5. In instr 141 the “panning
knob” is being turned periodically from right to left (0 to 1) at the frequency of our
LFO panning oscillator. Since a biased sinewave is used, the output signal will move
from side to side. You surely realize, however, that you can substitute any envelope
or oscillator function into this design and thereby move your signal, periodically or
aperiodically (at 1/p3 Hz, for example), along any path or trajectory.
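The biasing itself takes only two lines; a fragment (with ipanfrq and asig defined elsewhere in the instrument and f 1 assumed to be a sine) would be:

kpan      oscili   .5, ipanfrq, 1                   ; bipolar sinewave from -.5 to +.5
kpanlfo   =        kpan + .5                        ; biased to a unipolar 0-to-1 "panning knob"
          outs     asig * kpanlfo, asig * (1 - kpanlfo)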
Moving sounds in space is one of the most exciting aspects of computer-based
sound design. The sky is the limit. I hope my ending tutorial instrument is just the
beginning of your exploration.
Conclusion
In this introductory chapter I have attempted to introduce the syntax of the Csound
language while covering some of the elements of sound design. Given this basic
understanding, the subsequent chapters of this text, written by the world’s leading
educators, sound designers, programmers, and composers, should serve to unlock
the secret power of Csound and help you find the riches buried therein. Along the
way, I sincerely hope that you not only discover some exquisite new timbres, but that
your work with Csound leads to a deep and profound awareness of the true spirit
embodied in organized sound . . . and silence.
Figure 1.68 Block diagram of instr 141, an amplitude modulation instrument with an LFO
panner and global reverb.
Figure 1.69 Orchestra code for instr 141, an Amplitude Modulation instrument with an LFO
panner and global reverb.
Csound uses lookup tables for musical applications as diverse as wavetable synthe-
sis, waveshaping, mapping MIDI note numbers and storing ordered pitch-class sets.
These function tables (f-tables) contain everything from periodic waveforms to arbi-
trary polynomials and randomly generated values. The specific data are created with
Csound’s f-table generator subroutines, or GEN routines. Csound includes a family
of GEN routines that write sampled soundfiles to a table, sum sinusoidal waves, draw
lines and curves between specified points, create Chebyshev polynomials, calculate
window functions, plot individual points in a table and generate random values. This
tutorial surveys the Csound f-table generators, interspersing suggestions for efficient
and powerful f-table utilization in Csound, including the introduction of a new tech-
nique for generating three-dimensional wave terrain synthesis f-tables.
A Csound f-table is an array of floating point values calculated by one of the GEN
routines and stored in RAM for use while Csound generates sound. These f-tables
are traditionally specified in the Csound score file and are generally restricted in size
to lengths of a power of two (2n ) or a power of two plus one (2n ⫹ 1). An f-table with
a size of 1024 (210) will contain floating point values in sequential data storage loca-
tions with data addresses numbering from 0 to 1023. Tables can be visually displayed
as mathematical graphs with data addresses plotted from left to right along the hori-
zontal, or x-axis (abscissa values) and the actual stored data plotted along the vertical,
or y-axis (ordinate values) as shown in figure 2.1.
An f-table outputs the data stored at the given address when it receives an input
index value. In Csound, f-tables receive isolated indexes as well as streams of index
values that scan tables once, periodically or randomly. Although f-tables are defined
by score file GEN routines, a variety of orchestra file opcodes create index values
and access table data. For example, the oscil opcode repeatedly generates index val-
ues from 0 to 1023 as it periodically scans a 1024-point f-table, an operation called
wrap-around lookup. Orchestra file opcodes that include the suffix “i” have the
Figure 2.1 A close up of a 129-point function table highlighting the data at addresses 63
through 67.
added capability of linearly interpolating between successive table values. This fea-
ture effectively provides an audio smoothing function. For example, periodically
scanning a 512-point f-table with a 43 Hz oscil opcode at a 44.1 kHz sampling rate
reaps approximately two copies of each point in the f-table. In contrast, the oscili
opcode interpolates between each of these values as shown in figure 2.2. Other inter-
polating opcodes include tablei, oscil1i, foscili and randi.
Although Csound f-table sizes are defined as either 2^n or 2^n + 1, the space allocation
for each f-table array is always 2^n + 1, providing an extended guard point at the end
of each table. If the requested table size is 2^n, the extended guard point contains a
duplicate of the first value of the f-table. If the requested size is 2^n + 1, the guard
point contains the final value of the requested function. The reason for explicitly
defining an extended guard point is that Csound opcodes using a 512-point table
send indexes only from 0 to 511 (making the extended guard point seem superflu-
ous). However, interpolating opcodes use the extended guard point value to calculate
interpolation values after index 511. As a result, interpolating wrap-around lookup
opcodes (oscili, foscili and phasor-tablei pairs) should scan 2^n size f-tables. This
ensures smooth interpolation between the table end points since the extended guard
Figure 2.2 The same signal generated without interpolation (oscil) and with interpolation (oscili).
point will be a copy of the first table value. In contrast, interpolating single scan
opcodes (oscil1i and envlpx) should scan f-tables with a 2^n + 1 size. For example,
the envlpx opcode scans an f-table to determine the rise shape of the attack portion
of an envelope. If it scans a table with a 2^n size, the extended guard point will contain
the first table value. If this attack is slow, the envlpx opcode will interpolate between
the last table value and the starting table value, creating an amplitude discontinuity
that could result in an audible click. Using a table size of 2^n + 1, however, ensures
that the interpolation will create a smooth amplitude envelope transition between the
attack and any subsequent pseudo-steady state.
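Two illustrative f-statements (my own numbering and values) make the distinction concrete:

f 1 0 512 10 1            ; a 2^n table: right for oscili, foscili and other wrap-around opcodes
f 2 0 513  7 0 512 1      ; a 2^n + 1 table: right for single-scan opcodes such as envlpx and oscil1i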
Score F-Statements
While parameter numbers 5 and higher specify different f-table attributes for dif-
ferent GEN routines, the first four parameters determine the following qualities for
all GEN routines:
䡲 p1 ⫽ a unique f-table identification number
䡲 p2 ⫽ f-statement initialization time expressed in score beats
䡲 p3 ⫽ f-table size
䡲 p4 ⫽ GEN routine called to create the f-table
An f-statement with a negative table number, for example:
f -8 132
will delete f-table 8 at beat 132 of the score. This provides a means of managing
a computer’s limited RAM while performing a score file containing numerous or
large f-tables.
Furthermore, a p1 value of 0 creates a null table that can be used to generate
silence at the end of a soundfile. For example, a solitary:
f 0 60
in a score file creates a soundfile with sixty seconds of silence. This statement can
also be used to turn on Csound, enabling the reception of MIDI data for one minute
of real-time synthesis.
In addition to specifying the desired f-table GEN subroutine, p4 values determine
whether or not the function will be scaled. A positive p4 value results in a table that
is post-normalized to ordinate values between –1 and 1. Default f-table normalization
facilitates more predictable control in applications such as wavetable or nonlinear
synthesis. A negative p4 value inhibits rescaling and should be used for f-table appli-
cations requiring raw data, such as storing head-related transfer functions, algorith-
mic compositional data or MIDI mapping functions.
Figure 2.3 The f-table display for the soundfile “great.snd” as read into Csound via GEN1.
GEN1 reads a soundfile (located via the current directory or SFDIR) into the f-table, while the p5 entry “great.snd” loads the soundfile entitled
great.snd into the f-table. Csound reads soundfiles from the beginning and reads the
sample format from the soundfile header unless a skiptime and format are specified
in p6 and p7.
GEN1 allows a table size of 0. In this case, the table size is determined by the
number of samples in the soundfile. These tables, however, can only be read by the
loscil opcode. Although loscil provides sample playback with transposition and
loops, it cannot start at a specified skip time into the table, nor can it read the table
backward. These manipulations are possible only if the soundfile is written into a
GEN1 table with a 2^n or 2^n + 1 size. Using these table sizes will probably result in
either soundfile truncation or extension with zero values. Any opcode that calls an
f-table, however, will be able to read these soundfiles. In figure 2.4 we see a flowchart
of a Csound instrument that reads a tabled soundfile using a line opcode to provide
table lookup values. The accompanying lines of orchestra and score code provide the
Csound implementation of this instrument.
In this example, the audio rate line segment aread is drawn from a starting sample
iskip to an ending sample ilast. The starting sample is calculated by multiplying the
sampling rate with the desired skip time ( p5). Similarly, the final sample is deter-
mined by first multiplying the sampling rate, note duration ( p3) and a transposition
factor ( p6) and then adding the result to iskip. The line segment aread provides the
index into the raw soundfile data stored in f-table 30, which must have a 2^n size.
Multiplying asnd with kenv, the peak amplitude of which is determined by an ampli-
tude multiplier value ( p4), imposes an amplitude envelope on this soundfile segment.
You might try adding an oscillator to the aread value in this example to create an
FM instrument.
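Since the orchestra code of figure 2.5 is not reproduced here, a sketch consistent with the description above (p4 = amplitude multiplier, p5 = skip time, p6 = transposition factor; the table size and envelope times are my assumptions) would be:

; score:
f 30 0 262144 1 "great.snd" 0 0                    ; GEN1: read the soundfile into a 2^n size table
; orchestra:
          instr 201
iskip     =        sr * p5                         ; starting sample = sampling rate times skip time
ilast     =        iskip + (sr * p3 * p6)          ; ending sample includes the transposition factor
aread     line     iskip, p3, ilast                ; audio-rate ramp of table indexes
asnd      tablei   aread, 30                       ; interpolated lookup into the soundfile table
kenv      linseg   0, .05, p4, p3 - .1, p4, .05, 0
          out      asnd * kenv
          endin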
Normalized soundfiles may be best for some sound playback and processing tech-
niques while raw amplitude data are best for others. Normalized and raw tables con-
taining the same soundfile are generated by using f-statement p4 values of +1 and
Figure 2.4 Block diagram of instr 201, a soundfile reading instrument using a tablei opcode
Figure 2.5 Orchestra code for instr 201, a soundfile reading instrument with accompanying
GEN1 call from score file.
-1 respectively. In the example code above, raw sample data are read into f-table 1
and then normalized to allow for amplitude scaling by the linseg envelope.
Csound also includes function generators that add sets of sinusoidal waves to create
composite waveforms. In Csound, single 2π periods of summed sinusoids are written
into f-tables by the GEN10, GEN9, GEN19, and GEN11 subroutines. These GEN
routines facilitate the creation of sine, cosine, square, sawtooth, triangle, and pulse
waveforms, all of which are useful as waveshapes for wavetable synthesis. Periodi-
cally cycling through these tables with wrap-around lookup opcodes generates a pe-
riodic waveform.
Figure 2.6 F-table display of simple geometric waveshapes—sine, saw, square, and pulse
generated using Csound’s GEN10 subroutine.
GEN10
GEN10, GEN9, and GEN19 all create composite waveforms by summing sine waves
with specified relative strengths. Of these, GEN10 adds harmonic partials that are
all in phase. The relative strengths of ascending harmonic partials (1, 2, 3, etc.) are
defined in parameter numbers p5, p6, p7 and higher, respectively. Figure 2.7 shows a flow-
chart for two equivalent oscillators, followed by their accompanying orchestra file
and score file code containing a few simple GEN10 waveshapes.
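The actual tables of figure 2.9 are not reproduced above, but typical GEN10 statements for the four shapes of figure 2.6 look something like this (sizes and strengths are illustrative):

f 1 0 8192 10 1                                            ; sine
f 2 0 8192 10 1 .5 .333 .25 .2 .167 .143 .125 .111         ; sawtooth: harmonics at strengths 1/n
f 3 0 8192 10 1 0 .333 0 .2 0 .143 0 .111                  ; square: odd harmonics at strengths 1/n
f 4 0 8192 10 1 1 1 1 1 1 1 1 1 1                          ; pulse: ten equal-strength harmonics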
GEN9
Like GEN10, GEN9 creates composite waveforms by adding partials with specified
relative strengths. GEN9, however, can create inharmonic partials with unique
phase dispositions.
The partial number, strength and phase are defined in groups of three parameters
beginning with p5. Figure 2.11 shows a few more waveshapes that can be generated
with GEN9.
Unlike orchestra opcodes that specify phase as a fraction of a cycle (0–1), f-
statements in a score express phase in degrees (0–360). In addition, the partial num-
ber need not be an integer. Consequently, specifying partial number .5 in f 8 (fig.
2.11) generates half of a sine wave, which could be used as an envelope, frequency
multiplier, or a table for waveshaping.
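In the spirit of figure 2.11 (whose code does not appear above), typical GEN9 statements for the three shapes of figure 2.10 might be:

f 5 0 8192 9 1 1 90                                                ; cosine: one partial with a 90 degree phase
f 6 0 8192 9 1 1 0  3 .111 180  5 .04 0  7 .02 180  9 .012 0      ; triangle: odd partials at 1/n^2, alternating phase
f 8 0 8192 9 .5 1 0                                                ; half sine: "partial" number .5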
Figure 2.7 Block diagram of instr 202 and 203, table-lookup oscillator instruments realized
via the phasor and tablei opcodes and an interpolating table-lookup oscillator.
Figure 2.8 Orchestra code for instr 202 and instr 203, table-lookup oscillator instruments
using a phasor/tablei pair versus an oscili.
Figure 2.9 F-table code for simple geometric waveforms as illustrated in figure 2.6.
Figure 2.10 F-table displays for three waveforms generated using GEN9—a cosine, a tri-
angle, and a half-sine.
Figure 2.11 F-table code for cosine, triangle, and half sine as illustrated in figure 2.10.
Figure 2.12 Score code and f-table display for a quasi-Gaussian curve generated using
GEN19.
GEN19
GEN19 works like GEN9 but adds a fourth value to each partial grouping: a DC offset. In
figure 2.12 the DC offset adds .5 to the ordinate values of a half-strength sine wave. As a result,
these unscaled (negative p4) values will span from 0 to 1 rather than from -.5 to .5. This
provides a good grain envelope for granular synthesis instruments, or this function
could serve as an amplitude multiplier that controls both left-right and front-back
panning in a four channel orchestra. These same values could also provide table in-
dexes into f-tables containing head-related transfer function data for controlling dy-
namic filters and reverb times for virtual reality applications.
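Since the score code of figure 2.12 is not shown above, here is a statement of the kind being described (the exact values are my reconstruction): a half-strength sine at a 270 degree phase, raised by a .5 DC offset and left unscaled:

f 9 0 8192 -19 1 .5 270 .5     ; unipolar bell-like curve running from 0 up to 1 and back to 0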
Inharmonic Partials
Although GEN9 and GEN19 can create inharmonic (noninteger) partials, only par-
tials that are either integers or .5 multiples should be used for wrap-around lookup
operations. Other inharmonic partials typically generate nonzero values at the end of
the f-table, creating discontinuities and resultant noise. Discontinuities can be dimin-
ished by reducing the relative strength of the detuned partial. Moreover, pairing par-
tials with complementary offsets and equal strengths will eliminate discontinuities.
For example, partials x.75 and y.25, where x and y are positive integers, will exhibit
equivalent positive and negative ordinate values at the end of the f-table. Summing
these values results in a final table ordinate value of 0 so no discontinuity will exist
during wrap-around lookup. Similarly, partials x.66 and y.33, x.2 and y.8, and x.79
and y.21 will also “zero out” at the end of the f-table. Pairs of equal-strength partials
with shared offsets and inverse phase relationships also avoid final table discontinu-
ities. For example, partials x.9 and y.9, where x and y are unequal positive integers,
with shared strengths and respective initial phases of 0 and 180 will add up to 0 at
and demonstrate cycle offset and phase inversion cancellation.
Although these inharmonic partial pairing strategies eliminate discontinuities,
they will not result in periodic phasing of detuned partials. Periodic phasing can be
achieved only by generating a table containing summed high harmonic partials with
complex relationships. This results in a f-table with inherent phasing relationships.
While using this type of inharmonic function, a desired fundamental frequency is
generated by dividing the desired oscillator rate by the lowest harmonic number pres-
ent in the function. For example, an f-table containing harmonic partial numbers 21,
22, 25, 27, 31, 33, 34, and 35 should be read with the oscili opcode’s cps argument
set to the desired frequency divided by 21, the lowest partial present. Since this will
often result in a low oscillator frequency that scans the table slowly, it is best to use
a large table with an interpolating oscili opcode.
The phasing relationship in this f-table is still nondynamic. Dynamic phasing is
only achieved through mixing multiple oscillators with inharmonic relationships in
an orchestra instrument.
Figure 2.13 F-table display of inharmonic spectra without discontinuities.
Figure 2.15 Block diagram of instr 204, a simple oscillator instrument with “desired” pitch
computed as cpspch(p5)/21 (i.e. the lowest partial).
f 13 0 8192 9 21 1 0 22 1 0 25 1 0 27 1 0 31 1 0 33 1 0 34 1 0 35 1 0
Figure 2.16 Orchestra and f-table code for an oscillator instrument with high order partials
illustrated in figure 2.17.
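The orchestra side of figure 2.16 does not appear above; a minimal version consistent with figure 2.15 (the envelope times are my own) would be:

          instr 204
kenv      linseg   0, .1, p4, p3 - .2, p4, .1, 0
asig      oscili   kenv, cpspch(p5) / 21, 13       ; divide the desired cps by 21, the lowest partial in f 13
          out      asig
          endin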
f 14 0 4096 11 10 1 .9
Figure 2.18 F-table display and score code for a pulse-train generated using GEN11.
GEN11
Unlike GEN10, GEN9, and GEN19, GEN11 adds cosine partials instead of sines. As
shown in figure 2.18, this GEN routine is designed to generate pulse-trains with the
number of harmonics, the lowest harmonic, and an amplitude coefficient specified
in p5, p6 and p7 of the f-statement.
Although a pulse-train can also be created by summing equal-strength sines with GEN10, GEN11 generates
pulse-trains with fewer parameters. It also calculates the relative amplitudes ac-
cording to an exponential strength coefficient. GEN11 pulse-trains, however, are
nondynamic. For dynamic pulse-trains use the buzz and gbuzz opcodes.
Drawing Segments
GEN7
Csound also includes four GEN routines that draw lines and curves between speci-
fied points. For each of these, the odd-numbered parameters ( p5, p7, and p9) and
even-numbered parameters ( p6, p8, p10), contain ordinate and segment length val-
ues respectively. Figure 2.18 illustrates a few waveforms that use GEN7 to draw
line segments.
Segment lengths are expressed as table points and values of 0 result in discontinu-
ities. Furthermore, segment lengths generally should add up to the table length. Seg-
ment length sums greater than or less than the table length truncate the function or
pad the table with zeros respectively.
Figure 2.20 F-table code for simple waveshapes drawn with GEN7 and shown in figure
2.19.
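Since the code of figure 2.20 is not reproduced above, representative GEN7 statements (my own values) would be:

f 15 0 8192 7 0 2048 1 4096 -1 2048 0      ; triangle: segment lengths sum to the table length
f 16 0 8192 7 1 4096 1 0 -1 4096 -1        ; square: the 0-length segment creates the discontinuity
f 17 0 8192 7 1 8192 -1                    ; sawtooth: a single descending ramp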
GEN5
The GEN5 parameters are identical to those of GEN7. Since GEN5 draws exponen-
tial segments, however, only nonzero ordinate values are allowed. Figure 2.21 shows
a typical example of GEN5 used to draw an exponential envelope function.
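A typical statement in the spirit of figure 2.21 (values are illustrative) is:

f 18 0 513 5 .001 64 1 449 .001     ; exponential attack and decay; every ordinate stays nonzero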
GEN8
Although GEN8 creates smooth cubic spline curves, using two equal ordinates sepa-
rated by a short segment length results in large humps. Figure 2.22 shows a couple
of typical GEN8 f-tables.
With a table length of 513, the quasi-Gaussian curve in f 20 works well for single
scan operations.
; QUASI-GAUSSIAN
f 20 0 513 8 0 150 0.5 50 1 113 1 50 0.5 150 0
; STRETCHED COSINE
f 21 0 2048 8 1 750 0 550 -1 400 0 348 1
Figure 2.22 F-table display and score code for two spline curves drawn with GEN8, a quasi-Gaussian and a stretched cosine.
GEN6
GEN6 constructs its function from segments of cubic polynomials, each spanning three of the specified points at a time, with each group of three defining one cubic polynomial function segment. Here, the third ordinate value becomes the first in the
next grouping of three. These cubic polynomial segments are relatively smooth if the
odd numbered ordinates oscillate between maximum and minimum values while the
even ordinates, the points of inflection, maintain somewhat linear relationships be-
tween the odd ordinates. Successive odd-ordinate maxima or minima, or points of inflection that stray from this roughly linear placement, result in segment spikes.
GEN3, GEN13, GEN14, and GEN15 all create polynomial functions that can be
used effectively in waveshaping instruments. Of these, GEN3 is the most flexible
since it creates a polynomial function over any left ( p5) and right ( p6) ordinate value
with any number of specified coefficients in p7 and higher. Figure 2.24 shows a
typical example.
GEN13, GEN14, and GEN15, on the other hand, create specific types of functions
known as Chebyshev polynomials. When used as waveshaping functions, Chebyshev polynomials can split a sinusoid into harmonic partials with specified relative strengths.
; SMOOTH
f 22 0 8193 6 0 2048 .5 2048 1 2048 0 2049 -1
; POINTS OF INFLECTION ARE NOT BETWEEN ODD ORDINATE VALUES
f 23 0 8193 6 0 2048 1 2048 -1 2048 1 2049 0
; SUCCESSIVE MAXIMA
f 24 0 1024 6 0 256 .25 256 .5 256 .75 256 1
Figure 2.23 F-table display and score code for 3 cubic polynomial functions.
f 25 0 1025 3 -1 1 5 2 4 1 3 1 2 1
Figure 2.24 F-table display and score code for a polynomial function drawn with GEN3.
Figure 2.25 Block diagram of instr 205, a simple waveshaping instrument with normali-
zation.
Figure 2.26 Orchestra and f-table code for a simple waveshaping instrument, instr 205.
Figure 2.27 F-table displays of a waveshaping function drawn with GEN7 and a normalizing
function drawn with GEN4 as defined in figure 2.26.
In instr 205, a ksweep control signal scales the amplitude of a sinusoidal aindex oscillator whose output, offset by .5 in the tablei lookup, results in table indexes that oscillate around the midpoint of the waveshaping func-
tion. When the ksweep values that in turn dictate the aindex amplitude reach .49, the
index values oscillate between .01 and .99, scanning the entire waveshaping function.
Since f 26 contains a line segment from -1 to +1 over the middle section of the
table, the sinusoidal index values generate a sinusoidal wave when the ksweep values
are below .25. As ksweep exceeds .25 throughout the middle section of each note,
all aindex values less than -.25 and greater than .25 will result in f-table values of
-1 and 1 respectively. This nonlinear waveshaping simulates amplifier distortion in
which loud input signals are clipped, introducing numerous high partials into the
sound. Another possible amplifier distortion model might use GEN9 to create a wave-
shaping function containing half of a sine wave with an initial phase disposition of
270 degrees.
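Such a function could be written, for example, as follows (the table number here is arbitrary):
f 60 0 8193 9 .5 1 270 ; HALF OF A SINE BEGINNING AT 270 DEGREES
Because the half cycle rises from -1 at 270 degrees to +1 at 90 degrees, the table forms a smooth transfer function that clips gently at its extremes.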
Clearly, adjusting the range of periodic index values into a waveshaping f-table pro-
vides a powerful means of dynamic signal modification, but it also directly alters the
amplitude of the sound. For instances in which this causal relationship is undesirable,
the waveshaped f-table output must be normalized to exhibit a consistent peak value
before imposing an amplitude envelope. The GEN4 subroutine provides the means
to accomplish this task by analyzing a waveshaping function and creating a comple-
mentary amplitude normalization f-table. Multiplying the outputs of a waveshaping
f-table and its complementary GEN4 normalizing f-table generates a waveshaped
signal with a consistent peak amplitude value if the sweep function determines the
index for both tables. An amplitude envelope can then be imposed on the resultant
signal. This technique can be used to create paradoxical sounds in which the nonlin-
ear distortion has no relationship to the amplitude. For example, inversely related
sweep and amplitude envelopes can generate a sound that distorts more as the ampli-
tude diminishes.
In instr 205, shown in figure 2.26, f 27 uses GEN4 to analyze f 26 and create a
normalizing function. The GEN4 f-statement specifies the f-table number to analyze
in p5 and defines the table mode in p6. Any nonzero value in p6 will perform the
analysis, assuming the waveshaping function has a bipolar structure. This bipolar
analysis data must be written into a table that is half the size of the waveshaping
f-table. A p6 value of zero analyzes a function that scans from left to right. In this
case, both the waveshaping and normalizing tables must be the same size.
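For example, assuming f 26 is 8,193 points long, its companion normalizing table in the bipolar mode described above could be defined as follows (the sizes here are assumed):
f 27 0 4097 4 26 1 ; BIPOLAR (P6 = 1) ANALYSIS OF THE WAVESHAPING TABLE F 26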
Although any GEN routine can create waveshaping functions, Chebyshev polynomi-
als provide more predictable waveshaping results by splitting waves into harmonic
partials with specified relative strengths. As the periodic index values increase, the
resultant Chebyshev polynomial f-table output generally adds the specified par-
tials in ascending order. GEN14 creates a slightly more gentle curve than GEN13 but
the resultant waveshaped signals sound similar. Specifying strengths of more than
20 harmonics in either GEN13 or GEN14 results in large amounts of distortion as the
index values approach the table length limits. For both GEN13 and GEN14, the
Chebyshev polynomial is drawn over ordinate values that span from -p5 to +p5
f-statement values while p6 defines an amplitude scaling factor that is applied to any
index into the table. Using a table opcode offset argument of .5 and an f-statement
p6 value of .5 provides a simple means of scaling a normalized periodic index. The
relative strengths of harmonic partials are defined in p7 and following. Figure 2.28
defines some typical spectra using GEN13 and GEN14.
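For example, f-statements of the following general form produce odd harmonics 1, 3, 5 and 7 with GEN13, and the same partials with GEN14 (the table numbers and strengths are illustrative, not those of figure 2.28):
f 65 0 8193 13 1 1 0 1 0 .5 0 .3 0 .2 ; GEN13: HARMONICS 1, 3, 5 AND 7
f 66 0 8193 14 1 1 0 1 0 .5 0 .3 0 .2 ; GEN14: THE SAME PARTIALS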
Tabling sine waves with amplitudes that move slowly over the entire range of the
waveshaping table provides an effective means of exploring the properties of wave-
shaping functions. After exploring these properties, the amplitude sweeps can be
adjusted to highlight the optimum desired dynamic timbres. Wrap-around tables
other than sines and GEN1 tables containing sampled sounds can also be waveshaped
with interesting results.
The basic waveshaping instruments defined in figures 2.26 and 2.28 achieve ampli-
tude normalization through the use of GEN4 f-tables. Unpredictable amplitude fluc-
tuations can also be minimized by inverting the phase of consecutive pairs of
harmonics, a procedure called signification. In GEN13 and GEN14, harmonic phases
are shifted by 180 degrees if a negative harmonic strength is used as in figure 2.30.
Consequently, the f-table will result in a more consistent amplitude output if the
harmonic strengths follow the pattern: +, +, -, -, +, +, -, -, etc. Additional
waveshaping amplitude normalization models can be found in Roads (Roads 1996)
and Le Brun (Le Brun 1979).
instr 206 ; WAVESHAPING WITH NORMALIZATION
iwshpfun = p6
inormfun = p7
ksweep linseg 0, p3*.5, .49, p3*.5, 0 ; INDEX SWEEP FUNCTION
aindex oscili ksweep, p5, 2 ; SOUND TO WAVESHAPE
atable tablei aindex, iwshpfun, 1, .5 ; WAVESHAPE AINDEX
knorm tablei ksweep*2, inormfun, 1 ; MAKE NORMALIZATION FUNCTION
kenv linen p4, .01, p3, .02 ; AMPLITUDE ENVELOPE
asig = (atable*knorm)*kenv ; NORMALIZE AND IMPOSE ENV
out asig
dispfft asig, .1, 1024
endin
Figure 2.28 Orchestra and f-table code for instr 206, a simple waveshaping instrument with normalization that uses Chebyshev polynomials of the first (GEN13) and second (GEN14) kind, allowing specific, definable harmonic partials and amplitudes to be specified for waveshaping.
Figure 2.29 Four f-table displays of waveshaping functions—odd harmonics ( f 28), same
harmonics ( f 29), even harmonics ( f 30), and over 20 harmonics ( f 31), using GEN13 and
GEN14 as defined by the f-table code in figure 2.28.
; SIGNIFICATION
f 32 0 8193 13 1 1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1
Figure 2.30 F-table display and code of a waveshaping function generated with GEN13 in
which consecutive pairs of harmonics are inverted in phase.
Csound’s GEN12 subroutine generates the log of a modified Bessel function. This
function provides an amplitude scaling factor for the asynchronous FM instrument
described by Palamin and Palamin (Palamin and Palamin 1988).
This instrument is an amplitude-modulated FM instrument in which the AM and
FM modulating frequencies are synchronized. The lookup into this table is depen-
dent upon the modulation index “I” and a partial strength parameter “r” in Palamin
and Palamin's formula I(r - 1/r). Parameter r values greater than 1 emphasize higher frequencies, while values less than 1 emphasize lower frequencies; a value of 1 leaves the usual FM spectrum unmodified. Figure 2.36 is a flowchart for this instrument.
[Block diagram of instr 207: two oscillators reading f 1 and f 2 in phase quadrature drive tablei lookups into the GEN15 pair f 33 and f 34, and each result is normalized through a corresponding GEN4 table, f 35 and f 36.]
Figure 2.32 Orchestra code for instr 207, a phase-quadrature waveshaping instrument with
amplitude normalization.
Figure 2.33 F-table code using GEN15 for phase-quadrature and GEN4 for normalizing
functions.
Figure 2.34 F-table displays of a GEN15 phase-quadrature pair (f-tables f 33 and f 34) and
their amplitude normalizing function as defined with GEN4 (f-tables f 35 and f 36).
Figure 2.35 F-table display and code of a modified Bessel function produced with GEN12.
[Figure 2.36: flowchart of instr 208, the amplitude-modulated FM instrument, built around a tablei lookup into the GEN12 table f 37 indexed by I(r - 1/r)/2, with the result balanced against the carrier. Legend: I = modulation index; m = modulator frequency; c = carrier frequency; r = parameter “r” from the Palamin and Palamin article.]
GEN2 takes parameter values in the f-statement and transfers them directly into a
table. For most applications, normalization is undesirable and a value of -2 will be
used in p4 of the f-statement. Although older versions of Csound allowed only 150
parameters in any given f-statement, newer versions either have limits of at least
1024 parameters or they dynamically allocate memory to accommodate parameter
number limits dictated solely by the available RAM.
The GEN2 subroutine lends itself well to many algorithmic applications. For ex-
ample, GEN2 tables containing same-sized weighted pitch-class sets and collections
of note durations can be randomly indexed by a k-rate randh opcode with an am-
plitude argument equal to the table length. These random indexes can generate
weighted random rhythmic and pitch-class values in a melody generator. GEN2
tables containing ordered data such as a twelve-tone row can be scanned linearly to
provide algorithmic compositional parameters. The GEN17 subroutine also writes
parameter values directly into a table. GEN17, however, holds these values for a
Figure 2.37 Orchestra code for instr 208, an amplitude-modulated FM instrument, as diagrammed in figure 2.36.
number of table points, creating a step function. GEN17 tables are most useful for
mapping MIDI note numbers onto register numbers or onto sampled sound f-table
numbers. Figure 2.38 shows sample GEN2 and GEN17 tables.
The flowchart in figure 2.39 and the accompanying orchestra code in figure 2.40
provide an example of an instrument that cycles through a twelve-tone row using the
row defined in f 38.
The GEN20 subroutine generates window functions. Such functions can be used as
spectrum analysis windows, granular synthesis amplitude envelopes, or in a variety
of other applications. The window functions, which are specified with f-statement
p5 values of 1–9, include Hamming, Hanning, Bartlett, Blackman, Blackman-Harris,
Gaussian, Kaiser, Rectangle and Sinc. For each of these, the number in p6 of the
f-statement defines the peak window value. Of these, the Kaiser window requires an
additional p7 value between 0 and 1, adjusting the function to approximate a rect-
angle or a Hamming window. Figure 2.41 shows the f-table display and figure 2.42
defines the corresponding score code for f-statements of each window type.
f 38 0 16 -2 2 1 9 10 5 3 4 0 8 7 6 11
f 39 0 128 -17 0 1 12 2 24 3 36 4 48 5 60 6 72 7 84 8 96 9 108 10 120 11
Figure 2.38 F-table display and score code for non-normalized versions of GEN2 and
GEN17, a pitch map, and a MIDI keyboard map.
[Figure 2.39: block diagram of instr 209. A phasor scaled by 12 indexes the GEN2 row table f 38 to obtain a pitch-class (kpc); a randh generator supplies a random octave, koct = abs(int(krnd))+5; the two combine as koct+(kpc*.01), are converted with cpspch, and drive an oscil reading f 4 under an amplitude envelope read from f 10.]
Figure 2.40 Orchestra code for instr 209, an instrument that cycles through table f 38, a twelve-tone row defined using GEN2.
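A minimal sketch of such a row-cycling instrument (not the published instr 209; the note rate, octave mapping and output table are assumed) could be written as:
instr 250 ; SKETCH: CYCLE THROUGH THE GEN2 ROW TABLE F 38
inote = .25 ; SECONDS PER ROW MEMBER (ASSUMED)
kndx phasor 1/(12*inote) ; ONE PASS THROUGH THE 12-NOTE ROW EVERY 12*INOTE SECONDS
kpc table kndx*12, 38 ; RAW-INDEX READ OF THE ROW TABLE
kpch = 8 + kpc*.01 ; OCTAVE 8 ASSUMED; .01 MAPS PITCH-CLASS INTO PCH NOTATION
asig oscil 10000, cpspch(kpch), 1 ; F 1 ASSUMED TO HOLD AN AUDIO WAVEFORM
out asig
endin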
Figure 2.41 F-table displays of standard window functions: Hamming ( f 40), Hanning
( f 41), Bartlett ( f 42), Blackman ( f 43), Blackman-Harris ( f 44), Gaussian ( f 45), Kaiser
( f 46), Rectangle ( f 47) and Sinc ( f 48).
f 40 0 513 20 1 1; Hamming
f 41 0 513 20 2 1; Hanning
f 42 0 513 20 3 1; Bartlett
f 43 0 513 20 4 1; Blackman
f 44 0 513 20 5 1; Blackman-Harris
f 45 0 513 20 6 1; Gaussian
f 46 0 513 20 7 1 .75; Kaiser
f 47 0 513 20 8 1; Rectangle
f 48 0 513 20 9 1; Sinc
Figure 2.42 F-table code for generating standard window functions using GEN20 as shown
in figure 2.41.
The GEN21 subroutine enables the creation of a variety of f-tables with random
distributions. The p5 values in GEN21 f-statements define different types of random
distributions. Values of 1 through 11 in p5 correspond to Uniform, Linear, Triangular,
Exponential, Biexponential, Gaussian, Cauchy, Positive Cauchy, Beta, Weibull and
Poisson distributions. As with GEN20, the p6 values in a GEN21 f-statement define
the peak table value. The Beta distribution requires two additional parameters, and
the Weibull one additional parameter. A thorough explanation of these random distri-
butions and suggestions for their applications to composition may be found in Dodge
and Jerse’s Computer Music (Dodge and Jerse 1985). Figure 2.43 shows the f-table
display and figure 2.44 defines the corresponding score code for f-statements defin-
ing each of the GEN21 random distribution types.
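As one simple musical application (all values here are illustrative), a uniformly random index into the Gaussian table f 54 defined in figure 2.44 yields Gaussian-distributed deviations that can jitter a pitch:
instr 251 ; SKETCH: GAUSSIAN PITCH JITTER FROM A GEN21 TABLE
kndx randh .5, 20, .3 ; UNIFORM RANDOM INDEX, +/- .5, 20 TIMES PER SECOND
kdev table kndx+.5, 54, 1 ; NORMALIZED LOOKUP INTO THE GAUSSIAN TABLE F 54
asig oscili 10000, 440+kdev*20, 1 ; F 1 ASSUMED TO HOLD A SINE; ROUGHLY +/- 20 HZ JITTER
out asig
endin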
The Csound GEN routines create two-dimensional f-tables for use in the majority
of synthesis paradigms. Three-dimensional tables, however, are necessary for wave
terrain synthesis. This synthesis technique generates audio waveforms by scanning
three-dimensional tables using elliptical orbits through the table terrain. In three-
dimensional tables, the axes are plotted as follows:
• x-axis plotted left/right
• y-axis plotted front/back
• z-axis plotted top/bottom
Wave terrain synthesis uses indexes for both the x and y axes and the resultant
z-axis values create the actual waveform. This is analogous to tracking the height of
a large ball rolling around in a hilly landscape.
By altering the orbit pattern or moving it across the terrain throughout the dura-
tion of a note, the waveform can be transformed radically. Moreover, wave terrains
that maintain z-axis values of 0 around their entire perimeter are of particular interest
for wave terrain synthesis. The family of three-dimensional f-tables exhibiting this
special property allow index orbits to traverse multiple copies of the table without
waveform discontinuities. As an orbit departs one table edge, it can smoothly re-
enter at the same point on the opposite table edge. This can be visualized as an ellip-
tical orbit crossing multiple copies of a wave terrain placed next to each other like
Figure 2.43 F-table displays for random distributions generated using GEN21: Uniform
( f 49), Linear ( f 50), Triangular ( f 51), Exponential ( f 52), Bi-exponential ( f 53), Gaussian
( f 54), Cauchy ( f 55), Positive Cauchy ( f 56), Beta ( f 57), Weibull ( f 58), Poisson ( f 59).
f 49 0 513 21 1 1 ; Uniform
f 50 0 513 21 2 1 ; Linear
f 51 0 513 21 3 1 ; Triangular
f 52 0 513 21 4 1 ; Exponential
f 53 0 513 21 5 1 ; Biexponential
f 54 0 513 21 6 1 ; Gaussian
f 55 0 513 21 7 1 ; Cauchy
f 56 0 513 21 8 1 ; Positive Cauchy
f 57 0 513 21 9 1 1 2 ; Beta
f 58 0 513 21 10 1 2 ; Weibull
f 59 0 513 21 11 1 ; Poisson
floor tiles. As an orbit amplitude increases to traverse multiple tables, the sonic trans-
formation includes both a timbre change and an ascending pitch.
Although this synthesis model provides an efficient and powerful means of gen-
erating dynamic waveforms, wave terrain synthesis remains relatively unexplored. A
comprehensive bibliography of the technique includes four articles and a description
of the technique in Roads's Computer Music Tutorial (Bischoff 1978, Borgonovo
1984 and 1986, Mitsuhashi 1982, and Roads 1996).
Although the Csound GEN routines make two-dimensional f-tables, it is possible
to create three-dimensional functions by multiplying the outputs of two discretely
scanned f-tables representing the x and y axes. The flowchart in figure 2.45 and the
accompanying orchestra code in figure 2.46 provide an example of a wave terrain
synthesis instrument. The wave terrain instrument shown in instr 210 needs to in-
clude the following parameters:
• p4 = amp
• p5-6 = xtransverse init.-final
• p7-8 = xoscil amplitude init.-final
• p9-10 = xoscil frequency init.-final
• p11 = xfn
• p12-13 = ytransverse init.-final
• p14-15 = yoscil amplitude init.-final
• p16-17 = yoscil frequency init.-final
• p18 = yfn
• p19 = x-axis f-table
• p20 = y-axis f-table
In this instrument, the x and y f-tables must have endpoint values of 0 in order to
facilitate the traversal of multiple tables. In addition, the x and y f-tables should both
be post-normalized by using positive p4 f-statement values. With absolute peak val-
ues of 1, the x and y f-table outputs can be multiplied to generate z-axis values. Fig-
ure 2.49 shows several three-dimensional terrains that result from multiplying the x
and y-axis f-table data.
The tablei opcodes that scan these x and y tables should also have their ixmode
argument set to 1, normalizing the indexes to scan the entire table with values from
0 to 1 rather than 0 through the f-table length.
Using a normalized index with fractional values enables wave terrain orbits to
easily scan across multiple tables as the x and y orbit amplitudes exceed values of 1.
Figure 2.45 Block diagram of instr 210, a wave-terrain synthesis instrument for scanning
3D function tables.
Figure 2.47 F-table displays of x and y coordinates of simple and complex wave-terrains as
defined in figure 2.48.
f 61 0 8192 10 0 0 1
f 62 0 8192 10 1 .43 0 .25 .33 .11 0 .75
f 63 0 8192 9 1 1 0 1.5 1 0
f 64 0 8192 9 3 1 0 3.5 1 0
Figure 2.48 F-table code for generating the x and y-axis coordinates of simple and complex
wave-terrains.
This fractional value is derived from the addition of the index function, transverse
value and 1000.5, which provides an offset of .5 while avoiding negative values for
any index amplitude less than 1000. Altering the frequency or amplitude values for
the x or y oscillators or adding transverse movement in either the x or y axes will
result in sonic transformations. An elliptical orbit is attained in this instrument by
using synchronized cosine and sine pairs for the x and y axis indexes. Other indexes,
however, such as triangle, sawtooth, linear, exponential, or other functions will alter
the sonic result.
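To make the wrap-around concrete: with no transverse offset, an index value of -.3 becomes frac(-.3 + 1000.5) = frac(1000.2) = .2 and an index of +.3 becomes frac(1000.8) = .8, both symmetric about the table midpoint of .5; an index of -1.3 or +1.3 wraps to the same two points after traversing one full copy of the table.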
The sonic result of wave terrain synthesis is similar to amplitude modulation. Al-
though the two techniques display similarities, classical AM synthesis uses static
waveforms while the moving elliptical x and y f-table indexes provide greater dyna-
mism in wave terrain synthesis. The wave terrain model provided above can be ex-
tended by adding a w-axis f-table with a unique index to create a four-dimensional
wave terrain f-table. Furthermore, using additional axes with independent indexes
creates wave terrains with many dimensions.
Figure 2.49 Simple and complex wave terrain realized by modulating two f-tables.
Conclusion
References
Bischoff, J., R. Gold, and J. Horton. 1978. “A microcomputer-based network for live perfor-
mance.” Computer Music Journal 2(3): 24–29. Revised and updated version in C. Roads and
J. Strawn, eds. 1985. Foundations of Computer Music. Cambridge, Mass.: MIT Press.
Borgonovo, A., and G. Haus. 1984. “Musical sound synthesis by means of two-variable func-
tions: experimental criteria and results.” In D. Wessel, ed. Proceedings of the 1984 Inter-
national Computer Music Conference. San Francisco: International Computer Music
Association, pp. 35–42.
Borgonovo, A., and G. Haus. 1986. “Sound synthesis by means of two-variable functions:
experimental criteria and results.” Computer Music Journal 10(4): 57–71.
Dodge, C., and T. Jerse. 1985. Computer Music. New York: Schirmer Books.
LeBrun, M. 1979. “Digital waveshaping synthesis.” Journal of the Audio Engineering Society
27(4): 250–265.
Mackenzie, J. 1996. “Using strange attractors to model sound.” In Clifford Pickover, ed. Frac-
tal Horizons: The Future Use of Fractals. New York: St. Martin’s Press, pp. 225–247.
Mackenzie, J. 1995. “Chaotic predictive modeling of sound.” In ICMC Proceedings 1995. San
Francisco: International Computer Music Association.
Mitsuhashi, Y. 1982. “Audio signal synthesis by functions of two variables.” Journal of the
Audio Engineering Society 30(10): 701–706.
Palamin, J. P., and P. Palamin. 1988. “A method of generating and controlling musical asym-
metric spectra.” Journal of the Audio Engineering Society 36(9): 671–685.
Roads, C. 1996. The Computer Music Tutorial. Cambridge, Mass.: MIT Press.
3 What Happens When You Run
a Csound Program
John ffitch
Running Csound
When running Csound, there are actually three kinds of information that are passed
to the program: (1) the various “command-line flags,” that set and control detailed
aspects of the way Csound works; (2) the orchestra file; and (3) the score file—be it
text-based, a MIDI file, real-time MIDI, real-time audio, or any combination of these
inputs. For the present, we will not consider MIDI or real-time audio inputs. Rather,
we will focus on Csound’s traditional text-based score file.
Essentially, the three types of information that the user presents to the Csound
program point to the three main functions of the software: (1) general control and
house-keeping; (2) orchestra translation; and (3) score translation. You might be ask-
ing, “What about the synthesis?” Well, in my three-part division of labor, I will in-
clude Csound’s rendering or sound synthesis actions as an aspect of part 1 or control.
Argument Decoding
Most users today select their orchestra and score files through dialog boxes, clicking on-screen buttons with a mouse or point-
ing device. The original interface to Csound, however, was through a text-based
“command-line” with options. And in fact, the command-line interface is still acces-
sible from most of today’s graphical “front-ends.” For the sake of clarity, it is this
command-line interface that will be described here.
File Names
Csound requires but two arguments to run: the names of the orchestra and the score
files. These can be specified without any additional or special flags. This is because
Csound has default settings for all flags. When you run the program, the number of file names is checked, and an error message is produced if one or both are missing, or if more than two file names are specified.
Flags
Flags are command-line options. They are introduced with a “-” character. In gen-
eral, command-line flags control the form of the output, such as the size of the sample
(8, 16 or 32 bit) or the header format (AIFF, WAVE, Floating Point), as well as speci-
fying the name of the input MIDI file, the name of the listing file used for debugging
purposes, and the name of the output file. Other flags control the quantity of mes-
sages the system generates, the quality of graphics and a number of other special
features. The manual gives a complete list of these flags.
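For example, a typical invocation (the file names here are arbitrary) might be: csound -W -d -o mypiece.wav myorch.orc myscore.sco, where -W requests a WAV-format soundfile, -d suppresses the graphic displays and -o names the output file.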
Most flags are checked for validity, so it is possible to get error messages or warn-
ings from the argument decoding. For example, it is wrong to specify both WAVE
and AIFF output and any such attempt would give rise to a message. Many graphical
user interfaces would not even allow such a selection. With a graphical user interface
the checking itself may be done in a different way, but the purpose is the same: to
control the detailed behavior of Csound.
Orchestra Compilation
The orchestra file describes the individual instruments in a language for signal pro-
cessing. In common with all computer languages, this language has to be translated
into an internal form that is usable by the control components. This translation takes place in two parts, as described in the source files otran.c and oload.c, together with associated files such as rdorch.c.
Figure 3.1 Orchestra code for instr 301, instr 302 and instr 399, a simple stereo orchestra consisting of two instruments and a global reverb. Note that only the reverb instrument generates output to both channels.
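The orchestra of figure 3.1 would be of roughly the following shape (a reconstruction sketch, not the printed code: the instrument bodies, send level and reverb time are assumed, and only the global-reverb routing is taken from the caption):
sr = 44100
kr = 4410
ksmps = 10
nchnls = 2
gasend init 0 ; GLOBAL REVERB SEND (VARIABLE NAME ASSUMED)
instr 301 ; LEFT-CHANNEL VOICE (BODY ASSUMED)
a1 oscili p4, cpspch(p5), 1 ; F 1 ASSUMED TO HOLD A WAVEFORM
outs1 a1 ; LEFT CHANNEL ONLY
gasend = gasend + a1*.2 ; FEED THE GLOBAL REVERB
endin
instr 302 ; RIGHT-CHANNEL VOICE (BODY ASSUMED)
a1 oscili p4, cpspch(p5), 1
outs2 a1 ; RIGHT CHANNEL ONLY
gasend = gasend + a1*.2
endin
instr 399 ; GLOBAL REVERB, SOUNDING IN BOTH CHANNELS
arvb reverb gasend, 2.5 ; 2.5 SECOND REVERB TIME (ASSUMED)
outs arvb, arvb
gasend = 0 ; CLEAR THE SEND EACH CONTROL PERIOD
endin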
The first stage of any compilation process is the recognition of the words of the
language. Csound checks for a number of possible errors. It notices missing labels,
variables whose values are used but unassigned, and similar errors. If the orchestra
is a particularly big one, or has a large number of variables, Csound will expand
from its initial allocations of space for the translation process. This is sometimes
accompanied by a short message indicating that expansion has happened. Larger
orchestras inevitably take longer to compile and so such a message serves as a warn-
ing that you may have to wait a while.
For each opcode there are usually two kinds of associated action, an initialization
and an action to be performed at the control or audio rate. The first task the compila-
tion has to achieve is to arrange that each opcode in an instrument be initialized. The
second task is to call these opcode-actions with the appropriate arguments for as
long as required by the score. The orchestra translation creates structures that facili-
tate this, but they are only used during the performance. The performance needs to
know what to play and when to play and this information is read from the score or
MIDI file.
Score Compilation
The score language is just a collection of events, with information about when to
play and what to play. As such, the score does not need much alteration or translation.
The two main actions needed by the score compiler are: (1) to sort the note events
into time-order; and (2) to take account of the number of beats per second plus any
accelerations and decelerations. The sorting is necessary because Csound allows the notes to be written in any order; it is sometimes more convenient to write one
instrument’s part in total before writing the other parts. While sorting, the system
also takes account of the non-numerical characters in the score language, like the
carry (.) and the ramp (>) characters, expanding them to numeric values. The result
i 399 0 42 1
Figure 3.2 Note-list from the score file for the orchestra shown in figure 3.1, a two instru-
ment score in which both parts are listed separately and in which variable tempo is specified.
Figure 3.3 After compilation, the sorted score file from figure 3.2. Note that the instruments
are time and number ordered and that the tempo statement is at the top.
of all this analysis and sorting can be seen in the file score.srt, which is produced as
a side effect of running Csound. The score in figure 3.2 is shown as written by the
user. The score in figure 3.3 is the translated version as it appears in the score.srt file.
The sorted and normalized score (reformatted slightly to make it easier to read) is
shown in figure 3.3. The most obvious feature of this score is the time-ordering ( p2)
of events. Looking a little closer, you see that there are 6 parameter fields (p-fields)
while the original score file only has 4. The additional two fields are the third and
fifth, which give the time in seconds of the start time and length of the note. As this
score has a nonconstant beat speed, as defined by the t-statement, these additional
parameters are not such simple values. This form allows Csound to work in seconds,
but report the start and end of events in beats, which is easier to relate to the original
score. Also, the tempo information in the score’s t-statement is transformed to the
w-statement.
Performance
At performance time, Csound allocates space for an instance of each instrument that a score event calls for. When a note finishes, the space is retained for a later note on the same instrument.
Only at the end of a section are the instruments deallocated. While apparently waste-
ful, this system saves on the time taken to create new instrument instances and is
actually one of the reasons for the speed of Csound. Figure 3.4 shows the typical
display from a Csound score and orchestra run.
As you might notice, ends of notes are not always displayed. What the user sees
in the display is a report of the score progress as new instruments are allocated, or
when new notes are started. This is also because some instruments require internal
Figure 3.4 Typical message output during compilation indicating: new instrument alloca-
tions (new alloc for instr 301); begin time of score events (B 0.000); the duration of the score
event ( . . 0.500); the current time in this section (T 0.251); the total running time of the piece
(TT 0.251) and the maximum amplitude of each channel (M: 7997.5 5831.7).
memory whose size is not known until performance time. In these cases the initial-
ization function itself must allocate the memory. Of course there are internal func-
tions to take care of this, but there may be times when Csound will not run because
you have not allocated enough RAM to the application or you do not have enough
RAM in your computer.
The other action that could happen at performance-time is the creation of a table.
This is rather like playing a note and there are internal functions that create the table.
This is described in more detail in a later section.
In the process of compiling the Csound orchestra and performing the resulting
sounds, the system will most likely produce a number of lines of output. These mes-
sages can be divided into three main categories: information, warnings and errors;
the last category subdivides into fatal and nonfatal errors.
Information Messages
In common with other computer systems, there are a number of messages that are
produced to provide information to the user. Figure 3.5 is an example output from a
small Csound rendering.
Every one of the lines in figure 3.5 is an information message. These messages
tell us the names and locations of the orchestra and score files (Csound: Book:
ffitch: 303.orc, Csound: Book: ffitch: 303.sco) and what instrument
is defined in it (instr 303). It also tells which version of Csound you are using
(MIT Csound: 3.483 (Jul 6 1998)), where the output was written (written
to Csound:Soundfiles:303.snd (AIFF)), as well as its size (87 16384-
byte soundblks of shorts).
Each score event corresponds with a line beginning with the letter B. Each of these
lines corresponds with the duration of the event in score-time and real-time, in both
the section-time (T) and the overall piece-time (TT). The M information gives the
maximum amplitude of the score at that event-time. (B 2.000 .. 3.000 T 3.000 TT
11.000 M: 12426.7). At the end of each section we are told that instruments are
deallocated (inactive allocs returned to freespace) and the section peak
amp (end of section 2 sect peak amps: 20606.6). At the end of the ren-
dering we are reassured from the overall amplitude (end of score. overall
amps: 20606.6) that the number of samples that were out of range was zero
(overall samples out of range: 0). Finally, we are told how long it took to
compute the file (**Total Rendering time was: 11.91662 secs). This in-
formation is particularly useful because it is an indication of the system performance.
By comparing the time it took to generate the files with the total duration of the file,
one can determine if the orchestra and score will run in real-time. For example, in
figure 3.5 you can see that the total-time of the score was 16 seconds and the total-
rendering-time took only 11.91662 seconds. There are other factors, of course, but
given these numbers, this file would probably run in real-time on your system.
There are a number of other information messages that can appear and the com-
mand line option -m (or the GUI equivalent) can control just how “informative” the
system is. It should be noted, however, that if one is attempting to obtain real-time
performance, the fewer messages produced, the more time there is for the “perfor-
mance.” Printing, even on a screen, is still slow.
Warning Messages
orchname: Csound:Book:ffitch:303.orc
scorename: Csound:Book:ffitch:303.sco
sorting score ...
... done
orch compiler:
16 lines read
instr 303
MIT Csound: 3.483 (Jul 6 1998)
(Mills/PPC: 3.4.8a2)
orch now loaded
audio buffered in 8192 sample-frame blocks
writing 16384-byte blks of shorts to Csound:Soundfiles:303.snd (AIFF)
SECTION 1:
ftable 1:
new alloc for instr 303:
B 0.000 .. 2.000 T 2.000 TT 2.000 M: 9373.1
B 2.000 .. 3.000 T 3.000 TT 3.000 M: 12292.8
B 3.000 .. 4.000 T 4.000 TT 4.000 M: 16540.2
end of section 1 sect peak amps: 16540.2
inactive allocs returned to freespace
SECTION 2:
new alloc for instr 303:
B 0.000 .. 4.000 T 4.000 TT 8.000 M: 20606.6
end of section 2 sect peak amps: 20606.6
inactive allocs returned to freespace
SECTION 3:
new alloc for instr 303:
B 0.000 .. 2.000 T 2.000 TT 10.000 M: 9810.3
B 2.000 .. 3.000 T 3.000 TT 11.000 M: 12426.7
B 3.000 .. 4.000 T 4.000 TT 12.000 M: 16152.1
end of section 3 sect peak amps: 16152.1
inactive allocs returned to freespace
SECTION 4:
new alloc for instr 303:
B 0.000 .. 4.000 T 4.000 TT 16.000 M: 19692.0
end of section 4 sect peak amps: 19692.0
end of score. overall amps: 20606.6
overall samples out of range: 0
0 errors in performance
87 16384-byte soundblks of shorts written to Csound:Soundfiles:303.snd (AIFF)
Figure 3.5 Complete output of a simple score indicating source directories, Csound version,
instrument allocation, event onset times, amplitudes, samples-out-of-range and errors in
performance.
Warning messages report conditions that may deserve attention but that, even if left unaddressed, leave the results valid and allow the job to run. The
most common warning message is one pointing out that there have been some
samples out of range. Samples out of range means that the requested amplitude was
greater than the resolution of the DAC. For instance, if you are writing 16-bit
samples, then the maximum value that can be represented is 2^15, or 32,768. When
Csound computes notes, it sums their amplitudes. If two notes sounding at the same
time each had an amplitude of 20000, the output amplitude would be 40000 and
result in a warning message that there were 102972 samples out of range as in
304.orc. Typically, if this number is small, indicating that there are only a few
samples out of range, you might decide not to do anything about it, even if there were
a slight reduction in audio quality and noticeable clicks or crackling. The program is
offering advice that a reduction in amplitude might be warranted. Of course, if the
number of samples that are too large in magnitude is a significant proportion of the
audio output, the resulting audio may be so distorted as to be useless.
Other typical warning messages include remarks that input samples or analysis
files have finished before the instrument using them finished playing. Again, this may
be what you wanted, or it may indicate a mis-calculation and you would want to
increase the size of the input sample or the length of the analysis file.
What you should do with these warnings is to look at them and convince yourself
that the situation is what you expected, or is acceptable. If it is neither of these, then
it is worth some investigation into the source of the warning. You might then choose
to remove the cause of the warning, or perhaps realize that it is acceptable after all.
In figure 3.6 the number of samples out of range is reasonably small and so you
might choose to live with the added grunge. Alternatively, reducing the amplitude of
one of the instruments playing between times 1 and 5.5 in the score might prove the
correct action in this case to reduce the noticeable distortion.
Another warning that is frequently seen is the “pmax = ... note pcnt =
...” message that appears when there is a mismatch between the number of p-fields
provided by a score and the number of p-fields used in an instrument.
For example, figure 3.7 warns us that the first line in the score for instr 309 pro-
vides a value for p7 even though the instrument only uses up to p6. It also warns that
instr 307 requires parameters up to and including p4, while the score only provided
3. Other warnings are indicated as well. As with all warnings, you should verify that
this is what you wanted or expected and consider correcting the situation.
Error Messages
Error messages indicate that something is definitely wrong and corrections are
needed. Most (but not all) errors will lead to the premature stopping of the pro-
orchname: Csound:Book:ffitch:304.orc
scorename: Csound:Book:ffitch:304.sco
sorting score ...
... done
orch compiler:
19 lines read
instr 304
instr 305
MIT Csound: 3.483 (Jul 15 1998)
(Mills/PPC: 3.4.8a3)
orch now loaded
audio buffered in 8192 sample-frame blocks
writing 16384-byte blks of shorts to Csound:Soundfiles:304.snd (AIFF)
SECTION 1:
ftable 1:
new alloc for instr 304:
B 0.000 .. 0.500 T 0.250 TT 0.250 M: 15861.2
new alloc for instr 305:
B 0.500 .. 1.000 T 0.500 TT 0.500 M: 27514.9
new alloc for instr 304:
B 1.000 .. 1.500 T 0.750 TT 0.750 M: 37372.7
number of samples out of range: 20
new alloc for instr 305:
B 1.500 .. 2.000 T 1.000 TT 1.000 M: 36577.8
number of samples out of range: 54
new alloc for instr 304:
B 2.000 .. 2.500 T 1.250 TT 1.250 M: 40464.4
number of samples out of range: 111
new alloc for instr 305:
B 2.500 .. 3.000 T 1.500 TT 1.500 M: 39064.1
number of samples out of range: 55
new alloc for instr 304:
B 3.000 .. 3.500 T 1.750 TT 1.750 M: 51162.4
number of samples out of range: 366
new alloc for instr 305:
B 3.500 .. 5.500 T 2.750 TT 2.750 M: 49288.0
number of samples out of range: 617
B 5.500 .. 7.500 T 3.750 TT 3.750 M: 15807.4
B 7.500 .. 8.000 T 4.000 TT 4.000 M: 8073.8
B 8.000 .. 9.000 T 4.500 TT 4.500 M: 815.1
B 9.000 .. 9.500 T 4.750 TT 4.750 M: 0.0
B 9.500 .. 12.500 T 6.250 TT 6.250 M: 0.0
end of score. overall amps: 51162.4
overall samples out of range: 1223
0 errors in performance
34 16384-byte soundblks of shorts written to Csound:Soundfiles:304.snd (AIFF)
Figure 3.6 Messages from 304.orc. Note the warning messages indicating “samples out of range” in instr 304 and instr 305.
orchname: Csound:Book:ffitch:305.orc
scorename: Csound:Book:ffitch:305.sco
sorting score ...
... done
orch compiler:
55 lines read
instr 307
instr 308
instr 309
instr 310
MIT Csound: 3.483 (Jul 15 1998)
(Mills/PPC: 3.4.8a3)
orch now loaded
audio buffered in 8192 sample-frame blocks
writing 32768-byte blks of shorts to Csound:Soundfiles:305.snd (AIFF)
SECTION 1:
ftable 1:
ftable 11:
ftable 51:
ftable 52:
new alloc for instr 308:
B 0.000 .. 1.000 T 1.000 TT 1.000 M: 3686.9 3686.9
new alloc for instr 309:
WARNING: instr 309 pmax = 6, note pcnt = 7
B 1.000 .. 2.000 T 2.000 TT 2.000 M: 7797.4 7423.2
new alloc for instr 307:
WARNING: instr 307 pmax = 4, note pcnt = 3
B 2.000 .. 3.000 T 3.000 TT 3.000 M: 11579.7 13208.2
new alloc for instr 309:
WARNING: instr 309 pmax = 6, note pcnt = 7
B 3.000 .. 4.000 T 4.000 TT 4.000 M: 17053.1 15530.3
new alloc for instr 309:
WARNING: instr 309 pmax = 6, note pcnt = 7
B 4.000 .. 5.000 T 5.000 TT 5.000 M: 16912.9 14230.5
new alloc for instr 310:
new alloc for instr 310:
WARNING: instr 310 pmax = 4, note pcnt = 3
B 5.000 .. 5.100 T 5.100 TT 5.100 M: 20522.9 17494.9
B 5.100 .. 5.200 T 5.200 TT 5.200 M: 16204.1 12399.1
B 5.200 .. 5.300 T 5.300 TT 5.300 M: 16392.6 11418.4
B 5.300 .. 5.400 T 5.400 TT 5.400 M: 20373.0 15302.8
B 5.400 .. 5.500 T 5.500 TT 5.500 M: 20727.0 16972.2
B 5.500 .. 5.600 T 5.600 TT 5.600 M: 15574.4 11174.1
B 5.600 .. 5.700 T 5.700 TT 5.700 M: 23788.3 19189.1
B 5.700 .. 7.000 T 7.000 TT 7.000 M: 16765.4 11694.7
B 7.000 .. 7.100 T 7.100 TT 7.100 M: 27350.8 23295.1
B 7.100 .. 10.000 T 10.000 TT 10.000 M: 16147.2 14967.6
B 10.000 .. 11.000 T 11.000 TT 11.000 M: 12579.9 15027.0
B 11.000 .. 12.000 T 12.000 TT 12.000 M: 10018.8 9908.6
B 12.000 .. 13.000 T 13.000 TT 13.000 M: 5018.3 4896.6
B 13.000 .. 14.000 T 14.000 TT 14.000 M: 30.6 26.2
end of score. overall amps: 27350.8 23295.1
overall samples out of range: 0 0
0 errors in performance
76 32768-byte soundblks of shorts written to Csound:Soundfiles:305.snd (AIFF)
Figure 3.7 Message from 305.orc showing warning message indicating both missing and
additional p-fields.
gram—the so-called fatal errors. Error messages are generally explicit as to the
cause of the problem and should give you a good idea of what is required to correct
the situation. From the description earlier in this chapter, it is hoped you will have a good idea of the stage in the Csound compilation and performance process at which the error was detected. Fatal errors include missing or unreadable orchestra files and missing tables for instruments (detected when notes are initialized), as well as the incorrect skipping of an instrument's initialization. In the nonfatal cases either a default value is used or an individual note may be deleted. There is a complete and annotated list of Csound messages in the manual appendix.
Conclusion
In this chapter, an initial glimpse at the internal structure of the Csound program has
been given. There is a great deal more to learn if you wish to become a Csound
programming expert. Still, I hope that I provided you with sufficient clues to start
you off on your investigation. If you decide that you do not want to look any further,
I hope that this preliminary view of the underlying structure of the program will help you better understand the warnings, errors and other messages that Csound presents and assist you as you debug your scores and develop your instruments.
Good luck.
• Information messages are printed by Csound as it runs. These include the names
of the orchestra file, number of instruments and so on. They are mainly for reassur-
ance, but they can also indicate small errors.
Message                                                    Type
note deleted. i integer had integer init errors            Information
note deleted. instr integer(integer) undefined             Information
As a result of previous errors (in the first case initialization errors in some opcode of
the instrument and in the second case because the requested instrument does not
exist), the note event has been deleted. The required actions are to correct the initial-
ization error, define the instrument, or correct the typing error in the score.
Message                                                    Type
audio_in string has integer chnls, orch integer chnls      Information
audio_in string has sr = integer, orch sr = integer        Information
While reading an input audio file, there were some inconsistencies noticed, with the
orchestra having a different sampling rate or a different number of channels. This
could be what was meant, but could indicate a problem.
Message                                                    Type
cannot load string, or SADIR undefined                     Fatal
The system failed to load an analysis file; it could not find it; or the SADIR environ-
ment parameter was not set to say where the analysis files are stored.
The names file could not be written. This could indicate a problem with the host-
computer’s file-system, or a file being open by another program.
These three messages refer to the method of deferring the allocation of space for a
table until the data are ready. This is only available for tables generated by method 1
and for it to work, the size of the input file must be determinable. An attempt to use
a “deferred table” before it is properly filled will also give an error.
In the score file a command was found that was not known. This is usually because
an older version of Csound is being used that does not have, for example, the b
command, or because the score file is corrupted; sometimes this happens when a
DOS file is fed to a UNIX system.
In the orchestra, a reference was made to a label that was not defined. This is usually
due to a typing error. Correct the label name and re-render. It could also indicate that
the text defining the instrument was not complete.
A character was found in the score that was not recognized. Arguments to score
events are typically numbers, strings or one of the special ramp and carry indicators.
The character that was not recognized is printed, both as a character and in hexadeci-
mal code in case it was unprintable.
The score attempted to use more than the permitted maximum number of arguments
to an instrument or table. This maximum is currently 800 for most systems, but it
might be more or less depending on local modifications.
An attempt was made to reference a table that was not defined. This is often a simple
user error, in not providing a table that an instrument taken from some other orchestra
requires. Of course, it could also be a typing error. Find and correct.
This message is not so disastrous as it sounds. The score has called for a table to be
removed so the memory it occupies can be recycled. It should have been expected.
In GEN17 there are restrictions on the x-values; for instance, they must be in as-
cending order. In GEN5 and GEN7 segments of the curve must be positive in size.
GEN1 reads in a sound file. If the file is larger than the user-specified table-size then
this message occurs. It could be expected or acceptable, or it may indicate a user
error in ignoring part of the input sound.
There are restrictions on a number of Csound’s parameters and arguments; for ex-
ample, table numbers need to be positive integers. Orchestra/scores that break these
rules will be rewarded with one of these messages.
The variables sr, kr and ksmps must satisfy the equation kr * ksmps = sr. If this
constraint is not satisfied, one sees this error message. The corrective action is to
change one of the variables, or to omit one so that the default values take over.
When Csound starts, the number of tables accepted is set to a smallish number (typi-
cally 300), but if a table of a larger number is used, then the space grows to accom-
modate the new table. This message indicates that this expansion has occurred.
This message summarizes the errors that occurred in the initialization of the num-
bered instrument.
This warning indicates that the instrument used pmax arguments but pcnt were pro-
vided by the score and that these two values were different. The action is either to
fix the score or to fix the orchestra.
An instrument has been scheduled that needs a MIDI channel that has not been as-
signed. The user needs to review the channels being used and their assignments.
Although these messages seem similar, they arise in a number of different contexts.
The first of this group comes from the arguments to an f-table, while the third is from
the command-line argument decoding. This shows that it is not sufficient just to read
the message, but that one also needs to be aware of the stage at which it is generated.
At the end of a sound rendering, a message like one of these is produced to say how
much sound was produced, in what format and to which file it was written.
An internal error occurred when attempting to get some more memory. It is usually
the case that the user has made inordinate demands on the system, with large tables,
multiple delay lines, or similar excessive uses.
Just reporting the association between MIDI channels and Csound instruments.
An instrument has not been properly terminated in the orchestra. You must terminate
your instruments with an endin statement. This indicates a serious error with the
orchestra and Csound cannot continue.
Each note-event requires a small data-block to maintain the information from one
control-period to the next. When a note terminates, the space is not returned to the
central pool, but retained for subsequent use by the same instrument, or until the end
of a section. When there is no suitable data-block for a new note, this message dis-
plays to show simultaneous use of the instrument. It is usually ignorable, but it does
show the level of multiple invocation of the instrument and can help when looking
for samples-out-of-range.
In argument decoding, names were missing although they had been indicated. A
user error.
In reading a sound file, there was no indication in the file of the correct sampling
rate. Csound takes a default value from the orchestra’s sr and reports it. This could
result in some curious transpositions and you might want to adjust the rates so that
they agree with each other.
Internal errors in preparing an instrument for performance led to some internal struc-
tures being empty. Apart from system errors, this can happen with incorrect use of
goto statements.
The output variable was not of acceptable type for the opcode. Type here refers to i-,
k- or a-rate variables, as well as the spectral w- type.
For d- and w- variables there are additional restrictions that mean they cannot be
reused. This message reports a violation of that rule.
Only a certain number of initialization opcodes can occur outside any instrument, in
the header block. An attempt to use a k- or a- rate opcode in the header will give rise
to this message. A common problem is using = rather than init in the header.
In the phase vocoder analysis utility, the frameSize parameter must be a power of 2.
It is set with the –n option on the command line.
The first of these says that the analysis file for use by the phase vocoder opcode was
shorter than the input sound, so it was truncated. The second of these is the error
produced by an attempt to use a negative time pointer.
F-tables can be replaced at any time by alternative ones, by using the second argu-
ment that says when the table is required. The message marks a replacement in case
it was not intended.
At least one of the sample rates has been overridden by a command-line switch.
In the diskin opcode, the part of the soundfile skipped is longer than the duration of
the sample, so a skip time of zero is used instead.
Csound does not know what to do with the particular MIDI meta event. It is ignored
with this message.
The system detected that the data written to a sound file were not accepted. The
common reason for this is that the file system is full. One should never forget the
size of audio files, especially at CD sampling rates and in stereo or more channels.
These are both failures to read a soundfile, either because it could not be found, or
having found it, the opening of the file failed.
A string was found in the score where it is not allowed, particularly in the first three
arguments of a note-event.
In the score, an instrument was detected with more arguments than is allowed. The
value of PMAX is typically 800.
Just an indication that the score is larger than the space allocated allows, so the space
is expanded.
The reported character in the particular section and line is not one that is allowed in
a score file. Look at the score and it should be apparent what caused the problem.
In writing out the score for sorting into time order, an unterminated string was de-
tected. The section and line are given in the message to assist in finding the problem.
A score a-statement has advanced time; this message reports that this has happened.
This rather cryptic message indicates that the program has run out of memory and
will stop. One could purchase more memory, reconfigure the machine, or more prob-
ably check why the orchestra and score used such a large amount. It could be that a
number of large tables are being used.
The command line that invoked Csound called for more than two files. The first should have been the orchestra and the second should have been the score; so what are the other files? One possible reason is the omission of a command-line flag charac-
ter “-”.
A macro in either the orchestra or score can only have up to five arguments. This
indicates that the orchestra or score needs some simplification.
All Csound scores are subjected to time warping, which is applying the variations in
tempo indicated by a t-statement. This cannot work if the tempo is negative or zero.
Almost certainly a user error.
The orchestra seemed to want to use an opcode that was not recognized by the parser.
This is usually a typing error, or a problem with new versions.
Csound can recognize a number of different sound formats (signed and unsigned
characters, ulaw bytes, shorts, longs and floats), but the one used is not one of the
set. The integer printed in both decimal and hexadecimal is an indication of the for-
mat found, which is helpful to a system debugger if this message is unexpected.
Coda
There are of course many more messages than these and each individual composer’s
style of use tends to generate a personal pattern of errors. So your messages may not
be in this set, but there are general lessons to be learnt about the kinds of errors that
exist and how they are reported. I hope this enumeration has given you some ad-
ditional insight into the workings of the program and its sometimes ambiguous
utterances.
4 Optimizing Your Csound Instruments
Paris Smaragdis
Even though Csound is a fast and highly optimized synthesis language, inefficiencies
in instrument design can force it to be slow. Since modern computer hardware allows
us just barely to reach the level of performance of modern synthesizers, it is impor-
tant that the code be as efficient as possible. It’s not too difficult even to double the
performance of a Csound orchestra simply by rewriting the code and eliminating
clutter. This chapter is about writing efficient instruments and taking full advantage
of Csound’s power.
Optimization
It is important at this point to make the distinction between lines and operations.
Even though the line count is a rough indication of efficiency, it is by no means
sufficient. A line of code can include from one to tens of operations. Optimizing is
Figure 4.1 Block diagram of instr 401 and instr 402 illustrating “in-line” coding.
about reducing that number of operations or replacing them with faster equivalents,
not reducing the number of lines. Consider the two instruments shown in figures 4.1 and 4.2.
Even though instr 402 “looks” faster and more concise, it computes at the same
speed as instr 401. That is because the number of operations remains the same,
despite the fact that there are fewer lines. If we wanted to optimize this instrument
we would attempt to reduce the number of operations, as shown in figures 4.3 and
4.4. In this case we have combined the two division operations into one. Taking advan-
tage of algebraic expressions is always an excellent way of removing redundant
operations.
In fact, operation reduction, as described above, is one way of optimizing code
that applies to most languages. Csound, however, is a high-level language that has a
lot of features “under the hood.” It is important that we know something about the
internals of the code, so that we can optimize with greater success. It is often the
case that there are two opcodes that do the same thing, with one being much faster
than the other. Substituting the more efficient opcode will of course improve perfor-
mance. This is knowledge that is not easy to gain. Furthermore, it changes as new
versions of Csound and new hardware platforms become available. Tips that incorpo-
rate that knowledge will be shown in some of the following sections.
The rest of the chapter will be in the form of a checklist, in no particular order,
that contains some standard tips for improving performance.
Figure 4.2 Orchestra code, two equally slow MIDI instruments. In instr 401 k-rate expres-
sions are evaluated on individual lines. In instr 402, expressions are evaluated in opcode
arguments.
Figure 4.4 Orchestra code for instr 403, an optimized MIDI instrument in which the number
of computations is reduced by fusing the two divisions into one.
The easiest way to get better performance with minimal effort is to adjust the sam-
pling and control rates. These two rates offer a performance/quality tradeoff. Even
though expert users can select the right combination of rates on the first try, the de-
cision may not be as clear for newer users.
The sampling rate is the most important rate since this is the speed at which the
orchestra will operate. If the sampling rate is high then the computer works harder;
if it is lower then computation is more relaxed. This would be an easy optimization
if there were not any other factor involved. But the sampling rate has to be at least
twice as high as the highest audio frequency you wish to generate. If not, then un-
wanted effects can appear and the sound quality will not be as good.
The high end of our frequency hearing range is around 20000 Hz, which would
seem to require that we use a sampling rate of at least 40000. Not all sounds, how-
ever, include frequencies as high as that, and we can often take advantage of this
information to improve performance. If you make use of instruments that do not have
high-frequency content and you have performance problems, then you can afford to
lower the sampling rate.
If you are playing back a cymbal, then you would require almost all the range of
audible frequencies in order to have an accurate rendition and the sampling rate
would have to be high (such as 40000). But if you are synthesizing an acoustic bass,
you can use a much lower sampling rate (22050 or even 11025), since the sound will
most likely not contain higher frequencies. Many well-known commercial synthe-
sizers use sampling rates lower than the ideal 40000 and still sound great. Do not be
tricked into believing that you must use a high sampling rate to get better quality;
always let your ears be your guide.
Just to give a feel for how the sampling rate should look: common values start
from 8000 and go as high as 48000. Numbers outside that region will either produce
low-quality sound or slow performance. Using a sampling rate greater than 48000 is
redundant since additional quality would be inaudible. The in-between sampling
rates are not always arbitrary numbers. Sound hardware usually handles the follow-
ing standard sampling rates: 8000, 11025, 22050, 32000 and 44100. Using other val-
ues might result in soundfiles that are not playable from mainstream hardware and
software.
The control rate has a more subtle effect and a more involved explanation. In order
to increase efficiency, Csound can compute more than one sample when an operation
is called. It can generate more than one output sample during every iteration and
gain in performance. The effect of this is impressive. If the control rate is equal to
the sampling rate then we have the worst possible performance. You should never
use this setting unless you have a strong reason to do so. If the control rate is set to
half the sampling rate performance can improve by 60%. Usual settings have the
control rate ranging from one-tenth to one-hundredth of the sampling rate. Such set-
tings will speed up your instrument by more than 400%.
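As an illustration only, an orchestra header reflecting these suggestions might look like the following; the values are not a recommendation for any particular piece.

sr     = 44100          ; sampling rate
kr     = 4410           ; control rate, one-tenth of sr
ksmps  = 10             ; samples per control period, must equal sr/kr
nchnls = 1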
In order to deal with memory constraints on older computers, Csound offers a set of
interpolating opcodes (oscili, foscili, tablei, etc.). These opcodes perform additional
computation so that we can use small f-tables, which usually offer low sound quality,
and get the sound quality of large f-tables. In all modern computers, however, there
is sufficient memory to render such opcodes useless. By using the original versions
of the opcodes, instead of the interpolating versions, the performance improvements
can start from 8% and go up to 30% per oscillator. The only change you need to
perform in your code for this optimization is to remove the “i” character after oscili,
foscili and tablei.
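For example, assuming f 1 is a reasonably large stored sine (say f 1 0 8192 10 1), the change is simply:

a1      oscili    10000, 440, 1         ; interpolating oscillator
a1      oscil     10000, 440, 1         ; non-interpolating, faster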
A common problem is that new users tend to implement certain synthesis techniques
by using many simple opcodes. Even though this approach has a strong educational
value, it is always inefficient. Csound is a big synthesis language that offers special-
ized opcodes for a variety of synthesis methods. Before you implement a synthesis
method look at the manual and see if an opcode for it exists. Performance gains by
using specialized opcodes can range from 20% (for simple FM instruments) to even
more than 1000% (for FOF or granular synthesis).
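As a sketch of the idea, here is the same static FM patch written first from basic opcodes and then with the specialized foscil opcode; the ratio, index, amplitudes and instrument numbers are illustrative, not taken from the book.

        instr 1                         ; FM built by hand from two oscillators
amod    oscili    p5 * 2.8, p5 * 1.4, 1 ; modulator: ratio 1.4, index 2
acar    oscili    10000, p5 + amod, 1   ; carrier
        out       acar
        endin

        instr 2                         ; the same patch with one specialized opcode
acar    foscil    10000, p5, 1, 1.4, 2, 1
        out       acar
        endin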
Csound opcodes are optimized to work with different rate variables and you always
get better performance if you use the lowest rate. Consider, for example, the follow-
ing instrument that implements a simple vibrato, as shown in figures 4.5 and 4.6.
In this example we observed that the vibrato signal was not so fast as to be a-rate
and we changed it to k-rate. Internally, the oscil opcode will be more efficient, since
it would be using a lower rate variable. Similarly, if you have k-rate variables that do
not change over time, you can get performance gains by changing them to i-rate.
Figure 4.5 Block diagram of instr 404 and instr 405, comparing a-rate with k-rate vibrato.
Figure 4.6 Orchestra code for instr 404, an inefficient a-rate vibrato instrument compared
with instr 405, and an efficient k-rate vibrato instrument.
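A minimal sketch of the comparison; the vibrato rate and depth are assumptions, not the book's figures.

        instr 404                       ; vibrato computed at a-rate
alfo    oscil     5, 6, 1               ; 6 Hz vibrato, 5 Hz depth
a1      oscil     32000, 440 + alfo, 1
        out       a1
        endin

        instr 405                       ; the same vibrato at k-rate
klfo    oscil     5, 6, 1
a1      oscil     32000, 440 + klfo, 1
        out       a1
        endin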
Figure 4.7 Block diagram of instr 406 and instr 407, comparing inefficient and efficient
amplitude scaling.
Figure 4.8 Orchestra code comparing instr 406, an instrument with an inefficiently scaled
envelope, to instr 407, an instrument whose envelope is scaled efficiently by setting the scale
directly as an opcode argument.
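A sketch of the two approaches, with illustrative values:

        instr 406                       ; envelope scaled in a separate expression
k1      line      0, p3, 1
a1      oscil     32000 * k1, 440, 1    ; an extra multiplication on every pass
        out       a1
        endin

        instr 407                       ; scale set directly in the envelope arguments
k1      line      0, p3, 32000
a1      oscil     k1, 440, 1
        out       a1
        endin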
Most synthesis methods offer more control than we want and we often do not take
full advantage of an instrument. If we do not make full use of the inputs of every op-
code, it suggests that we could use something more efficient. Here is a simple ex-
ample using FM synthesis. If we do not use a varying modulation index and a varying
frequency ratio then we are generating a constant waveform, which we can do much
faster. In figure 4.9 we show a basic FM instrument in which f 1 is a sine.
In the instrument shown in figure 4.10, f 1 is a frequency modulated sine wave.
Thus performance improvements can start at 18% and go up very substantially.
Figure 4.9 Orchestra code for instr 408, an instrument illustrating redundant oscillator
calculation.
Figure 4.10 Orchestra code for instr 409, an optimized static FM instrument that reads a
precomputed wavetable.
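A sketch of the idea, in which the live FM of instr 408 is replaced by a simple oscillator reading f 2, a wavetable assumed to hold the equivalent static waveform (computed once, for example with an appropriate GEN routine or an external program); the index and ratio values are illustrative.

        instr 408                       ; constant index and ratio, recomputed live
a1      foscil    p4, p5, 1, 1, 3, 1
        out       a1
        endin

        instr 409                       ; the same static spectrum read from f 2
a1      oscil     p4, p5, 2
        out       a1
        endin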
It is well known that computers are slower at performing division than multiplica-
tion. We can take advantage of that to perform a simple optimization. Expressions
that divide a signal can be altered to use multiplication of the inverse as shown in
figure 4.13.
Performance improvement is not great, but when used consistently such an alter-
ation can make a significant difference, especially if Csound is run on special pur-
pose hardware such as digital-signal-processing (DSP) boards.
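For example (the divisor here is illustrative and differs from the book's figure):

        instr 412                       ; inefficient: division at audio rate
a1      oscil     32000, 440, 1
        out       a1 / 3
        endin

        instr 413                       ; efficient: multiply by the precomputed inverse
a1      oscil     32000, 440, 1
        out       a1 * 0.333333
        endin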
If there is no special reason not to do so, you can replace exponential segment op-
codes such as expseg and expon with their linear counterparts linseg and line. This
is a dangerous optimization, since it will change the way the instruments sound; but
Figure 4.11 Block diagram of instr 410 and instr 411, comparing efficient function calls
using Csound’s value converters to Csound’s more inefficient table look-up.
Figure 4.12 Orchestra code for instr 410 and instr 411, one using Csound's value converters
and function calls, the other using inefficient table lookup.
if sound fidelity has to be sacrificed for performance, this can be one of the first
optimizations to take place. Usually the linear opcode is much faster than the expo-
nential version.
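For example, an exponential envelope such as the first line below could be replaced by the linear one that follows (expseg cannot pass through zero, hence the 0.001 end points):

kenv    expseg    0.001, 0.1, 1, p3 - 0.1, 0.001    ; exponential attack and decay
kenv    linseg    0,     0.1, 1, p3 - 0.1, 0        ; cheaper linear equivalent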
Even though this seems like a memory optimization problem, it also affects per-
formance. Consider the instruments shown in figures 4.15, 4.16, and 4.17. These
compare the use of dedicated unique output variables to the reuse of many fewer
output variables.
Every time we call instr 414, Csound has to allocate space for the five variables
called for by this instrument design. Such allocation can be a time-consuming opera-
tion, especially in the case of audio variables. Also, since it takes place at an impor-
Figure 4.13 Block diagram of instr 412 and instr 413, comparing inefficient division to
efficient multiplication.
Figure 4.14 Orchestra code for instr 412 and 413, comparing inefficient use of division with
efficient use of multiplication.
tant point in performance, when other instrument initialization is taking place, it can
cause an audible click every time we play a note. A simple way around this is to
rewrite the instrument as shown in figure 4.17. This time only one allocation is re-
quested and besides memory savings, we also reduce the risk of clicks. We also help
improve data locality, which significantly helps performance on modern computers.
It is important to use this tip judiciously. Constant re-use of a variable can make
code hard to decipher and will certainly hinder any future debugging efforts.
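A sketch along the lines of instr 414 and 415; the processing chain itself is only illustrative.

        instr 414                       ; five separate a-rate variables
a1      oscil     8000, p5, 1
a2      =         a1 * 2
a3      =         a2 + 4000
a4      =         a3 * 0.5
a5      reverb    a4, 1
        out       a5
        endin

        instr 415                       ; the same chain reusing one variable
a1      oscil     8000, p5, 1
a1      =         a1 * 2
a1      =         a1 + 4000
a1      =         a1 * 0.5
a1      reverb    a1, 1
        out       a1
        endin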
A lot of users print debugging statements from their instruments and enable Csound’s
runtime messages. This is useful when developing an instrument, but printing mes-
Figure 4.15 Block diagram of instr 414 and instr 415, showing inefficient declaration of
unique output variables versus the highly efficient reuse of the same variable.
Figure 4.16 Orchestra code for instr 414 illustrating the use of a unique a-rate variable for
each output argument.
Figure 4.17 Orchestra code for instr 415, showing the reuse of the same a-rate variable for
each subsequent output.
sages is one of the most inefficient operations you can ask your computer to perform.
If there is no need for printed information, disable the messages using the -m 0 flag
and remove the print statements from your code. Performance improvements can be
extremely high and if you are generating audio in real-time you will also dramatically
reduce the danger of audible clicks.
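For a command-line version of Csound this is simply a matter of adding the flag to the invocation; the file names here are placeholders.

csound -m 0 -o mypiece.wav mypiece.orc mypiece.sco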
Be Wary of Soundfiles
On all computers, reading and writing to disk is a slow procedure. If you have an
instrument that uses the soundin opcode you might want to consider rewriting it so
that it references the sound from memory using an f-table. This is not always possible
owing to memory constraints, but the efficiency gains are big. Also, the soundfile
format that you use can make a big difference in performance. If you use a big-
endian machine (most UNIX boxes and Macintoshes) it is best to use a big-endian
format such as AIFF. For little-endian machines (x86 and DEC Alphas) it is best to
use the WAVE format. If a computer has to write to a non-native byte format it will
have to go through a conversion routine, which is usually slow. Disk input/output
(I/O) can occupy a significant amount of the overall time of the computer’s work,
so optimizing with respect to this feature is important.
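As a sketch, the first instrument below streams a file from disk with soundin, while the second reads the same sound from an f-table loaded once with GEN01; the file name and instrument numbers are placeholders.

; score: load the sound into a deferred-size table
; f 1 0 0 1 "mysound.aif" 0 0 0

        instr 420                       ; read from disk on every pass
a1      soundin   "mysound.aif"
        out       a1
        endin

        instr 421                       ; read the same sound from memory
a1      loscil    1, 1, 1, 1            ; amp 1, pitch ratio 1, f 1, base 1
        out       a1
        endin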
Concluding Remarks
These are some of the most common optimizations used in Csound. Be warned that
Csound can behave differently on different architectures and some ways of doing
things are not always faster than others. In general, on today’s general purpose com-
puters, all of the above tips apply and it is only the performance ratio differences that
vary. Also make sure you have a version optimized for your computer. If you have a
Power Macintosh do not use the 68K processor version and if you have a Pentium
Pro you can get much better performance from a version specifically for this CPU
instead of a 486.
Finally, note that, with the exception of the rates and exponential-to-linear sec-
tions, the optimizations that were presented do not alter the output of the instrument.
Optimization that potentially alters the sound quality is a complicated procedure that
requires strong listening and sound design abilities. This is a skill that Csound users
develop slowly with experience and one that is hard to teach.
To close this chapter, I will present a real-life inefficient instrument in figure 4.18
that you can use to test your optimization knowledge. If you optimize it well enough
you should get a significant improvement in the speed of computation. Good luck!
5 Using Csound’s Macro
Language Extensions
John ffitch
Since version 3.48 of Csound, the system has been enhanced by two macro facilities,
one for the score language and one for the orchestra file. Of course, these are similar,
but their uses are quite different. A macro is a string that gets replaced by some other
text, but this simple description does not give any indication of how complex macros
can be. Their power is deceptive. These macros can be used to make the writing of
scores and orchestras easier and more tuned to an individual’s style and taste. In the
description here, I present a number of examples showing how macros can be used
in Csound, but there will surely be a large number of uses that you will imagine and
that will ultimately find a place in your personal world.
In the score, macros are textual replacements that are made as the score is being
read—even before score sorting. The macro system in Csound is a simple one and
uses two special characters to indicate the presence of macros, the characters # and
$. To define a macro, one uses the # character, at the start of a line.
#define NAME #replacement text#
The NAME of the macro can be anything made from upper- or lowercase letters and
digits, though a digit may not be used as the first character of the name. The replace-
ment text is any character string (not containing a #) and can extend over more than
one line. The exact text used for the replacement is enclosed within two further #
characters, which ensures that additional characters are not inadvertently captured
and that there is complete control over spaces. Remember that the replacement is exact
text and takes no account of word boundaries.
Figure 5.1 Definition and use of a p-field macro to replace p4, p5 and p6 in the score.
To use a macro after it has been defined, the NAME must follow a $ character and
must be terminated by a period. If the period is omitted, then the next nonletter or
nondigit terminates the name, but this can be ambiguous and so its use is not encour-
aged. The complete string $NAME. is replaced by the replacement text from the
definition. Of course the replacement text can also include macro calls, as long as
they are defined by the time of expansion.
If a macro is not required any longer it can be removed with:
#undef NAME
A simple example of this could be when a note-event has a set of p-fields that are
repeated in all, or nearly all occurrences. We can define a macro for this set of values:
1.01, 1.99 and 138 as shown in figure 5.1. This will get expanded before sorting into
the score shown in figure 5.2. This can save typing and is easier to change if, for
example, one needed to change one of the parameters. If there were two sets of
p-fields one could have a second macro, as there is no real limit on the number of
macros one can define.
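A sketch of the sort of score figure 5.1 describes; the macro name and the instrument number are assumptions, the three values are those mentioned above.

#define ARGS # 1.01 1.99 138 #
i 501 0 1 $ARGS.
i 501 1 1 $ARGS.
i 501 2 1 $ARGS.

This would expand, before sorting, into a score of the kind shown in figure 5.2:

i 501 0 1 1.01 1.99 138
i 501 1 1 1.01 1.99 138
i 501 2 1 1.01 1.99 138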
#define C #8.00#
#define Csharp #8.01#
#define Dflat #8.01#
#define D #8.02#
#define Dsharp #8.03#
#define Eflat #8.03#
#define E #8.04#
#define F #8.05#
#define ... ...
Figure 5.4 Macros that convert cpspch values into actual note names.
Figure 5.5 A simple note-list employing the cpspch macro from figure 5.4.
Another simple use for a macro might be to assist in remembering the relationship
between the traditional note names and the decimal notation used in connection with
the cpspch opcode by defining a set of macros as in figure 5.4.
Clearly this use of the macro facility makes the traditional Csound score easier to
read and write. But do note that care is needed with textual macros as they can
sometimes do strange things. They take no notice of any meaning and so spaces are
significant, which is why the definition has the replacement text surrounded by #
characters. Used carefully, simple macros are a powerful concept, but they can be
abused.
Macros can also be defined with parameters. This can be used in more complex
situations, where there are a number of repeats of a string with minor differences. To
define a macro with arguments, the syntax is:
#define NAME(A#B#C) #replacement text#
Within the replacement text the arguments can be substituted by the form $A. In
fact, the implementation defines the arguments as simple macros. There may be a
few arguments, currently up to 5 and their names can be any choice of letters exactly
as with macro names.
In use, the argument form is a macro and so has the same rules as the other macros.
The score with macros and an argument shown in figure 5.6 expands to the notelist
shown in figure 5.7. As with the simple macros, these macros can also be undefined
with:
#undef NAME
Note that the arguments are not required when undefining a macro.
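As a sketch of a one-argument macro (the name, instrument number and values are purely illustrative), one might write:

#define SLUR(A) #
i 504 $A. 0.5 8000 8.00
i 504 [$A.+0.5] 0.5 6000 8.02
#
$SLUR(0)
$SLUR(2)

which expands into four note statements beginning at beats 0, 0.5, 2 and 2.5.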
Another use for macros is when writing a complex score with many instruments,
where it is sometimes all too easy to forget to what the various instrument numbers
refer. One can use macros to give names to the instrument numbers. An example is
shown in figure 5.8.
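A sketch in the spirit of figure 5.8, with invented names and numbers:

#define FLUTE #501#
#define CELLO #502#
i $FLUTE. 0 1 10000 8.00
i $CELLO. 0 4  9000 6.07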
It is sometimes convenient to have the score in more than one file. This use is sup-
ported by the # include facility, which is part of the macro system. A line containing
the text:
#include "filename"
s ; DEFINE A SECTION...
#include "section1"
s ; ... AND REPEAT IT
#include "section1"
Figure 5.9 Using the include facility to read a section of score from another file and repeat it.
where " is the double-quote character, causes the text from the named file to be inserted
into the score at this point; when that file is finished, reading reverts to the previous input.
There is currently a limit of 20 on the depth of included files and macros.
One suggested use of # include would be to define a set of macros that are part of
the composer’s style. Another would be to use # include to provide repeated sections.
Repeated Sections
Originally, the way to repeat sections in Csound was to copy and paste the text in the
score file. As was previously shown, a newer method would be to employ the #in-
clude facility. A third alternative would be to use the new r directive introduced into
the score language in version 3.48.
r4 NN
The directive above (r4 NN) starts a repeated section, which lasts until the next s, r
or e directive. The section is repeated 4 times in this example. In order that the
sections may be more flexible than a simple “copy-and-paste” repetition, the macro
NN is given the value of 1 for the first time through the section, 2 for the second, 3
for the third and 4 for the fourth. This can be used to change p-field parameters, or
indeed ignored. But a macro name must be given even if it is not used. (It should be
noted that because of serious problems of interaction with macro expansion, sections
must start and end in the same file and not in a macro).
Evaluation of Expressions
In earlier versions of Csound the numbers presented in a score were used as given,
but this could be an irksome restriction. There are occasions when some simple arith-
metic evaluation would make the construction of the score easier. This need is in-
creased when there are macros. On the accompanying CD-ROM significant score
generation techniques are described and for handling real complexity this is the best
r3 CNT
i 509 0 [0.3*$CNT.]
i 509 + [($CNT./3)+0.2]
e
s
i 509 0 0.3
i 509 0.3 0.533333
s
i 509 0 0.6
i 509 0.6 0.866667
s
i 509 0 0.9
i 509 0.9 1.2
e
approach. However for some simpler cases, the syntax of an arithmetic expression,
within square brackets [ ], has been introduced to the score language. Expressions
built from the four arithmetic operations +, -, * and / are allowed, together with ( )
grouping. The expressions can include numbers and, naturally, macros whose values
are numeric or strings that are arithmetic expressions. All calculations are made in
floating-point numbers. (Note that unary minus is not yet supported.)
As an example of this syntax use, consider a short section that is to be repeated
three times, but with differences in the lengths of notes (see figure 5.10). As the three
copies of the section have the macro $CNT. set with different values of 1, 2, and 3,
this expands to the notelist shown in figure 5.11. The changes from section to section
here are fairly major, but the evaluation system could also be used to ensure that
repeated sections are subtly different, through the introduction of small changes in
amplitude or speed.
One could also use the evaluation mechanism in conjunction with macros to pro-
vide alternative ways of describing pitches. If, instead of the macros $C. $Csharp.
etc., which we introduced earlier, we used one set of macros for the pitch-class and
another for the octave, we could have a system as shown in figure 5.12. Note the use
of the + inside the definitions. This allows a score like the one in figure 5.13.
Surely you will be able to think of similar and possibly more readable uses.
#define C #0.00+#
#define Csharp #0.01+#
#define Dflat #0.01+#
#define D #0.02+#
#define Dsharp #0.03+#
#define Eflat #0.03+#
#define E #0.04+#
#define F #0.05+#
#define ... ...
#define oct(N) #6+$N.#
Figure 5.12 Macros that convert cpspch to pitch class and octave.
Figure 5.13 Score file employing the pitch class and octave macro from figure 5.12 plus the
argument macro from figure 5.3.
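A sketch of how these macros might be used in a score (the instrument number and other values are illustrative); note that each pitch-class macro already ends with a +, so the octave macro can simply be appended inside an evaluated bracket.

i 505 0 1 10000 [$C.$oct(2)]      ; expands to [0.00+6+2] = 8.00
i 505 1 1 10000 [$E.$oct(2)]      ; expands to [0.04+6+2] = 8.04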
Another alternative to using either # include or repeated sections for reusing parts of
a score is provided by the marking and insertion of named sections. The score state-
ment below introduces a marked location in the score.
m Name
The mark need not be at the start of a section but it usually will be. Csound will
remember this source position and file, then associate it with the given name so it
can be used later in an n statement. For internal reasons, this statement can only be
used in a file and not as part of a macro expansion.
The related n statement will include the score from a previously marked location,
referred to by name, until the end of the section. The expected use of these two
statements is for verse and chorus structure, or a simple rondo. The start of the first
chorus is marked with the m statement. Then subsequent choruses can use the n state-
ment. This use is demonstrated in the simple score shown in figure 5.14. Of course,
for a chorus as simple as this, a macro might have been simpler to use, but for lengthy
repeated loops or grooves, this will prove a useful addition to your repertoire of
shortcuts.
; 1ST VERSE
i 511 0 .3 15000 8.00
i 511 + . 9000 8.02
i 511 + . 12000 8.04
i 511 + . 9000 8.00
s
m Chorus
i 512 0 .3 20000 8.07
i 512 + . 12000 8.09
i 512 + . 18000 8.07
i 512 + . 11000 8.05
s
; 2ND VERSE
i 511 0 .3 15000 8.04
i 511 + . 9000 8.00
i 511 + . 12000 8.02
i 511 + . 9000 8.00
s
n Chorus
s
; 3RD VERSE
i 511 0 .3 15000 8.04
i 511 + . 9000 8.00
i 511 + . 12000 8.04
i 511 + . 9000 8.05
s
n Chorus
s
; 4TH VERSE
i 511 0 .3 15000 8.04
i 511 + . 9000 8.04
i 511 + . 12000 8.02
i 511 + . 9000 8.02
s
i 511 0 4 15000 8.00
i 512 .05 4 10000 7.001
Figure 5.14 Score that employs the marking and naming statement to conveniently inter-
sperse, reorder, and repeat sections.
Modifying Time
The basic Csound score language has had a facility for varying the rate at which time
passes with the t “time-warping” statement. Sometimes, however, this is not suffi-
cient for what one really wants. There are two “local” score statements that can be
used for organizing components of the score, one that changes the base from which
time is counted and another that changes the rate at which time passes for part of a
score. The score statements are b and v.
The b score statement sets the internal base-clock, so that all subsequent note
onsets have this new base-time added. The effect is textual and lasts until the end of a
section or the next b directive.
A typical use for this command would be when you have a phrase or theme that
you want to repeat without waiting for all the sound to cease, as at a section end:
you could reset the base-time and repeat the actual event list, possibly even with
a # include.
Using instr 511 and 512 from an earlier example, one could use the b statement
to overlap two copies of the motif and form a strict canon. This exact repeat can sub-
sequently be moved to a different starting time by changing the base-time, rather than
having to modify each event where time is mentioned explicitly.
If the motif is long and complex this is a significant saving in score editing time.
Used with the # include facility it could make for an interesting use of scores with
overlapping repeats. But since the expansion takes place prior to the score sorting,
the score.srt file will have no indication of this operation.
Figure 5.15 Score that employs the resetting of the base-time to create a canon.
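A minimal sketch of such a canon, using instr 511 from figure 5.14; the dynamics and the one-beat offset are illustrative.

b 0
i 511 0   .5 15000 8.00
i 511 .5  .5  9000 8.02
i 511 1   .5 12000 8.04
i 511 1.5 .5  9000 8.00
b 1                              ; the same event list again, one beat later
i 511 0   .5 12000 8.00
i 511 .5  .5  8000 8.02
i 511 1   .5 10000 8.04
i 511 1.5 .5  8000 8.00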
There is one other way in which sounds, associated in the score, can be organized
into higher level structures. As you know, the p2 field is used to set the start time of
a “note.” It is possible to use the + statement as a shorthand way of having the cur-
rent note’s start-time be immediately following the end of the previous event. The fol-
lowing sequence of notes:
i 511 0 .5 9500 8.00
i 511 + . 9000 8.02
i 511 + . 8000 8.04
i 511 + . 7000 8.00
translates as follows:
i 511 0 .5 9500 8.00
i 511 .5 . 9000 8.02
i 511 1 . 8000 8.04
i 511 1.5 . 7000 8.00
As you can see, when a + statement is placed in p2, that note event's start-time is
equal to the sum of the previous note's p2+p3 (the start + duration). This wonderful
shorthand has one serious limitation, however: it applies only to lists of notes played
on the same instrument. To address this limitation, a new version of the + statement
has been added, ^+. In its purest form, the ^+ directive will set the start-time of
the current note-event equal to the sum of p2+p3 from the "previous" note-event.
The form ^+1 indicates 1 time unit after the start of the previous note-event. This
(with the + directive) provides a way of grouping a small set of events together.
Incidentally, the number following a ^ can be negative, so one could arrange to write
a motif backward, provided time does not go negative!
The other local-textual operation is to treat all time as passing at a different rate—
to allow independent simultaneous tempo control. It is now possible, using the v
directive, to modify the speed of a number of lines before the global time warping is
computed by the t directive. A simple use of this would be to generate some phase-
shifting of themes, in ways often used in minimal music.
Another suggested use is to have independent time scales for two melodic ideas,
but played simultaneously, as employed in many of the works of Charles Ives. As
with all these score operations, the real use is for you, the composer, to determine.
These operations are not supposed to provide all possible facilities, for which one of
the specialized score generation languages is a better approach, but they provide just
another small collection of potentially time-saving facilities.
Figure 5.16 Score that employs the local relative positioning of events followed by the
sorted score output found in the score.srt file.
Figure 5.17 Score that employs the local time variation to generate “phase music.”
The same style of macro facility is available as part of the orchestra reading. These
macros are also textual and are independent of the score macros. The syntax for a
simple macro is:
#define NAME #replacement text#
#undef NAME
We could use this form, for instance, to extend the reverberation system and allow
different audio-rate variables to be sent to the reverberation instrument as illustrated
in figure 5.20. This expands to the orchestra shown in figure 5.21.
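A sketch of the kind of macro involved; the global variable garvb, the send level and the instrument numbers are assumptions, not the book's code.

garvb   init      0

#define REVERB(A) #
garvb   =         garvb + $A. * 0.3
#

        instr 515
a1      oscili    p4, cpspch(p5), 1
        out       a1
        $REVERB(a1)
        endin

        instr 599                       ; global reverberator
arvb    reverb    garvb, 1.5
        out       arvb
garvb   =         0                     ; clear the send for the next pass
        endin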
Figure 5.18 Macro used in orchestra to substitute for several lines of code.
Figure 5.19 Expansion of reverb macro from figure 5.15 into two different instruments.
Figure 5.20 Orchestra macro with unique argument substitutions per instrument.
#define CLARINET(I) #
instr $I.
ipanl = sqrt(p6)
ipanr = sqrt(1-p6)
ka linen p4, .01, p3, .1
kv linseg 0, 0.5, 0, 1, 1, p3-0.5, 1
a1 wgclar ka, p5, -0.2, 0.15, 0.1, 0.25, 5.735, kv*0.01, 1
outs a1*ipanl, a1*ipanr
$REVERB(a1)
endin #
A rather extreme, but practical, use of macros in the orchestra would be to have
each instrument defined as a macro, with the instrument number as a parameter.
Figure 5.22 shows what one would write in the file clarinet. So now we have a clari-
net instrument whose number is a macro argument. If we have similar files for other
instruments, an entire orchestra could be constructed from a number of # include
statements followed by macro calls, as in figure 5.23, to give them specific instru-
ment numbers for the piece—it’s just like having your own Csound librarian pro-
gram. This shows also that the # include syntax is allowed in orchestras as well as in
scores. It should be noted, however, that there is no evaluation system in the orchestra
as it is not needed to the same extent.
Conclusion
The Csound score language is essentially just a list of events. The system allows the
user to write the events in any order; there are time-warping controls for tempo; and
there are sections for organization. In this chapter we have seen that macros are a
#include "my_orc_header"
#include "my_orc_macros"
#include "fx"
#include "clarinet"
#include "bassoon"
#include "guitar"
$CLARINET(517)
$BASSOON(518)
$GUITAR(519)
Figure 5.23 Using # include and macros to load a “library” of Csound instruments into the
orchestra by name and assign their instrument numbers as macro arguments.
useful way in which this simplicity can be extended, in an individual fashion. Clearly,
macros and the additional score shortcuts do not take the place of the more general
score-writing language described on the CD-ROM, but they can make the task of
constructing a score a little easier. In addition, the extensions for repeated and
marked sections, together with base-time setting and local variation of time are
aimed at facilitating the entry of traditional and popular groove-based structures.
These both provide another simple tool for beginners who often complain about the
tedium of typing a score.
It was also shown that macros could be used in the orchestra, mainly to customize
and personalize the orchestra language a little. Again, there are a number of powerful
and intuitive graphical tools for designing Csound instruments, but in many cases a
few macros can provide the most straightforward and direct way of doing the work
and make the code more readable. Finally, using included instrument files is just
another approach that can help you to organize your instrument collection. Over time
you will no doubt construct your own library of macros and include them in your
scores and orchestras, with the # include facility, as a matter of course.
Imitative Synthesis
6 Designing Acoustically Viable
Instruments in Csound
Stephen David Beck
Many chapters of this book discuss the fundamental techniques of sound synthesis.
It is important to remember that, in the abstract, these methods can create almost any
sound imaginable. How do we decide which method or combination of methods will
work best? What defines the best method? What makes a synthetic sound interesting
and attractive to a listener? Or to a performer?
Acoustic Viability is a principle of synthetic instrument design that recognizes the
importance of real instrument acoustics and their impact on timbre and expression.
By incorporating this sensibility in your instrument designs, you can create synthetic
instruments that are expressive, evocative and compelling. For the performer,
whether human or computer, instruments that are acoustically viable are intrinsically
more satisfying.
This chapter will discuss the differences between classical and modern acous-
tic theory and the notion of acoustic viability. There will be examples of building
synthetic instruments that are acoustically viable, including transformations across
intensity and frequency. The last section will present other options for exploring
acoustic viability.
In the late 1800s, Heinrich Helmholtz developed a set of theories and principles that
we call the classical acoustic theory. These theories provide a reasonably accurate
description of the physics of sound, in particular, musical or instrumental sound.
Along with the mathematical theories of Fourier, Helmholtz’s ideas about instrumen-
tal acoustics remain the basis of most sound synthesis techniques.
According to the classical acoustic theory, instrument sounds are periodic vibra-
tions that travel like waves through air. The number of vibrations per second is the
sound’s fundamental frequency, which a listener hears as pitch. The timbre of any
musical instrument is described by a spectrum of multiple frequencies whose general
amplitude varies over time. Component frequencies (partials) within the spectrum
have values that are integer multiples of the fundamental frequency. A piano has a
spectrum whose partials decrease in relative amplitude. On the other hand, a clarinet
has a spectrum where only odd-numbered multiples of the fundamental appear (in
decreasing amplitude).
The amplitude envelope, which describes the change of overall amplitude over
time, is in three distinct sections: the attack, the steady state, and the decay. The attack is
the portion of time in which the instrument is first heard. The steady state continues
for as long as the performer plays the note and the decay is the portion of time when
the musician stops playing but the sound continues. A clarinet's amplitude envelope
has a modest attack, a well-defined sustain and a short decay. A piano's envelope has
a sharp attack, no sustain, and a long decay.
According to Helmholtz, it is these differences between harmonic content and be-
tween amplitude envelopes that allow us to recognize one instrument sound from an-
other. It is these differences that allow us to recognize the voices of different people.
For the most part, the principles that Helmholtz developed remain an influential
and important part of our understanding of acoustics. Both analog and digital synthe-
sizers still use the classical acoustics model as the foundation of their internal design.
Like Newton’s theories of physics, the classical acoustic model begins to break down
when we examine sound on a small scale. When the harmonic content of sound is
examined over small time periods, (roughly 40–50 milliseconds), we discover that,
contrary to the Helmholtz model, a sound’s spectrum changes profoundly over time.
During the attack portion of a sound, harmonic content may change rapidly and un-
predictably. This phenomenon is called the initial transient. During the decay, upper
partials tend to disappear before the entire sound fades away. While the sustain por-
tion of the sound is certainly more stable than the attack or decay, it is hardly as static
as Helmholtz would suggest.
An examination of the spectra of an instrument playing different notes and differ-
ent dynamics demonstrates the most important difference between the classical
model and our modern understanding of instrumental acoustics. Loud and soft articu-
lations of the same pitch produce substantially different spectra (e.g., a loud note has
much more harmonic content than a soft note). In recognizing the timbral changes
between the three registers of a clarinet, we can infer that all instruments have such
timbral shifts as they move through their performance range. Flute, violin, cello,
bassoon, piano, all have profoundly different timbres in their respective registers.
Clearly, the basic premise of classical acoustic theory (a static timbre with an am-
plitude envelope) is by no means an accurate representation of sound. Nor does the
classical model predict the wide variations in timbre owing to dynamics and pitch.
Modern acoustic theory provides us with a more comprehensive understanding of
the subtleties of acoustic instrumental timbres. Timbral variations, owing to changes
of intensity, are powerful acoustic cues to musical dynamics. And timbral variations
owing to changes in frequency are critical components of an instrument’s sonic char-
acter. Recognizing the importance of these cues is the foundation of Acoustic
Viability.
Perhaps the most attractive part of computer music is the ability to design, create and
compose for new sounds and new timbres. But the freedom that comes with the clean
slate of synthetic instrument design has its own problems. Namely, where do you
start in designing these new sounds? What designs produce interesting sounds? What
designs provide composers with instruments capable of the same expressiveness that
acoustic instruments have?
Acoustic Viability is a design aesthetic that provides composers with a set of
principles to create powerful, expressive and evocative sounds. By recognizing the
importance of instrument acoustics and its relationship to expression, synthetic
instrument designers can build synthesis processes that respond to changes in loud-
ness, pitch, and articulation that are consistent with our understanding of acoustic
instruments.
Although the principle may seem intuitive, its importance cannot be overstated. A
common complaint about “standard” synthesis instruments, particularly those that
try to emulate orchestral instruments, is that they sound flat, dull, and inexpressive.
If a synthetic instrument’s timbre responds to variations in pitch and loudness in ways
that are consistent with acoustic instruments, then the synthetic instrument will be
perceived as being viable in an acoustic space. Its sound becomes acoustically viable.
If the virtual space projected beyond the loudspeakers reacts in an acoustically valid
manner and the sounds in the space are acoustically viable, then the listener will be
convinced the sounds are real, in spite of an intellectual awareness that the sounds
are in fact synthesized.
This aesthetic is by no means new. The vast majority of research in sound syn-
thesis has focused on the acoustic properties of musical instruments. The extensive
The relationship between the frequency of vibration and timbre is readily apparent
in some instruments, but less clear in the abstract. The clarinet, for example, has
three distinct registers; three distinct pitch ranges with different timbral characteris-
tics. The Chalumeau register is dark and moody, ranging from its lowest written note
(E below middle C) to F above middle C. The throat register is tight and weak (be-
tween F# and A# written above middle C). And the clarion register is bright, clear and
sometimes piercing, ranging upward from B above middle C to as high as the per-
former dares.
It is sometimes hard to believe that a single instrument can have such a variety of
timbres. But the reality of the clarinet’s three registers proves the impact of a reson-
ating body on an instrument’s timbre. Resonators, by their nature, tend to amplify
certain frequencies more than others. These resonant zones, or formants, are
uniquely related to the size and shape of the instrument and its resonator. It is these
properties that give instruments like the flute, the clarinet, the bassoon and the cello
such a wide range of timbral possibilities.
The relationship between applied energy and timbre is relatively clear. As more en-
ergy is put into the instrument, higher modes of vibration are achieved. With that,
more partials are present in the frequency spectrum. This is why a note played forte
is not just louder in volume than a note played piano, but brighter in timbre as well.
The connection between amplitude, frequency and amplitude envelope is far less
obvious. Before an instrument is played, it rests in a state of equilibrium. As with all
mechanical systems, there is a certain amount of resistance or inertia that keeps the
instrument from vibrating on its own. Performers must overcome that inertia before
their instrument will sound properly. The more energy a performer uses, the faster
that resistance is overcome and the faster the instrument reaches its “steady-state”
vibration.
A piano has strings of different lengths and thicknesses, ranging from the thick, long
low strings to the thin, short high strings. Different fingerings on a wind instrument
produce different lengths of air columns—longer columns mean more mass to vi-
brate. We know from basic physics that large masses have more inertia to overcome,
but also have more momentum once they are in motion. Similarly, low notes take
longer to sound, but last longer than high notes with the same initial energy. In all
instruments, the relative speed of the amplitude envelope is directly related to the
pitch (and mass) of a given note—higher pitches are faster than lower pitches.
The connections between amplitude, frequency, envelopes and timbre are critical
pieces of our ability to identify musical sounds, their dynamics and their pitch. Build-
ing synthetic instruments that have these relationships is the foundation of acoustic
viability.
The importance of the timbral cues to pitch and dynamics cannot be ignored, espe-
cially when designing instruments for real-time performance situations. Performers
also depend on these cues for performance feedback. They are trained for years to
listen to these subtle cues of timbre from their own instruments. When performers
are given an acoustically viable synthetic instrument, the connections between per-
formance technique and timbre become intuitive.
Pitch-Related Transformations
In acoustic instruments, pitch-based timbre shifts are due to the formants of a reso-
nating body. Formants are areas of the frequency spectrum that are naturally ampli-
fied by the resonator. These can be modeled with simple reson filters that have a low
Q value.
Q, the "quality" of a bandpass filter, is defined as the ratio of a reson's center fre-
quency to its bandwidth. When Q is on the order of 5 or less, the filter acts as a gen-
eral boost to that portion of the frequency spectrum. But as Q rises, the filter begins
to resonate its center frequency with greater clarity, to the exclusion of frequencies
outside the passband.
Q = Fcenter / Bandwidth (6.1)
The instrument shown in figure 6.1 attaches two reson filters to the end of an oscil
and linen opcode. The filters are at fixed center frequencies. Each has a different
Q. The output of the filters is then balanced against the prefilter signal to assure
reasonable amplitudes.
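A minimal sketch of this design, assuming fixed formants around 800 Hz and 2500 Hz; the envelope times, center frequencies and Q values are assumptions, not the book's values.

        instr 601
kenv    linen     p4, .05, p3, .2
asig    oscil     kenv, cpspch(p5), p6      ; p6 selects the wavetable
ares1   reson     asig, 800, 800/2          ; formant 1, Q = 2
ares2   reson     asig, 2500, 2500/5        ; formant 2, Q = 5
areso   =         ares1 + ares2
aout    balance   areso, asig               ; balance against the prefilter signal
        out       aout
        endin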
Low-Q reson filters produce a subtle but noticeable spectrum transformation
across the instrument’s registers. Higher Q values produce more extreme transforma-
tions over the frequency spectrum. This can be used to exaggerate the relationship
between pitch and timbre. The three instruments in 601.orc demonstrate the variety
of transformations available as each of these reson instruments has a more extreme
Q value.
Amplitude-Related Transformations
The connection between amplitude and timbre is clear. When higher amounts of
energy are put into a vibrating system, higher modes of vibrations are achieved and
more partials appear in the spectrum. There are several strategies that can be used to
simulate this connection.
Figure 6.1 Block diagram of instr 601, a parallel band-pass filter instrument.
Figure 6.2 Orchestra code for instr 601, a parallel filter instrument with fixed center fre-
quency and bandwidth.
Figure 6.3 Block diagram of instr 604, a simple instrument that uses amplitude (p4) to index
a table.
Figure 6.4 Orchestra code for instr 604 in which oscil uses indexed values to specify func-
tion table.
Figure 6.5 Score code for instr 604, in which f 99 uses ampdb(p4) to index into the non-
normalizing version of GEN17 (-17) to return the specific f-table (f 1-f 10) used by the oscil.
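A sketch of this design; the table size, the envelope times and the breakpoints of f 99 are assumptions.

        instr 604
iamp    =         ampdb(p4)
ift     table     iamp, 99                  ; f 99 maps amplitude to a table number
kenv    linen     iamp, .05, p3, .2
asig    oscil     kenv, cpspch(p5), ift
        out       asig
        endin

; score: a non-normalizing GEN17 step function returning brighter wavetables
; (f 1, f 2, f 3 ...) as the amplitude rises
f 99 0 32768 -17 0 1 8000 2 20000 3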
Figure 6.6 Block diagram of instr 606, a dynamic FM instrument using the foscil opcode.
Figure 6.7 Orchestra code for instr 606, a FM instrument with amplitude mapped to attack-
time and mod-index.
they do have tools in FM and AM to dynamically alter timbre during the course of a
note’s duration. This dynamism of timbre is simply not possible using wavetables.
FM and AM provide elegant solutions to the connection between amplitude and
spectral content. Scaling amplitude to the modulation index of any FM or AM instru-
ment creates an effective connection between the dynamic of a note and its timbre.
If the modulation index is modified by a time-variant envelope, a scaling of the enve-
lope’s attack-time further emphasizes the connection and strengthens the aural cue.
The example shown in figure 6.6 ties the amplitude (in dB) to the modulation-
index of a foscil opcode. The attack time of the index’s envelope is also scaled to the
amplitude (higher dB = faster attack).
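A sketch of this mapping; the scaling constants are assumptions.

        instr 606
iamp    =         ampdb(p4)
imodin  =         p4 / 20                   ; louder notes get a higher index
iatt    =         .3 - p4 / 400             ; higher dB gives a faster attack
kenv    linen     iamp,   iatt, p3, .1
kmodenv linen     imodin, iatt, p3, .1
asig    foscil    kenv, cpspch(p5), 1, 1, kmodenv, 1
        out       asig
        endin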
The relationships between pitch, amplitude and envelopes are complex and interre-
lated. Pitch is a reflection of the amount of mass that must be initiated into a vibra-
tional mode. Low pitches mean higher mass and more inertia to overcome, hence
slower attacks in energy-dependent envelopes (i.e., amplitude envelopes, modulation
envelopes). But low pitches also mean more momentum within the vibration, which
lengthens decays. Higher pitches mean lighter masses, less inertia, faster attacks and
shorter decays.
Amplitude is the amount of energy used to overcome the inertial resistance of
that mass and sustain vibrational momentum. Low amplitudes mean less energy to
overcome the resistance and less energy to sustain the momentum. This results in
slower attacks and faster decays. Higher amplitudes mean more energy: inertia is
overcome more quickly, attacks are faster and decays are longer.
Devising a computational linkage between pitch and envelopes is relatively
straightforward. The same is true for amplitude and envelopes. But creating one that
compensates for both is much more complicated. The following example (instr 607 )
modifies an amplitude envelope through a combined multiplier derived from ampli-
tude and pitch values ( p4 & p5).
The multiplier is created from lookup-tables that independently correlate ampli-
tude to attack and decay times and pitch to attack and decay times. The amplitude
and octave attack values are multiplied together and used to scale the score’s attack
value. The same for the decay values. The instrument also includes a spectral trans-
form function that returns modulation index scalars from amplitude (a different ap-
proach from example instr 606).
The scalar values used in the example instrument’s f-tables provide realistic trans-
formations between amplitude, frequency and spectral envelopes. For the amplitude-
attack lookup-table, an exponential curve returns smaller attack scalars, (meaning
shorter attack times), for higher amplitudes. The amplitude-decay table returns expo-
nentially higher scalars, (longer decays), for higher amplitudes. The octave-attack
and decay tables both use linear functions that return smaller values, (shorter times),
for higher frequencies. Higher amplitudes also return exponentially higher modula-
tion indexes, which result in brighter spectra (more partials).
Figure 6.8 Block diagram of instr 607, a FM instrument that uses table-lookups to correlate
attack, release, and mod-index.
Instead of the p4 score value controlling amplitude, veloc would return the velocity of
the current MIDI event and ampmidi would return the MIDI velocity scaled into an
amplitude value. The pitch value read from p5 would be replaced with notnum (raw
note number of the current MIDI event) or cpsmidi (current note number converted
into cps).
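For example, the opening of such an instrument might simply become the following (the scaling value is illustrative):

iamp    ampmidi   20000                     ; key velocity mapped to amplitude
icps    cpsmidi                             ; MIDI note number converted to cps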
Other MIDI parameters (aftertouch, channel pressure, pitch bend) can be accessed
through Csound and used as additional controllers for synthesis. These values can be
easily placed into any of the previous instrument examples.
Figure 6.9 Orchestra code for instr 607, a FM instrument with pitch and amplitude mapped
to envelopes and spectra.
Figure 6.10 The set of f-tables used to map amplitude and pitch in instr 607 from figure 6.9.
There are other characteristics of acoustic instruments that, when incorporated into
a synthetic instrument, help complete the illusion of viability. One of the most obvi-
ous is vibrato. A time-variant vibrato function adds substantially to the realism of
any synthetic instrument. Other pitch functions, such as the pitch deviation found in
initial transients, also help reinforce an instrument’s viability.
But the most significant factor in an instrument’s acoustic viability is its behav-
ior in space. We identify an instrument’s distance from us through the balance be-
tween direct and reverberated sound. We locate an instrument’s direction through its
placement between audio channels. Doppler shifts are critical cues to a moving
sound. By dealing with all of the considerations discussed in this section, designers
can create comprehensive instruments that are dynamic, expressive, and musical. For
real-time synthetic instruments, this complete connection between performance and
sound is most important.
The goal of acoustic viability is to create synthetic instruments that behave realisti-
cally, by connecting performance and timbre to the underlying synthesis processes.
When these principles are applied to synthesis in nonintuitive ways, interesting and
sometimes provocative results can occur.
In instr 607, transfer-functions, in the form of lookup-tables, are used to translate am-
plitude and octave information into scaling factors for attack, decay, and modulation-
index. The functions were designed to replicate intuitive connections between all the
elements. When transfer-functions that represent nonintuitive translations are used,
exciting possibilities of impossible (but viable) instruments arise.
For example, if we invert the connection between amplitude and envelope rates (us-
ing the design in instr 607), the envelope rates would slow down with higher amplitudes
and speed up at lower amplitudes. Such an unlikely scenario would still be acousti-
cally viable in that a specific connection between parameters remains. Surprisingly,
the perception of that connection is enough to maintain its expressive quality.
One of Csound’s major strengths is the flexibility of its f-tables. By using f-tables as
transfer-functions, designers are able to carefully sculpt relationships between pa-
rameters that are not simply linear or exponential calculations. The functions can be
arbitrary breakpoint functions of linear or exponential values, polynomial expres-
sions, spline curves or sets of raw data.
Because these transforms are stored in f-tables, they can be easily changed ac-
cording to any criteria. And this flexibility leads to new and complex interactions
that are infinite in possibility.
Conclusion
The art of sound synthesis has many levels of complexity and subtlety. Finding the
right combination of techniques and variables can be a daunting and overwhelming
task. Designing sounds simply by trial-and-error is quite often a futile effort. Acous-
tic viability presents a set of guidelines and strategies for designing synthetic instru-
ments that are psychoacoustically coherent and musically satisfying. For the digital
performer, acoustically viable instruments are intuitively responsive. For the com-
poser, these instruments are compelling and musical. And for the listener, the sounds
created by these instruments are profoundly real, whether they are recreations of
acoustic instruments, or completely new sonic entities. By using acoustic viability,
composers can communicate expressive musical ideas using synthetic instruments
with the power and emotion of the traditional orchestra.
References
Campbell, M., and C. Greated. 1987. The Musician’s Guide to Acoustics. New York: Schir-
mer Books.
De Poli, G., A. Piccialli, and C. Roads. 1991. Representations of Musical Signals. Cambridge,
Mass.: MIT Press.
Dodge, C., and T. Jerse. 1985. Computer Music. New York: Schirmer Books.
Roads, C. 1996. The Computer Music Tutorial. Cambridge, Mass.: MIT Press.
Roads, C., and J. Strawn. 1985. Foundations of Computer Music. Cambridge, Mass.: MIT
Press.
Shatzkin, M. 1993. Writing For The Orchestra. Englewood Cliffs, N.J.: Prentice-Hall.
7 Designing Legato Instruments in Csound
Richard W. Dobson
The approach adopted in this chapter is that of an instrumentalist teaching the com-
puter, so to speak, how to sing and articulate a short melody. In the process, both
technical (programming-related) and musical (expression and style-related) issues
will be explored. As so often in instrumental teaching, the emphasis will be less on
the sound per se and much more on control, for it is skill in control that enables an
artist to sing with expression, however simple the instrument may be. Although the
nominal target instrument is the flute, I do not attempt to present more than a token
synthesis of its sonic characteristics. The accurate synthesis of any instrument is
inevitably complex (and well explored in other chapters of this book) and in the
present context it would obfuscate the control-oriented approach presented here. It
will be sufficient to provide just enough to draw out specific aspects of expressive
control and to demonstrate the general principles involved. Artists with a desire to
express through music naturally seek out instruments that offer multiple levels of
control. We need to provide these levels; it is a secondary issue whether the instru-
ment we design bears any recognizable relation to anything existing out there.
Legato
pitch and the decay stage only for the last. Such a phrase, therefore, comprises at least
two notes.
Music notation also defines the tie, in which two notes of the same pitch are slurred
or tied together. This typically arises for purely notational reasons, for example,
when a note at the end of one measure is tied over the bar-line, or a note of irregular
duration is required, but can also arise as a compositional device, where the com-
poser wishes to specify some unusual expressive timing that cannot be adequately
written for a single note.
It goes almost without saying that a keyboard instrument, such as the piano, is
incapable of this form of legato, since it consists of a large set of distinct and mechan-
ically unrelated sound generators (the string and the hammer), whereas vocal legato
requires a single mechanism permitting variation of pitch and volume. Keyboard
players can only suggest legato by slightly overlapping successive notes. Some de-
gree of legato control was possible on the early monophonic synthesizers and this
facility has been carried over to modern polyphonic instruments in the form of MIDI
“mono mode.” This provides the essential elements of legato control, such as porta-
mento and the takeover of one note by another within a single envelope. A full legato
performance is limited, however, by the number of independent continuous control
streams made available to the player, such as foot pedals, modulation controls and
key pressure. MIDI itself does not directly support vocal legato—these controls have
to be encoded as System-Exclusive instructions for a particular instrument. This is
not really a criticism of MIDI, since it was defined primarily for keyboard instru-
ments and for real-time performance; and whereas human performers routinely read
(or think, in the case of an improvisation) several notes ahead, MIDI instruments
themselves cannot.
The ability to look ahead is essential for the expressive elements of legato play-
ing—the way one note is played depends on what the next note or notes will be. A
player will make a crescendo from the first note through several more, to a target
note, for example. Such a crescendo may have been marked by the composer; thus
both the composer and performer are thinking not in terms of single notes, but in
terms of whole phrases or gestures.
The primary technical requirement of a legato instrument is therefore that it
achieves continuity of loudness and pitch from note to note. This must be the case
even if the instrument we are modeling appears to make instantaneous changes of
pitch, as is the case for most wind instruments. We cannot apply truly instantaneous
changes of either amplitude or pitch to a waveform without causing an audible glitch.
Provision of continuity in turn leads to a consideration of context. Whereas the
boundary of a single note consists of the attack and decay stages, the boundary of a
slurred note consists of the connection to the previous and following notes. We will
need to provide this minimum context to the instrument.
To develop a musically useful instrument we also need to decide precisely how
we want to change amplitude and pitch from one note to another. Our design must
go beyond mere technical continuity to incorporate essential elements of expressive
performance. The instrument must be capable of playing a legato phrase of any
length, where each note may be at an arbitrary dynamic level and it should be able
to give each note an expressive dynamic shape. The most familiar such shape is the
swell or messa di voce. In the case of the flute, this is usually done by intensifying
and relaxing the vibrato. By varying the shape of the swell (making the peak early
or late), we can give a note a shape ranging between an accent or diminuendo (early
peak) and a crescendo (late peak), which, together with the symmetrical swell itself
(and its inverse), form the core expressive resources for most singers and players. If
necessary, more elaborate gestures can be constructed from these elements, at the
cost of increased complexity in the score. For example, wind-players may use a quick
crescendo at the end of a note as a substitute for portamento, which they ordinarily
cannot execute. This can be scored using a two-note tie, the second note being given
the crescendo shape. Note that in this case no pitch ramp is required, and the ramp can be
bypassed in the instrument.
I have taken as my starting point for a legato instrument in Csound the special hold
feature of the duration parameter ( p3) of the standard score. To indicate a held note,
the duration parameter must be negative (this is a common programming trick—to
use an otherwise impossible or meaningless value to indicate something special).
The important point to understand is that, just as in the case of common musical
notation, a slur or tie is a relationship between at least two notes. A single note coded
as a tie is meaningless. Thus, if a score specifies only a single note with a negative
duration, no sound will in fact be generated. A second or final nonheld note must be
provided. Through the hold mechanism, this second note will take over the data space
of the held note. Thus the amplitude and pitch of the first note will be replaced by
that of the second and the net duration may be modified.
The hold feature is illustrated by the following orchestra and score, which also
makes use of some of Csound’s conditional statements, which here enable the orches-
tra to have a different behavior for the held and tied notes. For the latter, a simple
expressive swell is generated and added to the ongoing initial envelope. By ensuring that the swell starts and ends with zero amplitude, we avoid a glitch at the change-over—so long as we only use two notes.

Figure 7.1 Block diagram of instr 701, a legato instrument that supports ties.

Figure 7.2 Orchestra code for instr 701, a legato instrument featuring tival and tigoto opcodes.

Figure 7.3 Score file for instr 701 featuring the use of a negative p3 (-3) to enable Csound's "hold" feature.
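To make the mechanism concrete, the following minimal sketch plays a two-note slur. It is a simplified stand-in for instr 701 (whose full code appears in figure 7.2), with arbitrary envelope times and a plain sine wave in f 1:

        instr 1
idur    =       abs(p3)                 ; P3 IS NEGATIVE FOR THE HELD NOTE
itie    tival                           ; 1 IF THIS NOTE TAKES OVER A HELD NOTE
        tigoto  skip                    ; ON A TIED NOTE, DO NOT RE-INITIALIZE THE ENVELOPE
kamp    linseg  0, 0.05, p4, idur-0.1, p4, 0.05, 0  ; MUST BE LONG ENOUGH FOR BOTH NOTES
skip:
asig    oscili  kamp, cpspch(p5), 1, -1 ; IPHS = -1 PRESERVES THE OSCILLATOR PHASE
        out     asig
        endin

f 1 0 8192 10 1
i 1 0 -3 10000 8.00     ; NEGATIVE P3: HOLD THIS NOTE
i 1 2  1 10000 8.04     ; TAKES OVER THE HELD NOTE; ITS AMPLITUDE IS IGNORED

Because the envelope of the held note is never re-initialized, its duration (3 seconds here) must cover both notes of the slur.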
Initialization
When an instrument is initialized, the internal parameters of each unit generator are
reset to their default values and any local memory (e.g., in delay lines and filters) is
cleared. Other parameters are then read from the orchestra. In the current context,
this means that each time our instrument starts, the phase of the oscillator is reset to
default zero. If we need to, we can tell Csound to start the phase at a specific point.
For example, setting the initial phase to 0.25 would turn a sinewave (starting at zero
amplitude) into a cosine wave (starting at maximum amplitude).
By setting this optional argument to a negative value, such as -1 (another example
of the common programming trick referred to above), we can stop Csound from re-
setting the phase, so that for the second note, the oscillator starts from the phase at
which the previous note finished. This is not pitch continuity—we need to provide a
connecting portamento for that—but without this phase continuity we would almost
always get a bad amplitude glitch at the change-over. In fact, whenever we design le-
gato instruments, we will need to prevent reinitialization for any unit generators that
retain sample or phase data. This includes all oscillators, filters, delay lines, reverbs,
and so on.
In the example above, you will see that the conditional instruction tigoto has been
used. In Csound, tigoto is equivalent to “on a tied note, igoto . . .” but it only applies
during the initialization stage and is used here to bypass the re-initialization of the
primary envelope katt. In this case it will simply continue to decay as before (its
level and decay rate are remembered, deep within Csound). The interesting result of
this is that, even if you set the amplitude of the second note to zero, it will sound
with zero expression, continuing the envelope of the first note. Csound does not have
a matching inverse test (i.e., “if not a tie, goto . . .”) so it has to be coded explicitly
as shown, making use of the value obtained from tival. Note that all that is needed
to turn off the take-over mechanism is to make p3 positive; in this case two overlap-
ping notes will sound, as usual, each with a complete envelope, but with no swell.
Normally, tigoto and igoto are used to bypass other initialization statements—to
bypass unit generators would result in an opcode not being initialized, in the worst
case causing an error. For example, if igoto is substituted for kgoto in the second
test, an error will result, since linseg will not have been initialized. In this case how-
ever, we are relying firstly on the fact that katt has already been initialized by the
held note and secondly that its duration (as passed from the score) is enough to sup-
port both notes of the slur.
Figure 7.4 Score file featuring pp and np directives to refer to previous and future parame-
ter data.
idur = abs(p3)
iamp = p4 ; OR ZERO FOR FIRST NOTE
ipch1 = cpspch(p6)
ipch2 = cpspch(p5) ; (0->)p4->p7
amp linseg iamp, 0.05, p4, idur-0.1, p4, 0.05, p7
iport = 0.1 ; INITIAL 100MSEC PORTAMENTO
kpch linseg ipch1, iport, ipch2, idur-iport, ipch2
Figure 7.5 Amplitude and pitch envelope segment of orchestra code for instr 702, the legato
instrument for performing score shown in figure 7.4.
Figure 7.6 Orchestra code for instr 702, a more elaborate legato instrument featuring con-
ditional init block, portamento, and slurring capabilities plus dynamic crescendo and
diminuendo.
first note. It also adjusts ramp durations for short notes down to 10 milliseconds
(msec.)—grace notes shorter than that are probably errors. Note also the change to
the amplitude ramp, in order to preserve the audio quality for the short ramps.
A long 100 msec. portamento has been set so that the effect can be heard clearly.
For the flute model, iport should be reduced to anything between 10 and 25 msecs.
If you want to play with this further, use a new p-field to set a value for iport. Simi-
larly, to manage long crescendi and diminuendi, p-fields for iatt and/or idec can also
be set from the score.
With this instrument the core technical components of legato control have been
completed. It can be regarded as a template from which it is possible to develop a
variety of instruments and performance styles. You can experiment with the ampli-
tude ramp by changing the times to various combinations of short and long segments,
with dynamic changes from note to note (including ties) and with portamento. The
one important thing is to ensure that the total of the times in the envelopes adds up
to the overall duration (idur), otherwise linseg will give unwanted results.
Expression Controls
Figure 7.8 Score file with swells, slurs, and context sensitivity.
volume is tremolo, though in music the latter term is associated primarily with rapid
reiteration of notes, such as up and down bowing on a violin. Confusingly, it also ap-
plies to the rapid alternation of pitches more than a second apart, which would other-
wise be a simple trill). On the flute, vibrato is almost all variation of volume and
should, strictly speaking, be called tremolo, whereas on the violin it is all pitch vari-
ation. Following is a description of the different elements that make up a vibrato.
■ Speed: between 4–8 Hz; the faster rates are reserved for short notes, or moments.
For a wind player this is physically demanding. It is common for the speed to vary
not only through a phrase, but also within a note.
■ Depth: shallow for low to moderate intensity, deep for power. The flute has a poor
dynamic range in comparison with other instruments; a progressive deepening of
vibrato can be used where a crescendo is marked—the ear hears the peak level of
the vibrato as the level of the note. It is usually a bad idea to combine deep vibrato
with high speed.
■ Fast notes: depends on the instrument. On the violin, the fingering and vibrato
mechanisms are the same, so that it is not feasible to apply vibrato to fast passage-
work, though it can be added to single short notes. On the flute, vibrato and finger-
ing are independent, so it is possible, in principle, to play fast passage-work while
adding vibrato.
■ Timing: on the flute, vibrato will normally start from the start of a note, though
delayed vibrato may be used occasionally. Most usually, vibrato will start off slow
and shallow, increasing in both dimensions as the note develops. A player cannot
calculate in advance how fast a vibrato must be to fit exactly into a note. Rather, a
comfortable and expressive speed is used and a minimal adjustment (speeding up
or slowing down) is made toward the end of the note. Vibrato may or may not be
inhibited for fast notes.
■ Evenness: vibrato is as vulnerable to human irregularities as are most other ele-
ments of performance. Strictly speaking, therefore, a fully realistic implementa-
tion of vibrato must include random variation of both amplitude and frequency.
This is avoided in the examples presented here, not only for the sake of simplicity,
but also because I found that, when developing this model, even the simplest
oscillator-based vibrato can sound acceptably dynamic and nonmechanical, when
combined with the simple expression control described here. Excessive use of ran-
dom modulation can merely lead to the synthesis of bad playing.
To implement all these aspects would involve considerable complexity in both
instrument and score. I confine myself here to a relatively simple, fixed implementa-
tion that, when combined with the dynamic swell already in place, will nevertheless
enable melodic lines of considerable expressiveness to be produced. It is, in any case,
easy enough to add further p-fields, for example, to change vibrato speed.
The most important technical issue is how the vibrato is added to the main tone.
We already have a swell envelope combined additively, so this would appear to be
equally reasonable for vibrato. Indeed, it can be done this way and in most cases it
can appear to work reliably, though it will always require amplitudes to be set care-
fully. The presence of two additive layers, however, will increase the danger, in a
legato instrument, of amplitude discontinuities. Instead, recalling that on the flute,
dynamic inflections are projected through vibrato, we can multiply the expression
envelope (which always starts and ends at zero) with the vibrato signal (which must
lie within the range 0–1), so that we keep with one safe additive layer. Although this
prevents any glitches resulting from phase discontinuities, we nevertheless will need
to preserve phase across single-pitch ties. In the orchestra excerpt from instr 704
shown in figure 7.9, the vibrato is turned off for short notes.
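In outline, the multiplicative scheme amounts to the following few lines, essentially those that reappear in instr 705 in figure 7.10 (f 2 is assumed to hold a sine wave):

avib    oscili  0.5, 4.5, 2, 0.25    ; VIBRATO AT 4.5 HZ, COSINE PHASE, RANGE -0.5 TO +0.5
avib    =       avib+0.5             ; SHIFT INTO THE SAFE RANGE 0 TO 1
aslur   =       aslur*avib           ; EXPRESSION ENVELOPE (ZERO AT BOTH ENDS) CARRIES THE VIBRATO
aamp    =       amp+aslur            ; A SINGLE ADDITIVE LAYER ON TOP OF THE MAIN ENVELOPE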
One small detail worth noting here is the phase offset set for the vibrato. Assuming
a sinewave shape, this turns it into a cosine, ensuring that vibrato is always applied
with the maximum amplitude at the start of the note. It is well worth experimenting
with this offset—the differences are audible but subtle and you may feel that this is
another control worth managing from the score.
Figure 7.9 Orchestra code excerpt from instr 704 with conditional vibrato control.
talk of a fine claret.” This keenness of aural discrimination and evaluation is no less
important to the electroacoustic composer.
For our immediate purposes then, it is less important to achieve a truly accurate
timbral synthesis of the flute than to provide in principle similar properties—a tone
that can be varied timbrally and in which the attack has special identifying character-
istics. Flute players are often taught in their early stages that they “must start notes
with the tongue.” Though this might be true sometimes, advanced players do not
tongue simply because they have to, but often because they want to. The sound of a
tongued note on the flute is subtle and distinctive and a musical pleasure in its own
right, when properly cultivated. It is particularly vivid on the lower-voiced instru-
ments, such as the alto and bass flute, both of which are widely felt to exhibit an es-
pecially vocal character.
This “chiff ” is a complex, semichaotic sound that I have merely approximated here
by mixing a shaped noise burst with the main tone (it could as well be the bow noise
for a violin, or the stick noise for a drum). Since my model is based on the sound of
a classically trained player, the chiff is discreet—the flute tone found on most syn-
thesizers and samplers is modeled on that of the typical jazz doubler whose first
instrument is the saxophone and whose sound consists of as much noise as tone.
I have distinguished two components: a fixed high frequency noise burst and a
pitched component that is more evident in the low register. I further distinguish the
attack note from tied notes by lengthening the attack ramp itself if the note is long
enough to leave room for the chiff. The chiff is therefore highly context-sensitive
and requires further conditional statements, in addition to those already provided.
These sometimes take the form of a selection from two alternatives depending on
another value, for which an especially compact conditional expression is available.
To suggest the thinning of timbre with register, a simple fixed filter is applied to
the output tone, to limit the high partials on notes above E, in the second octave. As
a final refinement I have also assigned the decay length idec from the score, requiring
a tenth p-field in the score, in order to set a long diminuendo on a final note.
Note that instr 705 includes two simple examples of “defensive programming.”
The first one deals with the case of p9 being set to zero in the score and the second
prevents an overly negative expression amplitude ( p8) from producing a net negative
amplitude. Both are really conveniences—for example, instead of setting some low
nonzero value for p9, I can write a neat zero and get the effect I want. Similarly, by
preventing a net negative amplitude whatever the value of p8, I can experiment with
p4 values without having to check p8 values too.
Although this seems to run counter to the prevailing style amongst Csound pro-
grammers, to rely on rigorously correct scores, it does reflect real-life instrumental
practice more than some composers may like to admit—there can be few performers
instr 705 ; FULL LEGATO INSTRUMENT WITH "CHIFF"
idur = abs(p3) ; MAIN INIT BLOCK
ipch1 = cpspch(p6)
ipch2 = cpspch(p5)
kpch = ipch2
iport = 0.02 ; TIGHT PITCH
iatt = 0.02 ; AND AMPLITUDE RAMPS
idec = p10 ; GET DECAY FROM SCORE
irise = idur*p9 ; SET SWELL PEAK POSITION
; ... (IFALL SET LATER)
idovib = 1 ; ASSUME WE USE VIBRATO
icut = (p5 > 9.01 ? 4000 : 2500) ; TRIM HIGHEST PARTIALS
; ASSUME THIS IS A TIED NOTE
iamp = p4 ; TIED NOTE STARTS AT SCORE AMP
i1 = -1 ; PHASE FOR TIED NOTE
i2 = -1 ; PHASE FOR VIBRATO
ir tival ; TIED NOTE? ; CONDITIONAL INIT BLOCK
tigoto tie
i1 = 0 ; FIRST NOTE, RESET PHASE
i2 = 0.25 ; COSINE PHASE FOR VIBRATO
iamp = 0 ; SET START AMP
iatt = 0.08 ; STRETCH ATTACK IF FIRST NOTE
tie:
iadjust = iatt+idec ; LONG NOTE, WE'RE SAFE
if idur >= iadjust igoto doamp ; ADJUST RAMP DURATIONS FOR
iatt = (idur/2)-0.005 ; ... SHORT NOTES, 10MSEC LIMIT
idec = iatt ; CAN'T HAVE ZERO TIMESPAN
iadjust = idur-0.01 ; (ENSURE ILEN != 0 FOR LINSEG)
idovib = 0 ; NO VIBRATO ON SHORT NOTES
iport = 0.005 ; EVEN TIGHTER PITCH RAMP
doamp:
ilen = idur-iadjust ; MAKE AMPLITUDE RAMP
amp linseg iamp, iatt, p4, ilen, p4, idec, p7
; ADD CHIFF ON FIRST NOTE
if ir == 1 goto pitch ; NO CHIFF ON TIED NOTES
ichiff = p4/10 ; MATCH CHIFF TO VOLUME OF NOTE
; BALANCE CHIFFS WITH REGISTER
ifac1 = (p5 > 9.01 ? 3.0 : 1.) ; (AVOID DIVISION AT AUDIO...
ifac2 = (p5 > 9.01 ? 0.1 : 0.2) ; ...RATES)
aramp linseg 0, 0.005, ichiff, 0.02, ichiff*0.5, 0.05, 0, 0, 0
anoise rand aramp
achiff1 reson anoise, 3000, 500, 1, 1 ; 2 FILTERS FOR FIXED HF CHIFF,
achiff2 reson anoise, 6000, 1000, 1, 1 ; ... WITH RESCALING
achiff3 reson anoise, ipch2*2, 20, 0, 1 ; ONE FILTER FOR PITCHED CHIFF,
achiff = (achiff1+achiff2)*ifac1+(achiff3*ifac2)
pitch: ; MAKE PITCH RAMP
if ir == 0 || p6 == p5 kgoto expr ; SKIP PITCH RAMP IF 1ST NOTE OR SAME-PITCH TIE
kpramp linseg ipch1, iport, ipch2, idur-iport, ipch2
kpch = kpramp
expr: ; MAKE EXPRESSION ENVELOPE
; p8 SETS PEAK OF EXPRESSION POINT, p9 MOVES IT; IF p9 == 0 (ILLEGAL FOR LINSEG)...
irise = (p9 > 0. ? irise : iatt) ; ...SET MAXIMUM ACCENT SHAPE
ifall = idur-irise
; MAKE SURE A NEG p8 DOES NOT TAKE AMP BELOW ZERO
p8 = ((p8+p4) > 0. ? p8 : -p4)
aslur linseg 0, irise, p8, ifall, 0 ; MAKE VIBRATO
if idovib == 0 goto play ; SKIP VIBRATO IF SHORT NOTE
avib oscili 0.5, 4.5, 2, i2 ; MED SPEED, ASSUME SINE F2; i2 PRESERVES PHASE ON TIES
avib = avib+0.5
aslur = aslur*avib
play: ; MAKE THE NOTE
aamp = amp+aslur
aflute oscili aamp, kpch, 1, i1 ; TRIM PARTIALS OF HIGH
asig butterlp aflute, icut, 1 ; ... NOTES, NO REINIT
out asig+achiff*0.25 ; FINAL SCALING OF CHIFF
endin
Figure 7.10 Orchestra code for instr 705, full legato instrument with added “chiff” attack
noise.
Figure 7.11 Score file for first line of Debussy’s “Syrinx” to be performed by instr 705
shown in figure 7.10.
who have not, at some time, discreetly rewritten some passage to make it more
manageable, or at all playable.
This instrument will play the test score shown in figure 7.8, provided the second
function table ( f 2) is added for the vibrato and the extra p-field for the decay is supplied. But
for those who would like to test the instrument with a piece of real music, I have
added in figure 7.11 an interpretation of the first line of Debussy’s Syrinx for solo
flute. The expression is idiosyncratic in places, but it does illustrate all the spe-
cial facilities described above. The final note is coded as a tie, to enable a delayed vi-
brato, concluding with a longish diminuendo. Note also the difference in timbre for
the low D flat, at the end of the first bar.
The remaining unanswered question is, of course, why these values and not oth-
ers? Some patterns can be explained as pronunciation: accent the first note of a slur
and soften the rest (especially for an appoggiatura, for example, as at the end of the
first score example above). But as the size of the slur increases other “rules” can take
over, such as the need to make a crescendo to the next strong note. Other gestures,
such as the inverse swell, are less explainable, except perhaps by reference to those
spoken (or sung) at times of particular emotion, or simply by the need for variety.
Certainly there is an infinite number of ways of playing even this first line. The pub-
lished edition conceals the fact that the music was originally composed as incidental
music for a play and was written without bar-lines or expression marks. Accordingly,
players tend to adopt a particularly free approach to its interpretation. For example,
it is almost traditional amongst players schooled in the French style (as taught by
Marcel Moyse) to accelerate vigorously through the first bar. Though this may be
done purely for expressive reasons, there is no doubt that it also makes the line easier
to play in one breath. The values used above correspond to a fairly conservative and
literal rendering of the published score and reflect an accordingly restrained style.
Some players prioritize tone and their own favorite gestures so much that even pro-
nunciation is sometimes foregone. In short, it is tempting to invent all manner of
pseudo-reasons this or that note is played in this or that way. Other players will do
some things differently; the reader is encouraged to experiment widely and to listen
to a variety of different performances.
Conclusion
8 Contiguous-Group Wavetable Synthesis of the French Horn in Csound
Andrew Horner and Lydia Ayers
This chapter introduces a wavetable synthesis model for the French horn in Csound.
The model is intuitive, since it uses a single spectrum to describe each part of the
horn’s range. Predefined amplitude envelopes control the desired articulation and en-
sure that the tone gets brighter with increasing amplitude.
The Horn
The French horn is probably the most mysterious of the brass instruments. Its ethe-
real sound is due in part to the bell being pointed toward the back wall instead of
directly at the audience. The horn has a conical bore wrapped tightly in a circle of
valves and tubes. The horn has a much more mellow sound than the trumpet and
trombone, more closely related to the tuba, which is also conical. The horn plays
prominent roles in orchestras, brass quintets, woodwind quintets and as a solo instru-
ment in hunting calls and fanfares.
The French horn has a much wider range than most people realize (from F2 to F5
on a MIDI scale), though the lower register of the horn is seldom used. The instru-
ment requires considerable skill, since it is easy to miss notes because the upper reg-
ister uses higher natural harmonics than the other brass instruments.
Previous work on modelling wind instruments has primarily focused on simulat-
ing the trumpet (Risset and Mathews 1969; Chowning 1973; Morrill 1977; Moorer,
Grey and Strawn 1978; Beauchamp 1982). The horn has not received as much atten-
tion, despite its interesting and distinctive character.
This chapter develops a wavetable synthesis model for the horn. An expressive
Csound implementation of the model is also given along with a transcription of the
opening horn solo in Strauss’s Til Eulenspiegel’s Merry Pranks.
The Model
Our wavetable synthesis model uses a spectrum to determine the steady-state re-
sponse and one amplitude envelope to control the articulation. We use a special case
of wavetable synthesis, contiguous-group wavetable synthesis, inspired by the idea
of group synthesis using disjoint sets of harmonics in each wavetable (Kleczkowski
1989; Eaglestone and Oates 1990; Cheung and Horner 1996). Figure 8.2 is a block
diagram of our contiguous-group synthesis instrument. In our model, we allocate
the fundamental to group 1, harmonics 2 and 3 to group 2, harmonics 4–7 to group
3 and harmonics 8 and above to group 4. This contiguous grouping is perceptually
motivated, roughly corresponding to the division of the frequency range by critical
bands (Zwicker and Terhardt 1980).
We find a single representative steady-state spectrum from the original tone and
then break it into disjoint groups of harmonics. How do we decide which steady-
state spectrum is most representative? One method is picking the brightest spectrum.
Another method is picking the least bright spectrum. Unfortunately, either method
can select an unrepresentative spectrum if the original tone is unusually bright or
dark at times.
A more successful strategy is picking a spectrum with average brightness to ensure
that we do not get an extreme solution. Optimization is also possible, although we
have found the average brightness solution to be similar and to require much less
computation. We can intuitively imagine the amplitude envelopes for these contigu-
ous groups as equalization levels for the different frequency bands. We could use
least squares to construct envelopes that would best match the original tone (Cheung
and Horner 1996).
This assumes, however, that the original tone has the articulation we want in resyn-
thesis. For wind instruments, articulation seems to vary more owing to musical con-
text than from instrument to instrument. After looking at dozens of sustained tones,
we devised the prototype amplitude envelope shown as ampEnv1 in figure 8.3.
We use this envelope for wavetable 1, which contains the fundamental. For the
other groups, we use exponentially-related envelopes. For group 2, we square the prototype amplitude envelope, for group 3 we cube it and for group 4 we raise it to the fourth power.

Figure 8.3 Exponentially related amplitude envelope set derived from prototype envelope ampEnv1.
These relationships ensure that the tone gets brighter as it reaches the peak ampli-
tude value and gradually gets darker during the decay. This approximation certainly
loses some spectral nuances, but it guarantees a smooth spectral evolution with the
brightness characteristic of acoustic wind instruments. Also, even though the group
1 amplitude envelope contains linear segments, the exponential relationship of the
other groups makes the attack and decay exponential.
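A minimal sketch of this scheme might read as follows; the variable names, timing values and wavetable numbers (iwt1-iwt4) are illustrative assumptions rather than the authors' instr 801:

kenv1   linseg  0, iatt, 1, idur-iatt-idec, 1, idec, 0   ; PROTOTYPE ENVELOPE FOR GROUP 1
kenv2   =       kenv1*kenv1                ; GROUP 2 (HARMONICS 2-3): SQUARED
kenv3   =       kenv2*kenv1                ; GROUP 3 (HARMONICS 4-7): CUBED
kenv4   =       kenv2*kenv2                ; GROUP 4 (HARMONICS 8 AND UP): FOURTH POWER
a1      oscil   kenv1*iamp, ifreq, iwt1    ; ONE WAVETABLE PER CONTIGUOUS GROUP
a2      oscil   kenv2*iamp, ifreq, iwt2
a3      oscil   kenv3*iamp, ifreq, iwt3
a4      oscil   kenv4*iamp, ifreq, iwt4
asig    =       a1+a2+a3+a4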
So far we have only discussed how to match a single tone from an instrument. To
represent spectral changes over the instrument’s pitch range, we use the same trick
as most synthesizers by matching about two pitches from each octave. For un-
matched pitches, we simply pick the nearest matched pitch and use its wavetables
and amplitude envelopes with the desired fundamental frequency.
The Implementation
To verify the usefulness of our horn design, we created a Csound orchestra of the
group synthesis model and derived the following parameters:
■ p4 = pitch in Hz (normal pitch range: F2–F5 (sounding))
■ p5 = overall amplitude scaling factor
■ p6 = percent of vibrato depth, recommended range [0, .5], 0 = no vibrato, 1 = 1% vibrato depth
■ p7 = attack time in seconds, recommended range [.03, .12]
■ p8 = decay time in seconds, recommended range [.04, .25]
■ p9 = overall brightness/filter cutoff factor, 1 being the least bright/lowest filter cutoff frequency (40 Hz) and 9 being brightest/highest filter cutoff frequency (10,240 Hz)
Let us examine the Csound instrument in detail. After the global variables, a few
lines of comments describe the parameter fields (p-fields) p4–p9 and their appro-
priate values. The lines immediately following move the p-values into i-variables,
making reordering of the p-fields easy if desired. The variable iamp ( p4) controls
the approximate maximum amplitude the instrument will reach on a sustained note.
The variable ifreq ( p5) contains the fundamental frequency. Avoid notes higher than
F5, since aliasing can occur as the wavetable’s highest harmonics go beyond the
Nyquist frequency (half the sampling rate).
giseed = .5
garev init 0
Figure 8.4 Orchestra code for instr 801, an example of contiguous-group wavetable synthe-
sis of French Horn.
The variable ivibd ( p6) controls the maximum depth of vibrato relative to the
fundamental frequency. A value of 1.0 indicates a 1% vibrato relative to the funda-
mental frequency (a strong vibrato), while .5 represents a .5% vibrato (a moderate
vibrato).
Attack and decay times are stored in iatt ( p7) and idec (p8), respectively. Attack
times for the horn range from about .03 seconds for the higher pitches to .12 seconds
for lower pitches. Decay times range from .04 seconds to as much as .25 seconds.
Finally, ifiltcut stores a lowpass filter cutoff frequency, determined by p9 as a
brightness index between 1 (least bright) and 9 (most bright), where each increment
increases the cutoff frequency by an octave. The brightness factor allows softer dy-
namics to be played with less brightness.
Next, after enveloping the vibrato depth, the vibrato rate increases from about 3
Hz to about 6 Hz over the duration of the tone. The changing vibrato rate helps the
instrument sound more natural, as performers typically use dynamic vibrato rates to
push along the phrase. Since we know the note durations in advance in Csound,
we can push the rate appropriately according to the note length (something you can-
not do on a real-time synthesizer). Small random perturbations are also added to the
vibrato rate.
When we were first designing this instrument, Csound did not have an i-time ran-
dom number generator, so we used the statement shown in figure 8.5 to quickly
generate pseudo-random numbers between zero and one.
Defining giseed as a global variable prevents it from being re-initialized on each
note. We generated 1,000,000 values using the above formula to check the distribu-
tion of the two most significant digits. Figure 8.6 shows that the distribution is basi-
cally uniform, indicating that the random number generator is good enough for our
purposes. We use giseed throughout the instrument design to make each note unique.
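Using the chapter's variable names (ivibd for the vibrato-depth p-field and ifreq for the fundamental), a sketch of the giseed-perturbed, rising vibrato rate could look like this; the size of the perturbation and the single-segment ramp are assumptions:

giseed  =       frac(giseed*105.947)       ; ADVANCE THE GLOBAL SEED (KEPT BETWEEN NOTES)
irate1  =       3 + (giseed-0.5)*0.4       ; RANDOMIZED STARTING RATE, ROUGHLY 3 HZ
giseed  =       frac(giseed*105.947)
irate2  =       6 + (giseed-0.5)*0.4       ; RANDOMIZED FINAL RATE, ROUGHLY 6 HZ
kvrate  linseg  irate1, p3, irate2         ; RATE RISES OVER THE NOTE DURATION
kvib    oscili  ivibd*ifreq*.01, kvrate, 1 ; DEPTH AS A PERCENTAGE OF THE FUNDAMENTAL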
After the vibrato code, the amplitude envelopes are defined and the pitch range
of the current note is determined. The pitch range determines which wavetables
and amplitude normalization factor to use. The oscil statements control wavetable
lookup. The scaled output of each wavetable is summed into asig, which is then
normalized to the desired level.
Next, the signal goes into a lowpass filter and is sent to the reverberator in instr
899. Since the horn is pointed toward the back when played, the listener rarely hears
the direct sound from the instrument. Thus, without reverberation, the instrument
sounds unnatural to most people, but quite normal to players used to practicing in
small rooms with absorptive walls. We use a 10% reverberation of 1.2 seconds to
simulate the sound of a concert hall. Finally, we pass the signal to the output.
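A minimal global-reverb instrument of the kind described might look like the following. This is a sketch only: the exact routing of the authors' instr 899 may differ, and the 10 percent send is assumed to happen in instr 801 (for example, garev = garev + asig*0.1):

        instr 899
arev    reverb  garev, 1.2        ; 1.2-SECOND REVERB TIME, AS DESCRIBED ABOVE
        out     arev
garev   =       0                 ; CLEAR THE GLOBAL SEND FOR THE NEXT CONTROL PERIOD
        endin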
giseed = frac(giseed*105.947)
Figure 8.5 Algorithm for generating an i-time pseudo-random number sequence.
Figure 8.6 Distribution of 1,000,000 values from our pseudo-random number generator.
Figure 8.7 Opening horn solo from Strauss’s’ Til Eulenspiegel’s Merry Pranks. The score
(for horn in F) sounds a fifth lower than written, except the bass clef notes that sound a fourth
higher than written.
The Proof
To verify the usefulness of our horn design, we scored the well-known horn solo
from Til Eulenspiegels’ Merry Pranks by Strauss as an example. The solo is unusual
in that it covers the full range of the instrument in a single phrase as shown in fig-
ure 8.7.
Conclusion
Our wavetable synthesis model provides an elegant way to represent the horn. Com-
paring the spectra from each range to see how they differ allows them to be changed,
in order to produce horn-like timbres. It is possible to manipulate the few parame-
ters directly.
f 1 0 4097 -9 1 1.0 0
f 2 0 16 -2 40 40 80 160 320 640 1280 2560 5120 10240 10240
f 3 0 64 -2 11 12 13 52.476 14 15 16 18.006 17 18 19 11.274 20 21 22
6.955 23 24 25 2.260 26 27 10 1.171 28 29 10 1.106 30 10
10 1.019
f 4 0 2048 -17 0 0 85 1 114 2 153 3 204 4 272 5 364 6 486 7
f 10 0 5 -9 1 0.0 0
f 11 0 4097 -9 2 6.236 0 3 12.827 0
f 12 0 4097 -9 4 21.591 0 5 11.401 0 6 3.570 0 7 2.833 0
f 13 0 4097 -9 8 3.070 0 9 1.053 0 10 0.773 0 11 1.349 0 12 0.819 0 13
0.369 0 14 0.362 0 15 0.165 0 16 0.124 0 18 0.026 0 19
0.042 0
f 14 0 4097 -9 2 3.236 0 3 6.827 0
f 15 0 4097 -9 4 5.591 0 5 2.401 0 6 1.870 0 7 0.733 0
f 16 0 4097 -9 8 0.970 0 9 0.553 0 10 0.373 0 11 0.549 0 12 0.319 0 13
0.119 0 14 0.092 0 15 0.045 0 16 0.034 0
f 17 0 4097 -9 2 5.019 0 3 4.281 0
f 18 0 4097 -9 4 2.091 0 5 1.001 0 6 0.670 0 7 0.233 0
f 19 0 4097 -9 8 0.200 0 9 0.103 0 10 0.073 0 11 0.089 0 12 0.059 0 13
0.029 0
f 20 0 4097 -9 2 4.712 0 3 1.847 0
f 21 0 4097 -9 4 0.591 0 5 0.401 0 6 0.270 0 7 0.113 0
f 22 0 4097 -9 8 0.060 0 9 0.053 0 10 0.023 0
f 23 0 4097 -9 2 1.512 0 3 0.247 0
f 24 0 4097 -9 4 0.121 0 5 0.101 0 6 0.030 0 7 0.053 0
f 25 0 4097 -9 8 0.030 0
f 26 0 4097 -9 2 0.412 0 3 0.087 0
f 27 0 4097 -9 4 0.071 0 5 0.021 0
f 28 0 4097 -9 2 0.309 0 3 0.067 0
f 29 0 4097 -9 4 0.031 0
f 30 0 4097 -9 2 0.161 0 3 0.047 0
t 0 324
i 899 0 46
Figure 8.8 Score file for opening horn solo from Strauss's Till Eulenspiegel's Merry Pranks.
References
Cheung, N.-M., and A. Horner. 1996. “Group synthesis with genetic algorithms.” Journal of
the Audio Engineering Society 44(3): 130–147.
Chowning, J. 1973. “The synthesis of complex audio spectra by means of frequency modula-
tion.” Journal of the Audio Engineering Society 21(7): 526–534.
Eaglestone, B., and S. Oates. 1990. “Analytical tools for group additive synthesis.” In Proceed-
ings of the 1990 International Computer Music Conference. San Francisco: International
Computer Music Association, pp. 66–68.
Kleczkowski, P. 1989. “Group additive synthesis.” Computer Music Journal 13(1): 12–20.
Moorer, J., J. Grey, and J. Strawn. 1978. “Lexicon of analyzed tones (part 3: the trumpet).”
Computer Music Journal 2(2): 23–31.
Morrill, D. 1977. “Trumpet algorithms for computer composition.” Computer Music Journal
1(1):46–52.
Risset, J.-C., and M. Mathews. 1969. “Analysis of instrument tones.” Physics Today 22(2):
23–30.
Zwicker, E., and E. Terhardt. 1980. “Analytical expressions for critical-band rate and critical
bandwidth as a function of frequency.” Journal of the Acoustical Society of America 68(5):
1523–1525.
9 FM Synthesis and Morphing in Csound:
from Percussion to Brass
Brian Evans
An FM Review
Imagine a violinist adding a syrupy vibrato to a long, legato note. This is frequency
modulation. As the note is played the finger moves ever-so-slightly back and forth
on the violin string, maybe five to ten times a second. The pitch is slightly raised and
then slightly lowered, up and down, up and down.
Analog electronic music instruments simulate this type of vibrato with a low fre-
quency oscillator (LFO). The voltage output of the LFO adds to the frequency voltage of a voltage controlled oscillator (VCO), slightly raising and lowering the frequency of the VCO output as shown in figure 9.1. The modulating oscillator, or modulator, defines the vibrato. The oscillator to which the vibrato is applied is the carrier.

Figure 9.1 An analog synthesis patch that adds an output voltage from an LFO to the voltage controlling the frequency of a VCO. For FM synthesis, the LFO can be replaced with a VCO, allowing for modulating frequencies within the audio spectrum (greater than 20 Hz).
This simple vibrato patch gets especially interesting when we raise the frequency
of the vibrato above 20 Hz (the low-end frequency of human audio perception).
When this occurs we no longer hear vibrato, but rather we hear a distortion of the
tone being modulated. The defining characteristics of the carrier and modulating
waves determine the nature of the distortion. These defining characteristics, or pa-
rameters, are few, hence the economy of FM as a synthesis technique.
The primary parameters in FM synthesis are the frequency and amplitude values
of the carrier and the modulating oscillators (for the sake of simplicity we will as-
sume the waveshape for each oscillator is a sinewave). The parameters for a basic
FM instrument, as seen in figure 9.1 above, can be listed as:
■ c – frequency of the carrier.
■ m – frequency of the modulator.
■ amp – amplitude of the carrier (the amplitude control of the patch).
■ dev – deviation, the amount of frequency change (in Hz) above and below c (set by the amplitude envelope of the modulator).
Figure 9.2 This graphic representation of basic FM shows the modulating oscillator output
(which would be in Hz) being added to the frequency input of the carrier.
Figure 9.3 The carrier here is a simple sine wave of frequency c. FM sidebands are calcu-
lated as c ± nm, with n = the counting numbers 1, 2, 3, etc.
The ratio of the carrier frequency to the modulator frequency, commonly called
the c/m ratio determines the relative frequencies of the partials in the FM spectrum.
When c/m reduces to a simple rational number such as 1/2 or 3/4, the output spectrum
is harmonic. When c/m reduces to an irrational number such as 1/√2 or 2/√3, partials
are in a noninteger relationship and so the spectrum is inharmonic (in practice, com-
plex ratios such as 411/377 or 1/1.617 will create spectra that will sound inharmonic.
True irrational numbers can only be approximated digitally anyway).
The ability to create inharmonic spectra so easily is one of the attractions of FM
(again, refer to Chowning for the details as to how and why c/m ratios determine the
frequencies of the FM spectra components).
The relative amplitudes of the sideband frequencies are dependent on both the
deviation and frequency of the modulator, so it is convenient to think of these values
together as an index of modulation. This modulation index is calculated as the ratio
shown in figure 9.4.
With a deviation of zero there will be no modulation and so no sidebands around
the carrier. As the deviation increases, however, the amplitudes of the sidebands in-
crease as a function of the index.
The modulation index provides an intuitive measure when thinking about the FM
spectra. A higher index value generally indicates more sidebands and more energy
in the sidebands. In other words, the higher the index the more complex the spec-
trum, with the highest sideband of significant amplitude equal to the index+1 (or
a frequency of c+m(index+1)). Chowning shows that we can predict the relative
amplitudes of the sidebands mathematically (see Chowning 1973 or Dodge 1985).
In summary, an FM spectrum is harmonic when the c/m ratio is simple and rational
and inharmonic otherwise. A high index indicates a spectrum rich in partials, while
a lower index creates a simpler spectrum.
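The relationships described above can be summarized compactly (using dev for the deviation, as in the parameter list):

$e(t) = amp \cdot \sin\big(2\pi c\,t + I \sin(2\pi m\,t)\big), \qquad I = \dfrac{dev}{m}, \qquad \text{sidebands at } c \pm nm,\ n = 1, 2, 3, \ldots$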
Figure 9.4 Formula for computation of the FM "modulation index": index = modulation deviation / modulation frequency.
FM in Csound
Figure 9.5 Block diagram of instr 901. This is a graphic representation of a basic FM instru-
ment implemented in Csound. The ADSR envelopes control both the output amplitude of the
instrument and the modulation index (and the deviation indirectly through the index).
Figure 9.6 Orchestra code for instr 901, a Chowning FM instrument featuring dynamic
spectra. Note that all envelope (linseg) times are a percentage of the note duration ( p3).
Figure 9.7 The foscili opcode with arguments. An FM synthesizer in a single line of code.
cally to implement Chowning FM. These opcodes, foscili and foscil, define a unit
that is a composite of two oscillators. Parameters for these opcodes include c/m ratio
values, along with a common frequency value, FM index and output amplitude and
a function table (f-table) that defines the waveshape. As the c/m ratio values are input
in their lowest terms, the effective frequencies of the carrier and modulator are deter-
mined by multiplying c and m by the common frequency value. For example, the
line in figure 9.7 defines an FM instrument that will have an output amplitude of
20000 with a carrier frequency of 200 (freq = 100, c = 2), a modulator frequency
of 300 (m = 3) and an index of 5. The waveshapes for the oscillators in the unit
are defined in f 1, in the score file. (See the Csound Reference Manual for a full
description of this opcode).
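As an illustration, a single foscili line with the values just quoted would read as follows (the argument order follows foscili's standard signature; the output variable name is arbitrary):

asig    foscili 20000, 100, 2, 3, 5, 1    ; AMP, FREQ, CARRIER RATIO, MODULATOR RATIO, INDEX, F-TABLE
        out     asig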
Chowning’s Trumpet
Figure 9.8 This ADSR envelope controls both the output amplitude and the modulation
index of Chowning’s brasslike FM instrument.
sustain and a quick release. The spectrum will reach maximum complexity at the
end of the attack, sustain a complex harmonic spectrum and simplify as it ends. This
spectral envelope is also typical of a brass instrument.
Chowning’s Drum
Figure 9.9 a) Chowning uses an exponential envelope (the dotted curve seen here) to control
the amplitude of the percussive instrument. We approximate this envelope with a linear ADSR
envelope. b) This envelope controls the index of the percussive FM instrument. The index
starts with the high value and quickly lowers.
The exponential envelope for amplitude that was specified in Chowning’s instru-
ment definition is not possible with a linear ADSR envelope. In the Csound imple-
mentation we are using, the curved envelope is approximated as shown in figure 9.9a.
In general, the percussive envelope has an instantaneous attack, a rapid swell and an
equally rapid decay. Different p-values are given to illustrate a variety of per-
cussive sounds.
FM Morphing
Spectral morphing is the idea of evolving from one instrumental timbre to another.
Parametric interpolation in FM is one of the earliest examples of morphing and one
of the easiest to explore. Chowning used the technique in his first FM composition
Sabelithe, which dates from 1966 to 1971. Around 4 minutes and 50 seconds into
the piece a dramatic moment occurs where a drum slowly morphs into a trumpet.
Using the Csound versions of Chowning’s FM instruments we can simulate the
instruments he used in Sabelithe and explore a similar morph. We can use the param-
eters we have already defined to create a brass instrument and a drum. All we need
to do is create a Csound score that starts with a drum note and ends with a trumpet
note. The notes played between the start and end interpolate between the parameters,
defining the two different instruments. Using the linear ramping capabilities built
into the Csound score language (the "<"), a simple linear interpolation between the
parameters creates a morph from one instrument to the other.
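Schematically, such a score might look like the following; the p-field layout (amplitude, frequency, c, m, index) and the specific values are assumptions for illustration rather than the actual p-fields of instr 901:

;       START  DUR   AMP     FREQ   C    M      INDEX
i 901   0      0.5   10000   200    1    1.41   12      ; PERCUSSIVE SETTINGS
i 901   1      0.5   <       <      <    <      <
i 901   2      0.5   <       <      <    <      <
i 901   3      0.5   10000   440    1    1      5       ; BRASSLIKE SETTINGS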
Conclusion
This small exercise shows both the simplicity and power of FM as a musical tool and
how John Chowning’s ideas, instrument and compositions continue to have potency
for musicians looking for rich and ever new sonic worlds to explore.
References
Chowning, J. 1973. “The synthesis of complex audio spectra by means of frequency modula-
tion.” Journal of the Audio Engineering Society 21(7): 526–534.
Chowning, J. 1988. “Sabelithe.” John Chowning, WER 2012-50. Mainz, Germany: Wergo.
Dodge, C., and T. Jerse. 1997. Computer Music: Synthesis, Composition and Performance.
New York: Schirmer.
10 Modeling “Classic” Electronic Keyboard
Instruments in Csound
Hans Mikelson
This instrument pair simulates the sounds of a tone-wheel organ with a rotating
speaker effect like a Hammond B3 organ and a Leslie speaker. The first models
the tone-wheel organ and the second models the rotating speaker. The final section
discusses some suggestions for further modifications.
Tone-Wheel Organ
The first step toward reproducing this instrument is to consider the internal struc-
ture of the tone-wheel organ. The heart of the tone wheel organ is its tone wheel
mechanism. This consists of a central shaft with gears coupled to 91 tone wheels.
A magnetic pick-up senses the distance to the wheel and generates an electric sig-
nal proportional to this distance. Owing to the “flower” shape of the wheels and the
filtering effect of the pickups, the signal produced is close to a sine wave. GEN10
can be used to set up a sine table:
f 1 0 8192 10 1
An organ timbre is created by adding together signals from nine tone wheels. The
amount of each harmonic added is controlled by nine drawbars with the following
frequencies: Sub-fundamental, sub-third, fundamental, second, third, fourth, fifth,
sixth and eighth. The sub-fundamental is one octave below the fundamental. The
sub-third is one octave below the third. A separate oscil opcode can be used to gener-
ate each harmonic. The following Csound code fragment implements this aspect of
the sound:
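A simplified sketch of such a fragment follows, using equal-tempered ratios for the nine drawbar pitches, a single sine table, and no per-wheel phase offsets (all of which the full instrument refines); the drawbar p-fields are used directly as amplitudes here:

asub    oscil   p6,  ifqc*0.5,    1    ; SUB-FUNDAMENTAL (16')
asub3   oscil   p7,  ifqc*1.4983, 1    ; SUB-THIRD (5 1/3')
afund   oscil   p8,  ifqc,        1    ; FUNDAMENTAL (8')
a2nd    oscil   p9,  ifqc*2,      1    ; SECOND (4')
a3rd    oscil   p10, ifqc*2.9966, 1    ; THIRD (2 2/3')
a4th    oscil   p11, ifqc*4,      1    ; FOURTH (2')
a5th    oscil   p12, ifqc*5.0397, 1    ; FIFTH (1 3/5')
a6th    oscil   p13, ifqc*5.9932, 1    ; SIXTH (1 1/3')
a8th    oscil   p14, ifqc*8,      1    ; EIGHTH (1')
gaorgn  =       gaorgn + asub+asub3+afund+a2nd+a3rd+a4th+a5th+a6th+a8th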
In this instrument, parameter fields (p-fields) p6–p14 specify the values of the 9
drawbars. The fundamental frequency is given by ifqc. This is multiplied by equally
tempered steps to obtain the other eight harmonics.
The twelve lowest tone wheels have only two teeth and, in the Hammond B3 se-
ries, had increased odd harmonic content that may be interpreted as having a some-
what rectangular shape. To implement this, f-table 2 ( f 2) is set up with some odd
harmonic content:
f 2 0 1024 10 1 0 .2 0 .1 0 .05 0 .02
The keyboard key pressed can be calculated from the frequency parameter ( p5).
The key position is used to assign f-table 2 to the lowest twelve frequencies and
f-table 1 to the others:
iwheel1 init ((ikey-12) > 12 ? 1 : 2)
iwheel2 init ((ikey+7) > 12 ? 1 : 2)
iwheel3 init (ikey > 12 ? 1 : 2)
iwheel4 init 1
The phase of the tone wheels depends on the number of “petals” on a wheel and
the alignment of the wheels. Some continuous variation in phase, based on the wheel
number and the time ( p2), is desirable. The variable iphase is set to p2 and then di-
vided by the tone-wheel number. In a way, p2 represents the central shaft of the or-
gan. Finally, the fundamentals are added together into a global variable.
209 Modeling “Classic” Electronic Keyboards in Csound
The second aspect of the classic Hammond organ sound is the Leslie rotating
speaker. The speaker is powered by a tube amplifier. Structurally, the Leslie speaker
system consists of a high-frequency driver pointing up into a plastic horn. The driver
remains stationary and the horn rotates, spraying the sound around the room. A low
frequency driver directs sound downward into a rotating scoop.
The elements required to model this system were: distortion, introduced by the
amplifier, acceleration rates of the horn and scoop, the Doppler effects of the rotating
horn and scoop, different directional effects of different frequencies and stereo phase
separation, owing to either microphone placement or separation between the listen-
er’s ears.
The first step taken in simulating this system is to add distortion to the sound by
passing it through a waveshaping table. GEN8 is used to set up a table such that val-
ues close to zero will be reproduced linearly while values further away from zero are
compressed. This is done using one of the following tables:
f 5 0 8192 8 -.8 336 -.78 800 -.7 5920 .7 800 .78 336 .8
f 6 0 8192 8 -.8 336 -.76 3000 -.7 1520 .7 3000 .76 336 .8
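The summed organ signal is then read through one of these tables with tablei; the normalization factor below is an assumption:

aindex  =       gaorgn/40000           ; SCALE THE SUMMED SIGNAL INTO ROUGHLY -0.5 TO +0.5
adist   tablei  aindex, 5, 1, .5       ; INDEX MODE 1, OFFSET .5: READ DISTORTION TABLE F5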
Acceleration of the horn takes about one second. The slow and fast rates of the scoop
are about .7 and 7 cycles per second. The acceleration of the bass scoop takes about two
seconds. The following envelopes use the linseg opcode to set the rotation rates,
acceleration rates and hold times. This is set up in a separate instrument, which acts
as the foot switch that may be turned on using global variables:
gkenv linseg gispeedi*.8, 1, gispeedf*.8, .01, gispeedf*.8
gkenvlow linseg gispeedi*.7, 2, gispeedf*.7, .01, gispeedf*.7
The next step is to produce a Doppler effect that is due to the rotation of the
speakers. This is accomplished by setting up a delay line using the delay opcode and
accessing it with a variable time-delay tap using the deltapi opcode. The tap point is
controlled by an oscil opcode accessing a sine table. Separate taps are used for right
and left channels. The magnitude of the oscillation corresponds to the radius of
rotation:
koscl oscil 1, kenv, 1, ioff
koscr oscil 1, kenv, 1, ioff+isep
The phase separation between the signals is controlled by the variable isep. This
can be thought of as the angle of separation between the listener’s ears, or the place-
ment of microphones around the speaker. The variable ioff is a phase offset that is
intended to allow for multiple rotating speakers at different locations in the room.
The Doppler effect is repeated for the low frequency section:
kdopl = .01-koscl*.0002
kdopr = .012-koscr*.0002
aleft deltapi kdopl
aright deltapi kdopr
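For these taps to work they must sit between a delay-line read and write; a minimal arrangement is sketched below (the buffer length and the input signal name asig are assumptions):

adump   delayr  1              ; 1-SECOND DELAY BUFFER
aleft   deltapi kdopl          ; LEFT-CHANNEL DOPPLER TAP
aright  deltapi kdopr          ; RIGHT-CHANNEL DOPPLER TAP
        delayw  asig           ; WRITE THE (DISTORTED) ORGAN SIGNAL INTO THE LINE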
The center frequency and bandwidth were determined empirically. The high fre-
quency segment is modulated by a narrow pulse, using GEN7 to set up the follow-
ing table:
f 3 0 256 7 0 110 0 18 1 18 0 110 0
The middle frequency sound is modulated by a wider pulse and the low frequency
is modulated by a sine wave:
alfmid butterbp aleft, 3000, 2000
arfmid butterbp aright, 3000, 2000
The pulse table is swept with oscil using the same offset and separation as the
Doppler tap:
kflohi oscil 1, kenv, 3, ioff
kfrohi oscil 1, kenv, 3, ioff+isep
kalosc = koscllow*.4+1
karosc = koscrlow*.4+1
Finally, the three filtered and modulated signals for each channel are added to-
gether to complete the instrument.
Discussion
My orchestra and score as shown in figure 10.3 can only approximate the rich, warm,
gritty character of a real Hammond organ. However, one advantage of using the
Csound environment is its flexibility. The shape of the tone wheels can be changed
easily. Samples from single tone wheels of actual instruments could be set up in
tables. This may improve the simulation. Triangle waves or other waveshapes could
be used instead of the sine waves. The instrument could be extended to have any
number of additional speakers. Finally, variable delay lines could be set up to simu-
late the complex echoes produced by the rotating speaker system.
The following Csound instruments and tables attempt to model a classic analog syn-
thesizer instrument. To produce sound, early synthesizers used fairly straightforward
periodic waveforms that were then processed through a resonant lowpass filter. This
type of filter provided one of the most distinctive elements of the early synthesizer
sound. Development of a two-pole resonant lowpass filter and some variations of this
filter are discussed. Furthermore, the cascade form of a filter is discussed and a four-
pole filter, which uses it, is derived.
sr = 44100
kr = 2205
ksmps = 20
nchnls = 2
Figure 10.3 Orchestra and score code for instr 1001–1004, a tone-wheel organ (i1003) with
global rotating speaker (i1004).
alfhi butterbp aleft, 8000, 3000 ; DIVIDE FREQ INTO THREE
arfhi butterbp aright, 8000, 3000 ; GROUPS AND MOD EACH WITH
alfmid butterbp aleft, 3000, 2000 ; DIFFERENT WIDTH PULSE TO
arfmid butterbp aright, 3000, 2000 ; ACCOUNT FOR DIFFERENT
alflow butterlp aleftlow, 1000 ; DISPERSION OF DIFFERENT
arflow butterlp arightlow, 1000 ; FREQUENCIES.
kflohi oscil 1, gkenv, 3, iioff
kfrohi oscil 1, gkenv, 3, iioff+isep
kflomid oscil 1, gkenv, 4, iioff
kfromid oscil 1, gkenv, 4, iioff+isep
; AMPLITUDE EFFECT ON LOWER SPEAKER
kalosc = koscllow*.6+1
karosc = koscrlow*.6+1
; ADD ALL FREQUENCY RANGES AND OUTPUT THE RESULT
outs1 alfhi*kflohi+alfmid*kflomid+alflow*kalosc
outs2 arfhi*kfrohi+arfmid*kfromid+arflow*karosc
gaorgn = 0
endin
; SINE
f 1 0 8192 10 1 .02 .01
f 2 0 1024 10 1 0 .2 0 .1 0 .05 0 .02
; DISTORTION TABLES
f 5 0 8192 8 -.8 336 -.78 800 -.7 5920 .7 800 .78 336 .8
f 6 0 8192 8 -.8 336 -.76 3000 -.7 1520 .7 3000 .76 336 .8
t 0 200
; INS STA DUR AMP PIT SUBF SUB3 FUND 2ND 3RD 4TH 5TH 6TH 8TH
i 1003 0 6 200 8.04 8 8 8 8 3 2 1 0 4
i 1003 0 6 . 8.11 . . . . . . . . .
i 1003 0 6 . 9.02 . . . . . . . . .
i 1003 6 1 . 8.04 . . . . . . . . .
i 1003 6 1 . 8.11 . . . . . . . . .
i 1003 6 1 . 9.04 . . . . . . . . .
i 1003 7 1 . 8.04 . . . . . . . . .
i 1003 7 1 . 8.11 . . . . . . . . .
i 1003 7 1 . 9.02 . . . . . . . . .
i 1003 8 1 . 8.04 . . . . . . . . .
i 1003 8 1 . 8.09 . . . . . . . . .
i 1003 8 1 . 9.01 . . . . . . . . .
i 1003 9 8 . 8.04 . . . . . . . . .
i 1003 9 8 . 8.08 . . . . . . . . .
i 1003 9 8 . 8.11 . . . . . . . . .
i 1003 17 16 200 10.04 8 4 8 3 1 1 0 . 3
i 1003 20 13 200 9.09 8 4 8 3 1 1 0 . 3
i 1003 23 10 200 8.04 8 4 8 3 1 1 0 . 3
i 1003 26 7 200 7.04 8 4 8 3 1 1 0 . 3
; ROTATING SPEAKER
; INS STA DUR OFFSET SEPARATION
i 1004 0 33.2 .5 .1
f 1 0 1024 10 1 ; SINE
f 2 0 256 7 -1 128 -1 0 1 128 1 ; SQUARE
f 3 0 256 7 1 256 -1 ; SAWTOOTH
f 4 0 256 7 -1 128 1 128 -1 ; TRIANGLE
Figure 10.4 F-tables used to define the basic geometric waveshapes used in “classic” ana-
log synthesizers.
Oscillators
Oscillators from early analog synthesizers generated simple waveforms. Some of the
most commonly used are the sine, square, sawtooth and triangle waveforms. These
can be set up easily using f-tables, which are then accessed using the oscil opcode
(see figure 10.4).
Pulse-width modulation is often used in analog synthesis to produce a wider vari-
ety of timbres. With pulse-width modulation, a square wave is gradually transformed
from a narrow square pulse to a wide square pulse. A sine wave can be used to mod-
ulate the pulse-width. The rate of modulation can be set to increase gradually. The
following code accomplishes this:
ksine oscil 1.5, ifqc/440, 1
ksquare oscil ifqc*ksine, ifqc, 2
axn oscil iamp, ifqc+ksquare, itabl1
The opcodes rand and buzz can also be used as sources for modeling an analog
synthesizer.
The key step in reproducing a classic analog sound is the development of a good
resonant lowpass filter. A discussion of filter theory will help to illuminate the ap-
proach taken in designing this filter (Oppenheim, Willsky, and Young 1983). A gen-
eral filter can be defined using the differential equation:
$a_0 X + a_1 X' + a_2 X'' + \cdots = b_0 Y + b_1 Y' + b_2 Y'' + \cdots$   (10.1)
Where:
■ X is the input signal
■ Y is the output signal
As the frequency approaches zero, the terms in b1 and b2 vanish and the function approaches one. As the frequency approaches infinity, the squared term begins to dominate and the function approaches zero. Resonance is introduced by making b1 negative and b2 positive. Carefully adjusting the values of the coefficients results in a peak at the cutoff frequency. A typical curve is given in figure 10.5.

Figure 10.5 Typical frequency response of the resonant lowpass filter: amplitude plotted against frequency, with a resonance peak near the cutoff.

The differential equation corresponding to this frequency response is:
$Y + b_1 Y' + b_2 Y'' = X$   (10.4)
To be used in Csound, this equation must be converted from the continuous form
to a discrete form. One way to do this is to convert the differential equation into
a difference equation. The slope Y′ = dY/dt can be approximated by ΔY/Δt and
the curvature Y″ = d²Y/dt² can be approximated by Δ(ΔY)/Δ(Δt). Letting Δt = 1
[Figure 10.5: frequency response of the resonant lowpass filter, amplitude (0 to 1) versus frequency, showing a resonance peak.]
simplifies the algebra involved in solving it. This is valid if the sample rate is constant
and the units of time are equal to 1/sr. Substituting the difference approximation for
the differentials in equation 10.4 and knowing that ΔY = Yn − Yn−1 yields:
$Y_n = X_n - b_1(Y_n - Y_{n-1}) - b_2(Y_n - 2Y_{n-1} + Y_{n-2})$ (10.5)
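Rearranging equation 10.5 to collect the Yn terms (a step left implicit here) gives the recursion that appears, with b1 and b2 renamed ka1 and ka2, in the Csound code of figure 10.6:

$Y_n = \dfrac{X_n + (b_1 + 2 b_2)\,Y_{n-1} - b_2\,Y_{n-2}}{1 + b_1 + b_2}$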
Note that:
$e^{j\theta} = \cos(\theta) + j\sin(\theta)$ (10.9)
$b_2 = \dfrac{1000}{f_{co}}$ (10.11)
This completes the basic two-pole filter. Some modifications of the filter will be
considered next. The Roland TB–303 is a much sought after synthesizer owing to its
unique sound. This is in part due to the unusual character of its filter. It has been
suggested that one of the aspects of this filter that make it unique is the circuitry that
prevents the filter from going into self-oscillation. When resonance is increased, the
filter distorts instead of self-oscillating. This type of behavior can be simulated in
Csound by doing the following: isolate the resonating portion of the signal and pass
it through a distortion table. Add the distorted signal to an original lowpass-filtered
signal:
atemp  tone   axn, kfco          ; LOWPASS-FILTERED (NON-RESONANT) SIGNAL
aclip1 =      (ayn-atemp)/100000 ; ISOLATE AND SCALE THE RESONANT PORTION
aclip  tablei aclip1, 7, 1, .5   ; PASS IT THROUGH THE DISTORTION TABLE (F 7)
aout   =      aclip*20000+atemp  ; RESCALE AND ADD BACK THE FILTERED SIGNAL
This produces an interesting sound with some of the character of the TB–303.
Extension of the above techniques to higher order filters presents some problems. As
the order of the equations increases, the coefficients become extremely sensitive to
small changes and round-off errors. One form of higher-order filters that does not
have this defect is called the cascade form. The Csound code shown in figure 10.6
implements the cascade form to obtain a four-pole filter.
This technique can be extended to obtain higher order filters. Figure 10.7 shows
the orchestra and score code for one of the instruments from 1002.orc that models
an analog synthesizer with a resonant lowpass filter:
ayn1   = ((ka1+2*ka2)*ayn1m1-ka2*ayn1m2+axn)/(1+ka1+ka2)  ; FIRST 2-POLE STAGE
ayn1m2 = ayn1m1
ayn1m1 = ayn1
ayn    = ((kb1+2*kb2)*ayn2m1-kb2*ayn2m2+ayn1)/(1+kb1+kb2) ; SECOND 2-POLE STAGE
ayn2m2 = ayn2m1
ayn2m1 = ayn
Figure 10.6 Orchestra code excerpt for the simulation of a 4-pole filter by cascading two
2-pole filters.
sr     = 44100
kr     = 44100
ksmps  = 1
nchnls = 1
f 1 0 1024 10 1 ; SINE
f 2 0 256 7 -1 128 -1 0 1 128 1 ; SQUARE
f 3 0 256 7 1 256 -1 ; SAW
f 4 0 256 7 -1 128 1 128 -1 ; TRI
f 5 0 256 7 1 64 1 0 -1 192 -1 ; PULSE
f 6 0 8192 7 0 2048 0 0 -1 2048 -1 0 1 2048 1 0 0 2048 0
f 7 0 1024 8 -.8 42 -.78 400 -.7 140 .7 400 .78 42 .8
r 4 NN
t 0 400
; DISTORTION FILTER INSTRUMENT TB-303
; INS STA DUR AMP PITCH FC Q TAB
i 1007 0 1 5000 6.07 5 50 3
i 1007 + . < 6.07 < . 3
i 1007 . . < 6.05 < . 3
i 1007 . . < 6.02 < . 3
i 1007 . . < 6.07 < . 3
i 1007 . . < 6.02 < . 3
i 1007 . . < 6.10 < . 3
i 1007 . . 8000 6.06 100 20 3
s
Figure 10.7 Orchestra and score code excerpt from 1002.orc featuring instr 1007, a Roland
TB-303 emulation instrument with a distorted resonant lowpass filter.
Conclusion
The filters presented in this chapter provide a useful basis for future work in simulat-
ing analog synthesizers. Some further things to try would be to modify the distortion
table in the distortion filter to unusual shapes. Development of more complex wave-
forms, such as an oscillator-sync or a sample-and-hold waveform, would also be inter-
esting. Waveforms with sharp edges stimulate the resonance aspect of the filter and
may result in more interesting timbres. Also highpass and bandpass filters could be
developed using the methodology described in this section.
References
Henricksen, C. 1981. “Unearthing the mysteries of the Leslie cabinet.” Recording Engineer/
Producer Magazine, http://wcbi.com/organs/hammond/faq/mystery/mystery.html
Oppenheim, A., A. Willsky, and I. Young. 1983. Signals and Systems. Englewood Cliffs, N.J.:
Prentice-Hall.
Rajmil Fischman
The development of digital synthesis and processing owes a great deal to the “clas-
sic” techniques employed in the early years of electroacoustic music. These involved
processes that in most cases were based on the capabilities of analog devices. Even
today, when the flexibility offered by digital systems allows tighter control of spectral
characteristics and application of sophisticated mathematical models, the theoretical
principles underpinning the “classic” processes still offer a powerful set of tools for
the achievement of complex morphologies, which is greatly enhanced by the versatil-
ity of the new technology.
“Classic” techniques may be categorized according to the principles involved in
their realization, as shown in figure 11.1.
Frequency-domain techniques are based on the assumption that any signal can be
considered to be the sum of sines or cosines—each with its own amplitude, fre-
quency and phase—according to theoretical principles developed by Fourier (1768–
1830). Mathematically, this may be expressed as follows:
$s(t) = \sum_{i=0} A_i \sin(2\pi f_i t + \phi_i)$ (11.1)
where $A_i$, $f_i$ and $\phi_i$ are, respectively, the amplitude, frequency and phase of the ith
sinewave.
On the other hand, granular synthesis is a time-domain technique based on the
construction of signals from the combination of short sounds, called grains.
Linear techniques are processes that do not distort the input and, as a consequence,
do not create new frequencies that were not already contained in the input before it
was processed. Linear procedures process signals in three possible ways: delay of
samples, scaling (which is equivalent to multiplying a sample by a constant) and
addition or subtraction of scaled samples (which may or may not have been delayed).
Nonlinear techniques consist of the controlled distortion of a signal, which results
in the creation of frequencies not found before it was processed. This can be achieved
Figure 11.1 Classification of "classic" techniques:
Frequency-domain, linear: additive synthesis; subtractive synthesis
Frequency-domain, nonlinear: ring modulation; amplitude modulation; frequency modulation; waveshaping
Time-domain (nonlinear): granular synthesis
in various ways. For example, it is possible to use a signal in order to modify the
amplitude of another (amplitude modulation) by multiplying the former by the latter.
Alternatively, a signal may be used to modify the frequency of another (frequency
modulation).
We will now proceed to examine the various techniques in more detail.
Additive Synthesis
The waveform in the “bassoon” passage of the final section of this example is
generated with f 2, which uses GEN10 to produce a waveform resulting from com-
bining eight components, each with its own relative amplitude:
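A GEN10 statement of this general shape builds such a waveform; the eight amplitude values below are purely illustrative, not the ones used by f 2 in the example:

f 2 0 4096 10 1 .4 .2 .1 .1 .05 .03 .02 ; EIGHT HARMONIC AMPLITUDES (HYPOTHETICAL VALUES)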
Figure 11.2 Block diagram of instr 1101, a simple oscillator instrument with an amplitude
envelope.
Figure 11.3 Orchestra code for instr 1101, a simple oscillator instrument.
In its simplest form, additive synthesis may be used to produce a static spectrum.
This is a combination of sines and cosines in which the amplitude and frequency of
each component remain unchanged throughout the duration of the sound. Within
these constraints, timbral identity is determined by the particular set of relative am-
plitudes, frequencies and—to a lesser extent—phases of the components. For ex-
ample, consider the sounds in figure 11.4. Close inspection of sounds 1 and 2 reveals
that their frequency components have relative ratios 1, 2, 3, 4, 5, 6, and 7, namely:
200 = 2 × 100   and   220 = 2 × 110
300 = 3 × 100         330 = 3 × 110
400 = 4 × 100         440 = 4 × 110
etc.                  etc.
The same happens to the corresponding amplitudes, which have relative ratios 1,
1.5, 2, 2.5, 2, 1.5, 1. Therefore, we expect these sounds to have the same timbre (and
different pitch), in spite of the fact that they do not have any common frequency
components. On the other hand, the frequency ratios in sound 3 are 1, 3, 5, 7, 9, 11,
13 and the relative amplitudes are 7, 6, 5, 4, 3, 2, 1. These are not the same as sound
2; thus in spite of having some common frequencies with the latter, sound 3 has a
different timbre. Sounds 1, 2 and 3 are realized in 1102.orc and 1102.sco. The orches-
tra uses instr 1102, which is similar to instr 1101, except for the fact that p6 and p7
indicate the attack and decay and p8 indicates the function table. The score file
1102.sco implements the spectrum of sounds 1, 2, and 3 from figure 11.4 using the
function tables shown in figure 11.7.
Frequency ratios may also determine whether a signal has pitch. If f is the lowest
frequency component of a sound, its spectrum is said to be harmonic when all other
components are integer multiples of f. In this case, the sound will normally have a
definite pitch determined by f, which is called the fundamental. The components are
then said to be harmonics: the first harmonic is f, the fundamental; the second har-
monic is 2f, the third 3f and so on. When the relative frequencies of the components
are not integer multiples of f, the spectrum is inharmonic and it is more difficult to
recognize pitch. Increasing deviation from harmonic ratios causes sounds to become
less pitched. The files 1103.orc and 1103.sco demonstrate this: the first event is a
sound with a 280 Hz fundamental and six additional harmonics with relative ampli-
tudes 1, 0.68, 0.79, 0.67, 0.59, 0.82, and 0.34. This is followed by an inharmonic
sound in which the lowest component is also f = 280 Hz but the other six are not
integer multiples of f but rather 1.35f, 1.78f, 2.13f, 2.55f, 3.23f, and 3.47f. Although
the relative amplitudes of the components are kept, the second sound does not resem-
ble the first and has no definite pitch because its spectrum is inharmonic.
In order to implement a sound with seven components, instr 1103 uses seven oscil-
lators that are subsequently mixed and multiplied by the overall envelope (kenv).
The relative amplitudes of the components are given in p6, p8, p10, p12, p14, p16,
and p18. The frequencies of the oscillators are obtained by multiplying the reference
frequency f (given by p5) by the component ratios specified in p7, p9, p11, p13, p15,
p17, and p19. For example, if f = 280 Hz and the second component has a relative
amplitude of 1 and a frequency ratio of 1.35, the values of p5, p8, and p9 will be
280, 1 and 1.35, respectively. The oscillator producing this component will be:
a2 oscil p8, p5*p9, 1 ; 2ND COMPONENT
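Following the same pattern for the remaining partials, the whole instrument might be sketched as follows (a sketch assuming the p-field layout just described; the actual code is in 1103.orc):

a1 oscil p6, p5*p7, 1 ; 1ST COMPONENT
a2 oscil p8, p5*p9, 1 ; 2ND COMPONENT
a3 oscil p10, p5*p11, 1 ; 3RD COMPONENT
a4 oscil p12, p5*p13, 1 ; 4TH COMPONENT
a5 oscil p14, p5*p15, 1 ; 5TH COMPONENT
a6 oscil p16, p5*p17, 1 ; 6TH COMPONENT
a7 oscil p18, p5*p19, 1 ; 7TH COMPONENT
   out kenv*(a1+a2+a3+a4+a5+a6+a7) ; MIX AND APPLY THE OVERALL ENVELOPE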
Figure 11.5 Block diagram of instr 1102, a variable waveform oscillator instrument.
Figure 11.6 Orchestra code for instr 1102, a variable waveform single oscillator instrument
with variable attack and release amplitude envelope.
Figure 11.8 Block diagram of instr 1103, a seven partial additive synthesis instrument with
common amplitude envelope.
So far, the discussion has focused on static spectra. Most sounds in nature, how-
ever, have dynamic spectra, whereby the amplitude and frequency of each compo-
nent change throughout the duration of a sound. This means that Ai, fi and i in
equation 11.1 are time-dependent.
Using additive synthesis in order to achieve dynamic spectrum may become a
laborious task given the amount of data involved. Convincing results may sometimes
require the use of a large number of components, each of which requires independent
amplitude, frequency and phase control. In 1104.orc and 1104.sco I present an ex-
ample of dynamic spectrum synthesis realized with an instrument that employs six
oscillators, each with variable amplitude and frequency. In addition, the output is
spatialized. The amplitude and frequency of each component vary by a percentage
specified respectively in p8 and p9. These are translated into fractions using the fol-
lowing statements:
imaxaf = p8/100.00 ; MAXIMUM AMPLITUDE FLUCTUATION
imaxff = p9/100.00 ; MAXIMUM FREQUENCY FLUCTUATION
In order to spatialize the output, the number of channels is set to 2 in the header
(nchnls = 2), an outs statement is used instead of out, and asig is multiplied by time-
varying scaling factors kpleft and kpright before it is sent to the left and right
channels:
outs kpleft*asig, kpright*asig ; OUTPUT
kpleft and kpright are calculated according to the following algorithm: if kpan is the
instantaneous position along the line joining the speakers, kpan = −1 and kpan =
1 represent respectively the positions of the left and right speakers. Values between
Figure 11.9 Block diagram of a six component additive synthesis instrument with variable
amplitude and frequency of each partial, plus a group envelope and panning.
Figure 11.10 Orchestra code for the first component of additive instrument shown in fig-
ure 11.9.
−1 and 1 represent positions between the speakers (kpan = 0 is the center). Values
below −1 represent positions beyond the left speaker.
If the source is between the speakers then:
$kpleft = \dfrac{\sqrt{2}\,(1 - kpan)}{2\sqrt{1 + kpan^2}}\,;\quad kpright = \dfrac{\sqrt{2}\,(1 + kpan)}{2\sqrt{1 + kpan^2}}\qquad (-1 \le kpan \le 1)$ (11.2)

$kpleft = \dfrac{\sqrt{2}}{\sqrt{1 + kpan^2}}\,;\quad kpright = 0\qquad (kpan < -1)$ (11.3)

$kpleft = 0\,;\quad kpright = \dfrac{\sqrt{2}}{\sqrt{1 + kpan^2}}\qquad (kpan > 1)$ (11.4)
where imaxpan is 2 and ipanfunc ( p34) is the function table determining the shape
of the trajectory. Since there are different formulas for the position of the source, the
instrument must check the value of kpan and decide which formula to apply using if
. . . kgoto statements.
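A sketch of that branching logic (the variable handling is illustrative; the complete code, including the wide-stereo option, is in figure 11.11):

        if kpan < -1 kgoto beyondleft                   ; SOURCE BEYOND THE LEFT SPEAKER
        if kpan > 1 kgoto beyondright                   ; SOURCE BEYOND THE RIGHT SPEAKER
kpleft  = sqrt(2)*(1-kpan)/(2*sqrt(1+kpan*kpan))        ; EQUATION 11.2
kpright = sqrt(2)*(1+kpan)/(2*sqrt(1+kpan*kpan))
        kgoto done
beyondleft:
kpleft  = sqrt(2)/sqrt(1+kpan*kpan)                     ; EQUATION 11.3
kpright = 0
        kgoto done
beyondright:
kpleft  = 0
kpright = sqrt(2)/sqrt(1+kpan*kpan)                     ; EQUATION 11.4
done: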
Summary
Additive synthesis may be used to create dynamic spectra. Each component requires
the following parameters:
1. Amplitude envelope.
2. Time-varying frequency.
Subtractive Synthesis
This is the counterpart of additive procedures. While the latter constructs spectra
using single sinewaves in order to implement the equation, subtractive synthesis uses
Figure 11.11 Orchestra code excerpt for panning algorithm supporting both an equal-power
pan and a wide-stereo effect.
complex spectra as inputs that are shaped by enhancing or attenuating the component
sinewaves, as illustrated in figure 11.12. Mathematically, this means that the values
of Ai in equation 11.1 are modified.
The output depends on the choice of sources and the behavior of the filters. In
addition, some type of amplification may be required after filtering as a result of
power loss owing to the attenuation of components.
The main consideration regarding choice of sources is their spectral content. If we
intend to process frequencies in a certain region of the spectrum, it is important to
ensure that these frequencies exist in the source; otherwise there will be nothing to
filter. For this reason, noise and trains of pulses are frequently used, since they offer
a uniform spread of components throughout the auditory range.
Ideal white noise is probably the richest available source. For practical purposes,
it is possible to consider it as a signal that contains all frequencies evenly distributed
throughout the auditory range. White noise is normally obtained using a generator
that produces a random number every sample. The aural result is rather like a hiss.
An ideal train or sequence of pulses consists of a signal containing an infinite
number of harmonics, all of which have the same relative amplitude. In practice,
approximations of a pulse sequence may be obtained by combining as many harmon-
ics as possible up to the upper threshold of the auditory range. It is also important to
consider the limitations imposed by the sampling rate sr (according to the sampling
theorem, the highest frequency sampled with sr must be less than sr/2). The file
Figure 11.14 Orchestra code for instr 1105, an envelope controlled white noise instrument.
1105.orc consists of instruments that produce white noise and trains of pulses. For
instr 1105 a rand generator is used to produce white noise as shown in figure 11.13.
For instr 1106 a buzz is used to produce a train of pulses with as many harmonics
as possible given the sampling rate (sr). If the frequency of the fundamental is p5,
then the frequency of the n th harmonic is n times p5. This frequency must be less
than sr/2; therefore, the maximum number of harmonics, iinh, must be equal to or less
than sr/2/p5. Since the number of harmonics must be an integer, the operator int is
used in order to calculate iinh. Figure 11.15 shows a block diagram of instr 1106.
The file 1105.sco produces white noise (instr 1105) followed by a pulse train (instr
1106) with a fundamental of 75 Hz.
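A sketch of the harmonic-count calculation and the buzz call (assuming, as the block diagram of figure 11.15 suggests, that p5 carries the fundamental and that table 1 holds a sine; the full code is in figure 11.16):

iinh = int(sr/2/p5)      ; LARGEST HARMONIC COUNT THAT STAYS BELOW SR/2
asig buzz kenv, p5, iinh, 1 ; PULSE TRAIN WITH IINH HARMONICS
     out asig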
Filters are characterized by their response, which represents the frequency regions
they attenuate and enhance. Figure 11.17 shows the four ideal types of filters used in
subtractive synthesis. These are classified as follows:
1. Filters that only pass frequencies above a cut-off value fc, or highpass.
2. Filters that only pass frequencies below fc, or lowpass.
3. Filters that only pass components with frequencies inside a band above and below
a center frequency fc, or bandpass. The size in Hz of the bandpass is the band-
width (bw).
4. Filters that only pass components with frequencies outside a band above and be-
low a center frequency fc, or bandreject.
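In Csound these four responses correspond most directly to the opcodes tone (lowpass), atone (highpass), reson (bandpass) and areson (bandreject). A sketch, with asrc standing for the source signal:

alow  tone   asrc, kfc      ; FIRST-ORDER LOWPASS, HALF-POWER POINT AT KFC
ahigh atone  asrc, kfc      ; COMPLEMENTARY HIGHPASS
aband reson  asrc, kfc, kbw ; BANDPASS CENTERED ON KFC, BANDWIDTH KBW
astop areson asrc, kfc, kbw ; BANDREJECT CENTERED ON KFC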
Figure 11.15 Block diagram of instr 1106, a buzz (pulse train) instrument with an ampli-
tude envelope.
Figure 11.16 Orchestra code for instr 1106, a buzz instrument with controls for the number
of harmonics in the pulse-train.
Figure 11.17 Ideal filter types: (a) Highpass, (b) Lowpass, (c) Bandpass, and (d) Bandreject.
[Figure 11.18: (a) cascade connection and (b) parallel connection of filters.]
The four filter types shown in figure 11.17 represent ideal filters. In practice, the
transition between pass and stop regions is not as sharp as in the figure. Its slope is
known as the roll-off and is measured in decibels per octave, which is the change in
attenuation when the frequency is doubled. The fact that there is a slope means that
the cut-off frequency must be re-defined as the value at which the attenuation is –3
dB, which is equivalent to a drop in amplitude by a factor of about 0.71.
In order to achieve sharper responses, filters may be cascaded by using the output
of one filter as the input of another. Filters may also be connected in parallel, with
two or more filters sharing the same input and having their outputs mixed into one
signal; thus achieving complex response curves. Cascade and parallel connections
are shown in figure 11.18.
We have seen above that in order to achieve a dynamic spectrum it is necessary to
vary the amplitudes and frequencies of components in time. Linear filters cannot
alter the frequencies of the components of a source; however, they can change their
amplitudes. This may be done by varying the cut-off frequency in low and highpass
filters and by varying the center frequency and bandwidth in bandpass and bandrej-
ect filters.
As an example, the files 1107.orc and 1107.sco produce sounds resembling vowel
articulation by modeling the mechanism driving the vocal cords: a rich pulse (maxi-
mum possible harmonics) is passed through five parallel bandpass filters.
The center frequencies and bandwidths of the filters fluctuate randomly in the
vicinity of values corresponding to the filtering processes in human speech. The rate
at which the random numbers are generated is varied between a minimum (irfmin)
and a maximum (p9), according to the control variable krfl:
irfmin = p8 ; MINIMUM RANDOM RATE
itfl   = p9-p8 ; MAXIMUM FLUCTUATION
irfunc = 2 ; FLUCTUATION FUNCTION
...
...
Figure 11.20 Orchestra code excerpt from instr 1107, a pulse-train generator with fluctuat-
ing frequency.
Each filter uses a randi generator in order to produce center frequency and band-
width fluctuations. For example, the first formant is controlled by k1, which is mul-
tiplied by the maximum center frequency fluctuation, if1cff and added to a minimum
center frequency if1cf. The bandwidth is controlled similarly:
k1 randi 1, krfl, .12 ; RANDOM GENERATOR
...
... ; FIRST FORMANT
afilt1 reson apulse, if1cf+k1*if1cff, if1bw*(1+k1), 0
The input to the formant filter is apulse, a train of pulses generated using buzz.
Its fundamental frequency is made to fluctuate by up to 1/5 of its value. This process
is controlled by krand, which is scaled by iffl—the maximum frequency fluctuation
and added to ifreq—the minimum frequency value, to produce kfrnd, the frequency
input to the pulse generator shown in figure 11.20.
Finally, the filtered pulses are mixed, balanced with a sinewave and sent to the
output:
afilt = afilt1+afilt2+afilt3+afilt4+afilt5 ; MIX FILTR OUT
abal oscil 1, ifreq, iplfunc ; SINEWAVE CONTROL SIG
asig balance afilt, abal ; OUTPUT BALANCE
out kenv*asig
Summary
There are three important factors that must be considered in order to obtain dynamic
spectra using subtractive procedures:
Ring Modulation
This nonlinear technique consists of the use of a signal, the modulator, to modify the
amplitude of another signal, the carrier. Each sample of the modulator multiplies a
corresponding sample of the carrier, distorting the latter and creating new spectral
components.
The simplest case of amplitude modulation is that of a sinewave that multiplies
another sinewave. If the frequencies of the modulator and carrier are, respectively, fm
and fc, the output is:
$s(t) = \sin(2\pi f_m t)\,\sin(2\pi f_c t)$ (11.5)
But, from the trigonometric identity for the product of two sines, we have:
$s(t) = \tfrac{1}{2}\left[\cos(2\pi[f_c - f_m]t) - \cos(2\pi[f_c + f_m]t)\right]$ (11.6)
The equation above represents a spectrum containing two components with fre-
quencies fc + fm and fc − fm. These are called sidebands, because they appear on both
sides of the carrier, as shown in figure 11.21.
The modulation process requires caution. In the first place, it is important to ensure
that fc + fm does not exceed half of the sampling rate (sr/2) in order to comply with the
sampling theorem, avoiding aliasing, which causes frequencies over sr/2 to reflect,
appearing to be lower, thus assuming a different spectral “identity.” Second, if fc and
fm are close, their difference may be below the auditory range; this occurs when fc − fm
Figure 11.22 Block diagram of instr 1109, an amplitude modulation instrument with ampli-
tude envelope.
Figure 11.23 Orchestra code for instr 1109, the amplitude modulation instrument shown in
figure 11.22.
is below about 20 Hz. If fm is larger than fc, the difference will be a negative number.
But from the identity cos(−θ) = cos(θ), we can infer that only the absolute value of the
difference is important and that the
sign can be ignored. In other words, a negative frequency reflects as a positive one.
A simple implementation of an amplitude modulator consists of two oscillators, a
carrier and a modulator, which have their outputs multiplied, as shown in figure
11.22.
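A minimal sketch of that arrangement (the p-field roles follow the block diagram of figure 11.22: p5 and p8 are the carrier and modulator frequencies and p9 the shared function table; the envelope arguments are assumed):

kenv  linen  p4, p6, p3, p7 ; AMPLITUDE ENVELOPE (ATTACK AND RELEASE ASSUMED IN P6/P7)
acarr oscili 1, p5, p9      ; CARRIER
amod  oscili 1, p8, p9      ; MODULATOR
asig  =      acarr*amod     ; MULTIPLY CARRIER BY MODULATOR
      out    kenv*asig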
The effects of modulation using sinewaves are shown in 1109.orc and 1109.sco.
The orchestra consists of two instruments: instr 1108 is identical to instr 1101 and is
used to produce separate pairs of sinewaves, whereas instr 1109 is used to carry out
the amplitude modulation by multiplying the sinewaves. The score produces pairs
followed by their product, in the order shown in figure 11.24.
It is worth noticing that the 10 Hz modulator of the first pair is inaudible; however,
its effect on the 400 Hz carrier results in two distinct sidebands in the auditory range.
Furthermore, the third pair produces a difference sideband of 15 Hz. Therefore, only
the 785 Hz component is perceived.
Figure 11.24 Table of amplitude modulated input signals and output results.
The process above can be extended to signals with several components. For ex-
ample, the modulator may consist of three frequencies, fm1, fm2 and fm3. Each of these
can be considered individually when the modulator is applied to a carrier fc, as illus-
trated in figure 11.25. Therefore, the output consists of the following pairs:
fc + fm1 and fc − fm1;   fc + fm2 and fc − fm2;   fc + fm3 and fc − fm3
The file 1110.orc consists of instr 1110, a modified version of instr 1109 that allows
use of different function tables for both carrier and modulator (the function for the
carrier is given in p9 and that for the modulator in p10). The file 1110.sco includes
two ring-modulation processes. In the first, a 110 Hz signal with 5 harmonics modu-
lates a 440 Hz carrier. In the second, the same carrier (440 Hz) is modulated with a
134 Hz signal. When synthesized, it is immediately apparent that the first sound is
pitched while the second is not. This can be explained by inspection of the output
frequencies. The components of the first modulator are 110, 220, 330, 440, and 550
Hz. In the second case, they are 134, 268, 402, 536, and 670 Hz; therefore the com-
ponents of the respective outputs can be calculated as shown in figure 11.26.
The output of the first sound has the following components: 0 (not heard), 110,
220, 330, 550, 660, 770, 880, and 990 Hz, which produce a harmonic series having
a definite pitch. On the other hand, the second output is composed of 38, 96, 172,
230, 306, 574, 708, 842, 976, and 1110 Hz, which produce an inharmonic spectrum.
Figure 11.26 Tables evaluating result of amplitude modulating a complex source, consisting
of five harmonics, with a 110 Hz sine wave and a 134 Hz sine wave.
The example above shows that it is possible to predict harmonicity when the fre-
quencies of carrier and modulator components are known. This may, however, be a
laborious task when using complex modulators. Obviously, if the modulator is an
inharmonic signal, the output will also be inharmonic. If the modulator is harmonic,
it is enough to check the result of dividing the carrier by the modulator, called the
carrier to modulator ratio, or the c/m ratio. If c/m is an integer, then the carrier is a
multiple of the modulator and subtracting or adding the latter to fc will create another
multiple. Therefore, all the frequencies will be multiples of fm, which will effectively
become the fundamental. Similarly, if c/m is of the form 1/n, where n is an integer,
the modulator is a multiple of the carrier and, as a consequence, the output frequen-
cies will also be multiples of fc. When c/m deviates from an n/1 or 1/n ratio, the out-
put frequencies become more and more inharmonic. Small deviations (e.g., 1.001)
will still produce pitched sounds because the output components will be close to ac-
tual harmonic values. In fact, these small deviations produce beating, which may add
some liveliness. The effect of the carrier to modulator ratio is shown in 1110a.orc
and 1110a.sco. The orchestra consists of instr 1110 (described above) and the score
contains the following:
Ring modulation can also be used to produce dynamic spectrum. The morphology
of the resulting sound can be controlled through the following parameters:
1. Duration.
2. Overall amplitude.
3. Frequency of the carrier.
4. Carrier to modulator ratio, which also determines the modulator frequency from
the frequency of the carrier.
5. Fraction of the carrier that is actually modulated (a small percentage will produce
less noticeable distortion).
Summary
Figure 11.28 Orchestra code for instr 1111, a dynamic Ring-modulation instrument with
time varying: amplitude, carrier frequency, modulating frequency, and modulation ratio.
Waveshaping
Figure 11.30 Waveshaping a sinusoidal waveform with a linear transfer function. The input
is below and the output is to the right.
$4x^3 - x$ (11.8)
If the polynomial of the third degree is used as a transfer function that is fed a
sinusoidal input of frequency f, we should expect the highest frequency in the output
Figure 11.32 Orchestra code for instr 1112, a simple waveshaping instrument.
to be 3f. Substituting a sinewave of frequency f for x, using the trigonometric identity for the
sine of three times an angle and performing some algebraic manipulation, we obtain:
$s(t) = 2\sin(2\pi f t) - \sin(2\pi[3f]t)$ (11.9)
As expected, the highest frequency, 3f, appears in the second term. A further re-
finement is provided by a family of polynomials that produce a specific harmonic of
a sinewave, known as Chebyshev polynomials. The first four Chebyshev polynomials
of the first kind are shown in figure 11.35.
It is easy to check that these produce the desired harmonics by substituting sin(2πft)
for x in any of the polynomials above. And with the aid of Chebyshev poly-
nomials, it is possible to achieve any combination of harmonics, each with a specified
relative amplitude. For instance, if the combination shown in figure 11.36 is required
the transfer function will be:
So far, the input has consisted of sinusoids with an amplitude of 1. If the former is
multiplied by an amplitude factor K, normalized between 0 and 1, the relative ampli-
Figure 11.33 A set of transfer functions and their outputs. The input is a sinewave as shown
in figure 11.30.
Figure 11.35 The first four Chebyshev polynomials of the first kind.
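For reference, the standard Chebyshev polynomials of the first kind begin:

$T_1(x) = x,\quad T_2(x) = 2x^2 - 1,\quad T_3(x) = 4x^3 - 3x,\quad T_4(x) = 8x^4 - 8x^2 + 1$

and their defining property, $T_n(\cos\theta) = \cos(n\theta)$, is what lets each one turn a unit-amplitude sinusoid into its nth harmonic.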
Figure 11.36 Table of user specified harmonic number and relative amplitudes.
tude of the harmonics will be affected. Feeding this type of input to the waveshaper
in our equation by using the identity for the sine of 3 times an angle and rearranging
terms, will result in the following output:
The relationship between the amplitude of a signal and its harmonic content makes
waveshaping suitable for synthesis of brass-like instruments, which are a class char-
acterized in part by the prominence of high components when the overall amplitude
increases. The files 1113.orc and 1113.sco produce a short brass-like fanfare using
the instrument shown in figures 11.37 and 11.38.
In our example instr 1113 is composed of two waveshapers: the first one processes
a sinewave of constant frequency f and the second processes a sinewave that varies
its frequency throughout the duration of each note from 1.01f to 0.99f. The outputs
of these waveshapers are mixed in a relative proportion of 0.8:0.2 (inobeat:ibeat),
producing a variable beating pattern that further enhances the waveshaping process.
The reader may notice that ioffset is 0.5 and not half of the table size minus one. This
is because the table statements are used in normalized mode (the waveshaped signal
varies between −0.5 and +0.5).
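A sketch of one waveshaping branch under those assumptions (normalized-mode tablei lookup with a 0.5 offset; p7 is taken to hold the transfer function and p8 the input sinewave, following the block diagram):

ioffset = 0.5
ain1  oscili 0.5, ifr, p8        ; SINE INPUT SCALED TO THE RANGE -0.5 TO +0.5
awsh1 tablei ain1, p7, 1, ioffset ; NORMALIZED LOOKUP INTO THE TRANSFER FUNCTION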
Figure 11.38 Orchestra code for instr 1113, a dual waveshaping instrument.
Summary
Frequency Modulation
[Figure: spectrum of simple FM, with sidebands spaced at multiples of the modulator frequency on either side of the carrier: . . . fc − 4fm, fc − 3fm, fc − 2fm, fc − fm, fc, fc + fm, fc + 2fm, fc + 3fm, fc + 4fm . . .]
Where I, called the modulation index, controls the degree of distortion applied to
the carrier. Its function can be compared with that of the distortion index in wave-
shaping and will be discussed below.
Equation 11.12 may be manipulated by expressing the sine functions as a power
series that, after a rather lengthy process, results in the following expression
Figure 11.41 Orchestra code for instr 1114, a simple static FM instrument.
We saw above that ring modulation may generate negative frequencies. This also
happens in Frequency Modulation (FM). In this case, formula 11.13 only contains
sines (as opposed to cosines in ring modulation); therefore, from the trigonometric
identity sin(−θ) = −sin(θ), we can infer that negative components reflect with a change
of sign. This is equivalent to a phase shift of π, or half a cycle.
The carrier to modulator ratio is also an important FM parameter and has a similar
effect on the output to that of the ring modulation c/m. If c/m is not a rational number,
the spectrum will be inharmonic, but if the c/m can be represented as a ratio of
integers
$\dfrac{f_c}{f_m} = \dfrac{N_c}{N_m}$ (11.15)
then the fundamental will be f = fc/Nc = fm/Nm and the spectrum will be harmonic.
Also, fc and fm will be respectively the Nc th and Nm th harmonics. If the fundamental
f is below the auditory range, however, the sound will not be perceived as having
definite pitch, as demonstrated in 1114a.orc and 1114a.sco, which use instr 1114 to
produce the three sounds shown in figure 11.42. Furthermore, because the harmonic
content depends on the difference between the carrier and multiples of the oscillator
frequency, we can conclude that if Nm ⫽ 1, the spectrum will contain all the harmon-
ics. If Nm is even, every second harmonic will be missing and the spectrum will
Figure 11.42 Table of tutorial parameter values and description of their sound.
f + 2f = 3f           f − 2f = −f
f + 2×(2f) = 5f       f − 2×(2f) = −3f
f + 3×(2f) = 7f       f − 3×(2f) = −5f
f + 4×(2f) = 9f       f − 4×(2f) = −7f
etc.
Figure 11.43 Table of odd partials resulting from a c/m ratio of 1:2.
contain f, 3f, 5f, etc. For example, if c/m = 1/2, then Nm = 2, fc = f and fm = 2f;
therefore, the spectrum will only contain odd harmonics, which are the result of
adding f to a multiple of 2f, as shown in figure 11.43. Similarly, if Nm = 3, every third
harmonic will be missing.
Dynamic FM spectra may be obtained by making I and c/m functions of time. In
fact, a time-varying index alone provides enough versatility to produce a variety of
sounds, as illustrated in 1115.orc and 1115.sco in which instr 1115 controls the enve-
lope and the index using the variables kenv and kidx. The score file 1115.sco produces
a passage that includes various types of sounds based on settings for bells, wood-
wind, brass and membranophones, proposed by Chowning (1973), in which c/m is
fixed for each type and only the index changes between a maximum and a minimum
value ( p8 and p9, respectively). The modulation-index, I, is driven by an oscillator
with different generating functions given in the score, according to the desired type
of sound. The overall amplitude envelope also plays an important role in modeling
these sounds. For example, the sudden attack resulting from hitting the body of a
bell and the subsequent slow decay into a pitched sound may be modeled using an
exponential amplitude envelope that lasts a few seconds, an inharmonic carrier-to-
modulator ratio and a modulation-index that initially favors high partials (kidx = 6)
and decays slowly to zero, gradually making the carrier more prominent (kidx = 1.2).
The amplitude and index envelopes are modeled with the following function tables:
f 11 0 512 5 1 512 .0001 ; BELL AMPLITUDE
f 12 0 512 5 1 512 .2 ; BELL INDEX
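A sketch of how these tables might drive the instrument (p-field roles are inferred from the block diagram of figure 11.44; the actual code is in figure 11.45):

ifreq = cpspch(p5)             ; ASSUMED PITCH-TO-FREQUENCY CONVERSION
kenv  oscil1 0, p4, p3, p11    ; AMPLITUDE ENVELOPE FROM TABLE P11
kidx  oscil1 0, p8-p9, p3, p12 ; INDEX SWEEP FROM TABLE P12
kidx  = p9+kidx                ; INDEX MOVES BETWEEN P9 (MINIMUM) AND P8 (MAXIMUM)
asig  foscili kenv, ifreq, p6, p7, kidx, p10
      out asig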
Figure 11.45 Orchestra code for instr 1115, a dynamic FM instrument with amplitude and
spectral envelopes.
Summary
There are two important parameters that determine the spectral characteristics of
sounds produced by frequency modulation.
1. The carrier to modulator ratio determines the location of frequency components.
2. The index determines which components will be prominent.
Granular Synthesis
Granular synthesis theory was first developed by Gabor (1947), who argued that
signals can be conceived as the combination of short sonic grains. A particular grain
may be indistinguishable from another grain; however, combinations of large num-
bers of these sonic units may yield different morphologies, depending on the internal
constitution of the grains and on the particular way in which the combination is
structured.
According to psychoacoustic theory, human perception becomes ineffective in
recognizing pitch and amplitude when sonic events become too short: the threshold
has been estimated to be in the region of 50 milliseconds (Whitfield 1978). There-
fore, typical durations of grains usually fall between 10 to 60 milliseconds.
A grain usually consists of a waveform with an envelope, as shown in figure 11.47.
In principle, the former could be any signal—ranging from pure sinewaves to re-
corded samples of complex sounds. The envelope can have various shapes—for ex-
ample, it could be a Gaussian curve, a triangle, a trapezoid, half-a-sine, etc. When
grains are combined, the shape of the waveform and envelope are influential factors
that determine the timbre of the overall sonic result. As a rule of thumb, complex
waveforms will lead to sounds with larger noise content. Also, envelopes with edges
(such as triangles and trapezoids) will produce rougher signals. Furthermore, de-
pending on the sampling rate, if the duration of a grain is short, the attack and/or decay
of a trapezoid may become vertical, causing clicks. On the other hand, smooth enve-
lopes such as half-sines may be effective in preventing clicks at shorter durations.
Because grains are short, it is necessary to manipulate these in large numbers to
obtain any significant results; sometimes, the numbers may reach up to 1000 grains
per second of sound. Therefore, it is useful to adopt high level organizational strate-
gies that take care of the manipulation of the various parameters associated with
grain characteristics and with their combination.
Throughout the history of electronic synthesis, there have been various approaches
and, even now, new strategies are being developed. In the early sixties, Xenakis
(1971) proposed organization of grains according to a synchronous process: time
may be divided into frames that are then played in succession at a constant rate,
similar to movie frames that produce continuous movement. Each frame consists of
two axes—one of which measures frequency and the other amplitude—and contains
a particular set of grains, each with its own amplitude-frequency values. Therefore,
when the frames are “played,” a particular succession of grains with varying fre-
quency and density is obtained.
Another strategy, perhaps the most popular to date, consists of the creation of
asynchronous “clouds,” described in depth by Roads (1985, 1991). The main idea
behind a cloud of grains is that the tendencies of the various parameters that influence
the resulting musical sound may be controlled by means of a process that is partly
random and partly deterministic. For example, the frequency of the grain waveform
may be generated by a random device that produces values falling between a lower
and upper limit determined by the composer. These limits may change in time, pro-
ducing dynamic spectra.
The files 1116.orc and 1116.sco produce a 20 second cloud created by using an instru-
ment that implements the parameters described above. The overall envelope of the
cloud is produced using an oscil1 statement with duration idur (p3), amplitude imax-
amp (p4) and function iampfunc (p5):
kenv oscil1 0, imaxamp, idur, iampfunc ; OVERALL ENVELOPE
The lower limit of the frequency band varies between a minimum ilfbmin and a
maximum ilfbmax, given respectively by p11 and p12. The difference of these values,
ilfbdiff, is fed to an oscil1 statement driven by the function ilbffunc (p13). The output
of the oscillator is then added to ilfbmin in order to obtain the time-varying lower
limit klfb.
ilfbmin  = p11 ; MINIMUM FREQ OF LIMIT
ilfbmax  = p12 ; MAXIMUM FREQ OF LIMIT
ilfbdiff = ilfbmax-ilfbmin ; DIFFERENCE
ilbffunc = p13 ; LOWER LIMIT FUNCTION
klfb oscil1 0, ilfbdiff, idur, ilbffunc ; LOWER LIMIT FLUCTUATION
klfb = ilfbmin+klfb ; LOWER LIMIT
A similar procedure is applied in order to produce the upper limit of the frequency
band (kufb), the carrier-to-modulator ratio (kcmr), the index (kidx) and the width of
the cloud (kgscat).
Spatialization of the cloud is implemented using the same algorithm employed in
the dynamic additive synthesis instrument described previously in this chapter. The
only difference consists of the addition of grain scatter to the overall panning.
kpan = kpan+kgscat ; ADD GRAIN SCATTER
Since p6 and p7 give the minimum and maximum grain duration in milliseconds,
it is necessary to divide these by 1000 in order to obtain imingd and imaxgd in sec-
onds. The maximum fluctuation in grain duration is igddiff, which is used as ampli-
tude for an oscillator driven by the grain duration function igdfunc ( p8). Therefore,
kgdur is the result of adding the output of the oscillator to the minimum duration
imingd. Finally, kgrate is calculated.
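A sketch of those calculations (grounded in the description above, except for the last line: the grain rate is assumed here to be the reciprocal of the grain duration):

imingd  = p6/1000 ; MINIMUM GRAIN DURATION IN SECONDS
imaxgd  = p7/1000 ; MAXIMUM GRAIN DURATION IN SECONDS
igddiff = imaxgd-imingd ; MAXIMUM FLUCTUATION
igdfunc = p8 ; GRAIN DURATION FUNCTION TABLE
kgdur   oscil1 0, igddiff, idur, igdfunc ; DURATION FLUCTUATION
kgdur   = imingd+kgdur ; CURRENT GRAIN DURATION
kgrate  = 1/kgdur ; GRAIN RATE (ASSUMED RECIPROCAL OF THE DURATION)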
The grain envelope uses kgrate as its frequency and igefunc ( p10) as the generat-
ing function table.
igefunc = p10 ; GRAIN ENVELOPE FUNC TABLE
kgenvf  = kgrate ; GRAIN ENVELOPE FREQUENCY
kgenv   oscili 1.0, kgenvf, igefunc ; ENVELOPE
Also, kgrate is used to generate the relative amplitude and waveform frequency of
the grain. This is done by using a randh statement that produces values between
specified maxima and minima. The relative amplitude consists of a scaling factor
that may assume values between 2*ihmaxfc (= 0.5) and 1.
ihmaxfc = 0.25 ; HALF OF MAXIMUM AMPLITUDE DEV
kgafact randh ihmaxfc, kgfreq, iseed/3 ; -IHMAXFC < RAND NUMBER < +IHMAXFC
kgafact = 1.00-(ihmaxfc+kgafact) ; 2*IHMAXFC < SCALING FACTOR < 1.00
The waveform frequency assumes a random value between the low and high limits
of the cloud’s frequency band (klfb and kufb).
kgband = kufb-klfb ; CURRENT FREQUENCY BAND
kgfreq randh kgband/2, kgrate, iseed ; GENERATE FREQUENCY
kgfreq = klfb+kgfreq+kgband/2 ; FREQUENCY
A grain is generated using a foscili opcode that uses the cloud’s instantaneous
carrier-to-modulator ratio and index (kcmr and kidx).
igfunc = p9 ; GRN. WAVE FN
agrain foscili kgenv, kgfreq, 1, kcmr, kidx, igfunc ; FM GENERATOR
In order to avoid mechanical repetition and achieve grain overlap, a random variable
delay is applied to each generated grain. The maximum possible delay value is equal
to the grain duration (kgdur). The actual delay is produced using a randi generator.
kgdel randi kgdur/2, kgrate, iseed/2 ; RANDOM SAMPLE DELAY
kgdel = kgdel+kgdur/2 ; MAKE IT POSITIVE
adump delayr imaxgd ; DELAY LINE
adelgr deltapi kgdel
delayw kgafact*agrain
After the grain is delayed, an additional delay line applies Doppler shift to the
moving grain according to its spatial position kpan. This assumes a speaker distance
of 10 meters.
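One plausible realization of that Doppler stage (the distance law here is purely illustrative; the actual geometry used in 1116.orc may differ):

kdist  = 10*sqrt(1+kpan*kpan) ; DISTANCE IN METERS, 10 M FOR A CENTERED GRAIN (ILLUSTRATIVE)
kdopdl = kdist/340            ; PROPAGATION DELAY AT ROUGHLY 340 M/S
adump2 delayr 1               ; SECOND DELAY LINE, UP TO ONE SECOND
agdop  deltapi kdopdl         ; TIME-VARYING TAP PRODUCES THE DOPPLER SHIFT
       delayw adelgr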
The output of the Doppler processor is then multiplied by kpleft and kpright to
create the left and right channels.
asig = kenv*agdop
outs kpleft*asig, kpright*asig
The cloud generated by 1116.sco has an amplitude envelope that is half of a sine-
wave. The grain duration varies between 10 and 30 milliseconds: initially, grains
assume the longer duration, which becomes shorter toward the middle section and
then increases up to about 24 msec., shortening slightly toward the end to 21 msec.
The frequency band is initially narrow, around 2500 Hz, widening and narrowing as
the sound progresses, with a lower boundary that varies between a minimum of 1000
Hz and a maximum of 2500 Hz and an upper boundary that varies between 2500 and
4670 Hz. The carrier-to-modulator ratio assumes an initial value of 1 and progresses
for 1.25 seconds toward 4, its maximum value; it then hovers between a minimum
of 1.48 and a maximum of 2.911. The FM index changes between 1 and 8, reaching
the maximum (producing higher frequency components) after 12.5 seconds. Also f 9
controls the spatial movement in the stereo field, including Doppler shift and f 10
controls the scattering of grains by means of a sinusoid with its second harmonic.
This means that maximum scatter happens at about 2.5 and 17.5 seconds—which
correspond respectively to 1/8th and 7/8ths of a cycle—and minimum scatter hap-
pens in the vicinity of 10 seconds (half a cycle).
Summary
The following parameters are typically used to control the characteristics of a cloud
of grains:
1. Grain duration.
2. Grain waveform type (which may be varied using different techniques such as
FM synthesis).
3. Grain envelope.
4. Cloud density.
5. Cloud amplitude.
6. Envelope.
7. Cloud frequency band.
8. Cloud spatial movement.
9. Cloud width (grain spatial scattering).
Conclusion
This chapter surveyed “classic” synthesis techniques that derive from the capabilities
of devices available in the early stages of electroacoustic music development. These
may be classified into frequency-domain and time-domain techniques. Frequency-
domain techniques can be linear—including additive and subtractive synthesis, or
nonlinear, including ring-modulation, waveshaping and frequency-modulation. The
most popular time-domain “classic” technique is granular synthesis.
Although the above techniques are called classic, they are by no means a thing of
the past. In fact some of these are only beginning to fulfill their true potential since
the advent of computer systems with fast processing speeds, which have given a new
lease of life and extended the possibilities they offer, particularly in the generation
of dynamic complex spectra. Subtractive and additive principles have developed into
sophisticated mechanisms such as linear predictive coding (LPC) and phase vocod-
ing. The combination of subtractive and granular synthesis has led to the develop-
ment of formant-wave-function synthesis (FOF) and wavelet analysis and synthesis.
Furthermore, given that different techniques are conducive to the synthesis of differ-
ent classes of sounds, combinations of “classic” procedures are effectively used to
achieve sonorities that would be extremely difficult—if not impossible—to realize
by means of a single technique.
Therefore, truly understanding “classic” techniques is essential in order to com-
prehend the possibilities offered by new technology and use this to achieve new and
interesting timbral resources—particularly given the many interesting and complex
hybrid combinations.
Finally, the reader should realize that composition is not only about sounds on
their own; new sonorities are also the result of context. Thus, far from being con-
cerned only with the internal properties of sonic structures, electroacoustic composi-
tion is more than ever dependent on the way sounds are combined with each other,
on how they interact and on the way they can be used to shape our perception of time.
References
Backus, J. 1977. The Acoustical Foundations of Music. New York: W. W. Norton & Co.
Chowning, J. 1973. “The synthesis of complex audio spectra by means of frequency modula-
tion.” Journal of the Audio Engineering Society 21: 526–534. Reprinted in C. Roads and
J. Strawn, eds. 1985. Foundations of Computer Music. Cambridge, Mass.: MIT Press, pp. 6–29.
Dodge, C., and T. Jerse. 1985. Computer Music: Synthesis, Composition, and Performance.
New York: Schirmer Books, Macmillan.
Fischman, R. 1991. Musical Applications of Digital Synthesis and Processing Techniques: Real-
ization Using Csound and the Phase Vocoder. Keele: unpublished.
Gabor, D. 1947. “Acoustical quanta and the theory of hearing.” Nature. 159 (4044): 591–594.
Roads, C. 1985. “Granular synthesis of sound.” In C. Roads and J. Strawn, eds. 1985. Founda-
tions of Computer Music. Cambridge, Mass.: MIT Press, pp.145–159.
Vercoe, B. 1993. Csound. Software and manual. Cambridge, Mass.: MIT Press.
Whitfield, J. 1978. “The neural code.” In E. Carterette and M. Friedman, eds. 1978. Handbook
of Perception. Vol. 4, Hearing. Orlando: Academic.
FM, or frequency modulation, has been for decades one of the most widely used
digital synthesis techniques. It was first gainfully employed in radio, of course, and
also implemented on some of the earliest voltage controlled synthesizers. It was
not common as a tool for timbre generation until the mid-1970s, however, follow-
ing the publication of John Chowning’s seminal article, “The Synthesis of Complex
Audio Spectra by Means of Frequency Modulation” (Chowning 1973). Subse-
quently, the Yamaha Corporation used it in their popular DX7 synthesizers and the
synthesis technique quickly attained “celebrity status.”
In a nutshell, FM consists of using one audio wave (the modulator) to continuously
vary (modulate) the frequency of another audio wave (the carrier). The amount of
change induced in the carrier frequency is sometimes referred to as the “peak devia-
tion” and it is proportional to the amplitude of the modulator. When the frequency
of the carrier is in the audio range (between 20 Hz and 20 kHz), but the frequency of
the modulator is in the subaudio range (< 20 Hz), our ears are able to track the
continual deviation in the carrier frequency. The result is a common musical effect,
called vibrato. If the frequency of the modulator is in the audio range, however, we
can no longer track the deviation induced in the carrier. Instead of hearing the carri-
er’s frequency change, we perceive a change in its timbre, which is due to the genera-
tion of “sidebands”—new frequency components that appear on either side of the
carrier frequency (both above and below), at intervals of the modulating frequency.
The number of sidebands generated is directly proportional to the amplitude of the
modulator. This phenomenon makes FM a useful and efficient technique for the syn-
thesis of complex spectra. The theory of FM synthesis is complicated and since it
has been explained in great detail elsewhere, I will not attempt to do more than
summarize the most important aspects here.
In his now famous article, John Chowning gives the following formula for FM,
using simple sine waves for both carrier and modulator:
$e = A \sin(\omega_c t + I \sin \omega_m t)$ (12.1)
where:
■ A is the amplitude of the carrier
■ ωc is the frequency of the carrier in radians/second
■ I is the Index of Modulation
■ ωm is the frequency of the modulator in radians/second
In Chowning’s formula, the amplitude of the modulator is specified using the vari-
able I, which stands for the “Index of Modulation.” Chowning states that the Index
of Modulation is approximately equal to the number of “significant” sidebands gen-
erated and he defines it as the peak deviation in the carrier frequency divided by the
frequency of the modulator, that is,
$I = \dfrac{D}{M}$ (12.2)
where:
■ I = Index of Modulation
■ D = Peak deviation of the carrier frequency in Hertz
■ M = Frequency of the Modulator
In simple FM, the nature of the spectrum generated (i.e., the position of the side-
bands) is determined by the relationship of the carrier frequency ( fc ) to the modulator
frequency ( fm ), while the richness of the spectrum (the number of sidebands present)
is proportional to the amplitude of the modulator. Consequently, one can easily ob-
tain a wide variety of rich and time-varying spectra, simply by controlling the rela-
tionship of carrier and modulator frequencies precisely and gating the modulator.
Whenever the relationship between the carrier and modulator frequencies can be
expressed as a ratio of integers (1:1, 2:1, 3:2, 5:4, etc.), the sidebands belong to a
harmonic series and there tends to be a clear sense of fundamental pitch. When the
relationship cannot be reduced to a ratio of integers, the spectrum will be inharmonic
and sound either clangorous (metallic) or noisy, especially with high indices of
modulation.
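A quick worked example with illustrative values: a 400 Hz carrier, a 100 Hz modulator and an index of 3 give a peak deviation of D = I × M = 300 Hz, sidebands at 400 ± 100, 400 ± 200 and 400 ± 300 Hz, and, because 400:100 reduces to the integer ratio 4:1, a harmonic spectrum over a 100 Hz fundamental.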
Figure 12.2 Orchestra code for instr 1201, a simple Chowning FM instrument with modula-
tor and carrier sharing the same expseg envelope generator.
f 1 0 2048 10 1 ; SINE
; ST DUR AMP CARHZ MODHZ NDX NDX1 RISE DECAY
i 1201 0 .6 20000 440 440 5 0 .1 .2 ; BRASS
i 1201 1 .6 20000 900 300 2 0 .1 .2 ; WOODWIND
i 1201 2 .6 20000 500 100 1.5 0 .1 .2 ; BASSOON
i 1201 3 .6 20000 900 600 4 2 .1 .2 ; CLARINET
i 1201 4 15 20000 200 280 10 0 .001 14.99 ; BELL
Figure 12.3 Score file for instr 1201, using Chowning’s recommended settings for imitating
acoustic instruments—Brass, Woodwind, Bassoon, Clarinet, and Bell.
Figure 12.3 shows a sample score for instr 1201 in which a variety of acoustic
instrument timbres are imitated.
Clearly, instr 1201 is a straightforward implementation of simple FM, similar to
the one provided in the Chowning article, and the score contains the parameters used
in Chowning’s own example sounds. The instrument, however, is so limited as to be
almost useless in actual practice. For one thing, there is no separate envelope pro-
vided for the modulator, a highly desirable improvement. But most important, it
would be awkward to have to specify the exact frequencies of carrier and modulator
in Hertz for every note. Since it is the relationship between the carrier and modulator
frequencies that governs the nature of the spectrum, it would be far more convenient to
specify a single basic pitch for each note and have Csound compute the actual carrier
and modulator frequencies based on the desired ratio. Indeed, Csound provides an
opcode called foscil that does this automatically.
ar foscil xamp, kcps, kcarfac, kmodfac, kindex, ifn[, iphase]
The parameters of foscil are similar to those of oscil, including parameters for
amplitude (xamp), fundamental frequency (kcps), the number of the table containing
a stored function (ifn) and the optional initial phase (iphase). In addition, there are
parameters for carrier and modulator factors (kcarfac and kmodfac) that are multi-
plied by the kcps parameter to produce the actual carrier and modulator frequencies
and a modulation index variable (kindex), which is multiplied by the modulator fre-
quency to produce the peak deviation in the carrier, according to the formula (D =
I * M). There is also an interpolating version called foscili, analogous to oscili. Using
foscili, instr 1202, an improvement of instr 1201, can be written as shown in fig-
ures 12.4–12.6.
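For instance, a single note with a 3:1 carrier-to-modulator ratio might be written like this (a sketch with an illustrative p-field layout, not the one used by instr 1202):

icps = cpspch(p5)                  ; BASIC PITCH OF THE NOTE
kndx linseg 0, p3*.5, p6, p3*.5, 0 ; INDEX RISES TO P6, THEN FALLS BACK TO ZERO
asig foscili p4, icps, 3, 1, kndx, 1 ; CARRIER = 3*ICPS, MODULATOR = 1*ICPS, TABLE 1 = SINE
     out asig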
Phase Modulation
Figure 12.4 Block diagram of instr 1202, an improved Chowning FM instrument using
Csound’s foscili opcode.
Figure 12.5 Orchestra code for instr 1202, an improved FM instrument with independent
carrier and modulator envelopes.
backs. These drawbacks are associated with the fact that a conventional digital os-
cillator is, in effect, an integrator. It produces an output sample from its current phase
location (position in the wavetable), then increments the phase by a value propor-
tional to the current frequency input and finally stores the result until the next output
is required. Consequently, its current phase is always equal to the sum of all previous
increments. When we change the frequency of an oscillator by adding the constantly
changing output of another oscillator to it, we may introduce unwanted side-effects,
such as DC bias and carrier “drift,” which may linger indefinitely. These side-effects
don’t tend to become apparent until one tries to implement more complex modulation
i 1202 6 . . 8.09 3 2 4 . .
. . .5 . . 1
; THREE NOTES SLIGHTLY DETUNED AND SLIGHTLY STAGGERED ENTRANCES
i 1202 8 6 10000 6.00 1 1.4 10 3 1.9
.7 1 0 3 1.9 .25 1
i 1202 8.005 . . . . . . . .
. . . . . . . 1
i 1202 8.012 . . . . . . . .
. . . . . . . -1
Figure 12.6 Score file for an improved Chowning FM instrument, instr 1202, with paramet-
ric control of virtually all parameters including detuning of carrier and independent envelopes
for modulation-index and carrier amplitude.
schemes, such as stacked modulators, or feedback, both of which are common in so-
called FM synthesizers, such as the Yamaha DX7.
Consider the case of feedback, for example, which would seem to be impossible
with classic FM: if an oscillator is frequency modulating itself and the sum of its
basic (carrier) frequency and the current deviation ever equals zero (which will occur
if the modulation index is ever greater than or equal to 1), the oscillator will stop in-
crementing and never start again. In fact, the Yamaha DX7 is only able to use feed-
back because it doesn’t really implement classic frequency modulation, but a closely
related technique known as “Phase Modulation.”
In Phase Modulation, the instantaneous phase of the carrier oscillator is altered by
the modulator, not its frequency. The spectrum produced is virtually identical to FM,
but without the undesirable side-effects. In fact, the formula from Chowning’s article
given above is actually the one for phase modulation, rather than frequency modula-
tion, so we can implement it more or less verbatim. In that formula, the amplitude of
the modulator (Index of Modulation) is assumed to be a value in radians, as opposed
to Hertz (cycles per second). Hence, we needn’t concern ourselves with Chowning’s
I = D/M equation, as we must with classic FM; we simply plug in the desired index
directly, as the amplitude of the modulator.
To implement Phase Modulation (PM) in Csound, we must use a tablei opcode
for the carrier, instead of oscili, since we need to access the internal phase directly.
The tablei opcode will reference a single cycle of a sine wave for its stored function,
just like oscili and it must be used in conjunction with a phasor opcode, which
converts an input frequency argument into a moving “phase” value. (Note that the
output of phasor is not a value in degrees or radians, as might be expected, given its
name. Instead it moves from 0 to 1 repeatedly, once per cycle and hence is appro-
priate for use as a normalized index for table opcodes, which may have functions of
arbitrary length. This implies, however, that we must divide the desired “Index of
Modulation” (amplitude of the modulator) by 2π, to convert it from radians to a
normalized table index, as well) (see figures 12.7 and 12.8).
As you can see, instr 1203 is virtually identical to instr 1202, except that the foscili
opcode is replaced by an oscili, tablei, and a phasor, implementing Phase Modula-
tion instead of Frequency Modulation. In fact, we can use the same score data from
figure 12.6 and compare the sound of the two instruments. Note that the FM and PM
instruments sound virtually identical using the same score, which is to be expected
in simple modulation involving one carrier and one modulator. The advantages of
Phase Modulation, however, become apparent in more complex algorithms.
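To make the technique concrete, here is a minimal phase-modulation sketch in the spirit of instr 1203. It is not the instrument of figure 12.8: the instrument number, amplitude, and index envelope are assumptions, p4 is assumed to hold a pitch in oct.pch notation and f 1 is assumed to be a sine wave, as in the scores of this chapter.
instr 1290 ; HYPOTHETICAL EXAMPLE, NOT FROM THE TEXT
ihz = cpspch(p4) ; CARRIER AND MODULATOR FREQUENCY (1:1 RATIO)
kndx linseg 0, p3*.5, 5, p3*.5, 0 ; INDEX OF MODULATION, IN RADIANS
amod oscili kndx/6.283185, ihz, 1 ; MODULATOR, SCALED FROM RADIANS TO A 0-1 INDEX
acarphs phasor ihz ; CARRIER PHASE, 0 TO 1 ONCE PER CYCLE
asig tablei acarphs+amod, 1, 1, 0, 1 ; SINE LOOKUP: NORMALIZED INDEX, WRAPPED
out asig*10000
endin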
Figure 12.7 Block diagram of instr 1203, a phase-modulation instrument built from oscili,
phasor and tablei opcodes.
Figure 12.8 Orchestra code for instr 1203, a phase-modulation instrument with similar pa-
rameters and design to instr 1202.
Figure 12.9 Orchestra code for instr 1204, a phase-modulation instrument with stacked
modulators.
f 1 0 2048 10 1 ; SINE
f 2 0 513 5 .001 513 1 ; EXPONENTIAL RISE
Figure 12.10 Score file for instr 1204, a phase-modulation instrument with stacked modulators,
in which some parameter data is passed to the instrument via an f-table ( f 3) that employs a non-
normalized GEN2 (-2) subroutine.
loop:
; READ OPERATOR PARAMETERS
ilvl table 0,iopfn ; OPERATOR OUTPUT LEVEL
ivel table 1,iopfn ; VELOCITY SENSITIVITY
iegr1 table 2,iopfn ; EG RATE 1
iegr2 table 3,iopfn ; EG RATE 2
iegr3 table 4,iopfn ; EG RATE 3
iegr4 table 5,iopfn ; EG RATE 4
iegl1 table 6,iopfn ; EG LEVEL 1
iegl2 table 7,iopfn ; EG LEVEL 2
iegl3 table 8,iopfn ; EG LEVEL 3
iegl4 table 9,iopfn ; EG LEVEL 4
iams table 10,iopfn ; AMPLITUDE MOD SENSITIVITY
imode table 11,iopfn ; OPERATOR MODE (FIXED OR RATIO)
ifreq table 12,iopfn ; OPERATOR RATIO OR FREQUENCY
idet table 13,iopfn ; DETUNE
irss table 14,iopfn ; RATE SCALING SENSITIVITY
Figure 12.11 Orchestra code for instr 1205, an emulation of the Yamaha DX-7 Synthesizer:
algorithm 16, featuring a 6-operator loop and the use of ihold and iturnoff for sustain.
; INITIALIZE OPERATOR
ihz = (imode > 0 ? ifreq : ibase * ifreq) + idet/idetfac
iamp = ilvl/99 ; RESCALE TO 0 -> 1
ivfac table ivel,ivsfn ; VEL SENSITIVITY CURVE
; SCALE EG LEVELS TO OP OUTPUT LVL
iegl1 = iamp*iegl1
iegl2 = iamp*iegl2
iegl3 = iamp*iegl3
iegl4 = iamp*iegl4
; FACTOR IN VELOCITY
iegl1 = iegl1*(1-ivfac)+iegl1*ivfac*iveloc
iegl2 = iegl2*(1-ivfac)+iegl2*ivfac*iveloc
iegl3 = iegl3*(1-ivfac)+iegl3*ivfac*iveloc
iegl4 = iegl4*(1-ivfac)+iegl4*ivfac*iveloc
i1egl3 = iegl3
i1egl4 = iegl4
i1ams = iams
i1hz = ihz
iop = iop + 1
iopfn = iop2fn
igoto loop
i5egl4 = iegl4
i5ams = iams
i5hz = ihz
iop = iop + 1
iopfn = iop6fn
igoto loop
op6:
i6egd1 = iegd1
i6egd2 = iegd2
i6egd3 = iegd3
i6egd4 = iegd4
i6egl1 = iegl1
i6egl2 = iegl2
i6egl3 = iegl3
i6egl4 = iegl4
i6ams = iams
i6hz = ihz
; SIMPLE LFO
kvary expseg .001,idelay,1,1,1
klfo oscili kvary,kvary*ilfohz,isine ; LFO
kvib = 1+klfo*ivibwth
until the envelope of the carrier operator has reached zero before issuing the turnoff
statement. Depending on which DX7 algorithm is being implemented, there may be
more than one operator serving as a carrier, in which case it would be necessary to
test them all. Note that an alternate method of extending the duration of a note be-
yond p3 of the score is to use an opcode such as linenr.
The implementation of the envelope generators in this instrument is somewhat
complicated. Yamaha envelopes consist of four levels, or breakpoints, and four rates
(not durations). The same 0–99 scale is used for both rates and levels, which must be
mapped to usable values. As it happens, the rate mapping for rise segments is quite
different from the one for decay segments. Moreover, the rates must be converted
into durations, as required by Csound envelope generators. Finally, although Yama-
ha’s diagrams of its envelope generators show the segments as linear, they really are
not. Consequently, for the sake of simplicity, I used linseg opcodes for the timing of
segments, then table opcodes to map the linear amplitudes onto the Yamaha operator
output levels. The amplitude scaling data were provided in the Chowning/Bristow
book. Note that a different table must be used for any operator that will serve as a
modulator. The only difference between the two tables is that one has been divided
by 2π, so it is properly scaled for use as a normalized table index.
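As a sketch of that mapping approach (hypothetical variable names, not lines from instr 1205), a linear envelope expressed on the 0–99 scale can be passed through one of the tables of figure 12.12, with f 2 used for a carrier and f 8 for a modulator; the segment timings below are arbitrary:
klin linseg 0, p3*.1, 99, p3*.7, 80, p3*.2, 0 ; EG LEVELS ON THE 0-99 SCALE
kamp tablei klin, 2 ; MAP LEVEL TO AMPLITUDE VIA f 2 (USE f 8 FOR A MODULATOR)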
f 1 0 512 10 1
; OPERATOR OUTPUT LEVEL TO AMP SCALE FUNCTION (DATA FROM CHOWNING/BRISTOW)
f 2 0 128 7 0 10 .003 10 .013
10 .031 10 .079 10 .188 10 .446
5 .690 5 1.068 5 1.639 5 2.512
5 3.894 5 6.029 5 9.263 4 13.119
29 13.119
; RATE SCALING FUNCTION
f 3 0 128 7 0 128 1
; EG RATE RISE FUNCTION FOR LVL CHANGE BETWEEN 0 AND 99 (DATA FROM OPCODE)
f 4 0 128 -7 38 5 22.8 5 12 5
7.5 5 4.8 5 2.7 5 1.8 5 1.3
8 .737 3 .615 3 .505 3 .409 3
.321 6 .080 6 .055 2 .032 3 .024
3 .018 3 .014 3 .011 3 .008 3
.008 3 .007 3 .005 3 .003 32 .003
; EG RATE RISE PERCENTAGE FUNCTION
f 5 0 128 -7 .00001 31 .00001 4 .02 5
.06 10 .14 10 .24 10 .35 10 .50
10 .70 5 .86 4 1.0 29 1.0
; EG RATE DECAY FUNCTION FOR LVL CHANGE BETWEEN 0 AND 99
f 6 0 128 -7 318 4 181 5 115 5
63 5 39.7 5 20 5 11.2 5 7
8 5.66 3 3.98 6 1.99 3 1.34 3
.99 3 .71 5 .41 3 .15 3 .081
3 .068 3 .047 3 .037 3 .025 3
.02 3 .013 3 .008 36 .008
; EG RATE DECAY PERCENTAGE FUNCTION
f 7 0 128 -7 .00001 10 .25 10 .35 10
.43 10 .52 10 .59 10 .70 10 .77
10 .84 10 .92 9 1.0 29 1.0
; EG LEVEL TO PEAK DEVIATION MAPPING FUNCTION (INDEX IN RADIANS/2PI)
f 8 0 128 -7 0 10 .000477 10 .002
10 .00493 10 .01257 10 .02992 10 .07098
5 .10981 5 .16997 5 .260855 5 .39979
5 .61974 5 .95954 5 1.47425 4 2.08795
29 2.08795
; VELOCITY TO AMP FACTOR MAPPING FUNCTION (ROUGH GUESS)
f 9 0 129 9 .25 1 0
; VELOCITY SENSITIVITY SCALING FUNCTION (SEEMS LINEAR)
f 10 0 8 -7 0 8 1
; FEEDBACK SCALING FUNCTION (SEEMS LINEAR)
f 11 0 8 -7 0 8 7
; OPERATOR 1 PARAMS: OUTLVL KEYVEL EGR1 EGR4 EGR2 EGR3
f 12 0 32 -2 99 1 99 38 33 14
; EGL1 EGL4 EGL2 EGL3
99 0 80 0
; AMS DET FIXED? FREQ
0 0 1 1
; RSS
2
; OPERATOR 2 PARAMETERS
f 13 0 32 -2 67 6 75 19 45 36
99 0 87 0
0 -2 0 11.22
2
; OPERATOR 3 PARAMETERS
f 14 0 32 -2 99 7 99 46 30 34
99 0 80 0
0 0 0 .5
0
Figure 12.12 Score file for instr 1205, in which function tables as well as note statement
p-fields are used to pass parameter data.
; OPERATOR 4 PARAMETERS
f 15 0 32 -2 78 7 90 82 67 21
99 0 85 0
0 0 0 7
0
; OPERATOR 5 PARAMETERS
f 16 0 32 -2 99 4 99 8 64 0
85 0 48 0
0 0 0 3
0
; OPERATOR 6 PARAMETERS
f 17 0 32 -2 99 1 99 82 75 0
99 0 87 0
0 0 1 2570
0
Finally, note that other 6-operator FM algorithms may be implemented quite easily
with this same basic instrument. In fact, all 32 original DX7 algorithms have now
been implemented based on this instrument and they are available on the CD-ROM.
But other combinations of 6 operators are also possible. The only code that needs to
be modified is below the comment line “Algorithm-specific code.” The operators
must be rearranged, feedback implemented on the correct operator, and the envelope
output scaling functions changed appropriately, using iampfn for any carrier but
idevfn for any modulator. Also, note that oscili opcodes
can be employed for any operator that isn’t being modulated. Thus, in this example,
which implements DX7 algorithm 16, operator 2 and operator 4 use oscili opcodes,
but the remaining operators must use the phasor/tablei pairs.
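As a sketch of that distinction (hypothetical variable names, not lines from instr 1205), an unmodulated operator can be a plain oscillator, while a modulated operator needs the phasor/tablei pair; the modulator fed to tablei is assumed to be scaled to radians/2π already:
a2 oscili k2dev, i2hz*kvib, isine ; UNMODULATED OPERATOR (AMPLITUDE ALREADY IN RADIANS/2PI)
a1phs phasor i1hz*kvib ; PHASE OF A MODULATED OPERATOR
a1 tablei a1phs+a2, isine, 1, 0, 1 ; MODULATOR OFFSETS THE NORMALIZED PHASE, WRAPPED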
References
Chowning, J. 1973. “The synthesis of complex audio spectra by means of frequency modula-
tion.” Journal of the Audio Engineering Society 21(7).
Chowning, J., and D. Bristow. 1985. FM Theory and Applications—By Musicians for Musi-
cians. Tokyo: Yamaha Music Foundation.
Opcode Systems: DX/TX Editor/Librarian Manual. Appendix, pp. 108–114. Palo Alto, Calif.:
Opcode Systems, 1986.
Schottstaedt, W. 1977. “The simulation of natural instrument tones using frequency modula-
tion with a complex modulating wave.” Computer Music Journal 1(4): 46–50.
13 Granular Synthesis in Csound
Allan S. C. Lee
There are two granular synthesis unit generators in Csound. The first, grain,
was written by Paris Smaragdis; the other, granule, was developed by me.
The grain unit generator uses a function table as the source of a single stream of
grains with grain density controlled by the parameter xdens. A second f-table is used
for generating the envelope. Each grain starts at a random position within the source
f-table and sustains for a duration specified by kgdur.
The granule opcode embodies a high-level approach and was developed to pro-
vide an easy way of composing music with granular synthesized sound. Multiple
voices and random offset of parameters are built into the generator. Wide sound-field
effects with multiple outputs can be produced by assigning an opcode statement to
each individual channel and setting a different random number seed for each state-
ment. In this chapter, background information on granular synthesis with a focus on
the implementation of the granule opcode and step-by-step working examples will
be presented.
Background Information
As described in the paper Introduction to Granular Synthesis (Roads 1988), the the-
ory of granular synthesis was initially suggested by Dennis Gabor in his paper Acous-
tical Quanta and the Theory of Hearing (Gabor 1947). In Gabor’s theory, any sound
can be described by sonic grains. This suggestion was mathematically verified by
Bastiaans (1985).
Since 1971, composers such as Lippe (1993), Roads (1978) and Truax (1988)
have been using many different techniques to synthesize sounds using grains. These
techniques range from dedicated software to custom built digital-signal-process-
ing (DSP) hardware. In the Computer Music Tutorial (Roads 1996), Roads outlines
existing granular synthesis methods in five categories: Fourier and wavelet grids,
pitch-synchronous overlapping streams, quasi-synchronous streams, asynchronous
clouds, and time-granulated or sampled-sound streams with overlapped quasi-
synchronous or asynchronous playback.
Both the grain and granule opcodes fall into the last category listed above. The
opcodes read in a small chunk of sound data (normally from 1 millisecond to 100
milliseconds) from a source f-table and apply an envelope to it, then generate streams
of grains. In the case of the grain opcode, it reads a random portion of the source
sound data to produce sonic grains. The result is that the original sampled sound is
granulated and rearranged randomly with a grain density controlled by the parameter
xdens. In the case of the opcode granule, it reads a small part of the source data
linearly in the time domain, with or without random offset in the starting position,
to generate multiple streams of grains.
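For reference, the full argument list of granule, as documented in the Csound Reference Manual, is:
ar granule xamp, ivoice, iratio, imode, ithd, ifn, ipshift, igskip,
igskip_os, ilength, kgap, igap_os, kgsize, igsize_os, iatt, idec
[, iseed] [, ipitch1] [, ipitch2] [, ipitch3] [, ipitch4] [, ifnenv]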
These 22 parameters give full control over the characteristics of the synthesis.
Although there are 22 of them, most are i-rate, so the synthesis is not
as complicated as it appears to be. Their general function is illustrated in figure 13.2.
Figure 13.2 An illustration of how the parameters of the granule opcode operate.
The parameter xamp, as in most other Csound unit generators,
controls the overall output amplitude.
The parameter ivoice defines the number of voices or streams to be generated. The
greater the value of ivoice, the higher the grain density that will be produced, hence
the richer the sound. It takes longer, however, to generate the output. (My suggestion
is to use ivoice with a value around 10 to experiment with the sound and to use a
higher value for production).
The parameter, iratio, defines the speed of the sampling pointer moving along the
f-table, relative to the audio rate defined in sr. For example, a value of 0.1 would
stretch (expand) an original one-second sample by a factor of ten, producing ten sec-
onds of output, whereas a value of 10 would compress the same one-second sample
by a factor of ten, producing 0.1 second of output.
The parameter imode controls the direction of the sampling pointer: a value of +1
causes the pointer to acquire data from the source function table in the normal direc-
tion (forward), a value of -1 makes the pointer acquire data from the source f-table
in reverse direction, and a value of 0 will cause the pointer to acquire data forward and
backward randomly for each grain.
The parameter ithd defines the threshold value. Samples in the f-table below this
value will be skipped. The thresholding process is simply to compare the value of
each sample with the value ithd and to skip the sample if the value is less. It is
a simple design for skipping silent space within a sound sample, but it will cause
some distortion.
The parameter ifn is the f-table number of the source sound sample opened by an
f-statement in the score file.
The parameter ipshift is the pitch shift control. When ipshift is set to 0, the pitch
will be set randomly up and down an octave for each grain. When ipshift is set to 1,
2, 3, or 4, as many as four different pitch shifts can be set for the number of voices
defined in ivoice. When ipshift is set to equal 4, the value of ivoice needs to be set
equal to 4 or greater. The optional parameters ipitch1, ipitch2, ipitch3, and ipitch4 are
used to quantify the pitch shifts. Time scaling techniques, with linear-interpolation
between data points, are used to produce the pitch shift relative to the original pitch.
A value of 1 will generate the original pitch, a value of 2 will be one octave up and
a value of 0.5 will be one octave down.
The next three parameters igskip, igskip_os and ilength are designed for easy con-
trol of the precise location within the function being used as a source. As you know,
the GEN1 subroutine creates f-tables whose sizes are powers of two. This means that there
might be some zeroes or other unwanted data at the end of the f-table. The value of
igskip defines the starting point within the function table and ilength defines the
length of data to be used. Both parameters are measured in seconds. The sampling
pointer of the unit generator moves from the starting point defined by igskip and runs
to the end of ilength before looping back to the starting point. The parameter
igskip_os provides a random offset of the sampling pointer in seconds; a value of 0
implies no offset.
The two parameters, kgap, gap size in seconds, and igap_os, the random offset in
% of kgap, define the gap or delay between grains within each voice stream. The
value of kgap can either be time-varying and generated by Csound functions or be set
to a constant value. When the value of igap_os is set to 0, no offset will be produced.
The two parameters, kgsize, grain size in seconds, and igsize_os, the random offset
in % of kgsize, define the size of each grain. As above, the value of kgsize can either
be generated by Csound functions or set to a constant value. If no random offset is
desired, set igsize_os to 0%.
The two parameters, iatt and idec, define the attack and decay of the grain envelope
in % of grain size.
The parameter, iseed, is optional; it is the seed for the random number generator,
the default value being 0.5. In a multichannel design, using different values for each
output would generate a different random sequence for each channel, producing a
wide sound-field effect.
The final optional parameter, ifnenv, defines the shape of the grain envelope. The
default value is 0 and linear attack (iatt) and decay (idec) are used as described in
figure 13.3. A positive value will be interpreted as an f-table number and the data
stored in it will be used to generate the attack curve of the envelope. The decay curve
will be a mirrored image of the attack curve. If a full envelope image is stored in this
f-table, then iatt must equal 100% and idec 0% in order to generate a full envelope
for each grain, as illustrated in figure 13.4.
Working Examples
A series of working examples are defined and described below to demonstrate the
characteristics of the granule unit generator. The following is the listing of three
orchestra files: the first, 1301.orc, is for single-channel output; the second, 1302.orc,
is for two channels; and the third, 1304.orc, is for four channels. The
linseg statement is used to generate a simple overall envelope of 5% attack and decay,
in order to eliminate “clicks” at the beginning and the end. All the parameters are
passed in from the score file. In the two-channel version, two granule opcodes are
called. They share all the parameters except iseed (p20), to which an arbitrary
value of 0.17 is added to give a different seed for each opcode. These different seeds
will generate different random offsets for all the offset parameters, providing a slight
Figure 13.3 Linear grain envelope as defined by the iatt and idec parameters when ifnenv is
set to 0.
Figure 13.4 Symmetrical grain envelopes from function tables as specified by the ifnenv
parameter.
Figure 13.6 Orchestra code for instr 1301, a simple granular synthesizer with linseg ampli-
tude envelope.
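A minimal sketch of such an instrument (not the published code, and assuming the p-field layout of the scores in figures 13.12–13.14, with p4 = xamp through p25 = ifnenv) might read:
instr 1301
k1 linseg 0, p3*.05, p4, p3*.9, p4, p3*.05, 0 ; OVERALL ENVELOPE, 5% ATTACK AND DECAY
a1 granule k1, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, p22, p23, p24, p25
out a1
endin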
Figure 13.8 Orchestra code for instr 1302, a stereo granular synthesizer with a common
amplitude envelope.
Figure 13.10 Orchestra code for instr 1304, a quad granular synthesizer with a common
amplitude envelope.
delay between all the grains; this is sufficient to generate a wide stereo sound field.
Obviously, in the case of the four-channel output, four granule opcodes are used and
each is given a different value for iseed.
The first set of examples uses a simple 220 Hz sine tone as the source. The orchestra
and score files sine220.orc and sine220.sco, used to generate 10 seconds of a 220 Hz
sine tone, are shown in figure 13.11. The output of this orchestra and score should
be generated as an AIFF file that is named sine220.aif and placed in the samples
directory, SADIR, for use in the first set of granular synthesis examples.
The three notes from 1301.sco shown in figure 13.12 are used to generate three
distinctly different granular textures.
Obviously, a single-channel output can be generated by using the 1301.sco with
1301.orc; a stereo output can be generated from running 1301.sco with 1302.orc;
f 1 0 524288 10 1
i 1303 0 10
Figure 13.11 Orchestra and score for generating a 220 Hz sine tone with 524288 points
of resolution.
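The orchestra side of this pair needs nothing more than a fixed oscillator reading the sine table; a minimal sketch (the amplitude value is an assumption) is:
instr 1303
a1 oscili 20000, 220, 1 ; 220 HZ SINE TONE FROM f 1
out a1
endin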
Figure 13.12 The first score file for instr 1301, in which each of the three notes has a different grain
size and gap size.
and quad output by running 1301.sco with 1304.orc. (It should be noted, however,
that your hardware may not support quad playback.)
In the first ten second “note” from 1301.sco, the grain size is set to equal 5 milli-
seconds, the gap is set to equal 100 milliseconds and four different pitches are used.
Since the grain size is quite small, the sound is “grainy” and the pitches are not
noticeable. In the second 10 second “note” from 1301.sco, the grain size is increased
to 50 milliseconds. In this case, the different pitches are quite noticeable owing to
the greater grain size, but the result still sounds discontinuous because of the rela-
tively large gap. Finally, in the third ten second “note” from 1301.sco the gap is re-
duced to 10 milliseconds. The result is a smoother and richer granular sound.
The second set of examples uses a sampled environmental sound as the source.
Four seconds of a mono soundfile, seashore.aif, sampled at 44.1 kHz, is used to play
the two notes from 1302.sco shown in figure 13.13.
Most of the parameters are the same as the first set of examples. Grain size is set
to 50 ms and the gap is set to 10 ms. But since the length of the source sound file,
seashore.aif, is 4 seconds, ilength is set to equal 4. Notice that in the first note, the
soundfile is repeated twice, since there are only 4 seconds of sound in the source and
the note duration is 8 seconds. In contrast, the second note, which has a duration of
f 1 0 524288 1 “seashore.aif” 0 4 0
; GRAIN SIZE IS SET TO 50 MS, GAP IS SET TO 10 MS
i 1301 0 8 7000 12 1 1 0 1 4 0 0 4 0.01 30 .05 30 20 20
0.39 1 1.42 0.29 4 0
; GRAIN SIZE IS SET TO 50 MS, GAP IS SET TO 10 MS
i 1301 10 16 7000 12 .25 1 0 1 4 0 0 4 0.01 30 .05 30 20 20
0.39 1 1.42 0.29 4 0
Figure 13.13 Score file 1302.sco that granulates and time-scales a sound file.
f 1 0 524288 1 “female.aif” 0 4 0
; GRAIN SIZE IS SET TO 50 MS, GAP IS SET TO 10 MS
i 1301 0 10 6000 12 1 1 0 1 4 0 0 6 0.01 30 .05 30 20 20 0.39 1
1.42 0.29 4 0
; GRAIN SIZE IS SET TO 50 MS, GAP IS SET TO 10 MS
i 1301 12 10 6000 12 .25 1 0 1 4 0 0 6 0.01 30 .05 30 20 20 0.39 1
1.42 0.29 4 0
; GRAIN SIZE IS SET TO 50 MS, GAP IS SET TO 10 MS
i 1301 24 10 3000 48 .25 1 0 1 4 0 0 6 0.01 30 .05 30 20 20 0.39 1
1.42 0.29 4 0
Figure 13.14 The file 1305.sco for use with either 1301, 1302, or 1304.orc to granulize a
sample of a female singing voice.
16 seconds, does not repeat the soundfile at all, since iratio is set to 0.25. Imagine
that the rate at which the source sound samples are read is one-quarter of the audio
rate; hence for sixteen seconds of output, it needs only four seconds of source samples.
This is one of the typical applications of granular synthesis, to “stretch” or extend a
sound in the time domain; it brings out the inner details of the sound by slowing
down the tempo.
In the set of examples in figure 13.14, six seconds of a mono sound file, female.aif,
sampled at 44.1 kHz is used. It is the sound of a woman singing.
In 1305.sco, you will notice that most of the parameters are the same in the three
“notes,” except ivoice, iratio and xamp. The value of ivoice is set to equal 12 in the
first two “notes,” then it is set to 48 in the third. The higher value of ivoice in the
third “note” produces a smoother sound, but as mentioned earlier, will take much
longer to render. Also, notice that xamp in the third “note” is set lower to stop the
output from getting out of range. The iratio in the first note is set to 1 and then to
0.25 in the second and third “notes,” hence the tempo of these notes is slowed down.
Finally, we switch to Csound’s grain opcode for a comparative example. In the
Csound Reference Manual, the arguments are given as follows:
ar grain xamp, xpitch, xdens, kampoff, kpitchoff, kgdur,
igfn, iwfn, imgdur[, igrnd]
Figure 13.15 Block diagram of instr 1306, a simple granular synthesizer using the grain
opcode.
f 1 0 8192 10 1
f 2 0 1025 20 2 1
i 1306 0 10
Figure 13.16 Orchestra and score for instr 1306, a simple granular synthesizer using the
grain opcode with a Hanning window envelope ( f 2) and a sinewave as the source ( f 1).
The example above uses a single cycle of a sine function as the source and a Han-
ning window as the envelope, to produce 10 seconds of granular sound. The grain
density, xdens is set to equal 1000 grains per second. The parameters xpitch and
kpitchoff are set to 220 Hz and 20,000 Hz respectively, producing a wide range of
frequencies at the output. The grain size (grain duration) kgdur is set to 50 milli-
seconds. The result is a beautifully dispersed granular texture—a “new-millennium
sample-and-hold” sound.
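Plugging those values into the grain syntax given above yields a minimal sketch of the instrument (the amplitude, amplitude offset and maximum grain duration used here are assumptions):
instr 1306
a1 grain 2000, 220, 1000, 1000, 20000, .05, 1, 2, .06 ; SINE SOURCE (f 1), HANNING WINDOW (f 2)
out a1
endin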
Conclusion
As demonstrated in the various examples above, both granular synthesis opcodes are
simple to use and understand in principle, and both are capable of generating rich
and colorful textures. The results, however, depend on the type and nature of the
source samples. In using granular synthesis for composition, much effort should be
spent on choosing the source and then fine-tuning the parameters to produce the
desired sound.
References
Gabor, D. 1947. “Acoustical quanta and the theory of hearing.” Nature 159(4044): 591–594.
Lippe, C. 1993. “A musical application of real time granular sampling using the IRCAM
signal processing workstation.” Proceedings of the 1993 International Computer Music
Conference.
Roads, C. 1978. “Automated granular synthesis of sound.” Computer Music Journal 2(2):
61–62.
Roads, C. 1988. “Introduction to granular synthesis.” Computer Music Journal 12(2): 11–13.
Roads, C. 1996. “Multiple wavetable, wave terrain, granular and subtractive synthesis.” Com-
puter Music Tutorial. Cambridge, Mass.: MIT Press.
Truax, B. 1988. “Real time granular synthesis with a digital signal processor.” Computer Music
Journal 12(2): 14–26.
14 FOF and FOG Synthesis in Csound
Michael Clarke
The fof (Fonction d’Onde Formantique) synthesis unit-generator is based on the syn-
thesis method originally developed by Xavier Rodet for the CHANT program at
IRCAM. It produces a series of partials, shaped into a formant region, that can be
used in building up a vocal or instrumental simulation. Since the unit-generator
works in the time domain, generating a sequence of excitations or grains, it can also
be used for granular synthesis, as well as for interpolation between timbral and gran-
ular synthesis. Whereas the grains in FOF synthesis are normally based on a stored
sine-wave, FOG (FOF Granular) synthesis (Eckel, Iturbide, and Becker 1995; Clarke
1996a, b) is designed specifically for use with waveforms derived from soundfiles.
Generating grains with many of the particular characteristics of FOF synthesis, the
fog control parameters are designed to facilitate the transformation of prerecorded
sounds. In Csound, fog can be used both for time-stretching and pitch-
shifting and to generate more radical granular transformations of the original
material.
Detailed descriptions of the theory of FOF synthesis can be found elsewhere (Rodet
1984; Rodet, Potard, and Barrière 1984). A brief description of the underlying theory
is given here as it may be helpful in understanding how the unit-generator works.
It is important to realize that, although FOF synthesis is often used to generate a
carefully shaped spectrum and the names of many of its parameters refer to the fre-
quency domain (e.g., fundamental frequency, formant frequency, bandwidth), it in
fact operates in the time domain, producing a sequence of excitations (grains com-
prising enveloped sine-waves). The resulting spectral contour is shaped, perhaps
surprisingly, by adjustments to the time domain. In normal usage, FOF produces a
periodic sequence of excitations (though often modified by vibrato). The frequency
at which these excitations are generated is heard as the fundamental frequency of the
formant region. The output of the unit-generator is a set of overtones of this funda-
mental, whose relative amplitudes are shaped by a spectral envelope (figure 14.1).
It is the shape of the local envelope, that is, the amplitude envelope used for each
excitation, that determines the contour of the formant envelope in the frequency-
domain. In brief, the shorter the grain envelope, the broader the spectral envelope of
the formant region. Conversely, lengthening the grain envelope narrows the formant
region. Different aspects of the local envelope shape (its rise time and decay rate)
alter the shape of the formant region in precise ways, permitting detailed control over
the synthesized timbre.
Most natural timbres comprise several formant regions. In imitating these timbres,
it is therefore necessary to sum the output of several fof opcodes, each representing
a single formant region.
This section considers each argument in turn and examines its basic function. More
complex examples of FOF synthesis are described in my chapter From Research to
Programming to Composition, on the accompanying CD-ROM. The arguments of
the fof unit-generator are (figure 14.2):
ar fof xamp, xfund, xform, koct, kband, kris, kdur, kdec,
iolaps, ifna, ifnb, itotdur [, iphs[, ifmode]]
A single formant region can be produced by the orchestra and score shown in figure
14.3. Starting from this model, changes will be made to each of the input parameters
in turn to illustrate their basic operation.
■ xamp = amplitude
The constant amplitude of the initial instrument may be modified to take an envelope,
like a simple linseg function:
a2 linseg 0, p3*.3, 20000, p3*.4, 15000, p3*.3, 0
a1 fof a2, ....(as before)...
Figure 14.1 The spectral envelope of a formant region: amplitude against frequency, show-
ing the fundamental and its overtones, the bandwidth measured at -6dB and the skirtwidth
at -40dB.
Figure 14.2 Time domain parameters for fof, which control its spectral envelope: the func-
tion which is sampled (ifna), the rise and decay waveform (ifnb), the exponential decay
(kband), and the rise (kris), decay (kdec), and duration (kdur).
f 1 0 4096 10 1
f 2 0 1024 19 .5 .5 270 .5
i 1401 0 3
Figure 14.3 Orchestra file for instr 1401, an FOF synthesizer producing a single formant.
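A minimal sketch of a single-formant instrument along these lines (not the published instr 1401; the amplitude, fundamental and formant values are assumptions) would be:
instr 1401
a1 fof 15000, 220, 650, 0, 40, .003, .02, .007, 20, 1, 2, p3
out a1
endin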
■ xfund = fundamental frequency
This parameter can also be used to move between timbral synthesis and granular
textures, since FOF synthesis produces a rapid succession of (normally) overlapping
excitations or grains. The argument xfund controls the speed at which new excita-
tions are formed. If the fundamental is low these excitations are perceived as separate
grains. In such cases the fundamental is no longer a pitch but a pulse speed. If the
parameter is also varied randomly (perhaps using the rand unit-generator) so that a
regular pulse is no longer audible, xfund becomes the density of grain distribution.
The possibility of moving between pitch and pulse and between timbre and granular
texture is one of the most interesting aspects of fof. The following provides a simple
demonstration. The transformation from pulse to pitch will be most easily heard if
the note duration is lengthened to about 10 seconds:
a2 expseg 5, p3*.8, 200, p3*.2, 150
a1 fof 15000, a2, .....(as before)...
A koct (octaviation) envelope rising from 0 to 6 produces a drop of six octaves; if the note is
sufficiently long, it should be possible to hear the alternate excitations fading out
toward the end of the example.
■ xform = formant frequency
■ ifmode = formant mode (0 = striated, non-0 = smooth)
The spectral output of an fof unit-generator resembles that of an impulse genera-
tor, filtered by a bandpass filter. It is a set of partials above a fundamental (xfund)
with a spectral peak at the formant frequency (xform). Motion of the formant can be
implemented in two ways. If ifmode = 0, data sent to xform has effect only at the
start of a new excitation. That is, each new excitation gets the current value of this
parameter at the time of creation and holds it until the excitation ends. Successive
overlapping excitations can have different formant frequencies, creating a richly var-
ied sound. This is the mode of the original CHANT program. If ifmode is nonzero,
the frequency of each excitation varies continuously with xform. This allows glis-
sandi of the formant frequency. To demonstrate these differences, a low fundamental
is used so that the granules can be heard separately. The formant frequency is audible,
not as the center frequency of a “band,” but as a pitch in its own right. Compare the
following, in which only ifmode is changed:
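For example (a sketch in the style of the later code fragments; the frequency values are assumptions), with a low fundamental and a formant gliding from 300 to 600 Hz:
kform line 300, p3, 600
a1 fof 15000, 2, kform, 0, 0, .003, .5, .1, 2, 1, 2, p3, 0, 0
and then the same line with the final ifmode argument changed to 1:
a1 fof 15000, 2, kform, 0, 0, .003, .5, .1, 2, 1, 2, p3, 0, 1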
In the first case, the formant frequency moves by step at the start of each excitation,
whereas in the second it changes smoothly. A more subtle difference is perceived
with higher fundamental frequencies (note that the later fof parameters were changed
in this example to lengthen the excitations so that their pitch could be heard more
easily).
The xform argument also permits frequency modulation of the formant frequency.
Applying FM to an already complex sound can lead to strange results, but here is a
simple example:
acar line 400, p3, 800
index = 2.0
imodfr = 400
idev = index*imodfr
amodsig oscil idev, imodfr, 1
a1 fof 15000, 5, acar+amodsig, 0, 1, .003, .5, .1, 3, 1, 2, p3, 0, 1
Run this with a note length of 10 seconds. Notice how the attack of the envelope
of the granules lengthens. The shape of this attack is determined by the shape of ifnb
(here a sigmoid).
Next is an example of kband changing over time:
k1 linseg 0, p3, 10
a1 fof 15000, 2, 300, 0, k1, .003, .5, .1, 2, 1, 2, p3
Following its rise, an excitation has a built-in exponential decay and kband deter-
mines its rate. The larger the kband, the steeper the decay; zero means no decay. In
the above example the successive granules had increasingly fast decays.
The next example demonstrates the operation of kdec. Because an exponential
decay never reaches zero, a terminating decay is necessary to prevent discontinuity
in the signal. The argument kdur determines the overall duration (in seconds from
the start of the excitation) and kdec sets the length of the terminating decay. This
decay therefore starts at the time determined by kdur - kdec. In this example, the
terminating decay starts early in the first granules and then becomes progressively
later and shorter. Note that kband is set to zero so that only the terminating decay
is evident.
k1 linseg .8, p3, .003
a1 fof 15000, 1, 300, 0, 0, .003, .9, k1, 2, 1, 2, p3
In the next example the start time of the termination remains constant, but its
length gets shorter, as does the grain itself:
k1 expon .3, p3, .003
a1 fof 15000, 2, 300, 0, 0, .003, .01+k1, k1, 2, 1, 2, p3
It may be surprising to find that for higher fundamentals, the local envelope deter-
mines the spectral shape of the sound. Electronic and computer music, however, have
often shown that parameters of music once considered to be independent (for ex-
ample pitch, timbre and rhythm) are in fact different aspects of the same phenome-
non. The inverse relationship of time and frequency also has a parallel in the
uncertainty principle of modern physics. In general, the longer the local envelope
segment, the narrower the band of partials around that frequency. The argument
kband determines the bandwidth of the formant region at -6dB and kris controls the
skirtwidth at -40dB. Increasing kband increases the local envelope’s exponential
decay rate, thus shortening it and increasing the -6dB spectral region. Increasing
kris (the envelope attack time) inversely makes the -40dB spectral region smaller.
The next example changes first the bandwidth and then the skirtwidth. The differ-
ence should be apparent:
k1 linseg 100, p3/4, 0, p3/4, 100, p3/2, 100 ; KBAND
k2 linseg .003, p3/2, .003,p3/4, .01, p3/4, .003 ; KRIS
a1 fof 15000, 100, 440, 0, k1, k2, .02, .007, 3, 1, 2, p3
In the first half of the note, kris remains constant while kband broadens then nar-
rows again. In the second half, kband is fixed while kris lengthens (narrowing the
spectrum), then shortens again.
Note that kdur and kdec do not shape the spectrum significantly. They simply tidy
up the decay so as to prevent unwanted discontinuities that would distort the sound.
For vocal imitations, these parameters are typically set at .017 and .007 and left
unchanged. With high (“soprano”) fundamentals it is possible to shorten these values
and save computation time (reducing overlaps).
■ iolaps = number of overlap spaces
Granules are created at the rate of the fundamental frequency and new granules
are often created before earlier ones have finished, resulting in overlaps. The number
of overlaps at any one time is given by xfund * kdur. For a typical “bass” note the cal-
culation might be 200 * .018 = 3.6 and for a “soprano” note 660 * .015 = 9.9. The
fof opcode needs at least this number (rounded up) of spaces in which to operate. The
number can be over-estimated at no computation cost and at only a small space cost.
If there are insufficient overlap spaces during operation, the note will terminate.
■ ifna, ifnb = stored function tables
These two parameters identify two function tables. Normally, ifna, which defines
the waveform on which the granules are based, is a sine wave. Normally, ifnb is the
waveform used for the rise and final decay of the local envelope, typically a sigmoid.
Definitions for both of these functions can be found as “f 1” and “f 2”, respectively.
■ itotdur = total duration within which all granules in a note must be completed
So that incomplete granules are not cut off at the end of a note, the fof opcode will
not create new granules if they will not be completed by the specified time. Normally
given the value of p3 (the note length), this parameter can be changed for special
effects; fof will output zero after time itotdur.
■ iphs = initial phase (optional, defaulting to 0)
Specifies the initial phase of the fundamental, which is normally zero, but giving
different fof generators different initial phases can be helpful in avoiding “zeros” in
the spectrum.
Vocal Imitation
In doing a vocal imitation, all the fof opcodes share a common fundamental fre-
quency, modified by a complex vibrato which is modeled on the CHANT program.
Each fof has its own settings to determine the shape of its formant region. A list of
formant data for different vowels can be found in the appendix and in the Csound
Reference Manual. The output of all the fof generators is summed.
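A minimal sketch of this structure (not the chapter’s instrument: the vibrato is simplified to a single oscillator, and the formant frequencies, bandwidths and amplitudes are illustrative values only, not taken from the appendix) is:
kvib oscili 3, 5.5, 1 ; SIMPLE VIBRATO IN PLACE OF THE CHANT MODEL
kfund = 220 + kvib
a1 fof 10000, kfund, 650, 0, 40, .003, .017, .007, 20, 1, 2, p3
a2 fof 5000, kfund, 1080, 0, 50, .003, .017, .007, 20, 1, 2, p3
a3 fof 3000, kfund, 2650, 0, 60, .003, .017, .007, 20, 1, 2, p3
out a1+a2+a3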
Transformations based on this vocal model can be found among the examples for
the CD-ROM chapter From Research to Programming to Composition.
FOG synthesis is essentially similar to FOF synthesis, but the unit-generator and its
control parameters are adapted for use in granulating soundfiles. Many of the fog
input parameters have a direct parallel with those of FOF synthesis but the names
reflect their orientation toward granular synthesis. The granulation of soundfiles has
been explored by a number of people, including Barry Truax (1994). Using the fof
algorithm as the basis for granulation has certain characteristic features, in particular
the shape of the local envelope and the precise timing of grains (Clarke 1996a). FOG
synthesis permits time-stretching and pitch shifting of the original sounds, although
with side effects (which can often be interesting compositionally), as well as the cre-
ation of new textures through the random variation of certain parameters.
The orchestra and score shown in figures 14.5 and 14.6 will be used as the basis
for a number of variations, which will demonstrate the basic functioning of the fog
unit-generator. Initially, those parameters which are significantly different from FOF
synthesis will be described and demonstrated.
■ ifna = stored function table
As in FOF synthesis, ifna is a stored f-table, but here this table normally contains
data from a soundfile. In the score above, note that the f-table reads data from the file
“basmrmba.aif.” This file, a bass marimba phrase, can be found on the accompanying
CD-ROM and should be used for running these examples.
f 1 0 131072 1 “basmrmba.aif” 0 4 1
f 2 0 1024 19 .5 .5 270 .5
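A minimal orchestra sketch built on those two f-tables, assuming the argument order of the Csound Reference Manual (xamp, xdens, xtrans, aspd, koct, kband, kris, kdur, kdec, iolaps, ifna, ifnb, itotdur) and using the envelope values discussed later in this section (the instrument number and amplitude are assumptions), is:
instr 1414 ; HYPOTHETICAL INSTRUMENT NUMBER
i1 = sr/ftlen(1) ; SCALES THE POINTER SO THAT p5 = 1 GIVES THE ORIGINAL SPEED
a2 phasor i1*p5 ; POINTER INTO THE SOURCE TABLE (THE xspd CONTROL)
a3 linseg 0, .05, 15000, p3-.1, 15000, .05, 0 ; SIMPLE AMPLITUDE ENVELOPE
a1 fog a3, 100, p4, a2, 0, 0, .01, .02, .01, 3, 1, 2, p3 ; p4 = xptch
out a1
endin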
■ xspd = speed
The xspd argument does not have a direct parallel in FOF synthesis. It determines
the rate at which successive grains progress through the stored f-table (ifna). In FOF
synthesis, the sine wave which forms the basis of the excitations/grains is always
reset to zero phase (i.e., the start of the f-table) at the start of each grain. In FOG
synthesis, successive grains may begin to read from different places in the function
table. Thus xspd controls the rate of progression by means of an index moving be-
tween 0 and 1. To recreate the speed of the original soundfile, account must be taken
both of the sample rate (sr) and the length of the f-table. Hence, in the instrument
above, the line:
i1 = sr/ftlen(1)
f 1 0 131072 1 “basmrmba.aif” 0 4 1
f 2 0 1024 19 .5 .5 270 .5
; START DUR
i 1415 0 20
Figure 14.7 Orchestra and score code for dynamic speed change of FOG playback.
This is then modified by p5, which enables the speed of playback to be varied in
the score. A value of 1, as in the example above, recreates the original speed of the
soundfile. Lower values for p5 decrease the speed; higher values increase it. A nega-
tive value will reverse the direction of playback. (Try running the example above
with different values for p5 in the note statement of the score). The example shown
in figure 14.7 demonstrates the possibility of dynamically changing the speed.
■ xptch = pitch factor
The argument xptch results in a change of pitch. Whereas xspd determines the start
point for each successive grain reading from the f-table, xptch determines the speed
at which each grain progresses from this starting point. It does not therefore change
the general rate of progression through the f-table, but does change the perceived
pitch. A value of 1 results in the original pitch. In terms of the internal workings of
the unit-generator, there is a close parallel between the xptch parameter in FOG syn-
thesis and the xform parameter in FOF synthesis.
■ xdens = grain density
The argument xdens determines the rate at which new grains are generated and is
directly parallel to xfund in FOF synthesis. In the orchestra shown in figure 14.9, the
number of grains falls rapidly, resulting in the disintegration of the original sound
recording.
■ kband, kris, kdur, kdec = grain envelope shape
The arguments kband, kris, kdur and kdec all function as in FOF synthesis, con-
trolling the local envelope (the amplitude envelope of each grain). The original ex-
ample of FOG synthesis shown in figure 14.6 outputs the original sound unaltered.
This is in part because xptch ( p4) and xspd ( p5) are both set to 1. It also depends on
Figure 14.8 Block diagram of instr 1416, a FOG instrument that “disintegrates” a sample.
f 1 0 131072 1 “basmrmba.aif” 0 4 1
f 2 0 1024 19 .5 .5 270 .5
; START DUR
i 1416 0 9
Figure 14.9 Orchestra and score code for instr 1416, as illustrated in figure 14.8.
the successive grains overlapping, so that the sound is continuous and the local enve-
lopes of the grains overlap, adding up to unity at all times: in effect they cancel each
other out. For this to work, careful coordination is needed of the parameters kris,
kdur and kdec, together with xdens. kris and kdec are both set to .01 (10 millisec-
onds): they are symmetrical, and if the decay of one excitation can be made to overlap
precisely with the rise of the next, they will cancel each other out. In this case this is
done by setting kdur (the duration of the envelope) to .02 and xdens to 100. A density
of 100 means that new grains will be created 100 times per second, that is, a new
grain will start .01 seconds after the last. Therefore, successive grains will overlap
by .01 seconds, the first .01 seconds of the new grain (corresponding to its rise
time) overlapping with the last .01 seconds of the previous grain (corresponding to
its decay time). The local envelope rise and decay therefore crossfade symmetrically,
Figure 14.10 Overlapping grain envelopes: at a density of 100 grains per second (a period of
.01 seconds), with kris = kdec = .01, the rise of each grain exactly overlaps the decay of the
previous one.
canceling each other out and leaving the original signal unchanged. kband is set to
0, so that there is no exponential decay to disrupt the symmetry of the local envelope
(figure 14.10).
In general, the following conditions must be met for cancellation of envelopes to
work as demonstrated above:
■ kband = 0
■ kris = kdec
■ xdens = 1/(kdur - kdec)
■ ifnb = stored function table
The argument ifnb specifies the stored f-table used for the local envelope rise and
decay (as with FOF). It is read forward for the rise and backwards for the decay. In
the above examples it has been a sigmoid, as usual, but other shapes, for example a
linear envelope, may be used. The smoothness and continuity of the sigmoid, how-
ever, help in reducing the side-effects of the processing. (A symmetrical shape is
also important for the envelope cancellation described above.)
■ xamp = amplitude
Finally, xamp adjusts the amplitude of the output and functions as in fof. It is
dependent upon the number of overlaps and their envelopes.
References
Clarke, J. M. 1996a. “Composing at the intersection of time and frequency.” Organised Sound
1(2): 107–117. Cambridge, England: Cambridge University Press.
Dodge, C., and T. Jerse. 1985. Computer Music. New York: Schirmer.
Eckel, G., M. R. Iturbide, and B. Becker. 1995. “The development of GiST, a granular syn-
thesis toolkit based on an extension of the FOF generator.” Proceedings of the International
Computer Music Conference. Banff, Canada: International Computer Music Association, pp.
296–302.
Rodet, X., Y. Potard, and J.-B. Barrière. 1984. “The CHANT project: From the synthesis of
the singing voice to synthesis in general.” Computer Music Journal 8(3): 15–31.
Truax, B. 1994. “Discovering inner complexity: time-shifting and transposition with a real
time granulation technique.” Computer Music Journal 18(2): 38–48.
15 Processing Samples with Csound’s
FOF Opcode
Per Byrne Villez
This tutorial presents a practical introduction to the powerful fof synthesizer imple-
mented in Csound. It is a guided tour of the unit generator, focusing on some of its
most powerful aspects. As with granular synthesis, FOF synthesis allows the sound
designer to blur the perception of the listener by changing sound streams from con-
tinuous to discrete events and vice versa. In this primer, I have strayed from the pure
synthesis aspects of fof and incorporated digitized samples. It would be helpful to
be familiar with the fof syntax, but as long as the reader can compile Csound orches-
tras and scores he or she will be able to work through the material provided.
To begin with, the examples are kept simple, gradually gaining in complexity and
power. By the end of the primer, the user should be acquainted with many versatile
FOF techniques. The examples generally aim to synthesize traditional acoustic in-
struments. They will demonstrate that, by using digitized sound as the input to a fof
generator, little effort is required to infuse life into an otherwise dull sample loop.
This is not exactly technically challenging, but seems to be a problem for most manu-
facturers of commercial samplers and sample-based synthesizers.
Csound’s fof opcode, when used with samples, can be used to create realistic
acoustic structures with great economy of processing time. With more effort, truly
complex textures can be created. This is important for students and experimenters
alike, as it allows the speedy compilation of complex timbres with the minimum of
coding and memory. Generally, my source samples are mono and only 1 or 2 cycles
in duration.
FOF Syntax
f 1 0 4096 10 1
f 2 0 1024 19 .5 .5 270 .5
i 1501 0 10
Figure 15.1 Orchestra and score code for instr 1501, a simple FOF synthesizer.
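A minimal orchestra to go with that score (not the published instr 1501; the amplitude, fundamental and formant values are assumptions) could be:
instr 1501
a1 fof 15000, 110, 440, 0, 20, .003, .02, .007, 10, 1, 2, p3
out a1
endin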
The basic orchestra and score shown in figure 15.1 will help describe the syntax of
a basic fof synthesizer in Csound.
An outline of the fof opcode’s arguments and a brief description follows:
■ ar—Output type (audio or a-rate)
■ fof—ugen call
■ xamp—Overall amplitude (constant or k-rate)
■ xfund—Fundamental frequency of fof oscillator or fof pulse train (constant or
k-rate)
■ xform—Formant frequency (constant or k-rate)
■ koct—Octaviation, fundamental octave deviation (a coefficient of zero is the initial
octave, a value of 2 is 2 octaves down, and 3.7 is an octave and a fifth down) (constant
or k-rate)
■ kband—Bandwidth of the formant in Hz (constant or k-rate)
■ kris—Attack of fof grain envelope (constant or k-rate)
■ kdur—Duration of grain envelope (constant or k-rate)
■ kdec—Decay of grain envelope (constant or k-rate)
■ iolaps—Grain overlaps (This is calculated by xfund*kdur. This should be rounded
up and the effects of any k-rates on the fundamental such as vibrato or jitter must
be considered.)
■ ifna—Function-table (f-table) for formants specified in the score
■ ifnb—Function-table for grain envelope (kris, kdur, kdec), also specified in the
score
■ idur—Duration (If the score specifies a note of 10 seconds and idur specifies a
value of 7, the fof generator will cut off at 7 seconds. This is useful when allowing
a reverb tail to decay without any direct sound.)
■ [iphs]—Initial phase of the fof generator
Below is the score layout for the fof examples presented in this chapter. We will use
GEN1 for the sine waves and sample imports and GEN19 for the fof grain envelopes.
; F# TIME SIZE 1 FILENAME SKIPTIME FORMAT CHANNEL
f 1 0 32768 1 “tambura.aif” 0 0 0
f 19 0 1024 19 .5 .5 270 .5
; p1 p2 p3 p4
i 1502 0 5 155
Where:
■ f #—a label to identify the f-table number
■ time—the action time of the f-table in beats
■ size—the number of sample frames in the table (This will be discussed in more
detail further on in the tutorial. The number of samples and the base pitch of the
digitized sound are related.)
■ 1—the GEN routine number (GEN1 here); a positive value means the f-table is
normalized after it is filled, while a negative value (-1) skips the normalization process
■ filename—the name of the sample file to be input (Make sure that the filename is
surrounded by double quotes.)
■ skiptime—where you want the sample to start playing from, in seconds (The
skiptime is important when trying to get a clean cycle of the sample into the fof
granule; bad settings can lead to unwanted harmonics at high fof fundamentals
and clicks at lower pitches. Experiment with altering this parameter to achieve the
cleanest result. It can be used to create movement within a sound by having two
channels out of phase with each other, or alternatively to break up a speech sample
or such into separate regions, without having to go outside of Csound.)
■ format—a value of 0 means “take the sample file format information from the
header” of the sample
■ channel—the channel number to read into the table
Tambura Example
The first processing example is constructed using a short sample from an East Indian
instrument called the tambura. The tambura is a 4-stringed drone instrument in the
bass register used in Indian classical music (figure 15.2).
The wave is not particularly simple in its spectral content, but demonstrates how a
short snippet of sound can be looped effectively and smoothly. There is an important
relationship between the number of samples in the table and the original sampling
rate of the digitized sound. In the above example, the user will have noticed that the
number of samples is 32768. This is because the original sampling rate of the sample
Tambura.aif is 22050 Hz at 16 bits. For a sampling rate of 44100 Hz that number
would be doubled to 65536:
f 1 0 32768 1 “tambura.aif” 0 0 0
This relationship maintains the original pitch of the sample. If the above example
is compiled, the reader will not fail to notice that, even though the resulting timbre
has a smooth loop, it has little or no sonic interest whatsoever; it is completely life-
less. There are various ways in which we could animate the tambura. To realize this,
we modify the formant frequency parameter, xform:
a1 fof 5000, p4, 1, 0, 0, .003, .02, .005, 20, 1, 19, p3, 0, 1
f 1 0 32768 1 “tambura.aif” 0 0 0
f 19 0 1024 19 .5 .5 270 .5
i 1502 0 8 155
Figure 15.2 Orchestra and score code for instr 1502, an instrument that does FOF pro-
cessing of a Tambura sample.
Altering the formant pitch with the formant frequency parameter in the orches-
tra, instead of with the number of samples in the score, has the advantage that
the parameter can be treated as a variable instead of a constant, and consequently
allows the production of frequency-domain effects such as frequency modulation
(FM), chorusing, delay and flanging. It works in the following way.
A quantity of 1 will produce the sample’s base pitch and 1 cycle of the sample per
grain. Notice I did not say fit; that would depend on the fof grain envelope
parameters. Setting the value to 2 would produce an octave above the base pitch and
therefore 2 cycles of the sample per grain. This logic of pitch unity applies upward
and not downward (e.g., 0.5) and it does not necessarily have to be a natural integer.
It could be a floating point number with a fractional part.
By using linseg or expseg envelopes, we can gradually change the formant fre-
quency. Doing this modifies the resonant qualities of the timbre against the funda-
mental, thus animating the sound:
ktwist linseg 1, p3*.7, 1.2, p3*.2, 1.1, p3*.1, 1
The formant can also be swept much further, multiplying the base formant with a range from 36 to 1. The effect is similar to a
resonating lowpass filter with a tight Q (resonance) and could be used as a quick way
to produce classic analog textures. Using this process, however, has the effect of
decreasing the amplitude as the formant frequency rises. Some form of compensa-
tion is needed and the next example does just this. If the reader does decide to go for
higher multiples of the formant frequency, he or she must make sure that the sam-
pling rate is high enough to avoid the aliasing generated by such a process.
We have seen that dramatic modulation effects can be produced by changing the
formant frequency. Not all samples respond in this way, however. The recording of a
blown plastic juice bottle was patched into the following Csound instrument:
f 1 0 65536 1 “bottle.aif” 0 0 0
f 2 0 65536 1 “bottle.aif” .01 0 0
Note that f 2 is the same as f 1, except for the skiptime, which starts 0.01 seconds
later. Because f 1 is used in the left channel and f 2 in the right channel, we have just
created a phase difference between them. This changes the timbral characteristics of
the overall sound, thinning it out somewhat. It is as if we had applied subtle equalization
to the overall timbre as can be heard in 1505.orc and 1505.sco.
Again, because the main objective was to recreate the acoustics of the original
instrument, a classic FOF synthesis technique is employed.
Jitter
Jitter is useful for recreating the random fluctuations of pitch found in human singing
and acoustic instrument playing. It can be safely said that there are not many per-
formers who can keep a truly steady pitch on a sustained note. In fact, this is one
element that makes music more “human.” To imitate this, the randi random signal
generator is employed to modulate the fof fundamental:
k50 randi .01, 1/.05, .8135
k60 randi .01, 1/.111, .3111
k70 randi .01, 1/1.219, .6711
kjitter = (k50+k60+k70)*p4
The displacement and rate of change are just enough to promote the desired effect
and even though it is not truly a random modulation, it is sufficient for the purpose
of experimentation. The random signal generator is useful for a variety of textures.
It can be used to dismantle any sense of fundamental (3 to 30 Hz with displacements
of a quartertone upward) as shown in the files 1506.orc and 1506.sco. It can also
create the breathy noises (rates and displacements of 200 Hz and .4 of an octave
respectively) found in the files 1507.orc and 1507.sco and as we shall see later, it can
be used to create huge sonic washes.
Envelope kblow is used to control the overall pitch of the fundamental. It begins
at 1 Hz so that the attack portion of the blown bottle can be distinguished from the
sustained part of the sound. It then proceeds to the actual fundamental specified in
the score.
kblow linseg 1, p3*.02, 1, p3*.02, ifq, p3*.96, ifq
The arguments kdur and kdec control the amplitude characteristics of the fof granule.
They allow the initial attack of breath on the bottle to be heard. Subsequently these
synchronize with kblow, halving the envelope duration from 1 second to 0.5.
kdur linseg 1, p3*.02, 1, p3*.02, .5, p3*.96, .5
kdec linseg 1, p3*.02, 1, p3*.02, .5, p3*.96, .5
Overlaps
There is a special relationship between the fundamental, kdur and overlaps. Setting
these incorrectly can lead to longer processing times, or much worse, the computer
crashing outright. Overlaps are worked out with the following arithmetic:
overlaps = fundamental * kdur
The example in question requires 123.47 Hz times the maximum kdur (1), which rounds up to 124 overlaps. Still, it is not that simple: we have not yet taken into account the effects of kjitter on the fundamental, which has been summed with kblow to form kf0:
kf0 = kblow+kjitter
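A minimal sketch of reserving enough overlaps at i-time, where imaxdur stands for the largest value kdur will reach and the small safety margin is an assumption:
ifq     =  p4                    ; fundamental in Hz, as in the example scores
imaxdur =  1                     ; largest value kdur will reach
iolaps  =  int(ifq*imaxdur) + 2  ; fundamental * kdur, rounded up with headroom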
A Room in a Box
Subtle artificial room ambiance is an effect that demands a lot of Csound coding.
The sweeping effects produced in the tambura (1503.orc) and analog (1504.orc)
examples worked well because they contained rich upper harmonics. The effect on
smoother timbres is far more subtle. “Air” is produced around the instrument, similar
to the result of a good microphone technique or the use of an ambient program from
an expensive digital reverberator. This is the result of phase differences set in the
score’s f-tables, together with the motions of ktwist and ktwist2 gradually deviating
the original formant pitch. The gradually changing formant pitches and the amount
of wavetable within each fof grain create this spatial blurring. Consequently, the
sound becomes thinner and more open, receding into the distance.
Choir Example
The sample in the following example uses the English word “or,” spoken by a seven-year-old boy. Besides the clash of the sample's fundamental and the fof fundamental,
the resulting timbre is fairly simple as in 1508.orc and 1508.sco. Now compile the
files 1509.orc and 1509.sco. The multiple voice effect is the result of the jitter tech-
nique used previously, but in a new dimension. We saw how artificial imperfections
could be reproduced in a digital timbre by the use of artificial randomness. It is
this lack of dirt that makes so much computer animation look like, well, computer
animation. The files 1509.orc and 1509.sco use a faster modulation frequency. Be-
fore, we were modulating the fundamental at approximately 20 Hz with a fairly small
pitch deviation. Now we are using rates of between 110 and 122 Hz with a displace-
ment of .03 of an octave:
k50 randi .03, 110, .8135
k60 randi .03, 122, .5111
k70 randi .03, 104, .6711
kjitter = (k50+k60+k70)*p4
The resulting sound is a swarming choir cluster from only two modulated fof oscil-
lators. The extra movement is produced by gliding the fundamental between the
ranges of 49 Hz and 53 Hz, using two further envelopes:
kfund1 linseg 50, p3*.5, 49, p3*.5, 50
kfund2 linseg 51, p3*.5, 53, p3*.5, 51
For a further example of choralization, compare the previous instrument with the files 1504.orc and 1504.sco. They are the same, except that the latter uses the analog wave presented earlier. Clearly, this is a richer variation of the earlier analog sweep.
Octaviation
The next example uses the same sample as employed in the previous instrument. As in the files that processed the bottle sample, the fundamental has been kept at 1 Hz for part of the sound duration; it then glides up toward the fundamental ifq, specified in the score as 149 Hz:
kfund linseg 1, p3*.04, 1, p3*.3, ifq, p3*.66, ifq
Now, compare the above with the following example, 1511.orc and 1511.sco. It
sounds similar to the files 1510.orc and 1510.sco, yet it uses a different parameter
altogether. It is called octaviation and is one of the most powerful fof devices. Basic
octaviation gradually removes or introduces every other granule in the fof “pulse-
train,” or fundamental.
koct linseg 7, p3*.04, 7, p3*.3, 0, p3*.66, 0
a1 fof kenv, kf0, kforms, koct, 0, .003, kdur, kdec, 150, 1, 19, p3, 0, 1
The line beginning with koct replaces the line beginning with kfund. The parameter repre-
sents how many octaves to deviate from the original octave, 0. In this case it starts 7
octaves down at 1.1640625 Hz, the nearest to the previous example’s 1 Hz and moves
to the 0th octave at 149 Hz. The multiple voices are due to long fof grains overlapping
each other. This is caused by opening the fof grain’s amplitude envelopes to 1 second
(usually only 20 ms). This is long enough to smear new grains over existing ones.
Octaviation as Disintegration
The best-known fof octaviation “trick” allows the sound stream to “evaporate” in front of the listener. The following example, 1512.orc and 1512.sco, does just this.
Here, the singing voice evaporates into drops of water. This file is different from the
others, in that, not only is it more complex but it also takes the pure synthesis ap-
proach to fof.
I started by analyzing recordings of a singing style originating in Tuva, Siberia
and Mongolia, which is similar to some Buddhist chants. At a workshop I attended
some years ago, I was told by a Mongolian singer that the technique originated from
Mongolian shepherds, who copied the sound of the wind whistling and howling
Figure 15.4 Octaviation (koct = 0, koct = 1, koct = 2).
between the mountain crevices. The singing style is called sygyt or khoomei. Melodies
are sung by reinforcing resonances within the singer's nasal cavities. The effect can be
best described as a mid-range vocal drone with a piercing melody “whistled” on top.
Five different fof oscillators are used to create the necessary formants. This is the
usual number employed for good vowel imitation. Two of the formants are controlled
by formant amplitude and pitch-shifting envelopes.
The octaviation is carried out by a further envelope to take the fundamental down
in octave steps, from the frequency 108 Hz (0) to 0.4218 Hz (8).
As the fundamentals of each fof begin to octaviate, they are desynchronized by in-
dividual envelopes, which create disparate sound streams, just like individual drops of
water. This is done by multiplying the fundamental by a constantly changing variable,
in the case of a1, from 1 to 1.1.
kd1 linseg 1, p3*.3, 1, p3*.7, 1.1
kfreq1 = kfq*kd1
a1 fof kamp1, kfreq1, 317, koct, 80, .003, .02,
.007, 50, 1, 2, ip3, 0, 1
Normalization
One important feature of this exercise is to normalize the output levels. Details like
this are important when working with complex instruments. Often, the user cannot
predict what the overall effect of summing individual oscillator levels might be. The
numerical complexity in effecting this is great. Yet there is an elegant way to get
around it and in our example it is implemented in the following way:
■ Csound's maximum output is 32000. We use this number as a constant: kconst = 32000.
■ Change all minus signs to positive. The 5 formant levels in our example are 12, kflvl, kflvl2, 26.9 and 45.1 (as we saw earlier, kflvl and kflvl2 are envelopes that gradually change levels).
■ Divide 1 by the sum of all the available levels and assign the result to a k-rate variable:
knorm = 1/(12+kflvl+kflvl2+26.9+45.1)
■ Individually multiply each level by kconst (32000) and multiply this result by knorm:
kamp1 = (kconst*12)*knorm
Include this arithmetic in your instruments and your soundfiles will always be at their
maximum level, without distortion.
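Put together, and with kflvl and kflvl2 standing for the two time-varying formant-level envelopes, a minimal sketch of the whole scheme looks like this:
kconst  =  32000
knorm   =  1/(12 + kflvl + kflvl2 + 26.9 + 45.1)
kamp1   =  (kconst*12)*knorm
kamp2   =  (kconst*kflvl)*knorm
kamp3   =  (kconst*kflvl2)*knorm
kamp4   =  (kconst*26.9)*knorm
kamp5   =  (kconst*45.1)*knorm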
Finally, to add a sparkle to this drama we add some reverb, by using nreverb.
Complexity can be added to the signal by using two separate summed reverbs.
To allow the reverberation tail to decay after the fof generator, 3 seconds have been subtracted from its idur, to force it to close 3 seconds earlier than the duration specified by p3 in the score. This is simply done by subtracting the tail length required, in seconds, from the appropriate parameter.
koct linseg 0, (p3-3)*.5, 0, (p3-3)*.2, 6, (p3-3)*.3, 8
ip3 = p3-3
a1 fof kamp1, kfreq1, 317, koct, 80, .003, .02, .007, 50,
1, 2, ip3, 0, 1
Finally, a portion of the treated signal ar (100*.03) is added to the fof signal a6.
Summary
Further possibilities can be explored using the techniques described in this chapter: re-harmonization, re-melodification, sample amplitude control, FM, and striated changes to the formant frequency by changing ifmode. Furthermore, the recently implemented fof2 opcode allows further control over elements like the phase of the sample. This should permit time randomization and stretching. From the simplicity of the first example to the relative complexity of the last, the reader should now be more aware of the possibilities presented by using samples inside fof grains. I hope I have encouraged users to experiment further in this direction.
16 A Look at Random Numbers, Noise, and Chaos with Csound
John ffitch
Random numbers are useful compositional and sound design aids. In this chapter we
consider the various random number generators available in Csound and with a mini-
mal amount of mathematics, we consider the ways in which they could be used in a
sonic design context.
When a mathematician uses the term random it has a precise meaning—that it is
not possible to predict the value before it is obtained. In fact, there are ways of testing
a series of values to see if they are truly random. This is not necessarily the definition
used by other people. The Oxford English Dictionary considers randomness to be
more concerned with lack of purpose, but captures the same essence. The critical
feature is that while a random event may not be predictable as a single action, there
are things that can be said about a sufficient number of such random events.
The major idea in considering a random value is that of a “distribution.” If we
continue to obtain random numbers by some mechanism, we can count the number
of times we get a number on the ranges of 0 to 10, 10 to 20 and so forth. It is
sometimes useful to see this process as placing the numbers in buckets labeled with
each range. If we plot a graph, where for each bucket we draw a bar proportional in
height to the number of times we got a number in that bucket, we will begin to see a
distribution. Such a graph is called a histogram. To get the true mathematical mod-
el, we have to shrink the range for each bucket and increase the number of values
we take.
An alternative way of thinking about this is that there is a large collection of poten-
tial values that we are just sampling (called technically the population) and the distri-
bution is just a way of saying how many numbers of each value there are in this
collection; what we are doing is to take samples from this collection. Many uses of
statistics are concerned with estimating what the distribution is from a sample; com-
mon examples are opinion polls and market research. A distribution has to be normalized, since in the description above we have talked about taking larger and larger numbers of sample values. The normalization is used to make the area under the curve equal to 1, so that the area bounded by the curve and two particular values of x is the probability that a value chosen at random from the population will fall between these two values of x.
We can see now that there are a large number of methods of obtaining random
values, depending on the shape of the distribution graph. Of course, to make use of
this discussion there has to be a way in which we can obtain random numbers.
Strictly speaking, within a deterministic computer this is impossible. Lottery sys-
tems, for example, use noncomputational processes and we have to rely on an ap-
proximation called pseudo-random numbers. This is considered later in the chapter.
There are many uses of such statistics in music; a common one is the analysis of the chorales of J. S. Bach in order to determine what progressions of notes are characteristic of that composer. Then a score generation technique could be based on
these distributions of the next note, using a Markov chain algorithm. Within the
Csound context we are more interested in the use of random values drawn from a
known distribution. These are considered in this chapter.
A large number of people have attempted to create musical scores by deciding on
the next note by some random process. In general, this leads to tunes that are aimless
and have no direction. However it need not be like this. With careful choices of the
underlying distribution and subtle use of the values, it is possible to generate interest-
ing and meaningful music. Seminal here is Iannis Xenakis, who has frequently used
stochastic processes (random numbers) in his music (Xenakis 1992).
Uniform Distribution
The simplest distribution is the uniform distribution, when every value in a range is
equally likely. The graph of the distribution is just a rectangle (figure 16.1).
Figure 16.1 The uniform distribution, a rectangle of height 1/(xmax − xmin).
In Csound there are two ways in which one can use this distribution. The simplest version is the opcode rand. In its audio form it generates values at the audio rate, with xmin being the negation of xmax and the range being a specified amplitude.
ar rand 1000
The rand opcode above would generate “white noise”; that is, noise with all frequencies equally likely, with an energy of 1000/√2. Such noise is a rich source for filtering in subtractive synthesis. For example, instr 1601 takes white noise and filters it to a band of frequencies centered on 333 Hz.
Figure 16.2 Block diagram of instr 1601, a subtractive synthesizer using white noise as
input.
Figure 16.3 Orchestra code for instr 1601, filtered noise with a “declicking” envelope.
In order to create a clean start and stop, it is useful to add a “de-clicking” envelope,
which removes any sudden amplitude jumps. Random sequences may start and stop
at any amplitude value, which will lead to clicking if not considered carefully.
This could be used, for example, to generate 7 seconds of filtered noise in which the frequency band becomes increasingly wide.
i 1601 0 7 30000 0 666
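As a rough reconstruction of the design just described (the de-clicking times are assumptions), instr 1601 might look something like this:
        instr 1601                            ; subtractive synthesis from white noise (sketch)
kband   line      p5, p3, p6                  ; filter bandwidth widens from p5 to p6 Hz
a1      rand      p4                          ; white noise
abp     butterbp  a1, 333, kband              ; band centered on 333 Hz
aenv    linseg    0, .05, 1, p3-.1, 1, .05, 0 ; "de-clicking" envelope
        out       abp*aenv
        endin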
There is of course a k-rate version of this opcode and other versions of the uni-
formly distributed random number generator that generate random numbers at a
slower rate than the sampling rate, with interpolation (randi) or holding of values
(randh).
Normal Distribution
The distribution associated with a particular random process can be any curve for
which the area below it is 1 and so there are a large number of possible types of
random numbers. In fact, there is one particular shape of a distribution that occurs
frequently. It is called the normal distribution, the bell curve, and sometimes the
Gaussian distribution. This particular shape appears in a large number of physical
situations; we will consider just one.
If one tosses a single coin, then there are only two outcomes, it lands heads or
tails, ignoring the extremely unlikely case of staying on its edge. With an unbiased
coin there is an equal chance of a head or a tail. Now toss the coin twice; there are
four possibilities, which we can summarize as HH, HT, TH, and TT. If we award a
score of +1 to a head and a −1 to a tail, we can expect the most likely score to be
zero, but +2 and −2 are possible. If we toss the coin four times we would still get
zero as the most likely value, but there is a chance of other values. This distribution
is called the binomial distribution and is a discrete function, depending on the num-
ber of times we toss the coin. If we continue this tossing of the coin and remember
the average score, we would expect the answer to be zero, or very close, but many
other values can happen. As the number of coin tosses tends to infinity, this process
converges to the normal distribution. The curve can be described mathematically as exp(−x²/2)/sqrt(2π) and looks as shown in figure 16.4, where the tails of the distribution get arbitrarily close to zero as |x| gets large. The division by sqrt(2π) is needed to ensure that the total area under the curve is 1, as required for a distribution that
gives probabilities. In fact, there is a family of these bell curves based on the spread
of the curve. We cannot describe this by where the curve reaches zero, since it never
does. Instead the spread of the curve must be described by such things as the value
of x for which half the area lies in the range −x to x. The commonest description of the spread of the normal distribution, or indeed any distribution, is the variance, calculated as the sum of the squares of the distances from the average value:
∫ (x − mean)² f(x) dx    (16.1)
As this value is the product of two distances it represents an area; for many uses a distance is preferred as a measure of the spread of the curve, and so the standard deviation, which is the square root of the variance, is defined, usually represented by σ. The standard deviation is dimensionally a distance and so is a direct way of talking about the spread of the curve. For practical purposes all the area under the curve is within 3 standard deviations of the symmetry point.
This Gaussian distribution is extremely useful when we want values that concen-
trate on a special value, with increasingly less chance of being a long way away. A
simple application would be in a flexible chorusing effect. In instr 1602 a single pure
tone is generated at 333 Hz and then thickened by the addition of a second sine wave
on each channel at a frequency displaced by a random value drawn from the Gaussian
distribution. On each control period the additional frequency is adjusted. As we are
using the opcode gauss the values are zero on average and are much more likely to
be small than large. The parameter to the gauss opcode is a measure of the spread of
the bell-curve, the practical maximum value it can give.
In this case the additional pitch could be in the range of 300 to 366 Hz, but most
likely will be closer to 333 Hz. The inspiration for this idea is the typical chorusing
effect, but instead of using fixed frequencies, each voice is attempting to sing the cor-
rect pitch and is missing and then adjusting. The stereo effects on headphones can
be quite striking.
Figure 16.5 Block diagram of instr 1602, a chorusing instrument with Gaussian “blur.”
f 1 0 8192 10 1
i 1602 0 2 10000 33 33
i 1602 + . 15000 33 33
i 1602 + . 20000 33 33
Figure 16.6 Orchestra and score code for instr 1602, an additive instrument with chorusing
featuring Gaussian deviation of center frequency around 333 Hz.
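Following the block diagram, the orchestra half of this example might look something like the sketch below, with the sine table f 1 and the stereo routing assumed:
        instr 1602                  ; chorusing with Gaussian "blur" (sketch)
k1      gauss   p5                  ; random frequency offset for the left channel
k2      gauss   p6                  ; random frequency offset for the right channel
a1      oscili  p4/2, 333, 1        ; the "correct" pitch
a2      oscili  p4/2, 333+k1, 1     ; detuned voice, left
a3      oscili  p4/2, 333+k2, 1     ; detuned voice, right
        outs    a1+a2, a1+a3
        endin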
This particular distribution for random variables appears in almost all cases that
involve averaging and so is a good model of errors. For that reason this is also some-
times called the error curve. It describes, for example, the horizontal distance from
the center that arrows fall when shot at a target, that is, the horizontal error in the
shot; the breadth of the distribution is a measure of how good the archer is. Many
teachers see this distribution, or a similar one, in examination marks.
It is important to realize that probability is concerned with what we expect to hap-
pen before the event, not what has happened. Just because we see a value far away
from the average, in a Gaussian distribution, we should not think something has gone
wrong with the process or model. Until we have attempted a large number of tests,
or had a large number of parallel universes, we will not see the true distribution. This
concept is captured by the Law of Large Numbers, which informally says that the
more tests we run the better the approximation we will see to the underlying distribu-
tion. This does affect the manner in which Csound calculates the Gaussian distribution. In fact, the Csound algorithm uses the rectangular distribution and averaging. This will be discussed in more detail below. It is necessary to realize, however, that the calculations are only approximate. In particular, although all values between −∞ and +∞ are theoretically possible in a true Gaussian distribution, the Csound algorithm cannot generate a random value outside the range specified by the k-rate argument.
Both the distributions described in the first section above are symmetric, that is,
values above and below the average are equally likely. Not all distributions are like
this. We will consider the Poisson process, which is a model of how long one has to
wait for something to happen. In a Poisson process, each incident is independent of
previous incidents.
Consider some event that has a fixed chance of happening in the next minute, such
as catching a fish, or one’s bus arriving. If we assume that the probability is indepen-
dent of which minute it is (that is, we are ignoring rush hours in the bus example)
then we could ask how long do we have to wait, or the probability that a bus will ar-
rive in a fixed period.
A simple example can be seen by returning to the coin tossing. We decide to toss
the coin until it shows a heads and we will do this a number of times. Each attempt
is called a trial. We know that the chance for each toss is 1 in 2 (50%) but long runs
of tails are possible. Clearly in about half the times we try, we expect to get the head
on the first toss. If we get the required result, we finish the trial. If it is a tail, however,
we toss again and in half of the second throws we expect a head, which is a quarter
of all trials. We can draw a picture of this distribution in the same way that we did
for the normal distribution. It will fall away from an initial peak. This is called the
exponential distribution. If we do the same process, but we are considering a continu-
ous process, such as catching the fish, a similar shape will occur, but it will depend,
in detail, on the skill of the fisherman, which translates into the chance of catching a
fish in the next minute.
The mathematical form of this curve is a*exp(−ax). We are only concerned with
positive x. The parameter a is an encoding of the chance (or fisherman’s skill). The
average time we have to wait is 1/a.
The other related question is to ask how long we have to wait for a certain number
of events to occur, each one of which is independent of the others and equally likely.
This is slightly different from the earlier distributions in that it is a discrete one—it
only makes sense for the integer values. This model is described by the Poisson distribution, in which the chance of getting the number n is given by the equation exp(−λ)*λ^n/n!. The average value of this distribution is λ and the shape is somewhat like that shown in figure 16.8.
If we consider a fisherman who catches fish at a random rate of a, then the number
of fish we would expect him to catch in a time t is given by the Poisson distribution
with λ = at.
Musical applications of these ideas are numerous, from modeling small fluctua-
tions within a single note of an instrument to large scale uses of score generation.
Both distributions are available in a Csound orchestra.
In the case of the Poisson distribution the k-rate parameter is the same as the parameter λ above. For the exponential distribution the krange parameter is
actually 6.9078/a, as we could only expect 1 value in 1000 to be above this value
(and indeed the Csound implementation cannot get values above it).
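As a small illustration, both generators can be used directly at the k-rate through the standard x-class opcodes; the parameter values below are only examples (for the exponential case krange = 6.9078/a, so an average wait of 0.5, i.e. a = 2, gives roughly 3.454):
kwait  exprand  3.454      ; exponentially distributed values, average 0.5
knum   poisson  4          ; Poisson-distributed values with lambda = 4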
A useful metaphor that can guide the use of random values within a score is that
of the way in which a crowd behaves. Xenakis describes how in a political demon-
stration each individual person behaves in an individual way, but is influenced by
higher level forces. For example, if one person starts to chant a slogan, there is a
chance that it will be picked up by others. Each event of a chant starting will either die out if insufficient people join in, or will grow until it becomes the collective action of the crowd. The crowd moves as individuals, but with an average motion to-
ward the goal. We can use this metaphor to control musical movement to a climax and
to control when individual voices join in a new idea.
Xenakis takes this process to great lengths, postulating stochastic processes to
generate all aspects of the music, pitch, timbre, speed, amplitude and note density. A
stochastic process is one in which all transitions are governed by random values
drawn from suitable distributions.
Csound provides all the distributions mentioned in this chapter so far, as well as
triangular distributions, Cauchy distributions, Weibull distributions, and the Beta
distribution. The shapes of these distributions are shown in figures 16.9 and 16.10.
The detailed reasons why these distributions are studied are beyond this introductory
text. The triangle is clearly useful when one wants values concentrated on a central value with a finite spread, and can be used in roles similar to the Gaussian distribution, as can the Cauchy distribution, which has more values away from the
peak than does the Gaussian curve.
The Weibull distribution, t*x^(t−1)*exp(−(x/s)^t)/s^t, is quite variable in shape depending on the parameter t. For t = 1 this is the same as the exponential distribution.
For values of t between 0 and 1 it has a concentration of probability toward zero. As
t gets larger, a hump appears looking like a distorted bell curve. The hump gets more
pronounced for larger t. Clearly the Weibull distribution could be used to make
smooth transitions between exponential and Gaussian distributions. The parameter s
just scales the horizontal spread (figure 16.11).
The final distribution that is available within Csound is the Beta distribution. The interesting cases are when the two parameters, a and b, are both less than 1; then the distribution has its large values at the ends of the graph and is small in the middle. It
gives a distribution where one end or the other is the most likely value but the transi-
tion is smooth. The parameters control the relative likelihood of the end points. The
parameter a controls likelihood at zero and b at 1. The smaller the parameter the
more concentration there is on the related end point (figure 16.12).
and only the top 16 bits are used. This is usually provided in C so programs can be
ported that require random numbers, but in fact it is not a particularly good sequence.
There is a better sequence:
with particular initial 55 starting values, but that is not universally available. It seems
to have a long cycle before it repeats.
As there is a need for Csound-written pieces to be reproducible, it uses a 16-bit
pseudo-random number generator:
X(n) = (15625 X(n−1) + 1) mod 2^16    (16.4)
Again this is not particularly good, in the sense that the sequence fails some of the
tests for randomness, but it has been used for many years and composers rely on it.
Whatever sequence is used, they all need at least one initial value, called a seed.
This can be set to a particular value to create a particular sequence, or a starting value
in a long cycle and so allow the recreation of a work even though there is a random
element. In Csound this seed can be reset by the opcode seed, which affects the se-
quence of all random variables, except rand, randi and randh. These last three op-
codes reset the sequence with an optional argument.
In a pseudo-random sequence there is a way of determining the next value, but it
is not very obvious. In a true random sequence, as Max Mathews so eloquently puts
it, “only God knows the next number.”
The pseudo-random sequence generated by the formula of the previous section gives an approximation to a rectangular distribution. In order to create pseudo-random
values for other distributions some new techniques are necessary.
As the Gaussian distribution is describable by an averaging process from a uni-
form distribution, it is possible to derive an algorithm for this distribution by aver-
aging. Csound averages 12 uniform random values, which is an acceptable
approximation. The other distributions use similar scaling and averaging algorithms.
For example, an exponential distribution can be constructed from the uniform distri-
bution by taking the logarithm of the random variable and there are methods of con-
structing triangular and other distributions by other simple techniques. The study of random sequences and their construction is a concern of statisticians, and for further information on random variables it is best to consult the statistical literature.
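As a minimal sketch of the averaging idea (this is not Csound's internal code), summing a handful of uniformly distributed values and centring the result already gives a usable approximation to a normal variable:
k1     unirand  1
k2     unirand  1
k3     unirand  1
k4     unirand  1
kgauss =        (k1+k2+k3+k4) - 2      ; centred on 0; more terms give a better bell shape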
Adding Noise
One of the problems of synthetic instruments is that they are too “clean” and “sterile”
sounding and as such, they lack the impact of physical instruments. Of course one’s
personal aesthetic may desire such a sound, but for many listeners, the purity of the
sound can be a distraction. One solution to this problem is to inject some elements
of noise into the instrument. This can be done in a wide variety of ways. Not only do
we have the range of different distributions available to us but we can use randomness
at various stages of the instrument.
As a simple example of this let us consider a basic frequency modulation bell
sound as shown in figures 16.13 and 16.14. There are a number of places where some
degree of randomness can be inserted. These range from adding a small random
element to the amplitude aenv, to randomizing the frequency amod and so forth back
through the instrument. It is worth experimenting with the different places, as well
as different distributions and ranges. This opens a whole field of potential sounds.
In addition, the bell can be given greater impact by adding an initial burst of noise
to simulate the action of the physical clanger. This needs to be short. In instr 1604
as shown in figures 16.15 and 16.16 simple white noise is added and filtered to be
related to the bell’s frequency. Similarly, when creating wind sounds, some noise to
mimic the sound of the breath escaping can add greatly to the creation of a “nonclini-
cal” sound.
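As a rough, self-contained illustration of these ideas (not the instr 1603 or 1604 designs themselves), the following sketch adds a small random element to a simple FM bell's amplitude envelope and mixes in a short filtered noise burst for the clanger; the ratios, envelope times, instrument number and the sine table f 1 are all assumptions:
        instr 1699                            ; hypothetical instrument number (sketch)
ifq     =       cpspch(p5)
aenv    expseg  p4, p3, p4*.001               ; bell-like amplitude decay
ajit    randi   p4*.02, 20                    ; small random element added to the envelope
amod    oscili  ifq*2, ifq*1.4, 1             ; FM modulator (ratio and index assumed)
acar    oscili  aenv+ajit, ifq+amod, 1        ; carrier
anoise  rand    p4
aclang  reson   anoise, ifq, ifq*.3, 2        ; noise burst related to the bell's frequency
kburst  expseg  1, .03, 1, .05, .001, p3-.08, .001
        out     acar + aclang*kburst
        endin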
Colored Noise
In a similar way to random values coming in different varieties, noise can also be
classified. It is usual to assign colors to the more interesting noises. We have already
seen that white noise, which is where the power density is constant over some finite
frequency range, can be generated by random values from a rectangular distribution,
as typified by the rand opcode. Noise in which the power density decreases 3 dB
per octave with increasing frequency (density proportional to 1/f) over a finite fre-
quency range is commonly called pink noise. The significant property of this kind of
noise is that each octave contains the same amount of power.
At the other end of possibilities there is blue noise, where the power density in-
creases similarly with frequency with a density proportional to f. This kind of noise
is said to be good for dithering.
These colors are clearly derived from analogy with the spectrum of color. The
other frequently encountered colored noise is brown noise, which is in reality not
Figure 16.13 Block diagram of instr 1603, a basic FM “bell” instrument.
Figure 16.14 Orchestra code for instr 1603, a basic FM “bell” instrument.
Figure 16.15 Block diagram of instr 1604, enhanced FM bell instrument with random varia-
tion of various parameters.
related to the color but to Brownian motion (originally the way in which a molecule
moves in a fluid and the process responsible for the milk spreading throughout a cup
of coffee without stirring). In Brownian noise the power density is proportional to 1/f²; this is also the noise of a random walk.
There are claims in the literature that all good music has the distribution of fre-
quencies within the piece like pink noise. While this result is not universally ac-
cepted, it could be an interesting area to explore. Unfortunately, calculating pink
noise is not particularly easy; if you wish to explore this area, then refer to a standard
text on statistics. On the other hand, brown noise can be calculated from white noise,
by adding the random values together and using each partial sum as a value.
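A minimal sketch of that partial-sum idea in orchestra code (the amplitude and the gentle 10 Hz high-pass used to stop the walk drifting off scale are assumptions):
awhite  rand      500          ; white noise
abrown  integ     awhite       ; running sum of the values: a random walk
abrown  butterhp  abrown, 10   ; keeps the sum from drifting away from zero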
The algorithm shown in figure 16.17, due to Moore (and following Gardner 1978), will generate a set of 2^N numbers with an approximately pink distribution of frequency.
Figure 16.16 Orchestra code for instr 1604, an enhanced FM bell instrument mixed with a
filtered noise burst to simulate striker.
#include <stdio.h>
#include <stdlib.h>

/* Generate 2^N values with an approximately pink (1/f) spectrum by summing
   N uniform generators, each updated only when its bit of the counter changes. */
void pink_seq(int N)
{
    float *r = (float*)malloc(N*sizeof(float));
    int len = 1<<N;
    float range = 2.0f/(float)N;
    int lastn = len-1;              /* all bits set, so every r[i] is filled on the first pass */
    int n, i;
    for (n = 0; n < len; n++) {
        float R = 0.0f;
        for (i = 0; i < N; i++) {
            int iup = 1<<i;
            if ((iup & n) != (iup & lastn))
                r[i] = ((float)rand()/(float)RAND_MAX - 0.5f) * range;
            R += r[i];
        }
        printf("%f\n", R);          /* output the sum of the N generators */
        lastn = n;
    }
    free(r);
}
Figure 16.17 Algorithm in C for the generation of “pink” noise.
x = (1/1048576)e⁴a⁹cos(9ct) − (65/1048576)e⁴a⁹cos(7ct) + (335/262144)e⁴a⁹cos(5ct)
    − (7797/1048576)e⁴a⁹cos(3ct) − (1/32768)e³a⁷cos(7ct) + (43/32768)e³a⁷cos(5ct)
    − (417/32768)e³a⁷cos(3ct) + (1/1024)e²a⁵cos(5ct) − (21/1024)e²a⁵cos(3ct)
    − (1/32)e a³cos(3ct) + a cos(ct)    (16.7)
x″ + x = e*x³    (16.5)
t 0 400
Figure 16.19 Orchestra and score code for instr 1605, a cubic oscillator instrument via the
Duffing equation.
which shows cyclical behavior: from any starting values the solution evolves toward a circular orbit. For small e this transition is small, but for larger values it is more abrupt.
Equations of this type, circular motion with nonlinear terms, are an interesting
area to investigate either musically or mathematically. But to quote Richard Strauss,
“beyond this point is chaos.”
References
Gardner, M. 1978. “White and brown music, fractal curves and 1/f fluctuations.” Scientific
American 238 (4): 16–31.
Xenakis, I. 1992. Formalized Music. Rev. ed. New York: Pendragon Press.
17 Constrained Random Event Generation
and Retriggering in Csound
Russell Pinkston
One of the most idiomatic uses of the traditional analog synthesizer was to create a
patch that generated a single event from a “trigger” and then to create a series of
such events automatically by using something like a voltage-controlled timer or pulse
generator, to produce the triggers at a controlled rate. Consequently, a complex series
of events could be initiated by the push of a button, or a single step in an analog se-
quencer and then treated as a single entity—a phrase, or a gesture. The gestures, in
turn, could be processed and transformed in various ways and combined with other
gestures to produce rich and complex contrapuntal textures. This capability encour-
aged composers to work at a higher level than the individual note and the music
produced tended to be more rhythmically free, timbrally varied and generally less
keyboard-oriented than some of the music produced using MIDI systems.
Csound provides a mechanism for generating multiple notes or events from a
single i-statement in the score. It involves reinitializing portions of an instrument
design during performance using the following Csound opcodes:
reinit start_label
rigoto target_label
rireturn
Figure 17.1 Block diagram of instr 1701, periodic reinitialization via the timout and re-
init opcodes.
Figure 17.2 Orchestra code for instr 1701, a periodic reinitialization instrument that beeps
4 times per second.
on every k-period. The opcode most often used for this purpose is timout, which
forces a branch to the target_label beginning at time istart, for idur seconds.
timout istart, idur, target_label
Figures 17.1 and 17.2 illustrate a simple example showing periodic reinitializa-
tion. The example will generate a series of .25 second tones (“beeps”), from a single
i-statement in the score, 4 per second, for as long as the instrument is playing. The
timout causes program execution to skip over the reinit statement, starting immedi-
ately (istart = 0) for exactly .25 seconds (idur = .25), which is just enough time for
the linen to complete its .25 second envelope. After that, however, branching stops
and the reinit is executed, which results in a temporary suspension of normal k-
rate processing.
A reinitialization pass begins at the timout statement, which has the label start:
and proceeds through the entire instrument until the endin statement is encountered.
Consequently, the timout opcode is reset, along with all the remaining opcodes in
the instrument block (since no rireturn statement is used in this example). When
reinitialization is complete, k-rate processing resumes, the timout causes the reinit
statement to be skipped for another .25 seconds, linen generates a new envelope and
Figure 17.3 Block diagram of instr 1702, reinitialization via amplitude modulation with a
low frequency oscillator.
Figure 17.4 Orchestra code for instr 1702, amplitude modulation with LFO.
a new beep is produced. The process will continue until the i-statement’s duration has
expired and the instrument is turned off. Note that we could have placed a rireturn
immediately after the linen statement, since only the envelope generator actually re-
quires reinitialization here.
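Following that description, instr 1701 might look something like this sketch; the beep pitch, the rise and decay times and the sine table f 1 are assumptions:
        instr 1701                    ; periodic reinitialization "beep" (sketch)
start:
        timout  0, .25, continue      ; skip the reinit statement for .25 seconds
        reinit  start                 ; then re-run the init pass from start:
continue:
kenv    linen   p4, .05, .25, .05     ; a complete .25 second envelope
asig    oscili  kenv, 440, 1
        out     asig
        endin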
Of course, the identical effect could be obtained without the timout/reinit mecha-
nism, by simply using an oscil with an appropriate function to generate the envelope,
instead of the linen and giving it a frequency argument of 4 Hz (see figures 17.3
and 17.4).
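Following the block diagram, instr 1702 amounts to something like this sketch, with f 2 assumed to hold one complete linear envelope shape:
        instr 1702                    ; LFO-gated variant (sketch)
kgate   oscil   p4, 4, 2              ; read the envelope shape 4 times per second
asig    oscili  kgate, 1000, 1        ; 1000 Hz beep gated by the repeating envelope
        out     asig
        endin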
As you can see, instr 1702 also generates a repeating beep, since the kgate oscil
references a function containing the shape of a complete linear envelope, which will
be repeated 4 times per second. In fact, for this trivial case, it would be both simpler
and more efficient than using timout and reinit. But there are circumstances in
which the reinitialization method of generating repeating events is preferable. Con-
sider the design variation shown in figures 17.5 and 17.6.
There are several important differences in instr 1703: an overall phrase-dynamic
(crescendo-diminuendo) is applied to the beeping, but more significantly, the rate of
beeping changes gradually from 10 Hz to 1 Hz over the course of the note (as kdur
moves from .1 to 1), while the duration of the beep is fixed (.1 second). This is an
Figure 17.5 Block diagram of instr 1703, a reinit instrument with a “phrase” envelope.
Figure 17.6 Orchestra code for instr 1703, reinit instrument with phrase-envelope illustrated
in figure 17.5.
instance in which using timout and reinit is the only way to achieve the desired
result, because using an oscil to gate the signal (as we did in instr 1702) would re-
sult in an envelope whose shape and duration would be tied to the frequency of repeti-
tions—as the rate is slowed down, the envelopes would be longer and have more
gradual rise and decay shapes. Here, since we will reinitialize it for every beep, an
oscil1 can be used to generate the beep envelopes, whose duration and shape will be
independent of the rate of repetitions.
■ A linseg, expseg or envlpx opcode could also be used here to generate a fixed
length envelope shorter than kdur, but not linen, because it continues to decay (and
go negative) if executed longer than its duration parameter. The oscil1 opcode is
probably the most flexible and easiest Csound opcode to use for this purpose.
f 1 0 8192 10 1
; LINEAR ENVELOPE FUNCTION
f 2 0 512 7 0 41 1 266 1 205 0
; INS START DUR AMP
i 1703 0 4 10000
■ The statements that generate kphrase and kdur are above the start: label and hence are not reinitialized along with the timout and oscil1. Consequently, they produce values that change smoothly and gradually over the course of the entire note.
■ It should be noted that the timout opcode's idur argument, which must be an i-time expression, is obtained by using the i( ) function, which forces an i-time result from the current value of the kdur variable.
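Putting these points together, instr 1703 might look something like the following sketch; the beep frequency and the exact envelope values are assumptions:
        instr 1703                     ; phrase-envelope reinit design (sketch)
kphrase linen   p4, p3/2, p3, p3/2     ; overall crescendo-diminuendo across the note
kdur    line    .1, p3, 1              ; time between beeps grows from .1 to 1 second
start:
        timout  0, i(kdur), continue   ; i( ) freezes the current kdur at each reinit
        reinit  start
continue:
kgate   oscil1  0, 1, .1, 2            ; fixed .1 second beep envelope read from f 2
asig    oscili  kphrase*kgate, 1000, 1
        out     asig
        endin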
Now that the basic reinitialization mechanism has been demonstrated, let us try a
more complicated example, one that incorporates some random elements. Let us say
we wanted to imitate the sound of popcorn popping. First, there is only an occasional
pop, then a few more at random intervals, then progressively more and more until
the popping is virtually constant, then progressively fewer and fewer pops occur
until, eventually, all silent and “it’s time for the butter.” Each pop has approximately
the same sound—same duration and amplitude and more or less the same timbre. In
the example instrument shown in figure 17.8, the “pops” are simply bandpass filtered
white noise with a short (.075 second) exponential envelope. The key difference
between this example and the previous ones is that the timout/reinit pair uses a
constrained random gap-time (time interval between pops), whose limits vary over
the course of the note.
An expseg is used to produce an exponential decay from p6 (the longest gap time)
to p5 (the shortest gap time) over the first half of the note ( p3/2), followed by an ex-
ponential rise back to p6 over the second half of the note. The actual gap time (igap)
between any two pops is a random value between .035 and .035+kvary seconds. As
kvary moves from p6 to p5 and back, the average gap time diminishes and then
increases again. The result is that pops occur infrequently at first, then gradually
become more frequent, then less frequent again, but always somewhat unpredictably.
Thirty seconds of popcorn can be made using the orchestra and score shown in
figure 17.9.
Figure 17.8 Block diagram of instr 1704, a reinit design with random elements simulating
popping popcorn.
          instr     1704                          ; A "POPCORN" SIMULATOR
kvary     expseg    p5, p3/2, p6, p3/2, p5        ; VARY GAP DURS BETWEEN p5 AND p6
krnd      rand      .5, p7                        ; p7 IS RANDOM SEED
kvary     init      p5                            ; BEGIN WITH MAXIMUM POSSIBLE GAP
start:                                            ; START OF REINIT BLOCK
irnd      =         .5+i(krnd)                    ; OFFSET IRND TO BETWEEN 0 AND 1
igap      =         .035+irnd*i(kvary)            ; BETWEEN .035 AND .035+KVARY SECS
kbw       =         50+100*irnd                   ; SMALL RANDOM VAR IN FILTER BW
          timout    0, igap, continue             ; SKIP REINIT FOR IGAP SECONDS
          reinit    start
continue:                                         ; POPCORN ENVELOPE
agate     expseg    .0001, .005, 1, .07, .0001
          rireturn                                ; END REINITIALIZING HERE
anoise    rand      p4                            ; WHITE NOISE FOR POPS
asig      reson     anoise, 400, kbw, 2           ; BANDPASS FILTERING
          out       asig*agate                    ; APPLY ENVELOPE
          endin
Figure 17.9 Orchestra and score code for instr 1704, a “popcorn” simulator as illustrated in
figure 17.8.
■ The expseg and rand opcodes that are producing kvary and krnd, respectively, must be outside the reinit block.
■ The assignment statement that produces kbw must be contained in the reinit block, because even though kbw only gets updated at k-time, Csound will precalculate the expression (50+100*irnd) at i-time, since it contains only constants and i-type variables. In other words, if this expression were placed outside the reinit block (directly in the reson opcode's kbw field, for example), it would not change during performance, even though the irnd variable was being changed owing to reinitialization.
■ A rireturn is used immediately following the expseg producing the envelopes, because nothing else needs reinitialization.
■ The bandwidth of the reson filter (kbw) is given a slight random variation, to make the individual pops sound a little different.
The next example combines elements of everything covered so far (figure 17.10). A
single i-statement produces a complete musical phrase, or “gesture,” which has three
distinct parts (figure 17.11). Part 1 consists of two oscillators tuned an octave apart
that are randomly gated together, amplitude modulated with a low-frequency oscilla-
tor (LFO), and made to swell gradually. Part 2 is actually a variation of the popcorn
instrument, which generates a random sequence of tuned percussive sounds, the last
of which is forced to match the pitch of the final part of the gesture. Part 3 begins
with the last pop of part 2 and consists of a single oscillator that pans back and forth
rapidly from left to right, with the panning rate slowing as the sound decays. The
duration of the entire gesture is controlled by p3 of the i-statement, of course, but
the portion of p3 allocated to the individual parts of the gesture is determined by
factors in p9 and p10. An oscil1i is used to apply an overall panning curve, which is
stored in a function table (f-table) in the score.
■ In this instrument, timout is used for an additional purpose: it controls the timing
and duration of each part of the gesture, by causing execution to skip over a block
Figure 17.10 Orchestra code for instr 1705, a three-part gesture instrument.
; INS ST DUR AMP PCH1 PCH3 SEEDS DUR FACS MIN MAX OCT PAN
; PT1 PT2 PT1 PT2 DUR N RNG FN
i 1705 0 10 32000 9.00 8.11 .4 .3 .4 .035 .04 4 1 4
i 1705 4 10 30000 9.08 7.09 .2 .1 .4 .05 .05 5 2 7
i 1705 8 10 26000 7.06 9.01 .8 .7 .4 .1 .06 7 3 6
i 1705 9 9 . 8.09 8.03 .6 .51 .3 .2 .06 4 3 5
Figure 17.11 Score code for instr 1705, a three part gesture instrument.
of statements once they have completed their particular task. Hence the first ti-
mout, which is at the beginning of the code labeled part 1: waits for idur1 seconds
and then branches to the statement labeled part 2: for the remainder of the note
(p3). At the bottom of the code for part 1 is the statement kgoto end, which keeps
the code for parts 2 and 3 from being executed as long as part 1 is playing. Sim-
ilarly, the timout at the beginning of part 2 waits for idur2 seconds and then
branches to part 3 for the remainder of the note.
■ Since the last “pop” of part 2 is supposed to coincide with the beginning of part 3 of the gesture, it is necessary to execute the code for both part 2 and part 3 simultaneously for a brief period of time. This is accomplished with the use of a “flag,” ilast, which is initialized to 0, but set to 1 when the last pop starts. The final statement of the part 2 code section is if (ilast != 1) kgoto end, which means that part 3 will be skipped as long as the flag is 0, but as soon as it is set to 1, the final part of the gesture commences.
■ A simple method is used to determine when the final note of part 2 is about to be played. The variable itime is initialized to 0 at the beginning of the gesture. During part 2, each time a new pop is generated, a new idur value is computed and added to itime. As long as itime+idur is less than the total duration of part 2, idur2, we have not reached the last note, so we can go choose a new random pitch and proceed as usual. But when itime+idur does equal or exceed idur2, we do not take the conditional branch igoto choose, but instead drop through, set our ilast flag equal to 1, trim idur to equal exactly the remainder of idur2, select the pitch of part 3 for our final pop and then branch around the customary random pitch calculation. Note that all the branches must be igoto statements so that they are executed during the reinitialization passes.
■ The rhythm of the percussive sounds in part 2 is randomized, but constrained to adhere to a grid based on the smallest allowed durational value, imindur. This is accomplished by generating a random number between +/− .5, adding .5 to it and then multiplying by imaxnum, which must be some number greater than 1. The result is a random number between 0 and imaxnum, which is stored in irndur. Using the int function to take the integer part of that, multiplying the result by imindur and then adding it to imindur, yields a random duration that is equal to imindur plus N times imindur, where N is an integer between 0 and imaxnum. Consequently, the pops in part 2 will occur at random intervals, but they will always fall somewhere on the rhythmic grid (a sketch of this arithmetic follows the list).
■ The pitches in part 2 are also randomly derived, but constrained to fall within a designated band (ioctrng) centered around the pitch of part 1. A new random pitch is selected for each new pop except for the last one, which is set to the pitch of part 3.
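A minimal sketch of that grid arithmetic, with krnd standing for the output of a rand opcode producing values between −.5 and +.5:
irndur  =  (i(krnd) + .5) * imaxnum       ; random value between 0 and imaxnum
idur    =  imindur + int(irndur)*imindur  ; imindur plus N*imindur, N an integer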
The final example (figure 17.12) actually generates a short algorithmic composition
from a score containing just four i-statements (figure 17.13). The pitches, durations
and pan position of all notes are selected randomly, but with significant constraints.
The only pitch classes allowed for the two primary voices are those contained in a C
minor 7 chord (C, Eb, G, Bb) and the only durations allowed are .25 and .125—
quarter notes and 8th notes, if 1 second represents a whole note. The amplitude of a
note is related to its duration—the longer, the louder—which tends to make the
music sound syncopated, although not metrical. The net result is strongly rhythmic
music in C minor, which does not sound particularly random.
■ The three rand opcodes are above the reinit block so that they continue to produce
a steady stream of random numbers at the k-rate, over the course of p3. Whenever
the code that follows is reinitialized, the statements that use kdurloc, kpchloc and
kpan grab whatever their current values happen to be via the Csound i(kvalue)
function. Each rand must have a different iseed value, or the “random” sequences
produced by the three opcodes would be identical. Similarly, each i-statement
should pass a different set of seed values and table numbers, or the music produced
will be exactly the same each time.
■ The values of kdurloc and kpchloc are not used directly, but are first offset by .5
(to place them in the range 0–1) and then used as normalized indices into tables
containing a limited set of acceptable values. Consequently, although the choices
for pitch and duration are made randomly, the possible results are carefully
controlled.
■ The tables may be of various sizes (subject to Csound's power of 2 rule) and are filled with the acceptable choices for pitch and duration in the desired proportions. For example, pitch function 3 (f 6 in the example score) has eight locations and contains four 7.03s, two 7.05s, one 6.10, and one field that is blank (zero). Consequently, although short-term results may vary over time, about 50% of the randomly selected pitches will be 7.03, 25% will be 7.05, 12.5% will be 6.10 and 12.5% will be 0s, which produce rests.
Figure 17.12 Orchestra code for instr 1706, an algorithmic music generator.
■ When a rest is to be generated, all the statements involved with producing sound
(the knote expseg, pluck, etc.) are skipped, so no samples are generated or output.
The timout statement is still executed, however, in order to obtain the correct
duration of silence.
■ When a zero value is obtained from the duration table, it is effectively ignored, since a timout with idur = 0 results in the succeeding reinit statement being executed immediately.
Figure 17.13 Score code for instr 1706, an algorithmic music machine. Note that tables are used to “constrain” the random pitch, duration, and amplitudes.
Conclusion
Using Csound instruments such as the Constrained Random Music Instrument (instr
1706) is probably not the best way to generate compositions algorithmically. Al-
though the Csound orchestra has some of the functionality of higher level languages,
it cannot compete with a general purpose programming language such as C or LISP
when it comes to implementing sophisticated custom algorithms. Indeed, the basic
architecture of all the Music N style languages, of which Csound is a direct descendant, reflects the underlying assumption that the purpose of the orchestra is to generate
and process sounds that are triggered by individual notes in the score. The format of the score, in turn, was deliberately kept simple, and the task of generating and processing the score was kept separate from the task of performing it.
For the most part, this division of labor makes perfect sense, especially with re-
spect to algorithmic composition. It would be far easier to write a stand-alone pro-
gram in C to generate a score from a complex algorithm than to try to implement the
same thing in orchestra code. Moreover, Csound provides the Cscore utility to make
the task of working with score data relatively straightforward. There are instances,
however, in which complex instruments such as the ones that have been described in
this chapter are both useful and convenient. The only serious limitation concerning
the generation of multiple notes within an instrument, using a retriggering mecha-
nism such as timout/reinit, is that such notes cannot overlap. Consequently, tech-
niques such as granular synthesis, which produce numerous overlapping “grains” of
sound, or the algorithmic generation of polyphonic compositions cannot readily be
implemented within a single instrument. On the other hand, using multiple instances
of instruments that generate individual monophonic voices can sometimes achieve
the desired results.
18 Using Global Csound Instruments
for Meta-Parameter Control
Martin Dupras
Control Instruments
Figure 18.1 Block diagram of instr 1801, a simple oscillator instrument with portamento.
f 1 0 8193 10 1
Figure 18.2 Orchestra and Score code for instr 1801, featuring “pp” directive to relate cur-
rent pitches to previous ones.
Figure 18.3 Block diagram of instr 1802 and 1803, global frequency control.
f 1 0 8193 10 1
Figure 18.4 Orchestra and score code for instr 1802 and 1803, a global frequency control
pair. Note instr 1802 has no “local” output.
Figure 18.5 Block diagram of instr 1804, 1805, and 1806, a set featuring separate global
frequency and global LFO control instruments.
f 1 0 8193 10 1
Figure 18.7 Block diagram of instr 1807,1808, and 1809, an analog synthesizer model with
three oscillators (instr 1809), a global LFO (instr 1807) and pitch control (instr 1808).
gklfo init 0
gkpch init 0
gkad init 0
f 1 0 8193 10 1
f 2 0 2 2 1 -1
Figure 18.8 Orchestra and score code for instr 1807, 1808, and 1809, an analog synthesizer
model with global controls.
Figure 18.9 Block diagram of instr 1811 and 1812, two simple fixed-frequency oscillator
instruments.
f 1 0 8193 10 1
f 2 0 2 10 1
Figure 18.10 Orchestra and score code for instr 1811 and 1812 illustrating mixing via note
statements.
Spectral Fusion
Global control instruments are especially useful for creating "spectral fusion."
This technique relies on auditory perception to create the illusion that two sounds
belong to the same source, by modulating their spectra simultaneously.
When two notes of different timbres are played together, they tend to be perceived
as distinct tones (see figures 18.9 and 18.10). If a single, common vibrato is applied
to the two sounds, however, they tend to fuse. The vibrato creates the illusion of a
single spectrum (see figures 18.11 and 18.12).
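To make the idea concrete, here is a minimal sketch of the same scheme (this is not the chapter's instr 1813, 1814 and 1815; the instrument numbers, vibrato rate and depth, and table are placeholders): one instrument writes a shared vibrato signal into gkvib, and any number of fixed oscillators scale their frequencies by that same signal so that both spectra move together.

gkvib  init  0

       instr 1                         ; global vibrato source
gkvib  oscil .01, 6, 1                 ; roughly +/-1% deviation at 6 Hz
       endin

       instr 2                         ; play two copies at different pitches
asig   oscil p4, p5*(1 + gkvib), 1
       out   asig
       endin

; score (sketch): f 1 0 8193 10 1
;                 i 1 0 4
;                 i 2 0 4 10000 440
;                 i 2 0 4 10000 560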
Figure 18.11 Block diagram of instr 1813, 1814, and 1815, illustrating a common global
vibrato applied to a pair of simple static oscillator instruments to promote spectral fusion.
gkvib init 0
Figure 18.12 Orchestra code for instr 1813, 1814, and 1815, which realize spectral fusion
through a common global vibrato.
In some cases, such as some granular-synthesis techniques, control signals are useful
for imposing a direction on a stream of events. Because the control signal can evolve
over the duration of the score, each instance of an instrument triggered by a note
event from the score can access the instantaneous values of the control signal, thus
effectively “mapping” the direction onto the sound event.
In the next example we will use a meta-control instrument to generate complex
random values that will be used in a sound-producing instrument. First, let us design
Figure 18.13 Block diagram of instr 1816 and 1817, a time-varying stochastic generator pair.
the control instrument using the rand opcode. But since rand produces a uniform
distribution of numbers, how can we generate a more complex distribution, such as
a normal (Gaussian) curve?
One method, as shown in figures 18.13 and 18.14, consists of assigning the output
of a complex mathematical expression to a k-rate variable. But this yields an ex-
tremely slow instrument, because the result of the expression has to be computed on
every k-rate pass. Additionally, the mathematical expression may not be particu-
larly intuitive.
A more effective method consists in loading a table with a set of values statistically
distributed according to the expected distribution. If we index that table with a uni-
formly distributed random generator, rand, the output is close to the expected result
(see figures 18.15 and 18.16). Instead of computing a complex expression many
times, the expression has to be computed only once, when the table is created. The computa-
tion time for creating the table is extremely small compared to calculating the expres-
sion in the instrument.
Furthermore, we can load a table with virtually any distribution and achieve the
same relative computation cost. There is, however, a trade-off in accuracy, because by
filling the table with our distribution we explicitly calculate a finite number of values
(equal to the size of the table), whereas the direct use of a mathematical expression in
the instrument yields the precise result of the expression for the current parameters.
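As a sketch of this table-lookup approach (the instrument and table numbers here are arbitrary and this is not the chapter's instr 1818 and 1819), the score builds a normal-curve table with GEN21 once, and the instrument merely draws a uniform random index into it on every control pass:

       instr 3
krand  rand  .5                        ; uniform random values in -.5 .. .5
kndx   =     krand + .5                ; shift into the 0 .. 1 range
kgauss table kndx, 3, 1                ; look up the normal-curve table (f 3)
kfreq  =     440 + 100*kgauss          ; example use: jitter a pitch
asig   oscil 10000, kfreq, 1
       out   asig
       endin

; score (sketch): f 1 0 8192 10 1
;                 f 3 0 1024 21 6      ; GEN21 subtype 6: normal curve
;                 i 3 0 5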
gkgauss init 0
f 1 0 8193 10 1
f 2 0 513 2 1
i 1816 0 20
i 1817 0 20 2000 1.14
i 1817 0 20 2000 0.75
i 1817 0 20 2000 2.14
i 1817 0 20 2000 3.47
Figure 18.14 Orchestra and score code for instr 1816 and 1817, a stochastic instrument pair
illustrated in figure 18.13.
Probability Masks
Figure 18.15 Block diagram of instr 1818 and 1819, an efficient and intuitive time-varying
stochastic generator pair.
gkgauss init 0
f 1 0 8193 10 1
f 2 0 513 2 1
f 3 0 1024 21 6 ; NORMAL CURVE
i 1818 0 20 0
i 1819 0 20 2000 1.14
i 1819 0 20 2000 0.75
i 1819 0 20 2000 2.14
i 1819 0 20 2000 3.47
Figure 18.16 Orchestra and score code for the table-constrained stochastic generator instru-
ment pair illustrated in figure 18.15.
Figure 18.17 Orchestra code for a probability mask, an example of a linear evolution between two instantaneous values.
One solution is to store the evolution of the minima and maxima for the whole duration
of the score in a table. This makes it easy to alter the curves without modifying
the instrument. These curves can be generated in programs other than Csound (in a
mathematics package, for instance) if the GEN routines are not sufficient. In most
cases however, GEN routines should be more than adequate.
Context-Sensitive Instruments
We can use the techniques we have seen so far to make instruments “listen” to each
other. This is especially useful since most commercial synthesizers and synthesis packages do not offer the kind of flexibility that Csound affords for implementing such techniques.
Our first example will consist of an FM instrument playing a series of notes of
varying amplitudes (see figures 18.18 and 18.19).
A second instrument will also play an independent series of notes, but its ampli-
tude will be controlled by the first instrument, so that when the latter plays loud, the
second one will play softly and vice-versa (see figures 18.20 and 18.21).
To broadcast the loudness of instr 1821 to instr 1822, we only need to add a
global variable inside the first instrument. This variable will send the power value of
the signal generated by the first instrument (see figure 18.20).
We can then use that global variable inside instr 1822 to control the amplitude of
each note as an inverse function of gkpwr.
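A minimal sketch of this arrangement (a simplification, not the chapter's instr 1821 and 1822; instrument numbers and scaling are placeholders) might look like this:

gkpwr  init  0

       instr 4                         ; the "broadcasting" instrument
asig   oscil p4, p5, 1
gkpwr  rms   asig                      ; publish this instrument's loudness
       out   asig
       endin

       instr 5                         ; the "listening" instrument
kamp   =     p4 - gkpwr                ; play softly when instr 4 plays loudly
kamp   =     (kamp > 0 ? kamp : 0)     ; keep the amplitude non-negative
asig   oscil kamp, p5, 1
       out   asig
       endin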
It is possible to send information from within instr 1822 back to instr 1821. To il-
lustrate this idea, we will create instr 1824 and have it receive a signal from instr 1823;
ring modulate it with a sine wave whose frequency will be passed from the score to
Figure 18.18 Block diagram of instr 1820, a simple FM instrument with pitch and ampli-
tude envelopes.
f 1 0 8193 10 1
i 1820 0 3 20000 125
Figure 18.19 Orchestra and score code for instr 1820, a simple FM instrument.
instr 1824 and send the ring-modulated signal back to instr 1823. In effect, we will
have created auxiliary inputs and outputs in instr 1823 to send and receive the signal
to be processed by instr 1824 (see figures 18.22 and 18.23).
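A reduced sketch of this kind of two-way routing (not the chapter's instr 1823 and 1824; numbers, frequencies and tables are placeholders) shows the essential plumbing: the source publishes a "send" signal, the processor writes a "return" signal, and the source mixes in the return, which arrives one pass late:

gasine init  0
garing init  0

       instr 6                         ; source and final mix
asig   oscil p4, p5, 1
gasine =     asig                      ; auxiliary send to the processor
       out   asig + garing             ; dry signal plus processed return
garing =     0                         ; clear the return bus for the next pass
       endin

       instr 7                         ; ring modulator
amod   oscil 1, p4, 1                  ; p4: modulating frequency from the score
garing =     gasine*amod
       endin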
Feedback Instruments
In some cases, we may want to create a “feedback loop” by sending the output of an
instrument back to itself. Some applications of feedback include physical modeling
and nonlinear systems.
As with any feedback system, great care must be taken to ensure that the value
being fed back does not grow infinitely. Additionally, because we need to use global
Figure 18.20 Block diagram of instr 1821 and 1822, a context-sensitive instrument pair that uses a global rms opcode to control the relative amplitude of each note.
f 1 0 8193 10 1
Figure 18.21 Orchestra and score code for instr 1821 and 1822, a context sensitive instru-
ment pair.
Figure 18.22 Block diagram of instr 1823 and 1824, featuring intercommunication from
one to another and then back to the first.
garing init 0
gasine init 0
f 1 0 8193 10 1
Figure 18.23 Orchestra and score code for instr 1823 and 1824 that pass data back and forth
between each other.
f 1 0 8193 10 1
Figure 18.24 Orchestra and score code for instr 1825, a global feedback instrument.
variables within the instrument, we must make sure the global value is initialized
before being used in the instrument. This initialization must appear in the orchestra header, before the instrument, if the feedback is to be sustained across notes. If the feedback is to be confined to a single note, the global variable should instead be assigned its starting value at init time inside the instrument, so that the value is set once per note and not reset on every k-rate or a-rate pass.
By default, if we simply use a global variable, the signal will be delayed by one
sample. Thus, the actual delay time in seconds depends on the sample rate. This
must be taken into account when creating the instrument. Also, it is best to design,
test and execute the instrument at the same sample rate, since changing it will mod-
ify the behavior of the algorithm.
Now, let us create an instrument where the oscillator modulates its own fre-
quency (see figure 18.24). First of all, we have to make sure that the value to be fed
back is in a usable range. Thus, we will set the amplitude of the oscillator to 1 and
multiply it later at the output. The frequency of the oscil will be the multiplication
of p5 (the “base frequency” of the oscillation) by the feedback signal and a time-
varying re-scaling value, changing from 0 to 1 over the course of the note.
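Figure 18.24 gives the chapter's orchestra and score for instr 1825; the fragment below is only a rough sketch of the same idea (a variant, with assumed names and scaling) in which the oscillator's previous output, held in the global gafdbk, modulates its own frequency:

gafdbk init  0

       instr 8
kdepth line  0, p3, 1                        ; feedback depth rises over the note
asig   oscil 1, p5*(1 + kdepth*gafdbk), 1    ; unit amplitude keeps the value usable
gafdbk =     asig                            ; read back one pass later
       out   p4*asig                         ; rescale at the output
       endin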
Feedback systems such as this one are nonlinear and depend on the previous states
of the system, which makes feedback instruments hard to predict and control. There-
fore, much experimentation is needed in order to obtain satisfying results.
Conclusion
The techniques we have seen illustrate only a few of the possibilities afforded by
control instruments and meta-parameters. I hope that these examples will trigger ideas
and lead to simpler and more powerful ways of elaborating future algorithms.
19 Mathematical Modeling with Csound:
From Waveguides to Chaos
Hans Mikelson
Digital waveguide modeling has become one of the most active areas of research in
sound synthesis in recent years. It is based on the solution of the wave equation, a
differential equation that describes the behavior of a wave in a medium:
$U_{tt} = c^2 (U_{xx} + U_{yy} + U_{zz})$

Where:
• U is the wave function
• c is the speed of propagation of the wave in the medium
• U_ii is the second derivative of the wave function with respect to i
Boundary values and initial conditions are usually specified as well (Lomen and
Mark 1988; Gerald 1980). Plucked string instruments and many wind instruments
can be approximated by the one-dimensional wave equation. For wind instruments,
breath pressure is usually given as an input to the system. A drum membrane can
be approximated by the two dimensional wave equation. Solving the wave equation
mathematically is often difficult and solving the wave equation numerically is com-
putationally intensive.
Digital waveguides provide a computationally efficient method of expressing the
wave equation (Smith 1992). For a general one-dimensional case, a bi-directional
delay line is used to simulate samples of a traveling wave and its reflection. A more
complex system may have a bi-directional delay line for each separately vibrating
section.
A Waveguide Bass
This instrument is derived from the Karplus-Strong algorithm, one of the most easily
implemented waveguide models (Karplus and Strong 1983). It is accomplished as
follows: fill a delay line with random numbers, take the average of the current output
and the previous output and add this average to the input of the delay line. This
simple procedure produces sounds remarkably like a plucked string (see figures 19.1
and 19.2).
The initial noise in the delay line is representative of a string in a high energy
state. The average is a type of digital filter. It represents damping, which occurs at
the ends of the string. This simple delay line, filter and feedback sequence is typical
of waveguide instruments.
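As a rough sketch of the procedure (this is not the chapter's instrument; the noise-burst length, cutoff, and p-field layout are assumptions, and the tone lowpass stands in for the two-sample average described above):

       instr 9
ifqc   =      p5                        ; frequency in Hz
kenv   linseg 1, .02, 0, p3-.02, 0      ; short burst at the attack only
anoise rand   10000
aexcit =      anoise*kenv               ; noise "pluck" fed into the loop
afdbk  init   0
adline delay  aexcit + afdbk, 1/ifqc    ; string loop one period long
afdbk  tone   adline, 2000              ; lowpass damping in the feedback path
       out    adline
       endin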
At low frequencies the initial noise begins to dominate the sound in the Karplus-
Strong algorithm. To eliminate this problem and more closely simulate a plucked
string’s initial state, a filtered triangle waveform was selected as an initial state of the
delay line.
ipluck = 1/ifqc*p6
kenvstr linseg 0, ipluck/4, -p4/2, ipluck/2, p4/2, ipluck/4, 0, p3-ipluck, 0
aenvstr = kenvstr
ainput tone aenvstr, 200
The duration of the initial pulse, parameter p6, can give the effect of different
plucking styles.
Filters and initial conditions can introduce an offset from zero. When these are
fed back into the system they can rapidly produce off-scale values. To solve this
problem, a special type of filter called a DC blocker is introduced.
ablkin init 0
ablkout init 0
ablkout = afeedbk - ablkin + .99*ablkout
ablkin = afeedbk
ablock = ablkout
The sum of the input and the DC-blocked feedback is fed into a delay line of
length 1/frequency. This instrument is slightly flat owing to the delays introduced by
the filters. Subtracting about 15 samples brings it into tune.
adline delay ablock+ainput, 1/ifqc - 15/sr
afeedbk tone adline, 400
Some resonances are generated and scaled to simulate the resonance of an acoustic
bass body:
These are modified by an envelope to produce a swell shortly after the initial pluck.
This is added to the output from the delay line, scaled again and used as the output
from the instrument:
out 50*koutenv*(afeedbk + kfltenv*(abody1 + abody2))
This instrument could be enhanced by introducing a bridge delay line for transmit-
ting the string vibrations to the body, or modifying the string-filter for fingered and
open strings. Perhaps a system of waveguides could be set up to simulate the acoustic
bass body.
A Slide-Flute
The next instrument considered is a slide-flute derived from Perry Cook’s instrument
(Cook 1995). The input to this system is a flow of air. Noise is added to simulate a
breath sound.
The feedback section of this instrument consists of two delay lines. One delay line
models the embouchure for the air-jet and the other models the flute bore. Optimally,
the embouchure delay is equal to one-half of the length of the flute bore:
atemp1 delayr 1/ifqc/2
ax deltapi afqc/2
delayw asum2
The interaction between the embouchure and the flute bore is modeled by a cubic
equation, x − x³:
ax delay asum2, 1/ifqc/2; EMBOUCHURE DELAY
apoly = ax - ax*ax*ax ; CUBIC EQUATION
The end of the flute bore reflects low frequencies. This is modeled with a lowpass
filter at the beginning of the bore delay line. The bore delay is then fed back into the
system in two places, before the embouchure delay, where it is added to the flow and
before the filter, where it is added to the output from the cubic equation.
The pitch is changed by changing the length of the bore delay line:
afqc = 1/ifqc - asum1/20000 - 9/sr + ifqc/12000000
In order to be able to tune precisely, an interpolating variable delay tap was used
to implement the bore delay. The delayr, delayw and deltapi opcodes were used
for this.
atemp2 delayr 1/ifqc
aflute1 deltapi afqc
delayw avalue
In a real flute, pitch varies slightly based on breath pressure. Vibrato can be intro-
duced in this way. To implement this in the waveguide model, the delay tap length
includes a term based on the pressure.
atemp delayr 1/ifqc
aflute1 deltapi 1/ifqc - 12/sr + asum1/20000
delayw avalue
One modification of this instrument could be to make the embouchure delay length
a function of pressure, to allow for overblowing techniques. This can be tricky, be-
cause overblowing occurs at a lower pressure for low frequencies. Re-tuning the flute
would also be required.
A Waveguide Clarinet
The table can be modified to account for variations in reed stiffness and embou-
chure. The bell at the end of a clarinet functions as a filter: low frequencies are reflected back into the bore and high frequencies are allowed to pass out of the bore.
This is implemented as a pair of filters. The output from the lowpass filter feeds the
reflection delay line and the output from the highpass filter is scaled and used as the
output of the instrument.
A Waveguide Drum
The membrane can be left on for accumulation of successive impulses and can be
turned off to simulate muting. The output from the drum is fed into a delay line,
meant to simulate the drum body.
atube delay anodea, itube/ifqc
afiltr tone atube, 1000
afiltr = afiltr/ifdbck2
The second part of this instrument simulates the drum stick striking the drum. This
sends an impulse into the drum membrane. The pitch determines the duration of
the impulse and along with the filter can be thought of as specifying the type of
drum stick.
This drum can be adjusted to produce a wide variety of percussion sounds. Sounds
similar to a bongo, conga, wood blocks, struck glass bottles, bells and others can be
produced.
A pitch dependence on amplitude could be introduced to simulate the drum head
tightening during an impact. One drawback of this instrument is the large number of
delay lines, requiring extensive computation. It is possible to simplify this system
substantially. Another problem with this instrument is that the timbre tends to change
with pitch. This presents a problem for reproducing marimba-like tones.
Figure 19.7 A simplified diagram of a waveguide square drum. Each double-arrowhead line
represents a bi-directional delay line.
Figure 19.8 Orchestra code for instr 1904 and 1905, a drumstick and a waveguide drum.
The Lorenz Attractor
The Lorenz Attractor, described by Edward Lorenz, is one of the first chaotic systems to be discovered (Pickover 1991). It describes the motion of convection currents
in a gas or liquid. It is defined by the following set of equations:
$dx/dt = \sigma (y - x)$
$dy/dt = -y - xz + rx$ (19.2)
$dz/dt = xy - bz$

Where $\sigma$ is the ratio of the fluid viscosity of a substance to its thermal conductivity,
r is the difference in temperature between the top and the bottom of the system and
b is the width to height ratio of the box used.
Figure 19.9 Orchestra code for instr 1908, a Lorenz Attractor instrument.
The variable h represents the time step and can be used to modify the frequency
of the system. Smaller values of h result in a more accurate approximation of the sys-
tem. As the value of h is increased, the approximation becomes less and less accurate
until the system becomes unstable, when h is somewhat larger than .1. The x and y
coordinates are scaled and given as the output for the instrument. Initial values of
the coordinates and coefficients are provided in the score.
; START DUR AMP X Y Z S R B H
i 1908 0 8 600 .6 .6 .6 10 28 2.667 .01
The values listed are the historic values of the coefficients. This system will pro-
duce drastically different results with different initial values of the coordinates and
different coefficients (see figure 19.9).
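Figure 19.9 holds the chapter's full instr 1908; the reduced sketch below (control-rate only, with the score layout of the note statement above assumed) shows the Euler updates at the heart of it:

       instr 10
kamp   =    p4
kx     init p5
ky     init p6
kz     init p7
is     =    p8                         ; sigma
ir     =    p9                         ; r
ib     =    p10                        ; b
ih     =    p11                        ; step size h
; one Euler step of the Lorenz equations on every control pass
kdx    =    is*(ky - kx)
kdy    =    ir*kx - ky - kx*kz
kdz    =    kx*ky - ib*kz
kx     =    kx + ih*kdx
ky     =    ky + ih*kdy
kz     =    kz + ih*kdz
aleft  =    kamp*kx                    ; x and y scaled as the stereo outputs
aright =    kamp*ky
       outs aleft, aright
       endin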
Figure 19.10 Orchestra code for instr 1909, Rossler’s Attractor instrument.
The Rossler Attractor (Gleick 1987) is a chaotic system defined by the following
equations:
dx / dt = − y − z
dy / dt = x + Ay (19.3)
dz / dt = B + xz − Cz
This can be implemented using the methods described for implementing the Lo-
renz attractor (see figure 19.10).
This instrument is based on simulating the orbit of a planet in a binary star system
(Dewdney 1988). The planet is initialized to some location and the stars are given
positions and masses. For each sample, the position, velocity, and acceleration of the
planet are calculated based on the laws of gravity and momentum.
$F = \frac{m_1 m_2 G}{r^2}$ (19.4)

$F = ma$ (19.5)

$a_x = \frac{m_{star}}{r^2} \frac{\Delta x}{r}$
$a_y = \frac{m_{star}}{r^2} \frac{\Delta y}{r}$ (19.7)
$a_z = \frac{m_{star}}{r^2} \frac{\Delta z}{r}$
The acceleration due to each star is calculated separately and summed to obtain
an overall acceleration. Once the total acceleration has been determined the velocity
may be incremented:
vx = vx + ax
vy = vy + ay (19.8)
vz = vz + az
The instrument shown in figure 19.11, instr 1910, implements a planet orbiting in
a binary star system as described above. Note that one is added to the radius
squared to stabilize the system by avoiding division by zero during a close approach
between a planet and a star. If computation of an actual star system is desired, the
gravitational constant should be included. The one added to the radius-squared term
should also be omitted.
instr 1910 ; PLANET ORBITING IN BINARY STAR SYSTEM
kampenv linseg 0, .01, p4, p3-.02, p4, .01, 0
; PLANET POSITION (X, Y, Z) & VELOCITY (VX, VY, VZ)
kx init 0
ky init .1
kz init 0
kvx init .5
kvy init .6
kvz init -.1
ih init p5
ipanl init p9
ipanr init 1-ipanl
; STAR 1 MASS & X, Y, Z
imass1 init p6
is1x init 0
is1y init 0
is1z init p8
; STAR 2 MASS & X, Y, Z
imass2 init p7
is2x init 0
is2y init 0
is2z init -p8
; CALCULATE DISTANCE TO STAR 1
kdx = is1x-kx
kdy = is1y-ky
kdz = is1z-kz
ksqradius = kdx*kdx+kdy*kdy+kdz*kdz+1
kradius = sqrt(ksqradius)
; DETERMINE ACCELERATION DUE TO STAR 1 (AX, AY, AZ)
kax = imass1/ksqradius*kdx/kradius
kay = imass1/ksqradius*kdy/kradius
kaz = imass1/ksqradius*kdz/kradius
; CALCULATE DISTANCE TO STAR 2
kdx = is2x-kx
kdy = is2y-ky
kdz = is2z-kz
ksqradius = kdx*kdx+kdy*kdy+kdz*kdz+1
kradius = sqrt(ksqradius)
; DETERMINE ACCELERATION DUE TO STAR 2 (AX, AY, AZ)
kax = kax+imass2/ksqradius*kdx/kradius
kay = kay+imass2/ksqradius*kdy/kradius
kaz = kaz+imass2/ksqradius*kdz/kradius
; UPDATE THE VELOCITY
kvx = kvx+ih*kax
kvy = kvy+ih*kay
kvz = kvz+ih*kaz
; UPDATE THE POSITION
kx = kx+ih*kvx
ky = ky+ih*kvy
kz = kz+ih*kvz
aoutx = kx*kampenv*ipanl
aouty = ky*kampenv*ipanr
outs aoutx, aouty
endin
Figure 19.11 Orchestra code for instr 1910, a planet orbiting in a binary star system.
One modification could be to add more stars to the system. It is difficult to find a
system that remains stable over a long period of time. Eventually, a close encounter
with a star accelerates the planet to escape velocity from the system.
Summary
There seems to be a great deal of room for exploration in this area. You will find that
many different dynamical systems are useful as tone generators. These systems also
work well at low frequencies for modulating pitch, amplitude and panning. At low
amplitudes they can be used to introduce subtle complex modulations to otherwise
sterile sounds.
References
Cook, P. 1995. “Integration of physical modeling for synthesis and animation.” International
Computer Music Conference. Banff.
Cook, P. 1995. “A meta-wind-instrument physical model and a meta-controller for real time
performance control.” International Computer Music Conference. Banff.
Dewdney, A. 1988. The Armchair Universe. New York: W. H. Freeman and Co.
Karplus, K., and A. Strong. 1983. “Digital synthesis of plucked-string and drum timbres.”
Computer Music Journal 7(4).
Lomen, D., and J. Mark. 1988. Differential Equations. Englewood Cliffs, N.J.: Prentice-Hall.
Pickover, C. 1991. Computers and the Imagination. New York: St. Martin’s Press.
Smith, J. O. III. 1992. “Physical modeling using digital waveguides.” Computer Music Journal
16(4): 74–91.
Signal Processing
Understanding Signal Processing through Csound
20 An Introduction to Signal Processing
with Csound
R. Erik Spjut
Signal processing has become a vital part of a modern engineer’s vocabulary. From
earthquake retrofitting to direct-from-satellite TV broadcasts to chemical-process
control, a knowledge of signal processing and dynamic systems has become essential
to an engineer’s education. Learning signal processing, however, is often a dull exer-
cise in performing integrations, summations, and complex arithmetic. Csound has a
number of features that make it valuable as an adjunct to the traditional course work.
First, it was designed from the ground up to follow the principles of digital signal
processing, although this fact is probably hidden from the casual user. Second, hear-
ing a swept filter or an aliased sine wave brings the signal processing concepts home
in an immediate and visceral way.
This chapter introduces the student of signal processing to the powerful examples
available in Csound. It also introduces the concepts of signal processing to the Csound
practitioner in the hope that you will better understand why Csound does what it
does and also be better able to make Csound do what you want.
Because all of the topics are interrelated, occasional use or mention is made of
later topics in the derivation of earlier topics. The order of presentation keeps forward
references to a minimum.
Even though Csound is powerful, it will not do every sort of calculation that you
might desire. Often you need to do some sort of symbolic or numeric calculation to
figure out what values to use in Csound. These sorts of calculations are best done
using a program like Mathematica, Maple, MathCAD, or Matlab. I have given some
examples of these sorts of calculations using Mathematica to assist the interested
reader. Just as you do not really need to know how to do long division in order to
use a calculator, you do not really need to know calculus to use Mathematica to
integrate.
Mathematical Conventions
Variables
A variable is a symbol for a quantity or thing that can change or vary. Common vari-
ables in algebra are x and y. Some of the variables used in this chapter are t (usually
stands for time), T (a time interval or period), f (frequency of a continuous signal)
and F (frequency of a discrete signal).
Functions
A function is the way one variable depends on (or relates to) another variable, for
example, in the equation
y = 3x + 5 (20.1)
the variable y is a function of the variable x. The function says “take whatever value
x has currently, multiply it by three and add five to that result.” That final result is the
current value of y. Often, when one variable such as y is a function of another vari-
able, such as t or n, but we do not know (or care) exactly what the function is, we
write y (t) or y [n]. When speaking, we read y (t) as “y-of-t” and y [n] as “y-of-n”.
Three of the most common functions in signal processing are the sine of x (written
as sin x), the cosine of x (written as cos x) and the exponential of x (written as e^x or
exp x). A function where y(−t) = y(t) (like y(t) = t²) is called an even function. A
function where y(−t) = −y(t) (like y(t) = t) is called an odd function. Many functions are neither even nor odd.
Complex Numbers
If we have two complex numbers a + jb and c + jd, we can define the relationships
for addition as the sum of the real parts and the sum of the imaginary parts:
( a + jb ) + ( c + jd ) = ( a + c ) + j ( b + d ) (20.2)
and subtraction as the difference of the real parts and the difference of the imagi-
nary parts:
( a + jb ) − ( c + jd ) = ( a − c ) + j ( b − d ) (20.3)
Polar Form
$|a + jb| = \sqrt{a^2 + b^2}$ (20.4)
We can think of the angle between the positive x-axis and the line segment from the
origin to the complex number as the angle (also known as the phase angle, or the
phase or the argument) of the complex number:
$\angle (a + jb) = \arctan\frac{b}{a}$ (20.5)
With two complex numbers, $r_1 e^{j\theta_1}$ and $r_2 e^{j\theta_2}$, in polar form, multiplication is the
product of the magnitudes and the sum of the angles:

$(r_1 e^{j\theta_1})(r_2 e^{j\theta_2}) = (r_1 r_2)\, e^{j(\theta_1 + \theta_2)}$ (20.6)

Division is the ratio of the magnitudes and the difference of the angles:

$\frac{r_1 e^{j\theta_1}}{r_2 e^{j\theta_2}} = \left(\frac{r_1}{r_2}\right) e^{j(\theta_1 - \theta_2)}$ (20.7)
There are formulas for multiplication and division using a and b instead of r and
θ, but they usually make calculations much harder.
Complex Variables
By analogy with complex numbers, the complex variable z contains two normal
(real) variables x and y:
z = x + jy (20.8)
Again we call x the real part and y the imaginary part. We can (and usually do)
express complex variables in polar form:
$z = r e^{j\theta}$ (20.9)

Where $r = \sqrt{x^2 + y^2}$ and $\theta = \arctan(y/x)$, but r and θ are now variables instead
of numbers.
Complex Functions
The real part of e^z is the product of the exponential of x and the cosine of y, e^x cos y,
and the imaginary part is the product of the exponential of x and the sine of y,
e^x sin y. In terms of polar form, the magnitude of e^z is e^x and the angle is y.
(It may be worth noting that Mathematica treats almost all variables and functions
as being complex. For example, Mathematica will evaluate Exp[I Pi/2] or Exp[3 +
2 I] as readily as Sin[Pi/2].)
Summation
The symbol:
$\sum_{n=0}^{27} f(n)$ (20.11)
is a summation symbol. It means to take the index variable (the variable on the
bottom, usually n), let it take on all integer values from the starting index (the num-
ber on the bottom) to the ending index (the number on the top), substitute the integer
into the main expression and add up the values of all of the substitutions.
For example,
$\sum_{n=3}^{7} n^2 = 3^2 + 4^2 + 5^2 + 6^2 + 7^2 = 9 + 16 + 25 + 36 + 49 = 135$ (20.12)
The symbol:

$\int$ (20.13)

is an integral sign; integration is the continuous-time counterpart of summation, adding up the values of a function over a continuous range rather than at discrete steps. Expressions such as $\int \sin x \, dx$ (20.14) and $\int_0 \cos x \, dx$ (20.15) denote an indefinite integral and a definite integral with a lower limit of 0, respectively.
Signals
Continuous Signals
Signals are divided into two classes, continuous-time and discrete-time. Continuous-
time signals have a value at any instant of time. Most physical phenomena, such
as temperature, sound, or a person’s weight are continuous-time signals (see figure
20.2). Two continuous-time signals that are important in signal processing are
sin(2πf₀t) and cos(2πf₀t). Engineers often use the Euler formula as a shorthand notation for representing these two signals at the same time:

$e^{j 2\pi f_0 t} = \cos(2\pi f_0 t) + j \sin(2\pi f_0 t)$

where $j = \sqrt{-1}$. One can get back to sines and cosines with:
$\cos(2\pi f_0 t) = \frac{e^{j 2\pi f_0 t} + e^{-j 2\pi f_0 t}}{2}$ (20.19)

and:

$\sin(2\pi f_0 t) = \frac{e^{j 2\pi f_0 t} - e^{-j 2\pi f_0 t}}{2j}$ (20.20)
Periodic Signals
The signals sin(2πf₀t), cos(2πf₀t) and e^{j2πf₀t} are all periodic signals (see figure 20.3),
meaning that for some value T, the signal x(t) = x(t + T). If f₀ in sin(2πf₀t), cos(2πf₀t),
or e^{j2πf₀t} is not equal to 0, then the smallest positive value of T for which x(t) = x(t
+ T) is called the fundamental period, T₀. T₀ = 1/|f₀|. For an arbitrary periodic signal
(where x(t) = x(t + T₀)), f₀ is called the fundamental frequency.
In the abstract world of signals, a periodic signal like sin(2πf₀t) started before the
dawn of time and goes on past the end of the universe. Those of us who are not
deities do not have the patience to listen to such signals. Our signals must have a
start and an end. For us mere mortals, a reasonably long section of sin(2πf₀t) can be
generated by the instrument shown in figures 20.4 and 20.5.
Csound has some special facilities for generating periodic signals, but we need to
wait for a little more background before we fully understand how they work. Let us
listen to the examples first and then learn how they work.
For a periodic function, x(t) = x(t + T), we do not really have to let time go from
the start to the end. We can just let time run from 0 to T and then start over. If we
have a new variable, say t′, defined as t′ = t/T, then we can generate x(t) by letting
t′ run from 0 to 1 and then start over at 0.
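This "normalized time" is exactly what Csound's phasor opcode produces. A minimal sketch (the table number, frequency and scaling are arbitrary here):

       instr 11
andx   phasor 440              ; t' runs repeatedly from 0 to 1, 440 times per second
asig   tablei andx, 1, 1       ; read one stored period with a normalized index
       out    20000*asig
       endin

; score (sketch): f 1 0 8192 10 1
;                 i 11 0 5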
Discrete Signals
The other class of signals, discrete-time signals, only have values at fixed instants of
time. For example, the closing value of the Dow Jones industrial average is a dis-
crete-time signal that only has a value determined at the closing bell (4 p.m. Eastern
Time in the United States). In between the closing bells, the closing value is unde-
fined. In other words, it makes no sense to talk about the closing-bell value at noon.
Discrete-time signals are often represented graphically as “lollipop diagrams,” where
Figure 20.4 Block diagram of instr 2001, a 440 Hz sinusoidal oscillator at full amplitude
synthesized mathematically by solving for sin(x).
i 2001 0 5
Figure 20.5 Orchestra and score code for instr 2001, a sine wave synthesizer as shown in
figure 20.4.
Figure 20.6 Block diagram of instr 2002, a simple Csound oscillator instrument.
i 2002 0 5 1
i 2002 6 5 2
i 2002 12 5 3
i 2002 18 5 4
Figure 20.7 Score code for instr 2002 with function for sine, saw, square, and triangle called
consecutively by each note.
the height of the lollipop stick corresponds to the value of the signal at that time. The
lollipop itself is meant to remind you that the function only has values at discrete
points (see figure 20.8).
Because a discrete-time signal only has a value at discrete times, we can number
these discrete times, for example, t0 , t1, t2, . . . The values of the discrete-time signal,
x, can then be represented as either x (t0), x (t1 ), x (t2), . . . , or x [0], x [1], x [2], . . . ,
where x[n] = x(tn) and n is called the index variable.
The natural representation of a signal in a computer is as a discrete-time signal. A
computer has individual memory or storage locations, so the values of a discrete-
time signal can be stored in successive memory locations. Csound has a lot of built-
in facilities for generating and storing discrete-time signals.
The GEN routines (in the .sco file) all create discrete-time signals that are stored
in a section of memory called a table. You use an index into a table to find a particular
stored value of the function. You can store a complete discrete-time signal in the com-
puter using a GEN routine and then generate the signal by indexing the table with
the integers from 0 to table-length − 1. Since tables can be of different lengths, it is
sometimes easier to say you want a value halfway through the table or three-quarters
of the way through the table, instead of at index 4095. The Csound opcode table lets
you access a table either way. The line:

a1 table 73, 1, 0

reads the value stored at raw index 73 of function table 1, while the line:

a2 table 0.25, 2, 1

reads the value one quarter of the way through function table 2, since its third argument selects the normalized (0 to 1) indexing mode.
A natural question then arises: How can one convert a discrete-time signal such
as the output from Csound into a continuous signal, such as a sound? Or how can
one convert a continuous-time signal such as a sound into a discrete-time signal,
such as an .aif file? Under a specific criterion, known as the sampling theorem, these
tasks can be accomplished exactly, without the loss or addition of any information.
But before we discuss the sampling theorem, we have a few additional topics to
cover.
Fourier Series
It was discovered long ago that any reasonable periodic signal (one that has a max-
imum value less than infinity and a finite number of wiggles or jumps in a pe-
riod) can be represented as a weighted sum of integrally-related (meaning that the
frequencies are related by integers) sines and cosines or complex exponentials. This
weighted sum is known as a Fourier series and the weightings are known as the
Fourier coefficients. Mathematically, the periodic signal x (t) is given by the sum:
$x(t) = \sum_{k=-\infty}^{+\infty} a_k e^{j 2\pi k f_0 t}$ (20.21)
where the summation is the Fourier series and the ak are the Fourier coefficients. If
the periodic function is an even function, the complex exponentials can be rewritten
as cosines:
$x(t) = \sum_{k=0}^{\infty} a_k \cos(2\pi k f_0 t)$ (20.22)
If the periodic function is an odd function, the complex exponentials can be rewrit-
ten as sines:
$x(t) = \sum_{k=1}^{\infty} a_k \sin(2\pi k f_0 t)$ (20.23)
The ak can be calculated by evaluating the integral of the product of the function
and the appropriate complex exponential over one period and dividing by the period.
Mathematically:
$a_k = \frac{1}{T_0} \int_{T_0} x(t)\, e^{-j 2\pi k f_0 t}\, dt$ (20.24)
A set of sines, cosines or complex exponentials, where the frequencies are related
by fk = kf₀, where k = 0, ±1, ±2, ±3, . . . are said to be harmonically related and
the individual terms are called the harmonics or partials. The terms where k = ±1
are called the fundamental harmonic or fundamental. The terms where k = ±2 are
called the second harmonic or second partial. The terms where k = ±3 are called the
third harmonic or third partial and so forth. Much of the tone quality or timbre of a
musical note is governed by the number and relative strengths of the harmonics in
the note. As an interesting side-note: the ak can be thought of as a discrete-time
signal.
The Csound function generators, GEN9, GEN10 and GEN19 are designed to cre-
ate a table with one period of a periodic function from the Fourier coefficients. They
differ in the amount of detail you have to specify about each harmonic. As an ex-
ample, a sawtooth wave (which is an odd function) has the following Fourier series:
$x(t) = \frac{2}{\pi} \sum_{k=1}^{\infty} \frac{1}{k} \sin\!\left(\frac{2\pi k}{T_0} t\right)$ (20.25)
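A GEN10 table built from these 1/k weightings gives one period of this sawtooth (the table number, size and number of harmonics below are arbitrary; this f-statement is a sketch rather than one reproduced from the text):

f 2 0 16384 10 1 0.5 0.3333 0.25 0.2 0.1667 0.1429 0.125 0.1111 0.1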
Two other common periodic signals are the square wave (see figure 20.10):
$x(t) = \frac{4}{\pi} \sum_{k\ \mathrm{odd}} \frac{1}{k} \sin\!\left(\frac{2\pi k}{T_0} t\right), \quad a_k = \begin{cases} \frac{4}{\pi}\frac{1}{k}, & k \text{ odd} \\ 0, & k \text{ even} \end{cases}$ (20.27)
f 1 0 16384 10 1 0 0.3333 0 0.2 0 0.1429 0 0.1111 0
and the triangle wave:

$x(t) = \frac{8}{\pi^2} \sum_{k\ \mathrm{odd}} \frac{(-1)^{(k-1)/2}}{k^2} \sin\!\left(\frac{2\pi k}{T_0} t\right), \quad a_k = \begin{cases} \frac{8}{\pi^2}\frac{(-1)^{(k-1)/2}}{k^2}, & k \text{ odd} \\ 0, & k \text{ even} \end{cases}$ (20.28)
f 3 0 16384 10 1 0 -0.1111 0 0.04 0 -0.0204 0 0.0123 0 -0.0083 0 0.0059
If your calculus is not really sharp, you can use a program such as Mathematica
to calculate the Fourier coefficients for you. You still have to come up with an expres-
sion for one period of the periodic function. But Mathematica will then calculate the
Fourier series and the Fourier coefficients.
For example, one period of a sawtooth wave is described by y = −(2x − 1). The
following Mathematica commands will give you the first ten terms of the Fourier
series and also give you the first ten coefficients.
<<Calculus`FourierTransform`
y=-(2x-1)
FourierTrigSeries[y,{x,0,1},10]
Table[Pi/2 FourierSinSeriesCoefficient[y,{x,0,1},n],{n,1,10}]
Not all signals are periodic. It turns out that even aperiodic continuous-time sig-
nals can be constructed from the sum of sines and cosines or complex exponentials.
We need the weighted sum of sines and cosines or complex exponentials at all
possible frequencies, however, instead of at just harmonically-related frequencies.
The mathematical way to sum lots of little bits right next to each other is to integrate.
Instead of the discrete ak we had for the Fourier series, we now have a continuous
function of frequency, so the continuous-time signal x (t) can be constructed from
the weighted sum of all frequencies with the weighting function being X ( f ),
that is:
$x(t) = \int_{-\infty}^{\infty} X(f)\, e^{j 2\pi f t}\, df$ (20.29)
The equation for calculating X ( f ) is called the Fourier Transform. The equation
for reconstructing x (t) from X ( f ) is called the Inverse Fourier Transform. The im-
portance for us in these transforms is that we now have two ways of looking at a
continuous-time signal. We can look at it as a function of time (time domain), or as
a function of frequency ( frequency domain). Since humans hear the pitches of notes
as a function of frequency and not as functions of time, the frequency domain is
often a valuable place to examine musical signals.
Plotted below is the function (see figures 20.12, 20.13, and 20.14):
$y = \begin{cases} 0, & t < 0 \\ e^{-t}, & t \ge 0 \end{cases}$ (20.31)
and its Fourier transform. The plots were generated with Mathematica from the fol-
lowing instructions:
<<Calculus`FourierTransform`
foft=UnitStep[t]Exp[-t]
fofw=FourierTransform[foft,t,w]
foff=fofw/.w->2 Pi f
Plot[foft,{t,-5,5}]
Plot[Abs[foff ],{f,-5,5}]
Plot[Arg[foff ],{f,-5,5}]
$x[n] = \int_{1\ \mathrm{cycle}} X(F)\, e^{j 2\pi n F}\, dF$ (20.33)
The integration is performed over any range of F with a length of 1, for example,
0 to 1 or −1/2 to 1/2. We will make use of both of these formulas in the section
on filters.
Because the Fourier transform can consist of complex numbers, it is customary to
plot functions in the frequency domain as a plot of magnitude (or absolute value)
versus frequency and phase (or phase angle or argument) versus frequency. Csound
has a built-in function for plotting the magnitude of a Fourier transform but not the
phase. The opcode dispfft and the analysis program pvanal use the Fast Fourier
Transform (FFT), which is an efficient method for calculating a sampled version of
the discrete-time Fourier transform.
The Impulse
The continuous-time impulse, δ(t), can be thought of as the limit of a pulse that is
made narrower and taller in such a way that the area remains constant. The true impulse (like the ideal fashion model)
is infinitely tall and infinitely thin but has an area of 1. The discrete-time impulse,
δ[n], is much simpler. It is a single sample of value 1 surrounded by zeros (see
figure 20.15).
The impulse has several properties that make it useful. The first is its sampling
property. You can find the value of a function at a given time t0 or index n0, with the
following formulas: the integral (or sum) of the product of the impulse and a function
is the value of the function at the impulse’s time, that is:
$\int_{-\infty}^{\infty} x(t)\, \delta(t - t_0)\, dt = x(t_0)$ (20.34)

or

$\sum_{n=-\infty}^{\infty} x[n]\, \delta[n - n_0] = x[n_0]$ (20.35)
This property is most useful in designing FIR filters (described below). The sec-
ond property is the impulse’s frequency content. The Fourier transform of the im-
pulse is a constant, that is:
$\Delta(f) = \int_{-\infty}^{\infty} \delta(t)\, e^{-j 2\pi f t}\, dt = e^{-j 2\pi f (0)} = 1$ (20.36)

or

$\Delta(F) = \sum_{n=-\infty}^{\infty} \delta[n]\, e^{-j 2\pi F n} = e^{-j 2\pi F (0)} = 1$ (20.37)
In other words, an impulse contains all possible frequencies in equal strength (see
figure 20.16). That property makes it ideal for testing filters and evaluating their
performance.
You cannot really hear an impulse. But you can hear the impulse response of your
computer and sound system (I will explain what an impulse response is a little later).
The following Csound code will generate an impulse to let you hear the impulse re-
sponse of your computer and sound system (see figures 20.17 and 20.18).
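Figures 20.17 and 20.18 give the chapter's version; the fragment below is only a bare sketch of the same idea (instrument number and duration are assumed), producing one full-scale sample followed by silence:

       instr 12
a1     linseg 32767, 1/sr, 0, p3-1/sr, 0   ; one-sample spike, then silence
       out    a1
       endin

; score (sketch): i 12 0 1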
Often one does not want an impulse with all possible frequencies, but rather a
periodic or pitched signal with all of the harmonics at equal strength. A train of
impulses will have the desired characteristics. Unfortunately, in a sampled system,
the period of a pulse-train must be an integer multiple of the sampling period, which
greatly limits the frequencies that can be used. Csound provides a way around this
limitation. The opcode buzz is designed to provide a signal with equal-strength har-
monics for any desired frequency. The Csound instrument shown in figures 20.19
and 20.20 will give you a buzz.
As the frequency sweeps from 10 Hz to 640 Hz you should hear the transition
from individual clicks to a buzzing sound. The transitions (the “gear shifts”) are
heard because one has to specify the number of harmonics in buzz; the maximum
number allowed without aliasing (explained below) decreases with increasing
frequency.
Often a signal will have one or more frequency ranges where there is no infor-
mation, which means that X(f) = 0. Such a signal is called band-limited, meaning
the information content is limited to certain frequency ranges or bands (see figure
20.21).
With this background we are ready to tackle converting continuous signals to discrete
signals and back. The Sampling Theorem states that a band-limited continuous signal
with X(f) = 0 for |f| > fM is uniquely determined by its samples, if the sampling
frequency is at least twice the band-limited frequency, fs > 2fM, where the sampling
frequency is the reciprocal of the sampling period fs = 1/Ts. The original signal,
x (t), can be reconstructed from its samples by generating an impulse-train whose
amplitudes equal the sample values and passing the impulse-train through an ideal
lowpass filter with a gain of Ts and a cutoff frequency greater than fM and less than
fs − fM.
Figure 20.17 Block diagram of instr 2003, an instrument that generates an impulse and
displays it in both the time and frequency domains to let you hear and measure the impulse
response of your system.
Figure 20.18 Orchestra and score code for instr 2003, which generates a single impulse and
displays the impulse and frequency response.
The details of converting from a continuous signal to a discrete signal and back
again are usually handled by the analog-to-digital (ADC) and digital-to-analog
(DAC) converters in your computer or sound card. Most ADC’s have filters on them
to remove any frequency content higher than one-half the sample rate. Consequently,
when recording a signal into a computer, one does not normally have to worry about
making sure that the signal is band-limited.
When generating or processing a signal inside the computer, however, one has to
be careful not to generate any frequency content above one-half the sample rate.
Csound has absolutely no built-in protection against this. In fact, there is no general
way to protect against this.
Figure 20.19 Block diagram of instr 2004, an instrument that generates a pitched “impulse
train” using Csound’s buzz opcode.
f 1 0 16384 10 1
i 2004 0 6 10 20
i 2004 6 6 20 40
i 2004 12 6 40 80
i 2004 18 6 80 160
i 2004 24 6 160 320
i 2004 30 6 320 640
Figure 20.20 Orchestra and score code for instr 2004, an instrument that generates a sweep-
ing train of impulses with all the harmonics at equal strength.
If you generate a frequency above one-half of the sampling rate and attempt to
play it, the actual frequency you hear will depend upon just how high above one-half
the sampling rate the frequency is. For frequencies between 1/2fs and fs, the frequency you will hear is fs − f. For frequencies between fs and 3/2fs, the frequency
you will hear is f − fs. The pattern repeats. For example, at a sampling rate of 44,100 Hz, a generated 30,000 Hz tone falls between 1/2fs and fs, so the frequency you hear is 44,100 − 30,000 = 14,100 Hz.
This reappearance of a high frequency as a much lower frequency is called
aliasing (i.e., the high frequency appears as its low-frequency alias). The Csound
code in figures 20.22 and 20.23 illustrates aliasing.
The result sweeps the waveform from 20 Hz to 176,400 Hz, which is four times
the sample rate. Humans cannot hear sounds at 176,400 Hz, but you will hear the
alias of 176,400 Hz quite clearly. For the sine wave you should be able to hear the
pitch sweep up and down four times. For the others, you will hear the partials sweep
up and down repeatedly before the fundamental starts to sweep back down. For the
buzz wave, all of the harmonics are the same strength so you will hear an audio mess.
Aliasing is neither a good nor a bad thing. However, if you are not expecting it,
aliasing may generate “noise” (my definition of “noise” is any sound that you do not
want to hear). It is best not to depend on aliasing in a Csound instrument, because
changing the sampling rate will dramatically change the sound. Another way to think
about aliasing is that the discrete-time frequency, F = f/fs, only has a useful range
from −1/2 to 1/2. Any frequency generated outside this range will be aliased back
into this range (see figure 20.24). The most common causes of aliasing in Csound
are sampling a table too rapidly or having too high a modulation index in a modulat-
ing scheme such as FM (discussed below).
Systems
A system is anything that takes an input signal and transforms it into an output signal.
Systems are normally classified as either linear or nonlinear and time-invariant or
time-varying. A time-invariant system is one in which a given input causes a given
output no matter when the input occurs. For example, consider a doorbell as a system
with a push on the button as the input and the ringing sound as the output. No matter
when you push the button, the bell will ring; the system is time-invariant. If you
consider the system to be the doorbell, the residents, and the front door taken to-
gether, this new system is time-varying, that is, the response that you get to a push
on the doorbell depends on the time of day. Mathematically a system is time-invariant
if the input x (t ⫺ t0 ) gives rise to the output y (t ⫺ t0 ).
A linear system is one in which the output is directly proportional to the input.
Most doorbells are not linear because pushing the button twice as hard does not
Figure 20.22 Block diagram of instr 2005, an instrument designed to illustrate the sounds
and concepts of aliasing.
; SINE
f 1 0 8192 10 1
; SQUARE
f 2 0 8192 10 1000 0 333 0 200 0 143 0 111 0 91 0 77 0 67 0 59 0 53 0
; SAWTOOTH
f 3 0 8192 10 1000 500 333 250 200 167 143 125 111 100 91 83 77 71 67
63 59 56 53 50
; IMPULSE-LIKE
f 4 0 8192 10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
i 2005 0 20 1
i 2005 21 20 2
i 2005 42 20 3
i 2005 63 20 4
Figure 20.23 Orchestra and score code for instr 2005, an instrument whose frequency
moves from 20 Hz to 176,400 Hz—4x the sample rate.
Figure 20.25 Block diagram of linear time-invariant systems in the time and frequency
domains.
cause the bell to ring twice as loud. A properly functioning grocery clerk is a linear
system because they will charge you twice as much for two boxes of corn flakes and
they will charge you the sum of three times the price of one box of corn flakes and
twice the price of one doughnut if you buy three boxes of corn flakes and two
doughnuts.
Mathematically, a linear system has the following properties: if x1(t) and x2(t) are
inputs to a system, a and b are constants and y1(t) is the system response to x1(t) and
y2(t) is the system response to x2(t), then the system is linear if the response to ax1(t)
+ bx2(t) is ay1(t) + by2(t).
From a design and analysis standpoint, linear time-invariant systems are the most
useful because you can predict their behavior from limited information. Most of the
rest of this chapter will be concerned with linear time-invariant (LTI) systems.
If you input an impulse into an LTI system you will get an output signal called the
impulse response, h (t), for continuous systems and h [n] for discrete systems (see
figure 20.25). It has been proven that you can get the response to any input signal,
x (t), from the impulse response by convolution of x (t) with h (t). The symbol for
convolution is usually the asterisk—“*.” For continuous signals:
$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau$ (20.38)
For discrete systems with short impulse responses, the convolution sum is fairly
easy to program and runs quite quickly. For long impulse responses, however, such a direct approach becomes too slow to be practical.
Sines, cosines and complex exponentials are eigenfunctions of LTI systems, which
means that the output from a complex exponential input is a complex exponential with exactly the same frequency. The output is multiplied, however, by a (possibly complex) constant. In other words, for an input of e^{j2πft} you get an output of
H(f)e^{j2πft}, where H(f) is a constant that depends on f but not on t. Since we learned
above that any reasonable signal can be constructed from a weighted sum (or inte-
gral) of complex exponentials, we can find the response of an LTI system as:
$y(t) = \int_{-\infty}^{\infty} H(f)\, X(f)\, e^{j 2\pi f t}\, df \quad \text{or} \quad Y(f) = H(f)\, X(f)$ (20.40)
For discrete-time systems the result is Y(F) = H(F)X(F). You may have guessed
by now that H ( f ) is the Fourier transform of the impulse response, h (t). H ( f ) is
the transfer function or frequency-response function of the system. In other words,
multiplication in the frequency domain corresponds to convolution in the time do-
main. The procedure is to take the Fourier transform of the input signal and the
impulse response, multiply the two transforms and take the inverse transform of the
result to get the output signal.
The Csound orchestra and score shown in figures 20.26 and 20.27 can be used to
find the impulse response and the frequency response of an LTI Csound opcode or
section of code.
The opcode convolve and its analysis utility companion, cvanal, implement
frequency-domain convolution. The cvanal utility takes the Fast Fourier Transform
of a soundfile, creating a frequency-response function (the soundfile is assumed to
be the impulse response of some system or filter). The convolve opcode takes the
frequency-response function, multiplies it by the Fourier Transform of the input sig-
nal and takes the Inverse Fourier Transform.
The Csound code shown in figures 20.28, 20.29 and 20.30 is designed to do a
comparison. Note: because Csound’s analysis files are machine specific (you cannot
Figure 20.26 Block diagram of instr 2006, an instrument designed to find the impulse and
frequency response of a linear time-invariant opcode (reson).
Figure 20.27 Orchestra and score code for instr 2006, an instrument that generates an im-
pulse response of a reson filter.
use PC analysis files on a Mac or vice versa) the impulse responses on the accompa-
nying CD-ROM must be preanalyzed by the Csound cvanal utility before they can
be used for convolution.
Csound has some other facilities for performing frequency-domain convolution
but they will not be discussed here. The interested reader is directed to pvoc and
pvanal, as well as to the spectral data types.
Figure 20.28 Block diagram of instr 2007, an instrument designed to do a comparison test
and measure the impulse and direct implementation frequency response of an LTI opcode and
the convolution method as shown in figure 20.29.
Figure 20.29 Block diagram of instr 2008, an instrument designed to compare the impulse
and frequency response of the convolution opcode with the direct method shown in figure
20.28.
i 2007 0 6 110.25
i 2008 6 6 110.25
Figure 20.30 Orchestra code for instr 2007 and 2008, a direct implementation of an instru-
ment that generates an impulse and measures the time and frequency responses (2007) and a
convolution instrument for comparison (2008).
Difference Equations
For continuous-time signals and systems, the behavior of the systems is often based
on linear constant-coefficient differential equations. For discrete-time signals and
systems, systems are often described by constant coefficient difference equations.
The general form of a constant-coefficient difference equation is:
$\sum_{k=0}^{N} a_k\, y[n-k] = \sum_{k=0}^{M} b_k\, x[n-k]$ (20.41)
If one assumes that the system is initially at rest, one can calculate the frequency
response and the impulse response from the difference equation in the following
way:
The discrete-time Fourier Transform of the difference equation is:
$\sum_{k=0}^{N} a_k\, e^{-j 2\pi k F}\, Y(F) = \sum_{k=0}^{M} b_k\, e^{-j 2\pi k F}\, X(F)$ (20.42)

so the frequency response is:

$H(F) = \frac{Y(F)}{X(F)} = \frac{\sum_{k=0}^{M} b_k\, e^{-j 2\pi k F}}{\sum_{k=0}^{N} a_k\, e^{-j 2\pi k F}}$ (20.43)
The impulse response is then found from the inverse Fourier transform of H (F).
Block Diagrams
Difference equations are often represented by block diagrams in the form illustrated
in figure 20.31. A circle with a plus sign is a summation operator. The box with the
D in it is a delay operator. It corresponds to shifting the index from n to n − 1. The
coefficient by the arrow corresponds to multiplication by the coefficient. Thus we
can rewrite equation 20.41 as a block diagram (see figure 20.32), where the coefficients by the arrows correspond to the coefficients in the previous equation.
If all of the ak are zero, except for a0, then the output only depends on the present
and past inputs and not on the past outputs. The impulse response of such a system
returns to zero after the impulse has passed through the M th input. Such a system is
said to have a finite impulse response, because the impulse response has a finite dura-
tion and the system is known as an FIR system. If any of the ak are not zero, then, in
general, the impulse response never truly returns to zero, although it may get very
small. These systems have feedback because the present value of the output depends
on all of the previous ones. Such a system is said to have an infinite impulse response
and is known as an IIR or recursive system (because of the feedback).
Stable Systems
A gain b: y[n] = b x[n]. A delay D: y[n] = x[n − 1]. A summing node: y[n] = x[n] + w[n].
Figure 20.31 Typical figures and form of block diagrams used to represent difference
equations.
1/a 0 b0
x[n] y[n]
D
-a 1 b1
D
-a 2 b2
D
-a N bN
The stability of an IIR system can be checked from the roots of the equation obtained by
replacing the factors of $e^{-j2\pi F}$ with $z^{-1}$ (for example, $e^{-j2\pi kF} \rightarrow z^{-k}$),
setting the denominator of the transfer function to zero and solving for z. If the
magnitudes or absolute values of all of the roots are less than 1, the system is stable.
If the magnitude of any of the roots is greater than 1 the system is unstable. As an
example, take the system with the transfer function:
$$H(F) = \frac{1}{1 + e^{-j2\pi F} + 2e^{-j4\pi F}} \qquad (20.44)$$
First, replace the factors of $e^{-j2\pi F}$ with $z^{-1}$.
$$H(z) = \frac{1}{1 + z^{-1} + 2z^{-2}} \qquad (20.45)$$
Then, set the denominator equal to zero.
$$1 + z^{-1} + 2z^{-2} = 0 \qquad (20.46)$$
$$z = -\tfrac{1}{2} + j\tfrac{\sqrt{7}}{2},\quad -\tfrac{1}{2} - j\tfrac{\sqrt{7}}{2} \qquad (20.47)$$
Check the magnitudes.
$$|z| = \sqrt{2},\ \sqrt{2} \qquad (20.48)$$
Since at least one (in this case both) of the roots has a magnitude greater than 1,
the system is unstable.
The transfer function of an FIR system has no denominator and so it is always
stable. In Csound, an unstable system usually shows up as having infinity (INF) ap-
pear as the magnitude of the largest sample.
Filters
Csound also provides some tools for generating filters from scratch. Discrete-time
filters are usually constructed from difference equations as discussed above.
[Figure: ideal filter magnitude responses |H(f)|: lowpass and highpass with cutoff frequencies ±fc, bandpass and bandstop with band edges ±fc1 and ±fc2.]
Figure 20.34 Impulse response for tone (khp = sr/100). An impulse response plots the
amplitude (or intensity) of the output from a filter when the input is an impulse. In this case,
the output decays exponentially with time.
Figure 20.35 Frequency response for tone (khp = sr/100). A frequency response plot shows
how a given system or filter amplifies or attenuates a signal as you change its frequency. It
also shows how the signal’s phase would change as you change its frequency. In this case, the
gain is constant up to about 441 Hz. Above 441 Hz the gain drops with frequency. The phase
shift is almost 0° at low frequency and drops to −180° at 1/2 of the sampling rate.
FIR Example
Figure 20.36 Block diagram of instr 2009, an instrument that implements and tests the FIR
filter defined in equation 20.49.
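As a rough illustration, a minimal sketch in the spirit of instr 2009 follows; the .25/.5/.25 coefficients, the oscil test tone, the stored sine table and the p-field layout are all assumptions, and figure 20.37 gives the book's actual code.

          instr 2009                  ; sketch: 3-point FIR lowpass, y[n] = .25x[n] + .5x[n-1] + .25x[n-2]
kenv      linseg  0, .05, 1, p3-.1, 1, .05, 0
asig      oscil   10000*kenv, p4, 1   ; test tone (assumes f1 is a stored sine)
ad1       delay1  asig                ; x[n-1]
ad2       delay1  ad1                 ; x[n-2]
aout      =       .25*asig + .5*ad1 + .25*ad2
          out     aout
          display aout, p3            ; time-domain view of the result
          dispfft aout, p3, 4096      ; frequency-domain view of the result
          endin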
Figure 20.37 Orchestra and score code for instr 2009, a simple lowpass design implemented
by delaying, scaling, and summing the input signal with the scaled delayed copies of the
original.
IIR Example
Consider, for example, the IIR filter with the transfer function:
$$H(F) = \frac{1}{1 + \tfrac{1}{2}e^{-j2\pi F} + \tfrac{1}{2}e^{-j4\pi F}} \qquad (20.53)$$
This filter has sort of a bandpass-highpass response. A Csound instrument to implement and test
this filter is shown in figure 20.38.
Always check an IIR filter that is built from scratch for stability. It is easy to make
an unstable IIR filter. When using filters in musical applications, it is common to
change filter parameters such as the cutoff frequency or the bandwidth during a note.
Strictly speaking, if you change the parameters with time, the filter is no longer time-
invariant and the analysis techniques we have discussed no longer apply. If the
changes are slow compared with the fundamental frequency of the note, however,
approximating the filter as an LTI system is usually adequate.
Figure 20.38 Orchestra and score code for instr 2010, an IIR bandpass filter based on equation 20.52.
Filter Design
The first step in designing an FIR filter is to decide on the desired frequency response. A discrete-time
frequency response must be a periodic function, so we have to sketch or plot one full
period of the function. A plot of one period of an ideal lowpass filter with a cutoff
frequency of 1/8 (Remember the useful range of F is from 0 to 1/2) is given in fig-
ure 20.39.
The next step is to decide how many points to include in the filter. It should be an
odd number. The larger the number, the better the filter, but the more work you will
have to do. Next, for calculation purposes, you need to find the index range. For
example, for a 5-point filter, the index range is from −2 to 2. For a 15-point filter
the index range is from −7 to 7. Call the ending index N.
With the index range in hand you need to calculate the Inverse Fourier Transform
for the points in the index range. The result will be a set of raw filter coefficients,
a [n]. Because our filter uses only a finite number of points we will have to smooth
the coefficients. We will do this using a window. Every DSP specialist has his or her
own favorite window. We will use the Hamming window:
$$w[n] = 0.54 + 0.46\cos\!\left(\frac{\pi n}{N}\right) \qquad (20.54)$$
where n is the coefficient index and N is the ending index. The final filter coefficients,
b [n], are found by multiplying the raw coefficients by the window coefficients:
$$b[n] = a[n] \cdot w[n] \qquad (20.55)$$
Figure 20.39 Plot of one period of an ideal lowpass filter with a cutoff frequency of 1/8.
For short filters, the coefficients can be used as explained above. For longer filters,
the coefficients should be put into a soundfile as sound samples (using something
other than Csound), then cvanal should be used to convert the response to a fre-
quency response and convolve should be used to implement the filter.
It is usually wise to check how close the actual filter came to the desired filter. You
do so by taking the Fourier Transform of the filter coefficients with the equation
above. The summation only runs from −N to N instead of −∞ to ∞, however. The
results for a 41-point filter (N = 20) are shown in figure 20.40.
Mathematica can handle most of the calculation work for you. The following
Mathematica commands will calculate and plot the results for a 41-point filter with
a cutoff frequency of 1/8.
<<Calculus`FourierTransform`
XofF = 1 - UnitStep[F - 1/8] + UnitStep[F - 7/8]
Plot[XofF, {F, 0, 1}, AxesLabel -> {F, mag}]
xofn = Integrate[XofF Exp[2 Pi I F n], {F, 0, 1}]
Nmax = 20
RawFilter = Table[Limit[xofn, n -> z], {z, -Nmax, Nmax}]
Window = Table[0.54 + 0.46 Cos[n Pi/Nmax], {n, -Nmax, Nmax}]
Filter = Window RawFilter
ListPlot[RawFilter, Prolog -> PointSize[0.02], PlotRange -> All]
ListPlot[Filter, Prolog -> PointSize[0.02], PlotRange -> All]
InverseTable = Table[Exp[-2 Pi I n F], {n, -Nmax, Nmax}]
NewXofF = Apply[Plus, (Filter InverseTable)]
Plot[Abs[NewXofF], {F, 0, 1}, AxesLabel -> {F, mag}]
Figure 20.40 The plot for a 41-point filter with a cutoff frequency of 1/8 as calculated by
Mathematica using the commands above.
The only reason for the <<Calculus`FourierTransform` line is for the UnitStep
function. Nmax can be changed to create larger or smaller filters.
You have probably guessed by now that the impulse response of an FIR filter
contains the filter coefficients in sequence. In other words, the filter coefficient b [n]
equals the impulse response, h [n]. This idea is the guiding principle behind con-
volve. Since any soundfile could be the impulse response of some filter we can as-
sume that it is and use the soundfile as an FIR filter. The result of using the opening
bars of Beethoven’s “Ninth Symphony” to filter “Wild Thing” will probably not be
what you would expect, but it can be done.
Modulation
Modulation is the use of one signal to change the properties of another signal. Modu-
lation is not a time-invariant process, but the methods we have developed to this
point can provide a surprising amount of insight into modulation.
Frequency Modulation
In modulation, one signal is called the carrier signal, c(t), and the other is called the modulating signal, xm(t). In
FM the modulating signal changes the frequency of the carrier. For example:
$$y_{FM}(t) = A\sin\!\big(2\pi f_0 t + I_M\, x_m(t)\big) \qquad (20.56)$$
where c(t) = A sin(2πf0t) and IM is the modulation index. For communications sys-
tems, xm(t) is usually band-limited, f0 is usually much higher than any frequency
component in xm(t) and IM is fairly small. For musical applications, f0 is usually com-
parable to the frequency components in xm(t) and IM is rather large. One less common
use for FM is to create alien voices for science-fiction movies. The example in figures
20.41 and 20.42 demonstrates modulating a sine wave with a spoken voice. The
sound file Hamlet.aif is on the accompanying CD-ROM but any recording of a spo-
ken voice may be used.
A thorough exploration of FM requires Bessel functions and other unpleasantries
that we will forego at this point.
Figure 20.41 Block diagram of instr 2011, an instrument that frequency modulates a sine
wave with a male speaking voice.
Figure 20.42 Orchestra and score code for instr 2011, an FM instrument with a sinusoidal
carrier and a soundin modulator.
Amplitude Modulation
A less common form of modulation in music production, though still valuable and
much more amenable to analysis, is amplitude modulation or AM. In AM the carrier
is multiplied by the modulating signal.
$$y_{AM}(t) = x_m(t)\, c(t) \qquad (20.57)$$
Voice Scrambler-Descrambler
One final example makes use of most of the concepts we have covered in this chapter.
A common use for DSP is a voice scrambler for the telephone. Most telephones are
band-limited. This example uses a band limit of 8 kHz. If one amplitude modulates
an 8 kHz carrier with a band-limited voice and then filters out all frequencies
above 8 kHz and below –8 kHz, one ends up with a perfectly reversed voice spec-
trum. It is hard to understand someone speaking upside down (see figure 20.49). The
Csound example shown in figures 20.46 and 20.47 performs the modulation and
filtering. (The soundfile Sharp_8kHz_Lowpass.aif must be processed with cvanal to
generate Sharp_8kHz_Lowpass.con before the code can be run.)
To recover the original voice, one can modulate the new signal with an 8 kHz
carrier and then filter out all frequencies above 8 kHz and below –8 kHz. This process
reverses the spectrum once again, restoring it to its original state. One needs only to
change the name of the input file and use the 2013.orc file to perform the descram-
bling as shown in instr 2014 (figure 20.48).
Figure 20.43 Block diagram of instr 2012, an instrument that amplitude modulates a sound-
file of a male speaking voice with a sine wave.
Figure 20.44 Orchestra and score code for instr 2012, an AM instrument with a soundin
modulator.
[Figure: spectra of the modulating signal Xm(f), the carrier C(f), and their convolution Xm(f)*C(f), with carrier components at ±fc and sidebands at ±(fc − fm) and ±(fc + fm).]
Figure 20.46 Block diagram of instr 2013, an instrument that reverses the spectrum of a
soundin and thus creates a voice scrambler/descrambler.
The difference is that the scrambling orchestra band-limits the input signal to 8
kHz before scrambling. Because the scrambled signal is already band-limited to 8
kHz, the prefiltering is not necessary for descrambling.
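As a rough sketch of the signal path just described (figure 20.47 holds the book's actual code), the scrambler can be written along the following lines; the file names, the carrier amplitude and table, and the output scaling here are assumptions.

          instr 2013                  ; sketch: spectrum-reversing voice scrambler
amod      soundin  "Hamlet.aif"       ; spoken-voice input
a1        convolve amod, "Sharp_8kHz_Lowpass.con"   ; band-limit the input to 8 kHz
acarr     oscili   1, 8000, 1         ; 8 kHz sine carrier (assumes f1 is a sine)
asig      =        a1 * acarr         ; amplitude modulation flips the 0-8 kHz band
ascram    convolve asig, "Sharp_8kHz_Lowpass.con"   ; keep only the reversed band below 8 kHz
          out      ascram * .3        ; scale down; convolution can raise the overall gain
          endin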
This example works well in Csound at the common sampling rate of 44.1 kHz but
will not run properly at 22 kHz, because the modulation creates sound at frequencies
up to 16 kHz before the lowpass filtering. At the lower sampling rate the portion of
the scrambled spectrum from 11 to 16 kHz aliases back into the range of 6 to 11
kHz, so the lowpass filter cannot remove it. The voice would be well scrambled but
impossible to unscramble.
i 2013 0 24
Figure 20.47 Orchestra and score code for instr 2013, which scrambles a soundin using
the convolve opcode.
i 2014 0 24
Figure 20.48 Orchestra and score code for instr 2014, which descrambles a previously
scrambled soundfile.
[Figure 20.49 Block diagrams of the scrambler and descrambler: in each, xm(t) is multiplied by a carrier c(t) = sin(2π · 8000 · t) and passed through a sharp 8 kHz lowpass filter.]
Conclusion
This chapter has only scratched the surface of signal processing and Csound. I hope
I have convinced you, however, that Csound is a valuable, if not an optimal, tool for
exploring the concepts of digital signal processing and that signal-processing meth-
ods and techniques are valuable for understanding what Csound does and why.
Figure 20.50 Impulse and frequency response graphs.
21 Understanding Csound’s Spectral
Data Types
Barry Vercoe
What many of us really need for computer-assisted music performance are sound analysis and sens-
ing mechanisms that work just like our own.
The Csound spectral data types are based on perceptually relevant methods of
analysis and feature detection. From the initial massaging of audio input data to the
gradual mounting of evidence favoring a certain pitch, pulse or tempo, the methods
and opcodes I have devised are inspired by what we currently know about how hu-
mans “get” a musical signal. As a result, the opcodes enable one to build models
of human auditory perception and detection within a running Csound instrument.
This gives the resulting pitch and rhythm detectors both relevance and strength. In
describing the nature and use of these opcodes below I will occasionally allude to
their physiological and perceptual roots. For a more detailed account, however, see
my chapter “Computational auditory pathways to music understanding” in (Vercoe
1997).
Opcodes
A feature-detecting sequence that uses spectral data types is formed from a small set
of opcodes that can be grouped as shown in figure 21.1.
The connecting data object wsig contains not only spectral magnitudes, but also a
battery of other information that makes it self-defining. In a chain of processing
opcodes, each will modify its input spectral data, but the output object will retain the
self-defining parts to pass on to the next opcode. This opcode will in turn “know”
things, such as which spectrum opcode is periodically refreshing the first link in the
chain, the time of last refresh, how often refreshes occur, how many spectral points
there are per octave, whether they are magnitude or dB, the frequency range, etc. All
of this means that an ending operator like specdisp or specptrk can tell from its
input how often it must do work and how detailed this must be. It can also opt to
ignore some changes and work at a slower pace.
The originating spectral analysis of audio is done by:
wsig spectrum xsig, iprd, iocts, ifrqs, iq[, ihann, idbout,
idisprd, idsines]
The analysis is done every iprd by a set of exponentially spaced Fourier match-
ings, wherein a windowed segment of the audio signal is multiplied by sinusoids
with specific frequencies. The process is first performed for the top octave, for ifrqs
different frequencies exponentially spaced. The window size is determined by iq, the
ratio of Fourier center frequency to bandwidth. For efficiency, the data are not actu-
ally windowed at all, but the sinusoids are and these can be viewed by making idsines
nonzero. Next, the audio data are downsampled and the process repeated for the next
octave and so on, for as many octaves as requested. To fill the window of the lowest
bin in the lowest octave, the downsampled data must be kept around for some time.
The stored down-samples (dynamically changing) can be periodically displayed by
giving idisprd a nonzero value.
Keeping downsampled audio to fill a slow-moving low-frequency window brings
us to a problem we will encounter later. Both the Hanning and Hamming-shaped
windows are symmetric (bell-shaped sinusoidal) and designed to focus analytic at-
tention on their temporal center. To make the centers of all frequency-analysis win-
dows coincide at the exact same time-point, the higher frequency windows are
delayed until the low frequency window is complete. This introduces an input-output
time delay across the spectrum opcode. The amount of delay depends on both the
window size (indirectly iq) and the number of octaves (iocts). While this delaying
strategy might at first seem unnecessarily fussy, the coincident windows turn out to
make a big difference for some spectral operations like specptrk, as we will see
shortly.
Once we have reliable spectral data, the other opcodes can then proceed with their
work. The unit specaddm does a weighted add of two incoming wsigs, while
specdiff calculates the difference between consecutive frames of a single varying
spectrum. This latter can be seen as a delta analyzer, operating independently on
each “channel” of the spectrum to produce a differential spectrum as output. In fact,
it reports only the positive differences to produce a positive difference spectrum and
is thus useful as an energy onset detector. The units spechist and specfilt are similar
to each other, the first accumulating the values in each frequency channel to provide
a running histogram of spectral distribution, while the second injects each new value
into a first-order lowpass filter attached to each channel. We will see this used in one
of the examples below.
The units specptrk and specsum have only control signal output and provide a
way back into standard Csound instrument processing. The first is a pitch detector,
which reports the frequency and amplitude as control signals.
koct, kamp specptrk wsig, kvar, ilo, ihi, istrt, idbthresh,
inptls, irolloff [, iodd, iconfs,
interp, ifprd, iwtflg]
The detection method involves matching the spectral data of wsig with a template of
harmonic partials (optionally odd, with some roll-off per octave). Matching is done
by cross-correlation to produce an internal spectrum of candidate pitches over a lim-
ited pitch range (ilo to ihi). The internal spectrum is then scanned for the strongest
candidate, which, if confirmed over iconfs consecutive wsigs, is declared the winner.
The output is then modified accordingly.
The combination of suitably scaled spectrum and specptrk units creates a robust
pitch detector, capable of extracting the most prominent component of a mixed
source signal (e.g., a sitar against a background drone). We can observe some of this
at work: we can display the original spectrum via a specdisp and we can display the
cross-correlation spectrum of the present unit by giving ifprd a nonzero value. When
an incoming signal has almost no energy at the fundamental (e.g., a high bassoon-
like nasal sound), this tracker will still report the human-perceived fundamental
pitch. And whereas traditional pitch detectors have difficulty with fast-moving tones
like octave scoops, this tracker will stay with the signal, largely because we have
time-aligned all the windows of octave down-samples (as described above). Lastly,
the pitch resolution of any tone is not restricted to the frequency bins-per-octave
of the originating spectrum, but employs parabolic interpolation to obtain much
higher resolution.
With an understanding of the above we are now in a position to consider some
applications.
Energy assessment in the human auditory system is a complex affair. It is not mea-
sured immediately but is integrated over time, and we cannot gauge the full intensity
of a single impact for about 300 milliseconds. If another impact should occur within
that period, the integration of the first is incomplete and the second impact becomes
the beneficiary of the remainder (Povel and Okkerman 1981). Consequently, when a
stream of impacts arrives grouped in pairs, the first of each pair will seem softer than
the second, even when both have the same physical intensity. This leads us to the
perception of a “lilting” rhythm, and the same phenomenon is at the base of all
human rhythmic perception.
A machine will not see it that way. An instrumentation-quality intensity detector
will report something much closer to the truth. And if it is digital, even its own
integration time (in the low nanoseconds) will be thwarted by the sample-and-hold
process.
So how do we get a computer to hear rhythms the way we do? We could program
a set of rules that would reinterpret the intensity patterns along human perceptual
lines; for a complex score, this could be time consuming. Or we could model the
above energy integration in the data gathering itself. This latter is the strategy imple-
mented in Csound’s spectral data processing, and an instrument that would track
audible beats and follow a changing tempo in human-like fashion would look as
shown in figure 21.2.
Every .01 seconds we form a new spectrum, 8 octaves, 12 frequencies per octave,
with a bandwidth Q of 8. We use a Hamming window, request magnitudes in dB and
skip the display of downsampled data. We next apply Fletcher-Munson scaling, us-
ing stored function tables f 3 and f 4, to simulate the frequency-favoring effect of the
auditory canal.
For the inner ear, calculation of a positive difference spectrum is relevant for the
following reason: when the human cochlea receives a sudden increase in energy at a
hair cell, the neural firing rates on its attached auditory nerve fibers register a sudden
increase, then a rapid adaptation to more normal behavior. By contrast, when it re-
ceives a sudden decrease in energy, the hair cell almost ignores the event. Clearly
our hearing has evolved to be highly sensitive to new onsets (life-threatening?) and
almost oblivious to endings, and our music reflects this with event-oriented struc-
tures flavored with percussive sounds. We give our machine a similar predilection on
each frequency channel with specdiff.
The energy integration phenomenon, however, is not visible on the auditory nerve
fibers. Apparently this must be happening at a later stage of processing and we can
measure it only by psychoacoustic experiment (Povel and Okkerman 1981). It is not
yet clear how this actually works. We simply presume in the above model to inject
the positive difference data directly into integrating filters (specfilt), whose time con-
stants are frequency dependent and are conveyed via stored f-table f 5. Finally, we
sum the energy sensation across all frequency bins to produce a running composite,
in ksum4. This is a simple sum, purposely disregarding the effects of simultaneous
masking on loudness perception (Zwicker and Scharf 1965), since our real goal is to
compare the energies across time.
To the extent that ksum4 adequately represents the fluctuation in our own sensa-
tions, we can now perform pulse and tempo estimation on a single channel of k-rate data.
Figure 21.2 Block diagram of instr 2101, a beat tracker and tempo follower.
Figure 21.3 Orchestra code for instr 2101, an instrument for taking microphone input and
controlling the tempo of the performance based on beat tracking.
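A minimal sketch of the front end of such an instrument follows; the optional flag values for spectrum, the specscal argument order, and the tempest parameters are assumptions based on the description above, so figure 21.3 and the Csound manual should be taken as the authority for the exact fields.

          instr 2101                          ; sketch: human-like beat sensing front end
asig      in                                  ; microphone input (assumes nchnls = 1)
; new spectrum every .01 s: 8 octaves of downsampling, 12 bins per octave, Q = 8,
; Hamming window and dB magnitudes requested, downsample display skipped
wsig1     spectrum asig, .01, 8, 12, 8, 0, 1, 0
wsig2     specscal wsig1, 3, 4                ; Fletcher-Munson-style scaling via f3 and f4
wsig3     specdiff wsig2                      ; positive difference spectrum (onset energy)
wsig4     specfilt wsig3, 5                   ; per-channel integrating filters, half-times in f5
ksum4     specsum  wsig4, 1                   ; composite, interpolated energy sensation
;ktempo   tempest  ksum4, ...                 ; beat and tempo estimation (args as in figure 21.3)
;         tempo    ktempo, 60                 ; hand the estimated tempo to the scheduler (-t 60)
          endin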
The tempest unit does not traffic in spectral data types, so it will not be de-
scribed here beyond what is already covered in the Csound manual and further in
(Vercoe 1997). It does however afford some good graphic display of the short-term
(echoic) memory modeling from which the beat structure and tempo are derived,
along with its development of rhythmic expectations that are an essential part of
human beat and tempo tracking, and the reader is advised to try running the unit with
the input values given above so as to observe them.
The final tempo opcode takes us beyond analysis and observation. Although it
does nothing for the beat-tracking instrument itself, the tempo opcode takes the
machine-estimated running ktempo and passes it to the Csound scheduler, which
controls the timing of every new event. Therefore, if the above instrument is inserted
into another working orchestra and score, and the command-line flag -t 60 is invoked,
you can control that orchestra’s performance by simply tapping into the microphone.
A live demonstration of this was initially given at the 1990 ICMC (Vercoe and Ellis
1990), when Dan Ellis controlled the tempo of a Bach performance by tapping arbi-
trary drum rhythms on a table near a microphone.
Since the same human ear that detects rhythms is also responsible for sensations of
pitch, we can build a model of this new phenomenon using many of the same initial
principles. The two paths of course eventually diverge, and we will be forced to
consider some of the special needs of pitch acuity as we get deeper into the search.
Given a good sense of pitch, it is not hard to build automatic harmonizers and pitch-
to-MIDI converters. We will look briefly at both of these before forming some
conclusions.
The two examples we will use, however, employ opcodes that are not part of the
normal Csound distribution. These are from my Extended Csound, a version I have
developed that can run complex instruments in real-time using the Analog Devices
21060 floating-point DSP accelerator (Vercoe 1996). In that system, some number
of DSPs (1–6) on a plug-in audio card can dynamically share the computational load
of a large Csound orchestra, which often contains new opcodes that extend both its
repertoire and its real-time performance capacity. Although my personal exploration
into these real-time complexities is currently dependent on such accelerators, I fully
expect the experience gained will eventually migrate to more generally accessible
platforms. The interested reader will also find additional presaging examples of Ex-
tended Csound on the CD-ROM that accompanies this volume.
A Csound instrument that can pitch-track an incoming audio signal and turn that
into a five-part harmony would look as shown in figures 21.4 and 21.5.
First, we take one channel from our stereo microphone and give it some simple
equalization (EQ) to heighten the voice partials. Our spectral analysis is similar to
the above, with the following new considerations: we will form a new spectrum only
every .02 seconds, since percussive rhythm is not the likely input. We request 6 oc-
taves of downsampling, 24 frequencies per octave, with a bandwidth Q of 12. We also
request a Hanning window and root magnitude spectral data.
The choice of 24 frequency bins of Q 12 merits some discussion. Both are
weighted toward pitch-tracking rather than intensity measurement as was the case
above, yet they still fall short of an ideal model of the ear. The human cochlea has
about 400 hair-cell detectors per octave in this frequency region. On the other hand
those detectors are broad-band, with a Q of 4 (1/3 octave). Broad-band implies fast
energy collection, where things like binaural sensing of direction depend on accurate
measurement of interaural time differences. This is not our goal here, and we opt for
slower, more narrowly focused filters, one quarter-tone apart. The parabolic interpo-
lation in specptrk will do the rest.
We are now sending specptrk some favorable data. The range restriction of 6.5 to
8.9 (in decimal octaves) is sufficient to cover my voice range even on a good day and
we give it an initial hint of 7.5. So that it will not try to pitch-track just microphone
noise, we set a minimum threshold of 10 dB, below which it will output zeroes for
both pitch and amplitude. Since I may decide to sing some strange vocal sounds
(e.g., with missing fundamentals), we ask for an internal template of 7 harmonic
partials, with a rolloff of .7 per octave. We request just 3 confirmations of any octave
leap (proportionally less for smaller intervals) and ask that the pitch and amplitude
outputs be k-rate interpolated between consecutive analyses. Finally, we ask it to
display the running cross-correlation spectrum so that we can observe the various
pitch candidates in dynamic competition.
There was a price to pay for all this, it may be recalled. So that the tracker would
stay locked onto fast-moving voice “scoops,” we carefully delayed all channels of
analysis until the low-frequency window was full and the other windows could be
centrally aligned. The amount of delay incurred by spectrum is reported on the user
console at i-time. For a sampling rate of 16K, a Q of 12 and 6 octaves of downsam-
pling, that value is 66 milliseconds. Having now emerged from the spectral data type
world with a running pitch value, we now delay the audio signal by this amount so
that the audio and its pitch estimate are synchronized.
Figure 21.4 Block diagram of instr 2102, a pitch tracker and harmonizer.
Figure 21.5 Orchestra code for instr 2102, a pitch-tracking harmonizer instrument shown
in figure 21.4.
We are now ready for the harmonizer. The harmon4 unit is not part of regular
Csound, but an addition that exists in Extended Csound (Vercoe 1996). It is similar
to Csound’s harmon unit, but depends on other processing modules (such as
specptrk) to provide a reliable pitch estimate. Like harmon, harmon4 will pitch-
shift the original audio stream while preserving the vocal formants and vowel quality.
Also, like harmon, the pitch-shifts can be specified either as frequency ratios (with
the source) or as specific cps values. Its main advantage is better sound quality and
the ability to generate up to four vocal transpositions at once.
If you have an Extended Csound accelerator card you can run the harmon4 instru-
ment as shown above. The four transpositions are given as ratios from the source:
.75, 1.25, 1.5 and 1.875, outlining a major triad in 6–4 position with an added major
seventh at the top. This is basically the instrument that I demonstrated live at the 1996
International Computer Music Conference (Vercoe 1996), and the voice transposing
quality is quite good. If you have only the standard Csound distribution, you should
replace the harmon4 line with the following:
a3 harmon a2, cpsoct(koct), .2, 1.25, .75, 0, 110, .1
The transposed voice quality will not be as good, and there are only two added voices
instead of four, but the example will serve to demonstrate the effect.
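Pulling the pieces together, a standard-Csound approximation of instr 2102 might look like the sketch below; the reson EQ settings, the spectrum and specptrk flag values, and the kvar argument are assumptions based on the description above, and figure 21.5 holds the actual Extended Csound code.

          instr 2102                          ; sketch: pitch tracker driving harmon
a1, adump ins                                 ; stereo input (assumes nchnls = 2); left channel used
a1        reson  a1, 3000, 1000, 1            ; mild EQ to heighten the voice partials (values assumed)
; spectrum every .02 s: 6 octaves, 24 bins per octave, Q = 12,
; Hanning window and root-magnitude data requested
w1        spectrum a1, .02, 6, 24, 12, 1, 3, 0
; track pitch between 6.5 and 8.9 (decimal octaves), initial guess 7.5, 10 dB threshold,
; 7-partial template with .7 rolloff, 3 confirmations, interpolated output, display on
koct, kamp specptrk w1, 1, 6.5, 8.9, 7.5, 10, 7, .7, 0, 3, 1, .1
a2        delay  a1, .066                     ; align the audio with the pitch estimate (66 ms at sr = 16000)
a3        harmon a2, cpsoct(koct), .2, 1.25, .75, 0, 110, .1
          out    a2 + a3
          endin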
One can imagine many variants of the above. A simple one is to replace the fixed-
ratio harmonies with independently derived pitches, as from an external MIDI key-
board, or from some algorithm cognizant, say, of the “changes” in a jazz standard.
Another is to replace harmon4 with a different generator, either a Csound looping
sample oscillator reading a different sound (a voice-controlled trombone is fun), or
a pitch-to-MIDI converter that would let you take your voice control outside the
system to another device:
midiout kamp, koct, iampsens, ibendrng, ichan
The possibilities for experimentation and development here are quite unbounded and
the reader is encouraged to develop his or her own instruments or opcodes that would
take advantage of the feature detection that spectral data types provide.
Conclusion
Csound’s spectral data types, based on perceptual methods of gathering and storing
auditory relevant data, provide a fresh look at how to enable computer instruments
to extract musically important information from audio signals. They offer a new
future of computer-assisted ensemble performance connected by sound, not merely
by electrical signals. While we do not yet have a full understanding of how humans
do the feature extraction that informs both their own performance and their listening,
we have shown that imbedding what we do know within a computer instrument can
give it an ensemble relevance that normally only live performers achieve. This is
why listening to a live performance is still so exciting and this is where computer
music eventually must go.
References
Povel, D., and H. Okkerman. 1981. “Accents in equitonal sequences.” Perception and Psycho-
physics 30: 565–572.
Vercoe, B. 1996. “Extended Csound.” Proceedings of the International Computer Music Con-
ference, pp. 141–142.
Vercoe, B., and D. Ellis. 1990. “Real time Csound: software synthesis with sensing and con-
trol.” Proceedings of the International Computer Music Conference, pp. 209–211.
Zwicker, E., and B. Scharf. 1965. “A model of loudness summation.” Psychological Review
72: 3–26.
Delay, Chorus, Reverberation, and 3D Audio
22 Using Csound to Understand Delay Lines
and Their Applications
Russell Pinkston
A delay line is a simple mechanism that records an input signal, stores it temporarily
and then plays it back again. In its most trivial application, it can be used to produce
simple “slapback echo” effects. But in truth, a simple delay line is the basis for a
wide variety of synthesis and signal processing algorithms, some of which are highly
sophisticated. Examples include reverberation, phasing, flanging, chorus, pitch-
shifting and harmonization, not to mention all types of digital filtering. Delay lines
are also central to some of the most advanced synthesis techniques, such as physical
modeling. Indeed, it can be argued that the delay line is the single most important
basic function in audio signal processing.
To understand how a digital delay line works, let us look at something comparable
in the analog studio. An effect called tape echo can be created by simultaneously
monitoring both the record and playback heads of a standard reel-to-reel tape deck
while recording. We hear a sound as it is being recorded on tape, then a short time
later as it is played back. The delay time is a function of the tape speed and the phys-
ical distance between the record and playback heads on the tape recorder. Since we
can’t change the distance between the heads, the only way we can affect the delay
time is by changing the speed of the tape: the slower the tape goes by the heads, the
longer the delay before we hear the echo and vice versa. If we don’t want to waste a
lot of good tape, we can use a short tape loop. In theory, it just needs to be longer
than the distance between the record and playback heads, but in practice, of course,
it has to be long enough to fit over the tape transport mechanism.
A digital delay line is, in effect, a model of a classic analog tape recorder with a
tape loop. Instead of recording the analog waveform on a moving strip of magnetic
tape, however, the digital waveform is stored sample-by-sample in memory, in what
is referred to as a “circular buffer.” The buffer isn’t actually circular, of course, but it
functions like the loop of tape in our analog delay system, which is recorded over
and over again as it continually circles through the transport mechanism and passes
by the record and playback heads. In a circular buffer, however, the memory doesn’t
move—the “heads” do.
Essentially, we record (write data) into a buffer starting at the beginning and con-
tinuing until the end of the buffer is reached, after which we “circle back” and start
recording at the beginning again, erasing what was stored there previously. Mean-
while, we keep playing back (reading data) some distance behind where we are re-
cording. Both the record and playback points keep moving in the same direction,
continually “chasing” one another and “wrapping around” when they reach the end
of the buffer.
As in the analog tape echo system, the length of the delay in a digital system is a
function of the distance (i.e., the number of samples) between the record and play-
back points and the speed at which we are recording (the sample rate). But unlike our
analog model, in a digital delay system the distance between the record and playback
points is not necessarily fixed. It can vary anywhere from a single sample to the
full length of the buffer. Moreover, in some delay lines, the distance can be varied
dynamically, with interesting results. Finally, there can be multiple output (playback)
points in a single delay line, referred to as “taps,” each of which may be moving
independently in relation to the input (record) point.
Csound provides the following basic digital delay lines:
ar delay1 asig[, iskip]
ar delay asig, idlt[, iskip]
ar vdelay asig, adlt, imaxdlt[, iskip]
The first of these opcodes, delay1, is a fixed, one sample, delay that is intended
for use in constructing FIR filters. The other two opcodes, however, are standard
delay lines with a single output tap that allows the delay time to be specified in
seconds: delay has a fixed delay time idlt, while vdelay has a variable delay time
adlt, which can range from 1/sr seconds (1 sample period) to imaxdlt seconds.
Figures 22.1 and 22.2 show an example of a simple instrument that generates a
stereo slapback echo from a plucked string sound using the Csound delay opcode.
In order to demonstrate the basic idea as clearly as possible this instrument was
deliberately kept simple. It works fine and produces the expected results. The design,
however, has a number of problems or “inelegancies” that may not be immediately
apparent. The first problem is that we have tied our “sound” to a particular “effect.”
We might not always want echoes on our plucked string and we might want to use
our echo unit on some other sounds.
Aside from limiting our creative options, however, this approach makes our lives
much more difficult in terms of the score, because the instrument will always need
to execute long enough to produce not only the pluck, but also both echoes.
Figure 22.1 Block diagram of instr 2201, a stereo slap-back echo instrument.
Figure 22.2 Orchestra and score code for instr 2201, stereo slap-back echo of a plucked string.
Consequently, the p3 duration setting of the note statements will not correspond to the
musical note durations, as usual and things might get pretty complicated if we used
a variable tempo statement, because the delay times are in seconds, not beats.
Also, we are wasting memory, because each delay opcode allocates its own inter-
nal buffer at initialization time (i-time). The size of each buffer is determined by the
delay time in seconds, specified in the score as p6 and p7, respectively. Hence, with
a sample rate of 44100, the buffer of the first delay opcode would be .3 * 44100 =
13230 samples long and the second .4 * 44100 = 17640 samples long. Admittedly,
that isn’t very much memory these days. But what if we want to play a big chord, or
a long passage of very fast notes, whose echoes would overlap?
Figure 22.3 Orchestra and score code for instr 2202 and 2203, a combination of a dry
plucked string instrument (2202) with a “global” effect instrument (2203).
Every time Csound needs to allocate a new instance of our instrument to make a note, it will also need
to allocate a new pair of delay buffers. The approach shown in figure 22.3 is both
more flexible and more efficient, even though it may look a bit more complicated.
This example plays the same melody as before, with a simple harmony added. This
requires that the note amplitudes in the score be reduced somewhat, to avoid samples
out of range. Instrument 2202 is basically the same as instrument 2201, but without
the delay lines (see figure 22.3). It now just produces a “dry” (un-echoed) pluck,
which is sent directly to both output channels via outs. It also adds its signal into the
variable gasend, however, which functions like an “effects send” on a traditional
mixing console. The ga prefix indicates that it is a global audio-rate variable—acces-
sible anywhere in the orchestra.
The delay opcodes are now placed in a separate, higher numbered instrument
(2203) that receives input from the global variable gasend. Only one instance of this
instrument is required. It serves as an effects unit that may be used by the whole or-
chestra and hence must be turned on for the entire duration of the passage it is pro-
cessing, plus the longest delay time.
The gasend variable is initialized in instr 2202 and must be cleared after it is used
by instr 2203. Otherwise, the samples being added by all copies of instr 2202 would
simply accumulate indefinitely. The basic concept is that we are “mixing” the outputs
of multiple instruments together and “sending” them to a single “effects unit.” Hence,
the sending instruments add to the global variable, while the receiving instrument
uses the sum for input and then zeros the global variable.
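A compact sketch of this send/receive pattern follows; the p-field assignments, the pluck parameters and the .5 send level are assumptions, and figure 22.3 contains the book's actual instruments.

          instr 2202                    ; dry plucked string with an effects send
gasend    init   0                      ; make sure the global send exists
icps      =      cpspch(p5)
kgate     linseg 0, .01, 1, p3-.02, 1, .01, 0
asig      pluck  p4*kgate, icps, icps, 0, 1
          outs   asig, asig             ; dry signal to both channels
gasend    =      gasend + asig*.5       ; add a portion of the signal to the send
          endin

          instr 2203                    ; global slapback echo "effects unit"
atap1     delay  gasend, p6             ; left echo, p6 seconds
atap2     delay  gasend, p7             ; right echo, p7 seconds
          outs   atap1, atap2
gasend    =      0                      ; clear the send so the mix does not accumulate
          endin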
The most common use for a variable delay line is to produce a temporary pitch
change in an audio signal. It is common knowledge that if a sound is recorded at one
speed and played back at a different speed, it produces a proportional change in pitch
and an inversely proportional change in duration. Similarly, in a digital delay line, if
the record and playback points keep moving through the circular buffer at the same
rate, we hear a fixed delay time without any change in pitch. If they move at different
speeds, however, both the delay time and the perceived pitch of the output will vary.
In the Csound delay opcodes, the record point always moves at the same speed
(the sample rate), but the speed of the playback point can vary. If the playback point
is moving more rapidly than the record point, the delay time will gradually shorten
and the pitch will be raised. Conversely, if the playback point is moving more slowly
than the record point, the delay time will gradually lengthen and the pitch will be
lowered. Obviously, if two points are moving at different constant speeds around a
circle, one point will eventually overtake the other, so a pitch change in any one di-
rection can only be temporary. Hence, a common practice with a variable delay line is
to apply a low frequency oscillator (LFO) to the delay time parameter, which causes a
periodic fluctuation in the delay time and, indirectly, the pitch. As the delay time
increases, the playback point is moving more slowly than the record point and the
pitch drops; as the delay time decreases, the playback point is moving more rapidly
than the record point and the pitch rises.
The waveshape used for the LFO is typically a sine, which results in a smooth
“vibrato” effect. The greater the amplitude of the LFO (amount of change in the
delay), the greater the depth of the vibrato. The delay time is never allowed to reach
zero or exceed the buffer length, so the record and playback points will never actually
meet, but the distance between them (and hence, the speed of the playback pointer)
is continually changing (see figures 22.4 and 22.5).
Once again, this example is deliberately kept simple, in order to demonstrate both
the basic technique and the gross effect. A single instrument contains both the sound
generator and the effect, which is not ideal, as has been discussed. Moreover, this is
not clearly the best way to generate a vibrato in a synthesized wave, but the variable
delay method will work with any input signal, including recorded sounds. To get a
perceptible vibrato with a single delay line, the maximum delay time has to be mod-
erately long and the amount of variation (LFO amplitude) fairly large.
Figure 22.4 Block diagram of instr 2204, a variable delay vibrato instrument.
Figure 22.5 Orchestra and score code for instr 2204, a variable delay-line vibrato
instrument.
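A minimal sketch along the lines of instr 2204 follows; the delay, envelope and LFO values are assumptions (figure 22.5 has the actual code), and note that current Csound documentation gives vdelay's delay arguments in milliseconds, so the sketch converts from seconds.

          instr 2204                    ; variable-delay vibrato (sketch)
imaxdel   =      .05                    ; longest delay used, in seconds
icps      =      cpspch(p5)
kgate     linen  p4, .05, p3, .1
asig      oscili kgate, icps, 1         ; source tone (assumes f1 is a sine)
alfo      oscili imaxdel*.45, p6, 1     ; LFO depth just under half the range, p6 Hz
adel      =      (imaxdel*.5 + alfo) * 1000   ; delay stays between 5% and 95% of imaxdel, in ms
aout      vdelay asig, adel, imaxdel*1000
          out    aout
          endin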
Flanging
The same basic algorithm used in the previous instrument can also be used to create
a well-known effect called flanging. The term derives from the practice of playing
the same recording from two tape decks simultaneously and alternately applying
pressure to the flange of one or the other machine’s supply reels. This causes a slight,
temporary slowing of the tape speed of the affected machine, which slightly lowers
the pitch of the recording. When the detuned signal from that deck is combined with
that of the other deck, the result is a distinctive “wooshing” sound that sweeps up
and down throughout the frequency spectrum, caused by frequency cancellations
and reinforcements between the two signals.
The effect can be simulated digitally by combining the original sound with a copy
sent through a delay line with a varying delay time. The delay time must be short (on
the order of a few milliseconds) and vary at a relatively slow rate of speed and the
delayed signal must be combined with the original in order to create the flange effect.
The effect can be accentuated by incorporating a feedback loop into the delay line,
but it is a wise precaution to use a balance opcode to make sure that the feedback
stays in control. In the following example, we will use a single delay line that has
two variable taps, each of which is controlled by a separate LFO. The output of one
tap is sent to the left speaker, the other to the right.
The following Csound delay opcodes must be used if multiple taps from the same
delay buffer are needed:
ar delayr idlt[, iskip]
ar deltap kdlt
ar deltapi xdlt
delayw asig
These are the most flexible Csound delay opcodes, but they are a bit more compli-
cated to use. The delayr and delayw opcodes must be used together in order to
establish the delay line and write a signal into it. The delayr opcode, which allocates
a delay buffer idlt seconds long, must come first. After the delayr, any number of
deltap or deltapi opcodes may be used, followed by the delayw opcode, which actu-
ally writes data into the buffer. The deltapi opcode is simply the interpolating version
of deltap. It is slightly less efficient, but it produces a significantly better sound when
the delay time is varying. Figure 22.6 shows a global stereo flanger effect instrument
and figure 22.7 shows the corresponding Csound code.
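Before turning to the flanger itself, the basic pattern can be illustrated with a single delay line and two fixed taps; the instrument number and source file below are arbitrary.

          instr 100                     ; illustrative only
asig      soundin "input.aif"           ; any source signal (file name assumed)
adump     delayr 1.0                    ; establish a 1-second delay line; delayr must come first
atap1     deltap .25                    ; a tap a quarter of a second back
atap2     deltap .5                     ; a second tap half a second back
          delayw asig                   ; finally, write the current input into the line
          outs   asig + atap1, asig + atap2
          endin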
Figure 22.6 Block diagram of instr 2205 and 2206, a soundin instrument (2205) with a
global output to a stereo flanger instrument (2206).
Flanging Notes
In this example (see figure 22.7), instr 2205 uses basically the same method as instr
2202 to send its signal to the effects instrument: it initializes the global variable
gasend to zero at i-time, then mixes a portion of the signal from the soundin opcode
into gasend on every sample. The only addition is the variables imain and isend,
which are used to control the proportion of the signal sent directly to the outputs
and/or to the effects unit, respectively. Once again, the effects instrument (instr
2206) must zero the global variable (gasend) after it has been used as an input.
In instr 2206, a single delay circuit is established by the delayr/delayw pair, which
allocates a buffer large enough for a delay of imaxdel seconds. The output of delayr
(afixed ) is not used, but a result variable is required nonetheless.
The deltapi opcodes are used to provide two variable taps, whose delay times are
controlled by two oscili opcodes.
sr     = 44100
kr     = 44100     ; SR SHOULD EQUAL KR WHEN
ksmps  = 1         ; ... USING AUDIO FEEDBACK.
nchnls = 2
Figure 22.7 Orchestra and score code for instr 2205 and 2206, a dry soundin instrument
(2205) with global sends to a stereo flanger instrument (2206).
The oscili opcodes both use Hamming window functions, whose values range from 0 to 1. They have different frequencies and ampli-
tudes (peak delay times), producing two completely independent variable delay taps.
The “depth” of the flange produced by each deltapi is determined by the values of
p8 and p9 from the score, which must have values between 0–1, since they will be
multiplied by imaxdel to determine the actual peak delay time for each tap, which
cannot be longer than the size of the buffer.
The outputs of the deltapi opcodes are first sent to separate output channels
(which produces a simulated stereo effect from a mono input signal), then summed
and fed back into the delay line. The feedback accentuates the flange effect, but it
must be controlled carefully. The variable ifeed attenuates the summed signal before
it is added back into the delay input, which provides one means of limiting gain. It
is wise, however, to use a balance opcode as well, which ensures that the feedback
will not become uncontrolled.
Whenever audio feedback (or “recursion”) is used in Csound, ksmps must be no
larger than the feedback interval. In this example, kr is set equal to sr, so a
feedback interval as small as a single sample can be implemented.
The maximum delay time in this example is short (.005 seconds) and the taps set
to .8 and .9 of imaxdel, yielding peak delays of .004 and .0045, respectively. These
short delay times result in slight detunings, which are appropriate for flanging.
Longer delay times cause proportionally larger amounts of pitch shift, which can also
be an interesting effect. If the pitch shifting becomes too great, however, it results in
an audible vibrato and the distinctive sound of flanging is lost.
The minimum delay time for the Csound deltapi opcode is one k-period. Here,
the opcodes are given minimum delay values of .001 and .0005, respectively. The
times are staggered slightly to avoid the resonant peak that would occur when both
times are initially the same and then summed and fed back into the delay network.
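Putting these pieces together, the flanger might be sketched as follows; the LFO rates, the feedback amount, the function-table number for the 0-to-1 window shape, and imaxdel being set slightly above the longest tap are assumptions (figure 22.7 is the book's actual code).

          instr 2206                    ; global stereo flanger (sketch); requires kr = sr
imaxdel   =      .006                   ; a little more than the longest tap delay
ifeed     =      .7                     ; feedback attenuation, well below 1
ainput    =      gasend                 ; read the global effects send
k1        oscili .004, .2, 2            ; LFO 1 (f2: 0-to-1 window shape), peak .004 s
k2        oscili .0045, .16, 2          ; LFO 2, peak .0045 s
adump     delayr imaxdel                ; establish the delay line
atap1     deltapi .001 + k1             ; left-channel variable tap
atap2     deltapi .0005 + k2            ; right-channel variable tap
asum      =      (atap1 + atap2) * ifeed
afeed     balance asum, ainput          ; keep the feedback power in line with the input
          delayw ainput + afeed         ; write input plus feedback into the line
          outs   atap1, atap2
gasend    =      0                      ; clear the global send
          endin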
In both of the previous examples, the delay times were varied periodically, with the
result that the playback point was sometimes moving faster than the record point and
sometimes moving slower, but the points were never allowed to meet. This produced
a “vibrato” effect, with the pitch sometimes higher and sometimes lower and con-
stantly fluctuating. To produce a continuous pitch shift using a delay line, the play-
back point must consistently move at a different rate from the record point, which
will eventually result in the two points meeting somewhere as they move around the
circular buffer. When that happens, the delay instantly changes from the maximum
to the minimum, or vice-versa, which causes an audible click or pop.
To solve this problem, we can use two delay taps one-half of a buffer-length apart
and crossfade back and forth between them, so that we are never listening to a tap at
the moment when it meets and crosses the record point. This is the basic method
used by some commercial pitch shifters.
The amount of pitch shift produced by a variable delay line depends on the rate at
which the delay time is changing. The more rapid the rate of change, the greater the
interval of shift. If the delay time is shortening, the pitch will be shifted up; if it is
lengthening, the pitch will be shifted down. The exact amount of shift is determined
by the ratio of the speed of the playback point to that of the record point. The Csound
deltapi opcode, however, does not allow us to specify the speed of the playback point
directly; we can only affect it indirectly by changing the delay time dynamically. This
is a bit awkward and counter-intuitive, so a brief explanation may be useful.
For a moment let us think of the delay buffer as a short wavetable. If the table
contains D seconds of sound, then it should take exactly D seconds to play back that
sound if we don’t plan to change the pitch. If the table were being used as the stored
function for a Csound oscil opcode, which assumes its function contains one cycle
of a fundamental, we would need to use a frequency input of 1/D cycles per second
(Hz) in order to play back the sound exactly once in D seconds and reproduce the
original pitch.
To shift up by an octave, we would simply double the frequency to 2/D Hz; to shift
down an octave, we would halve the basic frequency (.5/D Hz). In fact, we could get
any relative pitch we wanted by simply taking the ratio of the desired frequency to
the original frequency and multiplying it by 1/D, where D is the original duration of
the sound. So, we can define a simple formula for pitch shifting a recorded sound by
“resampling” with an oscil:
$$\mathit{oscil\_cps} = \frac{\mathit{newhz}}{\mathit{oldhz}} \cdot \frac{1}{D} \qquad (22.1)$$
In effect, gradually changing the delay time of a deltap opcode from the maximum
delay time to zero is the same thing as moving the playback point all the way through
(completing one cycle of) a wavetable, relative to the record point. So we can think
about the rate of change in delay time (and the resulting pitch shift) in terms of cycles
per second, just as we did with oscil—each time we sweep the delay time from the
maximum to zero (or vice versa), we are completing a single “cycle.”
But in a delay line, of course, the playback point is already moving at a rate of
1/D th of the buffer per second, even when there is no pitch shift and the delay time
is constant. Consequently, anything we do to alter the pitch by dynamically changing
the delay time is simply being added to that basic rate. So we can use almost the
same formula for pitch shifting with deltap that we used with oscil; but we need to
subtract out the rate at which the playback point is already moving for untransposed
playback, which is 1/D, hence:
$$\mathit{xdlt} = \left(\frac{\mathit{newhz}}{\mathit{oldhz}} \cdot \frac{1}{D}\right) - \frac{1}{D} \qquad (22.2)$$
or, equivalently, $\left(\frac{\mathit{newhz}}{\mathit{oldhz}} - 1\right)\frac{1}{D}$.
Pitch-Shift Notes
The ratio for pitch shifting is calculated by converting the number of semitones speci-
fied in p6 to a fraction of an octave (insemis/12), then adding the fraction to an arbi-
trary base octave (8 = middle C) and comparing the CPS frequency of the resulting
pitch to the frequency of the base octave (see figure 22.8).
Two phasor opcodes are used to control the delay times and gating functions of
the two deltapi opcodes. They are given the same input frequency, -irate, but one of
the phasor opcodes has an initial phase of .5. This results in the two delay taps being
kept precisely 1/2 buffer length apart and the two tablei opcodes used for gating
being similarly offset by 1/2 the length of the triangle window function they both
reference. The delay times are changing gradually and continually, as are the gate
signals, agate1 and agate2. Whenever the delay time of a particular deltapi opcode
approaches the wraparound, between 0 and idelay seconds, its gate function is ap-
proaching zero; when it approaches the midpoint—idelay/2, its gate function is ap-
proaching 1. The outputs of the two taps are summed, so that we are always listening
to both of them, but we never hear the click of the wraparound.
The frequency argument to the phasor opcodes is negated, which causes their
outputs to ramp down from 1 to 0 (shortening the delay times and raising the pitch)
when irate is positive, but ramp up from 0 to 1 (lengthening the delay times and
lowering the pitch) when irate is negative.
Figure 22.8 Orchestra and score code for instr 2207, a continuous pitch-shift instrument.
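A compressed sketch of the crossfading pitch shifter follows; the p-field layout, the triangle-window table, and the soundin source are assumptions (figure 22.8 gives the book's actual code).

          instr 2207                    ; continuous pitch shifter (sketch)
idelay    =      p5                     ; delay-line length in seconds (e.g., .1)
insemis   =      p6                     ; transposition in semitones
iratio    =      cpsoct(8 + insemis/12) / cpsoct(8)
irate     =      (iratio - 1) / idelay  ; sweep rate, the simplified form of equation 22.2
asig      soundin p4                    ; source sound (soundin.p4)
aphs1     phasor -irate                 ; tap 1 sweep position, 0 to 1
aphs2     phasor -irate, .5             ; tap 2 sweep, half a buffer behind
agate1    tablei aphs1, 1, 1            ; gating from a triangle window in f1
agate2    tablei aphs2, 1, 1            ; (e.g., f1 0 8192 7 0 4096 1 4096 0)
adump     delayr idelay
atap1     deltapi aphs1 * idelay
atap2     deltapi aphs2 * idelay
          delayw asig
          out    atap1*agate1 + atap2*agate2
          endin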
This pitch shifting algorithm will work regardless of the length of the delay (p5).
The longer the delay line, however, the more of an audible delay there is in the output.
On the other hand, the shorter the delay line, the more rapid the gating between the
two taps, which can cause audible aliasing with large transposition values and/or
short delay times. The length of the delay for this example was chosen to be
.1 seconds, which is a reasonable compromise.
This example does not use a separate “effects” unit that receives input from a glo-
bal variable, which is generally to be preferred, because the score plays a four-note
chord from a single soundin file and a separate delay network is required for each
transposed pitch in the chord.
Harmonizer Instrument
The final example is an extension of the previous instrument, with the addition of an
LFO and a feedback loop, which produces the distinctive interval cycling that is
the hallmark of commercial harmonizers. In the example shown in figure 22.9, the
harmonizer is placed in an effects instrument and it receives input from the global
variable gasend.
Harmonizer Notes
The addition of an LFO requires that the rate variable be changed from irate to krate.
Note that if the LFO frequency (ilfohz1) is zero, the oscili opcode is skipped alto-
gether during performance (see figure 22.9).
The outputs of the two deltapi opcodes are fed back into the delayw opcode, but
the only limits are the igain and ifeed variables, so it is possible for uncontrolled
feedback to occur, which will result in samples out of range and clipping. A balance
opcode could be employed, but its use would prevent the distinctive interval cycling
that continues after notes end in commercial harmonizers.
With longer delay times, larger intervals and significant feedback, the interval cy-
cling effect is most pronounced. With shorter delay times, smaller intervals, feedback
and LFO, the harmonizer can serve as a flanger.
If feedback is employed with short delays, it may be necessary to make kr = sr, as
was done in the stereo flanger instrument (instr 2206).
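As a rough, hedged sketch of this kind of global harmonizer effect (not the book's instr 2209; the p-field layout, variable names and LFO depth are assumptions, and score tables 1 and 2 are assumed to hold a sine wave and a triangle window respectively):
         instr   102                       ; sketch of a global harmonizer effect
idelay   =       .1
isemis   =       p4                        ; transposition in semitones
ifeed    =       p5                        ; feedback amount, keep well below 1
ilfohz   =       p6                        ; LFO rate in Hz (0 leaves the rate unmodulated)
iratio   =       cpsoct(8 + isemis/12) / cpsoct(8)
irate    =       (iratio - 1) / idelay
klfo     oscili  .2, ilfohz, 1             ; slow modulation of the sweep rate
krate    =       irate * (1 + klfo)
aphs1    phasor  -krate
aphs2    phasor  -krate, .5
agate1   tablei  aphs1, 2, 1               ; triangle-window gates
agate2   tablei  aphs2, 2, 1
adump    delayr  idelay
atap1    deltapi aphs1*idelay
atap2    deltapi aphs2*idelay
aharm    =       atap1*agate1 + atap2*agate2
         delayw  gasend + ifeed*aharm      ; feedback produces the interval cycling
         out     aharm
gasend   =       0                         ; clear the global send for the next pass
         endin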
Conclusion
In this chapter we have explained some of the basic concepts associated with digital
delay lines and demonstrated several of the most common applications for them. We
have only scratched the surface of this subject; however, from what has been presented, one can clearly see that digital delay lines are exceedingly flexible and powerful tools, capable of an enormous variety of subtle and complex effects, and the reader is strongly encouraged to experiment further with them. A particularly useful exercise is to study a typical commercial multieffects box and try to implement some of its effects in Csound. Most of these devices include documentation on the various algorithms, and with manual in hand, a good ear and a little patience, it is often not difficult to emulate them using the standard Csound opcodes.
instr 2208 ; SOUNDIN 2
gasend init 0 ; INIT THE GLOBAL VAR
idry = p4 ; AMT OF SIGNAL TO OUT
iwet = p5 ; AMT OF SIGNAL TO EFFECT
ainput soundin p6
out ainput*idry
gasend = gasend+ainput*iwet
endin
Figure 22.9 Orchestra and score code for instr 2208 and 2209, a soundin instrument (2208)
with sends to a global harmonizer instrument (2209) with internal LFO and feedback.
23 An Introduction to Reverberation Design
with Csound
Eric Lyon
This chapter introduces the creative design of Csound reverberators. Before proceed-
ing to orchestra design, we will introduce some basic concepts of reverberation. A
full treatment of the subject is beyond the scope of the chapter. Fortunately, there are
several excellent texts providing more in-depth technical and mathematical informa-
tion on the subject, such as Moorer (1979) and Begault (1994).
Reverberation is the result of sound propagation in an environment with reflective
surfaces. When you strike a bass drum or stroke a violin, sound propagates in all
directions from the vibrating surfaces of the instrument in the elastic “medium”
known as air. The sound wave continues traveling outward at the speed of sound until
it encounters a not-so-elastic medium such as a wall or a person. At this point, part
of the energy of the wave is absorbed and part is reflected back. As illustrated in
figure 23.1, the reflected wave continues to propagate until it hits another reflective
surface.
Thus there is a gradual buildup of complexity as more and more reflections pile
onto each other. At the same time there is a gradual loss of amplitude as the reflec-
tions dissipate their energy through friction and repeated collisions with surfaces.
When the source sound stops, the reverberation will continue for a certain amount
of time before dying away. The amount of time for the energy to die down below
audibility is known as the “reverberation time,” as seen in figure 23.2.
Reverberation provides the listener with information about both the environment
and the sound source. The ratio between the direct signal and the reverberated signal
indicates the distance of the source. The amount of time between early reflections
indicates the size of a space. In a larger space, it takes longer for a sound to reach a
wall and therefore the time interval between the reflections is longer. If the space is
large enough, the first few reflections will be heard distinctly and perceived as ech-
oes. When the reflections become close together, we perceive a “wash” of sound
rather than distinct echoes. If the source is in motion, Doppler shift results. Familiar
as the change of perceived pitch of a train siren as the train approaches (higher pitch)
and then recedes (lower pitch).
Figure 23.2 The time course of reverberation: a build-up of early reflections followed by a decay; the reverberation time (RT60) is the time it takes the level (dB SPL) to fall to the ambient noise level.
Stereo hearing allows us to estimate direction of the sound because of interaural
time delay (ITD), the time interval between the sound entering one ear and entering
the other. It turns out that we make even more precise localization measurements by
perceiving the filtering effect on sound due to angle of incidence on the head, shoul-
ders and, especially, the pinnae (outer ears).
Since reverberation conveys much information about acoustic spaces, there have
been a number of “room simulation” approaches to reverberation design. Although
we will not pursue that approach here, we should note that rooms have striking fre-
quency impact on sound. Waveforms that “fit” the distance between opposing walls
(or floor/ceiling) will be reinforced and result in so-called standing waves with differ-
ent degrees of loudness at different locations in the room. A good way to experience
the effects of room response is to generate a slow scale of sine waves over the audio
frequency range with a resolution of a 1/4 tone. You may be surprised at the range
of amplitude variation you hear. A striking demonstration of the filtering effects of
rooms is heard in the composition “I Am Sitting in a Room” by Alvin Lucier, in
which a recording of spoken voice is played into the room from one tape recorder
and recorded to a second deck, repeatedly, so that the filtering effects of the room
are increasingly reinforced. After a few generations, the voice gradually becomes
unintelligible and is replaced by a remarkable variety of room resonances (Lucier
1990).
With this background information, we begin our experiments with Csound. In our
first orchestra (see figures 23.3 and 23.4), we will use the Csound “house” reverbera-
tor. We will drive the reverberator with a short, enveloped burst of band-limited white
noise. Although not the most attractive sound in the world, it is quite useful for test-
ing reverberators since digital white noise contains all frequencies up to the Nyquist
Frequency (one half of the sampling rate) and the sharp attack will reveal the tran-
sient response. If white noise sounds good in your reverberator, most other sounds
will too.
We will separate the noise generator from the reverberator and write the noise into
a global variable. The advantage of this strategy is that any signal can be mixed into
the ga-variable. Think of it as an “effects-send” buss. Next, we modify this orchestra
slightly so that we may control the ratio of wet to dry signal (see figure 23.5).
You may have noticed that despite our description of reverberation as a carrier of
spatial information, our orchestras have been monophonic. We will simulate the
stereo diffusion of reflections with the Csound delay unit (see figure 23.6). We will
scatter the signal and route it to two reverberation units. Note that we must set the
control rate equal to the sample rate. This is necessary for variable delay times to be
represented internally with sample-level precision. There is nothing special about
these delay times, however, as they were generated at random by a computer pro-
gram.
Now we are definitely hearing a stereo field. We will use one more trick to further
enrich the stereo field. We will run the output of each reverb channel through a slowly
time-varying delay line, simulating Doppler shift (see figure 23.7).
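A minimal, hedged sketch of that idea (variable names and delay depths are assumptions, not figure 23.7's actual code; arevl and arevr stand for the two reverb channels from the previous stage, and kr must equal sr for smooth interpolation):
kdel1    randi   .005, .3, .17        ; slow random wander of a few milliseconds
kdel1    =       kdel1 + .02          ; keep the delay time positive
adump1   delayr  .1
adopl    deltapi kdel1
         delayw  arevl
kdel2    randi   .005, .25, .29
kdel2    =       kdel2 + .02
adump2   delayr  .1
adopr    deltapi kdel2
         delayw  arevr
         outs    adopl, adopr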
At this point I recommend playing the reverb orchestras we’ve designed using
sampled sounds. Simply replace the noise generation orchestra with soundfile read-
ing orchestra shown in figure 23.8 and adjust the durations in the score file.
Figure 23.3 Block diagram of instr 2301 and 2302, an instrument that produces a gated
burst of white noise (2301) and sends it to a global reverb instrument (2302).
; INS ST DUR
i 2301 0 .15
; INS ST DUR REVERB_TIME
i 2302 0 5.3 5
Figure 23.4 Orchestra and score code for instr 2301 and 2302 as shown in figure 23.3.
We will make one final modification to the Csound reverb before designing some
reverberators from scratch. You may have noticed that the house reverb sounds a bit
“harsh.” Harshness is often the perceptual result of too much high frequency energy.
We will put a lowpass filter at the end of the signal chain to roll off the high end (see
figure 23.9). Csound currently offers two lowpass filter opcodes: tone and butterlp.
The Butterworth filters give a sharper cutoff, which is often desirable. In this case
however, the more gradual rolloff of tone works better to simulate the gradually
increased attenuation of high frequencies in sound propagation. The Butterworth fil-
ters, by contrast, are more useful for isolating spectral bands.
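For instance, assuming areverb holds the reverb return, the two options differ only in the opcode used (the cutoff value here is illustrative):
aout1    tone     areverb, 3000    ; gentle 6 dB/octave rolloff, like air absorption
aout2    butterlp areverb, 3000    ; steeper Butterworth cutoff, better for isolating bands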
instr 2303 ; NOISE BURST WET/DRY
iwetamt = p4
idryamt = 1-p4
kenv linseg 19000, .1, 1000, p3-.1, 0
anoise randi kenv, sr/2, .5
gadrysig = gadrysig + anoise*iwetamt
out anoise*idryamt
endin
Figure 23.5 Orchestra and score code for instr 2303 and 2304, a gated noise instrument
(2303) with variable control over the mix between the direct signal and the amount of signal
sent to the global reverb instrument (2304).
Figure 23.6 Orchestra code for instr 2305, an instrument that simulates a stereo diffusion
of echoes with a band of deltap opcodes inserted between a delayr and delayw pair.
Figure 23.7 Orchestra code for instr 2306, a stereo global reverb with time varying delay
on each channel to simulate Doppler shift.
Figure 23.8 Orchestra code for instr 2307, a stereo soundfile instrument with scaleable
global send levels.
Figure 23.9 Orchestra code for instr 2309, a stereo global reverb in which tone filters have
been added to gradually attenuate the highs.
One of the most important unit generators for reverberator design is the feedback
loop or recirculating delay line. Feedback units are ideal for simulating the buildup
of reflections in an acoustic space. Just as the reflections of physical sound are recur-
sively reflected, the output of a feedback loop is recirculated back into itself. There
are two varieties of feedback loops commonly used in reverberators: comb filters
and allpass filters.
Comb filters are simply a delay line with feedback, generating resonance at har-
monics of the loop frequency at a strength specified by the amount of feedback. All-
pass filters are slightly modified comb filters that have a uniform frequency response,
frequency-specific phase response and similar reverberant characteristics to the comb
filter. Allpass filters are often deployed in series, so that the buildup of reflections
from one filter is further accumulated in the next filter. By contrast, comb filters are
often connected in parallel to avoid accumulating strong resonances. Of course
such accumulation can be done for special effect and we will see an example of this
later.
A single comb or allpass filter imparts clear reverberant qualities to its input, but
is generally too simple and predictable to the ear. Therefore in most reverberators
these units are combined in complex networks. Our next reverberator will be built
from allpass filters and lowpass filters (see figures 23.10 and 23.11). We will use a
simple strategy. The early allpass loops will have shorter reverberation times and the
later ones will have gradually longer reverberation times. Also, the cutoff frequency
for each successive lowpass filter is gradually lower, simulating the increasing loss
of high frequency energy. Notice that the reverberation times are all set relative to
the main reverberation time, which is set with p4.
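Figure 23.11 gives the actual code; as a hedged sketch of the scheme just described (one plausible arrangement only, with illustrative reverb times, loop times and cutoffs, and with gadrysig assumed to be the global send defined in the orchestra header):
         instr   103                       ; sketch of a series allpass/tone reverb
istretch =       p4/3                      ; scale all reverb times from p4
ar1      alpass  gadrysig, istretch*.2, .04
arl1     tone    ar1, 5000                 ; progressively darker lowpass filtering
ar2      alpass  arl1, istretch*.5, .09653
arl2     tone    ar2, 3000
ar3      alpass  arl2, istretch*2.1, .065
arl3     tone    ar3, 1500
ar4      alpass  arl3, istretch*3.06, .043
arev     tone    ar4, 500
         out     arev
gadrysig =       0                         ; clear the global send
         endin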
We will now build a true stereo reverberator. As it is a bit complex, we will design
and listen to it in stages. We begin with two allpass filters in parallel for each channel,
run into a time-varying delay (see figures 23.12 and 23.13).
Next we will create some of the background reverb “wash” with three parallel
comb filters going into an allpass filter and then a lowpass filter (see figure 23.14).
We will decorrelate this background wash to stereo by running the mono reverber-
ant signal aglobrev into two slowly randomly varying delays. Finally, we will mix
down a combination of the different reverberant components (see figure 23.15).
In our next design we will create a deliberately harsh metallic sounding reverberator
that might be appropriate for industrial music applications. In this design we will
connect comb filters in sequence to create strong resonances. Since we are driving
the reverberator with white noise, all the resonances are generated by the comb fil-
ters. Try it on some sampled sounds (voice is a good source) and notice the inter-
action between the spectral content of the source sound and the resonance of the
reverberator (see figure 23.16).
Figure 23.10 Block diagram of instr 2310, a series of allpass filters used to progressively
accumulate the buildup of reflections and delays, followed by a tone filter for high fre-
quency damping.
Figure 23.11 Orchestra code for instr 2310 as shown in figure 23.10.
Figure 23.12 Block diagram of instr 2311, the preliminary version of a true stereo reverb
instrument with 2 parallel allpass filters in each side with random delay times.
Our final example is a bit strange and not a customary reverberation architecture
but it builds on the Csound tools and design issues we have already seen. I hope it
will suggest the possibilities of creative reverberation design with the purpose of not
just replicating familiar reverberation models but also exploring processing architec-
tures that go beyond what fixed-algorithm commercial reverberators can produce
(see figure 23.17). The basic idea of this reverberator is to break the input signal into
four bands, using Csound’s reson bandpass filter. Each of these bands is scaled by a
slowly changing envelope so we hear the different frequency bands moving in and
out of phase.
sr = 44100
kr = 44100
ksmps = 1 ; NEEDED FOR SMOOTH DELAY TIME INTERPOLATION
nchnls = 2
gifeed = .5
gilp1 = 1/10
gilp2 = 1/23
gilp3 = 1/41
giroll = 3000
gadrysig init 0
Figure 23.13 Orchestra code for instr 2311, a stereo reverb as shown in figure 23.12.
Figure 23.14 Orchestra code for instr 2313, a global “wash of reverb” achieved with 3 paral-
lel comb filters mixed and passed through an allpass and tone in series for smearing, color
and high frequency absorption.
instr 2314 ; GLOBAL REVERB INTO 2 VARYING DELAYS
atmp alpass gadrysig, 1.7, .1
aleft alpass atmp, 1.01, .07
atmp alpass gadrysig, 1.5, .2
aright alpass atmp, 1.33, .05
kdel1 randi .01, 1, .666
kdel1 = kdel1+.1
addl1 delayr .3
afeed1 deltapi kdel1
afeed1 = afeed1+gifeed*aleft
delayw aleft
kdel2 randi .01, .95, .777
kdel2 = kdel2+.1
addl2 delayr .3
afeed2 deltapi kdel2
afeed2 = afeed2+gifeed*aright
delayw aright
aglobin = (afeed1+afeed2)*.05
atap1 comb aglobin, 3.3, gilp1
atap2 comb aglobin, 3.3, gilp2
atap3 comb aglobin, 3.3, gilp3
aglobrev alpass atap1+atap2+atap3, 2.6, .085
aglobrev tone aglobrev, giroll
kdel3 randi .003, 1, .888
kdel3 = kdel3+.05
addl3 delayr .2
agr1 deltapi kdel3
delayw aglobrev
kdel4 randi .003, 1, .999
kdel4 = kdel4+.05
addl4 delayr .2
agr2 deltapi kdel4
delayw aglobrev
arevl = agr1+afeed1
arevr = agr2+afeed2
outs arevl, arevr
gadrysig = 0
endin
Figure 23.15 Orchestra code for instr 2314, the final phase of our stereo reverb, which
decorrelates the monophonic background “wash of reverb” by passing the signal through two
slow randomly varying delays.
Figure 23.16 Orchestra code for instr 2315, a metallic sounding reverb. The strong reso-
nances are derived from passing the dry signal through a series of comb filters.
instr 2316
iorig = .05
irev = 1.-iorig
igain = 1.0
ilpgain = 1.5
icgain = .1
ialpgain = 0.1
ispeed1 = p4
ispeed2 = p5
ispeed3 = p6
ispeed4 = p7
icf1 = p8
icf2 = p9
icf3 = p10
icf4 = p11
ifac = 2
ibw1 = icf1/ifac
ibw2 = icf2/ifac
ibw3 = icf3/ifac
ibw4 = icf4/ifac
; CYCLIC AMPLITUDE ENVELOPES
aenv1 oscil igain, ispeed1, 1
aenv2 oscil igain, ispeed2, 2
aenv3 oscil igain, ispeed3, 3
aenv4 oscil igain, ispeed4, 4
; BREAK INTO BANDS
ares1 reson gadrysig, icf1, ibw1, 1
ares2 reson gadrysig, icf2, ibw2, 1
ares3 reson gadrysig, icf3, ibw3, 1
ares4 reson gadrysig, icf4, ibw4, 1
; SUM THE ENVELOPED BANDS
asum = (ares1*aenv1)+(ares2*aenv2)+(ares3*aenv3)+(ares4*aenv4)
; LOWPASS AND COMB SEQUENCE
alp tone asum, 1000
adright delay alp, .178
adleft delay alp, .215
asumr = asum+(adright*ilpgain)
asuml = asum+(adleft*ilpgain)
acr1 comb asumr, 2, .063
acr2 comb acr1+asumr, .5, .026
acl1 comb asuml, 2, .059
acl2 comb acl1+asuml, .5, .031
acsumr = asumr+(acr2*icgain)
acsuml = asuml+(acl2*icgain)
Figure 23.17 Orchestra code for instr 2316, a “strange reverb” with shifting resonances.
Conclusion
References
Begault, D. R. 1994. 3-D Sound for Virtual Reality and Multimedia. New York: AP Professional.
Moorer, J. A. 1979. “About this reverberation business.” In C. Roads and J. Strawn eds. 1985.
Foundations of Computer Music. Cambridge, Mass.: MIT Press.
Roads, C. 1996. The Computer Music Tutorial. Cambridge, Mass.: MIT Press.
24 Implementing the Gardner Reverbs
in Csound
Hans Mikelson
This chapter describes the implementation of reverbs based on nested allpass filters.
Reverberant sound occurs when sound waves are reflected by surfaces repeatedly
until the individual reflections merge into a continuous sound. Nested allpass filters
proposed by Barry Vercoe and Miller Puckette (1985) can be used to simulate the
dense reflections associated with room reverberation. This chapter describes several
different types of allpass filters and uses them to implement three different reverbs.
The reverbs presented in this section are derived from those developed by Bill Gard-
ner (1992).
An allpass filter is made by adding both a feedback path and a feedforward path to a
delay line as shown in figure 24.1. Gain is applied to the feedback path and negative
gain is applied to the feedforward path. An allpass filter passes all frequencies unal-
tered but changes the phase of each frequency. This can be implemented in Csound
as follows:
adel1 init 0
aout = adel1-igain*ain ; FEEDFORWARD
adel1 delay ain+igain*aout, itime ; FEEDBACK
Figure 24.1 Block diagram of a simple allpass filter: a delay line with a feedback gain of g and a feedforward gain of -g.
[Figure residue: block diagram of a nested allpass filter, in which the delay element contains a further delay and allpass section with gains g1 and g2.]
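Extending the code above, a hedged sketch of a single nested allpass filter (the chapter's own figures give the canonical versions; here the variable names, gains ig1/ig2 and delay times it1/it2 are assumptions, and ksmps should be kept small, ideally 1, because the outer feedback passes through an assignment):
adel1    init    0
adel2    init    0
aout     =       adel1 - ig1*ain           ; outer feedforward
asum     =       ain + ig1*aout            ; signal entering the outer delay path
adlyd    delay   asum, it1                 ; plain delay inside the outer loop
ain2     =       adel2 - ig2*adlyd         ; inner allpass feedforward
adel2    delay   adlyd + ig2*ain2, it2     ; inner allpass delay and feedback
adel1    =       ain2                      ; inner allpass output closes the outer loop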
The nested allpass filters are now combined to form reverbs. The first reverb pre-
sented is for a small room. It consists of a double nested allpass filter followed by a
single nested allpass filter. The input is prefiltered at 6 kHz to reduce metallic ringing.
An overall feedback path is bandpass filtered and added to the input. The feedback filter
also reduces the metallic character of the reverb and reduces DC offset. Another sim-
plified notation is presented in figure 24.6. Delays are indicated by putting the delay
time above the signal path; all times in these figures are expressed in milliseconds.
The small room reverb shown in figure 24.6 can be implemented in Csound as
shown in figure 24.7.
[Figure residue: simplified-notation diagrams of single and double nested allpass filters, with delay times t1, t2, t3 and gains g1, g2, g3 written above the signal path.]
Figure 24.6 Simplified diagram of the small room reverb: the input is lowpassed at 6 kHz and scaled by 0.5, then passes through nested allpass sections of 35 ms (gain 0.15) and 66 ms (gain 0.08) and a 24 ms delay, with an overall feedback gain of 0.5.
Figure 24.7 Orchestra code for instr 2402, a small room reverb.
Figure 24.8 Simplified diagram of the medium room reverb: the input is lowpassed at 6 kHz and scaled by 0.5, and passes through nested allpass sections of 35 ms (gain 0.25) and 39 ms (gain 0.25) and delays of 5, 67, 15 and 108 ms, with an overall feedback gain of 0.4.
The next reverb is for a medium room (see figure 24.8). It consists of a double nested
allpass filter followed by an allpass filter, followed by a single nested allpass filter.
The input is prefiltered at 6 kHz and is introduced at both the beginning and before
the final nested allpass filter. Output is taken after each allpass filter section. The
overall feedback is bandpass filtered at 1 kHz with a bandwidth of 500 Hz. There are
four delays in this reverb. The first delay follows the first output tap. The second and
third delays are before and after the second output tap and the fourth delay precedes
the overall feedback. The medium room reverb shown in figure 24.8 can be imple-
mented in Csound as shown in figure 24.9.
The final reverb considered is for a large room (see figure 24.10). The major elements
are two allpass filters in series, followed by a single nested allpass filter and finally a
double nested allpass filter. Outputs are taken after the first two allpass filters, after
the single nested allpass filter and after the double nested allpass filter. Delays are
Figure 24.9 Orchestra code for instr 2403, a medium room reverberator.
Figure 24.10 Simplified diagram of the large room reverb: the input is lowpassed at 4 kHz, passes through nested allpass sections of 87 ms (gain 0.5) and 120 ms (gain 0.5) and delays of 4, 17, 31 and 3 ms, with output scalings of 1.5, 0.8 and 0.8 and an overall feedback of 0.5 that is bandpass filtered at 1 kHz with a 500 Hz bandwidth.
introduced before and after the first two output taps. The input is again prefiltered and
the overall feedback is scaled and bandpass filtered. The large room reverb shown in
figure 24.10 can be implemented in Csound as shown in figure 24.11.
Conclusion
The nested allpass filters presented here suggest other configurations of allpass fil-
ters. For instance, a third allpass filter could be inserted into the double nested allpass
filter for three allpass filters in series. An additional level of nesting could be applied
to the nested allpass filters. Many other configurations of nesting could be the subject
of future experimentation and many other reverb configurations could be imple-
mented as a result. The final Csound orchestra and score accompanying this chapter, 2405.orc, feature a flexible system for experimenting with various reverb configurations. Have fun.
Figure 24.11 Orchestra code for instr 2404, a large room reverb.
References
Gardner, W. G. 1992. The Virtual Acoustic Room. Master’s thesis, MIT Media Lab.
Vercoe, B., and M. Puckette. 1985. Synthetic Spaces—Artificial Acoustic Ambiance from Active Boundary Computation. Unpublished NSF proposal. Cambridge, Mass.: Music and Cognition Office at MIT Media Lab.
25 Csound-based Auditory Localization
Elijah Breder and David McIntyre
Within Csound there exist the traditional tools for spatial manipulation—panning
and multichannel processing. Both these tools have their limitations. Panning limits
the composer to the stereo field, while multichannel processing requires a large and
complex arrangement of speakers. In addition, both techniques rely on amplitude re-
lations between channels to create spatial locations. But by incorporating spectral
modifications, sound spatialization better approximates the way we hear sounds in
the real world.
The hrtfer opcode performs these spectral modifications to create the illusion of
a three-dimensional auditory space. This space is created by processing a mono sig-
nal with Head Related Transfer Functions (HRTFs), producing a left/right pair of
audio signals. Although these signals may be presented over speakers, this project
was intended for headphone-based listening.
This chapter will first discuss the perceptual and psychophysical issues related to
human localization. This will be followed by a general discussion of 3D audio sys-
tems. The hrtfer unit generator will then be presented along with some examples of
its use.
3D Sound
Human listeners localize sound sources using a combination of cues, including interaural differences, the precedence effect, spectral filtering by the body and pinnae, the Doppler effect and head movement. This section will discuss these various aspects of human localization.
Interaural difference cues are probably the most important localization cues we use
to localize sound sources on the horizontal plane. From an evolutionary standpoint,
this makes perfect sense: humans are terrain-based animals whose auditory system
has been optimized through evolution to deal with terrain-based sound sources, in-
cluding those sources that are outside the field of view. The horizontal placement of
our ears maximizes interaural differences for sound waves emitted by a source on
the horizontal plane. Our auditory system has the ability to detect interaural differen-
ces in phase, amplitude envelope onset and intensity. By minimizing these differ-
ences with head movement, we are able to direct our focal vision to items of interest
that may not be in our current field of view.
One of the first people to study and explain binaural localization of sound was
Lord Rayleigh who, in 1876, performed experiments to determine his ability to local-
ize sounds of different frequencies (Rossing 1990). He found that lower frequencies
were much harder to localize than higher frequencies. His explanation was that
sounds coming from one side of the head produced a more intense sound in the ear
closer to the source (ipsilateral ear) than in the opposite ear (contralateral ear).
Rayleigh went on to explain that for high frequencies, the head casts a “shadow”
on the contralateral ear, thereby reducing a given sound’s intensity. This did not occur
for lower frequencies because the wavelengths were long enough to diffract
around the head. In 1907, Rayleigh performed a second localization experiment, this
time investigating localization of low frequencies. What he discovered was that the
sound reached one ear before the other, which resulted in a phase difference between
the two ears.
Modern experiments have investigated the role of interaural time and intensity
differences and have confirmed Rayleigh’s initial findings. Figure 25.1 shows the
travel path of sound waves for two sources. We see that for the source directly in
front of the listener (source A) the sound waves reach the two ears at the same time.
In this case the interaural time and intensity differences are minimized (they are not exactly zero, owing to the asymmetry of the human head and ears). The sound waves emitted
by the second source to the right of the listener (source B), however, will produce
significant interaural differences.
In general, if source B is below approximately 1 kHz, localization will be depen-
dent on the interaural phase or time differences. If the source is greater than about
1.5 kHz (wavelengths are now smaller than the diameter of the head), the interaural
intensity differences will be used. This head shadow effect increases with increasing
frequency (Middlebrooks and Green 1991).
The use of IIDs and ITDs by the auditory system is commonly referred to as the
“duplex” theory of localization. The theory suggests that IIDs and ITDs operate over
exclusive frequency regions. Although in the laboratory it is relatively easy to esti-
mate the boundary point (around 1.5 kHz) where the system switches from using
ITDs to IIDs, the interaction of the two mechanisms in the real world is not fully
understood.
There have been ITD studies that investigated the role of amplitude envelope onset
times in the higher frequency region (Begault 1994). In the frequency region above
1.5 kHz, the phase relationship between the two ears leads to an ambiguous situation:
it is hard to tell which is the leading soundwave. The studies have shown that if an
amplitude envelope is imposed on the test signal, the auditory system is able to detect
the difference in envelope onset times, thus providing useful ITD cues.
Sound sources in the real world typically contain frequency components above
and below the cutoff (about 1.5 kHz) suggested by the duplex theory. It is quite likely
that the auditory system does not really rely on any one mechanism for localization.
Rather all available information is used to provide the most suitable answer.
The use of pan pots on conventional stereo mixing boards illustrates how amplitude changes, independent of the sources’ frequency content, are sufficient to separate and place individual sounds on a horizontal plane (the stereo field).
Amplitude differences between the left and right channels are interpreted by the lis-
tener as various spatial locations. For example, a sound can appear to move across
the horizon by continuously varying the amplitude difference of the left and right
channels (Begault 1994).
The Precedence effect (other names include the Haas effect and Blauert’s “the law
of the first wavefront”) describes the auditory system’s ability to localize a sound
source within a reverberant environment. Localization experiments have studied the
Precedence effect by delaying one side of a stereo audio signal (either through head-
phones or loudspeakers) and noting the perceptual effects while varying the delay
time. Results show that as the delay time is increased from 1.5 milliseconds to 10
milliseconds the virtual sound position will be associated with the undelayed channel
but its width will seem to increase. At some point between 10 milliseconds and 40
milliseconds, depending on the sound source, a distinct echo will be heard coming
from the delayed channel. The original event, however, is still perceived as coming
from the undelayed channel. In terms of real world localization, the Precedence ef-
fect explains how we are able to localize the original sound source (or direct signal)
in spite of potentially being confused by reflections and echoes (Begault 1994).
In real-world perception and localization, our head acts as a pointer, helping us inte-
grate information from both our visual and auditory senses. As already mentioned,
we use auditory information to locate and focus on particular objects that may or
may not be part of the current visual scene. We use head movements to minimize
interaural differences. The following example (adapted from Begault 1994) shows
how head movements are used to locate a source at right 150 degrees azimuth, which
may be confused with a source at right 30 degrees azimuth.
At first the interaural difference cues suggest that the source is to the right of the
listener. As the listener starts turning his or her head toward the right, if the interaural
differences are minimized, then the source must be in front. If, on the other hand,
the differences increase, then the source is further in the back.
Head movements are also important in front/back disambiguation. Studies have
shown that the listener is able to integrate changes in IIDs and ITDs, as well as
spectral changes, owing to head movement and use this information in localization
judgments (Begault 1994). A simpler example of the importance of quick judgments
based on head movements is “if I don’t see it but hear it, it must be in the back.”
Another important cue that exists in real world human localization is the perceived
pitch change of a sound source as it moves past the listener. This is termed the Dopp-
ler effect.
If, for example, a 100 Hz emitting sound source and listener are stationary, the num-
ber of vibrations per second “counted” or heard by the listener will be 100. If, however,
the source or listener is moving, the number of vibrations encountered by the listener
will differ. If the two are moving away from each other, a drop in pitch will be per-
ceived. If they are moving toward each other, an increase in pitch will be perceived.
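For reference, the standard relation (not stated in the text) for a stationary medium with sound speed $c$, a source moving at speed $v_s$ and a listener moving at speed $v_l$, both taken as positive when moving toward the other, is
$$f' = f\,\frac{c + v_l}{c - v_s}$$
so that approach raises the perceived frequency and recession lowers it.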
Although the Doppler effect has been studied and understood as a perceptual phe-
nomenon, further investigation is necessary to examine its interaction with other lo-
calization cues, including cognitive processes such as experience and familiarity.
Spectral Cues
Although IIDs and ITDs are probably the most important cues for localization of
sound sources on the horizontal plane, they provide rather ambiguous cues for
sources located on the median plane. Although IID and ITD values won’t be exactly
the same owing to the asymmetrical construction of our head and pinnae, interaural
difference values will be minimal along the median plane. This would lead to con-
fusion when trying to determine whether a source is directly in front (0 degrees
azimuth) or directly in back (180 degrees azimuth), solely based on interaural differ-
ence cues.
The “Cone of Confusion” concept describes how, for any two points on a conical
surface extending outward from a listener’s ear, identical (hence ambiguous) IID and
ITD values may be calculated (points a & b and c & d in figure 25.2). It is in these
Figure 25.2 Cone of confusion.
situations that spectral cues provide further localization cues and disambiguate front
from back and up from down.
The pinnae are responsible for the spectral alterations of incoming soundwaves.
They act as directional filters, imposing amplitude and phase changes as a function
of sound source location. Most of these spectral alterations are caused by time delays
(0–300 µsec) owing to the complex folds of the pinnae (Begault 1994). Because of
the asymmetrical construction of the pinnae, sound coming from different locations
will have different spectral changes imposed on it. The listener recognizes these mod-
ifications as spatial cues.
The filtering process of the outer ears is most often called the head related transfer
function (HRTF). Other terms used to describe this process include head transfer
function (HTF), pinnae transform, outer ear transfer function (OETF), and direc-
tional transfer function (DTF). Modern experiments and studies record, analyze, and
simulate HRTFs in order to gain a better understanding of the process of using spec-
tral cues for localization. In general, studies have shown that although different
people exhibit different ear impulse responses, or HRTFs, most measurements share
similar spectral patterns (Hiranaka and Yamasaki 1982; Asano, Suzuki, and Sone
1990). Although people do better in localization tests when using their own HRTFs,
they are able to make quite accurate localization judgments when using HRTFs of
others. Some even do better with “foreign” HRTFs than with their own.
Besides the pinnae, other parts of the body can influence the spectrum of an in-
coming soundwave. These can be broken down into directional and nondirectional
spectral modifications. For example, the upper body will cause directionally depen-
dent alterations to the spectrum in the 100 Hz–2 kHz range (Genuit 1984). The ear
canal, on the other hand, is a nondirectional influence owing to its natural resonance
between 2 and 5 kHz.
3D Audio Systems
What Is 3D Audio?
At the heart of most 3D audio systems are filtering techniques that simulate the directional cues used in everyday listen-
ing. Many of these systems use HRTF-based filters to approximate the direction-
ally-dependent spectral modifications imposed by our outer ears on incoming
soundwaves.
HRTF data sets are obtained by playing an analytical signal at desired locations
and measuring the impulse responses with probe microphones placed at, or near, the
ear canal of a human or dummy head. Often, the source of the signal is placed at a
distance of 1.4 meters to minimize the effects of reverberation on the recording. The
output of the microphones is then stored digitally as a series of sample points. These
data may then be compressed and post-equalized for the frequency response of the
measuring (and possibly the playback) system. An inherent difficulty with HRTFs is
that they are discrete measurements of a continuous phenomenon. Obviously an in-
finite number of measurements would be required to accurately represent the contin-
uous nature of this phenomenon, but this is not practical. This leaves a trade-off
between spatial resolution and data size/processing load.
Every individual has his or her own unique set of HRTFs. Since it is not practical
to create individualized HRTFs, most systems use generalized data sets. These non-
specific HRTFs are created by taking the same measurements on a dummy head that
approximates the features of the human ear. The KEMAR mannequin from MIT is
an example of one such dummy head. As discussed above, listeners are still able to
make accurate localization judgments using these generalized HRTFs.
Most HRTF based 3D audio systems provide better results under binaural (head-
phone) listening conditions. In transaural (loudspeaker) playback, much of the detail
in the spatial imagery is lost because the listening environment and loudspeakers will
impose unknown nonlinear transformations on the resulting audio output. Transaural
playback may be improved by using techniques such as cross-talk cancellation, but
these can only approximate binaural listening conditions.
Unless head-tracking is used, binaural presentation of a sound source placed in a
virtual space fails to account for head movements. In a normal listening environment,
head movements are used to help pinpoint sound sources outside the field of view.
Again owing to practicality, most binaural 3D audio systems do not account for head
movements, thus removing one key technique used to localize sound sources.
The hrtfer opcode uses the MIT HRTFs produced by Bill Gardner and Keith Martin. These data sets were re-
corded using the KEMAR mannequin, a dummy head and torso that produces a
reasonable set of generalized HRTFs. Gardner and Martin have several versions of
their HRTFs available. From these, we chose to use the compact set of HRTFs. This
is a reduced data set of 128 point symmetrical HRTFs derived from the left ear mea-
surements of their full 512 point set. What this means is that for a given left azimuth
of theta, the HRTF used for the right ear would be the measurement for the left ear
at 360-theta (e.g., left ear = left 45, right ear = left 315). Gardner explains that this was
done for efficiency reasons. This, however, eliminates the interaural time differences
along the median plane that have been shown to help in vertical localization. Posi-
tions on the horizontal plane were sampled at approximately every five degrees,
while positions on the median plane were sampled every ten degrees. In addition,
elevation data are restricted to the range from 40 degrees below the listener to
90 degrees above. It should also be noted that the impulse responses were sampled
at 44.1 kHz. In order to use these data at a different sampling rate, the HRTFs must
be resampled at the desired sampling rate (Gardner and Martin, documentation for
MIT HRTFs).
While the HRTF data sets are sampled in the time domain, the convolution per-
formed by hrtfer requires frequency domain data. In order to reduce the compu-
tational load of hrtfer, the HRTF measurements were preprocessed using the Fast
Fourier Transform (FFT) to translate the time domain data into the frequency do-
main. The hrtfer utility program, hrtfread, was used to obtain the FFT of the MIT
HRTFs and store them in a file. This file is then used by hrtfer when processing
audio.
The hrtfer unit generator takes four arguments and produces two audio signals, a
stereo left-right pair. The following is an example of an hrtfer call:
aleft, aright hrtfer asig, kaz, kelev, “HRTFcompact”
Where:
■ asig is a mono audio signal. This can either be a sampled sound or a Csound
generated signal. If a sample is used, it must match the sampling rate of the HRTFs
(in this case, 44.1 kHz). If a Csound generated signal is used, the sampling rate of
the orchestra must match that of the HRTFs.
■ kaz and kelev are the requested azimuth and elevation values in degrees. Positions
on the left are negative while positions on the right are positive. Similarly, eleva-
tion positions below the listener are negative while positions above the listener are
positive. In fact, these values can be k-rate values, allowing for dynamic
movement.
Figure 25.3 Block diagram of instr 2501, a 3D audio instrument using the hrtfer opcode to
localize and move the sound source in virtual space.
■ The HRTFcompact file for this version of hrtfer is literally HRTFcompact and
needs to reside in Csound’s analysis directory.
■ aleft and aright are the resulting audio signals. Note that these signals need to be
appropriately scaled before being output.
For example, the following line of Csound code places a mono audio signal (here called asnd) at hard left and 40
degrees above the listener:
aleft, aright hrtfer asnd, -90, 40, "HRTFcompact"
A more complicated example might have a sound move around the listener in a
circle. Figures 25.3 and 25.4 show examples of Csound orchestra and score files that
do just that.
Where:
■ Negative values specify positions/movement to the left
■ Positive values specify positions/movement to the right
■ 0 degrees azimuth is directly in front, 180 is directly in back
■ HRTFs for elevation range from 40 degrees below (-40) to 90 degrees above (90)
the listener
■ 0 degrees azimuth/elevation specifies the position directly ahead of the listener at
a distance of 1.4 meters (the distance used in MIT’s HRTFs)
Conclusion
By taking into account spectral modifications and not just amplitude changes, sound
spatialization may be made more effective. The hrtfer unit generator implements the
f 1 0 8192 10 1
Figure 25.4 Orchestra and score code for instr 2501, a 3D audio instrument with the ability
to move a sound vertically and horizontally in 3D space.
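A hedged sketch in the spirit of instr 2501 (not the figure's actual code; the soundfile name, the sweep range and the output scaling of 300 are assumptions):
         instr   104
asrc     soundin "speech1.aif"             ; mono 44.1 kHz source
kaz      line    -180, p3, 180             ; sweep the azimuth once around the head
kel      line    0, p3, 0                  ; stay on the horizontal plane
aleft, aright hrtfer asrc, kaz, kel, "HRTFcompact"
         outs    aleft*300, aright*300     ; scale the hrtfer outputs before output
         endin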
directionally dependent spectral filtering performed by our outer ears. The use of
hrtfer in orchestras may be complemented by the implementation of Doppler effects,
distance attenuation and reverberation through other, more traditional Csound tech-
niques (e.g., pitch manipulations, lowpass filters and reverb unit generators) to create
truly vivid listening spaces.
References
Asano, F., Y. Suzuki, and T. Sone. 1990. “Role of spectral cues in median plane localization.”
Journal of the Acoustical Society of America.
Begault, D. R. 1994. 3-D Sound for Virtual Reality and Multimedia. San Diego, Calif.: Aca-
demic Press.
Butler, R., and K. Belendiuk. 1976. “Spectral cues utilized in the localization of sound in the
median sagittal plane.” Journal of the Acoustical Society of America.
Gardner, M., and R. Gardner. 1972. “Problem of localization in the median plane: effect of
pinnae cavity occlusion.” Journal of the Acoustical Society of America.
Gardner, M. 1973. “Some monaural and binaural facets of median plane localization.” Journal
of the Acoustical Society of America.
Hiranaka, Y., and H. Yamasaki. 1982. “Envelope representations of pinnae impulse responses relating to three-dimensional localization of sound.” Journal of the Acoustical Society of America.
Kendall, G. S., and W. Martens. 1984. “Simulating the cues of spatial hearing in natural envi-
ronments.” Proceedings of the 1984 International Computer Music Conference. San Francisco:
International Computer Music Association.
Kramer, G. 1994. Auditory Display: Sonification, Audification, and Auditory Interfaces. Read-
ing, Mass: Addison Wesley.
Middlebrooks, J. C., and D. M. Green. 1991. “Sound localization by human listeners.” Annual
Review of Psychology 42: 135–159.
Plenge, G. 1972. “On the differences between localization and lateralization.” Journal of the
Acoustical Society of America.
Rossing, T. D. 1990. The Science of Sound (second edition). Reading, Mass.: Addison-Wesley.
Working with Csound’s Signal Processing Utilities
26 Convolution in Csound: Traditional
and Novel Applications
R. Erik Spjut
The Mathematics
Mathematically, convolution requires two signals, which we will call signal a and
signal b. We will number the samples in signal a from 0 to N and the samples in
signal b from 0 to M. If we let our sampling variable n range from 0 to N+M, then the value of signal a at time tn is a[n] and the value of signal b at time tn is b[n].
To perform a convolution of the two signals we have to pick one of the signals.
We can choose either signal and usually (but not always) the shorter one is chosen.
We will choose signal b. Signal b is usually called the Impulse Response and signal
a is usually called the signal, the input signal, or the excitation. Step one is to flip
the impulse response, signal b, around.
Steps two and three are to slide signal b over signal a and then multiply the corre-
sponding samples of a and b and sum the result. If we call the convolution of signals
a and b signal c, then:
$$c[n] = \sum_{k=0}^{N+M} a[k]\, b[n-k] \qquad (26.1)$$
The signal processing texts give the limits of the summation as running from negative infinity to infinity, but most of us don’t have infinitely long soundfiles. Equation 26.1
can be implemented in Csound using the delay1 opcode. For real-time performance,
b[1] = 0.002177629
b[2] = 0.032279228
b[3] = 0.153256114
b[4] = 0.304044017
b[5] = 0.304044017
b[6] = 0.153256114
b[7] = 0.032279228
b[8] = 0.002177629
Figure 26.1 Signal b for use in direct convolution example from instr 2601.
when signal b is quite short, the convolution sum is very fast. The Csound instrument
shown in figure 26.3 (instr 2601.orc) implements the convolution sum where signal
b is shown in figure 26.1. Figure 26.2 is a block diagram of a direct convolution
instrument and figure 26.3 is the orchestra code.
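As a hedged sketch of that approach (not the figure's actual listing, though the coefficients are those of figure 26.1 and the structure follows the delay1 chain of figure 26.2):
         instr   105                 ; direct convolution with an 8-point impulse response
i1       =       0.002177629
i2       =       0.032279228
i3       =       0.153256114
i4       =       0.304044017
i5       =       0.304044017
i6       =       0.153256114
i7       =       0.032279228
i8       =       0.002177629
a1       soundin "fox.aif"
a2       delay1  a1                  ; each delay1 delays the signal by one sample
a3       delay1  a2
a4       delay1  a3
a5       delay1  a4
a6       delay1  a5
a7       delay1  a6
a8       delay1  a7
         out     i1*a1 + i2*a2 + i3*a3 + i4*a4 + i5*a5 + i6*a6 + i7*a7 + i8*a8
         endin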
When signal b is long or comparable in length to signal a, a mathematically equiv-
alent method is much faster. The equivalent method is to pad signal b with zeros
until it is the same length as signal a, pad both of them with zeros to the nearest
power-of-two length, take the Fast Fourier Transform (FFT) of both signals, multiply
the two FFTs together and take the inverse FFT. Multiplying the FFTs together
means that frequencies common to both signals will be emphasized and frequencies
in only one or the other will be greatly diminished.
The Csound opcode convolve uses a modification of the FFT method that is more
suitable when the longer signal won’t all fit in memory at once. The Csound instru-
ment in figure 26.4 (instr 2602.orc) performs the same convolution as instr 2601 but
by using the convolve opcode.
The savings can be enormous. If signal b is 256 samples long, the convolution
sum requires 256 multiplications and additions to produce one sample in signal c,
whereas convolve requires about 10 multiplications and additions to produce one
sample in signal c.
It can be proven mathematically that the convolution of a and b, written a * b, is
equal to the convolution of b and a, written b * a.
Mathematically, it does not matter which signal we use for signal b, but because of
the way convolve is written, it is faster to use the shorter signal for signal b. In
addition, the delay before the sound appears (see below) is shorter if the shorter
signal is used for signal b.
Figure 26.2 Block diagram of instr 2601, a direct convolution instrument: the input from fox.aif passes through a chain of delay1 opcodes whose outputs are scaled by the coefficients i1 through i8 and summed.
Figure 26.3 Orchestra code for instr 2601, a direct convolution instrument.
Figure 26.4 Orchestra for instr 2602, an instrument that performs FFT-based convolution
with Csound’s convolve opcode.
In order to use convolve in Csound, signal b (the impulse response) must be preana-
lyzed using the sound utility cvanal. To use the sound file oneEcho.aif, you would
type at the command line:
csound -Ucvanal oneEcho.aif oneEcho.con
Then to convolve oneEcho.aif with the audio signal ain in Csound, you would
write in the body of an instrument:
aout convolve ain, “oneEcho.con”
The demos in this chapter all assume you have the appropriate “.con” files in your
SADIR. The corresponding “.aif ” files are on the accompanying CD-ROM. The
“.con” files are machine specific, so you may have to generate them from the “.aif ”
files.
Almost all of the instruments have exactly the same structure. We will use the strset
opcode to put the filenames in the orchestra and pass them as p-field values from the
score. Each subsequent section of this chapter will have its own instrument and score
something like those shown in figure 26.5.
Here iexcite and irespond are replaced with the appropriate strset filenames when they are referred to by number in p5 and p6, and iscale (taken from p4) is the scaling factor that keeps the signal in range. In the examples that follow, the shorter file was chosen as signal b (the impulse response), purely to decrease computation time. As mentioned above, either file would have worked.
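A hedged sketch of that shared structure (not figure 26.5's actual code; the instrument number is arbitrary):
         instr   106
iscale   =       p4                  ; output scaling factor
iexcite  =       p5                  ; strset number of the excitation soundfile
irespond =       p6                  ; strset number of the preanalyzed ".con" file
ain      soundin iexcite
aout     convolve ain, irespond
         out     aout*iscale
         endin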
It is important to note that the output from convolve is silent for approximately
the length of signal b (the impulse response). The exact length is given in the Csound
Reference Manual. For your reference, the lengths of all of the source “.aif ” files
used in this chapter are shown in figure 26.6.
strset 1, “hello.con”
strset 2, “fox.con”
strset 3, “oneEcho.aif”
strset 4, “twoEchos.aif”
strset 5, “fiveEchos.aif”
strset 6, “gaussReverb.aif”
strset 7, “uniformReverb.aif”
strset 8, “whiteNoise.aif”
Figure 26.5 Orchestra and score for instr 2603, a basic convolution instrument.
This delay is due to the method used to do the convolution. Direct convolution,
using equation 26.1, does not have this delay. The nonzero output (the useful signal)
runs approximately the sum of the lengths of signal a and signal b past the initial
delay. In the score, (2603.sco), p3 was chosen as the sum of the delay and the nonzero
output length (see figure 26.5). The start time value of p2 was chosen to add a little
space between the examples by accounting for both the end of the previous note and
the delay in the present note.
Echoes or Scale-and-Translate
The convolution of an audio signal with a single sample gives a copy of the signal
scaled by the height of the sample and shifted in time by the position of the sample.
388CyclesMidC.aif 65536
5th.aif 129287
a440.aif 65533
altoSax.aif 55296
bassDrum.aif 3872
brass.aif 43987
cymbal.aif 63960
eightPoint.aif 8
fiveCyclesMidC.aif 1024
fiveEchos.aif 65536
fox.aif 121600
gaussReverb.aif 131065
hello.aif 27208
highPass300Hz.aif 255
highPass1kHz.aif 255
highPass3kHz.aif 255
lowPass3kHz.aif 128
lowPass1kHz.aif 128
lowPass300Hz.aif 256
marimba.aif 44228
middleC.aif 65533
oneCycleMidC.aif 256
oneEcho.aif 65536
piano.aif 46406
pluckBass.aif 36013
rimShot.aif 19482
twoEchos.aif 65536
uniformReverb.aif 131065
violin.aif 53429
whiteNoise.aif 131065
wilmTell.aif 64313
Figure 26.6 List of example .aif files and their length in samples.
[Figure residue: convolution of a signal (a) with a single impulse (b), yielding a scaled, shifted copy (c).]
Figure 26.8 Convolution of a signal (a) with two widely spaced impulses (b).
Figure 26.9 Convolution of a signal (a) with a variety of impulses (b).
Convolving a signal with several impulses produces several scaled and shifted copies, that is, echoes (see figures 26.8 and 26.9). If the echoes are dense enough, we end up with reverberation. The set of convolution demos in 2603.orc and 2603.sco runs from one echo through a nondecaying reverb.
The file oneEcho.aif consists of 65536 samples, but only the first and the last are
not zero. The result is an echo 1.5 seconds after the initial sound. The file twoEcho-
s.aif has three nonzero entries, one at 0 seconds, one at about 0.7 seconds and one at
1.5 seconds. The result is the signal and two echoes. The file fiveEchos.aif has six
nonzero entries and gives the signal and five overlapping echoes.
Convolution Reverb
In reverberation, the echoes are dense and randomly placed. The file gaussReverb.aif
was created with a decaying exponential envelope applied to a random signal gener-
ated by gauss. The file uniformReverb.aif was created with a decaying exponential
envelope applied to a random signal generated by linrand. The difference between
the two is rather subtle, but a Gaussian Reverb is, in theory, a little more realistic. If
you have a favorite pattern of echoes, you can create the reverb of your choice. The
whiteNoise.aif file is a random signal generated by linrand. It gives basically a re-
verb with no decay but a cutoff after 2.9 seconds.
When doing generalized reverberators via convolution, such as the previous set of
examples, one wants to control the length of time before the first echo and the relative
strengths of the dry signal and the reverb. Of course, it is possible to adjust these in
the impulse response file itself, using a soundfile editor, but it is easier to add a few
lines in the instrument and leave the file alone. The instrument found in 2604.orc and
illustrated in figure 26.10 has these features and capabilities.
The 2.972789 is the delay calculated by the formula in the Csound manual for
gaussReverb.aif. Subtracting 20 ms (0.02s) causes the first reverb to appear 20 ms
after the main signal. The useful range for the spacing is between 5 ms and 50 ms.
The parameter, irat, is the ratio of the reverberated signal to the direct sound. For
natural-sounding reverbs it should be less than 0.5. The tone opcode is used to filter
out some of the high frequencies from the reverberated signal, similar to what hap-
pens in a large volume of air.
Figure 26.10 Block diagram of instr 2604: the soundin signal from fox.aif is delayed by idelay to form the dry path and also convolved with gaussReverb.con; the convolved signal is lowpassed with tone at 1000 Hz and the two paths are mixed according to irat and scaled by iscale.
strset 9, “fox.aif”
strset 10, “gaussReverb.con”
Figure 26.11 Orchestra code for instr 2604, a convolution-based Gaussian reverb instrument
with delay and tone.
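The complete listing appears in figure 26.11; a hedged sketch of its core, reconstructed from the description and the block diagram (variable names and p-field layout are assumptions):
         instr   107
iscale   =       p4
irat     =       p5                         ; ratio of reverberated to direct sound
idelay   =       2.972789 - 0.02            ; convolve latency minus the 20 ms pre-delay
aa       soundin "fox.aif"
ad       delay   aa, idelay                 ; delay the dry signal so the reverb trails it by 20 ms
ab       convolve aa, "gaussReverb.con"
ab       tone    ab, 1000                   ; roll off highs, as a large volume of air would
         out     (ad + irat*ab)*iscale
         endin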
Convolving a signal with a set of filter coefficients b[k] is mathematically (and sonically) the same as sending the signal through the corresponding filter. As you may have guessed, the b[k]’s can be thought of as the impulse response of a filter.
Lowpass Filters
The next set of demos, found in 2605.orc and 2605.sco, demonstrates using convolu-
tion for lowpass filtering. The first three files, lowPass3kHz.aif, lowPass1kHz.aif and
lowPass300Hz.aif, are impulse responses for lowpass filters with cutoffs at 3 kHz, 1
kHz and 300 Hz, respectively. If you listen to the files by themselves you will hear a
sort of “thunk” sound. But if you listen to their convolution with hello.aif and fox.aif
you’ll clearly hear the filtering effect.
The methods for calculating the values in lowPass3kHz.aif, lowPass1kHz.aif and
lowPass300Hz.aif are beyond the scope of this chapter but can be found in any good
digital filter text. Note that the files are quite short, either 128 or 256 samples. Rea-
sonably good filters do not have to be huge.
The next two files, rimShot.aif and bassDrum.aif, are recordings of a snare-drum
rim shot and a drum-set kick drum. If you think of them as impulse responses of
lowpass filters, you will find that they do a reasonable job of lowpass filtering. The
kick drum is, of course, a much better lowpass filter than the snare drum. Notice how
much longer these two files are than the first three.
Highpass Filters
The next set of demos, 2606.orc and 2606.sco, demonstrates using convolution for
highpass filtering. The first three files, highPass300Hz.aif, highPass1kHz.aif and
highPass3kHz.aif, are impulse responses for highpass filters with cutoffs at 300 Hz,
1 kHz and 3 kHz, respectively. If you listen to the files by themselves you will hear
a sort of “click” or “snap” sound. But if you listen to their convolution with hello.aif
you will clearly hear the filtering effect.
The last file, cymbal.aif, is a recording of a cymbal crash from a drum set. A
cymbal has lots of high frequency content and little or no low frequency content so
it works well as a highpass filter.
brass.aif (a crummy brass section), altoSax.aif (an alto sax) and violin.aif (a violin
with a little vibrato at the end). Some people have likened this use of convolution to
singing through a violin or playing a saxophone through a marimba.
Cross Synthesis
Why should we stop at playing a flute through a clarinet? Why not recite Hamlet
through Beethoven’s “Fifth Symphony?” Most of us do not have the gigabytes of
memory that would be required, but we can experiment with some shorter examples.
The next set of demos, 2608.orc and 2608.sco, convolves hello.aif and fox.aif, with
5th.aif (the opening motif from Beethoven’s “Fifth Symphony”) and wilmTell.aif (a
short excerpt from Rossini’s “William Tell Overture”). This sort of convolution is
often called cross synthesis. As you listen to these files, try to decide if hello.aif and
fox.aif are playing the music or if the music is playing hello.aif and fox.aif. The
mathematics says either interpretation is correct.
Most natural sounds, such as voice and acoustic instruments, have a limited high-
frequency content. The high frequencies that are present give a feeling of brightness
or openness to the sound. When we cross-synthesize two natural sounds, the small
high frequencies of one are multiplied by the small high frequencies of the other and
we get a very small overall high-frequency content. The sparkle sort of disappears
from the music. A good recording engineer with a set of parametric equalizers could
fix the problem, but Csound does not have a good set of built-in parametric
equalizers.
In most filtering applications you want the filter transition (the region where you
change from passing frequencies to blocking frequencies) to be as steep as possible.
In our case, we need something that will emphasize the high frequencies and de-
emphasize the low ones, but we do not want to block out any frequencies completely.
The simplest way in Csound is to use the differentiator opcode, diff. It works by
taking the difference between neighboring sound samples. However, the audible ef-
fect is to emphasize high frequencies and de-emphasize low ones, just as we wanted.
A steep filter might have a slope of 24 dB per octave or steeper in the transition
band. The diff opcode, by contrast, is essentially all transition band: it has a gentle but constant
upward slope of 6 dB per octave. In other words, for each octave you go up in fre-
quency, you double the amount of emphasis you apply. With diff our convolution in-
strument now becomes as shown in figure 26.12.
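Figure 26.12 contains the actual orchestra; a minimal sketch of the same idea, with assumed filenames and an impulse response assumed to have been prepared with cvanal, might read:
          instr 1    ; sketch only, not the code from figure 26.12
asrc      soundin  "hello.aif"               ; the voice used in the demos
abright   diff     asrc                      ; 6 dB-per-octave high-frequency emphasis
awet      convolve abright, "violin.cv"      ; play the brightened voice through the violin "resonator"
          out      awet * .05                ; keep the (potentially loud) result in range
          endin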
The final set of demos, 2610.orc and 2610.sco, convolves exactly the same files as
2609.orc and 2609.sco, but with the high-frequency emphasis. As you listen to these
Figure 26.12 Orchestra code for instr 2610, a convolution-resonator with a highpass filter
implemented with the diff opcode.
files try to decide if hello.aif and fox.aif are playing the music or if the music is
playing hello.aif and fox.aif. Compare the outputs from 2609.orc and 2610.orc. Does
your perception of who-is-playing-whom change? Which do you like better?
Summary
We compared direct convolution to FFT convolution and decided that FFT convolu-
tion is usually better. We have used convolution to generate echoes and reverberation
from individual samples and enveloped random samples. We have performed low-
pass and highpass filtering with specifically-generated files and with recordings of
lowpass-ish and highpass-ish percussion instruments (we could have used any
arbitrary filter type). We have smeared or spread out sharp transients in a soundfile
and examined its frequency content at one frequency. We have used one instrument
or a voice to play another instrument. We let the quick brown fox play around in
Beethoven’s Fifth Symphony and we have examined some simple ways to restore the
sparkle to cross-synthesized sounds. Even with all this, I have been able to mention
only a few of the uses and interpretations of convolution in Csound. But at this point,
you should be ready to experiment and convolve all sorts of things with all sorts of
other things. You should also have a sense of what you’ll get and why. If your interest
has been piqued, be sure to check the references.
References
Oppenheim, A. V., A. S. Willsky, and I. T. Young. 1983. Signals and Systems. Englewood
Cliffs, N.J.: Prentice-Hall.
27 Working with Csound’s ADSYN, LPREAD, and LPRESON Opcodes
Magdalena Klapper
Csound supports a wide variety of synthesis and processing methods. This chapter
is dedicated to the exploration of two powerful analysis/resynthesis algorithms:
adsyn—based on additive re-synthesis and lpread/lpreson—based on linear-
predictive-coding. I will present some of the general theory behind these techniques
and focus on how they are realized in Csound. Also, I will introduce the exciting
world of “morphing” and “cross-synthesis” and show how adsyn and lpread/
lpreson can be used to create hybrid timbres that are clearly a “fused” cross-mutation
between two different sound sources.
Additive Synthesis
In the most simple and general sense, additive synthesis creates sound through the
summation of other sounds—by adding sounds together. You might even say that
“mixing” is a form of additive synthesis, but the distinction here is that when we do
additive synthesis, we are creating a single “fused” timbre, not a nice balance of sev-
eral sounds. Thus, additive synthesis typically implies the summation of harmonic
and nonharmonic “partials.”
A partial is typically a simple sine function defined by two control functions: the
amplitude envelope and the frequency envelope. According to the specific frequency
relations, partials can be combined to create harmonic or nonharmonic structures. In
fact, when the frequencies of the partials are “tuned” to integer multiples of the
fundamental frequency, they are said to be harmonic partials or “harmonics.” The
definition of additive synthesis may be expressed with the following formula:
x(n) = \sum_{k=1}^{N} h_k(n)\,\sin\bigl(nT\,(k\omega + 2\pi f_k(n))\bigr)    (27.1)
where n is the sample number, N is the number of partials, h_k(n) is the amplitude
value of the kth partial for the nth sample, \omega is the pulsation (angular frequency) of the
fundamental, T is the sampling period and f_k(n) is the frequency deflection of the kth
partial for the nth sample.
Additive synthesis is one of the most general methods of sound synthesis. It en-
ables the precise control of a sound by allowing the sound-designer to directly con-
tour the frequency and amplitude of every partial. The amplitude and frequency
control data for this synthesis technique can come from many sources.
In Csound, for instance, you could use GEN9 or GEN10 to additively synthesize
simple or complex spectra. But these would remain static for the complete duration
of the note event. To achieve a dynamic form of additive synthesis, you could use
the linseg and expseg opcodes to control the amplitude and frequency arguments of
a bank of oscili opcodes that are reading sine waves.
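A minimal three-partial version of that approach might look like the sketch below; the frequencies, envelope breakpoints and the assumption that f 1 is a GEN10 sine table are all illustrative.
          instr 1    ; a small hand-built additive sketch (assumes f 1 0 8192 10 1 in the score)
ifund     =        220
kamp1     linseg   0, .05, 8000, p3-.15, 4000, .1, 0
kamp2     linseg   0, .1, 4000, p3-.2, 1000, .1, 0
kamp3     expseg   .001, .05, 2000, p3-.15, 200, .1, .001
a1        oscili   kamp1, ifund, 1
a2        oscili   kamp2, ifund*2, 1
a3        oscili   kamp3, ifund*3.01, 1      ; the third partial is slightly detuned
          out      a1 + a2 + a3
          endin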
Both of these methods give the user a great deal of control, but they can involve a
lot of manual labor. For instance, if the goal were to additively imitate the “pitched”
steady-state timbre of an acoustic instrument, such as a clarinet or violin, you would
need a lot of sine waves, a lot of points in your envelope generators, a lot of patience
and a lot of luck. Csound, however, provides hetro, a utility program that automates
this process to a large degree.
Csound’s heterodyne filter analysis utility, hetro, can be used to analyze complex
and time-varying sounds into amplitude and frequency pairs that can be read by the
adsyn opcode to resynthesize this preanalyzed sound.
Heterodyne Filter Analysis
The heterodyne filter relies on the orthogonality of sinusoids of different frequencies:
\int_a^b f(x)\,g(x)\,dx = 0    (27.2)
Multiplying the analyzed signal (in accordance with Fourier theory) by a sine wave and
then summing the result gives us the information about the contribution of that sine to
the signal. The same multiplication with sines of other frequencies sums to almost zero
(because the analyzed section of the sound does not hold a complete number of periods
for all frequencies).
When in 1981 James Beauchamp implemented a tracking version of the hetero-
dyne filter, a version that could follow changing frequency and amplitude trajectories
of the analyzed tones, the age of additive-resynthesis was born. The major disadvan-
tage of additive synthesis continued to be the problem of dealing with all the data
required to accurately model a complex tone additively, particularly given all of the
partials involved and the complex nature of each of their frequency and amplitude
envelopes. To make this situation more manageable and practical, a form of data-
reduction is typically employed, a method in which one uses line-segment approxi-
mations of all the frequency and amplitude envelopes.
In Csound, the hetro utility program performs the heterodyne filter analysis of an
input signal. It tracks the harmonic spectrum in the sound and can also follow chang-
ing frequency trajectories. The specific settings for the analysis are determined either
by a set of command-line flags, or via text fields, check-boxes and drop-down menus
in your Csound “launcher’s” hetro dialog box. The meaning of some of these options
is explained in figure 27.1.
Csound’s hetro utility program creates a file that contains the analysis results,
written in a special format. Each component partial is defined by time-sequenced
amplitude and frequency values. The information is in the form of breakpoints (time,
value, time, value. . .) using 16-bit integers in the range 0–32767. Time is given in
milliseconds, frequency in Hertz. The negative value -1 indicates the beginning of
an amplitude breakpoint set, the negative value -2 indicates the beginning of a fre-
quency breakpoint set. Each track is ended with the value 32767. And within a com-
posite file, sets may be in any order (amplitude, frequency, amplitude, amplitude,
frequency, frequency. . .).
Typically, the hetro analysis file provides the input information for Csound’s ad-
syn opcode. But in fact, adsyn only requires that the analysis file be in the hetro
format as outlined above, not that it be created by hetro. Thus, it would be possible,
and interesting, to create a file in the adsyn format, but which contains amplitude and
frequency information derived from a source other than Heterodyne Filter Analysis.
Now that we know how to analyze sounds with the hetro utility program, let us focus
on how to use the results of the analysis to resynthesize the sounds with Csound.
Let’s start with the adsyn opcode description. Csound’s adsyn opcode performs
additive synthesis by producing a set of sine functions whose amplitudes and fre-
quencies are individually controlled. The control signals are derived from the Hetero-
dyne Filter Analysis and data-reduced by using a line-segment approximation
method. The adsyn opcode reads the control information from the file created by the
hetro utility program. The adsyn opcode is defined as:
asig adsyn kamod, kfmod, ksmod, ifilcod
The adsyn opcode requires four input parameters. The first three are the control
signals, which determine modifications of the amplitude, frequency and speed of the
complete set of harmonics during resynthesis. If these three parameters are set to 1,
there is no amplitude, frequency or speed modification. The last input parameter is
the name of the file created by the hetro program, which supplies the amplitude and
frequency envelopes for each harmonic partial.
The first interesting resynthesis possibility is that adsyn allows one to transpose
the sound and change its duration independently. Transposition is associated with the
second input parameter and time-scaling or duration is controlled by the third.
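In its simplest form, such a resynthesis instrument might look like the sketch below (the analysis file name anticipates the hetro examples that follow, and the particular kfmod and ksmod values are only illustrations):
          instr 1    ; hypothetical adsyn resynthesis sketch
asig      adsyn    1, 1.2, .5, "a3_1.het"    ; amplitude unchanged, pitch 20% up, half speed
          out      asig
          endin
With ksmod at .5 the note's p3 would need to be roughly twice the duration of the analyzed sound.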
The file a3.aif was analyzed with the hetro program using the two sets of flags
shown in figure 27.2.
Figure 27.2 Hetro analysis settings for the source file a3.aif.
Figure 27.3 Orchestra code for instr 2701, a basic additive resynthesizer.
Figure 27.4 Settings for our first three adsyn resynthesis files.
Now we can look at an example of the most basic Csound instrument to resynthe-
size this sound (see figure 27.3). Both of the control files were used to resynthesize
the sound, at first with no modifications (control signals equal to 1) and then with
some modifications. The three resynthesized sounds are shown in figure 27.4.
Even a simple instrument like instr 2701 may be used to create interesting modifi-
cations and tone-colors. Clearly, the color of the sound produced by 2701.orc, which
results from the a3_1.het analysis file, is much duller than 2702.orc, which resulted
from a3_2.het. This is caused by the different analysis parameters. In the case of the
first example (from 2701.orc) using a3_1.het, the frequency bandwidth is limited to
about 5 kHz, whereas the second example (from 2702.orc), which used a3_2.het,
has a frequency bandwidth of about 8 kHz. Nevertheless, both of the sounds are
quite similar to the source, mainly because of the harmonic character of speech (see
figure 27.2).
The third example, 2703.orc, is an example of resynthesis with a constant modifi-
cation of frequency (the pitch is 20% higher) and speed (the duration is twice as
long). In fact, amplitude, frequency or speed can be changed dynamically, as is
shown in 2704.orc.
Figure 27.5 Orchestra code for instr 2704, an additive resynthesizer instrument with dy-
namic frequency modulation and variable time and amplitude scaling.
Figure 27.6 Settings for the resynthesis of nonharmonic timbres breaking glass and cym-
bal crash.
For the fourth example, then, the static initial value for frequency modification,
ifmod, is replaced by a control signal as shown in figure 27.5.
The next set of examples show the additive-resynthesis of nonharmonic sounds.
The glass.aif and gong1.aif soundfiles were analyzed with the hetro program with
the same set of flags as the a3.aif sound file (see figure 27.2). As a result, the files
glass_1.het, glass_2.het, gong1_1.het and gong1_2.het were created. Then these files
were used to resynthesize the sounds with no modification of amplitude, frequency
or speed as shown in figure 27.6.
During the analysis of the sounds of broken glass and gong, there is a significant
loss of spectral data. This is because of the settings which limited the number of
harmonics to 50, and because of the nonharmonic character of the source files. In the
case of the broken glass, the tone color of the resynthesized sound is quite different,
but, in my opinion, musically attractive.
In the case of the resynthesized gong, another interesting effect appears. Since
the control signals of the “harmonic” frequencies created during the analysis move
significantly, the resynthesized sound, which is not at all similar to the source, ac-
quires a unique “trembling” character.
The method of linear prediction coding (LPC) has been employed a great deal in the
signal processing of speech. In principle, this analysis-resynthesis method attempts
to approximate a single sample of a signal as a linear combination of previous
samples:
x(n) \approx \sum_{j=1}^{N} h_j\,x(n-j)    (27.3)
where the h_j are the prediction (filter) coefficients and N is the number of poles.
Figure 27.7 The parameters for Csound’s lpanal LPC analysis utility.
As mentioned above, control data required for LPC synthesis may be derived from
the analysis of natural sounds. For this purpose, the lpanal utility program performs
LPC and pitch-tracking analysis. The input signal is decomposed into frames, and
for each frame a set of information required for the LPC resynthesis is derived. As
is the case with the hetro analysis utility program, the course of the analysis is deter-
mined by the values of a set of command-line flags or menu and “launcher” dialog
options as shown in figure 27.7.
It should be noted that the hopsize (-h) should always be greater than 5*npoles
(-p). If not, the filter may be unstable because the precision of the calculation for the
filter poles may be too small.
For each frame, the parameters derived from the analysis are averaged. Because
the frame length is from 400 to 1000 samples, this averaging may cause a consider-
able modification of the sound during resynthesis, especially if the source file is
changing rapidly.
The lpanal program creates a file in a specific format, which contains the results
of the LPC and pitch-tracking analysis for use by the LPC family of opcodes. The
output file is comprised of an identifiable header plus a set of frames of floating point
analysis data. Each frame contains four values of pitch and gain information, fol-
lowed by npoles filter coefficients.
The Csound opcodes that perform LPC resynthesis are the lpread and lpreson op-
codes. The lpread/lpreson pair are defined as follows:
krmsr, krmso, kerr, kcps lpread ktimpnt, ifilcod[, inpoles[, ifrmrate]]
ar lpreson asig
The reading opcode lpread creates four control signals based on the file con-
taining the results of the LPC analysis. These output control signals correspond with
the root-mean-square of the residual and original signals (krmsr and krmso), the
normalized error signal (kerr) and the pitch (kcps). In addition to the name of the
file with the LPC analysis results (created by the lpanal utility program), lpread
requires one additional input parameter, a control signal ktimpnt (in seconds). This
parameter enables time expansion and compression of the resynthesized sound. If it
is a linear signal with values from zero to the value of the source file duration, there
is no change of duration after resynthesis.
If the LPC analysis file (with the results of the analysis), has no header, two more
parameters are required. They must specify the number of filter poles and the frame
rate of the analysis.
The lpreson opcode then uses the control files, generated by lpread to resynthe-
size the sound. This opcode has one input parameter, an audio signal. For speech
synthesis, this audio signal is typically white noise for the “unvoiced phonemes,” and
a pulse-train for the “voiced phonemes.” An evaluation of whether the sound is
voiced or unvoiced may be obtained by evaluating the error signal. A value of the
error signal of about 0.3 or higher, usually indicates an unvoiced signal. There is also
a second form of the lpreson opcode in Csound called lpfreson, which is a formant-
shifted version of lpreson.
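A simplified sketch of a noise/pulse-train resynthesizer built on this evaluation is shown below; it assumes an analysis file named a3_1.lpc (as in the examples that follow), a sine wave in f 1, and a score p4 that carries the duration of the analyzed sound. The 0.3 threshold comes from the discussion above; everything else is illustrative.
          instr 1    ; simplified voiced/unvoiced LPC resynthesis sketch
idur      =        p4                        ; p4 = duration of the analyzed sound
ktimpnt   line     0, p3, idur
krmsr, krmso, kerr, kcps lpread ktimpnt, "a3_1.lpc"
kcps      =        (kcps < 30 ? 30 : kcps)   ; guard against a zero pitch track
anoise    rand     10000                     ; excitation for unvoiced frames
abuzz     buzz     10000, kcps, int(sr/2/kcps), 1   ; excitation for voiced frames
asig      =        abuzz
          if (kerr < .3) kgoto voiced
asig      =        anoise
voiced:
aout      lpreson  asig
aout      balance  aout, asig                ; keep the resynthesis near the excitation level
          out      aout
          endin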
It should be noted that the input signal for the lpreson opcode need not only be
noise or a pulse-train. In fact, it can be any audio signal. Given this fact, we can
perform a type of cross-synthesis with the LPC method. For now, let us return to the
a3.aif soundfile and analyze it with lpanal using the settings shown in figure 27.8.
Figure 27.8 Two sets of lpanal parameters for the analysis of the source file a3.aif.
Figure 27.9 Block diagram of instr 2709, an LPC speech synthesizer that switches source
files between noise, pulse-train and original signal based on a combination of score parameters
and the “error” signal analysis from kerr.
Figure 27.11 Setting for the initial examples of LPC resynthesis using different source and
analysis files for comparison.
ognition. The voice in the 2712.orc example seems to be a little hoarse. This is be-
cause of the particular set of flags used for the analysis, mainly the long frame, and
may, in fact, be a desirable effect (see figure 27.8).
In instr 2709 we can change the duration during resynthesis. Modification of the
duration is associated with the value of the iosdur parameter. This effect is presented
by the three examples shown in figure 27.12 (all of them use the a3_1.lpc analysis
file and asig = asig1).
The first sound is ten times longer than the original, the second sound is two times
shorter, and the last sound is two times longer than the original. In this set of ex-
amples, a “funny” effect of pitch modulation appears, especially in the longest
sound. This effect appears because of the particular buzz opcode parameters, with int(sr/
2/kcps) determining the total number of harmonics requested.
Next, the sounds of a gong and breaking glass were analyzed with the lpanal
utility, using the first set of flags from figure 27.8, and the resulting LPC analysis
files were named glass_1.lpc and gong1_1.lpc. The resynthesis was realized with the
same three excitation signal options (pulse, noise, source), as employed by lpreson
for the generation of the a3.aif examples above. But the effects of the resynthesis in
these cases are quite different. After resynthesis the sound files shown in figure 27.13
were obtained. The resynthesized files are not at all similar to the original
sounds. In some ways it is difficult to associate them with their sources. Since both
the amplitude and spectrum of the original sounds change quickly, some of the infor-
mation is lost because of the averaging performed by the analysis.
We must remember that the LPC synthesis method is clearly modeled on the syn-
thesis of speech and it imitates the natural way speech originates. The model used
for LPC has two components: the input to the filter (the signal of excitation associ-
ated with the vocal cords) and the filter shape (the vocal tract), which influences the
excitation signal. But quite obviously, the physics and acoustical properties of sounds
such as the gong and the breaking glass are quite different from those of speech. The LPC
speech model doesn’t fit them at all. The way that the sound of a gong comes into
existence is quite different from the way that speech does. Therefore, the LPC
method is not suitable for “accurate” resynthesis of sounds like these. But in my
opinion, analyzing nonspeech sounds with lpanal and resynthesizing them with
lpread/lpreson is an excellent way to obtain interesting and extraordinary effects.
From a musical point of view, I think that the resynthesized sounds of the gong and
the breaking glass are puzzling, uniquely curious and make for some delicious ear-
candy!
The lpread opcode is an especially useful one. It creates four control signals, which
may be used to modify ANY control parameter of any sound we design with Csound.
What is more, these control signals are usually derived from natural sounds, so they
are usually nonperiodic and have a wonderfully complex nature. Let us examine
several examples that show how the lpread control signals can be used to modify
the sound of plucked strings. For this purpose we have designed an “lpread mapper”
instrument (see figures 27.14 and 27.15) that uses a p-field to select from one of four
control methods.
The dynamic “controls” for these “acoustically mapped” plucked strings came
from the gong1.aif file. It was analyzed with lpanal using the first set of parameters
from figure 27.8. In the case of “method 1” (2722.orc ), the pitch tracking signal of
gong1_1.lpc file was used as the frequency control of the string sound. In the case
of “method 2” (2723.orc), the amplitude of the pluck was controlled by the rms of
the original signal. In the case of “method 3” (2724.orc ), we use a combination of
the two methods mentioned above. Finally, in the case of “method 4” (2725.orc), a
second string was added whose frequency is controlled by the error signal of the
analysis file.
These simple sounds are a small example of the great number of possible ways
that the lpread control signals may be used as dynamic input parameters of other
Csound opcodes.
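Figure 27.15 contains the actual mapper; the sketch below gives only the flavor of “method 1,” with the pitch guard and the pluck settings chosen arbitrarily.
          instr 1    ; sketch of "method 1": the gong's pitch track drives a plucked string
idur      =        p4                        ; p4 = duration of the analyzed gong sound
ktimpnt   line     0, p3, idur
krmsr, krmso, kerr, kcps lpread ktimpnt, "gong1_1.lpc"
kcps      =        (kcps < 50 ? 50 : kcps)   ; keep the tracked pitch in a playable range
asig      pluck    10000, kcps, 50, 0, 1     ; ifn = 0 asks pluck for a random initial buffer
          out      asig
          endin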
Cross-Synthesis Examples
Figure 27.14 Block diagram of instr 2722, an instrument that uses p-fields to select a
“method” by which the lpread control signals will modify the parameters of a synthetic
instrument.
Figure 27.15 Orchestra code for instr 2722, an instrument that uses the lpread opcode as a
way to incorporate complex “natural” control signals into a synthetic instrument.
singing voice (2726.orc ) and the other where the singing voice is filtered by the
analysis data from the breaking glass (2727.orc ).
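As a sketch of one of these combinations (the breaking glass used as the excitation for the filter derived from the singing voice), the instrument might be built along these lines; the balance step mirrors figure 27.18, and the p4 convention is an assumption.
          instr 1    ; sketch: breaking glass "spoken" through the voice's LPC filter
idur      =        p4                        ; p4 = duration of the analyzed voice (a3.aif)
ktimpnt   line     0, p3, idur
krmsr, krmso, kerr, kcps lpread ktimpnt, "a3_1.lpc"
asrc      soundin  "glass.aif"               ; the excitation is now another soundfile
aout      lpreson  asrc
aout      balance  aout, asrc                ; keep the filtered result near the input level
          out      aout
          endin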
Since this chapter is about both additive and LPC resynthesis, let’s build two final
instruments that will unite both methods. Figure 27.18 shows a block diagram of this
cross-synthesis model.
The lpread output control signals may be used as the input parameters of the
additive resynthesis (to modify the amplitude, pitch or speed). These examples show
how the singing voice (a3.aif ) and the breaking glass (glass.aif ) may influence each
other (see figure 27.19). In the first case (2728.orc ), the amplitude and frequency of
the singing voice (a3_1.het) were modified during additive resynthesis by two con-
trol signals, derived from the lpread of the breaking glass (glass.lpc). In the next
example (2729.orc ), the controlling and resynthesizing roles were reversed.
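Figure 27.19 holds the actual orchestra; the sketch below shows only the general shape of instr 2728. The scaling that turns the glass’s rms and pitch tracks into adsyn’s amplitude and frequency factors is a placeholder, not the book’s values.
          instr 1    ; sketch: lpread control signals driving adsyn resynthesis
idur      =        p4                        ; p4 = duration of the analyzed glass sound
ktimpnt   line     0, p3, idur
krmsr, krmso, kerr, kcps lpread ktimpnt, "glass_1.lpc"
kamod     =        krmso * .0002             ; glass rms track becomes the amplitude factor
kfmod     =        1 + kcps * .001           ; glass pitch track bends the resynthesis frequency
asig      adsyn    kamod, kfmod, 1, "a3_1.het"
          out      asig
          endin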
If we compare the cross-synthesis in instr 2728, where LPC and additive methods
are united (see figure 27.19), with the cross-synthesis in instr 2726, where only LPC
Figure 27.18 Block diagram of instr 2728, an instrument combining the lpread and adsyn
opcodes.
Figure 27.19 Orchestra code for instr 2728, a combination of an LPC and additive cross-
synthesizer.
Figure 27.20 Orchestra code for instr 2730, a combination of an LPC and additive cross-
synthesizer that “freezes” the control spectrum on a single analysis “frame.”
was employed (see figure 27.17), we should notice that in instr 2726 a change in the
shape of the spectrum occurs, whereas in the case of instr 2728 only modification of
the amplitude or frequency of the sound is possible.
For our final combined LPC-Additive cross-synthesis instrument (see figure
27.20) the linear input signal of the lpread opcode was replaced by a constant value
(a single point in time). This causes the lpread output to be a constant value too, because
the control signals are read from that particular point in the LPC file. In essence, we
“freeze” the control data at that particular “frame” of analysis. In this case also, the
input files for the adsyn and lpread opcodes contain the analysis results of the same
file. This algorithm was used to process the breaking glass (2730.orc ) and the gong
(2731.orc ) and it resulted in two interesting sound effects.
Conclusion
Additive and LPC synthesis are powerful sound design tools. The intermediate an-
alysis stage provides us with important and useful information about the inner struc-
ture of the sound we are analyzing. Given this description, we can creatively
modify and alter the structure of any sound by replacing analysis data
from one instrument with analysis data from another. Upon resynthesis we may ob-
tain many wonderful effects that can be extremely useful from a compositional and
musical point of view. These analysis/resynthesis methods allow us, for instance, to
modify the sound of orchestral instruments creating new hybrid tone colors that are
impossible to get from an acoustical instrument. Modification of the tone color of
natural sounds is, in my opinion, a great motivation for employing the computer in
a composer’s work.
We have also seen that the analysis data may be used not only to resynthesize
and cross-synthesize sounds, but also to control the parameters of other computer-generated
sounds, adding a “natural” level of complexity and bringing the synthetic and acoustic
sound worlds closer together.
Linear Prediction Coding and Additive Resynthesis are just two examples of the
many ways that Csound can be used to make new sounds. The sound design and
compositional possibilities of these ways of working can offer the imaginative com-
poser a rich and inspiring area in which to explore, cultivate and harvest.
References
Jayant, N. S., and P. Noll. 1984. Digital Coding of Waveforms. Englewood Cliffs, N.J.:
Prentice-Hall.
Kleczkowski, P. 1985. “The Methods of Digital Sound Synthesis. The Systematic Outline.”
Conference of Mathematics. OSA–85.
Roads, C. 1996. The Computer Music Tutorial. Cambridge, Mass.: MIT Press.
Vercoe, B. 1995. Csound: A Manual for the Audio Processing System and Supporting Pro-
grams with Tutorials. Cambridge, Mass.: MIT Press.
28 Csound’s Phase Vocoder and Extensions
Richard Karpen
is helpful, of course, and certain of these details will be discussed in order to shed
light on how to use this family of unit generators most effectively.
The phase vocoder process begins with the analysis of a digital audio signal. Csound
has several utility programs that operate with the -U option in the Csound command-
line. The pvanal utility, based upon the Short-Time Fourier Transform (STFT) for
the analysis of digital audio signals, is one of those programs. It is executed by the
following command-line:
csound -U pvanal input_soundfile output_pvocfile
where the input_soundfile will be the name of the digital audio file that is going to
be analyzed and output_pvocfile will be the user-given name of the phase vocoder
data file that will be created by running this utility program. The pvanal utility also
has a number of optional arguments that can help you tune the analysis process. The
specific qualities of the input_soundfile will sometimes determine how these other
arguments are used. Before going into the details of using pvanal, a general discus-
sion of how the STFT works on an audio signal and what is in the data that it pro-
duces will help in gaining an understanding of how to use this utility.
Jean Baptiste Joseph Fourier (1768–1830) was a mathematician who theorized
that any function can be decomposed into a sum of simple harmonic motion (sine
waves, for example). The basic premise of Fourier’s theory, as applied to sound, is
that complex sound waves can be broken down into a collection of pure, sinusoidal
components and that by adding these sinusoids together again one can recreate the
original sound.
This theory has been put into practice in the phase vocoder and, in general, our
ears can accept that it does indeed work for many sounds. In digital signal pro-
cessing, the Fast Fourier Transform (FFT) is used to break down a digital audio
signal and produce data that contain the frequencies and amplitudes of a collection
of sinusoidal components. By recombining these sinusoids using these data, the
audio signal can be reconstructed synthetically. To understand what is happening in
this reconstruction it might be useful to think of this process as similar to the way
that sine-tone oscillators, with different frequency and amplitude settings, can be
combined in additive synthesis to create complex sounds.
In order to find the frequencies of the components of a periodic signal and their
amplitudes, the FFT operates on segments of the digital sound samples. Each seg-
ment has an envelope applied to it, a process called windowing. The length of the
windowed segment of sound samples is called the window-size. In Csound, the win-
dow size can be left to a default value in the pvanal program, or it can be set ac-
cording to certain characteristics of the sound being analyzed.
An analogy can be drawn between moving film or video and the segmented FFT
process. Movies and video consist of “frames” of individual still pictures that, when
played back at a fast enough rate, resolve into what appears to be continuous motion.
In the phase vocoder, the FFT is used to take “snap-shots” of the sound and produce
frames of data that, when played back, will sound like continuously time-varying
sound. The FFT takes these snap-shots and transforms them from time-domain sig-
nals into data containing the average values for the frequencies and amplitudes of
individual sinusoidal components, for the time covered by the window of samples.
The FFT analysis of a succession of windowed segments of sound samples produces
a series of time-ordered frames in a process sometimes referred to as the Short Time
Fourier Transform (STFT).
A key to the understanding of how to use the analysis program is to understand
the relationship between the window-size and the accurate rendering of the fre-
quency and amplitude values of the spectral components on the one hand and the
time-varying qualities of the sound on the other. The general rule of thumb is: the
larger the window-size, the more accurate the frequency data will tend to be and
the smaller the window-size, the more accurate the time-varying aspects will be.
A compromise must be found that gives the best possible frequency analysis while
capturing, on a frame-by-frame basis, as much of the variation over time in the sound
as possible. If the frequency components in each successive window of sound are
rather stable (that is, significant changes are happening more slowly than the length
of the window), then the variation of the signal over time will be fairly accurately
represented by the changes in the values returned by the succession of analyses. But
if the window is too large and covers a selection of sound that has significant change,
the data will not contain an accurate rendering of this variation in time, since the
values in the analysis data are averaged over the entire window.
Remember that only one value for frequency per spectral component is returned
for each window of sound. So, again, the shorter the window size, the more data
frames there will be per second of sound and therefore, the more the process will
capture the moment-to-moment changes in the sound. The size of the window, on
the other hand, also determines the resolution in the frequency domain and a larger
window size will produce data with more individual components and will track their
frequencies more accurately.
There is a direct relationship between the size of the window and the number of
components derived through the analysis, which is also useful to understand. The
data in each frame returned by the analysis are separated into bins. Each bin contains
frequency and amplitude values, which may change on a frame by frame basis and
can be charted as shown in figure 28.1.
The FFT process actually returns windowsize bins of frequency and amplitude,
but only half that number will be stored as data and used in the resynthesis process.
The window sizes are always powers-of-two in the pvanal program, so, for example,
a window size of 1024 will return 1024 bins, 512 of which will be used. You can
think of these bins as narrow-band filters that are spaced evenly between 0 Hz and
the sampling-rate. Only half of the bins are used since only frequencies less than
sr/2 can be used in synthesis.
The bandwidths of the bins overlap slightly, but each bin can only track frequen-
cies that occur within its range. It is important therefore to choose window sizes that
produce enough bins to provide as dense a coverage of frequency as needed by partic-
ular sounds. A sound with harmonics spaced 100Hz apart, for example, will require
the bins to be no more than 100Hz apart. Therefore, at a sampling rate of 44100, it
will require a window size of at least 441 (or the length of one period of a 100Hz
signal). Since the window size must be a power of two, the smallest window in this
case would be 512.
In practice though, it is better to have windows that are at least two, if not four,
times the size of the longest period. Therefore, in the case above, a good choice for
the window size would be 1024 or 2048. Once again, we are reminded of the fact
that by choosing a larger window size we might end up losing crucial informa-
tion that would represent the variation of the signal in time, but these are tradeoffs
we always have to make.
To make this compromise between time and frequency more workable, the selec-
tion of samples for the STFT are longer than the increment at which the successive
selections are taken. For example, if the FFT size is 1024 samples, the first selection
would consist of samples 0–1023. The next selection, instead of containing the next
1024 samples (1024–2047), could instead start at sample 256 and contain samples
256–1279, the selection after that would contain 512–1535 and the one after that
would contain 768–1791. The size of the increment by which the selection of
samples moves through the sound is sometimes called the hop-size. In the case de-
scribed here, the hop-size is 256 while the window size is 1024. Each FFT begins
256 samples further into the sound and each FFT contains 1024 samples to do the an-
alysis. This means that we are overlapping the successive FFT analyses in order to
gain the best frequency analysis through the use of a larger FFT size, while capturing
more of the time-varying quality of the signal by moving forward through the sound
by a smaller increment.
There are two ways of specifying the relationship between the window-size and
the hop-size in the pvanal program. One allows you to specify the exact number of
samples in the window-size and the exact hop-size. The following command-line
instruction would cause the analysis to be carried out according to the above descrip-
tion. The number after the -n is the window-size, the number after the -h is the hop-
size.
csound -U pvanal -n 1024 -h 256 input_soundfile output_pvocfile
The following would do exactly the same thing. Again the number after the -n is
the window-size, but this instruction uses the -w option, which determines how many
overlapping analysis windows there will be.
csound -U pvanal -n 1024 -w 4 input_soundfile output_pvocfile
The pvanal utility program allows you to specify either the hop-size or the
overlap-amount. You can remember how to use these by memorizing the following:
hop-size = window-size/overlap
overlap = window-size/hop-size
The pvanal program will analyze only mono soundfiles or channel 1 (usually the
left channel) of a soundfile (it is possible that the analysis and the resynthesis pro-
cesses will eventually allow one to work with multiple audio channels, but as of this
writing, only one channel of analysis is possible). Finally, the data produced by the
pvanal program are stored in binary format in a file on your disk. This file can be
large; in fact, it can be larger than the soundfile that is being analyzed. So take care
where you are creating and storing these data files. The description above should
give you the information you will need to begin making sounds using the Csound
phase vocoder.
Once you have run the pvanal program as described above, you will be ready to use
the data in a Csound instrument to synthesize sound. The basic unit generator for
performing phase vocoder resynthesis is pvoc. The manual defines pvoc as:
ar pvoc ktimpnt, kfmod, ifilcod[, ispecwp][, iextractmode][, ifreqlim][, igatefn]
The argument ktimpnt is an index value of time, in seconds, into the analysis file.
If, for example, ktimpnt = .5, the pvoc opcode will select data from the frame that
represents the spectrum at .5 seconds into the original sound. Csound knows how to
find that frame because it stores information from the analysis process about how
many frames there are per second of sound. If the hop-size in the analysis were 256
and the sampling rate were 44100, then there would be approximately 172 frames
per second and we might call this the frame rate. So, ktimpnt = .5 would cause pvoc
to take its values from around the 86th frame.
Actually, the result of ktimpnt*framerate is often not an integer value. In this case,
pvoc will interpolate. For example, if ktimpnt = .6 and framerate = 172, then the
frame number would be 103.2 and pvoc will interpolate the appropriate values along
the line between the values in frames 103 and 104. By having ktimpnt change over
time, pvoc can use the successive frames in the analysis file. A simple example in-
strument shown in figure 28.2 below shows the use of the line generator to create a
changing time-pointer.
The line opcode produces a succession of values from 0 to 1 over the duration. If
p3 is 1 and the sound that was analyzed was for example a 1 second long digitized
violin sound, then a resynthesis sounding almost exactly like the original can be
achieved. If however, p3 is 2, so that it takes 2 seconds for the line to go from 0 to
1, then by using the output of that line as the time-pointer, the resynthesis would be
a version of the original that is twice as long and twice as slow as the original. The
pitch of the synthetic sound would remain exactly the same as the original. When
the passage through the analysis data is slower than the original sound, more interpo-
Figure 28.2 Orchestra code for instr 2801, a pvoc resynthesizer with dynamic time scaling.
lation is performed between the data frames. The Csound pvoc opcode will always
interpolate values at every k-period. So, even if the frame rate is 172, there will
always be kr new values per second. It might be useful to think about the data as a
series of breakpoint functions (see figure 28.3).
In this representation of the data, you can see clearly that for each bin there is a
frequency envelope and an amplitude envelope, where the time values are the x-axes
and the frequency or amplitude values are the y-axes. For some of the extended phase
vocoder unit generators, this representation will be useful to return to for reference.
The time-pointer can be derived from any number of Csound unit generators and
your own musical and programming imagination. Nor does the time-pointer have to
progress in chronological order through the entire length of the pvoc file. It can start
anywhere between the beginning and the end and it can go forward or backward.
You can use linearly or exponentially changing values, a series of line segments,
curves, or any shape that you can load into a function table (f-table). You can even
use random numbers to access the data. Below are a few examples of some of the
ways to create time-pointers. It’s worth noting that these can be used in other Csound
opcodes (lpread, sndwarp, table) that use indices into data, sound, or f-tables.
To read backward through a sound at a constant speed determined by the length
of the note:
ibeg = 0
iend = p4 ;p4 = duration of input_sound
ktime line iend, p3, ibeg ;values go from end to beginning
To read through the data exponentially from the middle of the data to the end:
ktime expon (iend*.5), p3, iend
To read forward through the data at varying speeds, where the last time-point is
specified in the score:
iend = p4
ktime linseg 0, .1, .1, p3-.2, iend-.1, .1, iend
The above example shows how you can use the time-pointer to preserve the time-
scale of parts of the sound, in this case the attack portion and the end, while altering the
timescale of other parts of the sound.
When determining the type of sound, the attack portion can be one of the most
important cues to our ears. The attacks of many sounds tend to be around the same
length, no matter how long the sound is sustained over time. Sometimes, stretching
or compressing the attack portion of a sound using pvoc can reveal wonderful new
sounds. But sometimes you might want to preserve the timescale of the attack or of
any other part of the sound and using the time-pointer with multiple line segments
or curves can allow you to move through the sound with extremely fine resolution in
time. Try making a time-pointer that stretches the attack of a sound by a large factor
but then moves through the rest of the sound at the original speed. In some sounds
this can be quite an interesting effect.
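Putting the pieces together, a complete instrument that preserves the first and last tenth of a second while stretching everything in between might look like the following sketch (the analysis file name follows the chapter's other examples and p4 is assumed to carry the analyzed sound's duration):
          instr 1    ; sketch: stretch the middle of the sound, keep the attack and ending
iend      =        p4                        ; p4 = duration of the analyzed source sound
ktime     linseg   0, .1, .1, p3-.2, iend-.1, .1, iend
asig      pvoc     ktime, 1, "hellorcb.pvc"  ; kfmod = 1, so no transposition
          out      asig
          endin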
At this point it is appropriate to discuss the effects of time-scaling upon the spectra
of the sound. Remember that the FFT analysis breaks down the original sound into
a representation of frequencies and amplitudes and that the resynthesis process can
be thought of as an additive synthesis version of the sound, using the frequencies and
amplitudes as settings for the collection of sine tones. Therefore, as you might ex-
pect, the phase vocoder analysis/synthesis process works best with sounds that have
periodic and harmonically related components.
You will notice, however, that if you analyze a sound that has unpitched parts,
such as the consonants in speech, that pvoc can do well at synthesizing those sounds
under certain circumstances. Since the resynthesis process does not include any
noise element, even the consonants must be synthesized with sinusoids. This can be
achieved because, in the noisy portions of the sound, the frequency data in each bin
will change chaotically from one frame to the next and if the window is large enough,
the frequency data will be dense enough to create a noisy signal. But the resynthesis
of the noisy part of the signal will not be as successful as you stretch the time-scale
by greater and greater amounts. Then, what you will hear are the sinusoids gliding
up and down over time, creating a dense inharmonic mess. You might be attracted to
this sound at first, but you should also be aware that it has become somewhat cliché
and it is immediately recognizable as a phase vocoder trying to synthesize noise
while significantly stretching the time-scale.
Sounds that have noise that you want to preserve and sounds that have attacks or
particular attributes that need to be synthesized at their original time scale, may re-
quire using a time-pointer such as the one above using a linseg. Used as a time-
pointer into a pvoc file, it will preserve the original speed of the first and last .1
seconds of the sound, but expand the length of time during the rest of the sound. In
some cases you may need to have functions with many breakpoints, if the sound has
many attacks and inharmonic or noisy segments. Speech and singing, for example,
can be well synthesized using the phase vocoder with carefully worked out time-
pointers that preserve the synthesis speed of the consonants.
Finally, before going onto some of the extended techniques and unit generators
based on the Csound phase vocoder, a few words about the other pvoc arguments
are in order. The kfmod argument is simply a frequency scalar on the entire spectrum
of the analysis:
ifmod = 1 ; NO CHANGE IN PITCH
ifmod = 2 ; PITCH TRANSPOSED UP ONE OCTAVE
ifmod = .5 ; PITCH TRANSPOSED DOWN ONE OCTAVE
kfreq line .5, p3, 2 ; TIME-VARYING PITCH CHANGE
As with the time-pointer, the pitch envelope can be as simple or complex as you
would like, using any of the many ways to create constant or time-varying values
in Csound.
The initialization argument, ispecwp, can be used when pitch transpositions are
being made to try to preserve the spectral envelope of the original. When you shift the
pitch up or down by a significant amount, you are also shifting the entire spectrum up
or down. Sounds that rely upon fixed formant areas in their spectra will not retain
their unique quality of sound that the formants provide. By setting ispecwp to a non-
zero value, pvoc will attempt to generalize the spectral envelope and rescale the
amplitudes across the frequency range to maintain the spectra even when the fre-
quencies of all of the components are being shifted. This effect can be heard clearly
on resynthesized speech.
ifmod = 2
ispecwp = 1 ; JUST NEEDS TO BE NONZERO
asig pvoc ktime, ifmod, “hellorcb.pvc”, ispecwp
Vocoder Extensions
The use of pvoc, as you have seen, is quite straightforward, with two basic transfor-
mations of the original sound that can be achieved—time and pitch scaling. The
data produced by the pvanal program lend themselves to a variety of other kinds of
processing however and new unit generators have been created to extend the function
of the basic Phase Vocoder. These will be discussed below and some examples will
be shown to give you some ideas of how to use these opcodes.
PVREAD
The pvread opcode is a unit generator that reads from a data file produced by pvanal
and returns the frequency and amplitude values from a single analysis bin. It is de-
fined in the manual as:
kfr, kap pvread ktime, ifile, ibin
The arguments ktime and ifile are used exactly as in pvoc, but pvread, does not
perform any synthesis using the data. Recall from figure 28.3 above how the bins can be seen as
time functions with frequency and amplitude values. The pvread opcode returns the
time-varying values for one bin as specified in the ibin argument. These values can
be used to synthesize single components, or to create your own additive synthesis
instrument using several or many pvread opcodes.
In a composition of my own, I analyzed a section of a Pibroch, a centuries old
Scottish classical bagpipe form. This is quite slow moving music with sudden leaps
of pitch, sometimes by an octave or more. These leaps of pitch are present throughout
the spectra, of course, since the fundamental changes as do all of the related compo-
nents. I used pvread to take the frequency and amplitude data from two high bins. I
transposed the frequencies down to the mid-range and used the values to control the
frequency and amplitude of two sine tones in resynthesis. Since the frequencies and
amplitudes of each bin change slightly from frame to frame, the effect was an eerie
mixture of the expressionlessness of a pure sinusoid with subtle changes in the frame
by frame data and sudden shifts of pitch as occurred in the original bagpipe recording
(see figures 28.4 and 28.5).
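Figure 28.5 gives the actual orchestra; the rough sketch below follows the same outline, with the analysis file and bin numbers taken from the figure, f 1 assumed to be a sine table and everything else illustrative.
          instr 1    ; sketch: two pvread bins drive two sine oscillators
iend      =        p4                        ; p4 = duration of the analyzed sound
ktime     line     0, p3, iend
kfr1, kamp1 pvread ktime, "hellorcb.pvc", 40 ; frequency and amplitude track of bin 40
kfr2, kamp2 pvread ktime, "hellorcb.pvc", 45 ; frequency and amplitude track of bin 45
a1        oscili   kamp1, kfr1, 1
a2        oscili   kamp2, kfr2, 1
          out      a1 + a2
          endin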
You can use your imagination to find interesting ways of using this opcode. The ex-
ample shown in figures 28.6 and 28.7 illustrates an orchestra and score that uses
pvread to access one bin of analysis data at a time and to fade in successive compo-
nents gradually. Added to this is the use of frequency modulation in which, at the
beginning, the index of modulation is large enough to produce a significant number
of sidebands but later, the index decreases until only sine tones will be produced.
Both the time-pointer and the FM index are controlled by global variables in a sepa-
rate instrument, so that all the notes, no matter when they begin, will have the same
values for time and FM index. This is one way to synchronize values between differ-
ent notes and instruments in Csound.
In this example (see figure 28.7), the sound will begin with single FM sounds at
the frequencies and amplitudes of the bins being read by pvread. By the end, when
many notes are sounding together, the FM index is too small to produce sidebands
and there will be a collection of sine tones, each with frequency and amplitude data
from a different bin. After 45 seconds the sound should be much like the original
Figure 28.4 Block diagram of instr 2810, an instrument that uses pvread to generate control
functions for the frequency and amplitude arguments of a pair of oscillators.
Figure 28.5 Orchestra code for instr 2810, a pvread controlled additive synthesis instrument.
one on which the analysis was based. Here you can see me adding new notes reading
from different bins, whenever, and in whichever order seems interesting. Of course,
this can be done much more easily if you are using some kind of higher level pro-
gramming language to produce your score files.
Figure 28.6 Block diagram of instr 2811 and 2812, a global control instrument (2811) and
a pvread controlled FM synthesizer (2812).
f 1 0 16384 10 1
i 2811 0 25 2.2
i 2812 0 25 8 500
i 2812 1 24 10 300
i 2812 5 20 12 150
i 2812 9 16 14 100
i 2812 13 12 18 100
i 2812 17 8 26 200
i 2812 21 4 43 500
Figure 28.7 Orchestra and score code for instr 2811 and 2812, featuring a globally con-
trolled FM synthesis instrument (instr 2812), with parameter data from pvread.
VPVOC
The vpvoc opcode is identical to pvoc except that it takes the result of a previous
tableseg or tablexseg and uses the resulting function table, passed internally to the
vpvoc opcode, as an envelope over the magnitudes of the analysis data channels. The
result is spectral enveloping. The function size used in the tableseg should be
framesize/2, where framesize is the number of bins in the phase vocoder analysis file
that is being used by the vpvoc. Each location in the table will be used to scale a
single analysis bin. By using different functions for ifn1, ifn2, etc. in the tableseg,
the spectral envelope becomes a dynamically changing one. For example, take an
analysis file with 512 bins (your window size would have been 1024) and you create
the f-statement:
f 2 0 512 5 1 512 .001 ; LOWPASS
The vpvoc opcode can use the successive values stored in this function to scale the
amplitudes of the 512 analysis bins. In this case, bin #1 would be scaled by 1 and
therefore remain unchanged, bin #2 would be scaled by a value slightly less than one
and would therefore be scaled down. This would continue to bin #512, which would
be scaled by .001 and would therefore virtually be eliminated from the spectra. The
effect would be that of a lowpass filter. The Csound manual defines vpvoc as:
ar vpvoc ktimpnt, kfmod, ifile [, ispecwp]
Below are a few more example functions and short descriptions of the effect they
would have upon the sound. This function allows frequencies contained in the first
100 bins to be heard, while the rest will have their amplitudes scaled to zero:
f 3 0 512 7 1 100 1 1 0 411 0 ; LOWSHELF
It acts like a lowpass filter, where the cut-off frequency is a precise demarcation
below which all of the signal will be allowed through the filter and above which none
of the signal will be heard. This kind of filtering is far more exacting than the reson filters,
which have gradual, curved slopes describing the frequency response. This one cre-
ates a bandreject filter centered around bin #256:
f 4 0 512 5 1 256 .001 256 1 ; BANDREJECT
An exponential curve is used, making the attenuation closer to that of a simple time-
domain bandreject filter.
A bandpass filter effect, with the pass band in the shape of a half-sine and the
frequency centered around bin #256, can be achieved by the following:
f 5 0 512 9 .5 1 0 ; BANDPASS
The way to use these functions to alter the spectral envelope with vpvoc is to use
the tableseg or tablexseg opcodes. These two unit generators are similar in syntax
to linseg and expseg, but instead of interpolating between values over given amounts
of time, they interpolate between the values stored in f-tables producing new time-
varying functions over time. The tableseg opcode does this linearly, while tablexseg
performs exponential interpolation. They are defined in the manual as:
tableseg ifn1, idur1, ifn2[, idur2, ifn3[, . . .]]
tablexseg ifn1, idur1, ifn2[, idur2, ifn3[, . . .]]
The arguments ifn1, ifn2, ifn3 and so on, refer to f-table numbers that have been
defined in the scorefile. Here idur1, idur2 and so on, are the amounts of time over
which the values of one table are interpolated until they become the values of the
next table. To give you an idea of how they work, consider the following simple
example:
tableseg 2, p3, 3
The result would be a changing function over time, where all of the values begin
at one and gradually, over the duration ( p3), move to zero. If the two functions were:
f 4 0 512 5 1 512 .001
f 5 0 512 7 .001 512 1
and the results were applied over time using vpvoc as described above, the sound
would begin with a lowpass filter effect and change over the duration, until the effect
would be that of a highpass filter.
You may have noticed that tableseg and tablexseg have no output. The way to use
these two unit generators with vpvoc is to make sure that they come before the vpvoc
in your instrument. See, for example, figure 28.8.
i 2815 0 10 2.28
Figure 28.8 Orchestra code for instr 2815, a vpvoc-based instrument whose resynthesis
spectrum is modified by the tables f 2, f 3 and f 4 under the control of tableseg.
The result of the above will be a sound that begins with just one sinusoidal compo-
nent, that of bin #101. Over 5 seconds (p3*.5) it will be transformed into a sound
that has many components but with a lowpass effect with all frequencies above those
in bin #256 being silenced. Then, over the next 5 seconds, a further transformation
will take place gradually giving the sound a spectral envelope like that produced by
a notch or bandreject filter.
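A simpler two-table sketch of the same pattern (not the book's instr 2815) places the tableseg before the vpvoc and morphs from the lowpass table f 4 to the highpass-shaped table f 5 given just above; the analysis file name and the p4 convention are assumptions.
          instr 1    ; sketch: a dynamic spectral envelope with tableseg and vpvoc
iend      =        p4                        ; p4 = duration of the analyzed source sound
ktime     line     0, p3, iend
          tableseg 4, p3, 5                  ; interpolate the 512-point envelope from f 4 to f 5
asig      vpvoc    ktime, 1, "hellorcb.pvc"  ; vpvoc picks up the tableseg result internally
          out      asig
          endin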
Remember that ispecwp also operates on the spectral envelope, but it works to
keep the peaks and valleys of the amplitudes of the components in the same fre-
quency range, even when the pitch is being shifted up or down. When using this op-
tion in vpvoc, the spectral warping is done after the work is done using the function
table, or the tableseg/tablexseg opcodes.
These three unit generators make it possible to combine the data from different phase
vocoder analysis files in order to create transitions from one sound to another, or to
create sound hybrids that mix attributes of the different sounds. The opcodes pvcross
and pvinterp, both of which require a previously defined pvbufread in the instru-
ment, offer two ways of performing cross-synthesis. The three unit generators are
defined in the manual as:
pvbufread ktimpnt, ifile
ar pvcross ktimpnt, kfmod, ifile, kamp1, kamp2[, ispecwp]
ar pvinterp ktimpnt, kfmod, ifile, kfreqscale1,
kfreqscale2, kampscale1, kampscale2,
kfreqinterp, kampinterp
The opcode pvbufread accesses data from a phase vocoder analysis file in the
same way that pvoc does. But instead of using the data to resynthesize sound, it
makes the data available for use in pvcross and pvinterp. These two opcodes also
access data from analysis files in the same way, using a time-pointer to retrieve the
time-varying data. The opcode pvcross simply allows one to add some amount of
the amplitudes contained in the analysis bins of one file to those of another. It allows
one to determine how much of the amplitudes of each file will be used and it allows
the amounts of the two files’ amplitudes to be changed over time.
In the example shown in figure 28.9, the pvbufread and pvcross opcodes are each
reading from separate analysis files using their own time-pointer values. The two
values using kamp determine the mix of the amplitudes that will be applied to the
frequency spectrum contained in the analysis file being read by pvcross. When kamp
is 0, then the amplitudes from file1.pvc will be multiplied by 0, while all of the
Figure 28.9 Orchestra code for instr 2816, a cross-synthesis instrument using pvbufread.
amplitudes of file2.pvc will be multiplied by 1. The result should be exactly the same
as if a simple pvoc had been used with the same file and time-pointer. But as kamp
changes from 0 to 1 over the duration, the balance between the amplitudes from file1
and file2 changes. When kamp is .5, then equal amounts of the amplitudes of both
files will be added together and then used to scale the frequency spectrum. By the
end of the note, kamp is equal to 1 and therefore only the amplitudes of file1 will be
used. Through this process the frequencies are always those from file2, so that at the
end of the note the frequency values are those of file2, while the amplitudes are those
of file1.
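The orchestra code of figure 28.9 is not shown above; a sketch consistent with this description (the filenames appear in the text, while the time-pointer envelopes and p-field usage are assumptions) might be:
instr 2816
ktime1   line  0, p3, p4                   ; time pointer for the pvbufread file
ktime2   line  0, p3, p4                   ; time pointer for the pvcross file
kamp     line  0, p3, 1                    ; amplitude cross-fade described above
         pvbufread ktime1, "file1.pvc"
asig     pvcross   ktime2, 1, "file2.pvc", kamp, 1-kamp
         out   asig
         endin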
Remember that the bins can be thought of as pairs of breakpoint functions, re-
flecting the time-variation of the frequency and amplitude values in each bin. You
can think of pvcross as a process that allows you to replace the amplitude breakpoint
functions of one sound with those of another, as well as allowing cross-fading be-
tween them.
The example shown in figure 28.10 shows how you can use the same analysis file
in the pvbufread and the pvcross opcodes. In pvcross, ktime2 is causing the frames
to be read in reverse chronological order. The amplitudes of the backward version of
the sound will be applied to the forward version of the frequencies.
As you can see, pvcross allows a specific type of cross-synthesis, using just the
amplitudes returned by the pvbufread. On the other hand, the pvinterp opcode is a
more powerful unit generator, allowing the same kind of processing as pvcross, but
with a number of other interesting possibilities. An important difference between
pvcross and pvinterp is that pvinterp will interpolate between the frequencies of
the two data files, as well as between amplitudes. While the interpolation process is
the same for frequencies and amplitudes, the effects are quite different. The instru-
ment shown in figure 28.11 illustrates how pvinterp works on the frequencies.
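Figure 28.11 itself is not reproduced here; a sketch of such an instrument, anticipating the violin and alto sax analysis files discussed below (the file duration and the unity scaling factors are assumptions), could be:
instr 2818
ktime    line  0, p3, 1.2                  ; both analyses are assumed to last 1.2 seconds
kinterp  line  0, p3, 1                    ; 0 = alto sax values, 1 = violin values
         pvbufread ktime, "violin.pvc"
asig     pvinterp  ktime, 1, "altosax.pvc", 1, 1, 1, 1, kinterp, kinterp
         out   asig
         endin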
Let’s look at a slightly more complex example. Let us assume that we have two
sounds, a violin and an alto sax. And further, let’s say the two sounds that produced
the analysis files were both 1.2 seconds long and that the violin was at the pitch of
Figure 28.10 Orchestra code for instr 2817, a cross-synthesizer that reads the amplitude
analysis data in reverse and combines it with frequency analysis data that is read in the for-
ward direction.
Figure 28.11 Orchestra code for instr 2818, a cross-synthesis instrument that interpolates
between the frequency data of both files.
c4 (middle-c), while the alto sax was a minor third higher, at the pitch e-flat4. Now
let’s take a look at some of the data in each analysis file. The two lines below show
the frequency values for bin #11 in each file covering six frames:
violin: Bin 11 Freqs 261.626 261.340 261.471 261.389 261.286 261.555
altosax: Bin 11 Freqs 311.127 311.137 311.715 311.924 311.950 311.708
When pvbufread reads through the violin file, it will interpolate between consecu-
tive values according to the value of the time-pointer. The pvinterp opcode will do
the same with the altosax file. But pvinterp uses the value of kfreqinterp to determine
a value between that returned by pvbufread and the one found by pvinterp itself. Using
the actual values of bin #11 shown above, you can see that if
kinterp were 1 in the first frame shown, then the value of bin #11 used for synthesis
would be 261.626. If the value of kinterp were 0, the value used for bin #11 would
be 311.127. If the value of kinterp were .5, the frequency for bin #11 would be half-
way between 261.626 and 311.127. Since kinterp changes over time from 0 to 1, the
frequency values will change over time from those of the alto sax to those of the
violin. And this will occur for all of the analysis bins. Notice that kinterp is also be-
ing used for the amplitudes that will cause the interpolation over the same time pe-
riod between the amplitudes of the alto sax and those of the violin.
Now let us look at the use of the other arguments. Suppose we want a transition
between alto sax and violin, as shown above, but we want both to be at the same
pitch (remember that the alto sax is at e-flat4 and the violin is at c4). You can use the
kfreqscale2 argument to bring the pitch of the alto sax down to that of the violin:
ifreq1 = 1
ifreq2 = cpspch(8.00)/cpspch(8.03)
kinterp line 1, p3, 0
pvbufread ktime, “violin.pvc”
asig pvinterp ktime, ifmod, “altosax.pvc”, ifreq1,
ifreq2, iamp1, iamp2, kinterp, kinterp
Now let us assume that the amplitudes of the alto sax were much greater than those
of the violin, so that when the interpolation is made between them, the amplitudes of
the alto sax are still much more prominent in a 50/50 interpolation (when kinterp is .5
for the kampinterp argument in this example). We can scale up the violin amplitudes:
iamp1 = 2
iamp2 = 1
If the amplitudes of the violin get louder over time and those of the alto sax get
softer and we want to maintain a somewhat more equal balance over the duration of
the sound, we might take advantage of k-rate variables for these arguments and do
something like:
kamp1 line 0, p3, 1
kamp2 expon 1, p3, 2 ; EXPON, JUST TO HAVE DIFFERENT SLOPE FOR KAMP2
Now, let us put this all together into a simple instrument (see figures 28.12 and
28.13). The argument kfmod acts to transpose the pitch after the work of ifreq1 and
ifreq2 is done. In the example below, ifmod is equal to .5. The result, when ifreq1,
ifreq2 and ifmod are put into play, will be a sound with the pitch c3 (the alto sax is
Figure 28.12 Block diagram of instr 2821, a flexible cross-synthesis instrument with dy-
namic control of time, amplitude, and pitch transposition.
Figure 28.13 Orchestra code for instr 2821 as shown in figure 28.12.
first transposed down a minor 3rd and then the whole sound is transposed down an
octave), that changes over time from a sax into a violin.
As you can see, there are many different combinations of frequency and amplitude
scaling and different ways of combining those two aspects of two analysis files. Since
all of the arguments can change over time at the k-rate, there can be innumerable
combinations with constantly changing relationships between all of the values.
While there are few limits to the kind of transitions that you can do using pvinterp,
in practice there are some limitations if you want to maintain high quality in your
sounds.
Figure 28.14 Orchestra code for instr 2822, a cross-synthesis instrument in which the
pvinterp output is used as the input to lpreson resulting in a “talking marimba/violin morph.”
One of the biggest problems, which you can hear quite well when using pvinterp,
is making transitions between sounds that have different spectral distributions. Let
us say that you want to have a transition between a pitched sound and a noisy one
(violin to cymbal). If you do this transition over a fairly short time period, it can
produce quite wonderful results. But if you stretch the time frame by using a slowly
changing time-pointer, you will hear that during the stage of interpolation that is
between the two sounds, it can sound extremely muddy and unfocused. If you try the
same thing with a variety of pairs of different sounds, it can sound nearly the same,
no matter which two sounds you are using. The reason for this is that, if the spectral
distributions are different, all of the different frequency bins might be changing in
different ways from one frame to another. The results can therefore be quite chaotic
and messy sounding.
Finally, remember that you can combine different synthesis techniques quite easily
in Csound and that there is no need to try to make one method do everything. For
example, you can use the output from pvinterp as shown above as input into an
lpreson (see figure 28.14).
Conclusion
Use your imagination to guide your experiments with this family of unit generators
and then try combining them with other techniques. You will find that you can make
original and even amazing sounds that can inspire you as you compose new musi-
cal works.
Modeling Commercial Signal Processing Applications
29 Efficient Implementation of Analog
Waveshaping in Csound
Michael A. Pocino
Many analog synthesizers and effects have interesting and characteristic sounds that
can serve as sources of inspiration for Csound instrument design. In fact, good simu-
lations of analog gear can be based directly on the behavior of the actual circuits.
This article is not, however, about circuit analysis. That would take an entire book.
Rather, the focus of this chapter will be on several specific analog waveshaping tech-
niques and their efficient implementation in Csound. These techniques will be pre-
sented in the context of a couple of example instruments, but are general enough to
have a wide variety of uses. While these approximations will not sound exactly like
a particular manufacturer’s synthesizer or effect, they will provide some of the “clas-
sic” sounds and effects of that old analog gear.
The Comparator
The basic waveshaper we will look at is the comparator. An analog comparator com-
pares a signal to a reference voltage and produces a high or low output depending
on whether the input is higher or lower than the reference. If an audio waveform
were put through a comparator, its output would look like a series of pulses. An
example of a guitar signal that is put through a comparator with a reference of 0 can
be seen in figure 29.1.
In Csound, a comparator can be made with the following function table (f-table)
and the table opcode:
acomp table asig-aref, 1, 1, 0.5
f 1 0 32 7 -1 16 -1 0 1 16 1
This is a table lookup, where the table contains a square wave that goes from -1
to 1. The origin is set at the middle of the table, where the change from -1 to 1
Figure 29.1 A guitar signal (dashed line) is fed through a comparator with reference 0. The
output is shown as a solid line.
occurs. If the index to the table is greater than 0, the output will be 1. If it is less than
0, then the output will be -1.
The reference (aref) is subtracted from the input signal (asig), so that the signal
must be greater than the reference for the total to be greater than 0. Thus, the output
of the comparator is 1 if the signal is greater than the reference and -1 if it is less.
If the reference were fixed at 0 (or simply left out), the comparator would act like a
"sign" function. This means that the output would be 1 or -1 when the input is
positive or negative, respectively. In general, the reference can change with time and
be used for a variety of applications.
One way that comparators can be put to good use is in a pulse oscillator with variable
width, a circuit found in many analog synthesizers. This differs from most typical
analog oscillators, in that it cannot be approximated with just one oscili opcode
and a corresponding f-table. An analog pulse oscillator is made with a sawtooth os-
cillator and a comparator. We can do the same thing in Csound with the instrument
shown in figure 29.2. Here, a sawtooth wave is generated at the same frequency as
the desired pulse wave, using the second wave table. It is likely that a sawtooth wave
will already be needed in an analog synth instrument, so this is sort of a freebie. The
sawtooth is put through the comparator and it can be used with a varying reference
to produce pulses of varying widths.
Figure 29.2 Orchestra and score code for instr 2901, an instrument that uses a sawtooth
waveform and table-lookup comparator to realize a pulse wave with modulatable pulse-width.
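The code of figure 29.2 is not reproduced here; a minimal sketch of the idea (the p-field layout, the LFO amplitude and the table numbers other than the comparator table are assumptions) might be:
instr 2901
apw      oscil  .4, p5, 3                  ; LFO sweeping the pulse width (f 3 = sine)
asaw     oscil  1, p4, 2                   ; sawtooth at the desired pulse frequency (f 2)
apulse   table  asaw-apw, 1, 1, 0.5        ; comparator: f 1 is the -1/1 square table
         out    apulse*p6
         endin
Here f 1 would hold the comparator square wave given earlier, f 2 a single cycle of a sawtooth and f 3 a sine for the LFO.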
In this comparator, apw is a time-varying reference that determines the pulse width
at any given time. If the reference (apw) is zero, we can see that the sawtooth will
be greater than zero and less than zero for equal amounts of time, producing a square
wave. As apw is increased, however, a smaller portion of the sawtooth is above this
reference, so the positive portion of the pulse will be shorter than the negative
portion.
Figure 29.3 shows how a changing reference changes the pulse width. Notice how
the width of the positive parts of the pulses (solid) increases as the reference (dash-
dot) increases. Generally, a low frequency oscillator is used to vary the pulse width
in a synth. And in instr 2901, the only thing the LFO changes is the pulse width, so
any LFO sounds you hear are only due to changing pulse width. More full-blown
analog synth simulations can be found in instr 2902 and instr 2903.
Comparisons and logical operations can be useful ways to implement more compli-
cated waveshaping techniques. Unfortunately, the relational and logical operators in
Csound, such as >, <, && and ||, will not work with audio rate signals. It is possible,
however, to generate the same relational and logical functions using comparators and
arithmetic. This allows the use of these functions at audio rates without having to set
the control rate equal to the sampling rate. In this case, the comparator will be gener-
ating outputs of 1 and 0 (for true and false) instead of 1 and –1 (for the sign of the
signal). In this section, the procedure for generating each of these functions will be
explained. An example of how these can be used is described in the next section.
The following f-table and table opcode make up an "a > b" function, which re-
turns 1 if it is true and 0 if it is not:
aagtb table asiga-asigb, 1, 1, 0.5
f 1 0 32 7 0 16 0 0 1 16 1
Figure 29.4 Using tables to generate the "and" function "a > b && c > d."
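The details of figure 29.4 are not reproduced above. Because the comparator now returns 1 for true and 0 for false, one simple way to combine two comparisons with arithmetic (not necessarily the exact construction of the figure) is to multiply them:
aagtb    table  asiga-asigb, 1, 1, 0.5     ; 1 when a > b, otherwise 0
acgtd    table  asigc-asigd, 1, 1, 0.5     ; 1 when c > d, otherwise 0
aand     =      aagtb*acgtd                ; 1 only when both comparisons are true
An "or" can be formed in the same spirit by summing the comparator outputs, as is done in the Schmitt trigger of the next section.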
The previously explained techniques can all be used in a guitar sub-octave generator.
The method we will use inverts the signal during every other cycle, which makes the
guitar sound as though it is playing one octave lower. When mixed with the original
Figure 29.6 Orchestra code excerpt from instr 2904 for the generation of an adaptive
“Schmitt” trigger for counting zero crossings.
guitar signal, this can give a surprisingly clean and natural sound much like a guitar
and bass playing in unison.
The first task is to determine where each period of the waveform starts and stops.
One way to do this is to find the zero-crossings of the signal. That is, every time the
signal crosses zero, one half cycle has occurred. However, a complicated signal may
cross zero too many times during a cycle, causing glitches. There is a way to compen-
sate for this; we will use what is called an adaptive Schmitt trigger.
Basically, a Schmitt trigger does not count a zero-crossing unless the signal goes
far enough past zero. For instance, if the signal is negative, it will not count as going
positive until it passes a sufficiently large positive value. It will then count as positive
until it passes a large enough negative value and so on. It ignores any part of the sig-
nal between these two thresholds, in order to avoid glitches. We can make an adap-
tive Schmitt trigger by setting this threshold value based on the power of the signal.
This way, we can count zero-crossings of loud signals without glitches, while still be-
ing able to count zero-crossings of soft signals. The output of the adaptive Schmitt
trigger will be a glitchless pulse wave that oscillates between 1 and –1. This function
can be generated by the collection of opcodes shown in figure 29.6, and examples of
their effects can be seen in figures 29.7 and 29.8.
The power of the signal is found, which is used (when multiplied by ithr) as the
threshold for the trigger. In figure 29.7, the positive and negative thresholds appear
as dash-dot lines. What follows is a logical statement, which is generated using the
techniques covered in the previous section. A simplified logical equation for agate
would be:
agate = (ain > ithr*kpwr) || (ain < -ithr*kpwr)
This equation, however, is implemented as shown in figure 29.6 with the table
opcodes and sum, in order to work with the audio rate signals. If the signal is greater
than the positive threshold or less than the negative threshold, agate is true. In
other words, any ambiguous case (between the two thresholds) is ignored. A samp-
hold unit generator is used to store the current state of the trigger. If agate is true,
Figure 29.7 A guitar signal (dashed) is passed through the Schmitt trigger. Dash-dot lines
designate the thresholds. The output (solid) is only positive or negative during positive or
negative parts (respectively) of each cycle.
Figure 29.8 The output of the adaptive Schmitt trigger (solid line) is compared to the origi-
nal guitar signal.
samphold reads in aset, which is 1 or –1 depending on the sign of the signal. The
solid and variable-width pulses in figure 29.7 represent the output of aset; agate
would be the absolute value of this function. The output of aschmitt would look like
the pulse wave shown in figure 29.8. As you can see, aschmitt goes positive the first
time aset goes positive and stays positive until aset goes negative. This new pulse
wave in figure 29.8 is clearly much cleaner than the original glitchy one we had in
figure 29.7.
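Figure 29.6 itself is not reproduced above; the following sketch shows one way the adaptive trigger just described might be assembled from comparator tables and samphold, where ain is the input guitar signal, f 1 and f 2 are the sign and true/false tables listed with figure 29.11 below, and the threshold constant ithr is an assumption:
ithr     =       .3                         ; threshold scale factor (assumed)
kpwr     rms     ain                        ; running power of the input
kthr     =       ithr*kpwr
athr     upsamp  kthr                       ; audio-rate copy of the adaptive threshold
apos     table   ain-athr, 2, 1, 0.5        ; 1 when ain is above the positive threshold
aneg     table   -athr-ain, 2, 1, 0.5       ; 1 when ain is below the negative threshold
agate    =       apos+aneg                  ; "or" of the two comparisons
aset     table   ain, 1, 1, 0.5             ; sign of the input: 1 or -1
aschmitt samphold aset, agate               ; update the held sign only while agate is true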
Once we have the pulse wave with the same period as the input from the adaptive
Schmitt trigger, we need to get a square wave of half the frequency. This can be easily
accomplished by using a “divide-by-two-counter.” Basically, the output of the divide-
by-two-counter changes sign every time the input changes from negative to positive.
And detecting a change of sign of the pulse wave is simple.
alast delay1 aschmitt
aedge = aschmitt-alast
This subtracts the last value of the pulse wave from the current value. If they are
the same, the output is zero. If they are different the output is 2 or –2, depending on
the direction of the sign change. So, aedge has a nonzero value only at the edges of
the pulse.
The tricky part is getting a square wave that will work at half this frequency. One
solution to this problem uses a phasor controlling a table.
iincr = sr/8
aphs phasor abs(aedge*iincr)
asub1 table aphs, 1, 1
Every time an edge occurs, aedge momentarily has a magnitude of 2, so the phasor
frequency becomes sr/4 for that single sample and aphs is incremented by 0.25. This
causes one period of the square wave to occur for every 4 edges of the pulse wave
(2 positive edges and 2 negative). The output of the divide-by-two-counter can be seen in figure 29.9. Each
period of the guitar waveform is contained nicely in one half period of the square
wave.
In order to invert every other cycle of the guitar waveform, we can just multiply
the original signal by the square wave. This has been done to the signal shown in fig-
ure 29.10. Notice how the period is now twice as long as before. It is easiest to see
this by finding the first positive peak, which matches up with the peak near the right
of the graph. Compare this period to that of the original waveform in figure 29.9.
This is the output waveform, which sounds much like a guitar playing one octave
lower than the original.
A working guitar sub-octave generator has been included in instr 2904.orc (see
figure 29.11). A few smoothing filters have been added, but it is basically the same
Figure 29.9 The output of the “divide-by-two-counter” compared to the input guitar signal.
Figure 29.10 The final sub-octave output is the same as the guitar input, but inverted every
other cycle. The result sounds one octave lower.
process as outlined above. An example guitar audio file (guitar1.aif) has also been
included, but you can easily process your own files with the instrument.
f 1 0 2 2 -1 1
f 2 0 32 7 0 16 0 0 1 16 1
Figure 29.11 Orchestra and score code for instr 2904, a guitar sub-octave generator
instrument.
Conclusion
These comparator-based waveshaping techniques are useful for the modeling of "classic"
and "hybrid" instruments and can be implemented efficiently. While only two example instruments have been given, the complexity of
the second should give a good background on the many ways audio rate comparators
can be used in your own designs.
Additional Information
Schematics of many effects and synthesizers can be found at the following web sites.
They may be useful if you want to analyze other circuits and implement them in
Csound.
Music Machines (http://www.hyperreal.com/music/machines) has schematics of
several commercial synths and a few home-brew projects. It also has information
about synthesizers and quite a few samples.
DMZ Schematics (http://members.aol.com/jorman/schem.html) has many schemat-
ics of commercial synths and effects, as well as home-brew projects.
References
Horenstein, Mark N. 1996. Microelectronic Circuits and Devices. Englewood Cliffs, N.J.:
Prentice-Hall.
Thomas, R. E. and A. J. Rosa 1994. The Analysis and Design of Linear Circuits. Englewood
Cliffs, N.J.: Prentice-Hall.
30 Modeling a Multieffects Processor
in Csound
Hans Mikelson
where num-a-rate and num-k-rate are the number of allocated audio rate and control
rate channels.
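The statement being described is zakinit, which appears once near the top of the orchestra and is not quoted above. As a sketch, an allocation consistent with the channel numbers used later in this chapter might read:
zakinit 30, 30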
Communication between effects is accomplished using zaw and zar. To write to
an audio channel, one uses:
zaw asig, kndx
Here, asig is the audio signal and kndx is the number of the channel.
The zawm opcode mixes the current contents of the audio channel with the new
audio data. During a chord, several instances of the pluck instrument are active at
Figure 30.1 Orchestra code for instr 3099, a stereo zak-based mixer with four input audio
channels.
the same time, so the pluck instrument uses zawm to accumulate sound. The audio
channels must be cleared every sample period, or the data will continue to accumu-
late. The mixer is the last instrument in the orchestra and is used to clear audio
channels 0–30 with the statement:
zacl 0, 30
Mixer
The mixer, shown in figure 30.1, reads from four audio channels and provides inde-
pendent gain and pan control for each channel.
Dynamics Processing
This section describes a compressor, a limiter, a noise gate, a de-esser and a distortion
effect (Lehman). These are all related effects that change the dynamics of the sound.
Compressor
A compressor is used to reduce the dynamic range of a signal. It does this by moni-
toring the level of an input signal and varying the gain applied to the output signal.
Sounds greater than a specified level are reduced in volume. The rms opcode can be
used to give a time average of the level of the input signal. The output from rms is
used to reference a table that determines the amount of gain applied to the output
signal. A post gain is usually included to restore the level of the output signal after it
has been compressed. The rms opcode does not respond immediately to changes in
level so that sudden attacks are sometimes allowed to pass. To avoid this, the original
signal is monitored and compression is applied to a time delayed copy of the signal.
In this example a delay time equal to one half of the rms averaging time is used (see
figure 30.3).
The compressor block diagram is presented in figure 30.2. The amount of com-
pression is given by the compression curve f 6. Compression levels of 2:1 and 4:1
are common.
f 6 0 1025 7 1 256 1 256 .5 513 .5
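The compressor orchestra code of figure 30.3 is not reproduced here. A fragment sketching the signal flow just described might look like this, where the p-field layout, the scaling of the rms value and the delay time are assumptions:
iscale   =      p5                          ; full-scale input level, used to normalize the rms
ipost    =      p6                          ; post gain
ain      zar    p4
krms     rms    ain                         ; level detector
kgain    tablei krms/iscale, 6, 1           ; gain from the compression curve f 6
adel     delay  ain, .05                    ; compress a copy delayed by half the rms time
         zawm   adel*kgain*ipost, p7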
Limiter
The limiter is merely a compressor with a severe compression curve. Limiters pre-
vent a signal level from going above a specified limit. Limiters commonly use com-
pression levels of 10:1 or 100:1. This can be implemented by simply using a different
compression table:
f 6 0 1025 7 1 256 1 512 .01 257 .01
Noise Gate
Noise gates are used to remove unwanted background noise and hiss from a signal.
A noise gate can be implemented by changing the f 6 table again:
f 6 13 1025 7 0 64 0 0 1 448 1 513 1
In this case, signals below a certain level are completely silent. Once they exceed
that level they are allowed to pass. Noise gates are sometimes criticized for removing
playing dynamics. To solve this problem a delayed signal is used to determine the
level and the original signal is modified and output. This results in the gate opening
just before playing begins. This technique can also be used on a compressor to pre-
vent compression of initial playing dynamics (see figures 30.4 and 30.5).
De-Esser
The next effect considered in this section is the de-esser. When a microphone is used,
certain consonant sounds such as “s” and “p” produce loud artifacts. The de-esser, a
relative of the compressor, can be used to reduce these artifacts. The de-esser moni-
tors the level of the high frequency component of the signal and applies compression
to the signal based on this level. This can be implemented by applying a highpass
filter to the input signal and monitoring the level of the filtered signal (see figures
30.6 and 30.7).
Distortion
Early electronic amplifiers were based on vacuum tubes. Vacuum tube distortion is usually
described as warmer and more musical than other types of distortion (Hamm 1973).
If a sine wave signal is passed through an overdriven tube amplifier, the resulting
waveform differs from the original in several ways. The top of the waveform becomes
flattened or clipped. The bottom of the waveform is also flattened although not as
much as the top. The duty cycle of the waveform is also shifted so that the upper part
of the curve is not the same width as the lower part of the curve. The resulting shape
is approximated in figure 30.8.
Waveshaping may be used to reshape an input waveform to resemble figure 30.8.
For slight distortion use the following table:
f 5 0 8192 8 -.8 336 -.78 800 -.7 5920 .7 800 .78 336 .8
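In instr 3013 (figure 30.10) the table is applied to the signal by waveshaping; the core of such a step, sketched here with an assumed scaling so that the full input range maps onto the table, is simply an interpolated table lookup:
iscale   =      p5                          ; expected peak level of the input
ain      zar    p4
adist    tablei ain*.5/iscale, 5, 1, 0.5    ; waveshape through f 5, index centered on the table
         zawm   adist*p6, p7                ; p6 = output gain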
Filtering Effects
Figure 30.10 Orchestra code for instr 3013, a “tube-amp” distortion processor as shown in
figure 30.9.
Figure 30.12 Orchestra code for instr 3018, a 3-band equalizer using Butterworth filters.
Equalizer
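The orchestra code of figure 30.12 is not reproduced here. A sketch of a 3-band Butterworth equalizer consistent with that caption, splitting the signal at a low and a high crossover frequency and giving each band its own gain (the p-field layout is an assumption), might be:
instr 3018
ain      zar      p4
ilofco   =        p5                        ; low/mid crossover frequency
ihifco   =        p6                        ; mid/high crossover frequency
alow     butterlp ain, ilofco
amid     butterlp ain, ihifco
amid     butterhp amid, ilofco
ahigh    butterhp ain, ihifco
         zawm     alow*p7 + amid*p8 + ahigh*p9, p10
         endin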
Wah-Wah
In this implementation, resonances derived from the vowel sounds "ahh" and "ooh" are used to add character
to the standard lowpass filter sound. The vowel “ahh” has resonances and amplitudes
of 730 Hz -1 dB, 1090 Hz -5 dB, 2440 Hz -28 dB. The vowel "ooh" has reso-
nances of 300 Hz -3 dB, 870 Hz -19 dB, 2240 Hz -43 dB. As the frequency is
swept from high to low the resonances are swept from “ahh” to “ooh.” This wah-wah
effect could be taken several steps further and developed into a full-blown "talk-box"
as in instr 3017. Check it out!
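A fragment sketching this vowel-based wah-wah (the formant frequencies are the ones quoted above; the bandwidths, mix gains and the mapping of the sweep onto the formants are assumptions) might read:
ain      zar      p4
kcf      expon    p5, p3, p6                ; sweep the cutoff from high (p5) to low (p6)
kbl      =        (kcf-p6)/(p5-p6)          ; 1 near "ahh", 0 near "ooh"
alp      butterlp ain, kcf
a1       reson    alp, 300+430*kbl, 60, 1   ; first formant: 730 Hz down to 300 Hz
a2       reson    alp, 870+220*kbl, 90, 1   ; second formant: 1090 Hz down to 870 Hz
a3       reson    alp, 2240+200*kbl, 140, 1 ; third formant: 2440 Hz down to 2240 Hz
         zawm     alp + a1*p7 + a2*p8 + a3*p9, p10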
Some theory of resonant filters can be found in chapter 10, “Modeling ‘Classic’
Electronic Keyboards in Csound," and will not be discussed here in detail. The im-
plementation shown in figures 30.15 and 30.16 uses the nlfilt opcode, which avoids
the need of setting kr = sr and also allows for simpler code. This filter is designed
to resonate for approximately the same amount of time no matter what the cut-off
frequency. Filters could also be used to model the resonances of the guitar body such
as in instr 3019.
Pitch Effects
This section describes vibrato, pitch shifting, chorus and flanging effects. All of these
effects make use of delay lines whose delay times are modulated with an oscillator.
An interpolating delay tap, deltapi, is used to allow continuous variation of the de-
lay time.
Vibrato
Vibrato can be accomplished by modulating a variable delay tap with a sine wave.
When the delay tap sweeps forward in the same direction as the signal the pitch is
lowered. As the delay tap sweeps backward in the opposite direction of the signal the
pitch is raised (see figures 30.17 and 30.18).
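Figure 30.18 is not reproduced here; a sketch of such a vibrato (depth and rate taken from assumed p-fields, with an assumed 20 msec base delay and sine table) might be:
instr 3020
ain      zar     p6
adlt     oscili  p4*.001, p5, 1             ; p4 = depth in msec, p5 = vibrato rate, f 1 = sine
adump    delayr  .05                        ; delay line long enough for the modulation
avib     deltapi .02 + adlt                 ; interpolating tap swept around a 20 msec delay
         delayw  ain
         zawm    avib, p7
         endin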
Pitch Shifting
A simple type of pitch shifting can be implemented with a variable length interpolat-
ing delay tap. The delay time is modulated with a sawtooth wave whose amplitude is
equal to the wavelength of the sound. This results in a resampling of the waveform
with linear interpolation between successive samples. Lowering the pitch results in
Figure 30.16 Orchestra code for instr 3015, a resonant lowpass design using Csound’s non-
linear filter opcode—nlfilt.
Figure 30.17 Block diagram of simple vibrato algorithm using an oscillator and a delay line.
Figure 30.18 Orchestra code for instr 3020, a variable tap-delay vibrato instrument.
cycles being discarded periodically. Raising the pitch results in some cycles being
repeated. In order to produce a good quality sample, the wavelength of the sound
must be known. In this example it is simply supplied when the instrument is called
(see figures 30.19 and 30.20).
Chorus
Chorus is an effect that attempts to make one instrument sound like more than one
instrument. The resulting sound is thicker than the original sound. Chorus can be
implemented by adding the original signal to a frequency-modulated delayed signal
(Cronin 1994). The signal is typically delayed between 20 and 30 msec. Gain is
applied to control the amount of mix between the original signal and the delayed sig-
nal. Common waveforms used to modulate the signal are sine, triangle and logarith-
mic waves (see figures 30.21 and 30.22).
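The chorus orchestra code of figures 30.21 and 30.22 is not reproduced here; a sketch whose p-field layout follows the score excerpt below (everything else is an assumption) might be:
instr 3035
irate    =       p4
idepth   =       p5*.001                    ; depth, assumed to be given in msec
imix     =       p7
idlyt    =       p8*.001                    ; base delay in msec
ain      zar     p10
amod     oscili  idepth, irate, p6, p9      ; p6 = waveform, p9 = phase
adump    delayr  .1
acho     deltapi idlyt + amod
         delayw  ain
         zawm    ain + acho*imix, p11
         endin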
Many choruses can be combined with different phases, waveforms and delay times
to produce a rich sound. A stereo chorus effect, for example, can be created by having
Figure 30.20 Orchestra code for instr 3022, a variable delay-based pitch shifter.
two choruses one quarter cycle out of phase with each other and sending the output
of each to a separate channel as follows:
; STA DUR RATE DEPTH WAVE MIX DELAY PHASE INCH OUTCH
i 3035 0 1.6 .5 2 1 1 25 0 2 3
i 3035 0 1.6 .5 2 1 1 20 .25 2 4
Flanger
Flanging was originally produced by taking two tapes with the same music on them
and playing them at the same time. By pushing on the flanges of one of the tape
reels, the playback speed of one of the copies of the sound was modulated. This
detuning of the signal resulted in areas of constructive and destructive interference as
the different frequencies moved in and out of phase with each other, producing
notches in the audio spectrum. As the frequency of the modulated signal was swept
back and forth, these notches moved closer together and farther apart producing the
characteristic “jet airplane” effect.
In this implementation, shown in figures 30.23 and 30.24, the original signal is
added to a delayed signal. The delay time is modulated by a sine wave, so that the
pitch of the delayed signal is modulated. The combined signal is then fed back into
the beginning of the delay path, which makes a more pronounced flanging effect.
Typical delay times are 10 msec.
A stereo flanger can be implemented by running two flangers at one quarter cycle
out of phase from each other and sending each to a separate channel as follows:
; ST DUR RATE DPTH WAVE FEEDBK MIX DELAY PHASE INCH OUTCH
i 3030 0 1.6 .5 1 1 .8 1 1 0 2 3
i 3030 0 1.6 .5 1 1 .8 1 1 .25 2 4
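As with the chorus, the flanger orchestra code of figure 30.24 is not shown; a sketch following the p-field layout of the score lines above, with the combined signal fed back into the delay line, might be:
instr 3030
idlyt    =       p9*.001                    ; base delay, assumed to be in msec
ain      zar     p11
amod     oscili  p5*.001, p4, p6, p10       ; depth, rate, waveform, phase
adump    delayr  .05
aflg     deltapi .001 + idlyt + amod        ; keep the tap at least a millisecond long
         delayw  ain + aflg*p7              ; p7 = feedback amount
         zawm    ain + aflg*p8, p12         ; p8 = mix
         endin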
Miscellaneous Effects
This section describes a digital delay, a panner, a tremolo effect and a simple reverb
effect.
Stereo Delay
This section describes a stereo delay with cross feedback (see figures 30.25 and
30.26). The delayr and delayw opcodes provide a straightforward implementation
of this. The right and left channels are delayed independently. The delayed signal
from each channel may be mixed with the original signal in either channel.
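Figure 30.26 is not reproduced here; the essential structure of such a cross-feedback delay (the p-fields and the init workaround that allows the right tap to feed the left line are assumptions) might be:
instr 3040
atapR    init    0                          ; allows cross-feedback before the right tap exists
ainL     zar     p4
ainR     zar     p5
adumpL   delayr  p6                         ; left delay time
atapL    deltap  p6
         delayw  ainL + atapR*p8            ; p8 = cross-feedback amount
adumpR   delayr  p7                         ; right delay time
atapR    deltap  p7
         delayw  ainR + atapL*p8
         zawm    ainL + atapL*p9, p10       ; p9 = delayed-signal mix
         zawm    ainR + atapR*p9, p11
         endin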
Tremolo
Panner
Reverb
All sound produces some type of reverberation as the sound waves are reflected and
absorbed on surfaces in the listening environment. The type of reverb depends on
the size, shape and material of the area in which the sound is produced. A concert
hall can produce a rich spacious reverb. Artificial reverb is often added to signals to
make them sound as if they were generated in a specific type of area, such as a
concert hall. Reverberation can be simulated by using a combination of allpass fil-
ters, comb filters and delays. Csound provides the nreverb opcode for generating
simple reverbs. This is used in the instrument shown in figure 30.31 to create a simple
Figure 30.26 Orchestra code for instr 3040, a stereo delay with feedback.
reverb. There are many excellent examples of reverbs available in the Csound ar-
chives, which can be modified to work with the system presented in this chapter.
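Figure 30.31 is not reproduced here; the heart of such an instrument is a single nreverb call, roughly as follows (reverb time, diffusion and the dry/wet mix taken from assumed p-fields):
instr 3045
ain      zar     p4
arev     nreverb ain, p5, p6                ; p5 = reverb time, p6 = high-frequency diffusion
         zawm    ain*p7 + arev*p8, p9       ; dry/wet mix onto an output channel
         endin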
Conclusion
I hope that this section has provided insight into the theory and implementation of
many of the most popular sound effects and has provided inspiration for further audio
experiments. Some further ideas to try would be to use different waveforms with the
pitch-based effects. You might try calling the chorus routine many times with differ-
ent parameters and waveforms, to produce a dense chorusing effect. Try setting up
different types of distortion and then devise an instrument to oscillate between them.
Add attack and decay envelopes to the dynamics effects. And if you have a fast com-
puter, one of the first things you may wish to do is to implement real-time input and
output for the effects. Effects processing is an important dimension of the sound
design process; I hope these instrument models help you get further into it.
Figure 30.31 Orchestra code for instr 3045, a simple reverb instrument.
References
Hamm, R. O. 1973. “Tubes versus transistors—is there an audible difference?” Journal of
the Audio Engineering Society.
31 Extending Csound
John ffitch
As you begin to master the Csound system, there may come a point when you wish
it could do more and a desire may grow to change and extend Csound. In this chapter,
a number of pointers are given to start you on this endeavor. Granted, there is a great
deal to learn before declaring oneself an expert, but there is no better way than to
read the code and improve it.
Csound was originally conceived, designed and coded so that new opcodes could
be added by users who were not necessarily experts in all the workings of the system.
In practice this facility has been less used than was expected, possibly because the
instructions were not sufficiently explicit. Nevertheless, the facilities are there for
anyone with a sense of adventure and an idea for a new opcode. This chapter de-
scribes the various stages that are necessary to add one’s own operations to Csound,
drawing on examples of code that have been added by programmers other than the
original Csound author.
In adding a new opcode to Csound, there are five components that ideally should be
provided. They are:
1. New information for entry.c to define the syntax of the opcode
2. A header file (.h) to define the structure—how the opcode passes information
3. The actual C code (.c) that implements the opcode
4. Documentation and a manual page for the opcode
5. An .orc and .sco example to show how to use your new opcode
The purpose of the first of these is to make the word used to designate the opcode,
known to Csound; to specify the number and types of the opcode’s input arguments;
and to specify the number and type of the output arguments. These requirements are
achieved by a single line in a table defined in the entry.c file. Each line of this table
gives the complete information for a single opcode.
There are eight entries in the line. The first gives the name of the opcode as a
string. For example, in the line for Csound’s value converter for generating non–
12-note equal-tempered scales, the string cpsxpch introduces the word into the
language.
{ "cpsxpch", S(XENH), 1, "i", "iioo", cpsxpch, NULL, NULL },
The second field defines the size of the data area—how large is the argument and
workspace. The S is a macro for sizeof and the name XENH is the name of the struc-
ture. This is explained in more detail in the next section.
The third field indicates when in performance Csound needs to act on this opcode.
There are three possibilities, note initialization time, control rate and audio rate.
These are coded as the numbers 1, 2 and 4 and the third field is the addition of the
necessary times. In this example the number 1 indicates that it is only required at
note initialization time. The number 5 would indicate that this opcode is usable at
both initialization and audio time (i.e., 1+4).
The output or “answer” of the opcode is coded in the fourth field. The possible
types are i, k, or a, for the i-value, k-value or a-value. It is possible to have no answer
(an empty string), or values like “aa” to indicate stereo audio answers. Other values,
such as m, d, or w are possible but are not common.
Similarly, the fifth field codes the input arguments. In addition to the codes already
seen, it is possible to use the following codes:
S String
h Optional, defaulting to 127
j Optional, defaulting to -1
m begins an indefinite list of iargs (any count)
n begins an indefinite list of iargs (nargs odd)
o Optional, defaulting to 0
p Optional, defaulting to 1
q Optional, defaulting to 10
v Optional, defaulting to .5
x a-rate or k-rate
Care does need to be taken with optional arguments. They must be at the end and
they are interpreted strictly left to right. Any missing argument must be a right-most
one. In the example here “iioo” means two compulsory initialization time arguments
and two optional arguments, which, if missing, take the value zero.
The remaining three entries in the table are the functions that are called at initial-
ization, at control-rate and for audio-rate activities. If any of these is absent then a
NULL is used, but the third argument has already indicated which of these are to be
used. These functions are treated as functions of a void* argument (which will be
the argument and variable structure whose size is given in the second field) which
return no answer.
In order to complete the changes to entry.c, you need to add prototypes for any
new functions, for example for the cpsxpch opcode, whose entry is given above, you
will find:
void cpsxpch(void*);
The final stage is to include the header at the start of the file. This defines the new
structure in order that the size can be known.
The main requirement of the header file is to define the data area for the opcode.
This data area is used for the arguments and the “answer,” as well as any local vari-
ables that are needed. Arguments and the answer(s) are addresses—where to find the
floating point number that is the value, or where to find the vector of length ksamps
in the case of audio arguments or answers. If we consider the opcode above,
cpsxpch, we see that it has a total of 5 answers and arguments, so there will need to
be 5 fields of type float*. In fact, the structure is:
typedef struct {
OPDS h;
float *r, *pc, *et, *ref, *cy;
} XENH;
The OPDS field is needed for organizing the collection of opcodes that make an
instrument. For example, it includes chaining of opcodes together so the next perfor-
mance step can be found and the next initialization step. If one of the arguments is a
string it is coded within this header. For most purposes, the user or extender of
Csound need not worry about this header information, except to remember to include
it at the start of every opcode data definition.
Following the header there are the five pointers to floats that we expected. Notice
that there is no intrinsic difference between input and output values, or between
single values and audio vectors. It is assumed that the programmer is sufficiently
skilled to use the variables correctly.
If the algorithm requires any values that are determined at initialization time and
used at performance time, then they should be inside this structure. These additional
values can be of any form or type.
A different example is from the nonlinear filter opcode nlfilt.
typedef struct {
OPDS h;
float *ar; /* The output */
float *in, *a, *b, *d, *C, *L; /* The parameters */
AUXCH delay; /* Buffer for old values */
int point; /* Pointer to old values */
} NLFILT;
Here the opcode has six arguments and generates audio output. As part of the
algorithm it is useful to have an offset into a buffer. The offset is maintained between
successive calls to the performance function so is coded in the structure. An integer
value is sufficient. The buffer however is included via the AUXCH structure, which
is designed for just such a purpose. There is a function, auxalloc, that allocates space
within this structure in a fashion where space is not lost.
The Code
The code embodies the algorithm for the opcode. There are usually two functions.
The first is called at opcode initialization time. Its purpose is typically to check the
arguments, calculate important constants and to initialize performance values and
buffers.
The creation of buffers is important for delay-lines or similar algorithms. Csound
has support for such buffer creation with the structure AUXCH and the allocation
function auxalloc. This function is given the number of bytes required and the ad-
dress of an AUXCH structure. It returns after allocating a buffer of sufficient length.
In the process, it records it so the space can be recovered and undertakes other house-
keeping activities. It is strongly recommended that this mechanism is used rather
than some private method. In the case of the nonlinear filter mentioned above, the
initialization is:
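(The original listing is not reproduced here; the following is a sketch consistent with the description below, where MAX_DELAY is a placeholder for whatever buffer size the real code computes.)
void nlfiltset(NLFILT *p)
{
    /* create the delay buffer only if an earlier note has not already done so */
    if (p->delay.auxp == NULL)
      auxalloc((long)MAX_DELAY * sizeof(float), &p->delay);
    p->point = 0;     /* start at the head of the buffer */
}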
The first test checks to see if there is already a buffer defined from an earlier use
of the structure and if so, it is unnecessary to create a new one. If this test were
omitted, the only loss would be in time.
The second function is the one to be called every control period. It can be called
either entirely for control or for audio-rate. The latter is the more informative. The
structure of the function is usually:
void perform_opcode(STRUCTURE *p)
{
int nsmps = ksmps;
...other declarations...
...read saved values from structure...
do {
....calculate audio values....
*ar++ = newvalue;
} while (--nsmps);
...write values back to structure...
}
It is worth remarking that the efficiency of the loop in this part of the opcode is
what has the largest effect on the efficiency of the opcode in performance. The usual
programming advice: “avoid recalculation; avoid structure access; and avoid function
calls” cannot be repeated too often.
Of course, the code may include additional functions, or make use of other parts
of Csound.
Each opcode should have a page of documentation, in the same style as the Csound
Reference Manual. This ensures that other users can benefit from the work you have
done in creating the opcode. In fact, the new hypertext manual is such that it can be
easily extended by all opcode creators.
Finally, a small and simple example may significantly help others understand what
the opcode does and how it could be used. For this reason, opcode designers are
strongly encouraged to produce working examples as models for study.
When a table is created at performance time (or indeed at compilation time as a result
of the ftgen opcode), the action of Csound is first to check that the table number is
not already in use and then, to allocate the space for the table. After this, the table is
filled by one of the more than 20 Csound GEN routines.
If you want to add a new way of initializing a table, all that is required is to write
a function to do the writing of the data and then to extend the table called gensub
defined in fgens.c. You also need to increase the constant GENMAX. The
function would look something like the following form:
void gennew(void)
{
float *fp = ftp->ftable;
float *pp = &e->p[5];
int nvals = nargs;
........
}
This fragment demonstrates where the function can find the allocated table (FUNC
*ftp) and the arguments to the table (EVTBLK *e). If the method uses a soundfile, it
should be opened with the function sndgetset(SOUNDIN *) and read with getsndin(int,
float *, long, SOUNDIN *). For further suggestions it is probably best to read the
file fgens.c.
In the following idealized scenario, it is assumed that a new opcode called chaos is
to be added and this has been correctly coded in the files chaos.h and chaos.c. This
new opcode is assumed to be one to generate audio results from one initialization
value (that is, one i-value) and two control-rate values (two k-values), the second of
which is optional and defaults to zero.
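Although chaos.h is assumed to exist already, it is worth sketching what its data structure would look like, following the pattern shown earlier in this chapter (the field names here are our own):
/* file: chaos.h */
#include "cs.h"

typedef struct {
        OPDS    h;              /* standard Csound opcode header        */
        float   *ar;            /* the a-rate answer                    */
        float   *istart;        /* the single i-rate argument           */
        float   *k1, *k2;       /* the two k-rate arguments             */
        float   state;          /* internal value kept between k-passes */
} CHAOS;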
The first action is to edit entry.c in three places:
Step 1: Include Header—At the top of the file there are a number of lines like:
#include "butter.h"
#include "grain.h"
#include "grain4.h"
#include "cmath.h"
Step 2: Add Prototypes—following the includes, entry.c contains a list of function
prototypes. To the end of this list, add the two functions, which we will assume are:
void chaosset(void*), chaos(void*);
Step 3: Add Syntax—about 3 lines from the end of the entry.c file it is necessary
to add the definition of the opcode and its arguments for the parser. The new line
will look like:
{ "chaos", S(CHAOS), 5, "a", "iko", chaosset, NULL, chaos }
};
There will be an initialization call and a function to call on every control period (as
specified by the 5) and the two strings encode the answer and argument type.
Step 4: Add to the System—move the files chaos.c and chaos.h into the base
Csound source directory and then add these to the Makefile or the compiler project.
Assuming this has not caused any problems, compile Csound with the compilation
method for your system, whether that is an integrated environment or a Makefile.
Any compilation errors are user errors at this stage and it is beyond the scope of this
chapter to assist in debugging your code. We will assume that all is well.
The only remaining stage is to test the opcode, using the simple example that you
will be supplying. Check that the documentation is in the correct format to be added
into the manual and then you are ready to disseminate your contribution to the world.
There are a number of different platforms that support Csound, with the same func-
tionality. Indeed that is the power of the system; orchestras developed on one plat-
form can run on others. The following description comes from the PC and UNIX
sources.
The main engine contains the files that do the general organization and conduct the
performance. There are also headers that provide structures throughout the system:
Prototyp.h ANSI C Prototypes
cs.h The main header file
Main.c Argument decoding and organization
Insert.c Insertion of playing events
Insert.h
Musmon.c Control of performance
midirecv.c MIDI message receiving; the MIDI Manager
memalloc.c Memory allocation
memfiles.c Utility; Loading files into memory
sysdep.h System dependent declarations
version.h Current version number and string
filopen.c Utility; Opening files
opcode.c Write list of opcodes
auxfd.c Utility; Various
getstring.c Localisation of messages to languages
text.h
natben.c Utility for byte order
one_file.c .csd format input
argdecode.c command line arguments
Soundfile Input/Output
Utilities
cscore.c Cscore
cscore.h
cscorfns.c Cscore support
cscormai.c
cvanal.c Convolve analysis
extract.c Extract parts of a score
lpanal.c Linear Predictive Coding analysis
lpc.h
hetro.c Hetrodyne Analysis
pvanal.c Phase Vocoder analysis
Opcode Implementation
3Dug.h
aops.c Arithmetic
aops.h
butter.c Butterworth filters
butter.h
cmath.c Mathematical opcodes
cmath.h
convolve.h Convolution
cross2.c Cross synthesis
dam.c Dynamics limiter
dam.h
diskin.c
diskin.h
disprep.c
disprep.h
dsputil.c DSP utilities for opcodes
dsputil.h
dumpf.c Dump of k-values
dumpf.h
fft.c Fast Fourier transforms
fft.h
fgens.c Table generators
fhtfun.h
filter.c Filter2 opcode
filter.h
complex.c Support code for filter
complex.h
ugens5.c
ugens5.h
ugens6.c
ugens6.h
ugens7.c
ugens7.h
ugens8.c
ugens8.h
ugens9.c
ugens9.h
ugensa.c FOG opcode
ugensa.h
ugrw1.c Zak system
ugrw1.h
ugrw2.c printk
ugrw2.h
vdelay.c vdelay opcode and echos
vdelay.h
vpvoc.c Sound morphing via Phase Vocoder
vpvoc.h
wavegde.c Waveguide plucked string
wavegde.h
biquad.c Various filters
biquad.h
physmod.c Physical model opcodes
bowed.h
brass.h
clarinet.h
flute.h
mandolin.c
mandolin.h
onepole.h
onezero.h
vibraphn.h
physutil.c Support code for physical models
physutil.h
fm4op.c FM synthesis models
fm4op.h
dcblockr.c DC blocking
dcblockr.h
modal4.c Modal physical models
modal4.h
marimba.h
moog1.c Mini Moog emulation
moog1.h
32 Adding New Unit Generators to Csound
Marc Resibois
Csound is an outstanding tool. From the off-the-net version we have the means to
realize a myriad of applications. But still, sometimes we need to optimize a special
purpose algorithm, or sometimes the particular synthesis technique that inspires us
is not yet implemented. Fortunately, Csound was designed to allow users to expand
the language by adding their own opcodes to the basic set.
In this chapter we will learn how to add new functionality to the program by creat-
ing our own processing units and then compiling them into the Csound language.
First, we will look at the steps involved in building a user-extended Csound from the
source code. Then, we will build our own simple unit generator.
Along the way, we’ll see the steps and issues involved in this procedure and focus
on those issues specific to the Csound environment. Finally, we will build a more
complicated opcode that acts as a general purpose compressor/expander unit. This
example will allow us to go over the various steps involved in converting a desired
unit generator’s behavior into actual code that is implemented in Csound.
Quite obviously, this chapter relies on C programming, the language in which
Csound was written. Although it is not absolutely required to fully understand this
language in order to follow the basics of this chapter, it is assumed that the reader
has at least some knowledge of C.
Before even thinking of adding any code to the Csound environment, the first task is
to make sure you can get the existing code to compile. Although this might sound
trivial, it is not always an easy task. If you are familiar with programming, this will
probably not take more than a quarter of an hour, but for novices, it can be hard to
understand what is going wrong if complications arise.
The compilation process is the action that will turn source code (the program text)
into a program that will run on your computer platform. Basically, compilation in-
volves two phases: the first is to individually compile each of the required pieces of
source code (representing the different functionalities of the software) and turn these
into object code. Then, all the objects are linked together in order to form a single
Csound executable (see figure 32.1).
Compiling for a given platform implies a few strategic decisions. First, you have
to select which part of the code is needed for your machine. And even though Csound
is known to be cross-platform compatible, the way you address a sound output device
on a Silicon Graphics machine is quite different from how you do it on a Macintosh.
For each of these platforms, you will need to select a different source code file, im-
plementing the same functionality for the different hardware.
Then you have to activate the right compiler options. The way to specify some
actions may differ from compiler to compiler and you need to select the right ones.
When it comes to making these choices and determining the correct compiler set-
tings, it is worth noting that all platforms are not equal. For instance, if the source
code you grabbed is destined for a PC or a Mac, it is likely that it will contain a
project file that, assuming you use the same compiler as the creator, already contains
all the settings needed. The only thing you have to do is launch your compiler, open
the project file and press the compile button. If, on the other hand, you work on
UNIX machines, you will have to cope with the dreaded makefile and edit it in order
Figure 32.1 An illustration of the compilation and link phases of the compilation process.
to suit your needs. A complete discussion of the makefile syntax is beyond the scope
of this chapter. But if you are a novice, I would recommend that you read some
documentation and try a few little examples before trying to attack the Csound
configuration.
The addition of opcodes to Csound is not complicated, once the basic principles are
known. Generally, the most difficult thing is coming up with a synthesis or pro-
cessing algorithm that provides relevant and useful results, not its interfacing with
Csound. Throughout this section, we will see how to convert a simple process into a
Csound opcode that can be used in an instrument definition. This new opcode will
only generate one single pulse of a given amplitude. In Csound terms, we would
define the following opcode:
aout impulse iamp
Just writing the code from scratch and without worrying about Csound, we might
end up with the following bit of C code:
void impulse(float amplitude, float *buffer, int bufferLength)
{
int i;
buffer[0] = amplitude;
for (i = 1; i < bufferLength; i++) {
buffer[i] = 0;
}
}
Now, starting from this piece of code, we will need to make it into something
compatible with Csound. The first conflict with the code above is that Csound is
oriented toward real-time output. Although this has other implications, the main con-
cern here is that Csound cannot afford to compute the whole content of the sound
buffer corresponding to the rendering of an instrument and then mix it with the result of the
others. Since a note can have an arbitrary length, it has to cut its duration into small
chunks that are processed, one after the other, and sent regularly to the output device
that is selected—either DAC or file.
This means that, if an instrument, consisting of a collection of opcodes (let us
suppose opcode 1, opcode 2, . . . , opcode n), is allocated for a given time, Csound’s
sequence of operations doesn’t allow all the opcodes to successively compute their
entire result. Rather, Csound computes some small block of each opcode in succes-
sion. From an implementation point of view, this has a quite big impact. Indeed, our
main processing routine, instead of being called once to do all the computing, will
be called several times and will have to act on a specified time slice. This means that
we will need to be able to recover the state of the processing from one call to the
next, so that we can continue computing from the place we left off the last time
we called.
Getting back to our example above, this means that each time the code is
called, we will have to check whether the chunk currently being processed is the first
one or not. If it is, we generate the impulse; otherwise, we just fill
everything with zeros. In C code, this can lead to something like:
for (i = 0; i < chunkSize; i++) {
    chunk[i] = 0 ;
}
if (isFirstTime) {
    chunk[0] = amplitude ;   /* the impulse goes into the first sample of the first chunk */
}
Note that in Csound the size of those chunks is actually determined by the ini-
tialization variable ksmps, which tells us how many a-rate samples there are in one
k-rate period. Essentially, this means that our opcode will be called once per k-rate
pass. This variable is known globally and can be accessed by including the cs.h file.
Since we need to associate some internal state with our opcode, we will have to
initialize it at some point. This means that, for each opcode, we need to create at
least two routines. The first is an initialization routine, called by the kernel at
each new note of an instrument; the other is a processing routine, called to
process each chunk of ksmps samples. In our case, we will name them impulseinit
and impulse. Both of them have to be declared as receiving one argument, a pointer
to an opcode data structure. This opcode data structure is a collection of
information relevant to the particular opcode and is used to communicate with
Csound. In our case, we will call this structure IMPULSE and we can thus declare
our routines as:
/* file: impulse.c */
void impulseinit(IMPULSE *p) {
    /* init code here */
}

void impulse(IMPULSE *p) {
    /* processing code */
}
Figure 32.2 The output result is computed as a succession of chunks (output chunk 1 through output chunk n).
Let us get into the details of what the IMPULSE structure must contain. It will act as
a communication channel between the Csound kernel and our opcode. It is through
this structure that we receive the current values of the input(s) and send back the
output resulting from our processing. This is also the place where we will store the
internal state of our opcode, in order to remember where we are during the chunk-by-
chunk processing. The other option would be to store our opcode state in global
variables, but this would lead to problems as soon as more than one instance of the
opcode is working at the same time (e.g., when playing two simultaneous notes). Since
a new copy of the structure is created for each new instrument note, it is the perfect
place to store anything that might be needed to keep track of our opcode's internal
status.
To create this opcode data structure, we need to follow some rules. First, it has to
start with an OPDS part. This is something we do not really need to know much about;
it is simply needed by Csound to handle the invocation and registration of the various
instruments in a uniform way. Since the definition of OPDS is located in cs.h, we
will have to include this file at the beginning of our header file:
/* file: impulse.h */
#include "cs.h"

typedef struct {
    OPDS h ; /* Csound opcode common part */
    ...
} IMPULSE ;
Then, we have to declare variables for the output and input (in that order). Note
that in Csound, a parameter value is always communicated through a pointer to a
float. This is because Csound wants to handle a-rate, k-rate and i-rate variables in
a uniform way.
Remember that each time the opcode is called, we have to process one k-rate step
and ksmps a-rate steps. This means that any a-rate parameter will be passed as a
buffer of ksmps values, while any i- or k-rate parameter will be passed as a single
value. Using pointers gives a common interface to both situations. In our case, we
have one a-rate output and one i-rate input. This gives:
typedef struct {
    OPDS  h ;      /* Csound opcode common part */
    float *aout ;  /* Declare the output first */
    float *iamp ;  /* Input value */
    ...
} IMPULSE ;
After those declarations, we can add whatever variables we need for the opcode to
function properly. In our case, all we need to know when the processing routine is
called is whether this is the first time the opcode has been called or not. We will
keep track of this by adding a flag variable to the IMPULSE structure that will be
properly initialized in impulseinit. Our impulse.h file finally looks like
the following:
/* file: impulse.h */
#include "cs.h"

typedef struct {
    OPDS  h ;           /* Csound opcode common part */
    float *aout ;       /* a-rate output (declared first) */
    float *iamp ;       /* i-rate input amplitude */
    int   isFirstTime ; /* set in impulseinit, cleared once the pulse is generated */
} IMPULSE ;
Now, having all the pieces together, we can easily write the routine’s content. Our
impulse.c file contains the following lines:
/* file: impulse.c */
#include "impulse.h"

/* Initialization code */
void impulseinit(IMPULSE *p) {
    p->isFirstTime = 1 ;
}

/* Processing code: fills ksmps output samples at each k-rate pass */
void impulse(IMPULSE *p) {
    int i;

    for (i = 0; i < ksmps; i++) {
        p->aout[i] = 0.0 ;
    }
    if (p->isFirstTime) {
        p->aout[0] = *(p->iamp) ;
        p->isFirstTime = 0 ;
    }
}
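Before wiring the opcode into Csound, it can be reassuring to exercise the chunk-by-chunk logic in isolation. The following stand-alone C sketch is not part of Csound at all: it redeclares a simplified structure (IMPULSE_SKETCH), uses a fixed KSMPS constant in place of Csound's global ksmps, and simply mimics the kernel's calling convention of one initialization call per note followed by one processing call per k-cycle.

/* Stand-alone sketch, for illustration only: not part of Csound. */
#include <stdio.h>

#define KSMPS 8                 /* stand-in for Csound's global ksmps */

typedef struct {
    float *aout;                /* a-rate output buffer (KSMPS samples) */
    float *iamp;                /* i-rate input amplitude */
    int    isFirstTime;         /* internal state, as in impulse.h */
} IMPULSE_SKETCH;

static void impulseinit_sketch(IMPULSE_SKETCH *p) { p->isFirstTime = 1; }

static void impulse_sketch(IMPULSE_SKETCH *p)
{
    int i;
    for (i = 0; i < KSMPS; i++)
        p->aout[i] = 0.0f;
    if (p->isFirstTime) {
        p->aout[0] = *(p->iamp);
        p->isFirstTime = 0;
    }
}

int main(void)
{
    float out[KSMPS], amp = 0.5f;
    IMPULSE_SKETCH p;
    int k, i;

    p.aout = out;
    p.iamp = &amp;
    impulseinit_sketch(&p);             /* "new note" */

    for (k = 0; k < 3; k++) {           /* three k-cycles */
        impulse_sketch(&p);
        for (i = 0; i < KSMPS; i++)
            printf("%g ", p.aout[i]);   /* the pulse shows up only in cycle 0 */
        printf("\n");
    }
    return 0;
}

Compiled and run, this prints the amplitude in the very first sample of the first chunk and zeros everywhere else, which is exactly the behavior the real opcode should exhibit inside Csound.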
The last step is to let Csound know that our opcode exists and what its syntax is.
The list of all available opcodes is defined in a file called entry.c; this is where we
will have to add ours. Since this file must know about our code, we need to reference
both our opcode data structure and our routines in it. This can all be done by
including our impulse.h (containing both the IMPULSE definition and our function
prototypes) at the beginning of the file. All opcodes are declared by means of a big
array called opcodlst. For each opcode declared in this array, we have to declare the
following fields:
■ The name of the opcode in the Csound language [impulse]
■ The size of the opcode data structure [sizeof(IMPULSE)]
■ When the opcode can be called [i- and a-rate]
■ The rate(s) of the output variables [a]
■ The rate(s) of the input variables [i]
■ The initialization routine name [impulseinit]
■ The k-rate processing routine name [none]
■ The a-rate processing routine name [impulse]
This gives us the following line to be added to the array (the value 5 is a bit mask
indicating when the opcode is active: 1 for the init pass plus 4 for the a-rate pass,
and the 0 stands for the missing k-rate routine):

OENTRY opcodlst[] = {
    ...
    ...
    { "impulse", S(IMPULSE), 5, "a", "i", impulseinit, 0, impulse }
} ;
Now that our code is complete, we have to produce a new Csound executable. First
we add the file containing our opcode (impulse.c) to the list of files used for the
compilation (by editing the makefile, or through the project-builder feature of your
compiler) and ask for a recompilation of the modified files (impulse.c and entry.c).
If this compilation is successful, we now have a customized version of Csound that
understands our new opcode.
In this section we will construct a new opcode whose goal is to modify the amplitude
of a signal as a function of its power content, a compressor/expander-type unit. It
will help us crystallize the notions we have just learned and will demonstrate how
we can start from a desired behavior and wind up with an actual Csound implementa-
tion. Note that, although the flow of this chapter might suggest the opposite, getting
an opcode right is not trivial. Usually, one has to make some suppositions about the
opcode's behavior and validate them through a number of model implementations. It
regularly turns out that some assumptions were not correct and that the model has to
be revised, reimplemented and retested. As I said earlier, the most difficult part of
adding relevant code to Csound is the modeling itself.
The first thing to do is to analyze what we want to model. In our case, we would like
to build a single opcode that can be used to model a number of compression/expan-
sion/limiting and gating effects found in many commercial recording studio proces-
sor boxes. In order to picture the common attributes of this family, we will take a
look at them one after the other:
■ Compressors are used when one wants to reduce the amplitude of the peaks found
in a signal, thus making it more uniform. This is achieved by lowering the ampli-
tude of the sound when it is too loud and leaving it unchanged otherwise. A
compressor can also be used to smooth the transient attack of an acoustic signal such
as a bass guitar, for instance, while giving more emphasis to its release.
■ Expanders are used to increase the dynamics of a signal's tail. They work by keeping
the signal unchanged when it is loud and amplifying it when it is quiet. An expander
can be used to lift an instrument whose release is too fast (on guitar strings or
drums, for example) to achieve a hold effect.
■ Limiters ensure that the level of the output signal will not exceed a given value
(to avoid hard clipping in digital recording systems, for example). This effect is
obtained by strongly reducing the signal's amplitude when it is above the given limit
and keeping it unchanged when it is under.
■ Noise gates are used to remove static or system noise when no musical sounds are
being produced. This is done by setting the output gain to zero when the sound is
under a low threshold level and by leaving the signal unchanged when it is above
this threshold.
From this enumeration, we can deduce that what we want from our opcode is a
processor that modifies the amplitude of the sound fed into it. In fact, each of the
four algorithms described above uses a threshold value to divide the amplitude
range of the input signal into two zones and, depending on whether the sound's ampli-
tude is over or under this threshold, amplifies the input by the corresponding factor
(see figure 32.3).
Let us consider the prerequisites of our unit. First, our opcode should not modify
the harmonic content of the signal. This means that the only transformation we can
apply to the input signal is a multiplication. Moreover, this multiplication factor has
to be the same no matter whether the instantaneous value of the input is above or
under the threshold; failing to do so would alter the original waveform and distort
the signal. As a consequence, the central part of our opcode has to be
something like this:
aout(t) = ain(t)*gain(t)
Thus, our job will be to properly link or map the value of the gain applied to the
input signal with its own loudness characteristics. Again, since we do not want to
modify the signal’s harmonic content, the variation of the gain will have to be much
slower than the frequency content of the signal; we cannot directly link the gain to
the value of ain. What we need is an average value of the amplitude; that is, we will
link the gain to the mean value of the signal’s power over a short period of time.
Figure 32.3 The threshold divides the amplitude range of the input signal into two zones: zone 1 above the threshold and zone 2 below it.
Starting from this, we can begin to shape the algorithm. We will somehow have to
measure an average power value of the signal and compute a resulting gain, de-
pending on whether this value is above the threshold or not (see figure 32.4).
The way the gain is computed in each zone will depend on the settings made by
the user of our opcode. Intuitively, we want to allow the user to say how strongly the
sound should be amplified or reduced in each region. To do this, we will
introduce a parameter called the compression ratio for each zone. With a ratio equal
to one, the sound is left unchanged. When it is greater than one, the sound will be
expanded and, when less, it will be compressed.
We now need to establish a relationship that links the compression ratio defined
by the user to the gain to be applied to the signal. To do this properly, we have to
consider and comply with some additional criteria. First, the gain applied in one of
the regions should never push the signal across the threshold into the other region.
Indeed, what we want is to control how the sound is modified within a given range,
not send it all over the place.
Also, to help us realize a smooth transition between the two zones, no matter what
the settings in either, we would like the signal's power to be left unchanged when it
hovers around the threshold value. In order to address these requirements, we will
work with power values and, for each zone, define how a given input power value
should map to an output power value, given the various compression ratios. The
actual gain to be applied to the sound will then be computed as the ratio of the
output power divided by the input power.
Let us first take a look at what happens when the signal is over the threshold. First,
we know that the resulting output power should never fall under the threshold. Also,
for a compression ratio lower than one, we would want the output power to be lower
than the input power, and vice versa. One can easily meet these specifications with a
linear relation between the input and output power. Thus we define a family of lines
that pass through the point (threshold, threshold) and have a slope equal to the
compression factor, as shown in figure 32.5.
Then, on the lower side of the threshold, we have to satisfy one more condition: if
the input power is zero, the output power should also be zero. Therefore, the relation
between the input and the output has to be a family of curves that always pass through
the two points (0,0) and (threshold, threshold). For a compression ratio equal to one,
the curve should be a straight line; for values lower than one, the curve should rise
more slowly than this line; and for a compression ratio greater than one, it should
rise faster. One can achieve this behavior by adopting the family of curves shown in
figure 32.6.
Now, for any value of the input signal power, using the curves shown in figure
32.6, we are able to compute what the output power has to be and derive what gain
should be applied to the signal.
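Spelling this out in the same style as the gain equation above (writing Pin and Pout for the average input and output power, and ratio1, ratio2 for the compression ratios above and below the threshold; these names are introduced here only for clarity), the two mappings and the resulting gain are:

Pout = (Pin - threshold)*ratio1 + threshold        (Pin above the threshold)
Pout = threshold*(Pin/threshold)^(1/ratio2)        (Pin below the threshold)
gain = Pout/Pin

Both expressions give Pout = threshold when Pin = threshold, which is what guarantees the smooth hand-over between the two zones; they are exactly the formulas computed in the C code further on.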
The last situation we have to account for is when the sound power suddenly changes
from one region to the other. Often we do not want the gain to change drastically and
would rather have a smooth change from one behavior to the other. To do so, we will
introduce two speeds that describe how fast the gain is allowed to grow and how fast
it is allowed to diminish. The full algorithm is shown in figure 32.7.
Figure 32.5 Above the threshold (threshold = 100 in this plot), output power as a function of input power for compression factors of 0.1, 1 and 5: a family of lines through (threshold, threshold).
Figure 32.6 Below the threshold, output power as a function of input power: the family of curves x^(1/n) for compression factors n of 0.1, 0.5, 1, 2 and 10, all passing through (0,0) and (threshold, threshold).
Figure 32.7 Flowchart of the complete algorithm: compute the input sound power, compare it with the threshold, derive a target gain and move the current gain toward it at the allowed speed.
In itself, the processing is not complicated. Each time we have a new sample, we
first have to compute the average sound power around the current time. Since we are
in a real-time oriented environment, we cannot use future samples and will therefore
rely only on past values. One convenient way to implement the power calculation is
to keep a buffer filled with the contributions of the n previous samples. Each time we
process a new sample, we store its (scaled, rectified) value in the buffer, add it to the
running power sum and subtract the oldest value, which will be overwritten on the
next pass. This can be represented in the following C code:
/* NOTE: powerPos is the current position in the power buffer */
/* Add and accumulate the value of the new sample */
*powerPos = fabs(ain[i])/(float)POWER_BUFSIZE*sqrt(2.0) ;
power += (*powerPos++) ;
if ((powerPos - powerBuffer) == POWER_BUFSIZE) {
    powerPos = p->powerBuffer ;
}
power -= (*powerPos) ;
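In other words, the quantity maintained by this code is a scaled running average of the rectified input over the last POWER_BUFSIZE samples, with n standing for the current sample index:

power = (sqrt(2)/POWER_BUFSIZE) * ( |ain[n]| + |ain[n-1]| + ... + |ain[n-POWER_BUFSIZE+1]| )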
Looking at the current power value, we see which side of the threshold we are on and,
according to the equations described above, deduce a target gain to be reached:
/* Look at where the power lies relative to the threshold
   and compute the target gain */
if (power > threshold) {
    tg = ((power - threshold)*comp1 + threshold)/power ;
} else {
    tg = threshold*pow(power/threshold, 1/comp2)/power ;
}
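To check the first branch with concrete numbers: with a threshold of 100, comp1 = 0.5 and an average power of 200, the target gain is tg = ((200 - 100)*0.5 + 100)/200 = 150/200 = 0.75, so the output power is pulled halfway back toward the threshold, as a compression ratio of 0.5 should do.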
Finally, we move the actual gain value toward the target gain, at the allowed speed,
and apply it to the input sound:
if (gain < tg) {
    gain += p->rspeed ;
} else {
    gain -= p->fspeed ;
}
/* compute output */
aout[i] = ain[i]*gain ;
We will now follow all the steps needed to introduce our algorithm into the Csound
environment. Since we know all the parameters governing our opcode, we can define
its syntax in the orchestra:
ar dam ain, kthreshold, icomp1, icomp2, rtime, ftime
Here ar is the output signal, ain the input signal, kthreshold the sound level
separating the two zones, icomp1 and icomp2 their associated compression ratios,
rtime the time in milliseconds it takes for the gain to rise by one unit and ftime
the time it takes for it to fall by one unit.
As we’ve seen, the opcode data needs to respect some structure. First, it has to con-
tain the OPDS part:
/* file: dam.h */
#include "cs.h"

typedef struct {
    OPDS h ; /* Csound opcode common part */
    ...
} DAM ;
After the OPDS part and the declarations of the output and input arguments (output
first, then the inputs, as before), we can add everything we need to store the state
of our opcode from one call to the next. From the processing code we have just
described, we can see that from one call to the next we need to keep track of:
■ The current gain
■ The current power value
■ The power buffer
■ The current “store” position in the buffer
Additionally, we will need to derive the gain variation speeds (in gain units per
sample) from the times given by the user and the sample rate. Since this conversion
is always the same, we will do it once, at initialization time, and store the values
in the opcode structure too. We finally wind up with a complete structure looking
like this:
typedef struct {
    OPDS  h ;
    float *aout ;                      /* a-rate output (declared first) */
    float *ain ;                       /* a-rate input signal */
    float *kthreshold ;                /* k-rate threshold */
    float *icomp1, *icomp2 ;           /* i-rate compression ratios */
    float *rtime, *ftime ;             /* i-rate rise and fall times (ms) */
    float rspeed ;                     /* gain rise speed, per sample */
    float fspeed ;                     /* gain fall speed, per sample */
    float gain ;                       /* current gain */
    float power ;                      /* current average power */
    float powerBuffer[POWER_BUFSIZE] ; /* window of past power values */
    float *powerPos ;                  /* current store position in the buffer */
} DAM ;
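One detail left implicit here is POWER_BUFSIZE, the length (in samples) of the power-averaging window, which is used by the structure and the processing code but never defined in the chapter text. It has to be defined before the structure, typically at the top of dam.h; the value below is only an assumption for illustration, and the distributed sources may use a different one:

#define POWER_BUFSIZE 16  /* length of the power-averaging window, in samples (assumed value) */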
Initialization Routine
As we saw, this routine is called once every time a new instrument copy is started. It
is here that the additional variables representing the internal state of the particular op-
code instance have to be set. In our case, we need to initialize the current gain
value, the power value and the power buffer, and compute the two gain change speeds.
void daminit(DAM *p)
{
    int i ;

    p->gain = 1.0 ;
    p->rspeed = (*p->rtime)/esr*1000.0 ;
    p->fspeed = (*p->ftime)/esr*1000.0 ;
    p->power = *(p->kthreshold) ;
    for (i = 0; i < POWER_BUFSIZE; i++) {
        p->powerBuffer[i] = p->power/(float)POWER_BUFSIZE ;
    }
    p->powerPos = p->powerBuffer ;
}
The main routine has to process ksmps samples each time the function is called.
Putting together what we have just seen, the complete routine is:
/*
 * Run-time computation code
 */
void dam(DAM *p)
{
    float *ain, *aout, *powerPos, *powerBuffer ;
    float threshold, gain, comp1, comp2, power, tg ;
    int   i ;

    ain = p->ain ;
    aout = p->aout ;
    threshold = *(p->kthreshold) ;
    gain = p->gain ;
    comp1 = *(p->icomp1) ;
    comp2 = *(p->icomp2) ;
    powerPos = p->powerPos ;
    powerBuffer = p->powerBuffer ;
    power = p->power ;

    for (i = 0; i < ksmps; i++) {
        /* update the running power estimate */
        *powerPos = fabs(ain[i])/(float)POWER_BUFSIZE*sqrt(2.0) ;
        power += (*powerPos++) ;
        if ((powerPos - powerBuffer) == POWER_BUFSIZE) {
            powerPos = p->powerBuffer ;
        }
        power -= (*powerPos) ;

        /* compute the target gain on either side of the threshold */
        if (power > threshold) {
            tg = ((power - threshold)*comp1 + threshold)/power ;
        } else {
            tg = threshold*pow(power/threshold, 1/comp2)/power ;
        }

        /* slew the gain toward the target and compute the output */
        if (gain < tg) {
            gain += p->rspeed ;
        } else {
            gain -= p->fspeed ;
        }
        aout[i] = ain[i]*gain ;
    }
    p->gain = gain ;
    p->power = power ;
    p->powerPos = powerPos ;
}
To let our opcode be known, we just have to add a new opcode line in the opcodlst
table. The characteristics of our opcode are:
■ Its name
■ The size of the opcode data structure
■ When it can be called (i- and a-rate)
■ The rate of the output variable(s)
■ The rates of the input variables (in order)
■ Its initialization function
■ Its a-rate processing function
Figure 32.8 Table of recommended parameter values for the dam opcode’s various modes
of operation.
Remember that, because the entry has to reference your opcode data structure and your
function prototypes, dam.h must be included at the beginning of the entry.c file.
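Following the pattern of the impulse entry, the dam line might look like the sketch below; treat it as an illustration rather than the verbatim entry from the distributed sources. The in-type string lists the input rates in the order in which they are declared in the structure (one a-rate, one k-rate and four i-rate arguments), and the 0 again stands for the missing k-rate routine:

{ "dam", S(DAM), 5, "a", "akiiii", daminit, 0, dam },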
Now that our opcode is created, we can recapitulate the various units described in
figure 32.8 and see what parameters need to be used.
In addition to those uses, the opcode can also be turned into a ducker, a unit that
reduces the loudness of one signal in response to the value of another. In order to do
that, one simply has to put the opcode in limiter mode and adjust the threshold at
k-rate as a function of the master sound level.
Conclusion
To summarize the work needed to add our opcodes: first we had to create two new
functions, one for the opcode initialization and one for the run-time code. We chose
to place those functions in a separate source file, dam.c, although we could have put
them into an existing one. Then we had to edit the entry.c file to make Csound aware
of our routines. Using our compilation environment, we added the new file to the list
of compiled source files, recompiled the program with all those changes and obtained
a new version of Csound with the added functionality of our dam and impulse opcodes.
References
Embree, P. M. 1995. C Algorithms for Real-Time DSP. Englewood Cliffs: Prentice Hall.
Kernighan, B., and D. Ritchie. 1988. The C Programming Language. 2nd ed. Englewood
Cliffs: Prentice Hall.
Moore, R. F. 1990. Elements of Computer Music. New York: Prentice Hall.
Press, W., B. Flannery, S. Teukolsky, and W. Vetterling. 1993. Numerical Recipes in C. 2nd
ed. Cambridge, England: Cambridge University Press.
Appendixes
List of the Csound Book Chapter Instruments
instr 1101 ; SIMPLE OSCILLATOR WITH AMPLITUDE ENVELOPE
instr 1102
instr 1105 ; ENVELOPE-CONTROLLED WHITE NOISE
instr 1106 ; PULSE TRAIN WITH AMPLITUDE ENVELOPE
instr 1109 ; SIMPLE AMPLITUDE MODULATION
instr 1111 ; RING MODULATION
Chadabe, J. 1997. Electric Sound: The Past and Promise of Electronic Music. New York:
Prentice Hall.
De Poli, G., A. Piccialli, and C. Roads. 1991. Representations of Musical Signals. Cambridge,
Mass.: MIT Press.
De Poli, G., A. Piccialli, S. T. Pope, and C. Roads. 1997. Musical Signal Processing. Lisse,
The Netherlands: Swets and Zeitlinger.
Dodge, C., and T. Jerse. 1997. Computer Music. 2d rev. New York: Schirmer Books.
Eliot, T. S. 1971. Four Quartets. New York: Harcourt Brace & Company.
Embree, P. 1995. C Algorithms for Real-Time DSP. Englewood Cliffs: Prentice Hall.
Kernighan, B., and D. Ritchie. 1988. The C Programming Language. 2d ed. Englewood Cliffs:
Prentice Hall.
Mathews, Max V. 1969. The Technology of Computer Music. Cambridge, Mass.: MIT Press.
Mathews, Max V., and J. R. Pierce. 1989. Current Directions in Computer Music Research.
Cambridge, MA: MIT Press.
Oppenheim, A. V., and A. S. Willsky, with S. H. Nawab, 1997. Signals and Systems. 2d ed.
New Jersey: Prentice Hall.
Pierce, J. R. 1992. The Science of Musical Sound. 2d rev. ed. New York: W. H. Freeman.
Pohlmann, Ken C. 1995. Principles of Digital Audio. 3d ed. New York: McGraw-Hill.
Press, W., B. Flannery, S. Teukolsky, and W. Vetterling. 1993. Numerical Recipes in C. 2d ed.
Cambridge, England: Cambridge University Press.
Roads, C. 1996. The Computer Music Tutorial. Cambridge, Mass.: MIT Press.
Roads, C., and J. Strawn. 1987. Foundations of Computer Music. 3d ed. Cambridge, Mass.:
MIT Press.
Collections
Dinosaur Music: Chris Chafe, In a Word; David Jaffe, Silicon Valley Breakdown; and William
Schottstaedt, Dinosaur Music. Wergo. WER 2016-50.
Electroacoustic Music II: Jonathan Berger, An Island of Tears; Peter Child, Ensemblance;
James Dashow, Disclosures; John Duesenberry, Agitato (Ergo Sum); Gerald Shapiro, Phoenix.
Neuma. 450–75.
Electroacoustic Music Classics: Edgard Varese, Poeme Electronique; Milton Babbitt, Pho-
nemena, Philomel; Roger Reynolds, Transfigured Wind IV; Iannis Xenakis, Mycenae-Alpha.
Neuma. 450–74.
Music from SEAMUS Vol. 7: Tom Lopez, Hollow Ground II, William Rice, SATX; Bruce Ham-
ilton, Interzones; Jon Christopher Nelson, They Wash Their Ambassadors in Citrus and Fen-
nel; Matt Ingalls, f(Ear); Glenn Hackbarth, Passage. EAM-9801. 278044/45.
Le Chant du Monde Cultures Electroniques Vol. 2: Richard Karpen, Exchange; Jonathan Ber-
ger, Meteora; Julio d’Escrivan, Sin ti por el Alma Adentro; Henri Kergomard, Hapsis; Georg
Katzer, La Mecanique et les Agents de l’Erosion; John Rimmer, Fleeting Images; Javier Alva-
rez, Papalotl; Josh Levine, Tel; Werner Kaegi, Ritournelles; Ben Guttman, Different Attitudes
(2 CDs). LDC 278044/45.
Pioneers of Electronic Music: Arel, Stereo Electronic Music No.2; Davidovsky, Synchronisms
No.5; Luening, Low Speed, Invention in 12 Tones, Fantasy in Space and Moonflight; Luening
and Ussachevsky, Incantation; Shields, Transformation of Ani; Smiley, Kolyosa; Ussachevsky,
Sonic Contours, Piece for Tape Recorder and Computer Piece No.1. CRI. CD 611.
Wergo Computer Music Currents Vol. 4: David Evan Jones, Scritto; Michel Decoust, In-
terphone; Charles Dodge, Roundelay; Jean-Baptiste Barriere, Chreode I; Trevor Wishart,
VOX-5; Roger Reynolds, The Vanity of Words. Wergo. WER 2024-50.
Wergo Computer Music Currents Vol. 9: Gerald Bennett, Kyotaka; Joji Yuasa, Towards the
Midnight Sun; Thomas Delio, Against the Silence; William Albright, Sphaera. Wergo. WER
2029-2.
Wergo Computer Music Currents Vol. 13: The Historical CD of Digital Sound Synthesis. New-
man Guttman, The Silver Scale and Pitch Variations; John R. Pierce, Stochatta, Variations in
Timbre, Attack, Sea Sounds and Eight-Tone Canon; Max V. Mathews, Numerology, The Sec-
ond Law, Bicycle Built for Two, Masquerades, and International Lullaby; David Lewin, Study
No. 1 and Study No. 2; James Tenney, Dialogue; Ercolino Ferretti, Pipe and Drum and Trio;
James Randal, Mudgett and Monologues for a Mass Murderer. Wergo. WER 2033-2.
Aphex Twin. Selected Ambient Works, Vol. II. Sire/Warner Bros. 9 45482-2.
Byrne, David, and Eno, Brian. My Life in the Bush of Ghosts. Sire. 6093-2.
Dhomont, Francis. Sous le regard d’un soleil noir. empreintes DIGITALes. IMED 9633.
Fast, Larry. Synergy—Electronic Realizations for Rock Orchestra. Chronicles. 314 558 042-2.
Future Sound of London. “Lifeforms: paths 1–7” (Maxi-Single). Astralwerks. ASW 6114-2.
McNabb, Michael. Computer Music. Mobile Fidelity Sound Lab. MFCD 818.
Nine Inch Nails. Further Down the Spiral. Nothing/Interscope. Halo ten. 95811-2.
Nine Inch Nails. “The Perfect Drug” Versions. Nothing/Interscope. Halo Eleven. Intdm-
95007.
Redolfi, Michel. Too much Sky, Desert Tracks, Pacific Tubular Waves. INA. C 1005.
Risset, Jean-Claude. Songes, Passages, Little Boy, Sud. Wergo. WER 2013-50.
Subotnick, Morton. Silver Apples of the Moon, The Wild Bull. Wergo. WER 2035-2 282 035-2.
Formant Values
SOPRANO “a” F1 F2 F3 F4 F5
freq (Hz) 800 1150 2900 3900 4950
amp (dB) 0 -6 -32 -20 -50
bw (Hz) 80 90 120 130 140
SOPRANO “e” F1 F2 F3 F4 F5
freq (Hz) 350 2000 2800 3600 4950
amp (dB) 0 -20 -15 -40 -56
bw (Hz) 60 90 100 150 200
SOPRANO “i” F1 F2 F3 F4 F5
freq (Hz) 270 2140 2950 3900 4950
amp (dB) 0 -12 -26 -26 -44
bw (Hz) 60 90 100 120 120
SOPRANO “o” F1 F2 F3 F4 F5
freq (Hz) 450 800 2830 3800 4950
amp (dB) 0 -11 -22 -22 -50
bw (Hz) 70 80 100 130 135
SOPRANO “u” F1 F2 F3 F4 F5
freq (Hz) 325 700 2700 3800 4950
amp (dB) 0 -16 -35 -40 -60
bw (Hz) 50 60 170 180 200
ALTO “a” F1 F2 F3 F4 F5
freq (Hz) 800 1150 2800 3500 4950
amp (dB) 0 -4 -20 -36 -60
bw (Hz) 50 60 170 180 200
ALTO “e” F1 F2 F3 F4 F5
freq (Hz) 400 1600 2700 3300 4950
amp (dB) 0 -24 -30 -35 -60
bw (Hz) 60 80 120 150 200
ALTO “i” F1 F2 F3 F4 F5
freq (Hz) 350 1700 2700 3700 4950
amp (dB) 0 -20 -30 -36 -60
bw (Hz) 50 100 120 150 200
ALTO “o” F1 F2 F3 F4 F5
freq (Hz) 450 800 2830 3500 4950
amp (dB) 0 -9 -16 -28 -55
bw (Hz) 70 80 100 130 135
ALTO “u” F1 F2 F3 F4 F5
freq (Hz) 325 700 2530 3500 4950
amp (dB) 0 -12 -30 -40 -64
bw (Hz) 50 60 170 180 200
TENOR “a” F1 F2 F3 F4 F5
freq (Hz) 650 1080 2650 2900 3250
amp (dB) 0 -6 -7 -8 -22
bw (Hz) 80 90 120 130 140
TENOR “e” F1 F2 F3 F4 F5
freq (Hz) 400 1700 2600 3200 3580
amp (dB) 0 -14 -12 -14 -20
bw (Hz) 70 80 100 120 120
TENOR “i” F1 F2 F3 F4 F5
freq (Hz) 290 1870 2800 3250 3540
amp (dB) 0 -15 -18 -20 -30
bw (Hz) 40 90 100 120 120
TENOR “o” F1 F2 F3 F4 F5
freq (Hz) 400 800 2600 2800 3000
amp (dB) 0 -10 -12 -12 -26
bw (Hz) 40 80 100 120 120
TENOR “u” F1 F2 F3 F4 F5
freq (Hz) 350 600 2700 2900 3300
amp (dB) 0 -20 -17 -14 -26
bw (Hz) 40 60 100 120 120
BASS “a” F1 F2 F3 F4 F5
freq (Hz) 600 1040 2250 2450 2750
amp (dB) 0 -7 -9 -9 -20
bw (Hz) 60 70 110 120 130
BASS “e” F1 F2 F3 F4 F5
freq (Hz) 400 1620 2400 2800 3100
amp (dB) 0 -12 -9 -12 -18
bw (Hz) 40 80 100 120 120
BASS “i” F1 F2 F3 F4 F5
freq (Hz) 250 1750 2600 3050 3340
amp (dB) 0 -30 -16 -22 -28
bw (Hz) 60 90 100 120 120
BASS “o” F1 F2 F3 F4 F5
freq (Hz) 350 600 2400 2675 2950
amp (dB) 0 -20 -32 -28 -36
bw (Hz) 40 80 100 120 120
COUNTERTENOR “a” F1 F2 F3 F4 F5
freq (Hz) 660 1120 2750 3000 3350
amp (dB) 0 -6 -23 -24 -38
bw (Hz) 80 90 120 130 140
COUNTERTENOR “e” F1 F2 F3 F4 F5
freq (Hz) 440 1800 2700 3000 3300
amp (dB) 0 -14 -18 -20 -20
bw (Hz) 70 80 100 120 120
COUNTERTENOR “i” F1 F2 F3 F4 F5
freq (Hz) 270 1850 2900 3350 3590
amp (dB) 0 -24 -24 -36 -36
bw (Hz) 40 90 100 120 120
COUNTERTENOR “o” F1 F2 F3 F4 F5
freq (Hz) 430 820 2700 3000 3300
amp (dB) 0 -10 -26 -22 -34
bw (Hz) 40 80 100 120 120
COUNTERTENOR “u” F1 F2 F3 F4 F5
freq (Hz) 370 630 2750 3000 3400
amp (dB) 0 -20 -23 -30 -34
bw (Hz) 40 60 100 120 120
Pitch Conversion
Csound is capable of generating about 1,000 different messages. These are not all easy to
understand, but they fall generally into four categories.
■ Fatal: messages of this class cause Csound to stop, as they are irrecoverable.
■ Error: these messages indicate an error, usually the result of the user getting something
wrong in an opcode. They do not cause Csound to stop, but the individual note may stop, or
some other similar patch-up action is taken. Errors in the parsing of the orchestra may cause
Csound to refuse to run.
■ Warning: warning messages are not necessarily errors, but indicate that there may be some-
thing wrong. The user should make sure that the situation is expected, or correct it for
later runs.
■ Information: Csound prints a number of informational messages as it runs. These include
the names of the orchestra file, the number of instruments and so on. They are mainly for
reassurance, but they can also reveal small errors.
The following table contains all the messages that Csound generates. The words in italics will
be replaced by numbers, strings or characters as indicated. Of course, not all error messages
relate to all platforms. The list is sorted in rough alphabetical order to make it easier to
look up a message.
closed Information
closing bracket not allowed in context [] Fatal
closing the file ... Information
coef range: float - float Information
coefficients too large (param1 + param2) Fatal
coeffs not allocated! Fatal
comb: not initialized Fatal
command-line srate / krate not integral Fatal
config : string Information
connect failed Warning
CONVOLVE: cannot load string Error
CONVOLVE: channel number greater than number of channels in source Error
CONVOLVE: not initialized Fatal
CONVOLVE: output channels not equal to number of channels in source Error
could not find indefinitely playing instr integer Warning
could not get audio information Fatal
could not open /dev/audio for writing Fatal
could not set audio information Fatal
couldn’t allocate for initial shape Fatal
couldn’t configure output device Fatal
couldn’t find Warning
couldn’t open space file Fatal
couldn’t write the outfile header Fatal
Csound Version integer.integer (string) Information
ctrl integer has no exclus list Information
currently active instruments may find this disturbing Information
nh partials < 1 Fatal
nlfilt: not initialized Fatal
no amplitude maximum Fatal
no amplitude minimum Fatal
no arguments Fatal
no base frequency for clarinet Error
no base frequency for flute Error
no begin time Fatal
no channel Fatal
no coefs present Fatal
no comment string Fatal
no control rate Fatal
no duration time Fatal
no filter cutoff Fatal
no framesize Fatal
no fundamental estimate Fatal
no hardware bufsamps Fatal
no harmonic count Fatal
no high frequency Fatal
no hopsize Fatal
no infilename Fatal
no iobufsamps Fatal
no latch Fatal
no legal base frequency Fatal
no linein score device_name Fatal
no log file Fatal
no looping Information
no low frequency Fatal
no memory for PVOC Information
sr = iarg
kr = iarg
ksmps = iarg
nchnls = iarg
strset iarg, "stringtext"
pset con1, con2, ...
seed ival
gir ftgen ifn, itime, isize, igen, iarga[, iargb, ... iargz]
massign ichnl, insno
ctrlinit ichnl, ictlno1, ival1[, ictlno2, ival2[, ictlno3, ival3[, ... ival32]]
instr NN
endin
i/k/ar = iarg
i/k/ar init iarg
ir tival
i/k/ar divz ia, ib, isubst
ihold
turnoff
ir active insnum
cpuprc insnum, ipercent
maxalloc insnum, icount
prealloc insnum, icount
i/kr timek
i/kr times
kr timeinstk
kr timeinsts
clockon inum
clockoff inum
ir readclock inum
(a > b ? v1 : v2)
(a < b ? v1 : v2)
(a >= b ? v1 : v2)
(a <= b ? v1 : v2)
(a == b ? v1 : v2)
(a != b ? v1 : v2)
igoto label
tigoto label
kgoto label
goto label
if ia R ib igoto label
if ka R kb kgoto label
if ia R ib goto label
timout istrt idur label
label :
reinit label
rigoto label
rireturn
ival notnum
ival veloc [ilow, ihigh]
icps cpsmidi
i/kcps cpsmidib [irange]
icps cpstmid ifn
ioct octmidi
i/koct octmidib [irange]
ipch pchmidi
kstatus, kchan, kdata1, kdata2 midiin
midiout kstatus, kchan, kdata1, kdata2
mclock ifreq
mrtmsg imsgtype
xtratim iextradur
kflag release
a1 in
a1, a2 ins
a1, a2, a3, a4 inq
a1, a2, a3, a4, a5, a6 inh
a1, a2, a3, a4, a5, a6, a7, a8 ino
a1[, a2[, a3, a4]] soundin ifilcod[, iskptim[, iformat]]
a1[, a2[, a3, a4]] diskin ifilcod, kpitch[, iskiptim[, iwraparound[, iformat]]]
out asig
outs1 asig
outs2 asig
outs asig1, asig2
outq1 asig
outq2 asig
outq3 asig
outq4 asig
outq asig1, asig2, asig3, asig4
outh asig1, asig2, asig3, asig4, asig5,
asig6
outo asig1, asig2, asig3, asig4, asig5,
asig6, asig7, asig8
soundout asig1, ifilcod[, iformat]
soundouts asig1, asig2, ifilcod[, iformat]
ir filelen "ifilcod"
ir filesr "ifilcod"
ir filenchnls "ifilcod"
ir filepeak "ifilcod"[, ichnl]
f # time size 19 pna stra phsa dcoa pnb strb phsb dcob
f # time size 11 nh lh r
Wavetable synthesis
amplitude indexing, 161
contiguous-group, 188
French horn, 187–194
Wave terrain synthesis
GEN routines, 91–97
orchestra code, 94
parameters, 93
simple and complex, 96
tables, 91–97
Weibull distributions, 329
White noise, 231, 322, 333
Whittle, R., 575
Wind instruments, modeling, 187
Window functions, 88, 90
Windowing, phase vocoder, 544
Windows
filter, 425
quick reference, 723
specific flags, 723
Wrap-around lookup, 65