Curtis Roads
Roads, Curtis.
Microsound / Curtis Roads.
p. cm.
Includes bibliographical references and index.
ISBN 0-262-18215-7 (hc. : alk. paper)
1. Music--Acoustics and physics. 2. Electronic music--History and criticism.
3. Computer music--History and criticism. I. Title.
ML3805 .R69 2001
781.2'2--dc21 2001030633
Introduction
Acknowledgments
Overview
References
Beneath the level of the note lies the realm of microsound, of sound parti-
cles. Microsonic particles remained invisible for centuries. Recent technological
advances let us probe and explore the beauties of this formerly unseen world.
Microsonic techniques dissolve the rigid bricks of music architecture (the notes)
into a more fluid and supple medium. Sounds may coalesce, evaporate, or
mutate into other sounds.
The sensations of point, pulse (regular series of points), line (tone), and sur-
face (texture) appear as the density of particles increases. Sparse emissions leave
rhythmic traces. When the particles line up in rapid succession, they induce the
illusion of tone continuity that we call pitch. As the particles meander, they
flow into streams and rivulets. Dense agglomerations of particles form swirling
sound clouds whose shapes evolve over time.
In the 1940s, the Nobel Prize-winning physicist Dennis Gabor proposed that
any sound could be decomposed into acoustical quanta bounded by discrete
units of time and frequency. This quantum representation formed the famous
Gabor matrix. Like a sonogram, the vertical dimension of the Gabor matrix
indicated the location of the frequency energy, while the horizontal dimension
indicated the time region in which this energy occurred. In a related project,
Gabor built a machine to granulate sound into particles. This machine could
alter the duration of a sound without shifting its pitch.
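The effect attributed here to Gabor's machine, changing duration without transposing pitch, can be illustrated with a crude overlap-add granulator. This is only a sketch under stated assumptions, not a reconstruction of Gabor's device: the function name `granular_stretch`, the Hann window, the 50 percent overlap, and the default grain length are all choices made for illustration.

```python
import math

def granular_stretch(samples, stretch, grain_len=1024):
    """Time-stretch a list of samples by overlap-adding windowed grains.

    Grains are read from the input every hop_in samples and written to the
    output every hop_out samples. Each grain is copied unchanged, so its
    local pitch is kept while the overall duration scales by `stretch`.
    grain_len is assumed to be even (hypothetical default: 1024).
    """
    if len(samples) < grain_len:
        raise ValueError("input must be at least one grain long")
    hop_out = grain_len // 2            # 50% overlap: periodic Hann windows sum to 1
    hop_in = hop_out / stretch          # read position advances more slowly if stretch > 1
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / grain_len)
              for i in range(grain_len)]
    n_grains = int((len(samples) - grain_len) / hop_in) + 1
    out = [0.0] * ((n_grains - 1) * hop_out + grain_len)
    for g in range(n_grains):
        start_in = int(g * hop_in)
        start_out = g * hop_out
        for i in range(grain_len):
            out[start_out + i] += samples[start_in + i] * window[i]
    return out
```

With `stretch=2.0` the output is roughly twice as long as the input. Because the grains are overlapped without any phase alignment, a sketch like this produces audible artifacts on sustained tones; it is meant only to show the principle.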
In these two projects, the matrix and the granulator, Gabor accounted for
both important domains of sound representation. The matrix was the original
windowed frequency-domain representation. "Windowed" means segmented in
time, and "frequency-domain" refers to spectrum. The granulation machine, on
the other hand, operated on a time-domain representation, which is familiar to
anyone who has seen waveforms in a sound editor. This book explores micro-
sound from both perspectives: the windowed frequency-domain and the micro
This book derives from a doctoral thesis written for the Université de Paris VIII
(Roads 1999). It would never have started without strong encouragement from
Professor Horacio Vaggione. I am deeply indebted to him for his patient ad-
vocacy, as well as for his inspired writings and pieces.
The congenial atmosphere in the Département Musique at the Université
de Paris VIII was ideal for the gestation of this work. I would also like to
extend my sincere appreciation to Jean-Claude Risset and Daniel Arfib. Despite
much pressure on their time, these pioneers and experts kindly agreed to
serve on the doctoral committee. Their commentaries on my text resulted in
major improvements.
I owe a debt of thanks to my colleague Gérard Pape at the Centre de
Création Musicale «Iannis Xenakis» (CCMIX) for his support of my research,
teaching, and composition. I must also convey appreciation to Iannis Xenakis
for his brilliant example and for his support of our work in Paris. My first
contact with him, at his short course in Formalized Music in 1972, started me
on this path.
I completed this book while teaching in the Center for Research in Electronic
Art Technology (CREATE) in the Department of Music and in the Media
Arts and Technology Program at the University of California, Santa Barbara.
I greatly appreciate the friendship and support of Professor JoAnn Kuchera-
Morin, Director of CREATE, during this productive period. I would also like
to extend my thanks to the rest of the CREATE team, including Stephen T.
Pope for his collaboration on pulsar synthesis in 1997. It was a great pleasure to
work with Alberto de Campo, who served as CREATE's Research Director
in 1999–2000. Together we developed the PulsarGenerator software and the
Creatovox synthesizer. I consider these engaging musical instruments to be
among the main accomplishments of this research.
The final sections present techniques of spatialization with sound particles, and
convolution with microsounds.
Chapter 6 explores a variety of sound transformations based on windowed
spectrum analysis. After a theoretical section, it presents the main tools of win-
dowed spectrum transformation, including the phase vocoder, the tracking
phase vocoder, the wavelet transform, and Gabor analysis.
Chapter 7 turns from technology to compositional applications. It begins
with a description of the first studies realized with granular synthesis on a
digital computer. It then looks at particle techniques in my recent compositions, as
well as those by Barry Truax, Horacio Vaggione, and other composers.
Chapter 8, on the aesthetics of composing with microsound, is the most
philosophical part of the book. It highlights both specific and general aesthetic
issues raised by microsound in composition.
Chapter 9 concludes with a commentary on the future of microsound in
1 Time Scales of Music
makes the computer an ideal testbed for the representation of musical structure
on multiple time scales.
This chapter examines the time scales of music. Our main focus is the micro
time scale and its interactions with other time scales. By including extreme time
scales (the infinite and the infinitesimal), we situate musical time within the
broadest possible context.
6. Micro: Sound particles on a time scale that extends down to the threshold
of auditory perception (measured in thousandths of a second or milliseconds).
7. Sample: The atomic level of digital audio systems: individual binary samples
or numerical amplitude values, one following another at a fixed time
interval. The period between samples is measured in millionths of a second
(microseconds).
8. Subsample: Fluctuations on a time scale too brief to be properly recorded
or perceived, measured in billionths of a second (nanoseconds) or less.
9. Infinitesimal: The ideal time span of mathematical durations such as the
infinitely brief delta functions.
Figure 1.1 portrays the nine time scales of the time domain. Notice in the
middle of the diagram, in the frequency column, a line indicating "Conscious
time, the present (~600 ms)." This line marks off Winckel's (1967) estimate of
the "thickness of the present." The thickness extends to the line at the right
indicating the physical NOW. This temporal interval constitutes an estimate
of the accumulated lag time of the perceptual and cognitive mechanisms
associated with hearing. Here is but one example of a disparity between chronos
(physical time) and tempus (perceived time) (Küpper 2000).
The rest of this chapter explains the characteristics of each time scale in turn.
We will, of course, pay particular attention to the micro time scale.
As sound passes from one time scale to another it crosses perceptual boundaries.
It seems to change quality. This is because human perception processes
each time scale differently. Consider a simple sinusoid transposed to various
time scales (1 μsec, 1 ms, 1 sec, 1 minute, 1 hour). The waveform is identical,
but one would have difficulty classifying these auditory experiences in the same
In some cases the borders between time scales are demarcated clearly;
ambiguous zones surround others. Training and culture condition perception of
the time scales. To hear a flat pitch or a dragging beat, for example, is to
detect a temporal anomaly on a micro scale that might not be noticed by other
Figure 1.1 The time domain, segmented into periods, time delay effects, frequencies,
and perception and action. Note that time intervals are not drawn to scale.
Digital audio systems, such as compact disc players, operate at a fixed
sampling frequency. This makes it easy to distinguish the exact boundary
separating the sample time scale from the subsample time scale. This boundary is the
Nyquist frequency, or the sampling frequency divided by two. The effect of
crossing this boundary is not always perceptible. In noisy sounds, aliased
frequencies from the subsample time domain may mix unobtrusively with high
frequencies in the sample time domain.
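The boundary described above can be worked through numerically. The sketch below assumes ideal sampling with no anti-aliasing filter; the function names are illustrative, and 44100 Hz is used only because it is the compact disc rate.

```python
def nyquist_frequency(sample_rate):
    """Highest frequency representable at the given sampling frequency."""
    return sample_rate / 2.0

def sample_period_us(sample_rate):
    """Time between successive samples, in microseconds."""
    return 1e6 / sample_rate

def aliased_frequency(freq, sample_rate):
    """Apparent frequency of a sinusoid of `freq` Hz after ideal sampling.

    Frequencies above the Nyquist frequency fold back into the range
    0 .. sample_rate / 2.
    """
    folded = freq % sample_rate
    return min(folded, sample_rate - folded)

# At 44100 Hz, a 30000 Hz component lies above the 22050 Hz Nyquist
# frequency and folds back to 44100 - 30000 = 14100 Hz.
print(nyquist_frequency(44100), aliased_frequency(30000, 44100))
```

At this rate the sample period is about 22.7 microseconds, which places the sample time scale in the "millionths of a second" range.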
The border between certain other time scales is context-dependent. Between
the sample and micro time scales, for example, is a region of transient events,
too brief to evoke a sense of pitch but rich in timbral content. Between the
micro and the object time scales is a stratum of brief events such as short
staccato notes. Another zone of ambiguity is the border between the sound object
and meso levels, exemplified by an evolving texture. A texture might contain a
statistical distribution of micro events that are perceived as a unitary yet
time-varying sound.
Time scales interlink. A given level encapsulates events on lower levels and is
itself subsumed within higher time scales. Hence to operate on one level is to
affect other levels. The interaction between time scales is not, however, a simple
relation. Linear changes on a given time scale do not guarantee a perceptible
effect on neighboring time scales.
Complex Fourier analysis regards the signal sub specie aeternitatis. (Gabor 1952)
Figure 1.2 Zones of intensities and frequencies. Only the zone marked a is audible to
the ear. This zone constitutes a tiny portion of the range of sound phenomena.
This equation sums a set of numbers u_i, where the index i goes from 1 to ∞.
What if each number u_i corresponded to a tick of a clock? This series would
then define an infinite duration. This ideal is not so far removed from music as
it may seem. The idea of infinite duration is implicit in the theory of Fourier
analysis, which links the notion of frequency to sine waves of infinite duration.
As chapter 6 shows, Fourier analysis has proven to be a useful tool in the analy-
sis and transformation of musical sound.
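For reference, the infinite sum described in this passage and the standard Fourier transform integral can be written as follows. The symbols x(t) and X(f) are assumptions introduced here for illustration; the point is only that the limits of integration run over all time, which is where the idea of infinite duration enters.

```latex
\sum_{i=1}^{\infty} u_i
\qquad\text{and}\qquad
X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt
```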
The supra time scale spans the durations that are beyond those of an individual
composition. It begins as the applause dies out after the longest compositions,
and extends into weeks, months, years, decades, and beyond (figure 1.3).
Concerts and festivals fall into this category. So do programs from music
broadcasting stations, which may extend into years of more-or-less continuous
Musical cultures are constructed out of supratemporal bricks: the eras of
instruments, of styles, of musicians, and of composers. Musical education takes
years; cultural tastes evolve over decades. The perception and appreciation of
a single composition may change several times within a century. The entire
history of music transpires within the supratemporal scale, starting from the
earliest known musical instrument, a Neanderthal flute dating back some
45,000 years (Whitehouse 1999).
Composition is itself a supratemporal activity. Its results last only a fraction
of the time required for its creation. A composer may spend a year to complete
a ten-minute piece. Even if the composer does not work every hour of every
day, the ratio of 52,560 minutes passed for every 1 minute composed is still
signi®cant. What happens in this time? Certain composers design a complex
strategy as prelude to the realization of a piece. The electronic music composer
may spend considerable time in creating the sound materials of the work. Either
of these tasks may entail the development of software. Virtually all composers
spend time experimenting, playing with material in different combinations.
Some of these experiments may result in fragments that are edited or discarded,
to be replaced with new fragments. Thus it is inevitable that composers invest
time pursuing dead ends, composing fragments that no one else will hear. This
backtracking is not necessarily time wasted; it is part of an important feedback
loop in which the composer refines the work. Finally we should mention
documentation. While only a few composers document their labor, these documents
may be valuable to those seeking a deeper understanding of a work and the
compositional process that created it. Compare all this with the efficiency of the
real-time improviser!
Some music spans beyond the lifetime of the individual who composed it,
through published notation, recordings, and pedagogy. Yet the temporal reach
of music is limited. Many compositions are performed only once. Scores, tapes,
and discs disappear into storage, to be discarded sooner or later. Music-making
presumably has always been part of the experience of Homo sapiens, who it is
speculated came into being some 200,000 years ago. Few traces remain of
anything musical older than a dozen centuries. Modern electronic instruments
and recording media, too, are ephemeral. Will human musical vibrations some-
how outlast the species that created them? Perhaps the last trace of human
existence will be radio waves beamed into space, traveling vast distances before
they dissolve into noise.
The upper boundary of time, as the concept is currently understood, is the
age of the physical universe. Some scientists estimate it to be approximately
fifteen billion years (Lederman and Schramm 1995). Cosmologists continue to
debate how long the universe may expand. The latest scientific theories
continue to twist the notion of time itself (see, for example, Kaku 1995;
Arkani-Hamed et al. 2000).
The macro level of musical time corresponds to the notion of form, and
encompasses the overall architecture of a composition. It is generally measured
in minutes. The upper limit of this time scale is exemplified by such marathon
compositions as Richard Wagner's Ring cycle, the Japanese Kabuki theater,
Jean-Claude Eloy's evening-long rituals, and Karlheinz Stockhausen's opera
Licht (spanning seven days and nights). The literature of opera and contempo-
rary music contains many examples of music on a time scale that exceeds two
hours. Nonetheless, the vast majority of music compositions realized in the past
century are less than a half-hour in duration. The average duration is probably
in the range of a kilosecond (16 min 40 sec). Complete compositions lasting less
than a hectosecond (1 min 40 sec) are rare.
Design of Macroform
This is not to say that the use of preconceived forms has died away. The
practice of top-down planning remains common in contemporary composition.
Many composers predetermine the macrostructure of their pieces according to
a more-or-less formal scheme before a single sound is composed.
By contrast, a strict bottom-up approach conceives of form as the result of a
process of internal development provoked by interactions on lower levels of
musical structure. This approach was articulated by Edgard Varèse (1971), who
said, "Form is a result, the result of a process." In this view, macrostructure
articulates processes of attraction and repulsion (for example, in the rhythmic
and harmonic domains) unfolding on lower levels of structure.
Manuals on traditional composition offer myriad ways to project low-level
structures into macrostructure:
Smaller forms may be expanded by means of external repetitions, sequences, extensions,
liquidations and broadening of connectives. The number of parts may be increased by sup-
plying codettas, episodes, etc. In such situations, derivatives of the basic motive are for-
mulated into new thematic units. (Schoenberg 1967)
Where people had felt the necessity to stick sounds together to make a continuity, we felt
the necessity to get rid of the glue so that sounds would be themselves. (Cage 1959)
The mesostructural level groups sound objects into a quasi hierarchy of phrase
structures of durations measured in seconds. This local as opposed to global
time scale is extremely important in composition, for it is most often on the
meso level that the sequences, combinations, and transmutations that constitute
musical ideas unfold. Melodic, harmonic, and contrapuntal relations happen
here, as do processes such as theme and variations, and many types of devel-
opment, progression, and juxtaposition. Local rhythmic and metric patterns,
too, unfold on this stratum.
Wishart (1994) called this level of structure the sequence. In the context of
electronic music, he identified two properties of sequences: the field (the
material, or set of elements used in the sequence), and the order. The field serves as
a lexicon: the vocabulary of a piece of music. The order determines thematic
relations: the grammar of a particular piece. As Wishart observed, the field and
the order must be established quickly if they are to serve as the bearers of musical
code. In traditional music, they are largely predetermined by cultural norms.
In electronic music, the meso layer presents timbre melodies, simultaneities
(chord analogies), spatial interplay, and all manner of textural evolutions. Many
of these processes are described and classified in Denis Smalley's interesting
theory of spectromorphology, a taxonomy of sound gesture shapes (Smalley
1986, 1997).
When new instruments will allow me to write music as I conceive it, taking the place of the
linear counterpoint, the movement of sound masses, or shifting planes, will be clearly per-
ceived. When these sound masses collide the phenomena of penetration or repulsion will
seem to occur. (Varèse 1962)
A trend toward shaping music through the global attributes of a sound mass
began in the 1950s. One type of sound mass is a cluster of sustained frequencies
that fuse into a solid block. In a certain style of sound mass composition,
musical development unfolds as individual lines are added to or removed from
this cluster. György Ligeti's Volumina for organ (1962) is a masterpiece of this
style, and the composer has explored this approach in a number of other pieces,
including Atmosphères (1961) and Lux Aeterna (1966).
Particles make possible another type of sound mass: statistical clouds of
microevents (Xenakis 1960). Wishart (1994) ascribed two properties to cloud
textures. As with sequences, their field is the set of elements used in the texture,
which may be constant or evolving. Their second property is density, which
stipulates the number of events within a given time period, from sparse scat-
terings to dense scintillations.
Cloud textures suggest a different approach to musical organization. In
contrast to the combinatorial sequences of traditional meso structure, clouds
encourage a process of statistical evolution. Within this evolution the
composer can impose specific morphologies. Cloud evolutions can take place in the
domain of amplitude (crescendi/decrescendi), internal tempo (accelerando/
rallentando), density (increasing/decreasing), harmonicity (pitch/chord/cluster/
noise, etc.), and spectrum (high/mid/low, etc.).
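The two cloud properties named above, field and density, together with an evolving density, can be sketched in a few lines. This is an illustrative assumption rather than the method of any program described in the book: the function name `generate_cloud`, the linear density ramp, and the exponentially distributed gaps between onsets are all choices made for the example.

```python
import random

def generate_cloud(duration, density_start, density_end, field, seed=0):
    """Return a list of (onset_seconds, element) pairs for one cloud.

    duration       length of the cloud in seconds
    density_start  events per second at the beginning (must be > 0)
    density_end    events per second at the end (must be > 0)
    field          the set of elements the cloud draws from

    The density is interpolated linearly over the cloud. Each gap between
    onsets is drawn from an exponential distribution at the current density,
    which scatters events irregularly (an approximation of a Poisson
    process whose rate changes over time).
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        density = density_start + (density_end - density_start) * (t / duration)
        t += rng.expovariate(density)
        if t >= duration:
            break
        events.append((t, rng.choice(field)))
    return events

# A ten-second cloud that thickens from sparse scatterings to a dense texture,
# drawing on a hypothetical field of three grain frequencies (in Hz).
cloud = generate_cloud(10.0, 5.0, 200.0, field=[220.0, 330.0, 440.0])
```

Swapping the start and end densities would give the opposite morphology, a dense texture that thins out into isolated events.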
Xenakis's tape compositions Concret PH (1958), Bohor I (1962), and Per-
sepolis (1971) feature dense, monolithic clouds, as do many of his works for
traditional instruments. Stockhausen (1957) used statistical form-criteria as one
component of his early composition technique. Since the 1960s, particle
textures have appeared in numerous electroacoustic compositions, such as the
remarkable De natura sonorum (1975) of Bernard Parmegiani.
Varèse spoke of the interpenetration of sound masses. The diaphanous
nature of cloud structures makes this possible. A crossfade between two clouds
results in a smooth mutation. Mesostructural processes such as disintegration
and coalescence can be realized through manipulations of particle density (see
chapter 6). Density determines the transparency of the material. An increase in
density lifts a cloud into the foreground, while a decrease causes evaporation,
dissolving a continuous sound band into a pointillist rhythm or vaporous back-
ground texture.
Cloud Taxonomy
The sound object time scale encompasses events of a duration associated with
the elementary unit of composition in scores: the note. A note usually lasts from
about 100 ms to several seconds, and is played by an instrument or sung by a
vocalist. The concept of sound object extends this to allow any sound, from
any source. The term sound object comes from Pierre Schaeffer, the pioneer of
musique concrète. To him, the pure objet sonore was a sound whose origin a
listener could not identify (Schaeffer 1959, 1977, p. 95). We take a broader view
here. Any sound within stipulated temporal limits is a sound object. Xenakis
(1989) referred to this as the "ministructural" time scale.
In the first place it is necessary that the strength of the vibrations of the air for very low
tones should be extremely greater than for high tones. The increase in strength . . . is of
especial consequence in the deepest tones. . . . To discover the limit of the deepest tones it is
necessary not only to produce very violent agitations in the air but to give these a simple
pendular motion. (Helmholtz 1885)
The sound object time scale is the same as that of traditional notes. What dis-
tinguishes sound objects from notes? The note is the homogeneous brick of
conventional music architecture. Homogeneous means that every note can be
described by the same four properties:
• pitch, generally one of twelve equal-tempered pitch classes
• timbre, generally one of about twenty different instruments for a full
  orchestra, with two or three different attack types for each instrument
• dynamic marking, generally one of about ten different relative levels
• duration, generally between ~100 ms (slightly less than a thirty-second note
  at a tempo of 60 M.M.) and ~8 seconds (for two tied whole notes)
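The duration figures in the last item can be checked with a short calculation. The sketch assumes that the metronome mark counts quarter-note beats, which the text does not state explicitly; the function and constant names are illustrative.

```python
def note_duration_seconds(beats, tempo_bpm):
    """Duration of a note lasting `beats` quarter-note beats at the given tempo."""
    return beats * 60.0 / tempo_bpm

THIRTY_SECOND_NOTE = 1 / 8   # one eighth of a quarter-note beat
WHOLE_NOTE = 4               # four quarter-note beats

# At 60 M.M. a thirty-second note lasts 0.125 s, slightly more than 100 ms,
# and two tied whole notes last 8 s.
print(note_duration_seconds(THIRTY_SECOND_NOTE, 60))
print(note_duration_seconds(2 * WHOLE_NOTE, 60))
```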
These properties are static, guaranteeing that, in theory, a note in one
measure with a certain pitch, dynamic, and instrumental timbre is functionally
equivalent to a note in another measure with the same three properties. The
properties of a pair of notes can be compared on a side-by-side basis and a
distance or interval can be calculated. The notions of equivalence and distance
lead to the notion of invariants, or intervallic distances that are preserved across
Limiting material to a static homogeneous set allows abstraction and
efficiency in musical language. It serves as the basis for operations such as
transposition, orchestration and reduction, the algebra of tonal harmony and
counterpoint, and the atonal and serial manipulations. In the past decade, the
MIDI protocol has extended this homogeneity into the domain of electronic
music through standardized note sequences that play on any synthesizer.
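The abstraction that a homogeneous note set allows can be made concrete with a small data structure. In this sketch the class `Note`, its field types, and the use of MIDI-style note numbers for pitch are assumptions made for illustration. It shows that transposition changes every pitch while leaving the intervals between successive notes unchanged.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Note:
    pitch: int       # MIDI-style note number (assumption); 60 = middle C
    timbre: str      # instrument name
    dynamic: str     # dynamic marking such as "mf"
    duration: float  # seconds

def intervals(notes):
    """Signed distance in semitones between each pair of successive notes."""
    return [b.pitch - a.pitch for a, b in zip(notes, notes[1:])]

def transpose(notes, semitones):
    """Return new notes shifted in pitch; all other properties are unchanged."""
    return [replace(n, pitch=n.pitch + semitones) for n in notes]

motif = [Note(60, "flute", "mf", 0.5),
         Note(64, "flute", "mf", 0.5),
         Note(67, "flute", "mf", 1.0)]
shifted = transpose(motif, 5)
# The pitches differ, but the interval pattern is identical.
print(intervals(motif), intervals(shifted))
```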
The merit of this homogeneous system is clear: highly elegant structures
have been built with standard materials inherited from centuries past. But
since the dawn of the twentieth century, a recurring aesthetic dream has been
the expansion beyond a fixed set of homogeneous materials to a much larger
superset of heterogeneous musical materials.
What we have said about the limitations of the European note concept does
not necessarily apply to the musics of other cultures. Consider the shakuhachi
music of Japan, or contemporary practice emerging from the advanced devel-
opments of jazz.
Heterogeneity means that two objects may not share common properties.
Therefore their percept may be entirely different. Consider the following two
examples. Sound A is a brief event constructed by passing analog diode noise
Objects that do not share common properties may be separated into diverse
classes. Each class will lend itself to different types of manipulation and musical
organization. Certain sounds layer well, nearly any mixture of elongated sine
waves with smooth envelopes for example. The same sounds organized in a
sequence, however, rather quickly become boring. Other sounds, such as
isolated impulses, are most effective when sparsely scattered onto a neutral sound
Transformations applied to objects in one class may not be effective in
another class. For example, a time-stretching operation may work perfectly well
on a pipe organ tone, preserving its identity and affecting only its duration. The
same operation applied to the sound of burning embers will smear the crackling
transients into a nondescript electronic blur.
In traditional western music, the possibilities for transition within a note are
limited by the physical properties of the acoustic instrument as well as frozen by
theory and style. Unlike notes, the properties of a sound object are free to vary
over time. This opens up the possibility of complex sounds that can mutate
from one state to another within a single musical event. In the case of synthe-
sized sounds, an object may be controlled by multiple time-varying envelopes
for pitch, amplitude, spatial position, and multiple determinants of timbre.
These variations may take place over time scales much longer than those asso-
ciated with conventional notes.
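One common way to represent properties that vary over time is a breakpoint envelope, a list of (time, value) pairs interpolated between points. The sketch below is an assumption about how such control might be coded, not a description of any particular synthesis system; the names `envelope_value` and `sound_object` are hypothetical.

```python
def envelope_value(breakpoints, t):
    """Linearly interpolate (time, value) breakpoints at time t.

    Breakpoint times must be strictly increasing. Before the first point
    and after the last point the nearest value is held.
    """
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]

# A single two-second sound object whose pitch glides up an octave while
# its amplitude rises quickly and then decays.
sound_object = {
    "pitch_hz":  [(0.0, 220.0), (2.0, 440.0)],
    "amplitude": [(0.0, 0.0), (0.1, 1.0), (2.0, 0.0)],
}
print(envelope_value(sound_object["pitch_hz"], 1.0))   # 330.0
```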
We can subdivide a sound object not only by its properties but also by its
temporal states. These states are composable using synthesis tools that operate
on the microtime scale. The micro states of a sound can also be decomposed
and rearranged with tools such as time granulators and analysis-resynthesis
The desire to understand the enormous range of possible sound objects led
Pierre Schaeffer to attempt to classify them, beginning in the early 1950s
(Schaeffer and Moles 1952). Book V of his Traité des objets musicaux (1977),
entitled Morphologie et typologie des objets sonores, introduces the useful
notion of sound object morphology: the comparison of the shape and evolution
of sound objects. Schaeffer borrowed the term morphology from the sciences,
where it refers to the study of form and structure (of organisms in biology, of
word-elements in linguistics, of rocks in geology, etc.). Schaeffer diagrammed
sound shape in three dimensions: the harmonic (spectrum), dynamic
(amplitude), and melodic (pitch). He observed that the elements making up a
complex sound can be perceived as either merged to form a sound compound, or
remaining separate to form a sound mixture. His typology, or classification
of sound objects into different groups, was based on acoustic morphological
The idea of sound morphology remains central to the theory of electro-
acoustic music (Bayle 1993), in which the musical spotlight is often shone on
the sound object level. In traditional composition, transitions function on the
mesostructural level through the interplay of notes. In electroacoustic music,
the morphology of an individual sound may play a structural role, and tran-
sitions can occur within an individual sound object. This ubiquity of mutation
means that every sonic event is itself a potential transformation.
The micro time scale is the main subject of this book. It embraces transient
audio phenomena, a broad class of sounds that extends from the threshold of
Perception of Microsound
Microevents last only a very short time, near to the threshold of auditory
perception. Much scientific study has gone into the perception of microevents.
Human hearing mechanisms, however, intertwine with brain functions,
cognition, and emotion, and are not completely understood. Certain facts are clear.
One cannot speak of a single time frame, or a time constant for the auditory
system (Gordon 1996). Our hearing mechanisms involve many different agents,
each of which operates on its own time scale (see figure 1.1). The brain
integrates signals sent by various hearing agents into a coherent auditory picture.
Ear-brain mechanisms process high and low frequencies differently. Keeping
high frequencies constant, while inducing phase shifts in lower frequencies,
causes listeners to hear a different timbre.
Determining the temporal limits of perception has long engaged
psychoacousticians (Doughty and Garner 1947; Buser and Imbert 1992; Meyer-Eppler
1959; Winckel 1967; Whitfield 1978). The pioneer of sound quanta, Dennis
Gabor, suggested that at least two mechanisms are at work in microevent
detection: one that isolates events, and another that ascertains their pitch. Human
beings need time to process audio signals. Our hearing mechanisms impose
minimum time thresholds in order to establish a firm sense of the identity and
properties of a microevent.
In their important book Audition (1992), Buser and Imbert summarize a large
number of experiments with transitory audio phenomena. The general result
from these experiments is that below 200 ms, many aspects of auditory
perception change character and different modes of hearing come into play. The
next sections discuss microtemporal perception.
In the zone of low amplitude, short sounds must be greater in intensity than
longer sounds to be perceptible. This increase is about 20 dB for tone pips
of 1 ms over those of 100 ms duration. (A tone pip is a sinusoidal burst with
a quasi-rectangular envelope.) In general, subjective loudness diminishes with
shrinking durations below 200 ms.
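To put the 20 dB figure in more concrete terms, the decibel value can be converted to linear ratios using the standard definitions. The function names are illustrative.

```python
def db_to_amplitude_ratio(db):
    """Linear amplitude (pressure) ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)

def db_to_intensity_ratio(db):
    """Linear intensity (power) ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)

# A 20 dB increase is a tenfold increase in amplitude and a hundredfold
# increase in intensity.
print(db_to_amplitude_ratio(20), db_to_intensity_ratio(20))
```

By this conversion, a 1 ms tone pip near threshold needs roughly ten times the amplitude of a 100 ms pip to be perceptible.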
In dense portions of the Milky Way, stellar images appear to overlap, giving the effect of a
near-continuous sheet of light . . . The effect is a grand illusion. In reality . . . the nighttime
sky is remarkably empty. Of the volume of space only 1 part in 10^21 [one part in a
quintillion] is filled with stars. (Kaler 1997)
Circuitry can measure time and recognize pulse patterns at tempi in the range
of a gigahertz. Human hearing is more limited. If one impulse follows less than
200 ms after another, the onset of the ®rst impulse will tend to mask the second,
The ear is quite sensitive to intermittencies within pure sine waves, especially in
the middle range of frequencies. A 20 ms fluctuation in a 600 Hz sine wave,
consisting of a 6.5 ms fade out, a 7 ms silent interval, and a 6.5 ms fade in,
breaks the tone in two, like a double articulation. A 4 ms interruption, con-
sisting of a 1 ms fade out, a 2 ms silent interval, and a 1 ms fade in, sounds like
a transient pop has been superimposed on the sine wave.
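Readers who want to hear these intermittencies can generate test signals along the following lines. The sketch makes several assumptions that the text does not specify: the fades are linear, the sampling frequency is 44100 Hz, and the interruption begins 90 ms into a 200 ms tone. The function name `sine_with_gap` is hypothetical.

```python
import math

def sine_with_gap(freq=600.0, sample_rate=44100, total_ms=200.0,
                  gap_start_ms=90.0, fade_out_ms=6.5, silence_ms=7.0,
                  fade_in_ms=6.5):
    """Sine tone interrupted by a fade out, a silent interval, and a fade in."""
    n = int(sample_rate * total_ms / 1000)
    gap_ms = fade_out_ms + silence_ms + fade_in_ms
    out = []
    for i in range(n):
        x = 1000.0 * i / sample_rate - gap_start_ms  # ms since the gap began
        if x < 0 or x >= gap_ms:
            gain = 1.0                               # outside the interruption
        elif x < fade_out_ms:
            gain = 1.0 - x / fade_out_ms             # linear fade out
        elif x < fade_out_ms + silence_ms:
            gain = 0.0                               # silent interval
        else:
            gain = (x - fade_out_ms - silence_ms) / fade_in_ms  # linear fade in
        out.append(gain * math.sin(2 * math.pi * freq * i / sample_rate))
    return out

double_articulation = sine_with_gap()                  # the 20 ms case
pop = sine_with_gap(fade_out_ms=1.0, silence_ms=2.0, fade_in_ms=1.0)  # the 4 ms case
```

The resulting lists of floating-point samples can be written to a sound file with any audio library for listening.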
Intermittencies are not as noticeable in complex tones. A 4 ms interruption is
not perceptible in pink noise, although a 20 ms interruption is.
In intermediate tones, between a sine and noise, microtemporal gaps less
than 10 ms sound like momentary fluctuations in amplitude or less noticeable
transient pops.
Doughty and Garner (1947) divided the mechanism of pitch perception into
two regions. Above about 1 kHz, they estimated, a tone must last at least 10 ms
to be heard as pitched. Below 1 kHz, at least two to three cycles of the tone are
Green (1971) suggested that temporal auditory acuity (the ability of the ear to
detect discrete events and to discern their order) extends down to durations as
short as 1 ms. Listeners hear microevents that are less than about 2 ms in du-
ration as a click, but we can still change the waveform and frequency of these
events to vary the timbre of the click. Even shorter events (in the range of
microseconds) can be distinguished on the basis of amplitude, timbre, and spa-
tial position.
When a person glimpses the face of a famous actor, sniffs a favorite food, or hears the voice
of a friend, recognition is instant. Within a fraction of a second after the eyes, nose, ears,
tongue or skin is stimulated, one knows the object is familiar and whether it is desirable or
dangerous. How does such recognition, which psychologists call preattentive perception,
happen so accurately and quickly, even when the stimuli are complex and the context in
which they arise varies? (Freeman 1991)
Microevents touch the extreme time limits of human perception and
performance. In order to examine and manipulate these events fluidly, we need digital
audio "microscopes": software and hardware that can magnify the micro time
scale so that we can operate on it.
For the serious researcher, the most precise strategy for accessing the micro
time scale is through computer programming. Beginning in 1974, my research
was made possible by access to computers equipped with compiler software
and audio converters. Until recently, writing one's own programs was the only
possible approach to microsound synthesis and transformation.
Many musicians want to be able to manipulate this domain without the total
immersion experience that is the lifestyle of software engineering. Fortunately,
the importance of the micro time scale is beginning to be recognized. Any sound
editor with a zoom function that proceeds down to the sample level can view
and manipulate sound microstructure (figure 1.4).
Programs such as our Cloud Generator (Roads and Alexander 1995)
offer high-level controls in the micro time domain (see appendix A). Cloud
Generator's interface directly manipulates the process of particle emission,
controlling the flow of many particles in an evolving cloud. Our more recent
PulsarGenerator, described in chapter 4, is another example of a synthetic
particle generator.
The perceived result of particle synthesis emerges out of the interaction of
parameter evolutions on a micro scale. It takes a certain amount of training to
learn how operations in the micro domain translate to acoustic perceptions on
higher levels. The grain duration parameter in granular synthesis, for example,
has a strong effect on the perceived spectrum of the texture.
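The link between grain duration and spectrum can be checked numerically. The following Python sketch measures how wide a band a single grain occupies for several durations. The Hann envelope, the 5 kHz carrier, the 44.1 kHz sampling rate, and the -6 dB bandwidth measure are all assumptions chosen for illustration; they are not the settings of any particular granular synthesis program.

```python
import cmath
import math

SAMPLE_RATE = 44100  # Hz, assumed for this sketch


def hann_grain(duration_s, carrier_hz):
    """Sine wave shaped by a Hann envelope (the envelope choice is an assumption)."""
    n = int(round(duration_s * SAMPLE_RATE))
    return [
        (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
        * math.sin(2 * math.pi * carrier_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def magnitude_at(grain, freq_hz):
    """Magnitude of the grain's discrete-time Fourier transform at one frequency."""
    w = -2j * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(w * i) for i, x in enumerate(grain)))


def bandwidth_hz(duration_s, carrier_hz=5000.0, step_hz=25.0, span_hz=10000.0):
    """Width of the band whose magnitude is at least half of the peak (-6 dB)."""
    grain = hann_grain(duration_s, carrier_hz)
    freqs = [k * step_hz for k in range(int(span_hz / step_hz) + 1)]
    mags = [magnitude_at(grain, f) for f in freqs]
    peak = max(mags)
    return step_hz * sum(1 for m in mags if m >= 0.5 * peak)


for ms in (1, 5, 20):
    print(ms, "ms grain ->", bandwidth_hz(ms / 1000.0), "Hz at -6 dB")
```

The measured bandwidth shrinks as the grain gets longer: short grains smear energy across a wide band and sound noisy or click-like, while long grains concentrate energy near the carrier frequency.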
This situation is no different from other well-known synthesis techniques.
Frequency modulation synthesis, for example, is controlled by parameters such
as carrier-to-modulator ratios and modulation indexes, neither of which is a
direct description of the desired spectrum. Similarly, physical modeling synthesis is
controlled by manipulating the parameters that describe the parts of a virtual
instrument (size, shape, material, coupling, applied force, etc.), and not the sound.
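To make the frequency modulation case concrete, here is a minimal Python sketch of simple two-oscillator FM. The function names and parameter names are hypothetical. The code shows that the control parameters (carrier frequency, carrier-to-modulator ratio, index) only indirectly determine the spectrum, whose components fall at the carrier frequency plus and minus integer multiples of the modulator frequency.

```python
import math


def fm_sample(t, carrier_hz, c_to_m_ratio, index):
    """One sample of simple FM: sin(2*pi*fc*t + I*sin(2*pi*fm*t))."""
    modulator_hz = carrier_hz / c_to_m_ratio
    return math.sin(
        2 * math.pi * carrier_hz * t
        + index * math.sin(2 * math.pi * modulator_hz * t)
    )


def sideband_frequencies(carrier_hz, c_to_m_ratio, pairs):
    """Frequencies fc + k*fm for k = -pairs..pairs.

    Negative frequencies are reflected to their positive counterparts.
    The amplitude of each sideband depends on the index through Bessel
    functions; that part is not computed in this sketch.
    """
    modulator_hz = carrier_hz / c_to_m_ratio
    freqs = {abs(carrier_hz + k * modulator_hz) for k in range(-pairs, pairs + 1)}
    return sorted(freqs)


# A 400 Hz carrier with a 4:1 carrier-to-modulator ratio (100 Hz modulator).
print(sideband_frequencies(400.0, 4.0, 2))
```

Nothing in the argument list names a spectral target such as brightness or formant position; the musician has to learn how ratio and index map onto the resulting sidebands.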
One can imagine a musical interface in which a musician specifies the desired
sonic result in a musically descriptive language which would then be translated
Figure 1.4 Viewing the micro time scale via zooming. The top picture is the waveform
of a sonic gesture constructed from sound particles. It lasts 13.05 seconds. The middle
image is a result of zooming in to a part of the top waveform (indicated by the dotted
lines) lasting 1.5 seconds. The bottom image is a microtemporal portrait of a 10 milli-
second fragment at the beginning of the top waveform (indicated by the dotted lines).
In the 1940s, the physicist Dennis Gabor made the assertion that all sound,
even continuous tones, can be considered as a succession of elementary
particles of acoustic energy. (Chapter 2 summarizes this theory.) The question then
arises: do sound particles really exist, or are they merely a theoretical
construction? In certain sounds, such as the taps of a slow drum roll, the individual
particles are directly perceivable. In other sounds, we can prove the existence of
a granular layer through logical argument.
Consider the whole number 5. This quantity may be seen as a sum of
subquantities, for example 1 + 1 + 1 + 1 + 1, or 2 + 3, or 4 + 1, and so on. If we
take away one of the subquantities, the sum no longer is 5. Similarly, a
continuous tone may be considered as a sum of subquantities, that is, as a sequence of
overlapping grains. The grains may be of arbitrary sizes. If we remove any grain,
the signal is no longer the same. So clearly the grains exist, and we need all of
them in order to constitute a complex signal. This argument can be extended
to explain the decomposition of a sound into any one of an infinite collection
of orthogonal functions, such as wavelets with different basis functions, Walsh
functions, Gabor grains, and so on.
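The argument that a tone is a sum of overlapping grains can be tried out in a few lines of Python. This sketch makes several assumptions that the text does not: grains of equal length, a Hann envelope, and a 50 percent overlap, chosen because overlapping Hann envelopes then add up to exactly 1. All function names are hypothetical.

```python
import math


def hann(n):
    """Periodic Hann envelope of n points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def granulate(signal, grain_len):
    """Cut the signal into Hann-shaped grains that overlap by half their length."""
    hop = grain_len // 2
    window = hann(grain_len)
    return [
        (start, [signal[start + i] * window[i] for i in range(grain_len)])
        for start in range(0, len(signal) - grain_len + 1, hop)
    ]


def overlap_add(grains, length):
    """Sum the grains back together at their original positions."""
    out = [0.0] * length
    for start, grain in grains:
        for i, value in enumerate(grain):
            out[start + i] += value
    return out


def max_interior_error(signal, rebuilt, grain_len):
    """Largest difference, skipping the two edges covered by only one grain."""
    hop = grain_len // 2
    return max(abs(a - b) for a, b in zip(signal[hop:-hop], rebuilt[hop:-hop]))


# A 441 Hz sine tone at an assumed 44.1 kHz sampling rate.
tone = [math.sin(2 * math.pi * 441 * i / 44100) for i in range(2048)]
grains = granulate(tone, 256)

full_error = max_interior_error(tone, overlap_add(grains, len(tone)), 256)
missing = grains[:7] + grains[8:]  # take away one grain
missing_error = max_interior_error(tone, overlap_add(missing, len(tone)), 256)

print("all grains:", full_error)
print("one grain removed:", missing_error)
```

With every grain present, the sum matches the original tone to within floating-point rounding. With one grain removed, a gap appears in the signal, which mirrors the arithmetic example of taking a subquantity away from 5.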
This logic, though, becomes tenuous if it is used to posit the preexistence (in
an ideal Platonic realm) of all possible decompositions within a whole. For ex-
ample, do the slices of a cake preexist, waiting to be articulated? The philoso-
phy of mathematics is littered with such questions (Castonguay 1972, 1973).
Fortunately it is not our task here to try to assay their significance.
Below the level of microtime stands the sampled time scale (figure 1.5). The
electronic clock that drives the sampling process establishes a time grid. The
spacing of this grid determines the temporal precision of the digital audio
medium. The samples follow one another at a fixed time interval of 1/f_S, where
f_S is the sampling frequency. When f_S = 44.1 kHz (the compact disc rate),
the samples follow one another about every 22.676 millionths of a second (μsec).
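The spacing of the time grid follows directly from the sampling frequency. As a small worked example in Python (the function names are hypothetical):

```python
def sample_period_us(sample_rate_hz):
    """Time between successive samples, in microseconds (1 / f_S)."""
    return 1_000_000.0 / sample_rate_hz


def samples_in(duration_ms, sample_rate_hz):
    """Number of sample periods that fit in a duration given in milliseconds."""
    return duration_ms * sample_rate_hz / 1000.0


print(sample_period_us(44100.0))  # compact disc rate
print(samples_in(1.0, 44100.0))   # sample periods in one millisecond
```

At the compact disc rate, a 1 ms microevent therefore spans only about 44 samples.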
Figure 1.5 Sample points in a digital waveform. Here are 191 points spanning a 4.22 ms
time interval. The sampling rate is 44.1 kHz.
The atom of the sample time scale is the unit impulse, the discrete-time coun-
terpart of the continuous-time Dirac delta function. All samples should be con-
sidered as time-and-amplitude-transposed (delayed and scaled) instances of
the unit impulse.
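The view of every sample as a delayed and scaled unit impulse can be written out directly. In this Python sketch (names are hypothetical), a short signal is rebuilt as the sum x[n] = sum over k of x[k] * delta[n - k]:

```python
def unit_impulse(n):
    """Discrete-time unit impulse: 1 at n = 0, and 0 everywhere else."""
    return 1.0 if n == 0 else 0.0


def from_impulses(samples, n):
    """Value at index n of the sum of impulses, each delayed by k and scaled by x[k]."""
    return sum(amp * unit_impulse(n - k) for k, amp in enumerate(samples))


samples = [0.0, 0.5, -0.25, 1.0]
rebuilt = [from_impulses(samples, n) for n in range(len(samples))]
print(rebuilt == samples)
```

Each term contributes to exactly one position in time, so the sum reproduces the original sample values.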
The interval of one sample period borders near the edge of human audio
perception. With a good audio system one can detect the presence of an
individual high-amplitude sample inserted into a silent stream of zero-valued
samples. Like a single pixel on a computer screen, an individual sample offers little.
Its amplitude and spatial position can be discerned, but it transmits no sense of
timbre and pitch. Only when chained into sequences of hundreds do samples
float up to the threshold of timbral significance. And still longer sequences of
thousands of samples are required to represent pitched tones.
Users of digital audio systems rarely attempt to deal with individual sample
points, which, indeed, only a few programs for sound composition manipulate
directly. Two of these are G. M. Koenig's Sound Synthesis Program (SSP) and
Herbert Brün's Sawdust program, both developed in the late 1970s. Koenig and
Brün emerged from the Cologne school of serial composition, in which the
interplay between macro- and microtime was a central aesthetic theme
(Stockhausen 1957; Koenig 1959; Maconie 1989). Brün wrote:
For some time now it has become possible to use a combination of analog and digital
computers and converters for the analysis and synthesis of sound. As such a system will
store or transmit information at the rate of 40,000 samples per second, even the most
complex waveforms in the audio-frequency range can be scanned and registered or be
recorded on audio tape. This . . . allows, at last, the composition of timbre, instead of with
timbre. In a sense, one may call it a continuation of much which has been done in the
electronic music studio, only on a different scale. The composer has the possibility of extending
his compositional control down to elements of sound lasting only 1/20,000 of a second.
(Brün 1970)
My intention was to go away from the classical instrumental definitions of sound in terms
of loudness, pitch, and duration and so on, because then you could refer to musical elements
which are not necessarily the elements of the language of today. To explore a new field of
sound possibilities I thought it best to close the classical descriptions of sound and open up
an experimental field in which you would really have to start again. (Roads 1978b)
