The Oxford Handbook of
SOUND AND
IMAGINATION,
VOLUME 2
Edited by
MARK GRIMSHAW-AAGAARD,
MADS WALTHER-HANSEN,
and
MARTIN KNAKKERGAARD
Oxford University Press is a department of the University of Oxford. It furthers
the University’s objective of excellence in research, scholarship, and education
by publishing worldwide. Oxford is a registered trade mark of Oxford University
Press in the UK and certain other countries.
Printed by Sheridan Books, Inc., United States of America
Contents
Acknowledgments ix
Contributors xi
The Companion Website xiii
Introduction: Volume 2 1
Mark Grimshaw-Aagaard, Mads Walther-Hansen,
and Martin Knakkergaard
PART I MUSICAL PERFORMANCE
1. Improvisation: An Ideal Display of Embodied Imagination 15
Justin Christensen
PART II SYSTEMS AND TECHNOLOGIES
6. Systemic Abstractions: The Imaginary Regime 117
Martin Knakkergaard
PART III PSYCHOLOGY
14. Music in Detention and Interrogation: The Musical
Ecology of Fear 281
W. Luke Windsor
PART IV AESTHETICS
23. Imaginative Listening to Music 467
Theodore Gracyk
PART V POSTHUMANISM
27. Sonic Materialism: Hearing the Arche-Sonic 559
Salomé Voegelin
Index 653
Acknowledgments
This handbook has been a four-year labor of, if not unconditional love, then surely a love
tempered by blood, toil, tears, and sweat. Completing the task from proposal to comple-
tion, of compiling, editing, and publishing a two-volume work of over 650,000 words,
requires dedication, attention to detail, and, at times, sheer bloody-mindedness. Here,
thanks are due to the many people without whom the book you now hold in your hand
would not have seen the light of day. Our first thanks go to our commissioning editor
Norm Hirschy and music editor Lauralee Yeary, both of Oxford University Press, who
not only had the vision to see beyond the shortcomings of our initial proposal but also
firmly and patiently guided us through the many twists and turns of putting together the
final manuscript. Additionally, we are grateful to their many nameless colleagues at the
press who have tirelessly labored over copyediting, proofing, design, indexing, and a
host of other unknown tasks that take place behind the scenes. Thanks are also due a
number of anonymous reviewers, from proposal through to draft manuscript, who were
overwhelmingly supportive of what they read while also presenting us with many sug-
gestions for expansion and improvement. Alistair Payne, Professor of Fine Art Practice
and Head of the School of Fine Art at the Glasgow School of Art, has our gratitude for
allowing us to use his magnificent diptych The Fall as the cover art. Finally, although it is
our names on the front of the handbook—and thus our responsibility for any errors that
remain—none of what you are reading would have been possible without the contribu-
tions of our authors who have neither ceased in their enthusiasm for the project nor
flagged in the face of countless e-mails from us. Our heartfelt thanks go to them; we
hope you enjoy their efforts.
Contributors
The Companion Website
www.oup.com/us/ohsi2
Oxford has created a website of images to accompany The Oxford Handbook of Sound
and Imagination, Volume 2. Readers are encouraged to consult this resource while reading
the volume as many images on the website are in color.
Introduction
Volume 2
Mark Grimshaw-Aagaard,
Mads Walther-Hansen, and
Martin Knakkergaard
A working assumption might be that imagination has its genesis in past experience,
whether that genesis is social, cultural, or individual, and this influences the interpretation
of context and directs the thinking and ideas that arise from it. This is the theme that
fundamentally constitutes the substance of this handbook: the role and effect of imagi-
nation in the development and use of sonic processes and artifacts. Whether the act of
imagination is a previously unheard sound in a science fiction movie or a new composi-
tional style, such a process always derives from, and may be discussed and made sense of
in relation to, something pre-existing; the mundane recordings of wildlife that form the
basis for the alien’s screech, for example, or a distinctive difference from other composi-
tional styles. Yet, one should not make the mistake of assuming sonic imagination is
purely to do with the creation of new artifacts; one can rehearse mentally a piece of music
or recall and imagine a previously heard sound for the silent action seen on screen.
Equally, imaginative sound processes and artifacts themselves provoke other instances
and forms of imagination often far removed from the field of sound. It is this broad
reach that the handbook endeavors to cover.
The Chapters
The handbook comprises seventy chapters (excluding this Introduction) shared across
ten parts and two volumes that broadly arc across philosophical concerns to more prac-
tical matters before returning to philosophical issues again. However, the reader should
not expect a particular part to be purely philosophical and untainted by practice or for
those parts ostensibly dealing with practice to be unsullied by philosophy. As a multidis-
ciplinary handbook, we have endeavored to maintain that ethos across all parts, meaning
that the reader, moving sequentially through the book, will, for instance, find a chapter
on the relationship of imagination to presence in the context of multimodal surfaces
juxtaposed to one dealing with the science of auditory imagery or a chapter on synes-
thetic art and hallucination abutting another detailing the process of controlling or even
excluding the listener’s imagination from auditory imagery. This is quite deliberate and
is a demonstration that particular topics within the broad theme of sound and imagi-
nation are as common to a variety of disciplines as those disciplines’ writing styles are
diverse. Yet there is a more devious method at work here: in a world where universities,
politicians, and research funding bodies all implicitly or explicitly work toward the
prioritization of certain forms and areas of research, we would rather present a handbook
structure that ignores the barriers that arise in response to such short-term, limited,
and, yes, unimaginative thinking in order to show that the conditions for new thoughts
and ideas and for the synthesis of new knowledge are best nurtured and sustained in the
absence of academic siloes. So, our advice to the reader of this handbook is to indeed
read sequentially, and, in this, we trust that inspiration will be found.
Volume 2 of the handbook comprises five parts: “Musical Performance,” “Systems and
Technologies,” “Psychology,” “Aesthetics,” and “Posthumanism.” The first part takes us into
the sphere of musical performance and imagination. The chapters here cover musicking
and meaning-making in improvisation, the construction and use of sonic imagery in per-
formance, the role of motor imagery when playing a musical instrument, the emergence of
a hidden music facilitated through embodiment and environmental affordance, and the
connection between gesture and sound, particularly the imagined sound of the air guitar.
Part 2 of the volume has as its framework sound and imagination in the context of
systems and technologies. These are systems and technologies that underpin not only
the production of sound and music but also the analysis and description of sound and
music, stressing the role of imagination in what is consequently conceived of or inter-
preted. There are chapters on the profound influence of Ancient Greek imagination on
Western tuning systems, the centrality of repetition not only to the emergence of life but
also to the experience of music, the compression of musical information as a means to
analytical musical knowledge, how the technology and interpretation of bioacoustics
impose a potentially incorrect imagining of the existences of other animal species and
our relationships to them, musical notation as an externalization of music, the reliance
of sound recording on the imagination, shape cognition and the experience of music,
and, finally, a speculative essay on a tool to extract sound imagery for musical purposes.
Part 3 concentrates on the psychology of sound and imagination. The first chapter
covers psychological warfare and interrogation practices under the influence of music,
tying this to the ethics of marketing, while the next chapter takes as its topic audiovisual
media, such as VJ events and gaming, to show how sounds can be used to evoke halluci-
natory experiences. The next four chapters deal with the areas of the control of auditory
imagination in the context of listening tests for the design of consumer audio products,
the use of music in the creation of brand identity, the role of emotions in sound perception,
and the part that musical imagery plays in music education and performance rehearsal.
The final three chapters in this part cover the relevance of autism research to the development of musical ability, the role of musical imagery in music therapy, and musical imagery within an embodied framework.
The penultimate part of the volume comprises four essays dealing with aesthetics and
sonic imagination. Topics range over hearing-in—a form of imaginative engagement
with music—the viewpoint that music can be seen as a utopian allegory, the affective aes-
thetic potential of sonic environments, and the aesthetics of imperfection in the context
of musical improvisation.
Part 5 positions the theme of sound and imagination within posthumanism. The part
begins with a chapter on sonic materialism that brings sound into the posthumanist
Musical Performance
Justin Christensen deals with the bond between improvisation and imagination in
artistic experience. Starting with a reassessment in continental philosophy both of how
imagination is conceived and can be demonstrated, Christensen observes that the con-
nection between improvisation and imagination has little value in classic aesthetic theories.
He then goes on to argue for the value of improvisation as a reflection of perception–
action coupling that is central to newer theories that favor embodied approaches to music
cognition. In the light of such theories, where perception, action, and imagination are
seen as interdependent properties, Christensen proposes a greater recognition of the
processes of musicking—including improvisation—to better understand meaning-
making and the role of imagination in musical experience.
Clemens Wöllner’s chapter deals with sonic actions in music performance. He argues
that musicians construct sonic images in the act of playing that allow them to anticipate
sonic actions and to perform without auditory feedback (for instance, when sound is
switched off during a performance). The construction of sonic images is discussed in
the context of performances on both traditional and controller-driven instruments, and
Wöllner shows how a performer’s anticipated sonic actions differ according to the type
of instrument. In relation to this, the level of detail of the imagined sound qualities
involved in auditory imagery is explored, and Wöllner considers the mappings between
gesture and sound that are required for audiences to imagine the sound emerging from
the performer’s actions.
In his chapter, Jan Schacher asks the question of what it is to imagine and initiate an
action on a musical instrument. For Schacher, the body is the central element of listen-
ing and sound perception; thus, the body, in an embodied and enactive sense, becomes
the focus for his explication of musicking on both conventional instruments and digital
instruments where, in the latter case, bodily schemata are replaced by metaphors and
instrumental representations. This last provides a significant topic of enquiry in the
chapter and the theme is explored from a number of angles, chief among which is a focus
(through the lenses of motor imagery and imagination in music) on relations between
inner and outer aspects of our ways and means of listening to and performing music and
sound. Ultimately, Schacher identifies a tension underlying digital musical performance
brought about by the fracturing of the action–sound bond that is the basis not only for
our sound perception of the natural world but also for the world of culturally ingrained
musical performance.
John Carvalho’s chapter is about the music that emerges in a skilled engagement with
an environment of sound. This music emerges in a piece of music when the embodied
skills of a composer, performer, or listener enact affordances that turn up in the environ-
ment defined by that piece of music. For Carvalho, the imagination animates these skills
and directs their embodied testing of the environment for affordances to be enacted
in making music emerge in it. To support this argument, Carvalho turns to an ecology of
mind and a taxonomy of listening that account for how composers, performers, and
listeners enact the music in a piece of music.
Marc Duby bases his exploration of sound and imagination on James J. Gibson’s affor-
dance concept. Using this concept, Duby studies how musicians benefit from real and
imagined actions in their interaction with real (e.g., pianos), virtual (e.g., MIDI con-
trollers), and air instruments (e.g., air guitars [nonexistent instruments]). In each case,
Duby explores the connection between gesture and sound and how the instruments
afford creativity. This leads to discussions of the range of imaginary possibilities that the
instruments afford musicians in the act of performing, composing, and listening, and of
how the special case of the air guitar challenges existing theories of embodied cognition.
In a chapter that explores the use within bioacoustics of technology and interpretation
of its data in order to assess human acoustic impact on nonhuman species, Mickey Vallee
introduces the term “transacoustic community” to illustrate the nefarious and transgressive uses to which those data are put. Vallee makes the charge that the bioacoustics
community hears without listening, having a different imagination of sound to other
sound-based researchers. This imagination springs not only from the specific aims of
the bioacoustics community but also from the audio technology used that ultimately
relies on visualization for its data access; thus, the requirement of a mastery of visual
interpretation, rather than a refined aurality, affects our understanding of the relationship
between humans and other species.
Henrik Sinding-Larsen presents an analysis of how new tools for the visual description
of sound revolutionized the way music was conceived, performed, and disseminated.
The Ancient Greeks had previously described pitches and intervals in mathematically
precise ways. However, their complex system had few consequences until it was com-
bined with the practical minds of Roman Catholic choirmasters around 1000 ce. From then on, melodies were depicted as note-heads on lines with precise pitch meanings and with
note names based on octaves. This graphical and conceptual externalization of patterns
in sound paved the way for a polyphonic complexity unimaginable in a purely oral/aural
tradition. However, this higher complexity also entailed strictly standardized/homoge-
nized scales and less room for improvisation in much of notation-based music. Through
the concept of externalization, lessons from the history of musical notation are general-
ized to other tools of description, and Sinding-Larsen ends with a reflection on what
future practices might become imaginable and unimaginable as a result of computer
programming.
Bennett Hogg queries the relations that sound recording has commonly been thought
to have to memory, in particular mechanistic approaches to both memory and record-
ing that see them as processes that fix things through time. Making sense of memories
as they are “laid down,” and as they are “recalled,” involves imagining novel connections between memorized materials and networks of sensory, social, and cultural
experience. Imagination, through time, subtly reworks memories, modulating their
affect, re-evaluating the significance of particular memories, mythologizing them, even.
To understand listening to recordings according to a rather reductive model of memory
risks misrepresenting the richness of the cognitive ecosystem in which listening occurs.
In looking for a new metaphor to inhabit this ecosystem of memory, imagination, and
persistence through time, Hogg proposes metempsychosis, the transmigration of souls,
as a more suggestive model.
Rolf Inge Godøy’s chapter is focused on how notions of shape, understood as geometric
figures and images stemming from body-motion, metaphors, graphic representation,
and so on, can be associated with the production and perception of music. Central to the
chapter is the understanding that shape cognition is not only deeply rooted in the human
experience of music and in musical imagery as such, but also has the potential to enhance
our understanding of music as a phenomenon in general. The chapter also discusses
how musical shape cognition, given that it is becoming increasingly feasible with
new technology, can contribute to various domains of music-related research and,
furthermore, can be highly valuable to practical applications in musical and multimedia
artistic creation.
The conceiving of an evocative synthesis engine from our imagining of sound is the
substance of Simon Emmerson’s chapter. In it, Emmerson surveys recent neurological
experiments in the synthesis of speech and music and focuses his attention on how
our imagining of sound might be synthesized at some future date. The purpose of this
speculative chapter is not to map out the design and interface of such a system but rather
to conceive of what the act of imagining sound is and how the tool to extract such sound
imagery might be used both for musical purposes and to externalize these formerly
private sounds.
Psychology
Luke Windsor focuses on enforced listening to music in detention camps and explores
the use of music in detention and interrogation while pointing to the creation of ambi-
guity and uncertainty as a central effect. Windsor engages with several cases of psycho-
logical warfare during previous wars and references interrogation practices described
by the CIA. The exploration of these cases, where music is used as a sound weapon, leads to
a broader discussion of the application of music to influence behavior and the ethics of
music application in, for instance, marketing.
The representation of hallucinations within audiovisual media forms the subject of
Jonathan Weinel’s chapter. Weinel builds his discussion around the concept of augmented
unreality, and he provides examples from films, VJ performances, digital games, and
other audiovisual media to show how sounds are used to form hallucinations. Ultimately,
Weinel points to a set of structural norms that defines psychedelic hallucinations and
the hypothesis that, with the improvement of digital technologies, the boundaries
between external reality and synthetic unreality may gradually dissolve.
Søren Bech and Jon Francombe’s chapter provides an illustration of how sensory
analysis is undertaken in the audio industry. It demonstrates how the industry attempts
to quantify the listener’s imagination—which is taken to include a range of modifiers of the
listener’s auditory experience, such as mood, expectation, and previous experience—
in order to ensure that the end result, the listener’s auditory experience or impression
after the audio transmission chain, matches the intended experience as closely as possible.
The example provided to illustrate this is of sensory analysis used for qualitative and quan-
titative evaluation of the listening experience in a personal sound zone. A perceptual
model was developed to reliably predict the listener’s sense of distraction (due to inter-
fering audio) from the experience of listening to audio intended for a particular zone.
Hauke Egermann explores the influence of music on how consumers imagine character-
istics of a brand. The chapter deals with several psychological mechanisms to outline
the associative and emotional potential of music and illustrates how music aids in estab-
lishing brand recognition and recall in consumers. Egermann elaborates on how music
can create brand attention and positive-affective responses in consumers and can affect
the cognitive meaning of a brand image. In summation, he argues for a brand-music com-
munication model that describes three different functions of music in the creation of
brand identity—brand salience, cognitive meaning, and emotional meaning.
The focus of Erkin Asutay and Daniel Västfjäll’s chapter is the relationship between
sound and emotion. Evidence from behavioral and neuroimaging studies is presented
that documents how sound can evoke emotions and how emotional processes affect
sound perception. Asutay and Västfjäll view the auditory system as an adaptive network
that governs how auditory stimuli influence emotional reactions and how the affective
significance of sound influences auditory attention. This leads to the conclusion that
affective experience is integral to auditory perception.
Andrea Halpern and Katie Overy argue that auditory imagery can be used actively as
a tool in various education and rehearsal sessions. Building on Nelly Ben-Or’s tech-
niques of mental representation for the concert pianist and the pedagogical approaches
of Zoltán Kodály and Edward Gordon, Halpern and Overy suggest that conscious and
deliberate use of auditory imagery should be exploited more in music education and
could be used with profound benefits for musicians as a rehearsal strategy. This leads to
a call for further empirical investigations of how voluntary auditory imagery might be
best used as a training method for both professional musicians and in classroom settings.
Adam Ockelford draws attention to that section of the population that does not nor-
mally engage in everyday listening; those for whom the acoustic properties of sound are
prioritized over the function of sound. In particular, Ockelford points to the listening of
autistic children for whom the perceptual qualities of sound exert an especial fascination
at the expense of the meaning that everyday listening would normally give to sound.
Through research that supports his contention that the development of musical abilities
in children precedes that of language skills, Ockelford makes the claim that the aural
imagination of those on the autistic spectrum is one that processes all sound, even speech,
for its musical structural properties, and thus it is music that is the autistic person’s
gateway to communication and empathy.
Lars Ole Bonde considers musical imagery in the context of music therapy sessions
from the tradition of the Bonny Method of guided imagery and music, which provides
well-documented examples of such imagery. While Bonde mainly focuses on listening
in clinical settings, he argues that image listening should be seen as a health resource in
everyday listening settings. Taking in perspectives from neuroaffective theory, Bonde
analyzes clinical material and evidence from the analysis of EEG data, and he shows how
music therapy theory—as a specific tradition within musicology—can contribute to
research on music listening through a greater understanding of the multimodal imagery
of such listening.
Musical imagery as a multimodal experience is also the topic of Freya Bailes’s chapter,
where embodied cognition is used as a framework to argue this. Existing empirical studies
of musical imagery are reviewed, and Bailes points to future directions for the study of
musical imagery as an embodied phenomenon. Arguing that musical imagery can never
be fully disembodied, Bailes moves beyond the idea of auditory imagery as merely a
simulation of auditory experience by “the mind’s ear.” Instead, she outlines how the imagining of sounds and music is always connected to sensory-motor processing.
Aesthetics
Theodore Gracyk takes issue with the claim that imaginative engagement is a prerequisite for the appreciation of music; that the experience of expressiveness in music
derives from an imaginative enrichment that allows music to be heard as a sequence of
motion and gestures in sound or that the expressive interpretation of music is guided by
imaginative description. While not completely rejecting an imaginative response to
music, Gracyk instead opts for a form of imaginative engagement with music described
as hearing-in. While not all music demands such engagement, hearing-in is not a trigger
for imaginative imagery but rather a musical prop that invites the listener to attend to
music’s animation in, for example, the form of musical causality and anticipation.
Bryan Parkhurst uses contemporary analytic “normativist” aesthetics as a lens through
which to view Leftist/Marxian “normative” aesthetics of music appreciation. In order to
do this, Parkhurst situates the key theses of Ernst Bloch’s theory of utopian musical lis-
tening within the framework of Kendall Walton’s theories of musical fictionality and
emotionality. The aim of this task is to make Bloch’s fundamental position perspicuous
enough that it can be assessed and evaluated. Parkhurst concludes that Bloch’s con-
tention that music should be viewed as a utopian allegory, and that the distinguished
office of (Western classical) music is to contribute to the political project of the imagining
of a better world (a “regnum humanum”), faces difficult objections.
An exploration of the affective dimension of our sonic environment forms the topic
of Ulrik Schmidt’s chapter. Schmidt poses the question, What does it mean to be affected
by the sonic environment as environment? The answer to this question involves a con-
ceptual distinction between atmosphere, ambience, and ecology. Schmidt argues that
affect and imagination are key components in the environmental production of presence,
and he provides examples of the aesthetic potentials of environments and explores how
an environment can “perform” in different ways to affect us as environment.
In his chapter on musical improvisation, Andy Hamilton deals with the cultural aspects
and historical practices of improvisation. The chapter sets out to explore the artistic status
of improvised music and this involves a discussion of the connection between imagi-
nation and art, and the contrast between composition and improvisation. These discus-
sions provide a theoretical framework to outline and defend an aesthetics of imperfection
as a contrast to an aesthetics of perfection. Finally, the artistic value of jazz as an improvised
art form is discussed, and Hamilton ponders whether jazz should be described as
classical or art music.
Posthumanism
Salomé Voegelin’s chapter contributes to current ideas on materiality, reality, objectivity,
and subjectivity as they are articulated in recent texts on New Materialism. Her chapter
makes use of the writing of Quentin Meillassoux and his posthumanist theorizing,
and it aims to contribute to the discussion through a focus on sound, as sound is seen to
support the reimagination of material relations and processes. In order to qualify and
substantiate her notion of sonic materialism, Voegelin includes narrow listenings to
three sound art works, focusing “on the inexhaustible nature of sound that exists perma-
nently in an expanded and formless now that I inhabit in a present that continues before
and after me.”
Daniël Ploeger investigates the designed sounds of operating systems (particularly
those of Apple and Microsoft computers and devices) from a cultural critical per-
spective, arguing that such sounds are cybernetic prostheses enhancing our capabilities.
In a chapter that ranges from initial conceptions of the cyborg, overshadowed by the cyborg’s roots in the military-industrial complex, to the subversion and use of operating system
sounds for creative purposes, Ploeger discusses the use and subsequent development of
such sounds—from early mainframe computers’ inherent noises to the designed sounds
of today’s computing devices—and shows how they underpin the imagining of computers
as extensions of the human body. Ultimately, for Ploeger, the recent design of operating
system sounds serves to propagate pre-existing ideological concepts of the cyborg as
evinced by our now technologically prosthetisized bodies.
Anne Danielsen’s chapter focuses mainly on the particular rhythmic feels that have
characterized many popular music styles since the 1980s and how these are produced
through the manipulation of sound samples and the timing of rhythm tracks. Initially,
Danielsen evaluates the formation of these rhythmic feels from two perspectives, an internal
and an external, but then goes on to discuss how they constitute a challenge to previous
popular music forms while, at the same time, offering new opportunities for human imagi-
nation and musical creativity. The chapter uncovers transformations across several
styles and discusses whether the technology at hand can be seen as an extension
of the human, creating musics and causing gestural movements that go beyond human-
kind’s “natural” repertoire.
A musical imagining of the future and an exposition of a challenge to the normative
historical discourse are the subjects of Erik Steinskog’s chapter on Afrofuturism. These
topics are dealt with through a discussion of “blackness” and the theoretical discourse
that addresses the musical style and polemical and political stance of afrofuturist musi-
cians such as Sun Ra and others following in his path. Steinskog suggests that afrofutur-
ist music is a form of sonic time travel that intertwines the modalities of time represented
by notions of past, present, and future, his argument being that reimaginations, reinterpre-
tations, and revisions of a normative past are represented in the technology and music of
the black future.
PART I
MUSICAL PERFORMANCE
Chapter 1
Improvisation
An Ideal Display of Embodied Imagination
Justin Christensen
Introduction
Imagination has often been considered to play a major role in perception, in the
production and appreciation of aesthetic objects, in simulation, and in fanciful creative
thought. Phenomenologists such as Husserl have claimed that imagination should not
be thought of in terms of images or description, but rather as a means of structuring
consciousness and giving meaning to phenomena. For Merleau-Ponty, imagination
brings about a perception that “arouses the expectation of more than it contains, and
this elementary perception is therefore already charged with a meaning” (Merleau-Ponty
2002, 4, italics in original). More recently, Varela and colleagues have stated, “cognition
is not the representation of a pregiven world by a pregiven mind but is rather the enact-
ment of a world and a mind on the basis of a history of the variety of actions that a being
in the world performs” (Varela et al. 1992, 9). Simply said, cognition is not independent
of the world, but instead functions as a means to guide action and perception. For
phenomenologists and for many empiricists, imagination acts to bind cognition together
with action and perception, and can also be seen to govern our perceptions, shaping and
filtering them into meaningful experiences. For instance, Merleau-Ponty has supported
linking together imagination, action and perception, stating,
our waking relations with objects and others especially have an oneiric character as
a matter of principle: others are present to us in the way that dreams are, the way
myths are, and this is enough to question the cleavage between the real and the
imaginary. (1970, 48)
As strange as it sounds, when your own behavior is involved, your predictions not
only precede sensation, they determine sensation. Thinking of going to the next
pattern in a sequence causes a cascading prediction of what you should experience
next. As the cascading prediction unfolds, it generates the motor commands neces-
sary to fulfill the prediction. Thinking, predicting, and doing are all part of the same
unfolding of sequences moving down the cortical hierarchy. (2004, 158)
that when individuals perceive the actions and the emotions produced by others, they
use the same neural mechanisms as when they produce the actions and the emotions
themselves [even though] there is no complete overlap between self- and others
representations. This would lead to confusion and chaotic social interaction. (12)
Beyond this support for a common-coding of perception and action, there is evidence
of perception–action coupling in infant development (Johnson and Johnson 2000),
childhood development (Getchell 2007), and in sports activities (Ranganathan and
Carlton 2007). Attempting to move beyond common-coding, Maes and colleagues
support a more radical embodied approach to explain action-perception coupling in
music listening and performance. They argue, “sensory-motor association learning can
be considered a central mechanism underlying the development of internal models”
(Maes et al. 2014, 2). Similarly, they point out, “the ability to predict the auditory conse-
quences of one’s actions, which is one of the core mechanisms of action-based effects on
perception, depends on previous acquired sensory-motor associations” (2). Alongside
this, they “define the concepts of temporal contiguity and probabilistic contingency as
two [of the] main principles underlying associative learning processes” (2). Furthermore,
they consider that “musical instrument playing [is] a special but highly illustrative
case of sensory-motor association learning” (2). Subsequently, Maes and colleagues
follow a dynamic systems approach to examining embodied music cognition in order
to incorporate social interaction, introspection, and expressivity alongside sensory-
motor coupling in their meta-analytic study of embodied music cognition.
Similar to these views, I will argue that this ability to imagine and simulate the actions
and emotions of others plays a major role in our reception of both written music and
improvisational practice (see Wöllner, this volume, chapter 2, on the anticipation of
sound in performance). I fully acknowledge that this is only part of the picture, as
language, with its role of describing and categorizing, also plays a large role in filtering
and shaping our imagination. To attempt a more complete picture, I propose that imagi-
nation is made up of a dynamic collaboration between nonpropositional (embodied)
and propositional (language) forms of knowledge that construct our aesthetic experi-
ences. Bowman has remarked, “When we hear a musical performance, we do not just
‘think,’ nor do we just ‘hear’: we participate with our whole bodies; we construct and
enact it” (Bowman 2004, 47; also seen in Borgo 2005, 44). This combined approach
of the nonpropositional and propositional also fits well with both Heidegger’s and
Gadamer’s theories on aesthetics. For them, art has a great impact on our experiences in
the world, through presenting us with mutually dependent disclosures and “hidden-
nesses.” These disclosures not only disclose themselves, but they also reveal the presence
of the hidden, with these revelations drawing us further into the aesthetic experience.
While some “hiddennesses” are aspects of experience that we lack the ability to con-
ceptualize through language, others are aspects of experience that just have not as of yet
reached the center of attention to be conceptualized. This phenomenological perspec-
tive proposes that the separation of the nonpropositional from the propositional forms
of thought is achieved through having prereflective and reflective forms of conscious-
ness, validating that imagination should be viewed from multiple levels. As a result,
I will argue that a dynamic and multileveled perspective of imagination is necessary for
exploring our musical experiences. Furthermore, I propose that only focusing on the
reflective, verbally reportable aspect of consciousness impoverishes our understanding
of artistic emotions.
Many, such as Daniel Dennett (1991), consider disclosure (the ability to give verbal
description) necessary before an experience can be considered a valid conscious
experience. This ignoring of the embodied aspects of the musical experience has also
permeated rationalist and positivist views on aesthetics. DeNora has stated,
listening is too often de-historicised in a way that imposes the model of the (histor-
ically specific) silent and respectful listener as a given. Within this assumption, the
body of the listener is excised. And yet, such listening involves a high degree of
bodily discipline. (2003, 84)
Borgo (2005) has pointed out the inherent tension between an embodied take on music
reception and a more traditional aesthetic view, which purports that one should disinter-
estedly examine an art object as something that is autonomous and fully separate
from oneself. Rationalist and positivist aesthetic viewpoints have also had difficulty in
making aesthetic judgments on musical experience, as the experience is ephemeral and
thus the only art object that remains to be judged is the score. This largely results from
the fact that representation has been considered vital for something to be recognized as
a fine art. For instance, owing to music’s ephemerality and lack of disclosed representation,
Kant felt compelled to consider music generally as agreeable sensations rather than a fine art (with textual vocal music being an exception to this norm). Supporting this, Kant has
stated that music, as he hears it, provides “nothing but sensation without concepts, so
that unlike poetry it leaves us with nothing to meditate about” (Kant 2007, §328). This
difficulty is exacerbated for improvisation, which lacks even a score to judge as an art
object. I propose that we need to get rid of the notion of reified art objects, and that we
instead need to re-examine imagination and improvisation, both through our past
interpretations of their usefulness and through the context of more current embodied
cognitive approaches, in order to give imagination and improvisation the important
recognition they deserve as part of artistic experience.
For a poet is an airy thing . . . he is not able to make poetry until he becomes inspired
and goes out of his mind and his intellect is no longer in him. As long as a human
being has his intellect in his possession he will always lack the power to make poetry
or sing prophecy. (Plato 1997, 942, 534b–c)
For me, this simultaneous disparagement and awe of artists is an unresolved dichotomy
in Plato’s writing. On the one hand, imitation and creative thought are reprehensibly far from the transcendental “forms,” while, on the other hand, artistic inspiration in its
excess can form a direct link to the transcendental. Even with this dichotomy, Plato’s
view of creative imagination has had an enduring influence on the arts, such as when
Shelley, in A Defence of Poetry, discussed the artist resonating to their internal and
external influences like an Aeolian lyre, drawing on divine effluence (Shelley [1840]
2010). Similarly, Samuel Taylor Coleridge gave a Platonic description of imagination
when he stated that imagination is “a repetition in the finite mind of the eternal act of
creation in the infinite I am” (Coleridge 1984, 304). Johnson has nicely summed up
Plato’s view on creative imagination in the Republic by stating, “But imagination of this
sort is not a rational faculty; rather, it is the result of a kind of demonic possession in
which the poet loses rational control” (2013, 143, italics in original). Thus, the Platonic
creative imagination in its reaching toward the divine necessarily walks a fine line
between genius and insanity. Moreover, if the Platonic creative imagination lacks this
transcendental genius, then it fails society.
The other side of imagination, which has been considered as a mediator between the
senses and thought, has often been seen to begin with Aristotle, for whom
[T]hose perceptions, which enter with most force and violence, we may name
impressions . . . By ideas I mean the faint images of these in thinking and reasoning . . .
The first circumstance, that strikes my eye, is the great resemblance betwixt our
impressions and ideas in every other particular, except their degree of force and
vivacity. ([1731] 1888, 1)
Hobbes was more extreme in his viewpoint, stating, “there is no conception in a man’s
mind, which hath not at first, totally, or by parts, been begotten upon the organs of sense.
The rest are derived from that original” ([1651] 1996, 9). For Hobbes, we are unable
to imagine anything that is completely free from the inputs of our sense apparatus.
Accordingly, Aristotelian imagination has a strong connection to an empirical philo-
sophical viewpoint and is shared by empiricists such as Hobbes, Berkeley, Locke, and
Hume, among others.
The Platonic and Aristotelian viewpoints of imagination have an inherent tension
between them, congruent with the Cartesian mind–body problem. Since Descartes
(1985) presumed that the mind and the soul are more or less the same thing, the Platonic
view of imagination conforms well to a Cartesian substance of the mind in that it can be
seen to draw inspiration from the transcendental. Similarly, the Aristotelian view of
imagination conforms to a Cartesian substance of the body immanent in the world, as
this viewpoint explores the connections between sense experience and thought. Related
to this, Mary Warnock (1976) has asked how it is that imagination can both facilitate
everyday perception and be a source of novelty. For me, the only way to resolve this
dual nature of imagination is to deny the unfounded divisions that have been made
between the mind and the body, and between the divinely inspired and the routinely
experienced.
all the places of the Temple resounded with the sounds of harmonious symphonies
as well as the concords of diverse instruments, so that it seemed not without reason
that the angels and the sounds and singing of divine paradise had been sent from
the sacred mysteries should be celebrated with utmost reverence, with both deepest
feeling toward God alone, and with external worship that is truly suitable and
becoming, so that others may be filled with devotion and called to religion . . . But
the entire manner of singing in musical modes should be calculated, not to afford
vain delight to the ear, but so that the words may be comprehensible to all.
(Canon 8, quoted and translated in Monson 2002, 9)
These requests for suitable solemnity, reverence, and comprehensibility of text seem to
suggest that sacredness be given a priority over divine inspiration. I find this tension
between sacredness and divine inspiration to have similar aspects to the tensions
that we have earlier seen involved with the Platonic and Aristotelian imaginations.
Improvisation as divine inspiration tied to altered states of consciousness and mystical
religious experiences maps well onto Plato’s need for divine inspiration in creative
thought for it to be worthwhile. Similarly, improvisation as an innovative practice with
constraints and affordances maps well onto the Aristotelian imagination that links
together cognition with perception and action.
Furthering the links between divinely inspired improvisation and Platonic imagi-
nation, there have also been accusations of demon possession related to ecstatic
experiences in the church (Edwards 1742), as people feared that they did not know
where these divine inspirations came from, whether from God or from demons
(Edwards 1746).
Outside of the church, there have also been descriptions of ecstatic responses to
improvised performances of secular music. In the sixteenth century, Jacques Descartes
de Ventemille described an improvised (free fantasia) performance of Francesco da
Milano, stating,
he continued with such ravishing skill that little by little, making the strings languish
under his fingers in his sublime way, he transported all those who were listening
into so pleasurable a melancholy that . . . they remained deprived of all senses save
that of hearing. [He left] as much astonishment in each of us as if we had been
elevated by an ecstatic transport of some divine frenzy.
(Descartes de Ventemille, quoted in Weiss and Taruskin 2007, 134)
While improvisation’s role was not only to transport people into ecstatic frenzy, I would
argue that it, showing a similar dual nature to imagination, would have been considered
to act in both the transcendental and material realms. As a result, I will argue in similar
fashion against any needless divisions between improvisation as a powerful and sublime
experience and improvisation as innovatory practice. I see these two sides of improvi-
sation as necessary to one another.
which he was aware, and so, as a result, Kant introduced the concept of synthetic a priori
knowledge (a concept is described by Kant as “something that is universal and that
serves as a rule” [Kant 1998, A106]).
Kant’s appeal for a synthetic a priori is an appeal for universally generalizable experi-
ence. Basically, if we exist as minds (in a Cartesian sense), the only possibility for us to
share experience with one another and to communicate with one another is either to
have an apparatus for connecting to the world that is universally similar or to passively
receive the information from the world. If we have a mind–body separation and we want
to escape the need for only passively accepting our perceptions of the world, then we
would need a chip in our brains that could bridge the gap, translating experience
from the body to the mind. Kant thus considered subjective experience not to be fully
subjective, and gave it the name transcendental subjectivity, as it gave individuals
some access to an apodictic experience. Through this transcendental subjectivity,
there are some universal ways to experience reality, which builds a universal (although
invisible) foundation on which we can intelligibly communicate to one another and
experience things similarly.
I find that an oversimplified but useful comparison to these hidden (transcendental
subjectivity) universals is with the hidden rules of universal generative grammar that
Chomsky has proposed are hardwired into the brain to allow us to learn languages
quickly and efficiently. Nevertheless, I would argue that neither of these viewpoints has
panned out. Chomskyan generative grammar has rules that are very commonly used
around the world but has failed to find universality (Everett 2005), while Kant’s synthetic a priori has failed to adequately bridge the gap between the substances of the mind and
body. Instead, as Johnson points out,
the rigid separation of understanding from sensation and imagination relegates the
latter to second-class status as falling outside the realm of knowledge. As a result,
judgments of taste can never, for Kant, be determinative or constitutive of experi-
ence. Neither can they be “cognitive,” for he regards the cognitive as the conceptual,
and there is no concept or rule guiding reflective judgment. (2013, 167)
In a way, Kant may have succeeded in bringing together the different types of imagi
nation. However, in doing so, he relegated the Platonic imagination to the same place on
the divided line as Plato, to the very lowest rank. Furthermore, to accomplish this he
had to give great power to transcendental subjectivity, thus partially muting the active
participatory role of the mind, and obligating our experiences and understandings to
be normalized through their universal underpinnings (Steeves 2004).
No: I want only one single creation, and I shall be quite satisfied if [the singers]
perform simply and exactly what [the composer] has written. The trouble is that
they do not confine themselves to what he has written. I deny that either singers or
conductors can “create” or work creatively.
(Verdi, quoted in Sancho-Velazquez 2001, 5)
In the spirit of the Enlightenment near the beginning of the eighteenth century, Fénelon’s
The Adventures of Telemachus presented imagination as something childish that could
be exploited when teaching children, but ought to have been eradicated by the time the
students reached adulthood. Fénelon followed the views of Plato, and thus imagination
was yet again considered inferior to reason (Lyons 2005). Romanticism is a period that
we, in retrospect, have decided celebrated creative imagination. However, I would argue
that it instead continues to celebrate divinely inspired imagination and attempts to
guard us against more fanciful imagination, while also continuing to separate the divinely
inspired from the routinely experienced.
Rousseau, an important figure for the growth of Romanticism, also wanted to suppress imagination in favor of a Lockean passively receptive “sensibility.” Rousseau’s book
Emile; or, On Education had a teacher attempting to protect Emile from his imagination,
killing his imagination through habit. Through this book, Rousseau argued that people
should strongly avoid creatively modifying or distorting the information of the senses
(1979). This viewpoint matches well with Kant’s aesthetics on reception. In his Critique
of Judgement, Kant stated, “judgements of taste” must contain four characteristic
features: (1) disinterestedness, where pleasure is derived from judging something as
beautiful, and not the inverse of judging something as beautiful as a result of finding it
pleasurable; (2) universality of this judgment; (3) necessity of this judgment, where the
beauty is intrinsic to the object itself; and (4) “purposiveness without purpose” of the
object ([1790] 2007). Even though Kant gave the mind an active and participatory
role in perception, all of these features of aesthetic judgment work to greatly reduce the
role of imagination in the aesthetic appreciation of art.
Fetis saw the task of his self-proclaimed science of the “philosophie de la musique”
to be to show how tonality was the dialectical synthesis of theory and history; music
history was the actualization of tonality, while music theory could be seen as its
“objectification.” (1996, 56)
More recently, Kivy repeatedly has stated in one way or another through his book on
musical genius, “A musical genius is one who produces supremely valuable musical
works” (2001, 178). Thus, genius is not in the process of musicking (focusing on music
as an act rather than music as an object), but rather in the production of objectifiable
aesthetic art objects. Similarly, through the twentieth century, improvisation has been
discredited by major composers such as Boulez and Berio, and by influential music the-
orists such as Adorno (Peters 2009). As musicology has become a search for a canon of masterworks, improvisation has had a difficult time staying relevant in this
changing landscape. Concurrently in the early twentieth century, due to the influence
of behaviorism, imagination was relegated to “the outer darkness of intellectual irrele-
vance” (Morley 2005, 117).
While improvisation slowly departed from classical music, it quickly spread into
other musical styles. Jazz improvisation, which highlights spontaneous and unplanned
performance practice, gained great popularity in the early twentieth century, especially
in America, well in advance of any psychological or philosophical theories that highly
value the role of spontaneous imagination in cognition. However, improvisation in jazz
was not without difficulty in its early days. Gushee quotes Colles in the 1927 Grove’s Dictionary of Music and Musicians, stating that improvisation “is therefore the primitive
act of music-making, existing from the moment that the untutored individual obeys the
impulse to relieve his feelings by bursting into song” (Colles 1927). Alongside this,
Gushee quotes a questioner in Jacobs Orchestra Monthly from 1912 asking about faking
(improvisation) even though it is “not playing correctly,” and later mentions that
improvisation, which is now considered “as an act of creative imagination, in the past it
was sometimes considered anything but, something that inadequately trained musicians
did from rote or force of habit or necessity” (1927, 265–266). Once improvisation in jazz had grown beyond its early stages, one thing that may have helped it flourish was the act of recording, which made it possible for improvisation to be fixed into an autonomous aesthetic art object (Solis 2009). Subsequently, this ability to
reify improvisation into fixed art objects has been a point of contention for improvisers.
Some, like Derek Bailey (1993), are greatly against this reification, while others, such as Gabriel Solis (2009), see a positive value in it.
Imagination in Embodied Cognitive Science
Earlier, I presented the tradeoffs that Kant felt were necessary (overvaluing normalized experiences while devaluing imagination) to allow individuals to have a participatory role in perceiving the world around them. I have also appealed several times throughout this chapter for an embodied perspective as a better means of participating in the world around us.
during the generation, imagination, as well as observation of one’s own and other’s
behavior” (4). Egermann and colleagues have stated that
animal or human individual inhabit[s] the same world as another, however close
and similar these living individuals may be . . . between my world and any other
world, there is first the time and space of an infinite difference, an interruption that
is incommensurable with all attempts to make a passage, a bridge, and isthmus, all
attempts at communication, translation, trope, and transfer that the desire for a
world or the want of a world, the being wanting a world will try to pose, impose,
propose, stabilize. There is no world there are only islands. (Derrida 2011, 31)
This agnostic position toward the other is more emphatically expressed by Derrida here
than he may actually believe, but theories following a Cartesian mind–body separation
have a very difficult time defending against this, which has led to their need to search for
essentialisms and universally innate common abilities (such as Kant’s transcendental
subjectivity) that allow for communication between these infinitely separated islands.
Furthermore, I argue, following an embodied viewpoint, that since we have a responsibility
to act in a goal-directed manner within the time constraints of real life, we actively
participate in the perception and meaning-making of our environment rather than
passively apprehend an accurate reality.
Since our predictions are situated in similar embodiments and environments to
one another, we have access to similar shared experiences. Jean-Luc Nancy in Being
Singular Plural has given an adept perspective on how embodiment can defend us
from solipsism:
That which exists, whatever this might be, coexists because it exists. The co-implication
of existing [l’exister] is the sharing of the world. A world is not something external
to existence; it is not an extrinsic addition to other existences; the world is the coex-
istence that puts these existences together. . . . Kant established that there exists
something, exactly because I can think of a possible existence: but the possible
comes second in relation to the real, because there already exists something real.
(Nancy 2000, 29)
Nancy here proposes that thinking of reality includes an immediate coexistence. Thus, it
is impossible to approach things-in-themselves as they always already exist in a reality
that is plural. In my opinion, the troubles that we have had in joining the different types
of imagination into a single framework stem mainly from one problem: the completely
unnecessary split between mind and body.
environment will ultimately change our physical states to the point we cease to exist.
And yet, biological systems seem to violate these laws [and] they occupy a small
number of states with a high probability and avoid a large number of other states. In
short, they appear to resist thermodynamic imperatives. (263, 266)
Similar principles have been discussed by Saslaw and Walsh in their chapter (this volume,
chapter 7), and more can be read on this topic there. A major part of Friston’s proposed
answer to this question is the minimization of surprise. He suggests that we do this in two ways: by altering our perceptions and by altering our actions. First, through learning, evolution, and neurodevelopment, we maximize the chances that our model works by becoming a model of our environmental niche. Then, we can change our level of surprise by changing (optimizing) either our predictions (perceptions), our expectations (our model), or our actions (Friston 2012). “This perspective suggests that we should selectively sample data (or place ourselves in relation to the world) so that we experience what we expect to experience. In other words, we will act upon the world to ensure that our predictions come true” (272).
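As a rough illustration of these two routes for minimizing surprise, the toy sketch below compares updating one's prediction with acting on the world to change the input. The Gaussian surprise measure and all of the numbers are illustrative assumptions, not Friston's formulation.

```python
# A toy sketch of the two routes for reducing surprise described above:
# (1) update the prediction to fit the sensed input, or (2) act on the world so
# that the input comes to match the prediction. The Gaussian surprise measure
# and all values are illustrative assumptions, not Friston's formulation.
import math

def surprise(observation: float, prediction: float, sigma: float = 1.0) -> float:
    """Negative log-probability of the observation under a Gaussian prediction."""
    return 0.5 * ((observation - prediction) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))

prediction, observation = 2.0, 5.0
print("initial surprise:", round(surprise(observation, prediction), 3))

# Route 1: perceptual inference -- move the prediction toward the observation.
updated_prediction = prediction + 0.8 * (observation - prediction)
print("after updating the model:", round(surprise(observation, updated_prediction), 3))

# Route 2: action -- sample the world selectively so the input approaches the prediction.
new_observation = observation - 0.8 * (observation - prediction)
print("after acting on the world:", round(surprise(new_observation, prediction), 3))
```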
It is this selective sampling of data that supports imagination’s major influence on our
agential interaction with the world. This is supported by Calvo-Merino and colleagues’
work on dance (2005), where experts had stronger neural activations in the premotor
cortex than novices. Furthermore, in this study they also saw that experts had stronger
activations in response to dance styles more similar to their own. Thus, our ability to act
can also influence how we choose to sample our perceptual data and how it directly
influences our behaviors. Related to this, Borgo quotes Evan Parker when he states,
In the end the saxophone has been for me a rather specialised bio-feedback instru-
ment for studying and expanding my control over my hearing and the motor
mechanics of parts of my skeleto-muscular system and their improved functioning
has given me more to think about. Sometimes the body leads the imagination,
sometimes the imagination leads the body. (Parker 1992; Borgo 2005, 58)
Friston suggests something very similar, but rather as a tripartite structure, where some-
times perception leads, sometimes imagination leads, and sometimes the body leads. In
this regard, I find improvisation to be an ideal representation of Friston’s free energy
principle.
categorizations (Lupyan et al. 2007), and when dementia patients lose the concept
knowledge to describe their emotions they can lose the capacity to perceive them
(Lindquist et al. 2014). Further supporting this, Richard Hilbert has found that chronic
pain that does not fit within the standard descriptions of pain causes further suffering
and social isolation in patients, as they have no language with which to describe their
pain (Hilbert 1984). On top of this, meta-analysis results from Nilsson and de López are
consistent with the theory that children with language impairment have a “substantially
lower ToM [theory of mind] performance compared to age-matched typically develop-
ing children” (2016, 143). Furthermore, there have been significant links made between
language processing and spatial representation (Richardson et al. 2003), language processing and the perception of moving objects (Meteyard et al. 2007), and language
processing and color perception (Thierry et al. 2009). Evidence is considerable and
growing that language is not an innocent tool, but rather that it has a large impact on
conscious experience. As a result, it is easy to see the power that language might have in
the reflective awareness and reflective imagination involved in improvisation.
The importance of reflective consciousness becomes more readily apparent once one
realizes all of the necessary work that is put in to prepare for “spontaneous” improvisa-
tory performances. Musicians spend years learning theory, scales, and the techniques
necessary to pull this off. As Berliner has stated, there is “a lifetime of preparation and
knowledge behind every idea that an improviser performs” (2009, 17). Similarly, musi-
cians converse with one another between and during performances, reflecting and
narrativizing on their musical experiences. Furthermore, as can be seen in Schmicking’s
chapter (volume 1, chapter 4), musicians intermittently imagine and reflect on how they
can guide the music to where they want it to go.
Even with these strong arguments for reflective thinking in improvisation, I would
argue that some thinkers such as Dennett (1991) take it too far when they consider that
the ability to provide a verbal report is necessary for an experience to be considered a
valid conscious experience. As artistic experiences both elicit and elude conceptuali-
zation according to Heidegger, I propose that only focusing on the reflective, verbally
reportable aspect of consciousness impoverishes our understanding of artistic emotions
(Harries 2011). Following this, Thompson and colleagues state, “phenomenologists
emphasize that most of experience is lived through unreflectively and inattentively, with
only a small portion being thematically or attentively given” (Thompson et al. 2005, 59).
This is well supported by the work of Al Bregman, who has spent much of his career
studying auditory scene analysis. Bregman has researched how sound perceptually
either groups together or splits apart into auditory streams, and has found that the
most transformative effect on how the auditory stream is processed is whether it is
foregrounded or backgrounded in the listener’s mind (Bregman 1990). With that said,
Bregman’s research also supports the notion that both the foreground and background
musical elements are consciously experienced by a listener. Flow (Csikszentmihalyi 1990), frequently considered to be an optimal state both for performing and listening to music, also has support for being a strong mix of reflective and prereflective states of consciousness.
Conclusion
Neither improvisation nor imagination fares well within either the classical Greek or
Enlightenment theories of knowledge and understanding. These theories have instead
privileged objective, disinterested forms of approaching reified aesthetic objects, which
neither imagination nor improvisation has to offer. In this chapter, I have argued that
this failure of imagination and improvisation to find value within aesthetic theories that
value the ontology of fixed art objects says less about the value of either imagination or
improvisation than it does about the value lacking in these aesthetic theories. Instead,
I suggest that we need to focus more on newer aesthetic theories that value the ontology
of the process of musicking. In relation to empirical research, improvisation and imagi-
nation have made a resurgence in acceptance concurrent with the rise of embodied theories that make direct links to perception and action. Researchers like Decety, Grèzes, and Friston have shown the extraordinary influence that imagination has over perception and action, and the close links that perception, action, and imagination have to one another. Furthermore, improvisation, with its reliance on a strong integration of perception, action, and imagination, can be seen as a strong reflection of this
tight interdependence. As a result, if we really want to understand our embodied, holis-
tically integrated, and time-constrained musical experiences, I feel that it is imperative
that we include the investigation of the improvisatory and participatory aspects of
meaning making that occur as part of the process of musicking.
References
Abdallah, S., and M. Plumbley. 2009. Information Dynamics: Patterns of Expectation and Surprise in the Perception of Music. Connection Science 21 (2–3): 89–117.
Aristotle. 2004. De Anima. Translated by Hugh Lawson-Tancred. Reissue edition. London:
Penguin.
Bailey, D. 1993. Improvisation: Its Nature and Practice in Music. New York: Da Capo.
Berliner, P. F. 2009. Thinking in Jazz: The Infinite Art of Improvisation. Chicago and London:
University of Chicago Press.
Boden, M. A. 2004. The Creative Mind: Myths and Mechanisms. Oxon and New York:
Psychology.
Borgo, D. 2005. Sync or Swarm: Improvising Music in a Complex Age. New York and London:
Continuum.
Bowman, W. 2004. Cognition and the Body: Perspectives from Music Education. In
Landscapes: The Arts, Aesthetics, and Education, edited by L. Bresler, 29–50. Boston,
Dordrecht, and London: Kluwer Academic.
Bregman, A. S. 1990. Auditory Scene Analysis: The Perceptual Organization of Sound.
Cambridge, MA, and London: MIT Press.
Calvo-Merino, B., D. E. Glaser, J. Grèzes, R. E. Passingham, and P. Haggard. 2005. Action
Observation and Acquired Motor Skills: An fMRI Study with Expert Dancers. Cerebral
Cortex 15 (8): 1243–1249. doi:10.1093/cercor/bhi007.
Christensen, T. 1996. Fétis and Emerging Tonal Consciousness. In Music Theory in the Age of
Romanticism, edited by I. Bent, 37–56. Cambridge: Cambridge University Press.
Coleridge, S. T. 1984. Biographia Literaria; or, Biographical Sketches of My Literary Life and
Opinions. Vol. 1. Edited by J. Engell and J. Bate. Princeton: Princeton University Press.
Colles, H. C. 1927. Grove’s Dictionary of Music and Musicians. 3rd ed. Edited by H. C. Colles.
New York: Macmillan.
Csikszentmihalyi, M. 1990. Flow: The Psychology of Optimal Experience. New York: Harper &
Row.
Damasio, A. R. 2010. Self Comes to Mind: Constructing the Conscious Brain. 1st ed. New York:
Pantheon.
Decety, J., and J. Grèzes. 2006. The Power of Simulation: Imagining One’s Own and Other’s
Behavior. Brain Research 1079 (1): 4–14. doi:10.1016/j.brainres.2005.12.115.
Dennett, D. C. 1991. Real Patterns. Journal of Philosophy 88 (1): 27–51. doi:10.2307/2027085.
DeNora, T. 2003. After Adorno: Rethinking Music Sociology. Cambridge: Cambridge
University Press.
Derrida, J. 2011. The Beast and the Sovereign. Vol. 2. Edited by M. Lisse, M.-L. Mallet, and
G. Michaud. Translated by G. Bennington. Chicago and London: University of Chicago Press.
Descartes, R. 1985. The Philosophical Writings of Descartes. Vol. 2. Translated by J. Cottingham,
R. Stoothoff, and D. Murdoch. Cambridge: Cambridge University Press.
Dufay, G. 1966. Opera Omnia. Vol. 2. Edited by H. Besseler. Rome: American Institute of
Musicology.
Edelman, G. 2001. Consciousness: The Remembered Present. Annals of the New York Academy
of Sciences 929 (1): 111–122. doi:10.1111/j.1749-6632.2001.tb05711.x.
Edwards, J. 1742. Some Thoughts Concerning the Present Revival of Religion in New-England,
and the Way in Which It Ought to Be Acknowledged and Promoted, Humbly Offered to the
Publick, in a Treatise on That Subject. Boston: S. Kneeland and T. Green.
Edwards, J. 1746. The Treatise on Religious Affections. New York: American Tract Society.
Egermann, H., M. T. Pearce, G. A. Wiggins, and S. McAdams. 2013. Probabilistic Models
of Expectation Violation Predict Psychophysiological Emotional Responses to Live
Concert Music. Cognitive, Affective, and Behavioral Neuroscience 13 (3): 533–553. doi:10.3758/
s13415-013-0161-y.
Everett, D. L. 2005. Cultural Constraints on Grammar and Cognition in Pirahã: Another
Look at the Design Features of Human Language. Current Anthropology 46 (4): 621–646.
doi:10.1086/431525.
Fellerer, K. G., and M. Hadas. 1953. Church Music and the Council of Trent. Musical Quarterly
39 (4): 576–594.
Ferand, E. 1961. Improvisation in Nine Centuries of Western Music: An Anthology. Köln:
Arno Volk Verlag.
Friston, K. J. 2012. Free Energy and Global Dynamics. In Principles of Brain Dynamics: Global
State Interactions, edited by M. I. Rabinovich, K. J. Friston, and P. Varona, 261–292.
Cambridge, MA, and London: MIT Press.
Getchell, N. 2007. Developmental Aspects of Perception-Action Coupling in Multi-Limb
Coordination: Rhythmic Sensorimotor Synchronization. Motor Control 11: 1–15.
Grout, D. J., and C. V. Palisca. 1988. A History of Western Music. London: Norton.
Harries, K. 2011. Art Matters: A Critical Commentary on Heidegger’s “The Origin of the Work
of Art.” New York: Springer.
Hawkins, J., and S. Blakeslee. 2004. On Intelligence. New York: Times Books.
Hilbert, R. A. 1984. The Acultural Dimensions of Chronic Pain: Flawed Reality Construction
and the Problem of Meaning. Social Problems 31 (4): 365–378. doi:10.2307/800384.
Hobbes, T. 1996. Leviathan. Edited by J. C. A. Gaskin. Oxford: Oxford University Press.
Hume, D. 1888. A Treatise of Human Nature. Edited by L. A. Selby-Bigge. Oxford: Clarendon
Press.
Huron, D. 2006. Sweet Anticipation: Music and the Psychology of Expectation. Cambridge:
MIT Press.
Johnson, M. 2013. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason.
Chicago and London: University of Chicago Press.
Johnson, S. P., and K. L. Johnson. 2000. Early Perception-Action Coupling: Eye Movements
and the Development of Object Perception. Infant Behavior and Development 23 (3–4):
461–483. doi:10.1016/S0163-6383(01)00057-1.
Kant, I. 1998. Critique of Pure Reason. Edited by A. W. Wood. Translated by P. Guyer. The
Cambridge Edition of the Works of Immanuel Kant. Cambridge and New York: Cambridge
University Press.
Kant, I. 2007. Critique of Judgement. Edited by N. Walker. Translated by J. C. Meredith. Oxford:
Oxford University Press.
Kivy, P. 2001. The Possessor and the Possessed: Handel, Mozart, Beethoven, and the Idea of
Musical Genius. New Haven and London: Yale University Press.
Lindquist, K. A., M. Gendron, L. F. Barrett, and B. C. Dickerson. 2014. Emotion Perception,
but Not Affect Perception, Is Impaired with Semantic Memory Loss. Emotion 14 (2):
375–387. doi:10.1037/a0035293.
Locke, J. 1803. An Essay Concerning Human Understanding. 1st American ed. Vol. 2. Boston:
David Carlisle.
Lupyan, G., D. H. Rakison, and J. L. McClelland. 2007. Language Is Not Just for Talking: Redundant Labels Facilitate Learning of Novel Categories. Psychological Science 18 (12): 1077–1083. doi:10.1111/j.1467-9280.2007.02028.x.
Lyons, J. D. 2005. Before Imagination: Embodied Thought from Montaigne to Rousseau.
Stanford, CA: Stanford University Press.
Maes, P.-J., M. Leman, C. Palmer, and M. Wanderley. 2014. Action-Based Effects on Music
Perception. Frontiers in Psychology 4:1008. doi:10.3389/fpsyg.2013.01008.
Merleau-Ponty, M. 1970. Themes from the Lectures at the Collège de France, 1952–1960.
Evanston, IL: Northwestern University Press.
34 justin christensen
Anticipated Sonic Actions and Sounds in Performance
Clemens Wöllner
Introduction
Imagine the following situation: a guitarist performs in a large concert venue together
with a singer, a bass guitarist, and a percussionist. At one point, the stage monitor speak-
ers cease functioning, and the guitarist does not hear the sound of her own instrument
or the sounds produced by the others. Only some noise from the audience reaches her
ears and some delayed feedback from the loudspeakers directed away from the stage.
Puzzled at first, the guitarist quickly glances at the fellow musicians and then continues
playing. They have performed the piece a dozen times, and she is an experienced guitarist,
so she is able to anticipate the sound outcomes of her own performance actions and to
synchronize with the imagined sounds of the others, all the while ignoring the delayed
acoustic feedback from the audience loudspeakers. Musicians such as the one described
in this situation are typically well aware of the effects of their actions, so they may
depend less on actual auditory feedback. Long training enables them to associate fine-
tuned motor behavior with sounding action outcomes, and their actions are tightly
coupled with sounds in their imagination (Keller 2012). Even when hearing sounds,
skilled musicians may experience a motor resonance of the corresponding actions
(Bangert et al. 2006). The sound waves transform into sonic actions in their auditory and
motor imagery, enabling them to perform even when auditory feedback is disrupted or
when anticipating their own and others’ sounds.
This chapter elaborates on anticipated sounds in performance, and focuses on sonic
actions in bodily representations of sounds (see also, Godøy this volume, chapter 12, on
musical shape cognition). Research into traditional instrumental performances is
discussed, and features of electroacoustically generated sounds in gestural performances
are described for controller-driven music. In all these examples, sound characteristics,
according to more restricted definitions, refer to the timbre of instrumental or vocal
sounds in a performance. Certain spectral components, as specified later, characterize
and distinguish sounds from each other. In a wider sense, the sounds of a music per-
formance include further components such as timing and dynamics that make a
performance unique and distinctive from others. These sound qualities are often related
to timbre, such that higher intensity in many acoustical instruments alters their timbre
as well, or the chosen tempo affects playing techniques, articulation, and sound quality.
Distinctions between the two concepts of sound may thus be primarily of theoretical
interest. Yet, for electronic music, an unlimited combination of sound features is pos-
sible that does not necessarily involve the aforementioned interdependencies between
timing, intensity, and timbre. This chapter focuses on both perspectives of musical sounds in actual and imagined performances and explores ways in which anticipated
sonic actions might differ between performances of physical instruments and those of
gestural, controller-based music. The main argument is that musicians construct images
of their sonic actions that permit them to perform independently from what they actu-
ally hear, while at the same time a feedback mechanism is needed for controlling the
actual sounds that the audience should perceive.
In what ways do musicians establish images of their sonic actions? Findings of research
on perception and auditory imagery, mental performances and anticipation of sound
events are discussed in the following section. The research presented here employed
physical analog instruments that allow musicians to construct multimodal images
of their actions without sound modifications or distortions that are possible with
electronic instruments.
and pedaling, listeners ascribe individual sound qualities to pianists that are reflected
in a number of performance parameters such as touch or articulation (Bernays and
Traube 2014). On the other hand, these qualities are not fixed in Western musical nota-
tion, nor can they be captured in a direct, one-dimensional way. Empirical approaches to
music performance seem to have focused primarily on timing, pitch, dynamics, and
articulation. Though these dimensions may characterize key elements of music per-
formances, other features shape musical experiences to a significant extent. For electro-
acoustic music, Stockhausen (1963) defines five musical dimensions: pitch including
harmony and melody, duration including meter and rhythm, timbre (“Tonfarbe,” literally
“tone color”), dynamics, and spatial aspects. According to Stockhausen, each dimension
should be equally important for composers, performers, and listeners, and he employed
this stance for his own compositions.
While the first three of Stockhausen’s parameters have long been described and
indicated in the scores in relatively precise terms (typically, however, not taking into account the fine nuances in performers’ microtiming or dynamics; see also Danielsen, this volume, chapter 29, on microtiming), defining the tone color seems more challeng-
ing. Musicians, theorists, and empirical researchers often employ verbal descriptions
such as full, bright, diffuse, or tense in their descriptive approaches to timbre, and these
should relate to acoustic features of the sounds. Measurements of the acoustic features,
then, focus on spectral components, including formant areas, spectral centroid, spectral
flux, or intensity of selected partials. Reuter and Siddiq (2017) present an overview of
various attempts to classify instrumental timbres by assessing their closeness or dis-
tance to each other in so-called timbre spaces. Grey (1975) was the first to construct
such a timbre space by having listeners rate the perceptual quality of synthesized sounds.
The dissimilarity of the perceived timbres was transferred into a three-dimensional
space using the statistical method of multidimensional scaling. Briefly, the further away
the timbres are in the space, the more they differ from each other in perceptual judg-
ments. Several other researchers constructed timbre spaces in the following years. In
their meta-analysis, Reuter and Siddiq found that the dimensional solutions for
describing instrumental sounds vary widely in different studies of timbre spaces. They
tested the synthesizer sounds used in these studies and compared them with the timbres
of actual instrumental sound samples from the Vienna Symphonic Library. Listeners
subsequently rated the dissimilarity of the timbres in pairwise comparison judgments.
As a result, there were large differences between the timbral qualities of the same instru-
ments across the stimulus sets used in the respective studies, indicating limits in the
comparability and generalizability of synthesized sounds. In contrast, ratings within
the set of the actual instrumental samples were more widespread (see the boxes with a
“v” as first letter, Figure 2.1). Consequently, Reuter and Siddiq suggest that a wider range
of sound samples should be investigated and that perceptual judgments should be
more firmly related to acoustical properties.
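To make the scaling procedure concrete, the sketch below shows how a small matrix of pairwise dissimilarity ratings can be projected into a three-dimensional space with multidimensional scaling. The ratings and instrument names are invented placeholders, and scikit-learn's MDS implementation stands in for the scaling procedures actually used in the studies discussed above.

```python
# A minimal sketch of deriving a timbre space from pairwise dissimilarity
# ratings, in the spirit of Grey (1975). The rating matrix and labels are
# invented, and sklearn's MDS is a stand-in for the original methods.
import numpy as np
from sklearn.manifold import MDS

instruments = ["trumpet", "clarinet", "cello", "bassoon"]

# Symmetric matrix of averaged pairwise dissimilarity judgments (0 = identical).
dissimilarity = np.array([
    [0.0, 0.7, 0.9, 0.8],
    [0.7, 0.0, 0.6, 0.5],
    [0.9, 0.6, 0.0, 0.4],
    [0.8, 0.5, 0.4, 0.0],
])

# Embed the ratings in a three-dimensional "timbre space": the further apart
# two instruments lie, the more dissimilar their timbres were judged to be.
mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coordinates = mds.fit_transform(dissimilarity)

for name, (d1, d2, d3) in zip(instruments, coordinates):
    print(f"{name:8s}  dim1={d1:+.2f}  dim2={d2:+.2f}  dim3={d3:+.2f}")
```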
If sound qualities, according to Reuter and Siddiq, are rather elusive and lead to great
variation in perceptual judgments and descriptions, in what ways are they relevant for
musical performers and composers? Even more, some Western music, especially of the
[Figure 2.1 appears here: a scatter of the labeled instrument timbres plotted across Dimensions 1, 2, and 3.]
Figure 2.1 Meta timbre space including four dimensions (fourth dimension: shades on the
gray scale).
(1) Sounds used in the study by Grey (1975): gB (bassoon), gC2 (bass clarinet), gC1 (E-flat clarinet),
gE (English horn), gF (French horn), gS1-3 (cello with different playing techniques),
gT (trombone), gP (trumpet).
(2) Sounds used in Krumhansl (1989) and McAdams et al. (1995): kB (bassoon), kC (clarinet), kE (English horn), kF
(French horn), kS (strings), kT (trombone), kP (trumpet).
(3) VSL sounds (Vienna Symphonic Library): vB (bassoon), vC (clarinet), vE (English horn), vF (French horn), vS
(strings), vT (trombone), vP (trumpet) (Reuter and Siddiq 2017, 160).
baroque period with its widespread stress on structural elements rather than performative affordances of the instruments, sounds adequate even if it is performed on a
variety of different instruments. Arrangements for new instruments are popular and com-
mon, suggesting that the essence of a piece of music may remain intact and recognizable
even with completely different timbres. On the other hand, musical arrangements also
point to the fascination that the timbres of various instruments have for the audience.
Notions that timbral qualities are hard to capture in language do not imply that they should
be less relevant. In addition, it is generally accepted, and somewhat expected from musical
performers, that timbre differs from one performer to the other, giving rise to character-
istic sound qualities of soloists and whole orchestras. For recorded music, the mixing
engineer’s and sound producer’s personal ideals of timbre further influence the sound of
a recording. It can thus be assumed that performers, producers, and listeners alike have
distinct images of sound qualities that shape their expectations and, in the case of musi-
cians, their actions. A basis for this claim certainly lies in the fact that people are indeed
able to distinguish between timbres, and that they do imagine them vividly.
Evidence for the vivid imagery of timbre stems from empirical research that addresses
differences and similarities between imagined and perceived instrumental timbres.
Halpern et al. (2004) asked ten participants with moderate musical experience to judge
the similarity of instrumental timbres in pairwise comparisons. The sounds were taken
from the McGill University Master Samples Library and included realistic sounds of
musical instruments. Participants were either acoustically presented with the samples
or had to imagine the sounds, while their brain activity was assessed with fMRI. The
pairwise comparisons, indicating the similarity or distance of the timbres, were ana-
lyzed with multidimensional scaling, resulting in two dimensions that the authors
defined as “brilliance” and “nasality.” As an example, the oboe timbre was placed highly
in both dimensions, whereas the clarinet was low in both dimensions. Interestingly,
there were roughly similar dimensional scaling solutions for the perceived and imag-
ined timbres. In addition, the similarity ratings for the timbre pairs correlated highly
between the imagined and perceived conditions, suggesting some overlap in the cog-
nitive processes of perception and imagery. This conclusion is further supported by
neuroimaging results showing that areas in the right auditory cortex were activated both
in timbre perception and in timbre imagination. In addition, there was some activation
in the supplementary motor area during timbre imagery, but not in perception.
Musicians may have accessed some unspecific motor component during imagery, or
they subvocalized the pitches of the instruments during imagery. Further research,
conversely, questions whether individuals can indeed imagine musical timbre vividly,
and suggests that it is rather their representations of timing or pitch that are more
detailed and accurate (Bailes 2007). Familiarity with a piece of music clearly aids in
imagining timbral qualities.
Despite the different systems in which timbral qualities are classified, they are
undoubtedly among the key characteristics of individual music performances and,
accordingly, one of the first features listeners may perceive when appraising musical
interpretations. Whereas the timing and dynamics unfold over time, the sound col-
ors of a performance are present immediately. Some evidence for the significance of
the sound quality for listeners’ judgments stems from research on “thin slices” of
music (Gjerdingen and Perrott 2008; Krumhansl 2010). Even when presented with
very short musical excerpts of 300 and 400 ms, listeners were able to provide judg-
ments on genre or emotional content. For well-known pieces, they could also name
the performers or release decade. Since information about other musical elements is
reduced, it can be assumed that timbral qualities have a paramount role for these
quick evaluations, especially for genre judgments. The same evaluation strategies, in
combination with other elements such as timing, dynamics, or pitch, are present of
course for longer durations.
played. It can be assumed that pianists could vividly imagine and anticipate the sounds
during sight-reading. Finney (1997) observed in a related study that manipulations of
pitch in auditory feedback interfered with pianists’ performance plans and impaired
their play. When auditory feedback was completely absent, on the other hand, their
imagery skills allowed them to perform without disruptions. The tactile and kinesthetic
feedback was evidently more important for pianists to control their performances than
the external auditory information. These findings show that employing different feed-
back modalities in empirical studies allows insights into the stability of performers’
imagery skills.
Further research investigated the learning and recall of unfamiliar music under
different feedback conditions. Highben and Palmer (2004) asked pianists to practice
without auditory feedback (i.e., playing without sound), without motor feedback
(i.e., score reading while listening to a prerecorded version of the piece), or while
practicing the music in their minds without any feedback at all. The last condition led to
worse results than the conditions where only one of the feedback modalities was absent.
Those pianists who succeeded in an auditory skill post-test were also better at learning without auditory feedback from the piano. It can be concluded that musicians with good
aural skills may benefit from more stable auditory images, supporting them in practice
and performance. In a related study (Brown and Palmer 2013), higher auditory imagery
skills allowed the pianists to recall more notes, to play with greater temporal regularity,
and to overcome distortion caused by experimental interferences. These results indicate
that the individual stability of auditory imagery aids in performance and recall, even in
situations when the sensory feedback of the actual sound is absent or altered. Auditory
imagery was not correlated with motor imagery, suggesting that both cognitive skills are
relatively independent in performers.
Multimodal imagery skills are particularly vital for mental practice, in which per-
formers play the music “in the mind’s ear” without overt muscle movements. Such mental performances are only efficient and feasible if performers possess effective imagery
skills. A study assessed timing stability in piano performance across experimental vari-
ations of performance feedback and imagery (Wöllner and Williamon 2007). Pianists
were asked to perform a memorized composition under four conditions that included a
normal performance as well as conditions without auditory feedback, without auditory
and visual feedback, and, finally, while tapping along with an imagined performance.
Analyses of the microtiming and dynamics revealed that the condition without auditory
feedback (but with haptic feedback of the MIDI piano) was relatively close to the normal
performance and deviated only about 10 percent, while tapping along with the imag-
ined performance led to strong timing deviations of up to 40 percent. In other words,
musicians had developed strong auditory-motor images of the compositions that did not
impair their play as long as they received the kinesthetic feedback of their fingers from
the piano keys. These results were not reflected in the pianists’ self-evaluations of their
practicing and memorizing strategies, in which they indicated that they had memorized
and employed aural, visual, kinesthetic, and conceptual images of the music’s structure
to a similar extent. In an associated study (Clark and Williamon 2011), the timing accuracy
of pianists’ imagined performances was related to the amount of time they had spent
practicing mentally per day, while results of a self-report imagery vividness test were
correlated only with live performances, not imagined performances. These findings sug-
gest that domain-specific practicing may enhance musical imagery skills, which are not
necessarily related to the general vividness of imagery outside the specific skills.
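As one way of making the timing results discussed above concrete, the sketch below computes the mean absolute percentage deviation of inter-onset intervals from a reference (normal) performance. The exact metric used by Wöllner and Williamon (2007) is not specified here, and the onset times are invented for illustration.

```python
# A hedged sketch of quantifying timing deviation: mean absolute percentage
# deviation of inter-onset intervals (IOIs) from a reference performance.
# The onset times below are invented, not data from the study.
import numpy as np

def ioi_deviation_percent(reference_onsets, test_onsets):
    """Mean absolute deviation of IOIs from the reference, as a percentage."""
    ref_iois = np.diff(reference_onsets)
    test_iois = np.diff(test_onsets)
    return float(np.mean(np.abs(test_iois - ref_iois) / ref_iois) * 100)

# Invented note-onset times in seconds for three conditions.
normal      = np.array([0.00, 0.50, 1.00, 1.50, 2.00, 2.50])
no_auditory = np.array([0.00, 0.52, 1.04, 1.58, 2.09, 2.62])  # close to normal
tapping     = np.array([0.00, 0.65, 1.15, 1.90, 2.35, 3.10])  # much less stable

print(ioi_deviation_percent(normal, no_auditory))  # small deviation
print(ioi_deviation_percent(normal, tapping))      # much larger deviation
```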
In summary, vivid auditory imagery may aid in performance when sounds cannot be
perceived. Studies provided evidence that auditory perception and imagery are closely
related, since the cognitive processes involved are highly comparable and both engage
the secondary auditory cortex (Daselaar et al. 2010; for a review, see Hubbard 2010). As
shown previously, when individuals evaluate the qualities of the sounds they imagine,
their judgments generally resemble those of the actually perceived timbres (Halpern
et al. 2004). Similarly, individuals are able to imagine durations and pitches in conditions
where they do not hear the sounds (Janata and Paroo 2006).
Vivid auditory, visual, and motor imagery is thus central for mentally practicing
music, but also for other performance areas such as sports or medicine in which athletes
and surgeons train “internally” a number of crucial actions, acquire new skills, and
prepare for their performance in the absence of any overt physical movements (Cocks
et al. 2014). Mental practice has been shown to be beneficial compared to no practice
(see Driskell et al. 1994; Connolly and Williamon 2004) and can aid the performers in
re-enforcing their imagery skills, which are necessary for building the auditory and motor
representations of a musical piece or of athletic movement patterns. Perhaps unsur-
prisingly then, musical training increases auditory imagery skills. In a study using a
melody-continuation paradigm in which only the beginnings of the melodies were actually
played, musicians had a more vivid imagination of the following tones of the melodies
that were not played (Herholz et al. 2008). Compared to nonmusicians, they responded
more quickly to single incorrect tones that were played during the imagined continu-
ation of the melodies and showed a neural mismatch negativity, leading the authors to
suggest that imagery shares the same neural processes with actual perception.
produce. In this way, music performance can be seen as an act of knowing or sensing
what comes next. Before producing a certain action, performers internally anticipate
the sonic outcome. There are two internal models that describe the processes of action
anticipation: a forward model (which predicts the sensory outcome of a motor command) and an inverse model (which derives the motor command needed for a desired sensory outcome); both models run a short time before action execution (Keller 2012;
Rauschecker 2011). Experienced musicians are thus able to anticipate and control the
movements they need to execute in order to reach a desired sound quality and to adjust
the force necessary for producing the sounds during play. They not only respond
to the outcome of their own actions, but also internally imagine the sonic actions of
co-performers in an ensemble (cf., Sevdalis and Keller 2014).
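As a rough illustration of how these two models divide the labor, the following toy sketch treats the motor command as a single key velocity and the sensory outcome as a loudness value. The linear mapping, its gain, and the feedback value are invented assumptions; real sensorimotor models are of course far richer.

```python
# A minimal sketch of the forward/inverse internal-model pair described above,
# reduced to a toy setting: the "motor command" is a key velocity and the
# "sensory outcome" is loudness. The linear mapping and its gain are invented.

GAIN = 0.8  # assumed velocity-to-loudness mapping

def forward_model(motor_command: float) -> float:
    """Predict the sensory outcome of a motor command before executing it."""
    return GAIN * motor_command

def inverse_model(desired_outcome: float) -> float:
    """Derive the motor command expected to yield a desired sensory outcome."""
    return desired_outcome / GAIN

desired_loudness = 60.0
command = inverse_model(desired_loudness)   # plan: which action gets me there?
predicted = forward_model(command)          # anticipate the sound of that action
actual = 62.5                               # (simulated) auditory feedback
prediction_error = actual - predicted       # mismatch that drives adjustment

print(f"command={command:.1f}, predicted={predicted:.1f}, error={prediction_error:+.1f}")
```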
Several experimental studies investigated the musicians’ anticipatory imagery that
guides their actions and the related sound outcomes. In a tapping task, Keller and
colleagues (2010) compared the timing accuracy in conditions without sound as well as
with compatible or incompatible sounds. In the silent condition, the musicians’ timing
matched the given target tempo best, and the accelerations of their finger trajectories
before tapping were highest. Thus, auditory imagery clearly aided the musicians in pla-
nning their actions and timing their tapping movements. Action anticipation was also
investigated in a study by Bishop and colleagues (2013). Twenty-nine pianists of varying
degrees of expertise were asked to play the right-hand part of relatively unknown and
simple piano compositions. After practicing the piece, some of the experimental condi-
tions included actually playing the music while, in other performance conditions, no
acoustic feedback or no auditory plus motor feedback was provided (hence, in the latter
condition, they imagined playing through the compositions in their minds). As expected,
experienced pianists produced fewer pitch and timing errors than less experienced ones
in the conditions without auditory feedback, suggesting that they had vivid imagery skills
and more stable action anticipation (cf., also studies reviewed above). Furthermore, at
several points, dynamic and articulation markings that had not been present when practicing the pieces appeared in the scores shown on a digital screen. During performances with
disrupted feedback and during imagined performances, pianists were asked whether
the newly introduced markings matched their own expressive intentions at the specific
moments. For instance, a crescendo marking appeared and pianists said “yes” if it
matched their own idea or “no” if this was not the case. As a result, they responded more
quickly than would have been expected if they had waited for the auditory feedback.
These findings suggest that pianists had access to anticipatory imagery that aided them
in performing and imagining the music without feedback. The findings further indicate
that there are relatively stable performance plans in terms of articulation and dynamics,
so that the music was vividly played “in the mind’s ear.”
Research on agency provides further evidence for the claim that musicians construct
stable representations of their actions, allowing them to imagine their own play before
carrying out an action. Agency is the awareness that actions are produced by oneself,
in other words, that someone feels an authorship of their actions (Jeannerod 2003;
Synofzik et al. 2008). Auditory feedback is highly informative for action identification,
might therefore include a number of different sources and modalities that resonate with
might therefore include a number of different sources and modalities that resonate with the perceiver’s motor system (cf., Alaerts et al. 2009).
An even greater challenge for musicians is to imagine the sonic actions of others
while performing themselves at the same time, thus integrating self and other imagined
actions as well as self auditory-motor feedback and other auditory feedback. Visual
information guides action anticipation in an ensemble. Keller and Appel (2010) inves-
tigated the relation between anticipatory imagery and dyadic synchronization. Seven
pairs of pianists performed duets of the classical piano literature together on MIDI
instruments while seeing or not seeing each other. Their body motion was recorded
with an optical motion capture system. In a second session, each pianist’s anticipatory
auditory imagery was tested with a tapping paradigm that included auditory feedback of
marimba tones or no auditory feedback. As a result, anticipatory imagery scores, calcu-
lated for each pianist separately, were correlated with average duo synchrony, suggesting
that those pianists with good imagery skills were more successful at timing their ensem-
ble play. While being in visual contact did not markedly affect results, lags in anterior-
posterior body sway between duo partners were related to synchrony, indicating that
the pianists also timed their performances via body motion. Alternatively, their body
motion might have functioned as individual time-keeping support. In this study, pianists
not only had to anticipate their own actions but also had to imagine the co-performer’s
sonic actions. Keller and Appel suggest that inverse internal models (see above) were
run slightly before producing the actions. Anticipatory auditory imagery should consist
of imagining the sounds of oneself and of the other performer, and both are then translated into adequate motor commands. In duo and ensemble performance, internal models are
thus coupled by simulating the other’s actions (cf., Gallese and Goldman 1998).
Taken together, timing information seems most important for an awareness of actions
and self-other distinctions. Evidence for the paramount importance of timing has also
been provided by studies outside the domain of music (e.g., Knoblich and Prinz 2001).
Most research on auditory action identification and sonic imagination investigated
simple acoustical stimuli or piano performances, in which the variety of sound qualities
is rather limited as compared to string or wind instruments, or the human voice. More
research is needed that addresses specific timbral qualities in the anticipatory imagi-
nation of the sounds to be produced by oneself and other performers.
Anticipation and auditory imagery of sonic actions are vital in performance areas that
do not rely on the haptic feedback of traditional instruments. A growing field in per-
formance practice employs sensors and controller systems that allow for the shaping of
sounds in new forms. The bodily movements of the performers are translated to sound
signals, so that their musical gestures become sonic actions in a somewhat direct and, in
the eyes of observers, apparently unmediated way. Producing sounds by human actions
“in the air,” without the haptic feedback of physical instruments, has fascinated per-
formers and the audience for a long time even before modern computing technology.
One of the first such instruments is the Theremin, invented in 1920, in which the spatial
positions of the two hands control volume and pitch (see Theremin 1996). The funda-
mentals of the Theremin are two metal capacitors that function as proximity sensors in
the near field for the above two performance dimensions. A boost in sensor-based music
performances has coincided with the greater availability of electronics and software
solutions since the 1980s. There are various developments in the field, including tech-
nology for digitally augmenting the sound options of acoustical instruments (which are
fundamentally still based on the playing techniques of these instruments but involve
additional electronic sounds), or controllers that turn a variety of different information
including brainwaves or human motion into sound (cf., Hugill 2012). In the following,
central conceptual issues and examples of performance systems that focus on purely
gesture-based or “open-air” controllers and their consequences for action-sound map-
pings and anticipatory imagination are discussed. At the center of the discussion will be
two types of systems that enable experiences of sonic agency for performers and audi-
ence: the conductor’s jacket and data gloves.
The disembodiment problem is also apparent when listeners perform actions in the
air to accompany music, either to follow the melodic lines and the musical structure
(e.g., Hohagen and Wöllner 2015) or by playing “air instruments” (Godøy et al. 2006;
cf., Jensenius et al. 2010; Visi et al. 2016). When mimicking or imitating the sound-
producing gestures, listeners may still imagine a link between their actions and the
sound of the original performance, and the motor involvement might augment their
auditory experience. Therefore, it can be assumed that audience members are prone to
assign visual gestures to performance sounds (see also Behne and Wöllner 2011), even if
they are aware that there is no direct mapping between the two.
were transferred into musical parameters such as sound onset and duration, tempo,
articulation, and dynamics.
Although the system was based on the movement patterns of actual conductors,
Nakra (2002) saw limits in its use for the training of conductors, which requires more
complex interactions between musicians and conductor. It should thus instead be used
as a new, stand-alone instrument that allows different sounds to be produced, and for
which specific pieces should be written or arranged. Among the pieces composed
for the conductor’s jacket, Etude 2 (Nakra 2000) used algorithms in which the EMG
signal of the right biceps alone controlled pitch, volume, and timbre at the same time.
The more the muscle contracted, the more the pitch height and the intensity level
increased. In addition, the sound spectrum was altered such that the timbre appeared to
be brighter. When the contractions of the biceps were overtly shown by arm movements,
direct mappings between gesture and sound qualities became immediately apparent
for the audience and the performer alike. In contrast, traditional conducting sometimes involves rather small gestures or blinks of an eye that can lead to large effects such as
an entrance of the whole orchestra—moments in which the audience may doubt that
the conductor “evokes” the sound.
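A rough sketch of the kind of one-to-many mapping described for Etude 2 might look as follows. The value ranges, scaling curves, and parameter names are invented for illustration and are not taken from Nakra's implementation.

```python
# A hedged sketch of a one-to-many mapping in the spirit of the Etude 2
# description above: a single (normalized) EMG amplitude drives pitch, volume,
# and brightness together. Ranges and curves are invented, not Nakra's.
def map_emg_to_sound(emg: float) -> dict:
    """Map a normalized EMG amplitude (0.0-1.0) to three sound parameters."""
    emg = max(0.0, min(1.0, emg))            # clamp to the expected range
    pitch_hz = 110.0 * (2.0 ** (emg * 3.0))  # stronger contraction -> higher pitch (3 octaves)
    volume = emg                             # stronger contraction -> louder
    brightness = 500.0 + emg * 7500.0        # e.g., a low-pass cutoff (Hz) opens up
    return {"pitch_hz": pitch_hz, "volume": volume, "brightness_hz": brightness}

for level in (0.1, 0.5, 0.9):                # relaxed, medium, strong contraction
    print(level, map_emg_to_sound(level))
```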
Besides the conductor’s jacket, there are several other performance systems using
EMG sensors (see, among others, Donnarumma and Tanaka 2014; Nymoen et al. 2015).
A large number of gesture-driven controllers employ motion-capture technology to
direct digital instruments (e.g., Dobrian and Bevilacqua’s Motion Capture Music 2003).
Among the commercial systems most widely used in various applications and per-
formance art is the Kinect system, which was developed for Microsoft’s Xbox game con-
sole. The bodily motion of one or more performers can be tracked with video technology
using an RGB color camera and a depth sensor with infrared lighting; therefore, no markers are necessary, as they are in other motion-capture systems. The advantage of the system
clearly lies in its usability, whereas caveats include the somewhat arbitrary modeling of
the body and the latency, resulting in less precise results compared to multiple-camera
motion-capture systems. Still, it is possible to capture some position and motion data
of performers and to map them to sounds. Various toolkits have been developed for
Kinect; one of the first action-to-sound applications among them is Crossole (Sentürk et al. 2012). In this digital instrument, virtual building blocks are visualized so that a performer can control them, including chord structure, arpeggios, and timbre. The
musical style is limited to harmonies of the Western classical tradition, while the tim-
bre control consists of various sound effects including delays and filters. In addition,
the contour of the melodic lines can be drawn by gestures.
While artistic approaches such as Crossole strongly rely on visual representations of
preset spatial positions for controlling the musical output, other systems allow for more
flexibility in shaping the sound qualities. For example, data gloves capture the fine-tuned
motion of the hands by use of markers or accelerometers. One of the earliest data gloves
gained information from optical finger-flex sensors and position sensors in order to train
a neural network of mappings between hand gestures and a vocabulary of 203 words
(Fels and Hinton 1993). Mapping accuracy after training was astonishingly high, and later
[Figure 2.2 appears here: the Musicglove, showing an acceleration sensor (X, Y, and Z axes) and strain sensors, one on the inner surface of the wrist.]
versions were also used for music performances (Fels et al. 2002), in which the audience
was intended to perceive metaphorical relationships between expressive gestures and sound.
A different set of gloves was constructed by Laetitia Sonami and collaborators. The
first version of their Lady’s Glove was invented in 1991, combining several transducers at
the fingers and a magnet worn on the other hand to control synthesizers via MIDI. Later
versions, mapped to MAX-MSP, allowed for more flexible sound control and intuitive
gesture-sound mappings (Rodgers 2010). Hayafuchi and Suzuki (2008) developed the
Musicglove (Figure 2.2), a device with accelerometers and strain sensors that meas-
ure the bending of the wrist and some fingers. A number of set gestures control the
musical tracks: for instance, making a fist pauses the music. Overall acceleration is
mapped to the tempo and vertical acceleration more specifically to the tempo of the
beats. Therefore, the sound of preexisting music can be controlled with the gestures.
One potential application includes the control of dance music by a DJ, where the ges-
tures could even be more convincing for an audience when compared to conventional
controlling devices such as a laptop or a turntable.
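The following sketch illustrates, in a hedged way, the style of rule-based gesture mapping just described. The sensor field names, threshold values, and tempo scaling are invented rather than taken from Hayafuchi and Suzuki's implementation.

```python
# A hedged sketch of rule-based glove control in the spirit of the Musicglove
# description above: a fist gesture pauses playback, and overall acceleration
# scales the tempo. Field names, thresholds, and scaling are invented.
from dataclasses import dataclass

@dataclass
class GloveFrame:
    finger_bend: float      # 0.0 = open hand, 1.0 = tight fist (strain sensors)
    accel_magnitude: float  # overall acceleration in g (accelerometer)

def control_playback(frame: GloveFrame, base_tempo_bpm: float = 120.0) -> dict:
    """Translate one frame of glove data into playback commands."""
    if frame.finger_bend > 0.8:              # making a fist pauses the music
        return {"paused": True, "tempo_bpm": base_tempo_bpm}
    # More vigorous motion -> faster playback, within a plausible range.
    tempo = base_tempo_bpm * (0.8 + 0.4 * min(frame.accel_magnitude, 1.0))
    return {"paused": False, "tempo_bpm": round(tempo, 1)}

print(control_playback(GloveFrame(finger_bend=0.9, accel_magnitude=0.2)))  # fist -> pause
print(control_playback(GloveFrame(finger_bend=0.1, accel_magnitude=0.9)))  # energetic -> faster
```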
Perhaps the expressive performances of Imogen Heap made controllers, and data
gloves in particular, more popular with audiences even of popular music genres. The
Mi.Mu Gloves (see Mitchell and Heap 2011) contain bend sensors, accelerometers, and a
gyroscope as well as tactile feedback to the performer via vibrations. A large number of
preset audio control options are possible that can be incorporated in various performance
genres including dance, in which the mapping between human motion and sounds may
seem particularly impressive.
Conclusions
Musical performers in a wide range of genres benefit from vivid auditory and motor
imagery. Being able to perform the music in their minds’ ears enables performers to
anticipate their own and other musicians’ sonic actions and to shape the sound quality.
In the absence of sound or in situations with altered or delayed auditory feedback, musi-
cians depend on imagery skills to control their bodily performances. In digital music
performance, with its numerous features for shaping and combining sounds, precise
imagery clearly helps in reaching the desired sound outcome. Even more, controller-driven
performances often fascinate the audience if a close mapping between gesture and
sound—an apparent fusion—is achieved. In these cases, the audience imagines the sound
to originate in the performer’s movements as sonic actions.
The anticipatory processes in auditory imagery and the level of detail of imagined
sound qualities, however, remain areas for further inquiry. Spectral components of
the sound quality are used to distinguish between performances; they may characterize
a performer’s “fingerprint,” that is, his or her individual approach to sound (Bernays and
Traube 2014). At the same time, timbral qualities still appear rather elusive in descriptive
systems. In most empirical studies of music performances, microtiming information
sufficed for distinctions between sonic actions. In addition to timing, the role of timbre
in anticipatory imagery and the shaping of sounds are particularly important for
instruments and genres that rely more strongly on nuances in sound qualities, and thus
deserve further study.
Imagery skills vary among musicians according to their background and expertise,
depending on whether or not the music is played by heart, in ensembles with or without
notation, or in improvisations. Research summarized in this chapter has shown that
pianists who typically play their repertoire in a memorized way often develop particularly vivid auditory imagery. Their performances are not interrupted if the sound
cannot be heard, for instance when it is switched off at MIDI pianos. Yet, for all musical
performers, it is paramount to imagine sonic events as an outcome of their own
actions. They need high degrees of vivid auditory imagery and motor awareness in
order to fine-tune the desired sound qualities in a performance and to adjust their
play if necessary, based on an auditory and motor error-feedback system. In this regard,
research on action–perception coupling indicated that musicians have a strong sense of
agency for their sonic actions. The neural processes underpinning imagery and per-
ception are fundamentally similar, and both may activate overlapping action networks
in performers, such that even listening to the sounds of music performances resonates
with the performer’s action systems.
Awareness of one’s own sonic actions, together with vivid auditory imagery skills, occasionally appears even more pertinent to musicians than hearing the actual acoustic sounds.
Apart from the widespread consumption of heavily compressed audio files, anecdotal
evidence suggests that experienced musicians are able to ignore shortcomings of sound
recordings and renditions; some of them may not even need high-fidelity sound systems
and can still appreciate the music to some extent by compensating for, and imagining,
the missing sound features. Similarly for performing, as discussed earlier, imagery can
become more important than actual sonic feedback during playing, especially in the
absence or alteration of sound. The relative dependencies of performers on imagery or
feedback remain a topic for more research: in other words, whether musicians primarily
concentrate on the imagined sounds during the act of performing as an intended action
outcome, or whether they rely more strongly on a variety of different feedback modali-
ties, including kinesthetic and visual components (cf., the internal models discussed
earlier). While performance plans may become more stable if they draw on multimodal
sensory stimuli, focusing one’s attention only on the sound as an external action goal,
thus concentrating less on internal bodily processes, has been shown to be efficient in
various fields (for an overview, see Wulf 2007).
The anticipation of sonic actions, as stated earlier, might rely solely on imagined
sounds that are to be produced. These processes appear to be particularly vital in gestural
performances of live electronics, in which the multisensory feedback loop of acoustical
instruments is often absent. Real-time processing, spatial placement and sound source
control are central for close mappings and perceived fusion in experiences of human
sonic actions. New developments in performance interfaces address these issues by
modifying gesture-to-sound mappings that offer captivating links between bodily actions
and sounds as an intentional, meaningful process for performers and for the perceptions
and imaginations of audiences.
References
Aglioti, S. M., and M. Pazzaglia. 2010. Representing Actions through Their Sound. Experimental
Brain Research 206: 141–151.
Agnew, M. 1922. The Auditory Imagery of Great Composers. Psychological Monographs 31:
279–287.
Alaerts, K., S. P. Swinnen, and N. Wenderoth. 2009. Interaction of Sound and Sight during
Action Perception: Evidence for Shared Modality-Dependent Action Representations.
Neuropsychologia 47: 2593–2599.
Bailes, F. 2007. Timbre as an Elusive Component of Imagery for Music. Empirical Musicology
Review 2: 21–34.
Bangert, M., T. Peschel, G. Schlaug, M. Rotte, D. Drescher, H. Hinrichs, et al. 2006. Shared
Networks for Auditory and Motor Processing in Professional Pianists: Evidence from fMRI
Conjunction. NeuroImage 30: 917–926.
Banton, L. J. 1995. The Role of Visual and Auditory Feedback during the Sight-Reading of
Music. Psychology of Music 23: 3–16.
Behne, K.-E., and C. Wöllner. 2011. Seeing or Hearing the Pianists? A Synopsis of an Early
Audiovisual Perception Experiment and a Replication. Musicae Scientiae 15: 324–342.
Bernays, M., and C. Traube. 2014. Investigating Pianists’ Individuality in the Performance of
Five Timbral Nuances through Patterns of Articulation, Touch, Dynamics, and Pedaling.
Frontiers in Psychology 5: 157.
Bishop, L., F. Bailes, and R. T. Dean. 2013. Musical Imagery and the Planning of Dynamics and
Articulation during Performance. Music Perception 31: 97–117.
Brereton, J. 2017. Music Perception and Performance in Virtual Acoustic Spaces. In Body,
Sound and Space in Music and Beyond: Multimodal Explorations, edited by C. Wöllner,
211–234. Abingdon, UK: Routledge.
Brown, R. M., and C. Palmer. 2013. Auditory and Motor Imagery Modulate Learning in Music
Performance. Frontiers in Human Neuroscience 7: 320.
Caramiaux, B., J. Françoise, N. Schnell, and F. Bevilacqua. 2014. Mapping through Listening. Computer
Music Journal 38: 34–48.
Chion, M. 1983. Guide des objets sonores: Pierre Schaeffer et la recherche musicale. Paris:
Buchet/Chastel.
Clark, T., and A. Williamon. 2011. Evaluation of a Mental Skills Training Program for
Musicians. Journal of Applied Sport Psychology 23: 342–359.
Cocks, M., C.-A. Moulton, S. Luu, and T. Cil. 2014. What Surgeons Can Learn from Athletes:
Mental Practice in Sports and Surgery. Journal of Surgical Education 71: 262–269.
Connolly, C., and A. Williamon. 2004. Mental Skills Training. In Musical Excellence: Strategies
and Techniques to Enhance Performance, edited by A. Williamon, 221–245. Oxford: Oxford
University Press.
Daselaar, S. M., Y. Porat, W. Huijbers, and C. M. Pennartz. 2010. Modality-Specific and
Modality-Independent Components of the Human Imagery System. Neuroimage 52:
677–685.
Dobrian, C., and F. Bevilacqua. 2003. Gestural Control of Music using the Vicon Motion
Capture System. In Proceedings of the New Interfaces for Musical Expression Conference,
161–163. May 22–24, 2003, Montréal, Quebec, Canada.
Donnarumma, M., and A. Tanaka. 2014. Principles, Challenges and Future Directions of
Physiological Computing for the Physical Performance of Digital Musical Instruments. In
Proceedings of the 9th Conference on Interdisciplinary Musicology (CIM14), edited by T. Klouche
and E. R. Miranda, 363–368. Berlin, Germany: Staatliches Institut für Musikforschung.
Driskell, J. E., C. Copper, and A. Moran. 1994. Does Mental Practice Enhance Performance?
Journal of Applied Psychology 79: 481–492.
Fels, S. S., A. Gadd, and A. Mulder. 2002. Mapping Transparency through Metaphor: Towards
More Expressive Musical Instruments. Organised Sound 7 (2): 109–126.
Fels, S. S., and G. E. Hinton. 1993. Glove-Talk: A Neural Network Interface between a Data-
Glove and a Speech Synthesizer. IEEE Transactions on Neural Networks 4 (1): 2–8.
Finney, S. A. 1997. Auditory Feedback and Musical Keyboard Performance. Music Perception
15: 153–174.
Gallese, V., and A. Goldman. 1998. Mirror Neurons and the Simulation Theory of Mind-
Reading. Trends in Cognitive Sciences 2: 493–501.
Gjerdingen, R. O., and D. Perrott. 2008. Scanning the Dial: The Rapid Recognition of Music
Genres. Journal of New Music Research 37 (2): 93–100.
Godøy, R. I. 2010. Gestural Affordances of Musical Sound. In Musical Gestures: Sound,
Movement, and Meaning, edited by R. I. Godøy and M. Leman, 103–125. New York:
Routledge.
Godøy, R. I., E. Haga, and A. R. Jensenius. 2006. Playing “Air Instruments”: Mimicry of
Sound-Producing Gestures by Novices and Experts. In Gesture in Human-Computer
Interaction and Simulation: 6th International Gesture Workshop, edited by S. Gibet, N. Courty,
and J.-F. Kamp, 256–267. Berlin: Springer.
Grey, J. M. 1975. An Exploration of Musical Timbre. PhD thesis, Department of Psychology,
Stanford University.
Halpern, A. R., R. J. Zatorre, M. Bouffard, and J. A. Johnson. 2004. Behavioral and Neural
Correlates of Perceived and Imagined Musical Timbre. Neuropsychologia 42: 1281–1292.
Hayafuchi, K., and K. Suzuki. 2008. Musicglove: A Wearable Musical Controller for Massive
Media Library. In Proceedings of the International Conference on New Interfaces for
Musical Expression (NIME) 8, 241–244. 5–7 June 2008, Genova, Italy.
Herholz, S. C., C. Lappe, A. Knief, and C. Pantev. 2008. Neural Basis of Musical Imagery and
the Effect of Musical Expertise. European Journal of Neuroscience 28: 2352–2360.
Highben, Z., and C. Palmer. 2004. Effects of Auditory and Motor Mental Practice in
Memorized Piano Performance. Bulletin of the Council for Research in Music Education
159: 58–65.
Hohagen, J., and C. Wöllner. 2015. Self-Other Judgements of Sonified Movements: Investigating
Truslit’s Musical Gestures. In Proceedings of the Ninth Triennial Conference of the European
Society for the Cognitive Sciences of Music, August 17–22, Royal Northern College of
Music, Manchester.
Hubbard, T. L. 2010. Auditory Imagery: Empirical Findings. Psychological Bulletin 136: 302–329.
Hugill, A. 2012. The Digital Musician. 2nd ed. New York: Routledge.
Hunt, A., M. M. Wanderley, and M. Paradis. 2003. The Importance of Parameter Mapping in
Electronic Instrument Design. Journal of New Music Research 32: 429–440.
Janata, P., and K. Paroo. 2006. Acuity of Auditory Images in Pitch and Time. Perception and
Psychophysics 68: 829–844.
Jeannerod, M. 2003. The Mechanism of Self-Recognition in Humans. Behavioural Brain
Research 142: 1–15.
Jensenius, A. R., M. M. Wanderley, R. I. Godøy, and M. Leman. 2010. Concepts and Methods
in Research on Music-Related Gestures. In Musical Gestures: Sound, Movement, and
Meaning, edited by R. I. Godøy and M. Leman, 12–35. New York: Routledge.
Kalakoski, V. 2001. Musical Imagery and Working Memory. In Musical Imagery, edited by
R. I. Godøy and H. Jørgensen, 43–55. Lisse, the Netherlands: Swets and Zeitlinger.
Keller, P. E. 2012. Mental Imagery in Music Performance: Underlying Mechanisms and
Potential Benefits. Annals of the New York Academy of Sciences 1252: 206–213.
Keller, P. E., and M. Appel. 2010. Individual Differences, Auditory Imagery, and the Coordination
of Body Movements and Sounds in Musical Ensembles. Music Perception 28: 27–46.
Keller, P. E., S. Dalla Bella, and I. Koch. 2010. Auditory Imagery Shapes Movement Timing
and Kinematics: Evidence from a Musical Task. Journal of Experimental Psychology: Human
Perception and Performance 36: 508–513.
Knoblich, G., and W. Prinz. 2001. Recognition of Self-Generated Actions from Kinematic
Displays of Drawing. Journal of Experimental Psychology: Human Perception and Performance
27: 456–465.
Krumhansl, C. 1989. Why Is Musical Timbre So Hard to Understand? In Structure and
Perception of Electroacoustic Sound and Music, edited by S. Nielzen and O. Olsson, 43–53.
Amsterdam: Elsevier.
Krumhansl, C. L. 2010. Plink: “Thin slices” of Music. Music Perception 27: 337–354.
McAdams, S., S. Winsberg, S. Donnadieu, G. de Soete, and J. Krimphoff. 1995. Perceptual
Scaling of Synthesized Musical Timbres: Common Dimensions, Specificities, and Latent
Subject Classes. Psychological Research 58 (3): 177–192.
McPherson, A., R. H. Jack, and G. Moro. 2016. Action-Sound Latency: Are Our Tools Fast
Enough? In Proceedings of the International Conference on New Interfaces for Musical
Expression, 12–14 July 2016, Brisbane, Australia.
Miranda, E. R., and M. M. Wanderley. 2006. New Digital Musical Instruments: Control and
Interaction Beyond the Keyboard. Madison, WI: A-R Editions.
Mitchell, T. J., and I. Heap. 2011. SoundGrasp: A Gestural Interface for the Performance of Live
Music. In Proceedings of the International Conference on New Interfaces for Musical
Expression (NIME), 465–468. 30 May–1 June 2011, Oslo, Norway.
Nakra, T. M. 2000. Inside the Conductor’s Jacket: Analysis, Interpretation and Musical
Synthesis of Expressive Gesture. PhD thesis, MIT.
Nakra, T. M. 2002. Synthesizing Expressive Music through the Language of Conducting.
Journal of New Music Research 31: 11–26.
Nymoen, K. H., M. R. Haugen, and A. R. Jensenius. 2015. MuMYO: Evaluating and Exploring
the MYO Armband for Musical Interaction. In Proceedings of the International Conference
on New Interfaces for Musical Expression, edited by E. Berdahl. Baton Rouge: Louisiana
State University.
Motor Imagery in Perception and Performance of Sound and Music
Jan Schacher
Introduction
Audition is one of our central senses, and listening is deeply integrated into how we
perceive and act in the world. In our perception, sound plays a central role in the
construction of a coherent, multimodal “world-view.” Hearing and listening are tied
together with all other senses in low-level, subpersonal, and prereflective relationships
that are established in interaction with others, with the goal of meaning generation
in higher cognitive functions. The central element used for making sense of acoustic
information and for interpreting sounds is the body with its indivisible being-in-the-world,
its capacity for action, and its role as substrate for cognitive processes. Sound comprises
any acoustic phenomenon we might perceive, whereas music is a constrained field, oper-
ating within combinations of sounds that are human-made and culturally coded. Sound
perception operates on numerous levels, from providing evolutionary survival cues,
to carrying core elements of interpersonal exchanges, to enabling culturally encoded
symbolic, and semantic elements subjugate the primary elements of timbre, pulse,
rhythm, consonance or dissonance, and the sonic spaces that are created with the use of
voices and instruments. Through its structuring movement and its relational meaning-
construction, music functions as an ordering principle and as a conveyor of fixed
(and therefore recognizable) sounding tropes and affects. Cultural practices—that is,
musical styles that avoid the tendency to fix all elements—have the potential to
reflect and mirror back the perception of music to a primary sensorial experience of the
ephemeral, both through listening and through perceiving by other corporeal means,
such as kinesthesia or proprioception. Other cultural practices, such as popular and
club music, leverage fixity and use expectations (Huron 2006) and musical schemata
to access a mode of perception that need not reside primarily in listening, and rather
occurs through co-performing; for example, by dancing and similar modes of physical
partaking in the music.
In all these cases, the body’s characteristics and its capacities to resonate, re-enact, and
“re-member” physically provide the foundation for the sonic experience. Even in phono-
graphically mediated music or sound experiences (Walther-Hansen 2012), the body
provides a point of reference—even if only through its absence in the perceptual field
when listening to a recording. And even sounds with clearly identified natural causes
that do not involve an active subject, be it human or nonhuman, such as water or wind
noises, provoke a physiological, affective response in the body.
Here again, the “enactive” position postulates that cognition is a function of being
bodily in the world, and that having a body is what enables us to develop experiential
structures that exhibit meaning (Noë 2004). The “autopoietic” position of embodied
action is stated clearly by Varela and colleagues:
Applied to my topic, it is fair to say that without the body’s materiality, the body’s ability
to produce sound itself, we would not be able to perceive, let alone identify and give
identity to sound, however far abstracted from its moment of production it might be, or how-
ever tenuous our perceptual link to it might be (Voegelin 2010, 82). The subjective identifi-
cation with sounds and their origins, through their presence in the same moment as
the perceiving subject, connects immediately but without fixity: it remains fluid and
fleeting, and needs to be (re-)activated continuously.
When perceiving events in sound, we perceive other bodies, agents, and actants
(Latour 2005) that are of the same kind and have the same capacities as our own body
and that the body only understands when they fit with a preexisting experience, a
predisposition to resemble, to be equivalent: a resonance. The body is the site of expe-
rience, the site of fusion between senses, perceptions, and memories, the site of cog-
nition. As a substrate and foundation, it carries cultural schemata of sounding, of
“eventing”1 (Ihde 2007, 109) with sounds in an affective, interpersonal, protolinguistic,
and even musical way. These schemata complement those of the body itself, of its imme-
diate learning and imprinting, and expose the potential to approximate relationships
between sound and event through other means.
Music performance encapsulates and charges sonic perception with cultural dimen-
sions, yet depends essentially on the perceptual and subpersonal capabilities of an
“enactive,” embodied intertwining with the sounding world, and on an experiential,
personal, and interindividual link to sound in a cultural context. In addition, personal
aspects of the performative construction of the self (Butler 1988) contribute the elements
of gender, age, social position, and other biographical factors that further color the act of
music performance and perception. The nature of musical processes is a dynamic flow,
not simply of time but also of elements constituted of bodily actions that produce
distinct sound impressions and carry immediate ecological meaning in a prereflective
corporeal domain, before rising to a protolinguistic, presemantic, or adaptive semantic
level (Reybrouck 2006). Within musical perception, the processes we are affected by,
perceive, and act out are made by dynamic chains of sound-objects as well as by action–
sound pairs or multimodal “gestural sonorous objects” (Godøy 2006). These elements
form “segregated streams and objects that lead, via the subjective sensing of the
subject’s body motion, to impressions of movement, gesture, tensions, and release of
tension” (Leman and Camurri 2006, 212–213). As musicians perform, they construct a
temporally unfolding stream of movement dynamics that the listener-viewer re-
enacts and co-performs through kinesthetic, corporeal resonances and higher-order
dynamic sensing. This state of active engagement is more akin to moving oneself than
to sounding within oneself.
The field of music psychology investigates cognitive, neural, and behavioral aspects
of music perception and performance actions. Empirical studies address the
understanding of one’s own and others’ actions (Jeannerod 2006), for example, by
investigating self-recognition (Sevdalis and Keller 2010) and co-performing with
other musicians (Keller 2008).
In his mimetic motor imagery hypothesis, Cox brings together a number of elements
that demonstrate and anchor in empirical research how imitative, simulation-based
re-enacting of sound-producing movements lies at the core of music perception.
The basis for motor imagery is given by two elements that are crucial for the acting
subject: the kinesthetic memory (Sheets-Johnstone 2009) and physical experience of
the perceptual consequences of similar earlier actions, and the ability to trigger complex
motor- or body-schemata or kinetic melodies (Luria 1973), such ability providing an
appropriate action form for the intentional image (Bergson 1939). While the memory of
a prior experience is always linked to an actually executed action, the intentional image
can become a surrogate for those perceptions that an executed action would produce
(Annett 1996). Thus, in motor imagery processes, the necessary addition of location and
timing information to a pre-established motor schema alters it in such a way as to help
inhibit or suppress the execution of the actual movement patterns.
The perceptual linking of imagery to motor patterns functions in two reciprocal paths,
one serving a representative role, the other an operational one. They are not exclusively
coupled and are independent enough to allow for an inhibition of imitative actions
as a reaction to movement or action perception (Berthoz 1997, 209), and to permit the
projection of action in motor imagery alone, without the need for it to be executed openly
(Reybrouck 2001). In addition to recognition and mimetic re-enacting for goal-under-
standing, the mechanism of motor imagery plays a crucial role in the preparation of
real actions (Glasersfeld 1996, 65) and the storage of memories of executed actions:
“motor action itself, in its prenoetic body-schematic performance, has the same tacit
and auto-affective structure that involves the retention of previous postures, and the
anticipation of future action” (Gallagher 2005, 204).
If, in the case of simulation, the impulse for the execution of an action is blocked on
the way from the cortex to the spinal cord (Decety and Chaminade 2003) then, in case
of execution, the efferent and afferent neural activation streams form a complete loop
that enables adaptive and continuous control over the action (Annett 1996) and lead,
via internal model parameters (Keller 2012, 209), to the perception of one’s own agency:
“Performative awareness that I have of my body is tied to my embodied capabilities
for movement and action . . . my knowledge of what I can do . . . is in my body, not in a
reflective or intellectual attitude” (Gallagher 2005, 74). This “sense of effort” (James 1896)
provides the tacit proprioceptive knowledge that perceptual changes are indeed the
outcome of one’s own actions. “That is, although the content of experience may be the
intended action, the sense that I am generating the action may be traced to processes
that lie between intention and performance” (Gallagher 2005, 57).
The technique of motor imagery is commonly used in conjunction with physical training
in order to optimize skillful execution, for example, of an athletic task. It is used to imprint
as a corporeal motor schema a sequence of movements in a single coherent and economi-
cal movement unit. The goal is to have at one’s disposition a complex movement pattern
that only needs to be triggered once and not consciously controlled in every aspect
throughout the entire movement trajectory. The repetitive nature of practicing coordi-
nated movements of instrumental play fulfills the function of establishing body-schemata,
“integral kinaesthetic structures” (Luria 1973, quoted in Sheets-Johnstone 2009), dynamic
patterns, or so-called kinetic melodies.
Through the practicing process, the embodied “know-how” becomes prereflective and
can later, in the right environment and circumstances, be triggered as a unit without the
necessity to individually deal with the actions that constitute it. Obtaining these motor
schemata is considered beneficial to concentration and mental preparation for extreme,
singular, and rare high-performance moments. Having integrated complex patterns
into single units allows one to shift the focus to anticipation and adaptation in complex
situations such as, for example, returning a 200-km/h tennis service.
In athletic training scenarios, on the one hand, the patterns are predefined, pretrained
units of movement that are continuously recalled and reinforced. In a musical per-
formance scenario, on the other hand, how many of the actions are pattern-based and to
what extent these patterns modulate the shaping of a performance depends on stylistic,
musical definitions and their degree of fixity.
Active motor imagery serves as a technique for training and builds on the way
movements are memorized, related, and executed on a subpersonal as well as somatic/
kinesthetic level. Athletes, as well as musicians and dancers, are known to practice
mentally and with reduced physical activation, such as when marking phrases (Kirsh
2010). The extra scaffolding obtained from executing reduced yet signifying bodily
actions represents an interesting case of a hybrid practice, which leverages motor imagery
with goal-points or “key-frames” (Godøy and Leman 2010), without fully exerting the
body. Practice exercises for musicians can have the same degree of determination as an
athlete’s movement schema and can be mentally practiced in the same way. Training of
fine-motor control and the creation of larger body-schematic movement units that can
be recalled without conscious involvement are the central occupation in a musician’s
training during the instrumental skill acquisition phases.
The body accumulates knowledge about movements, dynamics, and forces and, in the
case of traditional musical instruments, links it to the perception, the adaptation, and the
control of the desired sound-qualities, thus dealing with movement-sound conjunctions
rather than with movement and sound separately. This embodied knowledge encom-
passes the full range of the body’s motion and audition control. It is completely interde-
pendent with the environmental situation within which it is learned and acquired.
Full musical performance situations consist of a large number of perceptual tasks
that need to be negotiated and mastered with skills going beyond mere body-schematic
patterns. Yet, in order to achieve the necessary level of (hyper-)reflection (Kozel 2007),
and in order to master both the corporeal and the expressive musical demands of the
performance, it is important to be able to base the execution on previously imprinted
body-schemata, to anticipate or plan the triggering of a motor movement unit, and to
modulate in real-time the parameters of its execution. The ensuing multilevel perception
and attention during music performance are necessary to manage high-level musical
goals and succeed with timing and the expressive control of phrases (Brown et al. 2015)
or chunks, in a top-down manner (Godøy 2006, 156) without the need to put full
attentional focus on specific single task elements.
The difference between general movement tasks, such as picking up a cup and drink-
ing from it, and musical tasks lies in the acoustic, sounding component whose percep-
tion plays a crucial role in controlling the quality of execution. In musical performance
with an instrument, by modulating fine-motor actions, an adaptive feedback-loop is
created that controls sound’s central aspects such as timbre, timing, and dynamics. This
loop contains both prereflective, kinesthetic and conscious, musical, or sonic
perceptions and (re-)actions and includes all the peripheral situating elements that are
part of the performance, that is, the stage, the other players, the social situation, and so
forth. Thus, the training of instrumental playing on a traditional instrument such as
the violin or the flute consists of learning recursive motor adaptations that depend
both on the perception of physiological, corporeal elements, such as posture, breath,
tension, and force, and on the auditory perception of tonal qualities such as timbre,
pitch, resonance, and volume:
When the status of habituation is reached, the body-image retreats into the background
in order to enable the concentration on the sonic-expressive shaping of the entire
piece of music, something to which the prereflective, proprioceptive and auditory
body senses are continuously subjected. (Kim and Seifert 2010, 111, my translation)
For the musical performer, lower-level auditory processes occur on a prereflective level
and inform musical awareness on a higher level, where the musical elements become
part of the experiential content. With habituation, this prereflective perception of
musical elements gets integrated into prereflective somatic proprioception, as in the
example of “feeling” the correct intonation on a string instrument. This habituation process
shows how musical awareness plays out on a metaphorical (Lakoff and Johnson 1980)
or conceptual (Fauconnier and Turner 2003) level and blends with and informs the
sensory-motor integration of auditory adaptations in body-schematic patterns. As with
any other physical task, performing music involves the coordination of intention, goal,
perception, and adaptive feedback for adjusting the motion trajectory. This is where
motor imagery on a prereflective and subpersonal level, as well as active, intentional
imagination become fundamental to a successful performance.
Ecological Embedding
and Affordances
The affordance of something does not change as the need of the observer changes. The
observer may or may not perceive or attend to the affordance, according to his needs,
but the affordance, being invariant, is always there to be perceived. (Gibson 2015, 130)
Gibson derived his concept from “Gestalt” psychology’s terms of valence, invitation,
and demand, but was critical that its proponents used the concept in a value-free manner.
He emphasized the inherent meaning that arises out of ecological embedding:
An affordance points two ways, to the environment and to the observer. So does the
information to specify an affordance . . . this is only to reemphasize that exterocep-
tion is accompanied by proprioception—that to perceive the world is to coperceive
oneself . . . The awareness of the world and of one’s complementary relations to the
world are not separable. (132–133)
In order to understand the scope of objective affordances (Paine 2009) that arise in
playing traditional, physical musical instruments, a concept of perceptual affordances,
located in the cultural domain of music, needs to be added. On a primary level,
perceptual affordances can be defined as those types of perceptions generated when
entering into contact with the instrument but without necessarily interacting with it.
These perceptions form a multimodal field that encompasses the traditional five senses
of vision, audition, touch, taste, and smell. They arise when attentional awareness is
guided toward the instrument in any of the sensory modes. An example of such an affor-
dance is that of perceiving the tension of a drum skin while holding a frame-drum. On
a secondary level, perceptual affordances could also be seen as the potential for per-
ceptions to arise from interaction with the instrument. These secondary perceptions could
be tied to the five senses as well, if they manifest themselves within the outside per-
ceptual field and in direct relationship to the instrument. An example of this affordance
would be the sound generated from playing the instrument and contained in the audi-
tory event that arises out of an instrumental action. The perception or awareness that
originates within the player when interacting with the instrument, however, represents
a separate type of perceptual affordance that—even though it is derived from contact
and action with the instrument—does not exist independently of the cognitive or sub-
personal processes of the performer. The outer contact with the instrument is conveyed
by tactile and sometimes vibrotactile cues. In contrast, the inner effects of contact with
the instrument are based on a kind of sensing that is active within the body, such as
kinesthetic and vestibular sensing. These effects cannot be called perceptions but rather
sensations and belong to the prereflective, precognitive levels of our perceptual system.
An example of this inner type of affordance might be the level of comfort or the com-
plexity of physical adaptation an instrument demands for its proper playing position,
such as, for example, correctly lifting the hands while sitting at a piano. Or the affordance
might be the prereflective adaptations to playing due to the perception of vibrational
forces transmitted through the body, such as the modulation of a vibrato as felt through
the changes in the vibrating string. Finally, on a higher level, the sounds of an instrument
itself obtain their meaning, and therefore offer their “musical” or “cultural” affordance
in the context of their application. In that case it is less the physical aspects of the instru-
ment and more their habitual use that defines the “ecological” potential. Motor imagery
depends on internalizing the affordances, on the ability to internalize the sounding
result of a musical motor action; it therefore “involves mutuality between perception
and action at a neurobiological level” (Windsor and Bézenac 2012, emphasis added) as
well as on an experiential and cultural level, since any musical action is situated in a
cultural context and builds on prior experiences.
The literature on musical gesture provides a rich set of categorizations and classifications
that deal mainly with the types and effects of actions on musical instruments labeled
as “gestures.” Cadoz’s classification of the “gesture channel” differentiates between the
three functions of the ergotic, that is, the “material action, modification and transfor-
mation of the environment,” the epistemic, and the semiotic, and orders the instrumental
“gestures” in the three categories of excitation, modification, and selection (Cadoz 2000).
Godøy formulates the distinction between body-related and sound-related “gestures”
(Godøy and Leman 2010) that are categorized into sound-producing, communicative,
sound-facilitating, and sound-accompanying “gestures” (Jensenius et al. 2010). These
authors all take into account the bodily basis for the actions, sometimes also the per-
ceptual effects, but fail to address the prereflective effects inherent to acting and perceiving
musical agency through an instrument, in particular the new forms of technological
instrument that rely on abstract mathematical models and digital signal processing for
the production of sound.
In order for digital musical instruments to become “playable” in the proper sense of
the word, the representations of their digital processes need to occur in metaphors
(Lakoff and Johnson 1980); these processes are too complex to be grasped and acted on
directly while performing.2 The metaphors are present in visual representations, such
as the display of waveforms or spectrograms, in physical placeholders, such as levers,
wheels, knobs, and sliders, or in more encompassing analog device metaphors such
as tape-reels, patch-bays, and signal-chains. By themselves, these metaphors are useful,
and enable complex instruments to be “played”; the problem is their limiting effect on
the cognitive and perceptual capacities that could be better mobilized with richer, more
differentiated, and more process- or action-specific metaphors.
A number of conceptual models for the control of digital sound processes originate
in real-world scenarios and in existing physical devices and can therefore be cognitively
handled through actions and behaviors that are shaped by everyday experiences. The
two main models of control can be identified as the instrument (Jordà Puig 2005) and
the cockpit (Wanderley and Orio 2002). The first model is based on a traditional musi-
cal instrument’s dependence on (continuous) energy input that is necessary to produce
sound. Rather than presenting mechanisms for generating larger time-based structures,
the instrument offers a palette of sound options (or playing techniques) that need to
be actively selected, combined, and performed by the musician. The second model of
action puts the performer into an observer perspective or pilot’s cockpit, where, from a
position of overview, single control actions keep the system within the boundaries of
the intended output, while the actual sound processes produce their output without the
need for continuous excitation and control. A third and less common model is that of
dialogical communication and interaction with generative aspects that become an
integral part of the sound production processes. The most interesting manifestations of
the third model deploy some form of autonomous agents to generate an “inter-subjective”
exchange (Lewis 2000).
The types of interaction and their position on the conceptual axis, between direct
parametric control and “naturalistic interaction,” depend on the level at which the musi-
cian acts or “inter-acts” with the digital domain (Kozel 2007, 68). Different complexities
demand different tangible objects and instrumental interfaces. In the case of one-
dimensional and precise parametric control, individual objects such as knobs, sliders,
or buttons are cognitively appropriate, since they represent in their physical form the
singular dimension of the parameter and can be handled discretely. In the case of
higher-dimensional or model-based action patterns, control objects with more degrees
of freedom are required. The mode of “interaction” with more intertwined dimensions
should reflect the relationship and dependency of those degrees of freedom that are
present in the digital domain.
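To make the contrast between these control models concrete, the following sketch—written in Python purely for illustration, with hypothetical class and parameter names and invented numbers—contrasts an “instrument”-style mapping, in which sound exists only while the performer supplies gestural energy, with a “cockpit”-style mapping, in which occasional discrete adjustments steer an otherwise self-running process. It is a minimal sketch under these assumptions, not a description of any system discussed in this chapter.

"""Minimal sketch contrasting two control models for a digital sound process:
an "instrument" model (continuous energy input drives the sound) and a
"cockpit" model (occasional discrete adjustments steer a self-running process).
All names and numbers are hypothetical illustrations, not a real instrument."""

from dataclasses import dataclass


@dataclass
class InstrumentModel:
    """Sound exists only while the performer supplies gestural energy."""
    gain: float = 0.8

    def tick(self, gesture_energy: float) -> float:
        # Output amplitude is a direct, moment-to-moment function of the
        # performer's input: no energy, no sound.
        return self.gain * max(0.0, gesture_energy)


@dataclass
class CockpitModel:
    """A self-running process; the performer only nudges its parameters."""
    level: float = 0.5          # current output level of the autonomous process
    target: float = 0.5         # set-point chosen by the performer
    inertia: float = 0.9        # how slowly the process drifts toward the target

    def set_target(self, value: float) -> None:
        # A single, discrete control action (turning a knob, moving a slider).
        self.target = min(1.0, max(0.0, value))

    def tick(self) -> float:
        # The process keeps producing output on its own, gradually approaching
        # whatever set-point the performer last chose.
        self.level = self.inertia * self.level + (1.0 - self.inertia) * self.target
        return self.level


if __name__ == "__main__":
    instrument = InstrumentModel()
    cockpit = CockpitModel()
    cockpit.set_target(0.9)     # one supervisory adjustment, then hands off

    # A short gesture: energy rises, then stops entirely.
    gesture = [0.2, 0.6, 1.0, 0.4, 0.0, 0.0, 0.0]
    for step, energy in enumerate(gesture):
        print(step, round(instrument.tick(energy), 3), round(cockpit.tick(), 3))

In the printout, the instrument’s output collapses as soon as the gesture stops, whereas the cockpit’s output keeps converging toward its set-point without further excitation; that difference is precisely what distinguishes the two models.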
The most complex set of entangled degrees of freedom that we can cognitively han-
dle are those present in our entire body. Leveraging this level of complexity, at least
through the extraction of information about posture and the kinematic qualities of the body,
is attempted, for example, by camera-based motion controls for games in so-called natural
user interaction, where full-body movements are used for control. This might be an
appropriate method when the goal is to affect a virtual body that mirrors the capabilities
of the natural body in a virtual game environment. It becomes problematic, however,
when the correspondence between the actions in the physical world and the result or
reaction in the abstract digital domain is modeled after categories that originate in
the abstract domain. Empty-handed and movement-based controls in an allocentric3
frame work well for metaphors of control that reflect spatial qualities. Object-based,
instrumental actions with tangible interfaces in an egocentric4 (for example, with wear-
able sensors) or object-centric frame are effective for actions on abstract entities without
clear correspondence in the real world. Digital instrument design and interface devel-
opments oscillate between these two poles. There is, however, a tendency to shift away
from action and behavior patterns that are based on the bodily capabilities shaped by
object “interaction” with physical instruments, and to move toward symbolic and
metaphorical projection onto a disjointed digital model.
A smartphone with its touch screen, for example, gets used as—but was also designed
to become—a generalized object with a repeatable and representable repertoire of
movement patterns, the so-called gestures of pinch-to-zoom, swipe, and so on. These
action patterns were copied from the natural world. Slight dissonances within or new
interpretations of these patterns are learned and absorbed quickly when they constitute
part of the interaction vocabulary of an information device.5
Technological instruments such as the turntable or a tablet are easily integrated into a
musician’s movement and instrumental vocabulary. Turntable-ism is a prime example
of the reappropriation of a music playback device into an instrument, subverting codes
of musical style as well as social codes of “stealing” music or the disregard for the
“authentic” musician’s voice (Eshun 1998).
As the source of sounds, musical instruments with their rich cultural history and the
field of association they carry have a profound impact on our imagination of music
making. With the exception of the voice, all human-made (conventional) musical sounds
are generated by vibrating objects that exhibit specifically tuned physical properties.
Within a single culture, these modes of sound production are commonly known and
form the basis for understanding the act of music making. Sounds that have never been
heard, and do not resemble any other sounds that were experienced before, are not
easily identified, get confused with other sounds, or are simply ignored (Lemaitre et al.
2010). The prereflective auditory processes responsible for these decisions are part of
the filtering and inhibition systems that most of our perception is based on. Since
recognizing sounds is an evolutionary necessity, we are highly attuned to localizing and
identifying a sound’s origin rapidly and preconsciously, even if that means occasionally
failing. This capability is transferred to recognizing musical sounds and identifying
instruments, voices, and acoustical signals.
When considering the import that musical instruments have for our ability to imag-
ine producing sounds in a meaningful manner, the primary relationship to take into
account is that of an active body interacting with, exerting control over, and imposing
intentions onto a tool or object. The “body-object articulation” is a charged field and
contains not just the pragmatic value of its usage but also the signifiers of agency6 or,
in political terms, of the inherent power-relationship (Foucault 1977); the articulation
constitutes a body-weapon, body-tool, even body-machine complex that becomes a
relevant topic and urgent concern in technological performance practices and the way
technological and information-bearing tools pervade our current life-world and blur
the boundary between the organic and the technological (Haraway 1987).
Even though the body-object-movement relationships and kinesthetic patterns that
are offered by technological instruments exist in the same domain as the traditional
ones, culturally defined and explicitly designed motor images and interaction patterns
prevail. In order to enable the manipulation of sound with intentional actions, even a
technological instrument that is based on digital (intangible) processes to generate
sound needs a control- or performance-interface that is based on physical character-
istics; it needs to provide methods of access through proxy layers that enable physical or
gestural interactions. The way technological instruments mediate and alter the path
from an imagined and anticipated sound-event to its sonic manifestation tells as much
about the “technicity” of the instrument (Simondon 1958) as about the mechanisms for
music making we depend on. The temporal unity of an action and its sonic result, for
example, is critical to maintaining a sense of causality and agency. The translations that
are necessary to link a physical action to the production of a sound show the perceptual
boundaries of the physical properties of sounding objects, moving bodies, and the
action-sound coupling that are always present in the natural world; the immediate bond
between bodily action and sounding result is broken by the use of symbolic machines.
These computer programs with their associated graphical user interfaces are merely
executing logical or mathematical operations in order to generate sounds. Even though
technology is optimized to hide this fracture, for example, by becoming so fast as to appear
immediate and transparent, our necessary and indissociable reliance on embodied per-
ception for identifying sound sources as a matter of survival generates an inherent tension
and contradiction that undergirds and permeates any performance with technology.
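The “fracture” described here can be made tangible with a small latency-budget sketch. The stage names, millisecond figures, and the 10 ms threshold below are assumptions chosen only for illustration, not measurements of any real system; the point is that the action-to-sound path of a digital instrument is a chain of discrete processing stages whose sum can be engineered to feel immediate without ever restoring the direct physical coupling of an acoustic instrument.

"""Minimal sketch of the latency budget in a hypothetical digital instrument chain.
Stage names and figures are illustrative assumptions, not measurements."""

# Hypothetical processing stages between a bodily action and the resulting sound.
STAGES_MS = {
    "sensor sampling": 2.0,
    "gesture analysis / mapping": 3.0,
    "synthesis buffer": 5.8,      # e.g. a 256-sample buffer at 44.1 kHz
    "audio output driver": 4.0,
}

# An assumed (and debated) threshold below which the action and its sonic
# result are still perceived as a single causal event.
PERCEPTUAL_THRESHOLD_MS = 10.0

total = sum(STAGES_MS.values())
print(f"total action-to-sound latency: {total:.1f} ms")
for stage, ms in STAGES_MS.items():
    print(f"  {stage}: {ms:.1f} ms")
print("appears immediate" if total <= PERCEPTUAL_THRESHOLD_MS
      else "fracture becomes perceivable")

Under these invented figures the chain exceeds the assumed threshold, which is one way of picturing why the apparent immediacy of fast technology remains an engineered concealment of the translations rather than their abolition.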
How this tension can be fruitfully exploited to generate meaningful relationships
for performing arts is stated succinctly by Kozel (2007, 70–71): “If we create responsive
relations with others and our environments that transcend language, then by means
of intentional performance with technologies we can regard technologies not as tools,
but as filters or membranes for our encounters with others.” This statement emphasizes
the fact that musical imagination and performance are part of a deeply cultured activity
and are always already oriented toward others (Decety and Chaminade 2003). This
applies to all levels of “technicity” of instruments, even primary vocal utterances of
musical nature, and shows that current musical practices contain the reciprocal func-
tion of affectively touching the performing as well as the perceiving subject who are each
other’s “other” in the communicatively enfolded moment of “musicking.”
Notes
1. Or producing an event.
2. Even emerging live-coding practices rely on textual representations in programming
languages and widgets of graphical user interfaces to “perform” with sound-processes.
3. An outer spatial frame of reference.
4. A spatial frame of reference anchored on oneself.
5. Think of finding the power button on your smartphone; after a short period of becoming
accustomed to it, the act of switching the screen on or off becomes a pattern that does not
need extra attention, even though there is often no clear reason why the button might be in one
place or the other on the device.
6. It is interesting to consider the term “agency” in its German translation: “Handlungsmacht”
could be translated as the power to act (Stockhammer 2015).
References
Annett, J. 1996. On Knowing How to Do Things: A Theory of Motor Imagery. Cognitive Brain
Research 3 (2): 65–69.
Bergson, H. 1939. Matière et mémoire: Essai sur la relation entre le corps et l’esprit. Paris, France:
Presses Universitaires de France, Quadrige. (English: 1911, Matter and Memory. London,
UK: George Allen and Unwin.)
Berthoz, A. 1997. Le sens du mouvement. Paris, France: Odile Jacob.
Bregman, A. S. 1990. Auditory Scene Analysis: The Perceptual Organization of Sound.
Cambridge, MA: MIT Press.
Brown, R. M., R. J. Zatorre, and V. B. Penhune. 2015. Expert Music Performance: Cognitive,
Neural, and Developmental Bases. Progress in Brain Research 217: 57–86.
Butler, J. 1988. Performative Acts and Gender Constitution. Theatre Journal 40 (4): 519–531.
Cadoz, C. 2000. Gesture-Music. In Trends in Gestural Control of Music, edited by
M. M. Wanderley and M. Battier, 71–94. Paris, France: Ircam, Centre Pompidou.
Cox, A. 2001. The Mimetic Hypothesis and Embodied Musical Meaning. Musicae Scientiae
5 (2): 195–212.
Cox, A. 2011. Embodying Music: Principles of the Mimetic Hypothesis. Music Theory Online
17 (2): 1–24.
Decety, J., and T. Chaminade. 2003. When the Self Represents the Other: A New Cognitive
Neuroscience View on Psychological Identification. Consciousness and Cognition 12 (4):
577–596.
Enticott, P. G., H. A. Kennedy, J. L. Bradshaw, N. J. Rinehart, and P. B. Fitzgerald. 2010.
Understanding Mirror Neurons: Evidence for Enhanced Corticospinal Excitability During
the Observation of Transitive but Not Intransitive Hand Gestures. Neuropsychologia 48 (9):
2675–2680.
Eshun, K. 1998. More Brilliant Than the Sun: Adventures in Sonic Fiction. London: Quartet Books.
Fauconnier, G., and M. Turner. 2003. The Way We Think: Conceptual Blending and the Mind’s
Hidden Complexities. New York: Basic Books.
Foucault, M. 1977. Discipline and Punish: The Birth of the Prison. London: Vintage.
Gallagher, S. 2005. How the Body Shapes the Mind. Oxford: Clarendon.
Gaver, W. W. 1993. What in the World Do We Hear? An Ecological Approach to Auditory
Event Perception. Ecological Psychology 5 (1): 1–29.
Gibson, J. J. 2015. The Ecological Approach to Visual Perception. New York and London: Taylor
and Francis, Psychology Press.
Glasersfeld, E. 1996. Radikaler Konstruktivismus: Ideen, Ergebnisse, Probleme. Frankfurt am
Main: Suhrkamp.
Godøy, R. I. 2006. Gestural-Sonorous Objects: Embodied Extensions of Schaeffer’s Conceptual
Apparatus. Organised Sound 11 (2): 149–157.
Godøy, R. I., and M. Leman. 2010. Musical Gestures: Sound, Movement and Meaning.
New York: Routledge.
Groves, R., N. Zuniga Shaw, and S. DeLahunta. 2007. Talking about Scores: William Forsythe’s
Vision for a New Form of “Dance Literature.” In Knowledge in Motion: Perspectives of Artistic
and Scientific Research in Dance, edited by S. Gehm, P. Husemann, and K. von Wilcke,
91–100. Bielefeld, Germany: Transcript Verlag.
Haraway, D. 1987. A Manifesto for Cyborgs: Science, Technology, and Socialist Feminism in
the 1980s. Australian Feminist Studies 2 (4): 1–42.
Huizinga, J. 1955. Homo Ludens: A Study of the Play-Element in Culture. Boston: Beacon.
Huron, D. B. 2006. Sweet Anticipation: Music and the Psychology of Expectation. Cambridge,
MA: MIT Press.
Ihde, D. 2007. Listening and Voice: Phenomenologies of Sound. Albany: SUNY Press.
James, W. 1896. The Principles of Psychology. London: Macmillan.
Jeannerod, M. 2006. Motor Cognition: What Actions Tell the Self. Oxford: Oxford University
Press.
Jensenius, A., M. M. Wanderley, R. I. Godøy, and M. Leman. 2010. Musical Gestures, Concepts
and Methods in Research. In Musical Gestures, Sound, Movement and Meaning, edited by
R. I. Godøy and M. Leman, 12–35. New York: Routledge.
Johnson, M. 2007. The Meaning of the Body, Aesthetics of Human Understanding. Chicago:
University of Chicago Press.
Jordà Puig, S. 2005. Digital Lutherie: Crafting Musical Computers for New Musics’ Performance
and Improvisation. PhD thesis, Barcelona, Spain: Universitat Pompeu Fabra, Department of
Information and Communication Technologies.
Keller, P. E. 2008. Joint Action in Music Performance. Emerging Communication 10:205.
Keller, P. E. 2012. Mental Imagery in Music Performance: Underlying Mechanisms and
Potential Benefits. Annals of the New York Academy of Sciences 1252 (1): 206–213.
Kim, J. H., and U. Seifert. 2010. Embodiment musikalischer Praxis und Medialität des
Musikinstrumentes—unter besonderer Berücksichtigung digitaler interaktiver Musik
performances. In Klang (ohne) Körper, Spuren und Potenziale des Körpers in der
elektronischen Musik, edited by M. Harenberg and D. Weissberg, 105–117. Bielefeld:
Transcript Verlag.
Kirsh, D. 2010. Thinking with the Body. In Proceedings of the 32nd Annual Conference of the
Cognitive Science Society, Austin, TX, edited by the Cognitive Science Society, 2864–2869.
Mahwah, NJ: Lawrence Erlbaum.
Kozel, S. 2007. Closer: Performance, Technology, Phenomenology. Cambridge, MA: MIT Press.
Lakoff, G., and M. Johnson. 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Latour, B. 2005. Reassembling the Social. Oxford: Oxford University Press.
Lemaitre, G., O. Houix, N. Misdariis, and P. Susini. 2010. Listener Expertise and Sound
Identification Influence the Categorization of Environmental Sounds. Journal of Experimental
Psychology: Applied 16 (1): 16.
Leman, M., and A. Camurri. 2006. Understanding Musical Expressiveness using Interactive
Multimedia Platforms. Musicae Scientiae 10 (1): 209–233.
Lewis, G. E. 2000. Too Many Notes: Computers, Complexity and Culture in Voyager. Leonardo
Music Journal 10: 33–39.
Lotze, M., U. Heymans, N. Birbaumer, R. Veit, M. Erb, H. Flor, et al. 2006. Differential Cerebral
Activation during Observation of Expressive Gestures and Motor Acts. Neuropsychologia
44 (10): 1787–1795.
Luria, A. R. 1973. The Working Brain. Harmondsworth, UK: Penguin Books.
Merleau-Ponty, M. 1945. Phénoménologie de la perception. Paris: Gallimard.
Montgomery, K. J., N. Isenberg, and J. V. Haxby. 2007. Communicative Hand Gestures and
Object-Directed Hand Movements Activated the Mirror Neuron System. Social Cognitive
and Affective Neuroscience 2 (2): 114–122.
Noë, A. 2004. Action in Perception. Cambridge, MA: MIT Press.
Paine, G. 2009. Towards Unified Design Guidelines for New Interfaces for Musical Expression.
Organised Sound 14 (2): 142–155.
Music and Emergence
John M. Carvalho
Introduction
There is something that emerges in a piece of music, especially in the skilled act of
making music. What emerges is the music in that piece of music. We say we make music
when, better put, we enact it by patterning sounds that achieve or contribute to the
emergence of music in an otherwise undifferentiated field of sound. For the purposes of
this chapter, the music that emerges from our skilled engagement with an otherwise
undifferentiated field of sound will be described as afforded by that field (Gibson 1979).
The affordances that turn up in the field will depend on the skills, and on the refinement of
those skills, that someone attempting to make music deploys in her skilled engagement with a
field of sound. For the musician, sound is her environment, and the skills she has for engaging
this environment have been acquired and refined in prior engagements with sound
in this environment. Her skills are importantly embodied and also extended in the
instruments and other tools—the score, a music stand, a tuning device, and so on—she
uses in her skilled engagements with an environment of sound. Affordances in that
environment turn up for her specifically embodied and extended skills, and music
emerges from her embodied and extended engagements with those affordances. She
imaginatively tests which skills most musically pick up what is afforded by her environ-
ment and deploys the skills that enact the music virtually present there.1 The particular
music that emerges from that environment emerges for the distinctly refined skills
engaged by the composer, the performer, and the listener (recognizing that these skills
can and will overlap). The emergence of roughly the same music for a variety of musical
skill sets and refinements testifies to the way these skills are shared and the extent to
which our environment is co-constituted by a variety of musical subjects.
This chapter draws on arguments for the ecology of cognition to support its claims
about the emergence of music (Clarke 2005). The leading idea for this ecology is that our
minds are fundamentally active and interactive, in the world and not in our heads. This
thinking reverses traditional models that conceive cognition as passively receptive to
input from the environment that is processed in the form of representational content
leading to action and behavior. On the ecological model, cognition actively engages the
world, remaking the environment into an emergent field where its projects can be real-
ized. Using skills learned and refined in engagements with the world, subjects realize
their aims by enacting what the environment affords them. In the case of music, subjects
pattern sounds that turn up in the environment for their particular set of skills and the
refinement of those skills. Music emerges from affordances picked up and enacted or
realized by composers, performers, and audiences drawing from their skilled engage-
ment with prior performances of musical works, the score, this particular performance
and responses to that performance, as well as from the instruments played, the voices sung,
the venue where the music is performed, the constituency of the audience, and so on.
Again, the music that emerges will be singularly connected to specific performers and
audiences engaged in making this music, but it will also be shared in virtue of the
convergences of the skills and affordances shared by the musical subjects involved.
In this ecological model, the imagination is not treated as a discrete faculty repre-
senting mental content that differs from what can be perceived or believed about what is
represented in that content. On the philosophy of mind drawn on here, the imagination
figures as an affective valence of the always embodied engagement of the mind in an
environment of sound. The mind as it is conceived here is not separable into distinct
doxic, praxic, and pathic streams. For embodied cognition, there is no perception with-
out action and no action without a conception, including an imagination, of the end of
that action which is afforded by the environment for an actor with specific skills. The
imagination functions with cognition and action to make actual what is only virtually
present in the environment for this particular body with aims afforded for the skills
acquired and refined by that body in relation to the environment. For the musician,
embodied in a composer or a performer or a listener, as this particular composer, per-
former, or listener—does she or has she played the piano, to what degree of proficiency,
or is she a singer, a horn player, a DJ, and so forth?—the environment of sounds turns
up affordances for the emergence of music by virtue of the imaginative, perceptive,
cognitive, and active engagement of this particular musician with this particular environ-
ment of sound. The imagination is a feature of this engagement. It is not determina-
tive, but it always figures in the embodied engagement of a musician with the sonic
environment that turns up for her as she composes, performs, and listens for the music
in that environment, engaging the environment to make music emerge from it.2
On this ecological model, music will be accounted for in terms of what emerges from
the affordances subjects pick up in their skilled interactions with the environment. This
music does not have a specific representational content that can be recognized as this but
not that. It is, instead, what repeats itself and cannot but repeat itself precisely because
it has no content.3 What emerges as music is what repeats itself in the environment in
virtue of the skilled engagements with the environment by composers, performers, and
auditors. This music that repeats itself is what the composer, the performer, and auditor
find and give back to the environment by their attentive, enactive playing and listening.
Without such an engagement, there is no music but only a succession of notes more or
less adequately executed, more or less attentively heard. In this chapter, music will be
taken to be what emerges in performances of it. What emerges does not approximate an
idea or an ideal. Music is very much real in its emergence as what we hear in this per-
formance of it. We approach this controversy as well as the question of a skilled listening
to music through an underappreciated text, “Listening,” by Roland Barthes (1985a). In
his account of listening, Barthes refers to the way the unconscious, in the psychoanalytic
setting, gives an ear to what emerges from the subject the analyst listens to. We update
and substantiate Barthes with a revised taxonomy for modes of listening proposed by
Kai Tuuri and Tuomas Eerola (2012), defending Barthes’s appeal to the unconscious
against what Tuuri and Eerola call “critical” and “reflective” listening. We locate music’s
“unconscious” in its “groove” as discussed by Maria Witek (2014) and Tiger Roholt
(2014), and we encounter it in the performance of “At Last!” by Etta James. We conclude by
defending a critical as opposed to a metaphysical ontology (Goehr 2007; Neufeld 2014)
that identifies the emergence of music in an enactive performance of it.4
Listening
Among the many insights he offers us about music, Barthes says there are three ways
of listening. There is listening to an alert, listening that is a deciphering, and listening
that develops an intersubjective space where what we listen to is “a general ‘signifying’
no longer conceivable without the determination of the unconscious” (1985a, 246). The
first two ways of listening, Barthes says, we share with animals. The last, on his view, is a
distinctly human, and modern, way of listening.5 For Barthes, this distinctly human lis-
tening compares with the listening of the analyst in the psychoanalytic setting. Barthes
does not spell out the implications of this observation for listening to music. We do that
here, but ours is not a hermeneutic exercise. We do not hope to reveal what Barthes
might really have wanted to say about a listening to music that is inconceivable without
the determination of the unconscious. Instead, we hope to take advantage of what
Barthes wrote to get traction on the question, How does what is virtually present in an
environment of otherwise undifferentiated sound emerge as music in our listening to
that environment? To manage this, at least two things must be clarified: what Barthes
means by the unconscious and what Barthes means by a “general signifying,” what he
also calls “signifiance.”
Before getting to the unconscious and signifiance, however, we should note that Barthes
distinguishes listening from hearing. Hearing, on his account, is the physiological ana-
logue to the psychological act of listening. We cannot account for listening acoustically,
Barthes writes, or by reference to the anatomy of the ear and its object or goal. Rather,
listening involves the mind as well as the body, which is formed by the mind, from which
the body takes the object or goal of its listening. Hearing, on this view, is how the body
physically responds to listening’s psychological evaluation of a spatial and temporal
situation. Listening does not respond to hearing. That would lead quickly to a dualism
Barthes would not abide and we should reject. It is rather the case that listening directs
hearing while hearing supports listening. Listening is, thus, embodied in hearing just as
much as hearing is animated by listening. Barthes would likely not have been moved
from this position by the evidence now available that extends the brute anatomy of the
ear to the sophisticated psychobiology studied by cognitive neuroscience (see Schnupp
et al. 2012). Listening has for him an evaluative function that directs the hearing body to
affordances in its environment, and there are good reasons for thinking he is right. There
is no question of a dualism here. Hearing and listening are conjoined tendencies of an
entirely embodied cognition. At one pole we recognize that something is audible. At the
other, we engage what is audible in the context of the lives we enact by listening but also
singing, dancing, speaking, and so forth.
Listening “is the very sense of space and time,” Barthes writes. By “the perception of
degrees of remoteness and of regular returns of phonic stimuli” we shape our sonic
world (1985a, 246). He contends that humans identify a territory by listening to the
familiar and the unfamiliar in the sounds that constellate a more general environment.
We hear sounds, on his view, but identify a place by listening. For example, the “house-
hold symphony” of kitchen noises, plumbing, heating and air-conditioning, the sounds
of nature or the neighbors and maintenance equipment bleeding in from the outdoors
form an aural texture of background noises we hear as the basis for listening to a world
we call our home.
It is, again, as if listening “were the exercise of a function of intelligence,” Barthes
writes, taking intelligence to be a kind of “selection” (247). Listening picks things out,
it picks up affordances, but it only exercises this function against the background of
what is familiar and unobtrusive. If the background noises are too loud or unfamiliar,
listening—as a form of intelligence or selection—is precluded. Affordances turn up
because they reinforce what is familiar and because they expand creatively on what
has been familiarly afforded. If we were to restate Barthes in a contemporary idiom or,
better, if we were to draw from Barthes what will help us understand the role of the
mind in our appreciation of music today, we might say that listening enacts or achieves
the music afforded by what we hear in an environment of sounds. Let us now see how
this distinction plays out in Barthes’s taxonomy.
The alert, the first order of listening, is said to attend to what threatens to interrupt, disturb,
or positively enhance the safe sonic space that is the listener’s territory. Listening, at this
level, is a response to surprises perceived as either a menace or a need. The “raw material”
of listening on this level is what Barthes calls the “index.” The index is something singu-
lar, something that stands out because it is distinctive or exemplary in the context or the
texture of the territory or what we have called the environment. This is a type of listening
we share with animals. The sound of a can of food being opened stands out in the sonic
space of the napping cat as do unexpected footsteps in the hall, the one promising the
satisfaction of a perceived need, the other anticipating a perceived threat. In the experi-
ence of listening to music, an alert may take the form of a missed note that derails the
resolution of a melodic line or what proves to be a passing tone that opens a sonic space
for improvisation. What is important about this type of listening in the case of music is
an increasingly private affair, until the moment when the speaker, listening to what is
interior to himself, commands the attention of another’s—the priest’s, the analyst’s—
interiority. The one speaking now commands another to listen to what the speaker has
heard listening to himself.
The injunction to listen is the total interpellation of one subject by another: it places
above everything else the quasi-physical contact of the subjects (by voice and ear):
it creates transference: “listen to me” means touch me, know that I exist.8
(Barthes 1985a, 251)
Is not this what the musical performer commands? “Listen to me,” she says, through her
instrument and her song. “Touch me. Know that I exist.” She does not simply share the
music she plays in her performance of it with an audience. She commands the attention
of that audience. Hear me. Feel me. Touch me. That is our clue to how the musician listens
to herself, to what is afforded her embodied enactment of her music.
Importantly, it is the affordance she engages, interior to her, that this musician listens
to while performing and that she commands her audience to listen to in the music she
makes. Listening to her music, music that emerges from the musician’s skilled engage-
ment with those affordances, the audience listens to the musician herself. They make
her interiority theirs. Here we have the beginnings of a shared intersubjective space
of listening. Barthes describes the telephone as the archetypical instrument of this
listening, since it “collects the two partners into an ideal (and in certain circumstances
an intolerable) inter-subjectivity” (1985a, 251–252). Telephonic communication, he says,
invites the Other to “collect [the speaker’s] whole body in his voice” (252). Speaker and
listener are, thereby, embodied and extended in the telephone that connects these
modalities of their embodiment.9
So far, Barthes’s observations square with the revised taxonomy for listening proposed
by Tuuri and Eerola (2012). Tuuri and Eerola conceive listening as an action-oriented
intentional activity that finds meaning in “emerging resonances between experiential
patterns of sensation, structured patterns of recurrent sensorimotor experiences
(action-sound couplings) and the projection of action-relevant mental images” (137).
In the literature on listening taxonomies (Schaeffer 1966; Chion 1983, 1990), they discern
three listening modes: a causal mode distinguished by an intention to apprehend causal
indices, a semantic mode distinguished by an intention to comprehend meanings, and a
reduced mode distinguished by an intention to perceive the sound itself (Tuuri and
Eerola 2012, 139).10 More recent developments (Huron 2002; Tuuri et al. 2007) suggest a
division into two pre-attentive modes (reflexive and connotative), two source-oriented
modes (causal and empathetic), and three context-oriented modes (functional, seman-
tic, and critical) (Tuuri and Eerola 2012, 141). The pre-attentive modes, which capture
innate and primordial affective responses and their associations, map what Barthes
calls listening as to an alert. The source-oriented modes, which capture denotative acti-
vation systems and the perception of a sound being intentional, and the context-oriented
modes, which capture sounds’ affordances, their sociocultural conventions, and their
appropriateness, make a finer grained map of what Barthes (1985a) calls listening as a
deciphering (Tuuri and Eerola 2012, 141–142).
In their revised taxonomy, Tuuri and Eerola term the pre-attentive modes (to which
they add kinesthetic action sound couplings) “experiential.” They group the source-
oriented and context-oriented modes, stripped of critical listening, under the heading
“denotative.” Finally, they pair critical listening, judgments about the appropriateness of
a sound in a given context and of our responses to that sound, with reduced listening,
focusing on the “sound itself and its qualities,” and call this mode “reflective” (142, 147).
Again, experiential and denotative modes of listening follow what we have observed in
Barthes so far. Reflective listening, however, is not a part of Barthes’s plan. Tuuri and
Eerola attribute reflective listening in its reduced mode to an attention to qualities of the
sound apart from the denotations associated with the sound (149). To the critical mode
of reflective listening they attribute a judgment that “evokes new meaning” in the sounds
experienced “and reevaluates those [meanings] already evoked” (149). So described,
reflective listening does not appear to contribute to the making or emergence of music.
Reduced reflective listening gives us sound stripped of its relation to the listener and
the environment. Critical reflective listening makes the dynamic interplay between
cognition and the environment one-sided: the sound is given and the listener judges what
is given. In the experience we are hoping to describe, the music emerges from affordances
that turn up in the environment for the specifically embodied skills of the composer,
performer, and listener.11 We expect to find music so enacted in what Barthes has called
a distinctly human and modern mode of listening not covered by Tuuri and Eerola.
The Unconscious
Listening at Barthes’s first level transforms noise into an index. Listening at the second
level transforms the index into a sign. It also transforms the listener into a dual subject.
On this second level, interpellation becomes interlocution “in which the listener’s
silence will be as active as the locutor’s speech” (Barthes 1985a, 252). Listening speaks,
and it is at this level that the third type of listening, a distinctly human and modern
listening, a listening “no longer conceivable without the determination of the uncon-
scious,” begins to take shape. The image of the telephone cited earlier does not occur
capriciously to Barthes. It is the same image Freud used to describe the analyst’s listening
to the analysand.
The analyst must bend his own unconscious like a receptive organ toward the
emerging unconscious of the patient, must be as the receiver of the telephone to the
disc. As the receiver transmutes the electric vibrations induced by sound waves back
into sound waves, so is the physician’s unconscious mind able to reconstruct the
patient’s unconscious, which has directed his associations, from the communications
derived from it. (Freud 1963, 117–126)
In the free association of the analysand’s giving an account of himself, the unconscious
speaks: “touch me,” it says, “know that I exist.” The analyst, evenly hovering, attending to
nothing in particular, refusing to latch onto anything that would lead her to learn only
what she already knows, listens for the emerging unconscious of the analysand. She makes
her unconscious the sounding board for the unconscious of her patient or, better, a
surface where the array of her patient’s cathexes can take shape, where, in this array, her
patient’s unconscious can emerge.12
Since the unconscious is said by Freud to function at the level of images, any inter-
vention at the level of language, the language of the analyst in particular, threatens to intro-
duce a selection that revises the unconscious in advance, telling the analyst only what
she wants to hear. The evenly hovering attention of the analyst attempts to eliminate or
bracket, at least, anything that might mediate a connection between her unconscious
and the unconscious of her patient. The unconscious, while not a language, is said to be
structured like a language (Lacan 1981), that is as a constellation of gaps, lapses, dif-
ferences, and differential relations, so the patient’s words, spoken freely and associatively,
provide a medium for his unconscious to emerge. The impressions the patient’s free
associations make on the unconscious of the analyst are, ideally, recorded there for
revision only after the fact in the report the analyst writes. The “symbolic order” of lan-
guage, the language of the Other, is said to enter the equation only in this revision the
analyst gives of her patient’s account as mediated by her own unconscious. Of course,
these ideal conditions only infrequently obtain. What obtains more regularly, right
away in the account the analysand gives of himself, is the structuring influence of the
symbolic order and the Other as embodied by the analyst.
The unconscious, on these terms, should not be counted as something archaic or
archetypical in consciousness. It is not, either, what Freud called the preconscious or
what others often think of as a reservoir of repressed desires and cathexes. The uncon-
scious on these terms is rather what is afforded in the account the patient gives. It is
what emerges, when it emerges, and repeats itself in the free associations of the patient
for the analyst who skillfully bends her ear to these associations and makes her uncon-
scious a sounding board for the unconscious of the analysand. Jacques Lacan (1981)
calls this unconscious the objet a, the object cause of desire in the analysand. For
Barthes it is what he calls a general signifying, an overfullness of meaning or signifi-
ance that means nothing in particular. In musical terms, it is not an alert indicated by
a false note nor a sign interpellating a practiced listener to interpret the meaning of a
raised fourth in a blues scale. There is, in music, following Barthes, something more
than what it means. There is in music a general signifying that emerges and repeats
itself—“hear me,” “know that I exist”—and we hear that something more, that signifi-
ance, only with a listening practiced and refined on the model of the unconscious just
described. Listening in this way, engaging the affordances in an environment of sound
on the model of the unconscious of the analyst bending to the unconscious of the
analysand, the music in a piece of music, the something more in that environment,
emerges for us. This mode of listening cannot be found in any of the taxonomies
offered by Tuuri and Eerola.
Groove
This something more in music is approximated by what Maria Witek and Tiger
Roholt have called “groove.” Groove is not a colloquial term. Roholt uses it specifically
to talk about the body’s noncognitive grasp of an element in music that signifies
without signification, that is, without meaning something in particular. Groove, Roholt
writes, is something we feel, the “motor-intentional affect” lived through as part of the
effort to understand the music’s motor-intentionality through bodily movements
(Roholt 2014, 137). Groove, following Roholt, is perceived haptically by the body in its
lived corporeality. In Barthes’s terms, our embodied listening echoes haptically the
motor-intentional affect in the music itself. On our terms, this embodied listening is a
skilled engagement with what is afforded by an environment of sounds. This listening
compares with the listening in the psychoanalytic setting. It is not listening for something
it has determined in advance. It is listening for what, emerging in this environment of
sound, has no determinate content, cannot be fitted under a concept, repeats
itself and cannot but repeat itself. Groove is what repeats itself for the listener skilled at
engaging what is afforded by its general signifying, its signifiance.
This general signifying is the something more we are listening for, especially as per-
formers, in the performance of a piece of music. We are listening for the music as it
emerges in the environment of sounds we engage. We are listening for what is latent, as it
were, in the manifestly skilled execution of the piece as scored. This signifiance is the
element that turns up, emerges, slips away, and re-emerges. It is what as performers we
are striving to find and hold together (maybe by letting go of our attention to the score
and our technique to do so). It is what as auditors we are listening for in those pieces of
music we find exemplary because in them there is a groove. What Roholt calls groove
gestures to what we are describing as the intersubjective space where what we are lis-
tening for is a general signifying that repeats itself and cannot but repeat itself, and this
general signifying, this groove, is there only just in case we are there to enact or achieve
it. So, what we are getting at is not so much what Witek calls “groove music,” though no
doubt there is such a thing, but a groove in music, a general signifying that we bend
our ears toward the way the analyst bends her unconscious to the unconscious of the
analysand. This is as true of the outro of “Straight, No Chaser” as played by the second
Miles Davis Quintet on Milestones (1958)—a groove hackneyed performers struggle
to achieve—as it is for the opening passage of “The Beatitudes,” by Vladimir Martynov
(1998), rescored for Kronos Quartet (2012[2006]) and heard throughout Paolo Sorrentino’s
film The Great Beauty (2013).
In an interview with David Harrington of Kronos Quartet,13 we hear many of the
same sentiments reported by Simon Høffding (forthcoming) from his interviews with
The Danish String Quartet. After learning to execute the score, these highly skilled, pro-
fessional musicians say they have to learn the music that comes only after playing the
piece together repeatedly. In their rehearsals, they are not just listening to one another.
They are also importantly listening for the music. That music has a fleeting quality. It
cannot be heard when one or another player insists on what they have deciphered as its
secret, a secret only she or he can hear. The music emerges in the intersubjective space
created by a shared bending of the ears of every player (and often the living composer in
the case of Kronos) to the music in its emergence. These ears are obviously particular to
each of those players, but they are also general to the quartet as a whole in virtue of
the skills those players have refined by playing together and the affordances they have
shared in their performances. With ears skillfully attentive to the environment of
sound, these performers actively engage what emerges with their voices and their
instruments and make music, enact music in this performance of it.14
Consider the example of Beethoven’s late string quartet, No. 14, Opus 131 (1826),
which is played attacca (without break or pause).15 As the piece goes on for nearly forty
minutes in seven movements without stopping, the musicians are tasked with achieving
and maintaining the music through their own fatigue and their instruments’ changing
tuning. They must continuously enact the music in part by listening, with their own
musical bodies—bodies extended by their individual instruments as well as by the
bodies of those playing with them—to what is emerging in this music, what repeats
and can do nothing more than repeat itself because, in the overfullness of its own sig-
nificance, its signifiance, it cannot signify something to the exclusion of something else.
Witek and Roholt agree on the attribution of groove to music that moves the bodies
of listeners. They point to rock, rhythm and blues, funk, hip hop, and electronic dance
music as sources of listening pleasure derived from a compulsion to move and from acting
on that compulsion by moving the body. Witek focuses her more scientific study on the
impact syncopation has on the desire of listeners to move to the music. Roholt focuses
on swing and on the comprehension of a motor-intentionality in the music by listeners
who move their bodies in response to that music. Our focus has been on listeners who
are also performers. Again, we are guided in our intuitions by Barthes:
There are two musics (or so I’ve always thought): one you listen to, and one you
play. They are entirely different arts, each with its own history, sociology, aesthetics,
erotics . . . The music you play depends not so much on an auditive as a manual (hence
much more sensuous) activity . . . it is a muscular music; in it the auditive sense has
only a degree of sanction: as if the body was listening, not the “soul”; this music is not
played “by heart”; confronting the keyboard or the music stand, the body proposes,
leads, coordinates—the body itself must transcribe what it reads: it fabricates sound
and sense: it is the scriptor, not the receiver.16 (Barthes 1985b, 261)
From what we have said earlier, we know that for us the body listening is not just
the physical body but (as for Roholt) the haptic, sensuous, affected and affective body,
the body capable of enacting or achieving music not out of habit (by heart) but by a
constant, attentive bending toward the environment where music can be heard emerging
in the performance of it.
This listening is something we can fathom comfortably when it comes to playing
for ourselves or performing solo for an audience. We can also comfortably fathom how
this listening is engaged in the enactment of music by small ensembles. The examples
cited previously were the Miles Davis Quintet and Kronos Quartet. It is more difficult
to imagine this kind of listening in a large ensemble, a symphony orchestra or concert
band.17 This difficulty is not a problem for the view set out here, since in those large
ensembles it sometimes happens, for a particular passage, a particular movement, in a
particular performance, that there is a magical confluence of the music making of the
conductor, one hundred or more performers, a concert audience, and the music. These
performances are truly memorable, legendary even. We attend performances of large
ensembles, and perform in them, for the chance to achieve that magic in music. In a
small ensemble, however, such music must be achieved more regularly if that ensemble
and the music they play are to be remembered at all. It is likely most regular in string
quartets that play together for twenty-five years or more. In jazz (and rock and pop)
ensembles, too often a clash of egos or the assertion of a single ego leads to the ensemble
disbanding after only a few years, players seeking alternative intersubjective spaces
where they can achieve that general signifying conceivable only within the determi-
nation of the unconscious rather than the particular signifying of one dominant player.
This listening is not unfathomable for those who are not playing the music them-
selves. It is this listening that motivated Roholt and Witek to focus on how the bodies
of listeners are physically moved by rock, hip hop, and electronic dance music. Roholt
considers the case of classical music and ventures to guess that an attention by listeners
to the nuances expressed in some classical performance “will involve some body move-
ment that can be elucidated in terms of motor-intentionality” (Roholt 2014, 125–126).
We would venture to propose that he might discover motor-intentionality in the body
movements of classical performers (who are also listeners) themselves, especially the
body movements of performers in chamber ensembles and string quartets which extend
the ears and minds of those performers in the music that emerges from their playing or,
better, their achieving and enacting of that music.18
At Last!
In what we have said so far, at least two points deserve closer scrutiny: our apparent
commitment to the very idea of the unconscious—why introduce this element since it is
bound to arouse controversy—and the apparent commitment in the conclusions we
draw to an enactive ontology of music. In fact, neither commitment is as controversial
as it seems.
We were led to the concept of the unconscious by following Barthes’s association of
a distinctly human and modern form of listening with the listening of the analyst in
the psychoanalytic setting. We have tried to show that this is a form of listening well
known and acknowledged by performers and attentive listeners alike. Performers reg-
ularly talk about finding the music in what has been formally scored or merely sketched
out in advance. We mentioned the reports of David Harrington of Kronos Quartet
earlier. As an example of what he is getting at, consider the entrance of the second
violin in Kronos’s performance of “The Beatitudes,” by Vladimir Martynov. The crucially
embodied timing and the tonality of that entrance enacts or achieves the music in this
piece of music. Bowed a little late, a little too soon, with more or less vibrato, just slightly
sharp or flat, and the piece fails to come together. Everything about that piece of music
follows from this entrance, which must be skillfully achieved for the music of “The
Beatitudes” to emerge, and it will only be achieved if every member of the quartet, even
those not yet playing, bend their ears in the direction of that emerging music.
As another example of the same sort, take the Etta James rendering in 1960 of “At
Last!” written by Mack Gordon and Harry Warren in 1941 for the film Orchestra Wives
(directed by Archie Mayo). The timing, tonality, and timbre of the second note James
sings, as “last,” introducing and leading the band into the tune, are crucial to the emer-
gence of the music she and her accompanists achieve in this song. The entrance of the
note sounded as “last” follows a cadenza ending in a held 9th chord and, following that,
James singing “At” without accompaniment on the dominant 5th of the tune’s tonic scale.
“Last,” then, resolves the tension introduced with “At” and marks the downbeat as well
as the key that signs the song, signifying generally and abundantly the music of “At Last!”
As James holds “At” for as long as it feels right for her (the note is scored with a fermata),
she bends the sound in anticipation of the “last” that will follow (as she bends her ear to
the music emerging in her performance of the song). She increases the tension between
the dominant 5th and the tonic with the time it takes to resolve it and a grain in her
voice that enacts the blues idiom where she locates the tune (Barthes 1985c). She lands on
“last” in a way that cues the entrance of the band and, to do all these things, she must be
skillfully attentive to the affordance turning up for her, listening to the general signifying
she and those performing with her are in the course of enacting. She does not consult
a mental representation of prior performances of the song by herself or another. She
does not remember the song. What she is listening to, in advance, are the affordances
she will engage to enact the song on this occasion, the song that is realized or achieved
only by her skilled performance. No doubt she draws on skills she has acquired and
refined in prior performance of this song and others, but in this enactment of “At Last!”
she engages those skills in the context of the entirely local affordances picked up from
the particular performance of the cadenza and the anticipated skills of the band as a
whole as well as the audience for this particular performance. She bends her embodied
ear to what is afforded by this environment and, skillfully picking up those affordances,
enacts or achieves the music in this song. She brings “At Last!” to life. She deploys her
refined skills to realize and hold together the music of that song, what in that song
repeats itself and cannot but repeat itself in anticipation of her engagement with what is
emerging in the overfullness of her performance of it.
What emerges from “At Last!” in James’s performance is not a secret to be found by
tracing its origins to the musical score for a film about white entertainers (Orchestra
Wives) translated by a black blues singer for another audience and embraced by that
other audience because of what James brings to the performance of the song. Rather,
what emerges as “At Last!” is something James listens for in the song as she enacts it,
affording us the chance to listen for ourselves with the skills we have refined for enacting
the music James is performing. It may happen that this performance will result in a
missed encounter. It may happen that, on this particular occasion, James’s skills or the
supporting affordances will not be up to the task. “At Last!” will be realized only just in
case it is enacted by performers and listeners alike. Again, there is a certain magic to
these enactments. Music is not achieved just by the skilled execution of scored notes
performed in a prescribed manner for audiences skilled at evaluating such executions
(Goehr 2007). Music emerges for performers deploying their skills in the context of local
affordances for listeners deploying their skills in the context of their own local affor-
dances to achieve and enjoy not the signs of this or that song but the general signifying
that is the music itself. If there is a certain magic to these enactments, it is not something
mysterious we cannot foresee but a rare confluence of several, variable affordances
turning up and being picked up by skilled practitioners.19
If what emerges in a piece of music is not what Barthes, earlier, called its secret, what
we listen for on the second level, listening as a deciphering, it is also not what, once
heard, the musician can actively achieve by repeating it in memory. What the musician
is listening for each time she performs will be different relative to the local affordances
that vary with the musician’s honing of her skills and with the particular material cir-
cumstances of this or that performance. As we noted, it may happen that the musician’s
skills and the affordances she picks up result in a missed encounter with the general
signifying of the music she is attempting to enact. It may also happen that she encounters
a signifying she is not expecting, that she gives the song a life she did not know it had.
This can happen in the spontaneity of a live performance or it may happen in the prac-
ticed enactment of the song by the same or different performers.
Compare, for example, the renderings of “Stella by Starlight” by Ella Fitzgerald
(Verve 1961) and the Miles Davis Quintet (Columbia 1964). Both arguably achieve the
music of the song, but these are two very different lives that emerge from this same piece
of music. Fitzgerald’s “Stella,” with the help of accompaniment by Ray Brown on bass,
Lou Levy on piano, and Stan Levey on drums, swings. Against the syncopated rhythm,
we listen for the song’s title as it appears late in the lyric line and to the urgency in
Fitzgerald’s up-tempo vocals. The Davis Quintet’s “Stella” quickly dispenses with the
melody. In its place, Davis plays a moody, introspective meditation on the form of the
tune, leading the rhythm section through shifting cadences and drawing a line through
the form that enacts or achieves a music that only emerges in this particular performance
of the tune.
Both performances achieve “Stella by Starlight.” Both enact the music the song would
not have otherwise. (Both find a “groove,” in the language used earlier.) Drawing on
their refined skills as musicians, listening to the resources afforded by the circumstances
of their performances—Fitzgerald recording in a studio, Davis recorded live at Lincoln
Center—they enact the general signifying of the tune, what about the tune repeats
itself in anticipation of being engaged in this environment of sound by their skilled
musicianship.20 What each performer hears in “Stella by Starlight” is animated by an
evenly hovering attention to an intersubjective space formed from the bending of their
skilled listening to the performance of that song. From their skilled engagement to the
environment of sound turning up in that space, these performers enact what we have
been calling music. Something similar could be said of the renderings of “At Last!” by
the Glenn Miller Orchestra in 1941 and by Etta James some twenty years later. These
performers, Miller and James, each enact the music in the song, yet what emerges as a
general signifying of “At Last!” will vary relative to the specific skills of these musicians
and the affordances local to the performances they give. In effect, “At Last!” is a different
tune every time it is achieved, when it is achieved, allowing again that a given perfor-
mance may result in a missed encounter with the music of that tune.
A more traditional account of the distinction we are drawing would make James’s
an especially powerful token of the type “At Last!” (Wollheim 1980). On that view, the
type of the song would be given in the score, and the first token performance of the piece
would transfer qualities to the type substantiating it, setting a standard for future tokens.
James’s performance would be an especially powerful token that would transfer traits
to the type, thus setting a new standard against which future performances of the
tune, by Beyoncé Knowles, for example, would be judged. In fact, James came to think
of “At Last!” as her song, and Knowles’s performance (at the inaugural ball for
US President Barack Obama) has been judged against a standard supposedly set by James.
This view, however, idealizes the music of this song and music in general. It assumes
that there is a standard of correctness (the score) for measuring the achievements of
Miller and James and Knowles (and others). It would be better to say that Etta James
came to think of the song as hers because she heard a general signifying in the song
that others did not have the skills to achieve. It is not that James is more skilled, rather
her specific skills lead her to pick up affordances that do not turn up for others in
every performance of the tune that enacts the music in it.
Now, what exactly is the song whose general signifying James listens for in her
performance of it? How is the song identified, how is it heard as the song it is? On the
view defended here, “At Last!” is nothing else than the song that is enacted in every per-
formance of it, and it exists or, better, lives in those performances based on the skills of
the performer and the affordances that turn up for her specific skills and the skills of her
audience. What she listens for, that in the service of which she deploys her considerable
skills, is constituted, enacted, or achieved in all of the renderings of it that have been
performed and appreciated and that enable the music to emerge. There is no music in
the tune without a performance of it (Goehr 2007). The tune can be played and heard,
but not every playing or hearing of it achieves or enacts the music in that tune.
This view is grounded in an ecology of cognition that takes the mind to be necessarily
embodied and continuous with the environment in which it is embodied. For this
embodied mind, imagination contributes to the perception, cognition, and evaluation
of the affordances that turn up in the environment for the skills it has acquired and
refined. On this view, music is what emerges from such a skillful engagement with an
environment of sound. Music is just what is enacted and achieved in performances of it.
Performances depend on some manner of composition. Performances depend more
strictly on skilled listeners for those performances. In any case, on this ecological and
embodied view, music does not exist in an idea in the mind of the composer or in the
memory of a listener based on recordings or prior performances he has heard. Music
emerges in the skilled enactment of it on this occasion, played and sung by these musi-
cians, in this venue for this audience. Of course, it may happen, as in the “spontaneous
compositions and improvisations” of Charles Mingus or the “creative spontaneous compo-
sitions” of Steve Coleman, for example, that composer, performer, and auditor are
the same person. For an ecology of musical cognition, the problem of the ontology of
music is solved: music just is what emerges in our engaged listening and skilled enact-
ment of it (Neufield 2014).
In the film A Late Quartet, mentioned earlier, Peter (played by Christopher Walken),
the cellist and elder statesman in the group, relates the story of his encounter with the
great Pablo Casals for a master class of young musicians. As a young musician himself,
he played for Casals and, to his ear, performed miserably, but Casals inexplicably praised
him. Years later, as a mature professional, he chided Casals for what he said was Casals’s
insincerity so many years ago, and this time Casals grew angry. Didn’t you play this
figure, Casals asked, picking up his cello to demonstrate, with this fingering? It was a
novelty for me, he said. And didn’t you attack this phrase—again, demonstrating—with
an up bow? Casals emphasized the good stuff, Peter tells his students. He encouraged.
He wasn’t listening for the mistakes. The music in a piece of music is not missed when we
make mistakes. The music is missed when we fail to listen for what is emerging, what
turns up and wants to emerge, in that piece of music, when we fall short of bending our
ears and our skills to what affords us the chance to enact the music in a piece of music we
otherwise do not hear. What emerges in music is the music we make or, better, enact and
achieve by composing, listening, and especially playing skillfully what are, without our
engaged attention and refined skills, only patterns of sounds.
Notes
1. I conceive of this imagination as thoroughly embodied, as something felt about the fit of
this or that skill, as a form of affective cognition on the order of how the body feels about
attacking a snow-packed slope with a pair of skis or feels about the dish that can be made
from what is afforded by the refrigerator. Given what is afforded by an environment of
sound, the musician “imaginatively” feels the deployment of this or that skill will render
the most musical results. She has an embodied and affective sense of what to do with this
environment. This idea is developed in a paragraph below.
2. This account does not fall short of analytic specificity. It rethinks the mind as a continuous,
embodied engagement with its surrounds. It conceives of the imagination as inextricably
caught up in perception, cognition, assertion, and action as well as with evaluations
of the ethical and aesthetic value of this imagination. All of these dimensions of embodied
cognition are present in different degrees in different engagements as they are afforded by
different environments and as those affordances turn up for the skills acquired and refined
by a particular embodied mind. The musician is an especially rich example of a continuity
of mind, body, and environment that is achieved by a skilled engagement with the music
afforded by sonic material.
3. This is not to say that the music has no content but only the affordances in an environment
of sound have no content prior to being skillfully engaged by the composer, performer or
listener. Once engaged, the music afforded by that environment acquires a content relative
to the specifically embodied skills deployed in that engagement. This content will be
shared in the same way skills are shared so that it is not surprising that the content the
composer enacts in a score is picked up as affordances and enacted as content by per-
formers for listeners who find affordances in performances for a shared content. Differences
in the embodiments of the skills acquired and refined by composers, performers, and listen-
ers as well as the particular conditions in which the music is enacted will account for dif-
ferent valences in the content on each occasion.
4. I thank Aili Bresnahan, Marc Duby, Richard Eldridge, Enrique Morata, Manos Perrakis,
Martin E. Rosenberg, and Dylan van der Schyff, who read and commented on earlier
drafts of this paper.
5. By “modern” Barthes refers, as we see in more detail later, to a time after the discovery of
the unconscious, so from the late 18th century (in the work of Friedrich Schelling) to his
present day. Also, he is referring to a mode of listening and not to the modernity of what
we are listening to.
6. A piece of music heard at Stanford University many years ago as part of a recital by stu-
dents was composed for audience members to take the stage and use hammers and nails
provided to assemble random lengths of two-by-fours into unspecified arrangements. In
this piece, likely inspired by the work of La Monte Young, otherwise asynchronous sounds
and untempered pitches developed a rhythm and a tonal palette over time that we would,
today, attribute to a form of entrainment. We may suppose Barthes has such a phe-
nomenon in mind.
7. This secret is not the unconscious. It is rather the meaning or signification that the sign,
insofar as it is a sign, conceals, even as it points to it.
8. “Transference” refers to the psychoanalytic patient’s unconscious redirection of his own
affects toward the analyst.
9. It may be helpful, here, to distinguish this account of music making and appreciation from
a more traditional or standard story. It is often thought that the musician hears something
of the developing motif of the music she is making and makes music of the notes she is play-
ing or singing by her attention to this development. Memory is required, and the pattern the
music realizes in her performance is followed from recollection (Eldridge 2003, 132–133).
This makes music a mental activity heard, first, in the mind of the composer and performer,
then, communicated to the listener. On our account, there is only music in the enactment of
it from affordances picked up in the environment where that music is performed. For the
skilled musician those affordances are vast. They include the score, the instrument, past
performances by the musician herself and others, a note just played that was just slightly too
sharp, a page turned too quickly, what has been played by other musicians in the ensemble,
and on and on. The skilled musician also has a way of navigating all of these variables, effi-
ciently and creatively. She quickly cancels what will not contribute to her enacting the music
in this piece of music on this occasion. She will make it seem effortless, and when she is
successful she brings to her audience something more than a pattern of tones more or less
perfectly executed. She brings something of herself. She communicates something of the
affordances that turn up for her, and only her, in the performance of this piece of music.
10. In fact, Schaeffer posits four modes of listening: Écouter (attentive to the source of the
sound), Ouïr (nonattentive listening to the context of the sound), Entendre (selective
appreciation of the sound itself and its qualities) and Comprendre (attribution of a mean-
ing to the sound) (Schaeffer 1966). Tuuri and Eerola (2012) appear to have left out Ouïr in
their initial assessment of the literature but build it into their revised taxonomy.
11. Pierre Schaeffer’s musique concrète presents an interesting test in this context. Starting
with a type of reduced listening (écoute réduite), Schaeffer and those who followed him
attempt to find music in sounds dissociated from their sources in traditional musical
instrumentation and representation or abstraction in scores. On our terms, then, Schaeffer
and fellow electroacoustic practitioners have acquired and refined skills that allow music
to emerge from an environment of sound that is not restricted to the tradition of “serious”
music. If there is a difference between our view and theirs, it is in the volitional reduction
or bracketing of sounds from their source which is part of the skill set of electroacoustic
musicians. On our view, music emerges from an embodied engagement with the sound
environment. These intuitions, inspired by Simon Emmerson’s comments, deserve further
study elsewhere.
12. This is how it might be possible to speak about an unconscious in music but not a conscious-
ness. We tend to think of consciousness as some one thing, whatever it might be, whereas the
unconscious emerges from relations between elements, positively and negatively charged
cathexes, just as music might be said to emerge from patterns that turn up as melody, har-
mony, and rhythm in the notes played and sung without emerging as something we can
specify (see Freud 1953b). We make no claims here for an unconscious in music. Our aim is
rather to suggest an analogy between the listening of the analyst in the psychoanalytic
setting and the listening of the composer, performer, and auditor in the case of music.
13. See, for example, the interview by Don Kaplan, “Navigating a Single Note: The Kronos
Quartet’s David Harrington” at www.learningmusician.com/features/0107/DavidHarrington
(2007); “Interview with David Harrington of Kronos Quartet” at www.youtube.com/
watch?v=hxoF0wMb0Jc (July 2013); and “Spotlight on . . . David Harrington (Kronos
Quartet)” at www.youtube.com/watch?v=ibGTx4CY1VA (2013). Accessed October 5, 2017.
14. For the 2014 meeting of the American Society of Aesthetics in San Antonio, Texas, the
chamber ensemble SOLI performed and discussed their collaborations with living com-
posers. Working with Robert Xavier Rodríguez on Música, por un tiempo, the musicians
came to recommend that one of the movements be played at a tempo slightly modified
from how it was scored. Rodríguez agreed, and in that modified tempo performers
and composer together “found” the music in that score. We would say that composer and
performers, bending their ears to the environment of sound originally scored, engaged
what was afforded in that environment and contributed, through their recommendation,
to the emergence of the music in that environment by enacting it in a performance of that
music. Examples of music coming together in this way are the norm and not the exception
in the course of making music.
15. As featured in the film A Late Quartet (Yaron Silberman, US, 2012).
16. The relation between the scriptor and the receiver in music should be compared with what
Barthes elsewhere calls a “writerly” and “readerly” text (Barthes 1977).
17. The exception would be the swing bands of the 1940s headed by Count Basie, the Dorsey
brothers, Duke Ellington, Stan Kenton, and others.
18. The Music of Strangers (Morgan Neville, US, 2015) documents Yo-Yo Ma’s Silk Road
Project—a collective of musicians displaced by crises and brought together by a conviction
that music can make a difference in the world—with images of performers whose seemingly
noninstrumental flourishes are no doubt crucial to enacting or achieving the music they
References
Barthes, R. 1977. From Work to Text. In Image/Music/Text, translated by S. Heath, 155–164.
New York: Noonday.
Barthes, R. 1985a. Listening. In The Responsibility of Forms, translated by R. Howard, 245–260.
Berkeley: University of California Press.
Barthes, R. 1985b. Musica Practica. In The Responsibility of Forms, translated by R. Howard,
261–266. Berkeley: University of California Press.
Barthes, R. 1985c. The Grain of the Voice. In The Responsibility of Forms, translated by
R. Howard, 267–277. Berkeley: University of California Press.
Chion, M. 1983. Guide des objets sonores: Pierre Schaeffer et la recherche musicale. Paris: Buchet
Chastel.
Chion, M. 1990. Audio-Vision: Sound on Screen. New York: Columbia University Press.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. Oxford: Oxford University Press.
Eldridge, R. 2003. Hegel on Music. In Hegel and the Arts, edited by S. Houlgate, 119–145.
Evanston, IL: Northwestern University Press.
Freud, S. 1953a. Beyond the Pleasure Principle. In The Standard Edition of the Complete
Psychological Works of Sigmund Freud, Vol. 18, translated by J. Strachey, 1–64. London: Hogarth.
Freud, S. 1953b. The Unconscious. In The Standard Edition of the Complete Psychological Works
of Sigmund Freud, Vol. 14, translated by J. Strachey, 159–215. London: Hogarth.
Freud, S. 1963. Recommendations for Physicians on the Psychoanalytic Method of Treatment.
In Therapy and Technique, translated by Joan Riviere, edited by P. Rieff, 117–126. New York:
Collier Books.
Gibson, J. 1979. The Ecological Approach to Visual Perception. Hillsdale, NJ: Erlbaum.
Goehr, L. 2007. The Imaginary Museum of Musical Works: An Essay in the Philosophy of Music.
Oxford: Oxford University Press.
Affordances in Real, Virtual, and Imaginary Musical Performance
Marc Duby
Introduction
A growing body of literature (Clarke 2005; Barrett 2011, 2014; Krueger 2011, 2014;
Windsor 2011; Windsor and de Bezenac 2012) seeks to understand musical performance
Musical imagery has often been viewed and considered as the ability to hear or
recreate sounds in the mind even when no audible sounds are present. However,
imagery as used by musicians involves not only the melodic and temporal contours
of music but also a sense of the physical movements required to perform the music,
a “view” of the score, instrument, or the space in which they are performing, and a
“feel” of the emotions and sensations a musician wishes to express in performance
as well as those experienced during an actual performance. (2011, 352)
actions as musicians use them in real or imagined performance, attending in the first
place to the physical movements that bring music (understood tout court as “organized
sound”) into being.
Learning to play an instrument necessitates a prolonged period of acquaintanceship
to acquire technical proficiency: for instance, to learn the gradations of pressure to apply
to a violin bow to produce a particular sound quality (Cumming 2000). Through physical
engagement with the task, long-term changes in brain plasticity ensue (Schlaug 2015),
facilitating and strengthening what Heft (2001) describes as “the mutuality between the
knower and the object known” (143). One way to approach this mutuality with regard to
musical instruments is to understand them as transducers (devices for converting one
form of energy into another). This is how Baily (1992) treats them:
So saying, Baily makes explicit the mechanisms whereby body movements are trans-
formed into audible patterns of organized sound through embodied engagements. In
broad sympathy with the fundamental principle that fingers and voices move air to
bring sound into being, it is also feasible to consider musical instruments as tools1 and,
in this regard, the concept of affordances provides an opportunity for understanding
musical instruments as real and imaginary tools and a starting point for exploring the
various configurations of human–musical instrument interfaces that this concept might
illuminate.
Such interfaces range in possibilities along a spectrum from directly embodied (as
in cases of a musician generating sounds in the moment) to more or less disembodied,
as in the case of the air guitar and its related cousin, the virtual air guitar (Karjalainen
et al. 2006). Virtual instruments, in which the performer’s actions generate MIDI data
from keyboards, drum controllers, or other sources (wind controllers, guitar synthesizers,
and so on) form a middle ground by introducing a degree of arbitrariness between
actions and outcomes. Such raw data does not by itself specify the intended sound
because the receiving device (generally a digital computer) treats the incoming binary
data as “pure” information, and not sound. The performer (or composer/arranger) is
then free to assign appropriate virtual instruments to translate these data into sound.
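To make this arbitrariness concrete, here is a minimal, illustrative sketch in Python (assumed values throughout; it is not drawn from any particular MIDI implementation or from Karjalainen et al.): the same three-byte note-on message fixes only pitch and velocity, and two hypothetical “virtual instruments” render it as entirely different sounds.

```python
import math

# A raw MIDI note-on message is just three bytes: status, note number, velocity.
# Nothing in these numbers specifies a timbre; that choice is left to whatever
# virtual instrument the data happen to be routed to. (Illustrative values only.)
note_on = bytes([0x90, 60, 100])        # channel 1 note-on, middle C, velocity 100

status, note, velocity = note_on
frequency = 440.0 * 2 ** ((note - 69) / 12)   # equal-tempered pitch for note 60
amplitude = velocity / 127.0

# Two hypothetical "virtual instruments": the same data, two different sounds.
def sine_voice(t):
    return amplitude * math.sin(2 * math.pi * frequency * t)

def square_voice(t):
    return amplitude * (1.0 if math.sin(2 * math.pi * frequency * t) >= 0 else -1.0)

t = 0.001                               # one millisecond into the note
print(round(sine_voice(t), 3), round(square_voice(t), 3))
```

Nothing in the data stream itself determines the resulting sound; that assignment is made downstream, in software.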
Through the Gibsonian idea of active touch (1962), musical instruments can be
understood as a specialized subset of tools with the potential to provide agents with real
and imaginary affordances (environmental opportunities for feedback-directed action
and learning). On this view, musical instruments afford playability just as different
surfaces variously afford climbability, stability, or concealment for different creatures.
So understood, as tools whose configurations have changed over time—whether real,
virtual, or entirely nonexistent (instruments of human construction, computer-based,
The form of our mind is shaped by our handedness. The kind of mind we
exemplify is influenced by our possession of hands.
—McGinn (2015, 67, original emphasis)
Stanley Kubrick’s celebrated film 2001: A Space Odyssey (1968) opens with a scene from
prehistoric times known as “The Dawn of Man.”2 Shortly after a mysterious black mon-
olith of otherworldly origin is discovered by a group of apes on the African savannah,
Moonwatcher, the leader of the troop, grasps in a momentous instant the creative—and
destructive—potential of the bone he has to hand by using it to smash to pieces the skel-
eton from whence it came. For proto-man, from grasping the affordances of the animal
bone (as a weapon, a tool for waging war) to murdering the leader of the competing
troop is one small step, and as the conquering alpha male flings the weapon aloft in cele-
bration, the spinning bone morphs into a futuristic space station to the melodious
strains of The Blue Danube.
As Horn describes it (2015), “Kubrick’s segue from a triumphantly hurled tibia bone
weapon to a cylindrical Earth-orbiting satellite brilliantly encapsulated 4 million years
of technology into 10 seconds of film.” Through cross-fading the two events, Kubrick’s
imagery—by way of Arthur C. Clarke’s imagination—proclaims the direct links between
technology and a telescoped version of evolution.
After a combination of unforeseen factors such as prehistoric climate changes and
competition for scarce resources forced our ancestors to descend from the relative safety
of their arboreal environment (McGinn 2015), early man, as much prey as predator, was
forced by harsh and dangerous conditions to become a tool-maker. Harari (2014) notes
how, beginning approximately 2.5 million years ago, “evolutionary pressure brought
about an increasing concentration of nerves and finely tuned muscles in the palm and
fingers” (9). These evolutionary adaptations enabled humans to produce ever more
sophisticated tools, so that “the manufacture and use of tools are the criteria by which
archaeologists recognise ancient humans” (10).
Weapons such as arrows and spears enabled killing at a distance, so empowering the
hunters to attack larger prey and providing a means of self-defense against predators and
proto-human competitors. Anatomical developments such as larger brains, opposable
thumbs, and upright posture equipped our ancestors for the emergence of the new
world of the savannah, as Wallin (1991) argues:
The upright posture presented new perspectives. To see and to hear was given a new
context. The anterior limbs were made free for communicative gestures, for making
and for using tools and instruments, for combining and comparing objects which
earlier had not had any obvious relation to each other, to support the balance during
that very specific locomotion, the dance, which was released by intense sound
sequences. (493, emphasis added)
Dynamic touch is a subsystem of the haptic perceptual system, and refers to perceiving
properties of hand-held and hand-wielded objects. During the process of wielding,
one can be aware of a variety of properties of the object being wielded such as length,
orientation, and heaviness. (751)
To remain with the example of tactile perception, imagine handling an object presently
out of sight. By turning it around in one’s hand and feeling its surfaces, contours,
and edges, invariant properties of the object are revealed that specify its shape and
quite possibly its identity. The superiority of active over passive touch in shape
recognition is a robust experimental finding. (Heft 2001, 174)
The perception–action (PA) cycle (in which the notion of feedback—more precisely, degrees and types of
feedback—plays a vital role) provides a framework for comparing this experience with
those of virtual and nonexistent instruments, which also provide auditory feedback. In
the last two categories of instruments, tactile (haptic) feedback from the instrument is
unpredictable to a degree because of its mediation through MIDI (playing a MIDI guitar
may generate the sound of a saxophone, for argument’s sake) or absent altogether, as in
the case of air instruments.
To resolve the apparent category mistake4 implicit in terms such as “motor imagery”
and “auditory imagery,” one might consider the thought experiment of inverting such
terms so as to reframe “motor imagery” simply as imagined movement and, by corollary,
auditory imagery as imagined sound. Far from mere verbal sleight of hand, this exercise
restores two perspectives: first, that imagining movement may not require intermediary
representations5 to be re-implemented in action, and second that the disciplinary
procedure of considering perceptual systems in isolation necessarily overlooks their
multimodal integration in complex organisms.
Creatures actively engaged in environments, where parsimonious cognitive decisions
enable quick responses to systemic or local changes, make sense of the simultaneous multi-
sensory information available to them, whether vestibular, visual, auditory, haptic, or
taste and smell, as per Gibson’s list of perceptual systems. As this
information becomes available to the organism by way of affordances, it becomes mean-
ingful, and therefore it seems plausible to consider the hand as a special case of a perceptual
system. Handedness, as McGinn’s epigraph proclaims, remains key to shaping the human
mind and will continue to do so as humankind and technologies continue to co-evolve. On
this view, contemporary hand-held technologies (such as mobile phones and tablets) may
well spark a new cognitive revolution as fingers and thumbs adapt to these new interfaces.
The increasing sophistication of digital computers has facilitated a wide range of pos-
sibilities for musical simulations. In this section, the discussion focuses chiefly on
two aspects: the use of computing technology to simulate the behavior of analog
equipment (such as Hammond organs, Fender-Rhodes electric pianos, and guitar
amplifiers, to name a few) and its potential to recreate the soundscapes of iconic
recording studios. For a fee, the home studio enthusiast may purchase access to the
acoustic fingerprint of celebrated environments such as Abbey Road, including virtual
simulations of tape technology without the expense, weight, and inconvenience of the
original analog equipment.
The Fender Rhodes electric piano is a weighty beast whose signature sound has
graced many a recording, and it is no doubt convenient for home studios to have access
to this fingerprint without its less convenient aspects: likewise, the Hammond organ,
now in a virtual digital version where the recordist has access to its wide range of sonic
possibilities. More modern virtual instruments such as the Korg Wavestation consist
of the original digital synthesis programs used in the hardware unit, so porting over
its sonic capabilities to a computer platform. This is less a simulation than a shift of
environment from hardware (integrated circuits capable of various forms of digital
synthesis) to software, which “reads off ” the synthesizer’s behavior using the same
digital parameters as the original unit.
The advent of digital technologies has rendered virtual the spaces of costly recording
studios and their equipment, so that it is possible (at least in theory) for the recordist to
recreate the sound of these environments at home. Companies that offer software for
such simulations claim that these are meticulous physical models of their original
analog counterparts and some comparisons by reviewers seem to indicate a close degree
of similarity between the original environments and their software models.
Emulating the behavior and sound characteristics of analog equipment using digital
technology raises the problem of translating the nonlinear response characteristics of
analog technologies into the binary language used by computers, pointing to different
(if not incompatible) ways of encoding information. “Analog” implies fluctuations in
voltage as produced by changes in attack and volume of an electric musical instrument,
for argument's sake, whereas digital technologies sample such voltage changes at discrete,
very small time increments, encoding each snapshot of the system state as a bit stream of
zeros and ones.
The differences between these technologies are perhaps best exemplified by how they
handle distortion. Vacuum tube technologies found in “retro” guitar amplifiers produce
nonlinear effects (harmonic distortion) when overdriven, as opposed to digital clipping
in the case of a computer, which compromises the integrity of the original signal.
In other words, unlike analog equipment, it is impossible to overdrive a computer in a
musically pleasing way6.
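To make the contrast concrete, the sketch below (a minimal illustration, not any manufacturer's actual model) compares naive hard clipping, which flattens the waveform at the digital ceiling, with a generic smooth saturation curve of the kind that analog stages—and their software emulations—exhibit; the tanh function is used here purely as a stand-in for a tube-style transfer curve.

```python
import numpy as np

def hard_clip(signal, ceiling=1.0):
    # Digital clipping: samples beyond the ceiling are simply flattened,
    # producing abrupt corners and harsh, inharmonic artifacts.
    return np.clip(signal, -ceiling, ceiling)

def soft_saturate(signal, drive=3.0):
    # A generic "tube-like" transfer curve (illustrative only): gain rises
    # smoothly and levels off, adding mostly low-order harmonics.
    return np.tanh(drive * signal) / np.tanh(drive)

# A 110 Hz sine wave pushed well past unity gain, as when overdriving an input stage.
t = np.linspace(0, 1, 48000, endpoint=False)
overdriven = 2.0 * np.sin(2 * np.pi * 110 * t)

clipped = hard_clip(overdriven)
saturated = soft_saturate(overdriven)
```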
The brave new world of the digital computer has certainly simplified the task of
precision editing. Compare, for argument’s sake, Pierre Schaeffer’s labor-intensive tasks
in producing his original musique concrète compositions to the relative ease with which
such tasks can be performed in the digital domain. As Schaeffer (2012) notes, the tech-
nological options of the time incorporated an element of risk insofar as their results
were unpredictable:
A movement of the bow responds with dignity to the composer’s notations, to the
conductor’s baton. But the effects of a turn of a handle on the gramophone, an
adjustment of the potentiometer, are unpredictable—or at least we can’t predict
them yet. And so we reel dizzily between fumbling manipulations and erratic effects,
going from the banal to the bizarre. (79)
Consider the exercise of playing a sound backward. Using the available technology of
Schaeffer’s time, this would entail cutting a length of magnetic tape, resplicing and
reversing it so that the information was read back to front by the playback head. To
replicate this effect in the digital domain, the recordist simply changes the order of the
sample data, so that the direction of the bit stream is reversed and the computer reads
the data from back to front to produce the desired effect. As Doornbusch and Shill
(2014) note, the concept of affordances also finds application in the field of digital audio
editing: “Having audio available in a digital form can be said to be an affordance to the
editing and manipulation of that audio” (27). The advent of the digital computer has
loosened the centuries-old bonds between sound production and its outcomes, so
paving the way for new methods such as granular synthesis, of which Opie writes (2015),
“If you want to do it the original Xenakis way you will need a reel to reel tape recorder, a
razor blade, sticky tape, and a lot of time.”
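As a concrete illustration of how trivial the reversal operation has become, here is a minimal Python sketch that reverses a mono 16-bit WAV file simply by reordering its sample data; the file names are hypothetical and the script makes no claim about any particular editor's implementation.

```python
import wave
import numpy as np

def reverse_wav(infile, outfile):
    """Reverse a mono 16-bit WAV file by reordering its sample data."""
    with wave.open(infile, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16)
    reversed_samples = samples[::-1]  # read the data back to front
    with wave.open(outfile, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(reversed_samples.tobytes())

# reverse_wav("bell.wav", "bell_reversed.wav")  # hypothetical file names
```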
Physical modeling as a synthesis technique attempts to recreate the sonic behavior of
musical instruments by simulating their physical characteristics. According to Hind
(2016), “With physical modeling it is the actual physics of the instrument and its playing
technique which are modelled by the computer.” From physical modeling, the designers
of the Virtual Air Guitar adopted the Karplus-Strong algorithm, “a computationally
efficient digital wave-guide algorithm for modeling the guitar string as a single-delay
loop filter structure with parametric control of the fundamental frequency and losses in
the filter loop” (Karjalainen et al. 2006, 965). Their aim was to create “a pleasant rock
guitar sound experience” (969) (complete with a simulated vacuum tube amplifier and
distortion) where the player has a degree of control over the outcome as opposed to the
“schizophonic”7 experience of popular video games such as Guitar Hero (Miller 2009;
Katz 2012), where the would-be guitarist has to deal with a controller interface, a plastic
instrument without strings that is modeled on a Fender Stratocaster.
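For readers unfamiliar with the technique, the following is a textbook-style sketch of the Karplus-Strong idea—a burst of noise circulating in a delay loop whose length sets the fundamental—rather than the Virtual Air Guitar's actual implementation, which adds parametric control of losses, amplifier simulation, and distortion.

```python
import numpy as np

def karplus_strong(frequency, duration, sample_rate=44100, damping=0.996):
    """Minimal plucked-string model: noise excites a single-delay loop filter."""
    delay = int(sample_rate / frequency)      # delay-line length sets the pitch
    line = np.random.uniform(-1, 1, delay)    # the "pluck": a burst of noise
    out = np.empty(int(sample_rate * duration))
    for i in range(len(out)):
        out[i] = line[i % delay]
        # Loss filter: the damped average of two adjacent samples is fed back
        # into the delay line, so the tone decays and mellows over time.
        line[i % delay] = damping * 0.5 * (line[i % delay] + line[(i + 1) % delay])
    return out

# pluck = karplus_strong(196.0, 2.0)  # roughly the G string of a guitar, two seconds
```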
Miller (2009) extends Schafer’s original concept to what she terms “schizophonic per-
formance,” in which clear lines dividing live and recorded performance are blurred “by
combining the physical gestures of live musical performance with previously recorded
sound” (401). Her nuanced work raises vital questions regarding notions of authenticity
and identity, specifically that of the rock guitarist. She writes: “By giving players an immer-
sive gaming experience filled with rock-oriented cues—including the musical repertoire,
archetypal rock-star avatars, a responsive crowd, a guitar-shaped controller, and physical
performance cues—Harmonix [the designers of Guitar Hero] encouraged them to adopt a
rock-star identity.” In this respect, Katz (2012) notes how “digital music technologies—in
the form of video games and mobile phone applications—challenge traditional notions of
musicianship and amateurism” (460). Rather than framing such technologies as less than
In this shifting musical setting, rock guitar has assumed an almost “tradi-
tionalist” aura for many audiences and musicians, encased in a nostalgia for
past forms that in previous eras was reserved for more folk-based styles of
expression. At the same time, though, rock guitar has moved into more
hybridized contexts wherein the polarity between analog and digital, electric
and electronic musicianship, becomes the basis for creative fusion.
—Waksman (2003, 131)
What do the gestures of air guitar enthusiasts mean for theories of embodied cognition?
According to Godøy (2006), it is not uncommon
to see people making sound-producing gestures such as playing air drums, air guitar,
or air piano when listening to music. Our observation studies of people with different
levels of expertise, ranging from novices with no musical training to professional
musicians, playing air piano, seem to suggest that associations of sound with sound-
producing gestures is common and also quite robust even for novices. (155)
What Godøy proposes is a direct link between sound and gesture that applies equally to
all musically inclined participants; that is, independently of the expertise required to be
a professional musician. By connecting instrumental sounds to the imaginary actions
required to bring them to life, Godøy provides a lens through which to examine the
nature of cognition as embodied action. In the case of the air guitar, this is a lens that
refracts light, whose resulting picture bends reality and challenges comfortable
assumptions about human musicking. At first glance, it might seem patently absurd to
spend time discussing such an extreme case as that of the air guitar, a phenomenon
one might want easily to dismiss rather like the notorious case of the pop duo Milli
Vanilli, whom the media unmasked as charlatans after it was discovered that they had
employed session singers on their hit record and mimed onstage to a prerecorded
backtrack.8 Popular and critical outrage at their inauthentic performance tactics
prompted the withdrawal of their awards, and subsequently their career took a disas-
trous turn into obscurity.
Godøy (2004) elsewhere argues for profound links between sound and gesture
maintaining that,
Air Guitar is all about surrendering to the music without having an actual instrument.
Anyone can taste rock stardom by playing the Air Guitar. No equipment is needed, and
there is no requirement for any specific place or special skills. In Air Guitar playing all
people are equal regardless of race, gender, age, social status or sexual orientation.
The procedure is for contestants to submit a one-minute audio clip for miming to, to be
played over a “big sound system,” with the jury criteria listed as: “Originality, the ability
to be taken over by the music, stage presence, technical merit, artistic impression and
airness.” In the unlikely case of mystification, the last criterion (“airness”) is defined in
the rules of the US Air Guitar Championships (http://usairguitar.com/rules2/) as “the
extent to which a performance transcends the imitation of a real guitar and becomes an
art form in and of itself.” Hutchinson (2016) understands airness as “a term meant to
pinpoint some of those ineffable qualities that transform a competent performance into
a truly great one” (416). The notion of congruency in pantomime also comes into play in
defining the criteria for technical merit. As the site claims, “You don’t have to know what
notes you’re playing, but the more your invisible fretwork corresponds to the music
that’s playing, the better the performance.”
The air guitar phenomenon has inspired a number of websites that specialize in the
sales and marketing of these invisible instruments, with Dimitri’s Air Guitars in Sydney,
Australia, a front-runner (http://www.air-guitars.net/home/about.html). Here it is possible
to order not only electric, acoustic, and bass air guitars, but also useful—if not essential—
accessories such as air strings and plectrums. Learning the right moves for the budding
air guitar practitioner entails imitation, defined by Buccino and colleagues (2004) as
“the capacity of individuals to learn to do an action from seeing it done. Imitation implies
learning and requires a transformation of a seen action into an ideally identical motor
action done by the observer” (323, original emphases).
Describing his learning process, Heine van der Walt (aka Lord Wolmer, the South African
air guitar champion in the late 2000s) recalls drawing his original inspiration from
watching VHS tapes of bands like Iron Maiden and Megadeth: "My elder brothers were
in high school and they would watch these tapes, and there’s me, about 6 years old,
staring. As I watched the bands I thought ‘Man, that must be the best job in the world’ ”
(AuntyNexus 2014). With the credentials of a professional touring guitarist, van der
Walt might be thought to have possessed an unfair advantage over so-called nonmusicians.
Regarding his time in the contest, he notes that the number of professional musicians
taking part increased from around 30 to 50 percent.
Table 5.1 consists of a summary of the three categories of instruments in this chapter.
It forms a matrix with porous boundaries as opposed to discrete categories, and the
distinctions between the categories are understood as fluid and dynamic. In this
light, it is perhaps best understood as encompassing degrees of hybridity, because
music fields such as jazz, Western art music, karaoke, DJ’ing, videogames, and even
imaginary metal (as in the repertoire of my informant), all avail themselves of evolving
technologies of performance and improvements in interfaces (in short, affordances)
to bring musical sounds to life. Moreover, the porosity of these genres allows for
Table 5.1 Comparing Performance with Three Different Interfaces (Real, Virtual, and Air)
Column headings: Real (live performance); Virtual (live or recorded); Air (recorded)
As opposed to the official feast, one might say that the carnival celebrated tempo-
rary liberation from the prevailing truth and from the established order; it marked
the suspension of all hierarchical rank, privileges, norms, and prohibitions. Carnival
was the true feast of time, the feast of becoming, change, and renewal. (45)
Miller (2009) astutely connects the ludic aspect of guitar-oriented videogames to self-
satirizing genres such as pantomime, high camp, and kitsch, stating that, in the case of
Rock Band, “players not only choose the gender, body type, clothes, and instruments
for their avatars but also must select a physical performance style, choosing from
rock, punk, metal, and goth ‘attitudes’ that govern the avatar’s physical mannerisms,
stance, and affect” (421). As a result of the disconnect between performance and outcomes,
Miller contends, “The games invite players to make a spectacle of themselves” (421).
Yes, indeed.
If we accept this premise as the fundamental basis for cognition, then the motor actions
of professional musicians involve the acquisition over time of an exacting level of precise
control as required for the execution of highly complex music at virtuoso level. Consider
in this regard the actual changes in wiring in the corpus callosum (responsible for
coordination between left and right “sides” of the body: see, for instance, Sacks 2011;
Koelsch 2012) in professional musicians, such that Sacks claims that musicians’ brains
are distinguishable at the physical level (i.e., dissection) from those of other occupations.
The corpus callosum rewires itself exactly because it is bound up in precise control
and coordination exemplified by the performance of music at the outer limits of human
possibility by virtue of its technical difficulty, its inherent structure (Satie’s Vexations,
with its 840 repetitions lasting between eight and 24 hours to perform11), or its demands
for synchronization, as in music by Pierre Boulez that the author witnessed in performance,
in which not the least of the technical demands was to play such difficult music in time with
the rest of the ensemble.
If, as Sheets-Johnstone proposes, consciousness of self and others springs from our
own animation in the gradual process of individuation, then it is also true, as she avers,
that conventional cognitive science has largely ignored the realm of the sensorimotor.
The fact is that musicians learn not only how to move but also how to limit movement
for economy’s sake (so conserving energy) as well as deploying off-line resources such as
mental rehearsal to enhance performance in the case of athletes, dancers, musicians,
and so on. For Cook (1992), “[b]eing able to play the piano is a matter not so much of
mastering the actions required in performance as of knowing how to organize them into
a coherent motor sequence" (75). Beilock and Lyons (2015) ground their discussion of
expert performance in experts’ greater propensities for off-line preparation, a kind of
motor visualization of imagined actions/procedures supported by clinical evidence.
Between player and instruments exists a reciprocity through which each is gradually
transformed; this is how constant friction wears the original varnish down to bare
wood so that instruments acquire a patina over time, and how, in turn, such instruments
become priceless. Much more than mere tools, instruments provide avenues for
self-expression, real and imaginative affordances for creativity, and may contribute to
the enactment of a temporary sense of community among participants. As Peretz (2006)
states it, a key purpose of music and dance is to “enhance cooperation and educate the
emotions and the senses. It is a form of communion whose adaptive function is to
generate greater sensory awareness and social cooperation” (24).
One of the founders of the field of artificial intelligence maintains that “mobility,
acute vision and the ability to carry out survival-related tasks in a dynamic environment
provide a necessary basis for the development of true intelligence” (Brooks 1991, 141).
His view aims to dispense with representations as intermediaries between body and
mind in favor of direct perception and action within a robotic environment. For Brooks,
a robotic agent displays intelligence by simply acting and has no need for an internal
map inside her silicon head. Windsor and de Bezenac (2012) point to the incompatibility
of notions like representations with an ecological approach, claiming that:
Ecological approaches do not sit well with discussions of imagery and representation,
however situated or embodied these discussions may be. Although our attempt to
stretch affordances to cover a wide range of behaviours may appear speculative in
some instances, we have intentionally chosen to avoid falling back on mental pro-
cesses and representations as an explanation of behaviour in order to test how well
the concept can be extended, and we would expect that such hypotheses should
attract further empirical as well as philosophical investigation. (116)
So, instead of giving in to such representational cravings, Chemero (2016) enjoins his readers to consider
the explanatory force of sensorimotor empathy as the "implicit, sometimes unintentional,
skilful perceptual and motor coordination with objects and other people" (138).
This concept seems well suited to understanding the affordances of musical performance
as real-time sociocultural phenomena, without necessarily invoking representations as
intermediaries in such circumstances.
Acknowledgments
This material is based on work supported financially by the National Research Foundation
of South Africa. Any opinion, findings, and conclusions or recommendations expressed in
this material are those of the author(s), and therefore the NRF does not accept any liability
thereto.
Notes
1. It is worthwhile to note how composers have tended to push against the limits and con-
ventions of musical creativity by harnessing new extended instrumental techniques in
performance, so extending concepts of what musical instruments can be employed to do.
2. “Synopsis for 2001: A Space Odyssey,” 2016. http://www.imdb.com/title/tt0062622/
synopsis?ref_=tt_stry_pl. Accessed July 25, 2016.
3. “Man is known by his artifacts. He is an artisan, an artificer, an employer of the arts, an
artist, and a creator of art. Beginning with tools and fire and speech, the ‘tripod of culture,’
he went on to making pictures and images, then to the exploitation of plants and animals,
then to the exchange of goods for money, and finally to the invention of writing”
(Gibson 1968, 27).
4. The category mistake comes from Gilbert Ryle, whose example is a visitor to Oxford
who is in search of "the University," mistaking this abstract term
for the agglomeration of buildings and the personnel who staff them. The contention is that terms
such as “motor imagery” are problematic because they conflate two different modalities
of perception.
5. As Decety and Stevens put it (2015), “Because motor representation inherently involves
aspects of both body and mind, it presents as the most obvious candidate for wedding this
dichotomy” (3) [that is, that between body and mind]. No such dichotomy exists in eco-
logical psychology, which, as noted, insists on the mutuality between agent and
environment.
6. Unison technology as employed by the California-based audio company Universal Audio
offers a solution to this problem by allowing the digital information of the computer to
change the impedance characteristics of the interface, so modeling the behavior of an ana-
log channel strip.
7. Miller (2009) defines “schizophonic” as “R. Murray Schafer’s term for the split between a
sound and its source, made possible by recording technology” (400).
8. In fact, this kind of technological sleight of hand is fairly routine within the music industry.
Consider auto-tuning software, for instance, which purports to correct bad intonation, or
the wide range of “sweetening” (audio processing such as compression, EQ, and reverb)
treatments employed routinely as production techniques.
9. Speaking of strings, Rufus Reid (1974) exhorts the apprentice jazz double bass player to
develop calluses to acquire a stylistically idiomatic pizzicato sound. These are constituted
over time through friction between skin, metal strings, and wooden fingerboard. The
instrument gives and takes over time: witness the devastating psychological effects for
professional musicians of being incapacitated and unable to play. Such strokes of cruel
misfortune sunder performers from their professional and artistic selves and steal from
them a vital raison d’être.
10. M. Duby, Skype Interview: Heine van Der Walt. Pretoria, South Africa, September 29, 2016.
11. According to Sweet (2013), “An Australian pianist named Peter Evans abandoned a 1970
solo performance after five hundred and ninety-five repetitions because he claimed he was
being overtaken by evil thoughts and noticed strange creatures emerging from the sheet
music. ‘People who play it do so at their own peril,’ he said afterward.”
References
AuntyNexus. 2014. Space Has Never Held Such Terror: Boargazm Interview.
http://metal4africa.com/interviews/space-has-never-held-such-terror-boargazm-interview/.
Accessed April 6, 2017.
Baily, J. 1992. Music Performance, Motor Structure, and Cognitive Models. In European
Studies in Ethnomusicology: Historical Developments and Recent Trends: Selected Papers
Presented at the VIIth European Seminar in Ethnomusicology, Berlin, October 1–6, 1990, edited
by M. P. Baumann, A. Simon, and U. Wegner, 142–158. Wilhelmshaven: F. Noetzel.
Bakhtin, M. 1998. Rabelais and His World (1940). In Literary Theory: An
Anthology, rev. ed., edited by J. Rivkin and M. Ryan, 45–51. Oxford: Blackwell.
Barrett, M. S. 2011. Troubling the Creative Imaginary: Some Possibilities of Ecological Thinking
for Music and Learning. In Musical Imaginations: Multidisciplinary Perspectives on Creativity,
Performance and Perception, edited by D. J. Hargreaves, D. Miell, and R. MacDonald, 45–66.
Oxford: Oxford Scholarship Online. doi:10.1093/acprof.
Barrett, M. S., ed. 2014. Collaborative Creative Thought and Practice in Music. SEMPRE Studies
in the Psychology of Music. Farnham, UK: Ashgate.
Beilock, S. L., and I. M. Lyons. 2015. Expertise and the Mental Simulation of Actions. In
Handbook of Imagination and Mental Simulation, edited by K. D. Markman, W. M. P. Klein,
and J. A. Suhr, 139–159. New York: Psychology Press.
Brooks, R. A. 1991. Intelligence without Representation. Artificial Intelligence 47: 139–159.
doi:10.1016/0004-3702(91)90053-M.
Buccino, G., S. Vogt, A. Ritzl, G. R. Fink, K. Zilles, H. J. Freund, et al. 2004. Neural Circuits
Underlying Imitation Learning of Hand Actions: An Event-Related fMRI Study. Neuron
42 (2): 323–334. doi:10.1016/S0896-6273(04)00181-3.
Chemero, A. 2016. Sensorimotor Empathy. Journal of Consciousness Studies 5: 138–152.
Clark, T., A. Williamon, and A. Aksentijevic. 2011. Musical Imagery and Imagination: The
Function, Measurement, and Application of Imagery Skills for Performance. In Musical
Imaginations: Multidisciplinary Perspectives on Creativity, Performance and Perception,
edited by D. J. Hargreaves, D. Miell, and R. MacDonald, 45–66. Oxford: Oxford Scholarship
Online. doi:10.1093/acprof.
Clarke, E. F. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. Oxford: Oxford University Press.
SYSTEMS AND TECHNOLOGIES
chapter 6
Systemic Abstractions
The Imaginary Regime
Martin Knakkergaard
Introduction
Interval
At the core of the organization of music is the interval. No matter how we look at music,
the interval holds a central position as the essential building block and reference. It is by
the presence and use of defined intervals that music is distinguishable from other forms
of expressive sonorous (art) forms such as sound art and speech. Sound art and speech
obviously contain intervals (it is impossible not to), but these intervals are, contrary
to music, not tied to and dependent on specific, rigorous codes, although the use of
pitch-differences in speech, just like variations in dynamics and tempo, carries a lot
of information—in many languages they actually alter the meaning of words and
sentences—and of course can obtain a decisive function in sound art as well. Any interval
is valid in sound art and speech; intervals—no matter whether they are rhythm or pitch
intervals—are tied to language, dialect, situation, age, and so forth, and are more or less
unique to the person speaking (see Yasar, volume 1, chapter 21). Using specific, regular
intervals will either turn speech to song or maybe signal that the person in question is
not quite well. With regard to rhythm intervals or beats, as Oliver Sacks states (quoting
A. D. Patel), “The perception of synchronization of beat, Patel feels, ‘is an aspect of
rhythm that appears to be unique to music . . . and cannot be explained as a by-product
of linguistic rhythm’ ” (2006, 243).
Regarding music, we think in intervals just as we hear in—and listen for—intervals,
and the concrete intervals are set against an immanent systematization that is given in
advance depending on culture, epoch, style, and genre. From early on, the brain is simply
trained to listen for and respond to patterns of intervals that fall within certain, definite
matrixes, typically referred to as tone systems. These systems are not universal but are
culturally specific, which implies that what, from the side of the receiver, is acknowledged
as a meaningful musical utterance is dependent on the sonorous pattern's reference to
the tone system, just as what can be imagined from the perspective of the sender is guided
by, and in a way confined by, proportions of the tone system. Again, whether we are referring
to pitch or time intervals is not important: in practice, they are both always present
as long as pitch paradoxically is also allowed to include nonpitched or inharmonic
sound such as noise and many drum sounds. In this chapter I will, however, concentrate
almost entirely on pitch intervals.
The modern organization of pitch is not just a system of intervals but a finite system of
tones or pitches of fixed frequencies, albeit the determination of the frequencies—the
tuning—is relative to whatever concert pitch is currently decided (today, typically
a = 440–444 Hz). In comparison, neumes, which were in use for centuries in the Middle
Ages, only indicate relative intervals,1 and often ornamental implications too, but carry
no information about pitch (or rhythm and duration for that matter); the neume virga,
for instance, indicates a tone that is higher than both or one of the surrounding tones,
whereas clivis indicates two tones where the first is the highest and porrectus indicates
the tone sequence high-low-high, and so on. Neumes were thus primarily a descriptive
and mnemotechnical tool, to support learning and performance of music, and their use was
completely dependent on the user’s imagination, experience, and acquaintance with the
musical practices and the tone system of the time. Similarly, pitches within the fixed
notation of Gamelan music today are only relative, as they vary “considerably from one
gamelan to the next, both in absolute pitch and in relative size of intervals” (Brinner 2017),
implying that instruments of whole ensembles are not necessarily “in tune” with one
another. In other words, whereas the intervals of Western music of today are dependent
on abstract, fixed proportions, this is not at all a universal—or an ahistorical—situation,
and even though the music theory of the Middle Ages was closely tied to a strict
systematization founded on Pythagorean principles and idealization, the performance of the
music relied on oral practices.
The important truths about music were to be found instead in its harmonious reflection
of number, which was ultimate reality. As a mere temporal manifestation, the
employment of this harmonious structure in actual pieces of music was of decidedly
secondary interest. (Mathiesen 2017)
And, in the Greeks' "new sensitivity to order and form" (Burkert 1962, quoted in
Sundberg 1980, 21, my translation), the numbers provided a way to escape materiality’s
grip, as numbers “are present in the things with a reality-constructing and determining
function” (Sundberg 1980, 21, my translation).
The Ancient Greeks’ use of numbers in the generation of the tone system, where the
octave is defined by the relation 2:1, the fifth as 3:2 and the fourth as 4:3, is a direct result
of their general favoring of the musica universalis (also referred to as the harmony of the
spheres) comprising the small integers 1-2-3-4. With these four numbers, they also
constructed the triangular tetractys that, besides constraining the above ratios, also comprises
10, the sum of the four numbers and the core of the decimal system. Even though the
fundamental ratios of the tone system were supported empirically by experiments with
proportional divisions of the string of a monochord, the Greeks abstained from
continuing the sequence of ratios further, which, for instance, based on an additional readout
from the monochord, would have produced the major third by the ratio 5:4 (see later).
They insisted on “explaining” the harmonic proportions from within the reach of the
tetractys because the number 4 was considered to be sacred as it was observable
everywhere: nature and matter were made up of the four elements, there were four seasons,
four directions, four ages, and so on, and these numbers were expected to be present
in all proportions, natural as well as mystical, that warranted cosmic correspondence,
balance, and coherence. The scientific axiom that is linked to this particular use of
numbers in relation to music is, in other words, rooted in a sort of mythical understanding of
unity, simplicity, and balance as a ruling principle (see Klempe 1991; Sundberg 1980;
Knakkergaard 2016a).
Diatonic Scaling
Even though there are practically unlimited ways to divide audible sound into separate
pitches (see later), the Western practice is initially to maintain the octave as an identity
or unison interval and to partition this octave into a scale of 7 discrete steps.
Figure 6.1 The construction of a (double) diatonic tone system with a slight chromatic element.
The scale
comprises 5 (whole) tones (T) and 2 semitones (S), often represented as this series
T-T-S-T-T-T-S, thus forming the scale we today refer to as "major." Originally, the
generative structure behind this scale is the tetrachord—literally "four strings"—and the
partition of the octave is rooted in the combination of five equally constructed groups of
four tones within the compass of a fourth, called diatonic tetrachords (Figure 6.1). To
the Greeks, these five tetrachords together formed a complete tone system, and again
“against this tetrachordial thinking the fourth interval stands out as «das Lieblingskind
der griechischen Theorie» (the lovechild of Greek theory)” (Handschin 1948, quoted in
Sundberg 1980, my translation).
The internal proportions of the tetrachord were constructed2 on the basis of the three
fundamental intervals whose ratios, as mentioned earlier, are contained within the
tetractys: the octave, the fifth, and the fourth. Setting the octave to 12, the octave (2:1) below
will be 6, the fourth (4:3) below will be 9, and the fifth (3:2) below will be 8, and the
procedure thus reveals the interval 9:8, namely, between the fourth and the fifth. The Greeks
called this interval “tonos” (≈ “tension”), which was “considered to be the fundamental
tone-step” (Sundberg 1980, 112, my translation) and the Pythagoreans used it to “fill out”
the gaps (downward) between the fundamental intervals—see d and c in tetrachord 1
and f and g in tetrachord 2 in Figure 6.1.3
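The arithmetic of the previous paragraph can be checked in a few lines; the snippet below (an illustrative aside, not part of the original argument) uses Python's fractions module to reproduce the 12-9-8 proportions and the resulting 9:8 tonos.

```python
from fractions import Fraction

fifth = Fraction(3, 2)
fourth = Fraction(4, 3)

# Set the octave to 12 units: the fourth below it falls on 9, the fifth on 8.
print(12 / fourth)     # 9
print(12 / fifth)      # 8
print(fifth / fourth)  # 9/8, the "tonos" lying between the fourth and the fifth
```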
The tetrachordal understanding of this early diatonic scale came to dominate in
Western music theory until the eleventh century, when a hexachordal understanding
gradually took over, along with the development of Guidonian notation and the
mnemotechnical system called the Guidonian hand. The use of the hexachord as a
particular unit was most of all a pragmatic move, not a theoretical one. Its primary aim was
to ease and support the learning of vocal music, and it formed an intelligent approach to
the adoption of the proportions of the diatonic system of the time by the introduction of
the fixed format T-T-S-T-T, vocalized as ut-re-mi-fa-sol-la. Furthermore, it made it easy
to distinguish between the three different versions, the natural, the hard (durum), and
the soft (molle), according to their note placement, respectively c, g, and f, thus allowing
for the use of both b and bb (see Figure 6.2).
The tone-step was maintained as a fundamental proportion, and so the introduction
of the hexachord did not weaken the diatonic scale's dominant position. Quite the
contrary. By replacing the much more ambiguous neumes, the innovation in practice not only
consolidated the scale but also indicated the transition from an informed descriptive
Figure 6.2 The possible hexachords within the tone system of the Middle Ages, the gamut.
Interface
Depending on the needs and ambitions at a given time, the development of musical
instruments and the generation of tone and notation systems have to interconnect in a
process of synthesis in which they can be refined dialectically. The innovations and
changes that vocal music went through following the breakthrough of the radical new
kind of polyphony that Ars Nova brought about in the fourteenth century gradually
transformed music from an almost entirely linearly organized art form into a musical practice
that included increased attention toward the vertical parameter as well. This development
initiated the emergence of instrumental music, as a separate trajectory, and eventually
also paved the way for the emergence of the triad, which later made up a central element
in the constitution of the “functional harmony” mentioned earlier.
The central premise for the triad to emerge was the inclusion of the third as a consonant
interval. The first traces of this process go back to the twelfth century, in which
theoreticians began to refer to the purely empirical observations, secundum auditum ("by the
ear”), that were achieved when practicing and performing vocal music that displayed
polyphonic implications (Hansen 1995, 58). This development eventually led to
reconsiderations regarding the relation between consonance and dissonance, which did not
simply imply the inclusion of the third as a consonant interval but also, in fact, formed a
break with Pythagorean theory because the third's numeric ratio fell "outside the range
of the tetractys, which by the Pythagoreans assumedly had to be given some
significance" (Sundberg 1980, 107). As they did not want to exceed the restriction of 4,4
they consequently conceived of the major third by moving four fifths up and two octaves
down: (3/2)^4 × (1/2)^2 = 81/64.
In Ramos de Pareja’s treatise Musica Practica from 1482, the interval of the third is, for
the first time, described by the ratio 5:4 (today known as the “pure major third”), and
some seventy-six years later Zarlino
was the first also to include in his concept of harmony triads consisting of 5ths and
3rds; this he was able to do because, besides perfect consonances, he defined imperfect
ones—the 3rds—by means of simple, “harmonic” numerical proportions (5:4 and 6:5)
rather than by the complicated Pythagorean proportions. (Dahlhaus 2017)
From the title of his treatise, it is evident that Ramos de Pareja was concerned with
music in practice: music in its performance. Thus, his theoretical effort was apparently
motivated by the need to overcome the divide between music in theory and music
in practice that the development of polyphony in particular had led to, or, rather, had
uncovered. Just as singers and choirs in practice had probably been singing secundum
auditum all along, ensuring that intervals other than octaves, fifths, and fourths sounded
in consonance, they probably had never had any problems in “transposing” or “shifting”
from one central pitch to another (cf., hexachord, earlier)—they simply maintained the
internal proportions of the scale by ear—this obviously was, however, not the case
regarding the musical instruments of the time. The instruments were laid out and tuned
according to the principles of the Pythagoreans, to whom “harmonia”—as already
implied—was a question of codifying the (diatonic) scale and “the relationship between
those notes that constituted the framework of the tonal system” (Dahlhaus 2017) and not
the theoretical or acoustic harmony of simultaneously sounding intervals. Consequently,
instruments with fixed steps like, for instance, the organs and the early clavicembalo of
the Middle Ages, were not capable of producing triads that sounded satisfactory. The
thirds and sixths of the Pythagorean scale
do not meet medieval and Renaissance criteria of consonance implied by such terms
as “perfection” and “unity.” When used as harmonic intervals these Pythagorean
3rds and 6ths are likely to be characterized, on an organ Diapason stop for example,
by rather prominent beats; middle C–E or C–A beat more than 16 times per second
at modern concert pitch. (Lindley 2017)
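Lindley's beat figure is easy to verify. In the sketch below—assuming middle C at roughly 261.6 Hz under modern concert pitch—the beating of a major third is estimated from the clash between the fifth harmonic of the lower note and the fourth harmonic of the upper note; the Pythagorean third of 81:64 beats at roughly 16 Hz, whereas the pure third of 5:4 does not beat at all.

```python
from fractions import Fraction

MIDDLE_C = 261.63  # approximate middle C at a' = 440 Hz

def major_third_beats(ratio, fundamental=MIDDLE_C):
    # Beating of a major third arises between the 5th harmonic of the lower
    # note and the 4th harmonic of the upper note.
    return abs(4 * fundamental * float(ratio) - 5 * fundamental)

print(major_third_beats(Fraction(81, 64)))  # roughly 16 Hz: rough, prominent beats
print(major_third_beats(Fraction(5, 4)))    # 0.0 Hz: the harmonics coincide
```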
By 1600, the development of the tone system had found its final form with the division
of the octave into twelve intervals or half-steps—however, the question of tuning was
not solved at this time (and remains still unsolved to a certain degree). Equal temperament,
which was suggested by Vincenzo Galilei in 1584, where the semitone is defined as the twelfth root of 2 (2^(1/12)), is
a compromise that is not fully musically satisfying. Just or harmonic tuning, which is
based on the ratios of small integers as suggested by Pareja and Zarlino, is more pleasing
to the ear, and is typically used by voices and strings in ensembles where the performers
adjust pitch with one another by ear (secundum auditum). However, just tuning only
works ideally for one scale at a time, hence the tuning of, for example, D major is not the
same as that of E major. Today, however, music software programs such as Apple’s Logic
Pro X and Steinberg’s Cubase include Hermode tuning, which is capable of adjusting
simultaneously sounding tones of electronic instruments in real time to accommodate
to just intonation without compromising equal temperament as the overarching tuning.
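Why just tuning cannot serve every scale at once is easy to demonstrate. In the minimal example below (which has nothing to do with Hermode tuning's actual algorithm), the note A is derived in two equally "just" ways above C—as a pure third above the fourth degree, and as a pure fifth above the second degree—and the two results disagree by the syntonic comma of 81:80.

```python
from fractions import Fraction

# A above C, reached as a pure major third (5/4) above the just fourth degree F (4/3).
a_via_f = Fraction(4, 3) * Fraction(5, 4)   # 5/3

# The "same" A, reached as a pure fifth (3/2) above the just second degree D (9/8).
a_via_d = Fraction(9, 8) * Fraction(3, 2)   # 27/16

print(a_via_f, a_via_d, a_via_d / a_via_f)  # 5/3  27/16  81/80 (the syntonic comma)
```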
It is, however, interesting to consider that, by taking one’s point of departure in the
diatonic scale’s combination of tones and semitones,
there is nothing that prevents the whole tone from being more than twice as large or
less than twice as high as the halftone. If a tone system’s minimum interval equals m,
then the system using the fewest tones, is the one whose semitone is m and whole
tone 2m. This gives in all 2 × m + 5 × 2m = 12m, hence 12 tones per octave.
(Hansen 2003, 1645, my translation)
However, if the half tone is set to 2m and the whole tone to 3m we get 19 tones within
the octave, which, compared with the 12-tone system, is slightly better or more precise in
terms of pure intervals.
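The two octave divisions mentioned in this passage follow from nothing more than the diatonic pattern of five whole tones and two semitones; the toy calculation below simply tabulates the result for the two choices of unit discussed.

```python
# Diatonic pattern T-T-S-T-T-T-S: 5 whole tones and 2 semitones per octave.
for semitone, whole_tone in [(1, 2), (2, 3)]:
    divisions = 2 * semitone + 5 * whole_tone
    print(f"semitone = {semitone}m, whole tone = {whole_tone}m -> {divisions} tones per octave")
# semitone = 1m, whole tone = 2m -> 12 tones per octave
# semitone = 2m, whole tone = 3m -> 19 tones per octave
```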
That it nevertheless was the 12-tone system that prevailed is probably due to
economic and technical performative advantages, and maybe to the fact that
Western European music preferred to live with too large major thirds (implying
small semitones and thus sharp leading notes) than major thirds that are smaller
than the pure. (Hansen 2003, 1645, my translation)
This 19-tone system has not been abandoned altogether—the 19-tone-guitar is for
instance currently available5—but, from this time on, the system of 12 intervals per
octave has been the absolute dominant standard. Even though many musical instruments
can be traced back much further—and very often to non-Western cultures and coun
tries—most of the ones that exist today have been refined and developed to reach their
current shape and level of perfection within the last 300 years in order to comply with
the 12-interval tone-system. The piano and its well-known keyboard layout is in many
ways the epitome of a modern musical instrument, being, as it is, the most commonly used
instrument for music teaching and demonstration because it provides a generally
objective interface whose design is easily understood and is capable of producing many
simultaneously sounding tones. And although its tuning typically is fixed, it is possible
to alter the tuning if it is the original, acoustic instrument. Such a practice was actually
very often the case in the seventeenth century where there were many candidates for the
new tuning system that the new fully chromatic tone system required, and it was not
Interaction
Until now, all references to musical instruments in this chapter have signified traditional
acoustic musical instruments. During the first half of the twentieth century, however, a
number of electrophones were introduced, such as the Hammond organ, the electric
guitar, and the first synthesizers, and some of these instruments came to change and
expand the concept of the interface and its implications. Among the synthesizers, the
Ondes Martenot and the Theremin—which are both monophonic or one-note-at-a-time
instruments—introduced new kinds of step-less interfaces (in the case of the Ondes
computers of the time, carried out experiments with digitally produced—and partly
generated—music and sound. The application of digital technology did not just imply
an expansion of available interfaces (digital technology’s physical interfaces that are
comparable with traditional music instruments in fact came somewhat later) but addi
tionally offered new ways to control and interact with musical sound as well as new
models for musical shaping. From the start, this was only carried out on a very limited
scale since it could only be done by using mainframe computers and more or less stan
dardized command line programming. But, following the introduction of MIDI, a
standard protocol for the digital control of musical events, in the beginning of the 1980s,
together with the rapidly growing propagation of microcomputers and the development
of “graphical programming” in the same decade, the digital interface became quite
ubiquitous and so did various kinds of interaction that exceeded the, strictly speaking,
very definite forms of interactions that are possible by means of traditional musical
instruments (acoustic as well as electric). Today, computers, digital audio, and MIDI—
in the form of a vast number of music and sound applications, of which many are
specifically aimed toward particular uses, interests, and music—together have become a
more or less dominating factor and reference, making the tone system, along with the
matching notation system, accessible from a plethora of digital sources.
One of the most baffling elements that this development has brought with it is the
unification of the three separate interdependent systems—or technologies—discussed so
far: the tone system, the notation system, and the interface technology, namely, the
instruments (see also Dyndahl, volume 1, chapter 10, and Danielsen, this volume, chapter 29).
Digital technology in the form of MIDI integrates the three systems in such a way that
they appear to be one coherent and indivisible system. In this alternative and, in a way,
nonphysical world, the tone of the system, the note of the score, and the key of the
keyboard have apparently become one and any trace of theory and abstraction has practically
been obscured by the manifest totality and parallelism of the digital virtualization (for a
more detailed discussion on some of the consequences of this, see Knakkergaard 2016b).
Although digital technology logically offers unlimited ways of organizing sound into separate
tone steps, for designing interfaces and for representing sonorous events graphically or
likewise, such opportunities have nevertheless only been developed to a modest degree,
and even though the technology epitomizes a situation where most physical borders can
be crossed at will, the twelve intervals of the octave and its protagonist, the diatonic
scale, are maintained. By means of digital technology it is, for instance, much easier than
ever before to work with different tunings. In addition to the possibility for the user to
define personal, unique tunings, the music application Logic Pro X, for instance, offers
97 “standard” tunings including scientific ones such as “1/4-comma meantone with
equal beating fifths” and “12-tone Pythagorean subset of JI 17-tone scale”; historical ones
like “Ramos de Pareja (Ramos de Pareia)—Monochord, Musica practica (1482)” and
“J. S. Bach ‘well temperament,’ acc. to Jacob Breetvelt’s Tuner”; and also exotic tunings
such as “Northern Indian Gamut, modern Hindustani Gamut out of 22 or more Shrutis”
and “Gamelan Udas Mas (approx) s6,p6,p7,s1,p2,s2,p2,p3,s3,p5,s5,p5.” Due to MIDI’s
limitations, there is, however, no way to avoid the twelve steps of the octave and the
notion of the standard keyboard layout because the otherwise abstract tone in MIDI is
understood as a key-number instead of a tone-number and as such is tied to the concrete
concept of a key which is struck.6 MIDI is organized via the metaphor of the standard
Western keyboard, plain and simple, and thus, in reality, controls and regulates the way
music is created and appreciated in a much more rigid and dominating sense than was
possible before its advent, eventually nourishing the notion that the systems behind are
ontologically given a priori.
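The point about MIDI's keyboard metaphor can be made concrete with the standard mapping from key number to frequency, together with the kind of per-key retuning that note 6 alludes to. The sketch below is only a schematic illustration: the cent offsets are approximate values for 1/4-comma meantone and are not taken from any particular application, and the approach has nothing to do with the dynamic adjustment performed by Hermode tuning.

```python
def midi_note_to_hz(note_number, concert_a=440.0):
    """Standard 12-tone equal-tempered mapping from MIDI key number to frequency."""
    return concert_a * 2 ** ((note_number - 69) / 12)

def retuned_hz(note_number, cent_offsets, concert_a=440.0):
    """Apply a fixed per-pitch-class offset in cents: one crude way of laying an
    alternative tuning over the fixed grid of twelve keys per octave."""
    offset = cent_offsets[note_number % 12]
    return midi_note_to_hz(note_number, concert_a) * 2 ** (offset / 1200)

# Approximate deviations (in cents) of 1/4-comma meantone from equal temperament,
# starting from C; illustrative values only.
MEANTONE_CENTS = [0.0, -24.0, -6.8, 10.3, -13.7, 3.4, -20.5, -3.4, -27.4, -10.3, 6.8, -17.1]

print(midi_note_to_hz(60))              # middle C in equal temperament, ~261.63 Hz
print(retuned_hz(64, MEANTONE_CENTS))   # the E above, lowered toward a purer major third
```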
Thus, unless we turn to sound art and sound installations, the development seems
only to have consolidated the dominance of the implied systems and the interval has not
at all been set free. Although practices that imply procedures such as glissandi, blue
notes, and similar alterations that diverge from the fixed intervals are still widely used,
the keyboard metaphor is not really suited for the "in-betweens," and the 12-note
segmentation of the octave—and the octave itself—in MIDI is not just a prerogative but an
unavoidable premise.
Composers, performers, critics, and music thinkers, especially in the field of
contemporary music, have every so often challenged this situation. The alternative protocol, ZIPI,
which was introduced in the 1990s, is maybe the best qualified and most versatile and
least esoteric example of this. But although the proposed standard was MIDI compatible,
and thus did not imply a complete break with current equipment and practices, it never
caught on and, to date, every attempt to establish an agenda that could seriously threaten the
concepts has failed. For the time being, it seems fair to claim that music is not only
organized by means of the system’s concepts and elements, but it is also imagined
through the same conceptual formats—taking sound production of music into
consideration confirms this.
Final Remarks
Music of today—and roughly speaking of the last 2,500 years—is not just influenced but
also determined by a particular kind of metaphysical thinking of the Ancient Greeks.
Although this thinking’s strong focus on the number four in reality lost its footing long
ago, it is still the major factor behind the idiosyncratic regime of possible tone-steps
within the range of audible sound and therefore the division of the octave—and nothing
seems capable of disturbing this regime seriously. The tone steps and their tuning might
appear to us as the squares or fields of a graph paper but, in reality, they build a format
that resembles the nodes of the lines of the grid and not the squares. Thus, there are many
possible nodes in-between the ones that are preprinted, and even though they are invisible,
these alternative nodes are very often articulated and exposed in musical performances.
They are, however, brought into play in relation to the preformatted nodes that, in this
way, function not just as a reference but also as theoretical and abstract final goals. There
is no doubt that this well-organized and continuously exposed universe of pitches, and
especially the various selections that make up certain identifiable “tonics”—pentatonic,
diatonic, and familiar, or unique modes—is essential as the core premise for musical
creativity in Western cultures, not least because the “tonics”—contrary to the thoroughly
chromatic as in the case of serialism—make way for sensations of invariant musical
signs (motives, figures) that are perceptually achieved through a kind of object
permanence even where there is talk of obvious breakdowns between the different expositions
of the sign (Kjeldsen 2004). As Kjeldsen points out, it is our perception that “delivers”
the notion of identity or equivalence, just as it makes us “experience” elements such
as tension and relaxation. Thus, we really cannot hear what we are hearing: the "tonicity"
is too strong and our brains are too adapted to the patterns of the diatonic scale. The
strength of our perception of the diatonic scale—or any scale with which we can become
familiar—also makes it possible for us to ignore the quality or character of the sound
source: it does not matter whether it is a piccolo flute or a bass synthesizer that plays a melody,
we can still recognize it, just as we can even when it is transposed. The diatonic scale is a
strong regime.
Historically, there are numerous examples of alternative proposals aimed at replacing
or supplementing the ruling tone system. Ferruccio Busoni’s essay Entwurf einer neuen
Ästhetik der Tonkunst, first published in 1907, is a good—and famous—example of such
a proposal in which he, among many other things, suggests an expansion of the octave
into eighteen steps by means of a sixth-note-division of the whole tone (Busoni [1916]
1973, 40) and further claims that music’s full blossom is hindered by the instruments,
that “their range, their tone, what they can render . . . are chained fast, and their hundred
chains must also bind the creative composer” (Ruscoll 1972, 33). Many composers and
musicians in the twentieth century challenged the dominance of the systems, some by
expanding it further similar to what Busoni dreamed of, others by working with non
pitched or weakly pitched sounds in the form of samples, as introduced by the composers
of musique concrète. However, even when working with isolated sound samples, the
organization of these may take the form of a “normal” piece of music, in composition as
well as in realization. This is, for instance, the case in a decidedly "outer-space" soundscape
composed by Eric Serra for a particular scene in Luc Besson's movie The Fifth Element.
In the scene, even though the odd sounds and many "picturesque" sound effects appear
somewhat chaotic and properly stereotyped as a space-soundscape, when examined more
closely they turn out to be neatly composed in accordance with not just a discrete steady
beat but also in accordance with a selection of pitches that evokes a sense of musical
mode with references to the diatonic scale (see Knakkergaard 2009, 294). Again, the
diatonic scale is a strong regime, and, in a way, it seems fair to claim that not just our
musical practices but also our understandings and imaginations of music are subject to
the discreet hegemony of diatonism. However, this diatonism, and the abstract entities
of the reductionist tone system as a whole, have nourished the development of a firm,
highly complex and advanced basis for musical creativity and imagination. These
frameworks, which today truly are numerically regulated, have provided the prerequisites that
secure the comprehensibility of highly complex sound structures and an overwhelming
amount of highly different musical genres and styles. So, maybe the Greeks were right
after all, not in their focus on the number four, but in their vision and imagination of the
order of musical sound regulation as a means to gain insight into some of the fundamental
principles of existence. By detaching the structure of the tone system from practice
and by making its entities abstract, the path is paved for a composite, ideal system whose
elements all are theoretically balanced. Such a strategy is not unique to Western cultures:
no matter where we look in time and place, there are always basic norms, scales, and
generative principles in play in the making of sonorous musical artifacts aimed at
ceremonial and religious and eventually epistemological purposes. The question remains,
however, how is it possible to overcome the limitations that the current principles entail;
how can their imaginative spell be broken?
Notes
1. From around 1150, Byzantine neumes did, however, indicate intervals but not pitches.
2. Or reconstructed theoretically, as the tetrachords were present empirically at the time.
3. The diatonic scale can also be produced by applying the fifth as “generator interval”:
F–c–g–d’–a’–e’’–b’’, just as the pentatonic scale can be produced by stacking fifths F–c–g–d’–a’
and the chromatic scale by proceeding from the diatonic: F–c–g–d’–a’–e’’–b’’–f#’’’–c#’’’’–
g#’’’’–d#’’’’’–a’’’’’. However, before the introduction of equal temperament—which in fact
replaces the fifth with the semitone as the generator interval—these intervals would, just
like the ones produced by means of the tetrachord, not “be in tune” when folded back into
the same octave.
4. The fact that the Pythagoreans defined the interval of the whole tone as 9:8 does not
corrupt this point, as they understood the 9 as a fifth plus a fifth and the 8 as a fifth plus a
fourth, this way maintaining the limits of 4.
5. See https://en.wikibooks.org/wiki/Guitar/Print_Version. Accessed March 2017.
6. By tuning every single key-number (tone) individually, it is possible to program MIDI in
such a way that it has more than 12 tones to the octave.
References
Attali, J. 2008. Noise and Politics. In Audio Culture: Readings in Modern Music, edited by
C. Cox and D. Warner, 7–9. New York: Continuum.
Brinner, B. 2017. Indonesia. §III: Central Java. 3. Instruments and ensembles. Grove Music
Online. Oxford Music Online. Oxford University Press. Accessed October 16, 2017.
Burkert, W. 1962. Weisheit und Wissenschaft: Studien zu Pythagoras, Philolaos und Platon.
Nürnberg: Hans Carl.
Busoni, F. (1916) 1973. Entwurf einer neuen Ästhetik der Tonkunst. Hamburg: Verlag der
Musikalienhandlung. Karl Dieter Wagner.
Dahlhaus, C. 2017. Harmony. Grove Music Online. Oxford Music Online. Oxford: Oxford
University Press.
Handschin, J. 1948. Der Toncharacter. Zürich: Atlantis.
Hansen, F. E. 1995. Middelalderen. In Gads Musikhistorie, edited by S. Sørensen and B. Marschner,
15–72. Copenhagen: G.E.C. Gad.
Hansen, F. E. 1999. Musik: Logisk konstruktion eller æstetisk udtryk? In Æstetik og logik,
edited by J. Holmgaard, 151–167. Aalborg, Denmark: Medusa.
From Rays to Ra
Music, Physics, and the Mind
Introduction
Surfing the net one day led us to the discovery of a fortuitous combination of articles.
The first, by Elizabeth Hellmuth Margulis, titled “One More Time” (Margulis 2014),
deals with the crucial role of repetition in musical experience. The second article,
“A New Physics Theory of Life,” describes the work of MIT’s Jeremy England (Wolchover
2014). England’s work focuses on the second law of thermodynamics, particularly on
how entropy can be defeated locally under certain physical conditions. The signifi-
cance of repetition in both articles led us to the thought that a line could be traced
from the one to the other. That is, if repetition is crucial to the emergence of life and
to the experiencing of music, could it be that a fundamental relationship underlies
both phenomena?
We will take a moment to examine the implications of England’s work. Entropy can
be regarded as a measure of the tendency of energy to disperse over time.1 We focus on
entropy of “open” systems. Within these systems, entropy can be kept low by increasing
the entropy of their surroundings. During photosynthesis, for example, a plant uses
sunlight to maintain its own internal order while increasing overall entropy in the uni-
verse (Wolchover 2014). Jeremy England’s mathematical formula shows that more
likely evolutionary outcomes involve atoms that absorb and dissipate more energy.
Significantly, “[p]articles tend to dissipate more energy when they resonate with a driv-
ing force, or move in the direction it is pushing them, and they are more likely to move
in that direction than any other at any given moment.” For example, “clumps of atoms
surrounded by a bath at some temperature, like the atmosphere or the ocean, should
tend over time to arrange themselves to resonate better and better with the sources of
mechanical, electromagnetic or chemical work in their environments” (Wolchover 2014).2
There are two mechanisms mentioned by England that can increase efficiency of energy
use and its subsequent dissipation. These are self-replication (in nonliving or living things)
and increasing structural organization. Self-replication increases energy use and dissi-
pation by copying an already efficient entity. Structural organization will only increase,
as indicated earlier, if it results in greater energy usage. Both mechanisms are found in
life forms, but are not limited to them.3 It seemed to us that resonance with an energy
source, self-replication (a form of repetition), and increasing structural organization
were all notions that pertain to sound in general and to music in particular, both as
physical and cultural productions. A few examples may suffice at this point. Since sound
is produced by waves, resonance works most obviously in areas where waves combine:
timbre, consonance, and synchrony of constituent elements. Repetition applies to rhythm,
but also to many other facets of sound, including the creation, recognition, and memory
of pitch patterns. Increasing structural organization can be found in the development of
sonic and musical creations over time.
In this chapter, we intend to trace the role of repetition from the atomic level to the
homeostasis (stable state of equilibrium) of life forms, to the formation of culture, and to
music. We will focus on the evolutionary advantage of music in homeostasis of indi-
viduals and cultures, both sub- and supraconsciously, delineating what we call a “homeo-
static frame of reference.” We will provide short examples in music, from lullabies
to Beethoven, before examining two longer examples presenting the Afrofuturist jazz
pioneer Sun Ra’s vision and methods of expanding listeners’ homeostatic frame of
reference through his music.
We speculate that music is not simply a cultural invention or an evolutionary trait,
but rather an outcome of elementary laws governing the disposition of matter.4 It is a
product of iteration or periodicity and the natural accumulation of complexity through
variation. Just as England implies that if you shine light on random atoms long enough,
they will tend to self-replicate and organize until, eventually, you will get a plant, we
propose that if you continue shining that light you will get music.
In this section, we will examine the key components of our conjecture: self-replication,
invariance, emergent structure, homeostasis, entrainment, swarm behavior, and neural
synchronization. Each of these components will be discussed in what follows. We use
the term “homeostatic frame of reference” to refer to any group of entities that collectively maintains homeostasis. Using this concept, it is possible to theorize a continuous
process of development from England’s observations about thermodynamics, through
work on the origins of life and the development of cells, on to theories of mind and
consciousness, and even the behavior of crowds, economies, and nations.
Self-Replication
Self-replication is a necessary mechanism for counteracting entropy. According to England,
Interest in the modeling of evolution long ago gave rise to a rich literature exploring
the consequences of self-replication for population dynamics and Darwinian compe-
tition. In such studies, the idea of Darwinian “fitness” is frequently invoked in com-
parisons among different self-replicators in a non-interacting population: the
replicators that interact with their environment in order to make copies of them-
selves fastest are more “fit” by definition because successive rounds of exponential
growth will ensure that they come to make up an arbitrarily large fraction of the
future population. (England 2013)
Note that the mathematical formulas to determine reproductive fitness will apply at
any level of structure. As England puts it, to examine self-replication, first the entity
being replicated must be identified. Whatever that entity may be, however, the probability
of its replication is determined in the same way:
“Self-replication” is only visible once an observer decides how to classify the “self ”
in the system: only once a coarse-graining scheme determines how many copies of
some object are present for each microstate can we talk in probabilistic terms about
the general tendency for that type of object to affect its own reproduction . . . Whatever
the scheme, however, the resulting stochastic population dynamics must obey the
same general relationship entwining heat, organization, and durability.
(England 2013)
Invariance
Invariance is the property of identity between entities. This identity could be of any sort.
For example, in the case of two different triangles, the area or the angles or some other
aspect could be invariant. Invariant behavioral properties of starlings cause flocking.
Invariant ratios between elements in sunflower heads lead to their arrangement in
spirals. In sound, invariance of wave forms connects instruments that may be of different
sizes or composition. In music, invariant intervals, contour, and rhythm (separately
or together) connect different motive forms. In a progression toward more complex
Emergent Structure
Circumstances that create a high enough probability of an entity sharing an invariant
feature with another entity tend to lead to a multiplicity of similar entities. These entities
then may give rise to an emergent structure—a new structure that comes to exist because
of the interactions of the initial individual entities. For emergent structure to arise, one
needs an instantiation of some entity, another entity that displays invariance with the
first one, and some property of the invariance between the items that allows for con-
nectedness. For example, life on earth is largely carbon-based due to the element’s ability to
chain together to create more complex combinations. Single-celled creatures contain
genes for cell adhesion molecules that allow them to attach to their environment but
also serve to connect them to each other and create the beginnings of multicellular
organisms (Neubauer 2011, 45–49). Musical meter may be viewed as an emergent prop-
erty of rhythms that coincide at regular time intervals.
Emergent structure may be found at any level of complexity. Each phase involves
similarly behaving entities that give rise to an emergent structure. This emergent struc-
ture then takes on its own life as an entity, interacting with similar entities, and in turn
gives rise to yet another layer of emergent properties. This simple pattern of events will
continue over time. Thus, particles form into atoms, atoms into molecules, molecules
into chemicals, chemicals into cells, cells into organs, and organs into organisms.
Swarm Behavior
Organisms give rise to yet another level of complexity when they form into groups. We
probably have all seen flocks of flying birds that appear to act with a single intelligence.
In fact, swarm behavior “emerges naturally from simple rules of interaction between
neighboring members of a group” (Fisher 2009, 9).
One could view human social groups as a kind of swarm. In the words of Len Fisher,
author of The Perfect Swarm (2009):
The process by which simple rules produce complex patterns is called “self-
organization.” In nature it happens when atoms and molecules get together
spontaneously to form crystals and when crystals combine to form the intricate
patterns of seashells. It happens when wind blows across the sands of the desert to
produce the elaborate shapes of dunes. It happens in our own physical development
when individual cells get together to form structures such as a heart and a liver, and
patterns such as a face. It also happens when we get together to form the complex
social patterns of families, cities, and societies. (2)
Fisher further notes that animal and human swarms provide certain advantages to
individuals:
Swarm behavior becomes swarm intelligence when a group can use it to solve a
problem collectively, in a way that the individuals within the group cannot. Bees use
it to discover new nest sites. Ants use it to find the shortest route to a food source.
It also plays a key role, if often an unsuspected one, in many aspects of our own
society, from the workings of the Internet to the functioning of our cities. (10)
In this passage, we see how “swarm intelligence” is used to solve problems that deal
with homeostasis. For insects, finding a nest or food source contributes to a stable environ-
ment and regulation of energy intake and expenditure. In our view, this is a primary
function of musical activity. We propose that music, which can serve to bring individual
humans together into a sociocultural group or “swarm,” also aids in solving problems
required for homeostasis (the ultimate goal of which is to sustain life).
To delineate the role of the second law of thermodynamics as it pertains to life forms
we will generalize from the concept of swarm behavior to homeostatic frames of
reference.
For life that exists within a relatively narrow range of conditions, there must be a
strong tendency toward homeostasis, toward building an inner world that is buffered against fluctuations in the outer world. The changes described here are probably
not just accidental (although mutations supplied the raw material). Those groups
that came to dominance in each era evolved new ways to be independent of their
surroundings. (2011, 26)
The concept can be seen to give rise to various levels of complexity in which one might
consider the role of homeostasis. For example, if we consider biological functions, we
might derive frames of reference that expand from individual cells through organisms,
families, communities, societies, nations, and so forth. The idea could be applied on both
larger and smaller scales. We could conceivably start with atoms and build up greater
levels of complexity through the creation of molecules, chemicals, protein chains, and
Periodicity
Periodicity is the recurrence of invariant attributes. We focus here on temporal perio-
dicity, which involves recurrence of invariant temporal intervals. At the cellular level,
periodicity helps regulate basic bodily functions and controls the scanning of the
environment that leads to perception.
Periodicity is necessary in musical activity as well. The existence of a tone, for example,
requires periodicity of sound wave frequency. Temporal periodicity is also fundamental
in creating a basic pulse.
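The following sketch, offered only as an illustration, makes the first of these points concrete: a synthetic harmonic tone is generated and its period recovered by autocorrelation, a standard technique for detecting temporal recurrence in a signal. The sample rate, fundamental, and minimum-lag cutoff are arbitrary choices, not values implied by the chapter.

import numpy as np

sr = 44100                     # sample rate in Hz
f0 = 220.0                     # fundamental of the synthetic tone
t = np.arange(0, 0.1, 1.0 / sr)
# A periodic waveform: a fundamental plus two weaker harmonics.
x = (np.sin(2 * np.pi * f0 * t)
     + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
     + 0.25 * np.sin(2 * np.pi * 3 * f0 * t))

# A periodic signal correlates strongly with itself when shifted by whole periods.
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
lag = np.argmax(ac[20:]) + 20   # skip very short lags, then take the strongest peak
print("estimated period: %.4f s, estimated fundamental: %.1f Hz" % (lag / sr, sr / lag))

For the tone above, the strongest peak falls at a lag of about 200 samples, recovering a fundamental of roughly 220 Hz; without that periodicity there would be no stable peak and hence no perceived pitch.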
At the level of the individual, repetition in music lends itself to the creation of memory.
Repetition is crucial to simple identification of event sequences. In this process, the
presence of invariant features in consecutive spans of time causes neurons to fire in a
manner that aids the formation of memory. The neurobiologist Gerald Edelman has described the neuronal activity of sequencing events in the brain in comparable terms.
In a sense, then, brains are themselves “swarms,” with each neuron functioning both as
an individual and collectively according to simple rules. These rules include the activation
or inhibition of neuronal members of the “swarm” in order to optimize homeostatic
regulation.
Rodolfo R. Llinás (2002) compares neural activity to “some types of fireflies, which
synchronize their light flash activity and may illuminate trees in a blinking fashion like
Christmas tree lights. This effect of oscillating in phase so that scattered elements may
work together as one in an amplified fashion is known as resonance—and neurons do it
too” (12). This type of synchronization is also referred to as “entrainment.”
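One standard way of making such entrainment concrete is the Kuramoto model of coupled oscillators, in which every oscillator is pulled toward the average phase of the group; synchronizing fireflies and neurons are its textbook applications. The sketch below is purely illustrative, with arbitrarily chosen parameter values, and is not drawn from Llinás.

import numpy as np

rng = np.random.default_rng(0)
n, k, dt = 50, 4.0, 0.01                  # oscillators, coupling strength, time step
omega = rng.normal(2 * np.pi, 0.5, n)     # slightly spread natural frequencies (rad/s)
theta = rng.uniform(0, 2 * np.pi, n)      # random initial phases

def coherence(phases):
    """Magnitude of the mean phase vector: 0 = incoherent, 1 = fully entrained."""
    return np.abs(np.mean(np.exp(1j * phases)))

for _ in range(4000):
    # Kuramoto update: each oscillator drifts at its own rate but is pulled
    # toward the phases of the others.
    pull = np.mean(np.sin(theta[None, :] - theta[:, None]), axis=1)
    theta += dt * (omega + k * pull)

print("coherence after coupling: %.2f" % coherence(theta))

With the coupling strength well above the critical value, the phases lock and the printed coherence comes out close to 1; with the coupling set to zero it stays near the small value expected of independent oscillators.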
In human activity, it may be that entrainment serves to activate coordinated action. For
instance, work songs use metrical regularity to enable the exertion of multiple indi-
viduals to happen simultaneously. This basic function can be extended to include the idea
of groove. Margulis describes the way that groove enables human bonding:
Groove tends to make people feel as though they were “a part of the music,” pro-
viding further evidence for a link between the ability to successfully predict elements of
the musical structure and the kind of extended subjectivity that has been identified
as a hallmark of strong experiences of music. (112)
feeling that a piece is inevitable and right amounts to an appealing sense of someone else’s (the composer or performer) artistic act precisely matching our own sensibilities. It can be intoxicating to feel that a piece created by another person is fundamentally right. (113)
Live music, as compared with reading and no interaction, appears to improve the
wellbeing of young patients with cardiac and/or respiratory problems, and also to be
beneficial for their carers. It seems to be live music per se, and not the social com-
ponent of the musical interaction that attracts and distracts children, thereby helping
them to feel less in pain and more relaxed, and this seems to apply to the older children
in particular. (Longhi et al. 2015)
Ellen Dissanayake has done extensive research into the role of music in human infancy.
Music is an important mode through which parents interact and bond with their children,
leading to homeostatic benefits for both.
On the opposite end of the spectrum, music can also be used to improve the chances of
a society surviving while at war, either by using it to discourage or even to torture the enemy.
The thesis is also borne out well in tribal societies where, under the strict control
of the flourishing community, music is tightly structured, while in detribalized
areas the individual sings appallingly sentimental songs. Any ethnomusicologist
will confirm this. There can be little doubt then that music is an indicator of the age,
revealing, for those who know how to read its symptomatic messages, a means of
fixing social and even political events. For some time I have also believed that the
general acoustic environment of a society can be read as an indicator of social con-
ditions which produce it and may tell us much about the trending and evolution of
that society. (Schafer 1993, 7)
By examining the values that bind individuals into a society, we should be able to deter-
mine how these values contribute to homeostasis—how the sound environment leads to
sustenance of suitable living conditions.
state of the organism. Thus, organized sound itself (including music) contributes to
the evolution of intelligence.
It may be that composers propose new combinations of sound that then affect
swarm behavior. With his Ninth Symphony, Beethoven is attempting to affect human
history. At the moment when the baritone soloist and then the chorus announce “Alle
Menschen werden Brüder” (“all men become brothers”), the composer and poet com-
bine to expand the homeostatic frame of reference from the listener as individual to
the world as family.
describes. The primary concept here is the role that periodicity plays in creating and
enhancing adaptive systems. Neuronal phase synchronization creates a periodicity that
allows the stochastic system of potential neural connections to become the adaptive
system that is the mind. In other words, metaphors arise from the inherent tendency of
our cognitive apparatus to adapt for homeostatic regulation, and music’s periodic
character aids this process.
Sun Ra
In this section we will introduce musical examples that illustrate the principles stated
above. Our choice to examine the music and thought of Sun Ra (born Herman Poole
Blount, 1914–1993) stems from his explicit attempt to create a larger, better homeostatic
frame of reference. Many styles and genres of music overtly feature periodicity that
encourages physical entrainment and social connection: dance music of all sorts, work
songs, gospel-style hymns, marches, for instance. Minimal music, especially the early
work of Philip Glass and Steve Reich, brings attention both to entrainment and also
lack of it. However, Sun Ra’s work, at times, combines different periodicities in order
to create the emergent property of new types of entrainment, thus encouraging us to
expand our homeostatic frame of reference to a planetary scale.
In a sense, Sun Ra dedicated his career and music to imagining moving beyond the
society that held him back. Yet Sun Ra did not simply imagine what his universal utopia
would be like. He spoke and behaved as if it were reality. Sun Ra created his life and
music as a means of refusing to participate in the oppressive narrative of his time.
Sun Ra spoke of the earth as being out of tune with the universe. We take his comments
to mean that humans need to expand our homeostatic frame of reference to include a
vision of the earth as it exists in the vastness of space. Sun Ra’s thinking on this subject
seems akin to the message delivered in James Lovelock’s classic book on the environment:
We need to love and respect the Earth with the same intensity that we give to our
families and our tribe. It is not a political matter of them and us or some adversarial
affair with lawyers involved; our contract with the Earth is fundamental, for we are
a part of it and cannot survive without a healthy planet as our home. I wrote this
book when we were only just beginning to glimpse the true nature of our planet and
I wrote it as a story of discovery. If you are someone wanting to know for the first
time about the idea of Gaia, it is the story of a planet that is alive in the same way
that a gene is selfish. (Lovelock 2000, viii–ix)
Sun Ra considered his music to come from beyond Earth. For him, an envisioned music
of space would not unfold like that of Earth.
If the harmony is just what they teach you in schools, then it wouldn’t be any other
than what we’ve been hearing all along, but when the harmony’s moved the rest is
supposed to move and still fit, then you’ve got another message from another realm,
from somebody else. Superior beings definitely speak in other harmonic ways than
the earth way because they’re talking something different, and you have to have chord
against chord, melody against melody, and rhythm against rhythm; if you’ve got that,
you’re expressing something else. (Schaap 1989, cited in Szwed 1997, 128)
“Melody against melody, and rhythm against rhythm” is immediately apparent in the
composition “Space Is the Place” (Sun Ra 1973). This is perhaps Sun Ra’s best-known composition, having been performed on the television show Saturday Night Live in 1978. The bass ostinato is in 5/4 while the melody is in cut time (2/2), so their respective downbeats coincide only every five bars of cut time (or four bars of 5/4). Since the melody’s phrases move in four-bar units, the two patterns begin together only every twenty (or sixteen) bars (the melody is not identical
throughout, but the four-bar phrases continue). Adding to the different feels of the osti-
nato and melody is the fact that, although the quarter note moves at the same tempo in both, in the former the beat is the quarter note while in the latter it is the half note. In performance, percussion and other parts may add to the layers of metrical conflict. This use of
parallel streams of rhythms that articulate conflicting metric structures lures the listener
into a kind of entrainment: after multiple repetitions, the mind begins to coordinate the
5/4 bass ostinato with the cut time melody in a manner that creates a sense of flow. In
other words, the invariant quarter-note pulse, combined with the two conflicting metrical
streams, creates an emergent property of entrainment over longer spans of time—the
coordinated downbeats of the two meters. This emergent property, a new kind of entrain-
ment, reflects the lyrics’ encouragement to the audience to expand their consciousness.
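The arithmetic behind these realignments can be checked directly. Assuming, as stated above, that the quarter note is the pulse shared by both layers, the coincidence cycle is simply the least common multiple of the layers' lengths measured in quarter notes; the sketch below is only a restatement of that calculation.

from math import lcm

ostinato_bar = 5                  # quarter notes in one bar of the 5/4 bass ostinato
melody_bar = 4                    # quarter notes in one bar of the cut-time (2/2) melody
melody_phrase = 4 * melody_bar    # a four-bar melodic phrase

for label, span in [("melody bar", melody_bar), ("four-bar phrase", melody_phrase)]:
    cycle = lcm(ostinato_bar, span)
    print(f"{label}: realigns with the ostinato every {cycle} quarter notes "
          f"(= {cycle // melody_bar} bars of 2/2, or {cycle // ostinato_bar} bars of 5/4)")

Run as written, this reproduces the figures given above: the downbeats realign every twenty quarter notes (five bars of cut time, four of 5/4), and the four-bar phrases realign with the ostinato only every eighty quarter notes (twenty bars of cut time, sixteen of 5/4).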
Sometimes, “rhythm against rhythm” is found in the subtle shifting, variation, and expan-
sion of motives that confound metrical expectations, as in “Dance of the Language
Barrier,”7 which probably dates from the early 1980s.8 The titular “language barrier” refers
to the difficulty of understanding created by Sun Ra’s radical alteration of his materials,
which disguises the underlying motivic repetition, but it also refers to the problems of
communication between human beings. In terms of our discussion, the “language
barrier” represents the border between differing homeostatic frames of reference. The
“dance” aspect of the title suggests a pairing of the opposing sides, creating an expansion
of one’s social framework to include those who do not speak your language.
“Dance of the Language Barrier” is the musical realization of the difficulties of creating
entrainment between different homeostatic frames of reference. Sun Ra tried to create
a consciously challenging sense of entrainment, largely through more challenging rhyth-
mic ideas. Nevertheless, if he had thought that the language barrier was insurmountable,
he would not have written a piece of music to tackle, and, through physical and social
means, eliminate it.
Sun Ra is constantly changing the length of his motives in “Language Barrier.” While
many jazz phrases flow in units of two or four bars, there is no consistent phrase length
here. Most of the variations in motivic length in this tune are created through additions
or subtractions from the upbeat figures that begin every motive. Listeners certainly
will notice invariant elements in the phrases, especially after repeated hearings, but the
degree of variation seems much higher than in typical jazz tunes. The metrical clarity of
the tune is severely undermined by a high degree of syncopation, achieved by minimiz-
ing articulation of downbeats and emphasizing upbeats. The composition also contains
a very complicated pattern of accents. When jazz musicians are playing a tune, they
often talk about whether the “feel” of a note is “up” or “down.” “Up” notes occur as antici-
pations or delays of the beat and are played as accents, resulting in syncopation. “Down”
notes occur on the beat. In our own experience performing “Dance of the Language
Barrier,” we found that the written score does not adequately portray where to place ups
and downs. Only through listening to the tune as played by Michael Ray, former trum-
peter with the Arkestra, were we able to place the accents and shape the phrases the way
Sun Ra had taught them to his band. In a personal conversation with us in October 2000,
the woodwind player Marshall Allen, who worked with Sun Ra from the 1950s until the
latter’s death and now leads the Arkestra, discussed the difficulty of learning the accen-
tual patterns of Sun Ra’s music. Allen indicated that the main focus of rehearsals was
how to phrase the songs, how to achieve the right sound, tone, “vibration,” “voice,” or style,
how to “say” what Sun Ra wanted. This included when to slur notes, when to cut them
off, and when to “push the time forward or backwards,” “before the beat” or “after.”9
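A crude, purely illustrative proxy for the kind of "up" and "down" placement described here is simply the proportion of onsets that fall on offbeat eighth-note positions rather than on the quarter-note beats. The onset lists below are invented for the sake of the illustration; they are not a transcription of "Dance of the Language Barrier."

def offbeat_ratio(onsets_in_eighths):
    """Fraction of onsets on offbeat eighth-note positions (odd-numbered
    positions) rather than on the quarter-note beats (even-numbered positions)."""
    offbeats = sum(1 for onset in onsets_in_eighths if onset % 2 == 1)
    return offbeats / len(onsets_in_eighths)

# Hypothetical two-bar patterns (positions counted in eighth notes from the start):
on_the_beat = [0, 2, 4, 6, 8, 10, 12, 14]    # a square, on-the-beat phrase
anticipated = [0, 3, 5, 7, 9, 11, 14, 15]    # most attacks pushed off the beat

print(offbeat_ratio(on_the_beat))   # 0.0
print(offbeat_ratio(anticipated))   # 0.75

The on-the-beat pattern scores 0.0 and the anticipated one 0.75, reflecting in miniature the emphasis on upbeats and suppression of downbeats described above.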
Part of the difficulty in detecting patterning in “Dance of the Language Barrier” comes
from the relatively constant stream of fast note values in the long, irregular phrases,
punctuated by longer note values that seem to be irregularly placed. Finally, the harmonic
progressions of this tune do not reinforce a sense of regular phrase structure. In fact,
apparently there was no single progression for the tune. Sun Ra was notorious for
reharmonizing his compositions at each performance. Since he was the keyboard player,
many of his more complicated written scores do not contain chord symbols—only he
knew what harmonies he would play, and the band would follow.
All of these factors combine to underline the “Language Barrier” of the piece’s title.
However, a crucial factor serves to encourage metrical entrainment: the drum part. In
the recording, there is a clearly discernible jazz swing beat, with cymbals adding emphasis.
Thus, a listener might have great difficulty humming the tune, for the reasons outlined
previously, but would still have no trouble tapping a foot to the performance. Thus, in a
sense, the language barrier is being erected by the melody and broken by the drums of
the Sun Ra Arkestra. Or perhaps the tune/harmony is one “language” and the very tradi-
tional swing drum part is another. In any case, one would need to bring more than
just passive listening. The kind of “dance” that would be performed to this music might
well require superhuman efforts—exactly what Sun Ra wanted. In our experience, if
one makes those efforts, one enters a new understanding. The fact that this music can be
played by a big band, which was designed to be a model of cooperation and coordination,
means that the “Language Barrier” is surmountable, and rewards listeners with a cosmic
mode of dance.
From Sun Ra’s writings and communications (see Sun Ra 2005; Szwed 1997), it seems
that he hoped audiences, in a sense, would create new thought patterns that would
resonate with the advanced patterns of his music, turning away from negative, divisive
ideas toward a more inclusive frame of reference. Sun Ra envisioned an active role for
every person creating or hearing his music. He felt that “all people vibrate” (Sun Ra 2005,
460), and all their individual frequencies were important. “Each person is music himself
and he’ll have to express what he is” (476). Echoing the terms used by Jeremy England
(discussed earlier) to describe the resonance of atoms with an energy source, Sun Ra
associates particular sound and musical frequencies with the notion of race: “each color
has its own vibration. My measurement of race is rate of vibration—beams, rays” (460).
So, in his view, individuals and races not only resonate with external energy sources,
but they are themselves energy sources. Through the vibrations of his music, Sun Ra
suggests, entrainment between the audience and the performers takes place, resulting
in a higher state of being. “The real aim of this music is to coordinate the minds of people
into an intelligent reach for a better world, and an intelligent approach to the living
future” (457). Achieving an emergent entrainment different from normal experience
would result in humans actually becoming different, more natural beings (ones who
do not injure or kill their brothers and sisters—to Sun Ra, a much better homeostatic
condition). “Space music is an introductory prelude to the sound of greater infinity. . . . It
is a different order of sounds synchronized to the different order of being. . . . It is of, for
and to the Attributes of the Natural Being of the universe” (457).
Music functions at two nonconscious levels. At the level of the body, music affects
hormonal and motor systems; at the level of society these hormonal and motor systems
contribute to coordinated behavior. Given this theory, one can conclude that music does
have basic universal functions and effects. However, history and evolution have led to
the creation of differing strategies for maintaining homeostasis in different “climates.”
From these differing strategies, basic value systems emerge and develop. Differences in musical thinking between cultures then arise from these differing value systems and the behaviors they shape.
Invariance and repetition allow for predictive coherence and create the possibility of
emotive hearing in individuals and hence coordinated actions between individuals. As
an example, mating strategies10 may be viewed as related to musical taste. A society that
favors fewer children and greater parental commitment may prefer music that differs
from that in a society that favors more children and privileges the act of mating. One
might speculate that the latter society will feature a more “dance-friendly” style of music
with sharper attacks, motor rhythms, and simpler melodic forms, while the former
might place more emphasis on complex melodic shape.
Whatever the features of a musical style may be, it is likely that their emergence was
shaped by resonance, in accordance with the processes described earlier. Resonance,
according to our view, encompasses both universal laws of physics and cultural dif-
ferences, as indicated by the work of Edward W. Large (2011). He finds, for example, that:
Tonality is a universal feature of music, found in virtually every culture, but tonal
“languages” vary across cultures with learning. Here, a model auditory system, based
on knowledge of auditory organization and general neurodynamic principles, was
described and studied. The model provides a direct link to neurophysiology and,
while simplified compared to the organization and dynamics of the real auditory
system, it makes realistic predictions. . . . Analysis of the model suggests that certain
musical universals may arise from intrinsic neurodynamic properties (i.e., nonlinear
resonance). Moreover, preliminary results of learning studies suggest that different
tonal languages may be learned in such a network through passive exposure to music.
In other words, this neurodynamic theory predicts the existence of a dynamical,
universal grammar . . . of music. (123)
If, instead of imagining an open system with particles in it, as England asks us to
do, we substitute the brain for the system and potential synaptic connections for the
particles, we can see how neural synchronization functions to create the emergent
property of memory and ideas. A neuronal group that “fires together” repeatedly could
be considered equivalent to a recurring grouping of particles. As they fire, these neurons
would be absorbing and dissipating heat more efficiently than neural connections that
are not subject to periodic activation by neural synchronization. The neurophysiologist
Pascal Fries claims:
Inputs that consistently arrive at moments of high input gain benefit from enhanced
effective connectivity. Thus, strong effective connectivity requires rhythmic synchroni-
zation within pre- and postsynaptic groups and coherence between them.
(2015, 220, emphasis ours)
The mind, then, is an emergent structure that arises as a result of the effect of periodicity
on the stochastic collection of possible neural connections.
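A toy simulation can make this concrete. In the sketch below, which is only an illustration and not a model of the mechanism Fries describes, connections between units that repeatedly fire in the same time steps are strengthened by a simple Hebbian rule, while connections involving a unit that fires just as often, but at uncorrelated moments, decay away.

import numpy as np

rng = np.random.default_rng(1)
steps, eta, decay = 1000, 0.01, 0.005   # time steps, learning rate, slow weight decay

# Units A and B fire together every 10 steps (periodic, "entrained");
# unit C fires just as often, but at random, uncorrelated times.
a = np.zeros(steps); a[::10] = 1
b = np.zeros(steps); b[::10] = 1
c = np.zeros(steps); c[rng.choice(steps, steps // 10, replace=False)] = 1
spikes = np.stack([a, b, c])

w = np.zeros((3, 3))
for t in range(steps):
    s = spikes[:, t]
    # Hebbian rule: strengthen a connection whenever both units fire in the same step.
    w += eta * np.outer(s, s) - decay * w
np.fill_diagonal(w, 0.0)

print(np.round(w, 3))

By the end of the run the A–B connection is markedly stronger (roughly an order of magnitude in typical runs) than any connection involving C: only periodically synchronized firing accumulates into a durable structure.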
Conclusions
So, based on the foundation laid out earlier, we feel that we have clarified the connection
between the phenomena of repetition in music described by Margulis and England’s
theoretical work on emergent systems. Using terminology developed by Grimshaw and
Garner (2015), we could say that music is an emergent perception that is part of an acoustic
ecology arising from the interaction of two different stochastic systems, the mind and
the universe. We can describe air pressure fluctuations in the auditory range as exosonus,
and the activation of neural networks caused by these sounds as endosonus (33).
Repetition in music causes the exosonic stimulus to be better suited to forming neural
networks and thus to become endosonic activity. We have proposed that this emergent
perception is facilitated by synchronized neural activity that creates a referential time
frame in which particular sequences of synaptic connections are more likely to recur
and thus are reinforced. The neural networks themselves then function as an emergent
system that gains efficiency in taking in and dissipating energy (in the form of neural
firings). The endosonic results can be recombined through memory activation to take
Light shines on a heat bath. Its resonance creates entities that are more efficient at
processing heat, some of which then self-replicate. Eventually, the invariant relation-
ships between the replicants allow them to combine into emergent structures. Emergent
structures coordinate the actions of living beings into swarm behavior, allowing them to
better control homeostasis. Periodicity, which was already present in the system in the
form of resonance-creating energy, then serves as a basis for communication between
emergent structures by means of entrainment. This periodicity also influences neurons
to form networks, allowing consciousness to arise. Music is then a means by which
entrainment may be used for communication, allowing us to form more efficient homeo-
static frames of reference. As we have seen, Sun Ra used music to teach the human
species to see ourselves as a single homeostatic frame of reference in order for the spe-
cies to continue to exist in the vastness of space.
Notes
1. The entropy in a “closed” or isolated system can be described as follows: “[Entropy] increases
as a simple matter of probability: There are more ways for energy to be spread out than for
it to be concentrated. . . . Eventually, the system arrives at a state of maximum entropy called
“thermodynamic equilibrium,” in which energy is uniformly distributed. A cup of coffee
and the room it sits in become the same temperature, for example. As long as the cup and the
room are left alone, this process is irreversible. The coffee never spontaneously heats up
again because the odds are overwhelmingly stacked against so much of the room’s energy
randomly concentrating in its atoms” (Wolchover 2014).
2. In other words, England argues that under certain conditions, matter will spontaneously
self-organize instead of becoming more disordered. This tendency could account for the
internal order of many inanimate structures and of living things as well (see also Chaisson
2001). “Snowflakes, sand dunes and turbulent vortices all have in common that they are
strikingly patterned structures that emerge in many-particle systems” (Wolchover 2014).
3. “The underlying principle driving the whole process is dissipation-driven adaptation of
matter. . . . You start with a random clump of atoms, and if you shine light on it for long
enough, it should not be so surprising that you get a plant” (Wolchover 2014).
4. These notions have been around for millennia in the form of philosophical and religious
conceptions such as the “music of the spheres” in Western thought, as well as in ancient
Asian cosmology. Here, we add the contributions of modern physics and biology to these
earlier ideas.
5. Reentry is “a process of temporally ongoing parallel signaling between separate [neuronal]
maps along ordered anatomical connections” (Edelman 1989, 49). In other words, it is the
method by which the brain's receptors of stimuli from the world communicate and
coordinate with other neuronal activity elsewhere in brain structures. If neuronal activity
from separate receptors coincides temporally, those neural patterns are associated by
strengthening of their pathways. For example, if I hear a lullaby and simultaneously smell
baby powder, those stimuli can become associated in my brain.
6. For more on music used as a weapon, see Ross (2016); on music used in detention and torture, see Windsor, this volume, chapter 14. For more on the sound environment of war,
see Bull, volume 1, chapter 9.
7. There is only one recorded Arkestra performance of this tune, on Sun Ra (1990).
8. Personal communications, October 2000, with Michael Ray, trumpeter in the Sun Ra
Arkestra from 1977 on, and with Robert L. Campbell, coauthor of Campbell and Trent (2000).
9. Allen also said that Sun Ra would sometimes ask the band to go faster on ascending pas-
sages and slower on descending ones. “If you go up the stairs you use more energy than
going down.” Sometimes only a part of the band would be accelerating while the rest
stayed at a steady tempo.
10. Neubauer (2011) describes two contrasting mating strategies, suited to differing habitats.
“Opportunistic, or r-selected species, tend to have rapid rates of increase, small size, many
offspring, rapid development, and little parental care. They are able to colonize variable or
unpredictable habitats quickly but may also experience catastrophic mortality when con-
ditions change. Equilibrial, or K-selected, species have fewer young, with slower develop-
ment, and a lot of parental care. They often exist in more constant or predictable environments
where competition is keen and long-term survival skills, in terms of either behavioral ver-
satility or physical growth, are important” (14).
11. While this chapter was in the editing process, Damasio (2018) was published. It supports
our thesis that homeostasis plays a crucial role in the formation of cultural activity, includ-
ing music. Damasio states that “cultural activity began and remains deeply embedded in
feeling” (5), and “feelings are the mental expressions of homeostasis” (6). Damasio defines
homeostasis as “the mechanisms of life itself and . . . the conditions of its regulation” (6).
References
Campbell, R. L., and C. Trent. 2000. The Earthly Recordings of Sun Ra. 2nd ed. Redwood, NY:
Cadence Books.
Chaisson, E. J. 2001. Cosmic Evolution: The Rise of Complexity in Nature. Cambridge, MA:
Harvard University Press.
Clayton, M., R. Sager, and U. Will. 2004. In Time with the Music: The Concept of Entrainment
and Its Significance for Ethnomusicology. ESEM CounterPoint 1. http://web.stanford.edu/
group/brainwaves/2006/Will-InTimeWithTheMusic.pdf. Accessed January 19, 2016.
Cox, A. 2011. Embodying Music: Principles of the Mimetic Hypothesis. Music Theory Online
17 (2). http://www.mtosmt.org/issues/mto.11.17.2/mto.11.17.2.cox.html. Accessed May 7, 2017.
Damasio, A. 2018. The Strange Order of Things: Life, Feeling, and the Making of Cultures.
New York: Pantheon.
Dissanayake, E. 2008. If Music Is the Food of Love, What about Survival and Reproductive
Success? Musicae Scientiae 12 (1 suppl): 169–195.
Edelman, G. M. 1989. The Remembered Present: A Biological Theory of Consciousness. New
York: Basic Books.
England, J. 2013. Statistical Physics of Self-Replication. Journal of Chemical Physics 139 (121923).
doi: http://dx.doi.org/10.1063/1.4818538. Accessed May 7, 2017.
Fisher, L. 2009. The Perfect Swarm: The Science of Complexity in Everyday Life. New York:
Basic Books.
Fries, P. 2015. Rhythms for Cognition: Communication through Coherence. Neuron 88 (1):
220–235. doi:10.1016/j.neuron.2015.09.034.
Gjerdingen, R. 1988. A Classic Turn of Phrase: Music and the Psychology of Convention.
Philadelphia: University of Pennsylvania Press.
Gjerdingen, R. 2007. Music in the Galant Style. Oxford: Oxford University Press.
Grimshaw, M., and T. Garner. 2015. Sonic Virtuality: Sound as Emergent Perception. Oxford:
Oxford University Press.
Large, E. 2011. Musical Tonality, Neural Resonance and Hebbian Learning. In Mathematics
and Computation in Music, 115–125. New York: Springer.
Lestard, N. D., R. C. Valente, A. G. Lopes, and M. A. Capella. 2013. Direct Effects of Music in
Non-Auditory Cells in Culture. Noise Health 15: 307–314.
Llinás, R. 2002. I of the Vortex: From Neurons to Self. Cambridge, MA: MIT Press.
Longhi, E., N. Pickett, and D. J. Hargreaves. 2015. Wellbeing and Hospitalized Children: Can
Music Help? Psychology of Music 43 (2): 188–196.
Lovelock, J. 2000. Gaia: A New Look at Life on Earth. Oxford: Oxford University Press.
Margulis, E. H. 2013. On Repeat: How Music Plays the Mind. Oxford: Oxford University Press.
Margulis, E. H. 2014. One More Time. Aeon. https://aeon.co/essays/why-repetition-can-turn-
almost-anything-into-music. 07 March. Accessed May 7, 2017.
Neubauer, R. L. 2011. Evolution and the Emergent Self: The Rise of Complexity and Behavioral
Versatility in Nature. New York: Columbia University Press.
Perunov, N., R. Marsland, and J. England. 2014. Statistical Physics of Adaptation. http://arxiv.
org/pdf/1412.1875.pdf. Accessed May 7, 2017.
Ross, A. 2016. When Music Is Violence. New Yorker. July 4, 2016 Issue. http://www.newyorker.
com/magazine/2016/07/04/when-music-is-violence. Accessed May 7, 2017.
Saslaw, J. 1996. Forces, Containers, and Paths: The Role of Body-Derived Image Schemas in
the Conceptualization of Music. Journal of Music Theory 40 (2): 217–243.
Saslaw, J. 1997–1998. Life Forces: Conceptual Structures in Schenker’s “Free Composition” and
Schoenberg’s “The Musical Idea.” Theory and Practice 22–23: 17–34.
Schaap, P. 1989. An Interview with Sun Ra. WKCR 5:5 (January–February) 7.
Schafer, R. M. 1993. The Soundscape: Our Sonic Environment and the Tuning of the World.
Rochester, VT: Destiny Books.
Small, C. 1998. Musicking: The Meanings of Performing and Listening. Middletown, CT:
Wesleyan University Press.
Music Analysis
and Data Compression
David Meredith
Introduction
Most people are capable of imagining music, and composers can even imagine novel
music they have never heard before. This is known as musical imagery and can be distinguished from musical listening or music perception, where the music experienced results from physical sound energy being transmitted across the listener's peripheral auditory system and then transduced in the inner ear into nerve signals that are propagated to higher centers of the brain. In both music perception and musical imagery, what is experienced is actually an encoding of musical information, created by the person's brain.
Alternatively, one could adopt a less dualist stance and say that experiencing music is the
direct result of certain spatiotemporal patterns of neural firing that encode musical
information. In listening, this encoding is generated from information about sound
currently in the environment, combined with the person’s musical knowledge. In
imagery, the encoding is constructed only from the person’s musical knowledge.
Sound is thus just one particular medium for communicating musical information
and is not a prerequisite for musical experience. Indeed, trained musicians can experience
(i.e., “imagine”) music they have never previously heard while silently reading a musical
score. Musical imagery and perception therefore have a great deal in common—indeed,
there are some brain centers (especially in the right temporal lobe) that are necessary for
both (Halpern 2003).
Both the way that one perceives and understands music as well as the music that one
is capable of imagining are therefore largely determined by one’s musical knowledge that
is gained through passive exposure to music, active learning of musical skills, and/or study
of music theory and analysis. It has been proposed in psychology, information theory,
and computer science that knowledge acquisition—that is, learning—is essentially data
compression (Chater 1996; Vitányi and Li 2000): on being exposed to new data, a learning
Of course, it may be the case that there is no single way of understanding a piece or set
of pieces that allows for optimal performance on all such tasks. For example, the best way
of understanding a piece in order to be able to detect errors in a performance may not be
the best way of understanding that piece in order to determine whether some other, previously unheard, piece is by the same composer. There may also be several different ways
of understanding a given piece or set of pieces that are equally effective for carrying out a
given task. Nevertheless, it will often be the case that understanding a piece in certain
ways will allow one to carry out certain objectively evaluable tasks more effectively than
understanding the piece in certain other ways; to this extent, one can speak of some analyses as being “better than” others for carrying out specific, objectively evaluable tasks. The
goal of the work presented in this chapter is therefore that of finding those ways of understanding musical objects that allow us to most effectively carry out the musical tasks that we want to accomplish. The approach adopted is based on the hypothesis that the best possible explanations for the structure of a given musical object are those that account for as much of its structure as possible, in as much detail as possible, and in relation to as much other music as possible, while being as simple as possible.
Clearly, these goals often conflict: accounting for the structure of a piece in more
detail or in a way that relates the piece to all the music in some larger context can often
entail making one’s explanation (i.e., analysis) more complex.
This hypothesis, which forms the foundation for the work reported in this chapter, is
a form of the well-known principle of parsimony. This principle can be traced back to
antiquity2 and is known in common parlance as “Ockham’s razor,” after the medieval
English philosopher, William of Ockham (ca. 1287–1347), who made several statements
to the effect that, when presented with two or more possible explanations that account
for some set of observations, one should prefer the simplest of these explanations.
In more recent times, the parsimony principle has been formalized in various ways,
including Rissanen’s (1978) minimum description length (MDL) principle, Solomonoff ’s
(1964a, 1964b) theory of inductive inference, and Kolmogorov’s concept of a minimal
algorithmic sufficient statistic3 (Li and Vitányi 2008, 401ff; Vereshchagin and Vitányi
2004). The essential idea underpinning these concepts is that explanations for data (i.e.,
ways of understanding it) can be derived from it by compressing it—that is, by finding
parsimonious ways of describing the data by exploiting regularity in it and removing
redundancy from it. Indeed, Vitányi and Li (2000, 446) have shown that “data compression is almost always the best strategy” both for model selection and prediction.
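A widely used practical corollary of this line of work is the normalized compression distance, which approximates the similarity of two objects by how much better they compress together than separately. The sketch below uses a general-purpose compressor (zlib) on invented toy "pieces" encoded as byte strings; it illustrates the general strategy only and is not the method developed later in this chapter.

import os
import zlib

def compressed_len(x: bytes) -> int:
    """Compressed length in bytes: a rough, computable stand-in for the
    (uncomputable) Kolmogorov complexity of x."""
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: smaller values indicate that the two
    objects share more compressible structure."""
    cx, cy, cxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Invented toy "pieces": two closely related repetitive note lists and one
# patternless byte string of the same length.
piece_a = b"C4 E4 G4 C5 " * 50
piece_b = b"C4 E4 G4 B4 " * 50
piece_c = os.urandom(600)

print(round(ncd(piece_a, piece_b), 2))   # noticeably smaller: shared structure
print(round(ncd(piece_a, piece_c), 2))   # close to 1: little shared structure

With these toy inputs the first distance comes out well below the second; on real data one would of course compress full encodings of pieces rather than strings like these, but the principle, that compression exposes shared structure, is the same.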
The basic hypothesis that drives the research presented in this chapter is thus that the
more parsimoniously one can describe an object without losing information about it,
the better one explains the object being described, suggesting the possibility of automatically deriving explanatory descriptions of objects (in our case, musical objects) simply
by the lossless compression of “in extenso” descriptions of them. In the case of music,
such an in extenso description might, for example, be a list of the properties of the notes
in a piece (e.g., the pitch, onset, and duration of each note), such as can be found in a
MIDI file. Alternatively, it could be a list of sample values describing the audio signal of a
musical performance, such as can be found in a pulse-code modulation (PCM) audio
file. The defining characteristic of an in extenso description of an object is that it explicitly
specifies the properties of each atomic component of the object (e.g., a MIDI event in a
MIDI file or an audio sample in a PCM audio file), without grouping these atoms
together into larger constituents and without specifying any structural relationships
between components of the object.4 In contrast, an explanation for the structure of an
object, such as an analysis of a musical object, will group atomic components together
into larger constituents (e.g., notes grouped into phrases and chords or audio samples
grouped together into musical events), specify structural relationships between components (e.g., “theme B is an inversion of theme A”), and classify constituents into categories (e.g., “chords X and Y are tonic chords in root position,” “bars 1–4 and 16–19 are
occurrences of the same theme”). Throughout this chapter, I assume that an analysis is a
losslessly compressed encoding of an in extenso description of a musical object, even
though most musical analyses to date have typically been lossy, in that they only focus
on certain aspects of the structure of an object (e.g., harmony, voice-leading, thematic
structure, etc.). Such lossily compressed encodings of an object can also provide useful
ways of understanding it, but, because information in the original object is lost in such
encodings, they do not (at least individually) explain all of the detailed structure of the
object. In particular, such lossy encodings do not provide enough information for the
original object to be exactly reconstructed. Thus, if one is interested, for example, in
learning enough about a corpus of pieces in order to compose new pieces of the same
type, then such lossy analytical methods would not be sufficient.
In the remainder of this chapter, it is proposed that a musical analysis can fruitfully be
conceived of as being an algorithm (possibly implemented as a computer program) that,
when executed, outputs an in extenso description of the musical object being analyzed,
and thus serves as a hypothesis about the nature of the process that gave rise to that
musical object. Moreover, it is hypothesized that, if one has two algorithms or programs
that each generate the same musical object, then the shorter of these (i.e., the one that
can be encoded using fewer bits of information) will represent the better way of under
standing that object for any task that requires or benefits from musical understanding.
A model of music perception and learning will be sketched later on in this chapter,
that is based on the idea of accounting for the structure of a newly experienced piece of
music by minimally modifying a compressed encoding of previously encountered
pieces. Some recent work will then be reviewed in which these ideas have been put into
practice by devising compression algorithms that acquire musical knowledge that can
then be applied in automatically carrying out a variety of advanced musicological tasks.
Encodings, Decoders,
and Two-Part Codes
description of this program may be shorter than its output. A basic claim of this chapter
is that such a description (in the form of a program) becomes an explanation for the
structure of the object being described as soon as it is shorter than the in extenso
description of the object that it generates. In other words, a compressed encoding of an
in extenso description of an object can be considered a candidate explanation (not necessarily a “correct” one) for the structure of that object because it serves as a hypothesis
as to the nature of the process that gave rise to the object.
Moreover, it is hypothesized that the more parsimoniously one can describe an object
on some given level of detail, the better that description explains the structure of the
object on that level of detail. As discussed earlier, this is an application of Ockham’s razor
or the MDL principle (Rissanen 1978).
The following simple example serves to illustrate the foregoing ideas. Consider the
problem of describing the set of twelve points shown in Figure 8.1. One could do this by
explicitly giving the coordinates of all twelve points, thus:
P(p(0, 0), p(0, 1), p(1, 0), p(1, 1), p(2, 0), p(2, 1), p(2, 2), p(2, 3), p(3, 0), p(3, 1), p(3, 2), p(3, 3)).   (1)
In this encoding, a set of points, {p1, p2, . . . pn}, is denoted by P(p1, p2, . . . pn) and each
point within such a set is denoted by p(x,y), where x and y are the x- and y-coordinates
of the point respectively. The encoding in (1) can be thought of as being a program that
computes the set of points in Figure 8.1 simply by specifying each point individually.
Representing this set of points in this way requires one to write down twenty-four integer coordinate values. Moreover, the encoding does not represent any groupings of the
points into larger constituents, nor does it represent any structural relationships
between the points. In other words, this description is an in extenso description that
does not represent any of the structure in the point set and therefore cannot be said to
offer any explanation for it. One could go even further and say that expression (1) represents the data as though it were a random, meaningless arrangement of points with no
order or regularity.
Note that, in order to actually generate the set of twelve points, the description (1)
needs to be decoded. An algorithm that carries out this decoding is called a decoder.
In this case, such a decoder only needs to know about the meanings of the P(·) and
p(x,y) formalisms.
One can obtain a shorter encoding of the point set in Figure 8.1 by exploiting the fact that it consists of three copies, at different spatial positions, of the square configuration of points,

P(p(0, 0), p(0, 1), p(1, 0), p(1, 1)).   (2)

This allows the whole point set to be encoded as

T(P(p(0, 0), p(0, 1), p(1, 0), p(1, 1)), V(v(2, 0), v(2, 2))),   (3)
where T(P(p1, p2, . . . pn),V(v1, v2, . . . vm)) denotes the union of the point set, {p1, p2, . . . pn},
and the point sets that result by translating {p1, p2, . . . pn} by the vectors, {v1, v2, . . . vm},
where each vector is denoted by v(x,y), x and y being the x- and y-coordinates, respectively,
of the vector. Note that description (3) fully specifies the point set in Figure 8.1
using only twelve integer values—that is, half the number required to explicitly list the
coordinates of the points in the in extenso description in (1). Description (3) is thus a
losslessly compressed encoding of description (1). Description (3) thus qualifies as an
explanation for the structure of the point set in Figure 8.1, precisely because it represents
some of the structural regularity in this point set. If one perceives the point set in
Figure 8.1 in the way represented by description (3), then the twelve points are no longer
perceived to be arranged in a random, meaningless manner—they are now seen as
resulting from the occurrence of three identical squares. Moreover, it is precisely because
expression (3) captures this structure that it manages to convey all the information in (1)
while being only roughly half the length of (1).
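A decoder for both kinds of description can be sketched in a few lines. The function below, whose name and point representation are ours, interprets the compressed form as a base point set plus a list of translation vectors and expands it back into the in extenso point set.

def decode_T(points, vectors):
    """Decoder for the T(P(...), V(...)) formalism: the union of the base point
    set and its translations by each of the given vectors."""
    result = set(points)
    for vx, vy in vectors:
        result |= {(x + vx, y + vy) for (x, y) in points}
    return result

# Description (1): the in extenso encoding, with every point written out.
in_extenso = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (2, 3),
              (3, 0), (3, 1), (3, 2), (3, 3)}

# Description (3): four base points plus two translation vectors.
compressed = decode_T(points={(0, 0), (0, 1), (1, 0), (1, 1)},
                      vectors=[(2, 0), (2, 2)])

print(compressed == in_extenso)                              # True: the encoding is lossless
print(len(in_extenso) * 2, "integers versus", (4 + 2) * 2)   # 24 versus 12

The final line restates the counting argument made above: twenty-four integers for the in extenso description against twelve for the compressed one, ignoring for the moment the cost of the longer decoder, which is taken up next.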
On the other hand, in order to generate the actual point set in Figure 8.1 from the
expression in (3), the decoder now needs to be able to interpret not only the operators
P(·) and p(x,y), but also the operators T(·), V(·), and v(x,y). The decoder required to
decode description (3) is therefore itself longer and more complex to describe than the
decoder required to decode expression (1). The crucial question is therefore whether we
save enough on the length of the encoding to warrant the resulting increase in length of
the decoder. If the set of twelve points in Figure 8.1 were the only data that we ever had to
understand and the operators T(·), V(·), and v(x,y) were only of any use on this particular dataset, then the increase in the length of the decoder required to implement these
extra operators would probably exceed the decrease in the length of the encoding that
these operators make possible. Consequently, in this case, the parsimony principle
would not predict that description (3) represented a better way of understanding the
point set in Figure 8.1—the new encoding would just replace the specification of eight
random points in (1) with two random vectors in (3) and three randomly chosen new
operators to be encoded in the decoder. However, the concepts of a vector, a vector set,
and the operation of translation can be used to formulate compressed encodings of an
infinite and commonly occurring class of point sets—those containing subsets related by
translation. If we encode a sufficiently large sample of such point sets using translation-
invariance as a compression strategy, then the saving in the lengths of the resulting
encodings will more than offset the increase in the length of the decoder required to
make it capable of handling translation of point sets. This illustrates that interpreting
the point set in Figure 8.1 as being composed of three identical square configurations of
four points only makes sense if one is interpreting this point set in the broad context of
a large (in this case, infinite) class of point sets, of which the set of points in Figure 8.1 is
an example.
The foregoing example illustrates that what we are really interested in is not just the
length of an encoding but the sum of the length of the encoding and the length of
the decoder required to generate the in extenso description of the encoded object from the
encoding. We therefore think about descriptions of objects as being two-part codes in
which the first part (the decoder) represents all the structural regularity in the object
that it shares with all the members of a (typically large) set of other objects and the second part represents what is unique to the object and random relative to the decoder.5
This is why we would not, for example, be interested in a “decoder” that itself consists
solely of an in extenso description of the point set in Figure 8.1 and generates this point
set every time it is run with no input. In this case, the “encoding” of the data would be of
length zero but, because the decoder would be of length at least equal to that of the
uncompressed in extenso description of the point set, we would have no net
compression and, consequently, no explanation.
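To make this trade-off concrete, the following minimal Python sketch compares the total two-part description length, shared decoder plus per-object encodings, as the number of point sets to be explained grows. The symbol counts are purely hypothetical and are not measurements of descriptions (1) and (3):

    # One shared decoder plus one encoding per object to be explained.
    def total_cost(decoder_length, encoding_length_per_object, n_objects):
        return decoder_length + n_objects * encoding_length_per_object

    # Assumed, illustrative symbol counts: a simple decoder that understands only P(.)
    # and p(x, y), paired with 24-integer in extenso encodings, versus a richer decoder
    # that also implements T(.), V(.), and v(x, y), paired with 12-integer encodings.
    for n in (1, 10, 1000):
        print(n,
              total_cost(decoder_length=10, encoding_length_per_object=24, n_objects=n),
              total_cost(decoder_length=25, encoding_length_per_object=12, n_objects=n))

For a single small point set, the richer decoder costs more than it saves; amortized over many translationally structured point sets, it quickly pays for itself.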
If the best explanations are the shortest descriptions that account for as much data as
possible in as much detail as possible, then this suggests that the goal of music analysis
should be to find the shortest—but most detailed—description of as much music as
possible. To illustrate this, let us consider a close musical analogue of the point-set example
in Figure 8.1 discussed previously.
Figure 8.2 shows the beginning of J. S. Bach’s Prelude in C minor (BWV 871) from the
second book of Das Wohltemperierte Klavier (1742) and Figure 8.3 shows a point-set
representing this music, in which the horizontal dimension represents time in sixteenth
notes and the vertical dimension represents morphetic pitch, an integer that encodes the
pitch letter name (A–G) and octave of a note but not its alteration ( . . . ♭♭, ♭, ♮, ♯, 𝄪 . . . ),
so that, for example, D♭4, D♮4 and D♯4 all have the same morphetic pitch of 24
(Meredith 2006, 2007). The union of the three, 4-note patterns, A, B, and C, in Figure 8.3
could be described in an in extenso manner, on an analogy with description (1), as follows:
P(p(1, 27), p(2, 26), p(3, 27), p(4, 28), p(5, 26), p(6, 25), p(7, 26), p(8, 27),
p(9, 25), p(10, 24), p(11, 25), p(12, 26))   (4)
Figure 8.2 The opening notes from J. S. Bach’s Prelude in C minor (BWV 871) from the second
book of Das Wohltemperierte Klavier (1742). Patterns A, B, and C correspond, respectively, to the
patterns with the same labels in Figure 8.3 (from Meredith et al. 2002).
Figure 8.3 A point-set representation of the music in Figure 8.2. The horizontal dimension
represents time in sixteenth notes; the vertical dimension represents morphetic pitch (Meredith
2006, 2007). Patterns A, B, and C correspond, respectively, to the patterns with the same labels
in Figure 8.2. See text for further explanation (from Meredith et al. 2002).
This would require one to write down twenty-four integer coordinates. Alternatively,
on an analogy with description (3), one could exploit the fact that the set consists of
three occurrences of the same pattern at different (modal) transpositions, and describe
it more parsimoniously as follows:
T(P(p(1, 27), p(2, 26), p(3, 27), p(4, 28)), V(v(4, −1), v(8, −2))) (5)
This expression not only requires one to write down only half as many integers but also
encodes some of the analytically important structural regularity in the music—namely,
that the twelve points consist of three, 4-note patterns at different transpositions. Thus,
by seeking a compressed encoding of the data, we have succeeded in finding a
representation that gives us important information about the structural regularities in that data.
In the particular case of Figure 8.3, we can get an even more compact description by
recognizing that the vector mapping A onto B is the same as that mapping B onto C. This
means that one could represent the vector set V(v(4,−1),v(8,−2)) in description (5) as a
vector sequence consisting of two consecutive occurrences of the vector v(4,−1), where
the result of translating pattern A by the first vector in the sequence is itself translated by
the second vector in the sequence. For example, this could be encoded as V(2v(4,−1)),
where the emboldened V operator indicates that what follows is a sequence or ordered
set, not an unordered set; and where we denote k consecutive occurrences of a vector,
v(x,y), by kv(x,y). This would, of course, require a modification of the decoder so that it
could process both vector sequences and the shorthand notation for sequences
consisting of multiple occurrences of the same vector. As discussed earlier, whether or not
adding this functionality to the decoder would be worthwhile depends on whether the new
functionality allows for a sufficient reduction in encoding length over the whole class of
musical objects that we are interested in explaining. In this particular case, since the
device of musical sequence, exemplified by the excerpt in Figure 8.2, is commonly used
throughout Western music, it would almost certainly be a good strategy to allow for the
encoding of this type of structure in a compact manner. It is therefore not surprising
that most psychological coding languages that have been designed for representing
musical structure allow for multiple consecutive occurrences of the same interval or
vector to be encoded in such a compact form (Deutsch and Feroe 1981; Meredith 2012b;
Restle 1970; Simon and Sumner 1968, 1993).
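As a minimal illustration (a sketch only, not the decoder used by any of the systems cited above), the following Python fragment expands an encoding like (5) back into the in extenso point set of (4) by translating pattern A by each vector in the vector set:

    def translate(pattern, vector):
        # Translate every (time, morphetic pitch) point in the pattern by one vector.
        vx, vy = vector
        return [(x + vx, y + vy) for (x, y) in pattern]

    def decode(pattern, vectors):
        # Return the union of the pattern and its translations by each vector.
        points = set(pattern)
        for v in vectors:
            points.update(translate(pattern, v))
        return sorted(points)

    A = [(1, 27), (2, 26), (3, 27), (4, 28)]      # pattern A in expression (5)
    print(decode(A, [(4, -1), (8, -2)]))          # the twelve points of expression (4)

Encoding the vector set instead as the sequence V(2v(4, −1)) would amount to translating A by v(4, −1) and then translating the result by the same vector again, which yields the same twelve points.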
Music-Theoretical Concepts
That Promote Compact Encodings
of Musical Objects
There are a number of basic music-theoretical concepts and practices that help Western
musicians and composers to encode tonal music parsimoniously and reduce the
cognitive load required to process musical information.
One example of such a concept is that of a voice. The strategy of conceiving of music as
being organized into voices substantially reduces the amount of information about note
durations that has to be communicated and remembered by musicians. For the vast
majority of notes in a piece of polyphonic Western music, the duration is equal to the
within-voice, inter-onset interval—that is, most notes are held until the onset of the next
note in the same voice. This means that, for most notes, provided we know the voice to
which it belongs, we do not have to explicitly encode its duration—we only need to do so
if there is a rest between it and the next note in the same voice. Grouping notes together
into sequences that represent voices therefore considerably reduces the amount of
information about note durations that needs to be explicitly encoded, remembered, and
communicated.
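The saving can be sketched as follows. This is an illustration under simplifying assumptions (a single monophonic voice, no overlapping notes), not a representation scheme proposed in the chapter: a duration is written out only where a rest separates a note from its successor in the same voice, or where the voice ends.

    def encode_voice(notes):
        # notes: (onset, duration) pairs for one voice, in order of onset.
        encoded = []
        for i, (onset, duration) in enumerate(notes):
            held_to_next_onset = (i + 1 < len(notes)
                                  and notes[i + 1][0] == onset + duration)
            # None means "held until the next onset in this voice"; otherwise the
            # duration is stored explicitly (a rest follows, or the voice ends).
            encoded.append((onset, None if held_to_next_onset else duration))
        return encoded

    def decode_voice(encoded):
        return [(onset, encoded[i + 1][0] - onset if duration is None else duration)
                for i, (onset, duration) in enumerate(encoded)]

    voice = [(0, 1), (1, 1), (2, 2), (5, 1)]      # a rest falls between time 4 and time 5
    assert decode_voice(encode_voice(voice)) == voice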
The way in which pitch information is encoded in standard Western staff notation
also helps to make scores more parsimonious. Key signatures, for example, remove the
need to explicitly state the accidental for every note in a piece. Instead, accidentals only
have to be placed before notes whose pitches are outside the diatonic set indicated by the
key signature. Since most of the notes within a single piece of Western tonal music occur
within a small number of closely related diatonic sets (i.e., within a relatively limited
range on the line of fifths), accidentals are typically only necessary for a small
proportion of the notes in a score. Key signatures, therefore, provide a mechanism for
parsimoniously encoding information about pitch names in Western tonal music.
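A minimal sketch of this mechanism (ignoring, among other simplifications, the convention that an accidental persists to the end of the bar): the key signature supplies a default accidental for each letter name, and a sign is written only where a note departs from that default.

    def accidentals_to_write(notes, key_signature):
        # notes: (letter, accidental) pairs, with '' denoting natural;
        # key_signature: default accidental per letter name (absent letter = natural).
        written = []
        for letter, accidental in notes:
            default = key_signature.get(letter, "")
            written.append(None if accidental == default else accidental)
        return written

    d_major = {"F": "#", "C": "#"}                # two sharps
    melody = [("D", ""), ("F", "#"), ("G", ""), ("C", "#"), ("C", "")]
    print(accidentals_to_write(melody, d_major))
    # [None, None, None, None, '']: only the final C, which lies outside the
    # diatonic set, needs a written sign (here, a natural).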
Also, typically, Western music based on the major–minor system (or the diatonic
modes) is organized into consecutive temporal segments in which each note is
understood to have one of seven different basic tonal functions within the key in operation at
the point where the note occurs. For example, in the major–minor system, these basic
tonal functions would be {tonic, supertonic, mediant . . . leading note} and each could be
modified or qualified by being considered flattened or sharpened relative to a diatonic
major or minor scale. Staff notation capitalizes on this by providing only seven different
vertical positions at which notes can be positioned within each octave, rather than the
twelve different positions that would be necessary if the pitch of each note were
represented chromatically rather than in terms of its role within a seven-note scale. Again,
this strategy allows for pitch information to be encoded more parsimoniously, leading
to a reduction in the cognitive load on a musician reading the score.
This pitch-naming strategy leads to more parsimonious encodings by assigning simpler
(shorter) encodings to pitches that are more likely to occur in the music. Time signatures
similarly define a hierarchy of “probability” over the whole range of possible temporal
positions at which a note may start within a measure. Specifically, notes are more likely to
start on stronger beats.6 In Western classical and popular music, this results in only very
few possible positions within a bar being probable positions for the start (or end) of a note
and the notation is designed to make it easier to notate and read notes that start at more
probable positions (i.e., on stronger beats). In data compression, variable-length codes,
such as the Huffman code (Huffman 1952; Cormen et al. 2009, 431–435) or Shannon–Fano
code (Shannon 1948a, 1948b; Fano 1949), work in a closely analogous way by assigning
shorter codes (i.e., simpler encodings) to more probable symbols or symbol strings.
Huffman coding, in particular, assigns more frequent symbols to nodes closer to the root
in a binary tree, which is closely analogous to tree-based representations of musical meter
that assign stronger beats to higher levels in a tree structure (Lerdahl and Jackendoff 1983;
Temperley 2001, 2004, 2007; Martin 1972; Meredith 1996, 214–219).
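For readers unfamiliar with the construction, the standard Huffman procedure can be sketched in a few lines of Python; this is a generic textbook implementation, not code associated with the chapter, and the frequency table of metrical positions is hypothetical:

    import heapq
    from itertools import count

    def huffman_codes(frequencies):
        # frequencies: symbol -> count. Returns symbol -> bitstring, with shorter
        # codewords assigned to more frequent symbols.
        tiebreak = count()                        # keeps heap entries comparable
        heap = [(freq, next(tiebreak), {sym: ""}) for sym, freq in frequencies.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, codes1 = heapq.heappop(heap)
            f2, _, codes2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in codes1.items()}
            merged.update({s: "1" + c for s, c in codes2.items()})
            heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
        return heap[0][2]

    # Hypothetical counts of note onsets at metrical positions within a bar:
    print(huffman_codes({"downbeat": 40, "beat 3": 25, "beat 2": 15,
                         "offbeat eighth": 12, "offbeat sixteenth": 8}))

The frequent positions end up near the root of the code tree and receive codewords of one or two bits; the rare positions receive longer ones.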
It thus seems that several features of Western staff notation and certain
music-theoretical concepts have evolved in order to allow for Western tonal music to be encoded
more parsimoniously.
Kolmogorov Complexity
The work presented in this chapter is based on the central thesis that explanation is
compression. The more compressible an object is, the less random it is, the simpler it is
and the more explicable it is. This basic thesis was formalized by information theorists
during the 1960s and encapsulated in the concept of Kolmogorov complexity. The
Kolmogorov complexity of an object is a measure of the amount of intrinsic information
in the object (Chaitin 1966; Kolmogorov 1965; Solomonoff 1964a, 1964b; Li and
Vitányi 2008). It differs from the Shannon information content of an object, which is the
amount of information that has to be transmitted in order to uniquely specify the object
within some predefined set of possible objects. The Kolmogorov complexity of an object
is the length in bits of the shortest possible effective (i.e., computable) description of an
object, where an effective description can be thought of as being a computer program
that takes no input and computes the object as its only output. In other words, the
Kolmogorov complexity of an object is a measure of the complexity of the simplest
process that can give rise to the object. The more structural regularity there is in an object,
the shorter its shortest possible description and the lower its Kolmogorov complexity.
Unfortunately, it is not generally possible to determine the Kolmogorov complexity of
an object, as it is usually impossible to prove that any given description of the object is
the shortest possible. Nevertheless, the theory of Kolmogorov complexity supports the
notion of using the length of a description as a measure of its complexity and it supports
the idea that the shorter the description of a given object, the more structural regularity
that description captures. The theory has also been used to show formally that data
compression is almost always the best strategy for both model selection and prediction
(Vitányi and Li 2000). For some further comments on the relationship between music
analysis and Kolmogorov complexity, see Meredith (2012a).
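Although the Kolmogorov complexity of an object cannot be computed, any general-purpose compressor gives a crude upper bound on its description length and hence a rough index of its regularity. A minimal Python sketch (zlib is used simply because it is in the standard library, not because it is one of the compressors discussed in this chapter):

    import os
    import zlib

    def compressed_length(data: bytes) -> int:
        # Length of a zlib-compressed description: an upper bound on the
        # (uncomputable) shortest description length of the data.
        return len(zlib.compress(data, level=9))

    regular = b"ABCD" * 250          # a highly regular 1,000-byte string
    random_ = os.urandom(1000)       # 1,000 (almost certainly incompressible) random bytes

    print(compressed_length(regular), compressed_length(random_))
    # The regular string shrinks to a few dozen bytes; the random one does not,
    # mirroring the link between regularity, compressibility, and explicability.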
As stated at the outset, the work presented here is based on the assumption that the goal
of music analysis is to find the best possible explanations for musical works. This could
be recast in the language of psychology by saying that music analysis aims to find the
most successful perceptual organizations that are consistent with a given musical surface
(Lerdahl and Jackendoff 1983).
Most theories of perceptual organization have been founded on one of two principles:
the likelihood principle (Helmholtz 1867), which proposes that the perceptual system
prefers organizations that are the most probable in the world; and the simplicity principle
(Koffka 1935), which states that the perceptual system prefers the simplest perceptual
organizations.
For many years, psychologists considered the simplicity and likelihood principles to
be in conflict until Chater (1996), drawing on the theory of Kolmogorov complexity,
pointed out that the two principles are mathematically equivalent. However, Vitányi
and Li (2000) showed that, strictly speaking, the predictions of the likelihood principle
(which corresponds to Bayesian inference) and the simplicity principle (which
corresponds to what they call the “ideal MDL principle”) are only expected to converge for
individually random objects in computable distributions (Vitányi and Li 2000, 446).
They state, “if the contemplated objects are nonrandom or the distributions are not
computable then MDL [i.e., the simplicity principle] and Bayes’s rule [i.e., the likelihood
principle] may part company.”
Musical objects are typically highly regular and not at all random, at least in the sense
that randomness is defined within algorithmic information theory (Li and Vitányi 2008,
49ff.). Vitányi and Li’s conclusions therefore seem to cast doubt on whether approaches
based on the likelihood principle, commonly applied in Bayesian and probabilistic
approaches to musical analysis such as those proposed by Meyer (1956), Huron (2006),
Pearce and Wiggins (2012), and Temperley (2007), can ever successfully be used to
discover certain types of structural regularity in musical objects such as thematic
transformations or parsimonious generative definitions of scales or chords.
The approach presented in this chapter is therefore more closely aligned with
models of perceptual organization based on the simplicity principle—in particular,
theories of perceptual organization in the tradition of Gestalt psychology (Koffka 1935)
that take the form of coding languages designed to represent the structures of patterns
in particular domains. Theories of this type predict that sensory input is more likely to
be perceived to be organized in ways that correspond to shorter descriptions in a
particular coding language. Coding theories of this type have been proposed for serial
patterns (Simon 1972), visual patterns (Leeuwenberg 1971), and, indeed, musical patterns
(Deutsch and Feroe 1981; Meredith 2012b; Povel and Essens 1985; Restle 1970; Simon and
Sumner 1968, 1993).
A Sketch of a Compression-Based
Model of Musical Learning
Let us define a musical object to be any quantity of music, ranging from a single note
through to a complete work or even a collection of works. A musical object is typically
interpreted by a listener or an analyst in the context of some larger object that contains it
Figure 8.4 A Venn diagram illustrating various contexts in which a musical object might be
interpreted. A phrase (P) could be interpreted within the context of a section (S), which could
be interpreted within the context of a work (W), and so on. C = works by the same composer;
F = works in the same form or genre; I = works for the same instrumentation; T = tonal music;
M = all music.
(see Figure 8.4). In essence, the model of musical learning presented here is as follows.7
The analyst or listener explicitly or implicitly tries to find the shortest program that
computes the in extenso descriptions of a set of musical objects containing the object to be explained, along with other related objects that form a context within which it is interpreted (see Figure 8.5).
Figure 8.5 The analyst’s or listener’s understanding of a musical object (the dark gray circle—in
red on the companion website) is modeled as a program, P, that computes a set of musical objects
containing the one to be explained along with other related objects (the light gray circles—in yellow
on the companion website) forming a context within which the explanandum is interpreted.
Figure 8.6 When the listener hears a new piece (the dark gray circle—in red on the companion
website), the existing explanation (i.e., “program”) (P) for all the music previously heard is
minimally modified to produce a new program (P’) to account for the new piece in addition to
all previously encountered music. This might be achieved by discovering the simplest way of
interpreting as much of the material in the new piece as possible in terms of what is already known.
When the listener then hears a new piece, P is minimally modified to produce a new program, P′ (see Figure 8.6), which may generate the previously heard pieces in a way that differs from that in which P generates
these pieces, reflecting the fact that hearing a new piece may change the way that one
interprets pieces that one has heard before.
One can speculate that P’ is produced in a two-stage process. In the first stage, an
attempt is made to interpret as much of the new, unfamiliar piece as possible by reusing
elements and transformations that have previously been used to encode (i.e.,
understand) music. This will typically lead to a compact encoding of the new piece if it
contains material that is related to that in previously encountered music. However, after this
first stage, the global interpretation of all pieces known to the listener/analyst (including
the most recently interpreted piece) may no longer be as close to optimal as it could be.
In a second stage, therefore, the brain of the listener or analyst might carry out a more
computationally expensive “knowledge consolidation” process in which an attempt is
made to find a globally more efficient encoding of all music known to the individual.
This might, for example, occur during sleep (see Tononi and Cirelli 2014) and might
consist of a randomized process of seeking alternative encodings of individual pieces
that help to produce a more efficient global interpretation of the music known to the
individual.
On this view, music analysis, perception, and learning essentially reduce to the process
of compressing musical objects. This is, of course, an idealized model: for example,
in practice, a listener will not have internalized a model that can account in detail for all
the music they have previously heard. In other words, in reality, this learning process
would probably be based on rather lossy compression.
However, it is important to stress that, even though both the analyst and the listener
aim to find the shortest possible encodings of the music they encounter, they both
usually fail to achieve this. As Chater (1996) points out, “the perceptual system cannot, in
general, maximize simplicity (or likelihood) over all perceptual organizations. . . . It is,
nonetheless, entirely possible that the perceptual system chooses the simplest (or most
probable) organization that it is able to construct” (578). This is largely a result of the
limited processing and memory resources available to the perceptual system. For
example, we typically describe the structure of a piece of music in terms of motives, themes,
and sections, all of which are temporally compact segments, meaning that they are
patterns that contain all the events that occur within a particular time span. It could well
be that, for some pieces, a more parsimonious description (corresponding to a better
explanation) might be possible in terms of patterns containing notes and events that
are dispersed widely throughout the piece. However, listeners would normally fail to
discover such patterns because their limited memories and attention spans constrain
them to focus on patterns that are temporally compact (see also Collins et al. 2011).
The model just sketched can be applied to understanding the emergence of differences
between the ways that individuals understand the same piece. The model proposed in
the previous section consists essentially of a greedy algorithm8 that is used to construct
an interpretation for a newly encountered piece that minimally modifies an existing
“program” that generates descriptions of all the pieces in a particular context set. It was
proposed that this greedy approach might be supplemented by a computationally more
expensive process of consolidation that attempts to find a globally more efficient
encoding. Nevertheless, because such a consolidation process will not generally be
capable of consistently discovering a globally optimal encoding, the way that an individual
understands a given piece will generally depend not only on which pieces they already
know, but also on the order in which these pieces were encountered. This implication
could fairly straightforwardly be tested empirically.
A rather crude version of the foregoing model has been implemented in an algorithm
called SIATECLearn. The SIATECLearn algorithm is based on the geometric pattern
discovery algorithm, SIATEC, proposed by Meredith and colleagues (2002). The SIATEC
algorithm takes as input a set of points called a dataset and automatically discovers all the
translationally related occurrences of maximal repeated patterns in the dataset. If the
dataset represents a piece of music, with each point representing a note in pitch-time
space, then two patterns in this space related by translation correspond to two
statements of the same musical pattern, possibly with transposition. We say a pattern P is
translatable within a dataset D if there exists a vector, v, such that P translated by v gives a
pattern that is also in D. A translatable pattern is maximal for a given vector, v, in a
dataset D, if it contains all the points in the dataset that can be mapped by translation by v
onto other points in the dataset. The maximal translatable pattern (MTP) for a vector v
in a dataset D, which we can denote by MTP(v, D), can also be thought of as being the
intersection of the dataset D and the dataset D translated by –v. That is,
MTP(v, D) = D ∩ (D − v).   (6)
For each (nonempty) MTP, P, in a dataset, SIATEC finds all the occurrences of P, and
outputs this occurrence set of P. Such an occurrence set is called the translational
equivalence class (TEC) of P in D, denoted by TEC(P, D), because it contains all the patterns in
the dataset that are translationally equivalent to P. That is,
TEC(P, D) = {Q ⊆ D | Q = P + v for some vector v}.   (7)
SIATEC therefore takes a dataset as input and outputs a collection of TECs, such that
each TEC contains all the occurrences of a particular maximal translatable pattern.
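The following brute-force Python sketch illustrates the maximal translatable pattern of equation (6) on the twelve points of expression (4). It is an illustration only, and is far less efficient than the published SIATEC algorithm:

    def mtp(v, dataset):
        # MTP(v, D) = D ∩ (D − v): the points of D that are mapped onto other
        # points of D when translated by v.
        d = set(dataset)
        shifted = {(x - v[0], y - v[1]) for (x, y) in d}      # D − v
        return sorted(d & shifted)

    def all_mtps(dataset):
        # One MTP for every difference vector between an ordered pair of points.
        d = sorted(set(dataset))
        vectors = {(q[0] - p[0], q[1] - p[1]) for p in d for q in d if q > p}
        return {v: mtp(v, d) for v in vectors}

    D = [(1, 27), (2, 26), (3, 27), (4, 28), (5, 26), (6, 25), (7, 26), (8, 27),
         (9, 25), (10, 24), (11, 25), (12, 26)]               # the points of expression (4)
    print(mtp((4, -1), D))
    # The eight points that map onto other dataset points under translation by
    # v(4, −1); note that this MTP is larger than the four-point pattern A itself.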
An algorithm called SIATECCompress (Meredith 2013b, 2015, 2016) runs SIATEC on
a dataset, then sorts the found TECs into decreasing order of “quality.” Given two TECs,
the one that results in the better compression (in the sense of expressions (4) and (5),
discussed earlier) is deemed superior. If both TECs give the same degree of
compression, then the one whose pattern is spatially more compact is considered superior.
SIATECCompress then scans this list of occurrence sets and computes an encoding of
the input dataset in the form of a set of TECs that, taken together, account for or cover
the entire input dataset.
SIATECLearn runs SIATECCompress, but also stores the patterns it finds on each
run and will preferably reuse these patterns rather than newly found ones on
subsequent runs of the algorithm. Thus, when SIATECLearn is run on the twelve-point
pattern on the left in Figure 8.7, it “interprets” the dataset as being constructed from three
occurrences of the square pattern shown. This square pattern is therefore stored in its
Figure 8.7 Output of SIATECLearn when presented first with the dataset on the left and then
with the dataset on the right.
Figure 8.8 Output of SIATECLearn when presented first with the dataset on the left and then
with the dataset on the right.
“long-term” memory. When the algorithm is subsequently run on the ten-point dataset
on the right, it prefers to use the stored square pattern rather than any of the patterns
that it finds in this newly encountered dataset; it interprets the new dataset as containing
two occurrences of the square pattern along with two extra points.
Conversely, when SIATECLearn is first presented with the ten-point dataset, it
interprets the dataset as being composed from five occurrences of the two-point vertical line
configuration shown on the left in Figure 8.8. This pattern is then stored in long-term
memory, so that, when the algorithm is subsequently presented with the twelve-point
dataset, it interprets this set as consisting of six occurrences of this vertical line rather
than three occurrences of the square pattern. This very simple example illustrates how
the way in which objects are interpreted can depend on the order in which they are
presented.
Given the concept of a TEC, as defined in (7) earlier, we can define the covered set,
CS(T), of a TEC T to be the union of all the patterns in T. That is,
CS(T) = ⋃_{P ∈ T} P.   (8)
COSIATEC (Meredith et al. 2003; Meredith 2013b, 2015, 2016) is a greedy compression
algorithm based on SIATEC. The algorithm takes a dataset as input and computes a set
of TECs that collectively cover this dataset in such a way that none of the TECs’ covered
sets intersect. It also attempts to choose this set of TECs so that it minimizes the length
of the output encoding. The basic idea behind the algorithm is sketched in the
pseudocode in Figure 8.9.
As shown in Figure 8.9, the COSIATEC algorithm first finds the “best” TEC in the
output of SIATEC for the input dataset, S. The best TEC is the one that produces the best
compression. This means that it is the one that has the best compression factor, which is
the ratio of the number of points in its covered set (as defined in (8)) to the sum of the
number of points in one occurrence of the TEC’s pattern and the number of occurrences
minus 1. The reasoning behind this is that a TEC can be compactly encoded as an
ordered pair, (P,V), where P is one occurrence in the TEC and V is the set of vectors that
map P onto all the other occurrences of P in the dataset. The number of vectors in V is
therefore equal to the number of occurrences of P minus 1. The length of an in extenso
encoding of a TEC’s covered set in terms of points is simply |CS(T)| as defined in (8).
Each vector in V has approximately the same information content as a point in P, so the
length of an ordered pair encoding of a TEC, (P,V), in terms of points is approximately
|P|+|V|. The compression factor is the ratio of the length of the in extenso encoding to
the length of the compressed encoding. Thus, the compression factor of a TEC, T = (P,V),
denoted CF(T), can be defined as
CF(T) = |CS(T)| / (|P| + |V|).
Figure 8.9 Pseudocode of the COSIATEC algorithm.
COSIATEC(S)
    while S is not empty
        Find the best TEC, T, using SIATEC
        Add T to the encoding, E
        Remove the points covered by T from S
    return the encoding E
If two TECs have the same compression factor, then COSIATEC chooses the TEC in
which the first occurrence of the pattern is the more compact: the compactness of a
pattern is the ratio of the number of points in the pattern to the number of dataset points in
the bounding box of the pattern. The rationale behind this heuristic is that patterns are
more likely to be noticeable if the region of pitch-time space that they span does not also
contain many “distractor” points that are not in the pattern. These heuristics for
evaluating the quality of a TEC are discussed in more detail by Meredith and colleagues (2002),
Meredith (2015), and Collins and coauthors (2011).
As shown in Figure 8.9, once the best TEC, T, has been found for the input dataset, S,
this TEC is added to the encoding (E) and the covered set of T, CS(T), is removed from S.
Once the covered set of T has been removed from S, the process is repeated, with
SIATEC being run on the new S. The procedure is repeated until S is empty, at which
point E contains a set of TECs that collectively cover the entire input dataset. Moreover,
because the TEC that gives the best compression factor is selected on each iteration, E is
typically a compact or compressed encoding of S. COSIATEC typically produces
encodings that are more compact than those produced by SIATECCompress.
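The greedy loop of Figure 8.9 can be sketched in Python as follows. This is a simplified illustration, not the published implementation: the function siatec(points) is assumed to return TECs as (pattern, vectors) pairs, with the vectors mapping the pattern onto its other occurrences, and the compactness tie-break described above is omitted.

    def covered_set(pattern, vectors):
        # CS(T): the union of the pattern and all of its translated occurrences.
        points = set(pattern)
        for (vx, vy) in vectors:
            points.update((x + vx, y + vy) for (x, y) in pattern)
        return points

    def compression_factor(pattern, vectors):
        # CF(T) = |CS(T)| / (|P| + |V|).
        return len(covered_set(pattern, vectors)) / (len(pattern) + len(vectors))

    def cosiatec(points, siatec):
        remaining, encoding = set(points), []
        while remaining:
            # Greedily choose the TEC with the best compression factor for the
            # points that are not yet covered.
            pattern, vectors = max(siatec(remaining),
                                   key=lambda tec: compression_factor(*tec))
            encoding.append((pattern, vectors))
            remaining -= covered_set(pattern, vectors)
        return encoding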
Figure 8.10 shows the output of COSIATEC for a short Dutch folk song. The complete
piece can be encoded as the union of the covered sets of five TECs. In Figure 8.10, each
TEC is drawn in a different shade. The first TEC, drawn in red, consists of the
occurrences of a three-note, lower-neighbor-note figure. This TEC has the best compression
factor of any TEC for a maximal translatable pattern in this dataset. After these
three-note patterns have been removed from the piece, the next best TEC is the one drawn in
light green in Figure 8.10, namely the two occurrences of the four-note, rising scale
segment. The fifth TEC consists of the fourteen occurrences of a single unconnected point
in Figure 8.10. These are the points (notes) that are left over after removing the sets of
repeated patterns that give the best compression factor. This final set of “residual” points,
which cannot be compressed by the algorithm, is essentially seen by the algorithm as
being random “noise” that it cannot “explain.”
Figure 8.11 shows the analysis generated by COSIATEC for a more complex piece of
music, the Prelude in C minor (BWV 871) from book 2 of J. S. Bach’s Das Wohltemperierte
Klavier. Note that the first TEC (in red) generated by COSIATEC (i.e., the one that
results in the most compression over the whole dataset) is precisely the four-note
pattern shown in Figure 8.2, discussed earlier.
[Figure 8.10 plot: morphetic pitch (vertical axis) against time in tatums (horizontal axis), file NLB015569_01.mid.]
Figure 8.10 The set of TECs computed by COSIATEC for a short Dutch folk song, “Daar zou
er en maagdje vroeg opstaan” (file number NLB015569 from the Nederlandse Liederen Bank,
http://www.liederenbank.nl). Courtesy of Peter van Kranenburg.
[Figure 8.11 plot: morphetic pitch (vertical axis) against time in tatums (horizontal axis).]
Figure 8.11 Analysis generated by COSIATEC of J. S. Bach’s Prelude in C minor (BWV 871)
from the second book of Das Wohltemperierte Klavier (1742). Each set of pattern occurrences
(i.e., TEC) is displayed in a distinct shade of gray (see image on companion website which uses
colors). The first TEC generated, consisting of occurrences of the opening “V”-shaped motive
(indicated with triangles here and red on the companion website), is the one that has the highest
compression factor over the whole dataset. The overall compression factor of this analysis is 2.3,
and the residual point set, containing notes that the algorithm does not re-express in a compact
form, contains 3.61 percent of the notes in the piece (corresponding to 25 out of 692 notes).
In the introduction to this chapter, it was proposed that, when given two or more
different analyses of the same piece of music (or, more generally, musical object), it
may be possible to determine which of the analyses is the best for carrying out certain
objectively evaluable tasks. It is similarly possible to evaluate algorithms that compute
analyses by comparing how well the generated analyses allow certain tasks to be
performed.
In a recent paper (Meredith 2015), the point-set compression algorithms, COSIATEC
and SIATECCompress, were compared on a number of different tasks with a third
greedy compression algorithm proposed by Forth and Wiggins (2009) and Forth (2012).
The algorithms were evaluated on three tasks: folk song classification, discovery of
repeated themes and sections, and discovery of fugal subject and countersubject entries.
Although no obvious correlation was found between compression factor and
performance on these tasks, COSIATEC achieved both the best compression factor (around
1.6) and the best classification success rate (84%) on the folk-song classification task. The
pattern-discovery task on which the algorithms compared in this study were evaluated
consisted of finding the repeated themes and sections identified in the JKU Patterns
Development Database, a collection of five pieces of classical and baroque music, each
accompanied by “ground-truth” analyses by expert musicologists (Collins 2013). The
output of each algorithm was compared with these analyses. I have argued (Meredith
2015, 263–265) that these “ground-truth” analyses are not satisfactory for at least two
reasons: first, the musicologists on whose work the ground-truth analyses are based did
not consistently identify all occurrences of the patterns that they considered to be worth
mentioning; and second, there are patterns that are noticeable and important that the
Figure 8.12 Examples of noticeable and/or important patterns in Bach’s Fugue in A minor
(BWV 889) that were discovered by the algorithms tested by Meredith (2015) but were not
recorded in the “ground-truth” analyses in the JKU Patterns Development Database used for
evaluation. Patterns (a), (b), and (d) were discovered by COSIATEC. Patterns (c) and (d) were
discovered by SIATECCompress.
musicologists who created the ground-truth analyses failed to mention. Indeed, the
tested algorithms discovered not only structurally salient patterns that the analysts
omitted to mention but also exact occurrences of the ground-truth patterns that are not
recorded in the ground-truth analyses. Figure 8.12 shows some examples of structurally
important patterns in a fugue by J. S. Bach that were not recorded in the “ground-truth”
analyses used for evaluation.
Notwithstanding the foregoing methodological issues with this task, it was found that
SIATECCompress performed best on average, achieving an average F1 score of about
50 percent over the five pieces in the corpus. However, COSIATEC achieved F1 scores of
71 percent and 60 percent on the pieces by Beethoven and Mozart, respectively; and
Forth’s algorithm performed substantially better than the other algorithms on a fugue
by Bach. There was therefore no algorithm that consistently performed best on this task.
On the fugal analysis task, the algorithms performed rather less well than on the
other evaluation tasks. COSIATEC and SIATECCompress achieved a mean recall of
around 60 percent over the twenty-four fugues in the first book of J. S. Bach’s Das
Wohltemperierte Klavier. However, COSIATEC’s precision on this task was much
lower (around 10%). Overall, the best-performing algorithm was SIATECCompress,
which achieved an F1 score of around 30 percent on this fugal analysis task.
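For reference, the precision, recall, and F1 scores quoted in this section can be computed as follows in the simplest case of exact matches between discovered and ground-truth patterns; the published evaluations use more elaborate variants of these measures:

    def precision_recall_f1(discovered, ground_truth):
        # discovered, ground_truth: collections of patterns, each pattern hashable
        # (for example, a frozenset of (time, pitch) points); exact matching only.
        discovered, ground_truth = set(discovered), set(ground_truth)
        true_positives = len(discovered & ground_truth)
        precision = true_positives / len(discovered) if discovered else 0.0
        recall = true_positives / len(ground_truth) if ground_truth else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1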
In the study just discussed, the performance of the SIA-based compression
algorithms on the folk-song classification task was compared with that of the
general-purpose text compression algorithm, bzip2 (Seward 2010). On this task, bzip2 achieved a
much higher average compression factor (3.5) but a much lower classification success
rate (12.5%) than the SIA-based algorithms. At first sight, this might be interpreted as
evidence against the basic hypothesis that shorter descriptions correspond to better
explanations. In a later study, Corentin Louboutin and I therefore explored in more
Applying a Compression-Driven
Approach to the Analysis of
Musical Audio
The main concern in this chapter has been with explaining “musical objects” by
discovering losslessly compressed descriptions of these objects. The basic scheme is that one
takes an in extenso encoding of such an object and then attempts to find a short
algorithm that generates that in extenso encoding as its only output. The encoding could be
on any level of granularity and could represent any quantity of music in any possible
domain in which a musical object might be manifested—for example, an image of a
score, a symbolic encoding of a score, an audio recording or a video recording. In the
examples and evaluations presented above, the focus has been on musical objects that
are symbolic encodings of scores. In such cases, one can realistically hope to be able to
produce losslessly compressed descriptions in which we are required to consider only a
very small proportion of the information in the object to be “random” or “noise.” On the
other hand, if one were concerned with explaining the structure of a digital audio
recording of a performance of a piece produced by human performers playing from a
score, then one would expect the compression factors achievable to be lower and one
would expect to have to be satisfied with considering a larger proportion of the
information in the object as being “noise.” This is because the detailed structure of such a
recording depends not only on the score from which the players are performing, but also on
many other factors that are perhaps harder to model, such as the acoustics of the space in
which the recording was made, the precise nature of the instruments used and, most
importantly, the players themselves and their own particular ways of interpreting the score.
Summary
In this chapter, I have proposed that the goal of music analysis should be to find the
“best” ways of understanding musical objects and that two different analyses of the same
musical object can be compared objectively by determining whether one of them allows
us to more effectively perform some specific set of tasks. I have also explored the
hypothesis that, for all tasks that require an understanding of how a musical object is
constructed, the best ways of understanding that object are those that are represented by the
shortest possible descriptions of the object. I have briefly outlined how this hypothesis
relates to the theory of Kolmogorov complexity and to coding theory models of
perception. I have also briefly sketched how these ideas can form the basis of a theory of
musical learning that can potentially explain aspects of music cognition such as individual
differences. Finally, I briefly described the COSIATEC point-set compression algorithm
and reviewed the results of some experiments in which it and other related algorithms
have been used to automatically carry out musical tasks such as folk-song classification
and thematic analysis. The results achieved in these experiments generally support the
idea that the knowledge necessary to be able to successfully carry out advanced
musicological tasks can largely be acquired simply by compressing in extenso representations
of musical objects. Moreover, some of the results clearly indicate a correlation between
compression factor and success on musicological tasks. However, these experiments
also show that performance on such tasks depends heavily both on the specific types of
redundancy exploited by the compression algorithm used to generate the compressed
encodings and on the precise form of the in extenso representations used as input to
these compression-based learning methods.
Acknowledgments
The work reported in this chapter was carried out as part of the EU collaborative project,
“Learning to Create” (Lrn2Cre8). The project Lrn2Cre8 acknowledges the financial support of
the Future and Emerging Technologies (FET) programme within the Seventh Framework
Programme for Research of the European Commission, under FET grant number 610859.
Notes
1. https://www.midi.org/specifications
2. See, for example, chapter 25 of Book 1 of Aristotle’s Posterior Analytics (Bouchier 1901, 66).
3. Kolmogorov introduced the field of nonprobabilistic statistics at a conference in Tallinn,
Estonia, in 1973 and in a talk at the Moscow Mathematical Society in 1974 (Li and
Vitányi 2008, 405). Unfortunately, these talks were never published in written form.
4. See Simon and Sumner (1968, 1993) for a similar use of the term “in extenso” in the context
of music representations.
5. For a more technical discussion of two-part codes, see Vitányi and Li (2000, 447).
6. See Temperley (2007, chap. 3) for a model of rhythm and meter perception based on the idea
that simpler meters are more probable and events are more likely to occur on stronger beats.
7. This model was originally described by Meredith (2012c, 2013a).
8. A greedy algorithm attempts to solve an optimization problem by always choosing the locally
best option at each decision point in the construction of a solution. This does not always
produce a globally optimal solution, but for some problems it does (e.g., activity selection,
the construction of a Huffman code). For more details, see Cormen and colleagues
(2009, 414–450).
References
Bouchier, E. S. 1901. Aristotle’s Posterior Analytics. Oxford: Oxford University Press.
Burrows, M., and D. J. Wheeler. 1994. A Block-Sorting Lossless Data Compression Algorithm.
Palo Alto, CA: Digital Systems Research Center (now HP Labs). Technical Report SRC 124.
Chaitin, G. J. 1966. On the Length of Programs for Computing Finite Binary Sequences.
Journal of the Association for Computing Machinery 13 (4): 547–569.
Chater, N. 1996. Reconciling Simplicity and Likelihood Principles in Perceptual Organization.
Psychological Review 103 (3): 566–581.
Collins, T. 2013. JKU Patterns Development Database. http://tomcollinsresearch.net/research/
data/mirex/JKUPDD-Aug2013.zip. Accessed January 21, 2016.
Collins, T., R. Laney, A. Willis, and P. H. Garthwaite. 2011. Modeling Pattern Importance in
Chopin’s “Mazurkas.” Music Perception 28 (4): 387–414.
Cormen, T. H., C. E. Leiserson, R. L. Rivest, and C. Stein. 2009. Introduction to Algorithms.
3rd ed. Cambridge, MA: MIT Press.
Deutsch, D., and J. Feroe. 1981. The Internal Representation of Pitch Sequences in Tonal Music.
Psychological Review 88 (6): 503–522.
Fano, R. M. 1949. The Transmission of Information, Technical Report No. 65, March 17.
Cambridge, MA: Research Laboratory of Electronics, MIT.
Forth, J. 2012. Cognitively-Motivated Geometric Methods of Pattern Discovery and Models of
Similarity in Music. PhD thesis, Department of Computing, Goldsmiths, University of
London.
Forth, J., and G. A. Wiggins. 2009. An Approach for Identifying Salient Repetition in
Multidimensional Representations of Polyphonic Music. In London Algorithmics 2008:
Theory and Practice, edited by J. Chan, J. W. Daykin, and M. S. Rahman, 44–58. London:
College Publications.
Bioacoustics
Imaging and Imagining the Animal World
Mickey Vallee
Introduction
About 100 kilometers north of the Alberta Oil Sands, hidden amid the burned and
twisted shards of lumber, netted over freshly growing grass and sprouting pine, a
common nighthawk nests on the ground, unseen by our eyes despite our attempts. Its own
speckled pattern mingling with the black, ash, and tan of the land, the nighthawk sits
unseen and unheard until one of the biologists whispers to the rest of us, “got it.” I can
see it: its form emerges from its surroundings under my own eyes like an image that
grows from a magic eye test, an organism whose home only a year ago was engulfed in a
700,000-hectare forest fire. It lies there like a taxidermy prop, but breathing rapidly,
seemingly unaware that we see it, its solid black marble eyes glistening with the silence
of a life full of wait. It seems designed for this terrain, and even as I see it, I cannot say in
all comfort that I can fix it under my gaze—my vision cannot hold it in all certainty,
which the nighthawk uses to its advantage when it explodes from the ground where it
nests (when it senses that its young are under threat, the common nighthawk produces a
loud “wing-clap” as it arcs dramatically through the air away from its makeshift nest—
they do not nest in trees). This one lands in front of us, clumsily, with its plumage puffed
out and its wing bent backward, meters from its nest, in an attempt to distract us from its
young; one of the researchers slowly positions his iPhone overtop two chicks left on the
ground and takes a picture, and we leave hastily to let the mother return to her pair.
The common nighthawk is difficult to sight: it is nocturnal, it blends with its
environment, it is notoriously elusive, and it is relatively quiet save for a nasal peent! in flight,
and a sonic wing-clap as it dives (Viel 2014). Sighting nighthawks, especially while they
are nesting, is painstaking work, which is why biologists are turning increasingly to
bioacoustics technologies for the purposes of identification and location exercises: an
animal’s sonic emissions serve as a reliable route of access to their location and their
patterns of behavior (Laiolo 2010). But what to do with these sonic emissions, and
how these emissions play into the imagination of science, is my research focus in
this chapter.
In the context of the biological sciences, researchers who use bioacoustics are interested
in animals’ sounds in their ecological contexts and what those sounds might indicate
regarding the security of biodiversity and concerns over ecological depletion (this
context-based program of research is what some call “ecoacoustics”; see Sueur and
Farina 2015). Bioacoustics researchers use a variety of sound equipment to gather and
analyze data: durable autonomous recording units (ARUs) can track years of information
from within one location (Hutto and Stutzman 2009); backpack microphones strapped to
animals’ backs will track their sonic patterns as they move through space (Gill et al. 2016);
data are uploaded onto “listeners” that align the sounds with their appropriate species
(Schroder et al. 2012); and such results are uploaded for international research centers
and teams.
This last point about the digital community of nighthawks is an example of a
transacoustic community. Barry Truax defines an acoustic community as an “information
rich” system that uses “acoustic cues and signals” that play a “significant role in defining
the community spatially, temporally in terms of daily and seasonal cycles, as well as
socially and culturally in terms of shared activities, rituals, and dominant institutions”
(2001, 66). But here, a transacoustic community transcends the immediacy of place,
transgresses the boundaries of immediate community, transforms data into
international research centers, transcends the visual with auditory analysis that has a better and
higher definition, and transposes from the audible into the visible. Because the sharing
is to access signs of population depletion and biodiversity loss, imagination is a scientific
tool for intervening in avoidable and undesirable futures. Indeed, I am not so much
interested in sound here as I am in sounding as a research method. Thus, in being
interested in how researchers are implicated in the infrastructures they spontaneously
design, I work toward inverting that infrastructure, with an eye to the argument that
such encounters are almost entirely reliant on a specific form of imagination where the
image of sound overrides the evanescence that is so often ascribed to it. Throughout the
chapter, I will attempt to “open the black box” of bioacoustics by exploring the notion
that contemporary bioacoustics encourages hearing without listening: that is, when
emerging sound technologies are capable of detecting small variations in sound, they
register at a much higher accuracy than does human listening; the scientists involved in
this research must develop a technical mastery at species identification, but one that is
visually instead of audibly grounded. Characteristic of other sound-based research
units, bioacoustics researchers use sound not to understand the nature of the sonic but
instead as a means to find palpable solutions to pressing social and environmental
problems using the sonic as a mode for imaging. Bioacoustics researchers are not
intrigued by sound as an object so much as a method.
Since sound technologies and their storage devices have become (1) digitized and
(2) automated, they are capable of capturing the sounds of global populations in real time.
Sound has become an essential methodological device for identifying species, as well as
tracking the polyphonic and polyrhythmic complexities of the various landscapes that
change across ecosystems. The open conversation between disciplines and
with the public requires a more flexible and porous usage and definition of emerging sound
technologies, intended to educate members of the public in assisting with research projects.
If Henry David Thoreau celebrated the “warbling of the birds ushering in the day” (1885, 35),
and this is certainly not an antiquated attitude toward birdsong, researchers
today are more accepting of the fleeting nature of sound as “arrangements of charged
particles in the semiconductive materials of solid state “flash” memory, or the magnetic
surfaces of hard drives, tapes, and minidiscs” (Gallagher 2015a, 569). “Common
practices include,” the geographer and sound recordist Michael Gallagher writes elsewhere,
“making field recordings, including the transduction of inaudible vibrations using
devices such as hydrophones and contact microphones; making compositions from
field recordings, and distributing these via CDs, MP3s, vinyl, radio or online platforms
such as weblogs, digital audio maps and podcasts; site-specific performances and
installations; and audio walks designed for listening on portable devices whilst moving
through a particular environment” (2015b, 469).
To elucidate the specific complexity of the transacoustic community proposed, I aim
in this chapter to clarify the general complexity that sound retains throughout creative
imaging processes. Sounding, I argue, has the potential of producing interdisciplinary
and theoretically innovative knowledge that seeks new virtual spatializations of the
earth. I proceed with a description of the historical context through which bioacoustics
became a research focus for those in the biological sciences. I am especially interested in
the move from individual specimens to whole species in their ecological contexts.
I conclude with a brief discussion of the transacoustic community, borrowing from Jakob
Johann Baron von Uexküll’s notion of the Umwelt; Uexküll explored the making of
worlds from a theoretical biological perspective, his ideas about organism self-preservation
deriving from a decidedly antimechanistic perspective that asks us to attend to an
organism’s inner and external sense of events as the habituation of the codes and information
an organism uses to inhabit an environment.
Sounding Animals
well-known performances for and with a variety of species, live and recorded, continues
this tradition of linking aesthetic, sound, sense, and imagination (and, in Rothenberg’s
case, collaboration). Most of these and other examples have relied on an aesthetic of
listening, which the nineteenth-century music critic Paul Scudo once referred to as
“the divine language of sentiment and imagination” (Scudo, cited in Johnson 1995, 272).
But in this chapter I am interested in the type of imagination belonging to the biological
sciences. Before the mid-twentieth century, researchers in the biological sciences
centralized listening in their data collection and analyses, transcribing the sounds of
animals for the purposes of discovering keys to biodiversity, mating behavior, and the
anticipation of biological change. Because recording devices were too cumbersome, one
had to rely on having a musical ear to transcribe sound in the form of onomatopoeia.
Unconvinced by this method, Albert R. Brand at Cornell University’s Ornithology
Research Lab had attempted to capture bird song with “sound film” (used otherwise for
Hollywood “talkies”), which captured both the image and the sonic emissions of birds.
This he considered a more objective means of capturing sound. Brand had written:
[R]arely do two observers hear the same song in exactly the same way. The song is not
noticeably different when produced by varying members of the species, but by the
time the sound waves have affected the listeners’ hearing apparatus, and have been
transferred by the nerves to the brain, and interpreted by that organ, it has created an
entirely different sensation and impression on each individual listener. (1937, 14)
Although they were grounded in visual images and movement, Brand’s films still
relied on listening in real time to the sounds of animals. However, by the mid-twentieth
century, the spectrogram was introduced to ornithologists to visualize sonic information.
Spectrograms had a significant impact on the democratization of access to sonic data
and analysis; scientists no longer needed a musical ear but rather technical know-how.
Spectrograms contributed directly to the democratization of sound analysis, data
collection, and participation in science throughout the late twentieth century. Today
the spectrogram image (and its variations) is the most common image of an animal’s
utterance, an image that is inseparable from a new kind of work that would free up the
scientist from the burden of listening and instead place the attention on, first, placing
the equipment and, then, using it to capture the animals’ sounds. The researcher, now
liberated from their own ear, worked with the technology that could pick up the
transmission of information. Better yet, the spectrogram was equipped with a capacity for
accurate visualization, given that the vibrations from the needle on the machine would
be etched into a paper surface. The spectrogram caught not just the sound of the
organism but the whole situation within which it was situated; this transcription
of the atmosphere, of its world, allowed researchers to visualize the polyrhythmic
complexities of its environment, including its communications with other species.
The spectrogram demanded a unique, visually grounded art of its own: calligraphy,
traced on paper, meticulously teased out the upper portion of the recorded sound so as
to discover an arc represented through space (frequency) and time (duration). Birdsong
would no longer be described using words, or onomatopoeia for that matter, but had to
have a direct inscription of the ecology in which the organism was situated onto pages,
using ink and paper. These points of contact were less contiguities, mediations, or
transductions than they were direct feelings, motivations, and movements
onto the page, tracing the otherwise invisible (but real) contours of the bodies
responsible for producing them. Calligraphy thus demanded a particular visual detail of
sonic information that reduced the need for the humans involved in producing them to
listen attentively and instead to trace the contours of a sonic inscription.
The spectrogram was adept at picking up certain important information: the
environment and the ecology in which the animal was situated, whereas the onomatopoeia
transcriptions isolated song from context. What became more important, then, was less
the taxonomy of the animal than what the animals’ sounds could tell researchers about
their surroundings and their environments: how they were situated within a community
or a sound ecology.
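For readers curious about what such an image involves today, a spectrogram can be computed from a digital recording in a few lines of Python using standard short-time Fourier analysis. This is a generic sketch, not the workflow of the researchers discussed in this chapter, and the file name is hypothetical:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile
    from scipy.signal import spectrogram

    rate, samples = wavfile.read("nighthawk_example.wav")     # hypothetical recording
    if samples.ndim > 1:
        samples = samples.mean(axis=1)                         # mix stereo down to mono

    freqs, times, power = spectrogram(samples, fs=rate, nperseg=1024)
    plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-12))    # power in decibels
    plt.xlabel("Time (s)")
    plt.ylabel("Frequency (Hz)")
    plt.show()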
Bioacoustics researchers today are interested almost principally in method, technique,
and representation, owing to the rapidly expanding datasets to which they have access.
While some use multiple ARUs to triangulate the position of organisms and their return
to particular locations, others use sound to measure the amount of masking caused by
anthrophonic intrusion (see Berkaak, volume 1, chapter 15, for debates surrounding
cultural heritage). A vast array of representational methods and knowledge syntheses are
available to those interested in bioacoustics, moving well beyond the “manipulation and
playback” model of acoustic ecology, or the GIS (Geographic information system)-based
representations of landscape ecology.
Bioacoustics research is also a response to the uncertainty and anxiety around
biodiversity loss, on a global scale, and the role that anthrophonic interference is having on
the balance of ecosystems. Some research teams use “noise mapping,” reading city
decibel levels against a color-coded legend that identifies noise “hot-spots”
(Hawkins 2011). With a supposed 83 percent of the land in the United States being about
two-thirds of a mile from a road, conservation officers team up with acousticians and
sound ecologists to reduce the presence of helicopters, planes, and other means of
transporting especially tourists into natural landscapes (Powers 2016). Such high levels of
noise have inspired Gordon Hempton to locate the “quietest square inch on earth” in
(ironically enough) the United States that, he claims, has no anthrophonic interference
whatsoever for up to 20 minutes at a time (Berger 2015).
Such searches for quietude against the din of mobile humanity and expanding
urbanization have also resulted in conservationist measures to select habitats and in
the use of sonic technologies as a geoengineering strategy. Some researchers have
taken to coordinating new and better soundscapes by masking anthrophonic interference
with loudspeakers planted in natural settings, intended to "give back" the soundscape
(Berger 2015). Others use multiple recording technologies to triangulate the exact
location of species so as to expedite conservationist interventions for those deserting
their natural habitat (Donaldson 2016). (Triangulation is thus the creation of a virtual
space by means of sound, a tracing of the contours of what a place might come to
represent.) Playback, which will be discussed further below, is used to give voice back
to place.
The bioacoustics researchers with whom I worked in Northern Alberta erected mist-nets
deep in the forest, in some places accessible only by bike or all-terrain vehicle, laced with
ghetto blasters emitting nighthawk calls in order to bait and capture the birds in flight. Once
captured, the nighthawks were placed into small aluminum tubes, returned to the
research station on the gate of the researchers' pickup truck, measured, and equipped with
a small backpack microphone and a GPS device. The data subsequently recorded by the
microphone is uploaded to international research networks and measures the
"sound-event" of the organism (its heartbeat, its wing pace, its calls, etc.) against the
"sound-scene" of its habitat (the geophonic, anthrophonic, and biophonic data that
informs the backdrop against which the sound-events unfold). It was not so much the
results of this research that interested me as the infrastructural labor that went into the
capture of data. This infrastructure is consistent with the "essence of mediations" that
Bruno Latour describes as crossing the line between signs and things. Latour writes:
To be sure, we no longer portray scientists as those who abandon the realm of signs,
politics, passions and feelings in order to discover the world of cold and inhuman
things in themselves, “out there.” But that does not mean we portray them as talking
to humans only, because those they address in their research are not exactly humans
but strange hybrids with long tails, trails, tentacles, filaments tying words to things
which are, so to speak, behind them, accessible only through highly indirect
and immensely complex mediations of different series of instruments . . . Instead
of abandoning the base world of rhetoric, argumentation, calculation—much
like the religious hermits of the past—scientists began to speak in truth because
they plunge even more deeply into the secular world of words, signs, passions,
materials, and mediations, and extend themselves even further in the intimate
connections with the nonhumans they have learned to bring to bear on their
discussions. (1999, 96–97)
After one of the researchers with whom I worked strapped a backpack microphone
between the wings of a captured nighthawk, she released it only to discover that it dropped
like a stone; the microphone was not properly installed and was causing a disequilibrium
in the nighthawk's capacity for flight. Swiftly she moved to its writhing body on the
ground and removed the device with ease. As the microphones take a great
deal of labor and care to install, it comes as a disappointment when they do not work.
Sound is connected to vibration, and to the kind of ethnographic fieldwork in which sound
is the goal; but when sound is instead the method, the goal in this case being conservation,
what does that tell us about the philosophy of soundworlds and the worlds between
sounds? Answering this requires a turn to the transacoustic community as a way of imagining a
“society on display.”
The common nighthawk, the bioacoustics researchers, the technologies through which
they are measured and made sense of, globally and locally, constitute the image of a
transacoustic community. The transacoustic community is bound by an elaborate
recording/playback apparatus that is not necessarily reducible to the listenable but
expands more generally into recording as a technical and cultural set of images. The
transacoustic community is itself an image central to contemporary debates and dis-
cussions around multispecies encounters: simply, that entities open through their surfaces
onto other entities (these openings are precisely the point of interrogation for bioacou-
stics researchers). There are variegated routes of access to such a conclusion (that enti-
ties have edges that open onto other entities), a few of which I have set out to explore in
this chapter. Of course, there are many ways of doing bioacoustics research, but all these
ways converge on the creation, maintenance, and breaking through of an entity’s
contained space through its sonic emissions; technological assemblages belonging to
bioacoustics researchers are intended to create new images and imaginations for how
these breaches are done.
Entities sound. And insofar as they sound, they make up their worlds. But since entities
sound out to other entities, it is insufficient to claim that the worlds to which these entities
belong are contained. Thus, the notion that a world is not self-contained, but rather
porous and protean, makes it necessary to interrogate the underlying function of worlds
as open, as in Uexküll's philosophy of the Umwelt, which translates literally as "world
around” (Brentari 2015, 75): it describes a connecting point between an organism’s
interior and exterior sense of events, and describes the habituation of the codes and
information an organism uses to inhabit an environment (von Uexküll [1934] 2010,
126–132). Elizabeth Grosz has expanded on this position, explaining that the human
world finds its equivalent in the professional life of the architect, who brings together
things at the demarcation of boundaries, heterogeneous expressions within a space that
are given meaning through those very heterogeneous expressions (2008, 48). Grosz
accounts for the famous tick that appears early in Uexküll’s book, A Foray into the Worlds
of Animals and Humans, an example Uexküll deploys against the physiological approach to
organisms as the sum of independent reflexes, arguing instead that ticks are embedded
in affective worlds. Ticks use the smell of chemicals, the heat of the sun, and the flesh of
the mammal to complete their worlds, once they have conjoined with another organism,
such as the mammal; their world is defined by their connection to another’s world. The
tick’s world is thus complete when it attaches to the edge of another world (the world of
the unaware mammal, for instance, whose own totality the tick is equally unaware of).
While the organism’s perceptive world is an inherited, species-specific conscious per-
ception of those objects an organism perceives as outside of itself (such as the sound-event
mentioned above, the peent! of the nighthawk), its operative world completes the organ-
ism’s immersion in an environment by merging with it (such as the sound-scene, the
flutter of moth wings the nighthawk dives into to consume) (see Brentari 2015, 99). At
stake here is thus, along with the maintenance of worlds as highly dependent on the
membrane of their milieus, the indeterminate nature of worlds as they forever open
onto and into other worlds. Elizabeth Grosz writes that such worlds are musical (a com-
mon audio-based resource for those constructing idealized collective experiences):
At what point does a boundary turn into a breaking point? Or, at what point does the
edge of one boundary merge into the edge of another? Uexküll establishes his position
against physiological accounts that would see organism and interorganism behaviors as
effects of stimuli reactions between different parts of an organism. This way of perceiving
organisms was isolationist, against environment, and against the notion that an organism
possessed agency in the construction of its world, and had to have agency in order to
converge with the edge of another world. But a cycle is never on its own; it is with other
cycles, and with others, significant and otherwise. Therefore, it is not the world that the
organism creates but its coconstructive capacity for going into other worlds. For every
melodic contour there is an edge of another's world.
It is sound that accounts for the breakthrough between the edges of worlds. Sound, in
the context of this chapter, is the individuation of energies from separate worlds in
transduction, which involves at once the assemblages of vocal tissue and environmental
biotic and abiotic movements, including all other biophonic, geophonic, and anthro-
phonic crystallizations. When software has been programmed to detect a variant in
sound, a transduction, it is registered with much higher accuracy than through organic
listening and pattern identification (though researchers often test-listen samples to ensure
accuracy), which introduces a technical or mechanical, in any case inorganic, listening
into the process of exploration and discovery. The individuation of longitudinal studies,
such as those that are multisited and multimicrophoned, which record more sound than
is possible to listen to organically, and which is never stable but always in flux, is the
crystallization of the node in a transacoustic community.
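By way of illustration only, the following sketch shows the simplest form such inorganic listening can take: a frame-by-frame energy threshold that flags candidate sound-events in a recording. The function, parameters, and test signal are invented for this example and are not the software used by the research teams described here; production systems rely on far more robust techniques such as spectrogram template matching and trained classifiers.

```python
import math

def detect_events(samples, rate, frame_ms=50, threshold=0.1):
    """Naive sketch of automated call detection: slice the signal into short
    frames, compute each frame's RMS energy, and flag frames that exceed a
    threshold. The machine, not the ear, performs the first pass of listening."""
    frame = int(rate * frame_ms / 1000)
    onsets = []
    for start in range(0, len(samples) - frame, frame):
        chunk = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / frame)
        if rms > threshold:
            onsets.append(start / rate)   # onset time in seconds
    return onsets

# Synthetic test signal: one second of silence with a short "call" at 0.4 s.
rate = 8000
signal = [0.0] * rate
for n in range(int(0.4 * rate), int(0.5 * rate)):
    signal[n] = 0.5 * math.sin(2 * math.pi * 800 * n / rate)

print(detect_events(signal, rate))   # [0.4, 0.45]
```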
Imagination, to imagine, to image; these variations on a term point to the slippage
(linguistically and otherwise) of image; there are countless philosophical explorations
of image and imagination, but a question of how images come to be made through
sound is quite another matter, and one that is often grounded in routine empirics
References
Baptista, L. F., and R. A. Keister. 2005. Why Birdsong Is Sometimes like Music. Perspectives in
Biology and Medicine 48 (3): 426–443.
Berger, E. 2015. Welcome to the Quietest Square Inch in the U.S. Outside. Outside Online.
https://www.outsideonline.com/2000721/welcome-quietest-square-inch-us. Accessed
September 30, 2017.
Brand, A. R. 1937. Why Bird Song Cannot Be Described Adequately. Wilson Bulletin 49 (1): 11–14.
Brentari, C. 2015. Jakob von Uexküll: The Discovery of the Umwelt between Biosemiotics and
Theoretical Biology. New York: Springer.
Chion, M. 1999. The Voice in Cinema. New York: Columbia University Press.
Cohn, J. P. 2008. Citizen Science: Can Volunteers Do Real Research? AIBS Bulletin
58 (3): 192–197.
Donaldson, A. 2016. National Network of Acoustic Recorders Proposed to Eavesdrop on
Australian Ecosystems. ABC News. http://www.abc.net.au/news/2016-07-11/soundscape-
ecology-could-track-environmental-changes/7587354. Accessed September 30, 2017.
Gallagher, M. 2015a. Field Recording and the Sounding of Spaces. Environment and Planning
D: Society and Space 33: 560–576.
Gallagher, M. 2015b. Sounding Ruins: Reflections on the Production of an “Audio Drift.”
Cultural Geographies 22 (3): 467–485.
Gill, L. F., P. B. D’Amelio, N. M. Adreani, H. Sagunsky, M. C. Gahr, and A. Maat. 2016.
A Minimum-Impact, Flexible Tool to Study Vocal Communication of Small Animals with
Precise Individual-Level Resolution. Methods in Ecology and Evolution 7 (11): 1349–1358.
Grosz, E. 2008. Chaos, Territory, Art: Deleuze and the Framing of the Earth. New York:
Columbia University Press.
Hall, M. 2016. Soundscape Ecology: Eavesdropping on Nature. Deutsche Welle (DW). http://
www.dw.com/en/soundscape-ecology-eavesdropping-on-nature/a-19304871. Accessed
March 12, 2017.
Hawkins, D. 2011. “Soundscape Ecology”: The New Science Helping Identify Ecosystems at
Risk. Ecologist: Setting the Environmental Agenda since 1970. http://www.theecologist.org/
investigations/science_and_technology/1171165/soundscape_ecology_the_new_science_
helping_identify_ecosystems_at_risk.html. Accessed September 30, 2017.
Hill, P. 2007. Olivier Messiaen: Oiseaux exotiques. Farnham: Ashgate.
Hutto, R. L., and R. J. Stutzman. 2009. Humans versus Autonomous Recording Units:
A Comparison of Point-Count Results. Journal of Field Ornithology 80 (4): 387–398.
Johnson, J. J. 1995. Listening in Paris: A Cultural History. Berkeley: University of California Press.
Kircher, A. (1650) 1970. Musurgia Universalis: sive Ars Magna, Consoni et Dissoni. Hildesheim
and New York: Olms.
Laiolo, P. 2010. The Emerging Significance of Bioacoustics in Animal Species Conservation.
Biological Conservation 143 (7): 1635–1645.
Latour, B. 1999. Pandora’s Hope: Essays on the Reality of Science Studies. Cambridge, MA:
Harvard University Press.
Mair, M., C. Greiffenhagen, and W. W. Sharrock. 2015. Statistical Practice: Putting Society on
Display. Theory, Culture and Society 33 (3): 51–77.
Mazumder, A. 2016. Pacific North West LNG Project: A Review and Assessment of the Project
Plans and Their Potential Impacts on Marine Fish and Fish Habitat in the Skeena Estuary.
Environmental Assessment Report, Government of Canada. Minister of Environment and
Climate Change.
Pijanowski, B. C., L. J. Villanueva-Rivera, S. L. Dumyahn, A. Farina, B. L. Krause,
B. M. Napoletano, et al. 2011. Soundscape Ecology: The Science of Sound in the Landscape.
BioScience 61 (3): 203–216.
Powers, A. 2016. Preserving the Quietest Places. The California Sunday Magazine. https://
story.californiasunday.com/quietest-places-on-earth. Accessed September 30, 2017.
Rothenberg, D. 2008. Thousand-Mile Song: Whale Music in a Sea of Sound. New York:
Basic Books.
Schröder, M., E. Bevacqua, R. Cowie, F. Eyben, H. Gunes, D. Heylen, et al. 2012. Building
Autonomous Sensitive Artificial Listeners. IEEE Transactions on Affective Computing
3 (2): 165–183.
Sueur, J., and A. Farina. 2015. Ecoacoustics: The Ecological Investigation and Interpretation of
Environmental Sound. Biosemiotics 8 (3): 493–502.
Thoreau, D. 1885. The Writings of Henry David Thoreau. Vol. 6. Boston: Houghton Mifflin.
Torino, L. 2015. You Can Actually Hear the Climate Changing. Outside. https://www.outsideonline.
com/2035701/you-can-actually-hear-climate-changing. Accessed September 30, 2017.
Truax, B. 2001. Acoustic Communication. Vol. 1. Santa Barbara: Greenwood.
Vartan, S. 2016. We’re Changing the Way the World Sounds: Noise Impacts Ecosystems
in More Ways than You Might Think. Mother Nature Network. http://www.mnn.com/
earth-matters/wilderness-resources/blogs/we-are-changing-way-world-sounds. Accessed
September 30, 2017.
Viel, J. M. 2014. Habitat Preferences of the Common Nighthawk (Chordeiles Minor) in Cities
and Villages in Southeastern Wisconsin. PhD thesis, University of Wisconsin-Milwaukee.
von Uexküll, J. J. Baron. (1934) 2010. A Foray into the Worlds of Animals and Humans, with A
Theory of Meaning. Translated by J. D. O’Neil. London: University of Minnesota Press.
chapter 10
Musical Notation as the Externalization of Imagined, Complex Sound
Henrik Sinding-Larsen
Introduction
At one concrete level, this chapter is about the innovation of musical notation and how
this tool for the description of sounds affected the way new music was imagined, per-
formed, and socially organized. But there is also another, more theoretical aim: to
see how this case of imagining and describing sonic qualities and patterns can inform
and be informed by a general theory on the emergence of complexity as a result of new
tools for storing, transmitting, and processing information. A key concept in my work
with these topics is externalization. The theoretical aim of the chapter is to explore the
insights we may gain by analyzing musical notation as a case of externalization of sound
or patterns in sound. A subtheme within this endeavor focuses on imagination, and how
imagination can be supported by externalizations. Obviously, a chapter of this size
cannot provide even a brief overview of the history of musical notation
and its consequences. Thus, only selected contemporary and historical examples will be
dealt with to the extent that they serve the wider and theoretical aim.
Because of this composite aim, the text moves between quite different levels of empirical
detail and theoretical abstraction. It also draws on various academic disciplines.
Before digging into the more theoretical and conceptual issues, I will set the scene
by describing some cultural events where music was important and where musical
notation played quite different roles. The cases are built on concrete events with personal
participation as well as reflections based on other relevant events and sources. The aim is
to highlight differences that are related to musical notation as a tool for the description
of sounds.
A Symphonic Concert
to trapeze artistry without a safety net: Will someone play or sing out of tune? Will
someone miss the timing? Will each note be sufficiently distinct? Will someone play a
wrong note? Will the sum of artistic efforts match the expectations of critics? With such
premises, it is not surprising that a sense of precariousness and nervousness may be
prominent during performance with a corresponding feeling of relief when the concert
ends without flaws.
However, not every dimension of symphonic music is equally complex. If one
masked the variations in pitch, intensity, timbre, and orchestration and listened only
to the rhythmic aspects of a classical concert, then much of so-called advanced
symphonic music would be rather simple, if not boring, judged by the ideals of, for
instance, jazz or a well-played traditional fiddle tune intended for dancing. Compared
to popular music, the “romantic freedom,” so prominent in much of classical music,
reflects an inversion of priority between the melodic and rhythmic dimensions of
music. In most popular music, a regular pulse and tempo is to a larger extent treated
as an imperative premise on top of which the melody, harmony, and other pitch-
based effects can unfold. In classical music, in particular in the romantic period,
cadences but also other, local melodic events, even a single note, might trigger emo-
tional responses that “demand” more time: time that is not taken from the duration
of other notes in the same measure but that results in a net slowing down of the
tempo. The tempo is, in these cases, treated as an expressive parameter subordinate
to the “needs” of the melody or harmony. Musical “needs” of the melody and har-
mony can “with impunity” override and both accelerate and retard the integrity of a
regular pulse. Deviating from the regularity of pulse and tempo may occur in many
musical genres, also within popular music. Generally, it is most easily practiced and
achieved by a soloist. To achieve the romantic kind of “freedom” with a large orches-
tra is generally very difficult without notation-based instruction and a conductor
during performance.
In popular music, an ideal is often to extensively challenge the main beat through
subtle, improvised syncopations and other off-beat, rhythmic effects while adhering
even more strictly to the regularity of the overall pulse. This is an element in what is
called "groove," a phenomenon that is hard or impossible to capture with musical
notation (see Danielsen, this volume, chapter 29). As a contrast to this ideal, it is along the
dimensions of pitch (melody, harmony), timbre (instrumentation, orchestral texture),
and intensity (the overall, dynamic “narrative”) that we find most of the complexity of
symphonic music (in addition to agogics and rubato mentioned earlier). The aesthetics
of classical music is often expressed in disciplined, large-scale, hierarchically organized
complexity most of which is impossible to achieve without musical notation. Notation
in this context is, to a large degree, indispensable both for the music’s conception (imagi-
nation) and performance. Nevertheless, within these strict frames, set by the composer’s
notation, uniqueness and creativity are highly valued. Similar values also permeate
modern complex society in many other domains. For example, laws and contracts could
be thought of as externalizations or notation systems facilitating complex economic and
social processes.
A yoik—Complexity
on a Different Scale
It is a fruitful endeavor to compare a symphonic work with a very different genre: the
traditional yoik, the song tradition of the indigenous Sami reindeer herders in the northern
regions of Scandinavia. Yoik has shamanistic origins and was tra-
ditionally only performed by a single singer in small settings without any instruments or
at most a hand-held drum played by the singer. The melodic material and intervallic
range were very limited but could be extensively repeated—on certain occasions until a
state of trance was reached. Within these constraints, there existed a huge variation in
subtle qualities of the voice including countless pitch degrees outside those captured by
a diatonic scale and traditional musical notation. Rhythmically, a yoik alternates between a
relatively steady pulse and freer rhythms; at any moment it is susceptible to pauses for
breathing followed by restarts with no reconnection to the previous pulse (Graff 2007).
This organization of time is incompatible with long, elaborate melodic developments.
However, the range of possible qualities of the voice in yoik by far exceeds the acceptable
ones for a classically trained voice. The pitch is often mixed with expressive guttural
sounds or more relaxed, speech-like qualities which make the pitches less fixed and
“pure” and thus less combinable into polyphonic complexity.
The social and cultural values traditionally communicated through yoik are quite
different from those of a symphony orchestra. The traditional Sami society was small
scale, and personal relations to both humans and animals constituted an important part
of the social "glue" keeping society integrated. This represents a con-
trast to the society that produced symphonic music where written laws and contracts
had replaced much of the relational integration that characterized smaller scale societies.
An important function of yoik was to describe or confirm concrete relations. A yoik can
be descriptive of persons, animals, and landscapes, not as externalized descriptions of
these entities, but by performing the yoik as a kind of “speech act” or “song act” which
directly connects the singer with the “described.” An important genre of yoik is called
person yoiks. The Sami singers insist that a person yoik is not about a person; it is that
person. To some extent, a similar logic is operative vis-à-vis animals. A famous yoik
about a wolf chasing a reindeer is modeled on the sounds of howling wolves.1 The most
important intervals of the yoik's main motif are a fourth and a fifth. In the section of
the yoik "describing" the wolf's final attack on the reindeer, the fourth is, in a gliding way,
pushed toward the tritone, a particularly dissonant interval known in medieval times as
Diabolus in Musica. One possible function of the wolf yoik is to connect with the
wolf and, in that way, obtain some magic control over this dangerous predator.
Yoiks have been notated by ethnomusicologists with classical musical notation, but
only a few of the important elements in yoik are captured by this tool of description.
An example that could reveal other and subtler contrasts to a symphonic concert
would be sacred music echoing the congregational chant as it may have sounded and
functioned in the cathedrals and monasteries of medieval Europe at the time just before
some of the basic elements of modern musical notation were invented around the
eleventh century ce.
The suitability of breathing together as a metaphor for the less externalized
process of synchronization was corroborated by a story a Norwegian singer of Gregorian
chant told me about a workshop he had attended.2 The leader of the workshop, a member of
a renowned ensemble of early music, instructed the workshop participants on how to
synchronize in the spirit of early music. The goal of his exercise was to make all the par-
ticipants start to sing in full synchrony without any prior counting or visual cues, in total
blackness. The only way to achieve this was to listen to each other’s breathing, synchro-
nize the breath, and then start singing. The deeper aim of the exercise was to achieve a
relevant state of mutual attentiveness for performing music closer to an oral or—in my
terminology—a less externalized tradition.
Of course, many other factors contributed to my different experiences of sounds and
music between the mass in Krakow and the symphonic concert. In the mass, there was
no separation between performers and audience. Everyone sang the same monophonic
song except for the priest, who had a more elaborate textual part. None of the sections
in the chant was exceedingly complex. The main values that were celebrated were less
about coordinated hierarchy and excellence and more about community and inclusion
through a shared practice. Also in ecclesiastic settings, hierarchy and excellence may be
valued. But in the Dark Ages, before notation and the splendors of Gothic polyphony,
the sacred music in churches and monasteries was less about the display of artistry and
more about community and participation (Saulnier 2009). This was also reflected in the
more modest complexity of the monophonic Gregorian chant.
To understand what happened to music between the period of early Gregorian chant
and the modern symphony orchestra, it can be useful to dig deeper into the relationship
between notation and externalization.
Humans had imagined and performed music for a very long time without notation. The
purportedly oldest musical instrument is a flute made of mammoth ivory found in a
cave in Southern Germany. It has been carbon dated to between 42,000 and 43,000 bce
(Goodall 2013, 6). The oldest rudimentary musical notation is from ancient Mesopotamia
(circa 2000 bce) and the oldest efficient notation is from Western Europe around 1000 ce.
So, how and why did this need for a comprehensive notation of musical sounds emerge?
And what were the consequences? I argue that these questions must be answered in the
light of wide, historical transformations where musical notation was just one example
among many other emergent tools of description affecting various domains in society.
The quest for an efficient notation started with an alliance between the imperially
ambitious Frankish king Charlemagne (r. 774–814) and the pope, who both wanted
every Christian in Western Europe to sing the same chants authorized by the Vatican.
The century following Charlemagne saw the rise of a comprehensive project of political
unification supported by religious, educational, artistic, bureaucratic, economic,
architectural, and military standardization (Freedman 2011). Orthography and grammar
were standardized, and the small letters for writing were simplified to promote literacy.
Coins and weights were standardized to promote long-distance trade. Such was the
political and cultural climate in which the development toward an efficient system for
notating music started (Levy 1998). On the frontispiece of many of the newly stan-
dardized, liturgical chant books was a picture of Saint Gregory (pope from 590 to 604 ce)
with a dove (the Holy Spirit) whispering the chants directly into his ear, and a scribe
sitting by his side and notating (Figure 10.1).
A much earlier tool of description of vocal sounds was the phonetic alphabet with
decisive implications for the development of Greco-Roman civilization and its unprece-
dented level of complexity (Goody and Watt 1963; Ong and Hartley 2012). The emer-
gence of programming languages for computers is a recent example with possibly even
more global consequences than the phonetic alphabet. Computer programming also
includes radically new, digital approaches to the description, recording, and production
of sound (Danielsen, this volume, chapter 29; Knakkergaard, this volume, chapter 6).
Figure 10.1 Frontispiece of a chant book from the monastery of Saint Gall circa 1000 ce. (St. Gallen, Stiftsbibliothek, Cod. Sang. 390, p. 13—Antiphonarium officii [Antiphonary for liturgy of the hours].)
In
order to understand what happened in the particular case of musical notation, it would
be helpful to understand what all these histories of emergent tools of description have in
common. My main hypothesis is that all are cases of externalization, which is a concept
I have used to bring together several theories on transitions in cultural and natural his-
tory (Sinding-Larsen 1987, 1991, 2008). I have found this concept and perspective useful
for developing a more holistic understanding of cultural history as well as the relation-
ship between cultural and biological evolution.3
There is a need to pay attention to one distinction made in The Oxford English
Dictionary’s definition of externalization: “The action or process of externalizing; an
instance of this; also concr. an embodiment. externalize: To make external; to embody in
outward form.”
What is important to retain here is that the word “externalization” can be used with
two related but different meanings: (1) a process (of making external), and (2) something
concrete, an embodiment which could be the result of that process. I also use “externali-
zation” in more abstract and specialized senses that I will gradually approach through
various examples until I arrive at a more formal, in-depth discussion of the concept later
(see “Externalization and the Emergence of Complexity”).
The action of recalling a pattern of sounds from memory and writing this as a pattern
of note-heads on a staff would qualify as a process of externalization, while the concrete
result of that process, the actual manuscript with notation, could also be called an externali-
zation. In this chapter, it is the process of externalization that is of main interest,
including the larger-scale and more complex processes that follow from the fact that
even externalizations may themselves be externalized. And not only may externalizations
be externalized. On evolutionary time scales, externalizations show a tendency to
become externalized. In spite of periods of significant setbacks, both biological and
cultural evolution are characterized by a long-term tendency toward increased levels
of externalization.
Living organisms grow by capturing or diverting flows of energy and materials from the
environment into the dynamics of their bodies. These are materials and energy that
otherwise would have dispersed more directly in accordance with the law of increasing
entropy (the second law of thermodynamics) (Deacon 2012b). Life could be thought of
as an extremely indirect way of dispersing energy and materials, and an overall trend in
the evolution of life is a steady increase in the level of indirectness. A main driver of
increasing indirectness is increasing complexity in information or “informed actions”
that constrain and enable the flows of energy and materials. The ultimate function of
information is to constrain environmental (external) flows of material and energy for
the purpose of maintaining, growing, and reproducing the internal dynamics of a living
body (its interiority). All living organisms need to handle information about how they
performed helpful and harmful actions in the past (memory), a way to repeat helpful
actions in the future (heritable, functional habits/traditions), and, when needed (for
example in the face of environmental changes), a way to modify habits/traditions through
imagination, creativity, learning, and evolution.
Niche construction is a recent and increasingly important concept in evolutionary
biology (Odling-Smee 2010). Niche construction denotes organisms’ actions in con-
structing and changing their environment for their short-term benefit in a way that also
has consequences for the species’ long-term genetic selection. Beavers build dams with
logs cut by their teeth. The dams favor selection for a flat tail that is adaptive for swim-
ming in the dams. Teeth suitable for cutting trees, dams as a constructed niche, tails for
swimming, and many other features enter into a kind of dialectic or coevolutionary
process. It is argued that humans first developed a rudimentary language as a cultural
(nongenetic) adaptation. Subsequently a language community functioned as a semiotic
niche construction that favored the selection of individuals with larger brains who
processed linguistic signs more efficiently (Deacon 2012a). Further externalizations
through writing, maps, notations, and other semiotic tools have now become part of the
niche or environment in which humans grow up and live.
There is no doubt that the human semiotic externalizations we call science have vastly
increased our species’ ability to channel energy and materials from the rest of the envi-
ronment into our bodies as well as into those of domesticated plants and animals under
our control. Describing and controlling are two closely linked activities not only within
science. The same is true for describing and controlling the sound production called
music. In a wide sense, all music could be thought of as a more or less transient sonic
niche construction where the development of notation systems and tuning systems has
played an important role as semiotic externalizations.
There exist different levels or orders of externalization processes. To create a new
piece of music and write its pitches and rhythmical patterns on paper by means of musical
notation could be thought of as one order of externalization. To improve or create a new
system of notation with which it is possible to write down or externalize entirely new
kinds of sonic phenomena is an externalization process of a comparatively higher order
than just making a description with an existing tool of description. To create a new tuning
system that better matches the possibilities of a notation system could be seen as a form
of externalization that resembles sonic niche construction or at least the construction of
a sonic infrastructure. Finally, the term “externalization” may also be used to denote the
large-scale processes of societal transformation that are a result of multiple, nested and/
or more limited externalization processes. This implies that my concept of externalization
can be used for processes on several levels and often in a wider sense than the colloquial
sense that is mostly concerned with the first order of “making external.” At a basic level,
an alphabet can describe, make explicit, or externalize phonemes in a language. At a
higher level, the process of introducing an alphabet and literacy to a culture that is
without writing has been characterized as “alphabetization” which, in my terminology,
would be an externalization (in the wider sense) by means of writing. The process of
The Externalization
of Pitch and Intervals
Something radical and interesting happens when we change from speaking to singing.
The continuous type of pitch variation that characterizes prosody switches to a much
more discrete or discontinuous type of pitch variation that characterizes melodies. The
singing voice often moves in discrete steps between a limited set of pitches with a more
or less fixed pattern of intervals we call a musical scale. One important background for
our affinity toward discrete pitch steps in music is to be found in physical acoustics.
A vibrating object like a string does not vibrate only along its full length but simultane-
ously in fractions of its length. These shorter fractions vibrate at higher frequencies that
are inversely proportional to the lengths of the fractions. If the full length vibrates
at 100 cycles per second (Hz), then ½ of the length vibrates at 200 Hz, 1/3 at 300 Hz, ¼ at
400 Hz, and so on. The full-length pitch or frequency is called the fundamental frequency
or just the fundamental. Pitches from the vibrating fractions are called overtones. In a
well-crafted musical string, the most important of these fractions will vibrate at integer
multiples of the frequency of the whole length and produce pitches that are called harmonic overtones.
The collection of harmonic overtones together with its fundamental is called a harmonic
series. A tone consisting of concurrently sounding overtones from a single harmonic
series is called a complex harmonic tone. In general, overtones are fused with the funda-
mental into a single auditory image with the fundamental being perceived and labeled
as that complex tone’s only pitch. However, the overtones become important when we
judge concurrent intervals between different pitches as consonant or dissonant. An
element in how we judge the consonance or dissonance of an interval is the degree to
which the fundamentals’ overtones overlap and form a single harmonic series or not. If
the overlap is extensive, we could say that the fundamentals are closely harmonically
related. Pitches an octave apart (frequency ratio 2:1) are the most closely related
because the higher tone has no harmonic overtone that is not also present in the tone
one octave below. This is the physical basis for what we call octave equivalence. We could
think of pitches an octave apart as simultaneously being identical and different in two
different pitch spaces or pitch dimensions. One dimension is continuous and linear (the
height or register aspect of pitch) while the other dimension (variably called “pitch
class,” “tonal chroma,” or the identity aspect of pitch) varies in a discrete and cyclic way
and may be depicted as a circle. This dimension could also be called the harmonic aspect
of pitch, since it is this aspect that is the basis for creating melody and harmony. Scale
comes from the Italian word "scala," meaning ladder. A ladder ascends in a straight
line, which is a relevant metaphor for the height or register aspect of pitch. But the pitch
class (chroma or harmonic aspect of pitch) changes in steps from one pitch class to the
next until it reaches the octave, which is identical to the pitch class where the movement
started. In other words, a one-octave musical scale is simultaneously a straight ladder of
linearly ascending pitch heights and a circular, harmonic scale akin to a “soft” ladder
that is turned onto itself in a ring and where an octave is “one full circle” (Deutsch 2013).4
Many of the early challenges in developing an efficient notation system had to do with
this double nature of pitches.
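As a purely illustrative sketch of the arithmetic just described (the function name and the 2,000 Hz ceiling are choices made for this example, not the chapter's), the following lines list the harmonic partials of a 100 Hz fundamental and compare them with the partials of tones a 2:1 and a 3:2 ratio above it:

```python
def partials_up_to(fundamental_hz, ceiling_hz):
    """All harmonic partials (integer multiples of the fundamental)
    up to a common frequency ceiling."""
    return {k * fundamental_hz
            for k in range(1, ceiling_hz // fundamental_hz + 1)}

CEILING = 2000
low    = partials_up_to(100, CEILING)   # 100, 200, ..., 2000 Hz
octave = partials_up_to(200, CEILING)   # a 2:1 ratio above 100 Hz
fifth  = partials_up_to(150, CEILING)   # a 3:2 ratio above 100 Hz

# Octave (2:1): every partial of the upper tone already belongs to the
# lower tone's series -- the physical basis of octave equivalence.
print(octave <= low)                    # True
# 3:2 ratio: the two series only partly overlap, sharing every third
# partial of the lower tone, so the fundamentals remain closely related.
print(sorted(fifth & low))              # [300, 600, 900, 1200, 1500, 1800]
```

Within the chosen ceiling the upper octave adds no new partials at all, whereas the 3:2 relation shares only some of them, which is a compact way of seeing why pitches an octave apart are heard as "the same" in a way that other consonant intervals are not.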
The second most harmonically close interval after an octave has the frequency ratio
3:2 and is called a perfect fifth.5 We find this interval between the third and second
partials of a harmonic series. The potential symmetries, consonances, and dissonances that
are a part of physical and auditory acoustics have under various cultural circumstances
been exploited to create tensions and resolutions in musical themes and variations.6 To
unleash this literally epic potential one needs to create a tone system (a collection of
pitches and intervals) suitable for moving around in a tonal pitch space in harmonically
relevant and motorically/perceptually manageable scale steps. This can be done in many
ways and can be achieved in an entirely oral tradition. But a notation system coupled
with a system for producing precise and predictable intervals with tunable instruments
might provide significantly extended possibilities. How well the notation system is able
to describe the tone system and its potential harmonic symmetries will influence what
kind of music it is possible to imagine, compose, and perform.
Pythagoras was, according to legend, the first to describe (externalize) the size of intervals
by means of what in his time was a relatively new and powerful tool of description:
mathematics. Pythagoras established the basis for the idea that the length of a vibrating
string was inversely proportional to its frequency and that the size of the most consonant
and basic music-relevant intervals could be expressed as small-integer ratios between
string lengths, such as 2:1 (octave), 3:2 (perfect fifth), 4:3 (perfect fourth). Also, the inter-
vallic difference between a fourth and a fifth (with the ratio 9:8) was of particular impor-
tance to the Greeks and was called a tone. Both the arithmetic and geometry of these
ratios were important, because the Greeks by means of calculations and a compass could
construct the length of strings that would produce the theoretically established pitches
and intervals as sounds. The instrument they used for this “sonification” of their theory
was based on one string with movable bridges above a line inscribed with the appropriately
constructed geometric points. The instrument was called a monochord. With this
knowledge, the Pythagoreans generated a tone-system by repeatedly adding the interval
of a fifth to an initial pitch and then (building on the principle of octave equivalence)
subtracting surplus octaves to locate all scale steps within the first octave (Hansen 2003).
After six applications of a fifth to, for example, F, one gets F–C–G–D–A–E–B. Arranged
within one octave from C the result is C–D–E–F–G–A–B. Although the Greeks used
different note-names, they had created a sequence of seven pitches and intervals (the
diatonic scale) that was to become the backbone of Western music until today not least
because later (medieval) musical notation was specifically developed to fit this particular
scale. The Pythagoreans chose the interval of a tone (ratio 9:8) as their “atom” in the
diatonic version of the tone system. The basic interval structure of the seven diatonic
scale steps (five tones [T] and two semitones [S]) is in today’s major mode given as
follows: TTSTTTS. The cyclic character of the octave implies that the beginning and
end of this linear interval pattern TTSTTTS can be joined to form a circle. Because the
two semitones are asymmetrically located in the circle, one may obtain seven different
diatonic sequences (or scales) of tones and semitones depending on where one starts in
the circle. The Greeks identified these permutations and called them “species of the
octave” but did not relate this to the cyclic nature of the octave.
Today the idea of a musical scale is indissolubly connected to the octave as a cyclic or
symmetrically repeatable segment within the tone system. But in ancient Greece, the
foundational, symmetric scale segment was considered to be the tetrachord, which was
not symmetrically repeatable to the same extent as the octave. A tetrachord consisted of
four notes or scale degrees (three scale steps) where the two boundary notes were fixed
(a perfect fourth apart) while the two notes that separated the internal scale steps could
vary. The tetrachord in the diatonic genus (with internal steps of one semitone and two
tones) was considered to be the most ancient and natural (Atkinson 2009, 11). The Greek
diatonic genus is basically the scale that we still use and that now has attained almost
global dominance not least through the dissemination of modern notation and keyboard
instruments with their diatonic layout of the white keys.
To make the tetrachord segment work for the description of their two-octave diatonic
tone system, the Greeks had to stack the tetrachords in two different ways: one where the
highest note in one tetrachord was also the lowest in the next tetrachord (called conjunct
tetrachords—repeating each tetrachordal scale degree a fourth apart), and another (the
disjunct) repeating a fifth apart. This lack of a uniform symmetry in the stack of tetra-
chords added to the complexity of the Greek way of naming pitches. Each pitch label
consisted of two terms. One part of the pitch name referred to the location of the tetra-
chord in the overall register of tetrachords. The other part referred to the location (or
scale degree) within that tetrachord. However, notes with the same scale degree in two
consecutive tetrachords were only to a limited degree harmonically related and their
relatedness varied depending on the type of tetrachord (conjunct or disjunct). The
Greek diatonic tone-system as sounds was not basically that different from the one
we use today, but their tetrachord-based perspective and notation system provided
support for a different cognitive map with different constraints and affordances for how
symmetries and harmonic possibilities could be imagined.
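To make the two stacking principles concrete, here is a small sketch in semitone arithmetic (a modern simplification introduced only for this illustration: the Greeks reasoned in string-length ratios, and a tone is here counted as two equal semitones):

```python
# Ascending diatonic tetrachord, in semitones: semitone, tone, tone,
# spanning a perfect fourth (five semitones).
TETRACHORD = [1, 2, 2]

def stack_tetrachords(n, disjunct=False):
    """Chain n tetrachords upward from 0. Conjunct stacking reuses the top
    note of one tetrachord as the bottom of the next (corresponding degrees
    repeat a fourth apart); disjunct stacking inserts a tone of disjunction
    between them (corresponding degrees repeat a fifth apart)."""
    pitches = [0]
    for i in range(n):
        if disjunct and i > 0:
            pitches.append(pitches[-1] + 2)   # the tone of disjunction
        for step in TETRACHORD:
            pitches.append(pitches[-1] + step)
    return pitches

print(stack_tetrachords(2))                 # [0, 1, 3, 5, 6, 8, 10]  -> spans a minor seventh
print(stack_tetrachords(2, disjunct=True))  # [0, 1, 3, 5, 7, 8, 10, 12] -> spans a full octave
```

Two conjunct tetrachords fall short of an octave, while two disjunct ones exactly fill it, which helps make visible why a tetrachord-based system could not treat the octave as its uniformly repeating module.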
Aristoxenos, a pupil of Aristotle and a founding father of Greek music theory, was
well aware of the acoustics of the octave and octave equivalence. But for various reasons,
he, along with subsequent Greek philosophers, singled out the tetrachord as the ele-
mentary symmetric segment in the tone system. Some reasons were metaphysical and
related to the magic of the number four: The material world consisted of four elements
(earth, fire, air, and water), and one only needed four numbers (e.g., 16-12-9-8, see
Figure 10.2) to establish the ratios of the four foundational intervals in Greek music theory
(2:1 octave, 3:2 fifth, 4:3 fourth, and 9:8 tone). But it could also be that a music theory and
notation based on tetrachords worked sufficiently well for their basically monophonic
melodies that included intervals smaller than a semitone (in the enharmonic genus)
that in any case were less suited for our kind of polyphony. It could also be that the
symmetry-obsessed Greeks found the lack of symmetry within an entire diatonic
octave (with its two, asymmetrically placed semitones) to be incompatible with the
status of a foundational entity in their tone-system. In any case, the tetrachordal con-
ception of the tone-system represented serious limitations for medieval music scholars
with the ambition of creating an efficient notation system that could support an emerging
Christian interest in more advanced polyphony.
Figure 10.2 The numerical basis for Pythagoras's harmonic scale. (Detail from woodcut on page 18 in the 1492 treatise Theorica musice Franchini Gafuri laudensis. Source: Bibliothèque nationale de France.)
structure of tones and semitones that did not fit the notation system. These melodies
needed a semitone interval where the tone system did not provide one. At times this
resulted in melodies that did not fit being simply suppressed or altered to fit the system
(Atkinson 2009, 244). The integrity of the notation system became at this stage more
important than preserving an oral and divine tradition that Saint Gregory purportedly
had received directly from the Holy Spirit! This tells us something about the cultural
power of semiotic conventions. Eventually, the notation system, as well as the diatonic
tone-system, was developed with the addition of more notes and particular signs until
the system comprised twelve semitones in an octave.
The Greek names and symbols (inverted/distorted letters) for pitches were sufficiently
functional for theoretical treatises on music and the storage of melodic shapes for
“archival” purposes but were not for practical playing and sight-singing or the imagi-
nation (composition) of new, polyphonic complexity.
In particular, the Greeks did not use graphics in their notation system to visualize
with iconic resemblance the melodic pitch movements (scale steps) of actual melodies.
This idea appeared for the first time in the ninth century in the context of the Carolingian
renaissance. The treatise Musica enchiriadis (The music handbook) from the second
half of the century is an influential and early example (Erickson and Palisca 1995). The
anonymous author described his pitches with the long Greek note names and also with a
version of an ancient Greek collection of signs called the dasian sign system. But more
importantly, he transferred these signs into a kind of coordinate system where the pitches
of syllables of text were placed on lines ascending on the vertical axis while the temporal
sequence of the same syllables was laid out along the horizontal axis.
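As a rough sketch of that layout principle only, the toy function below prints syllables on a grid with pitch ascending up the vertical axis and time running left to right. The syllables and the contour are invented for the illustration; they are not the treatise's own example:

```python
def enchiriadis_grid(melody, rows):
    """Print a crude version of the Enchiriadis layout: each row stands for
    one pitch, each column for the next sung syllable, so the melodic
    contour becomes visible as a shape on the page."""
    width = max(len(syl) for _, syl in melody) + 2
    for pitch in reversed(rows):                       # highest pitch on top
        cells = [(syl if p == pitch else "").ljust(width) for p, syl in melody]
        print(f"{pitch:>2} | " + "".join(cells))

# Invented fragment: (scale degree, syllable) pairs in temporal order.
# Prints four rows in which the syllables climb from degree 1 to 4 and
# fall back to 3, so the rise and fall of the melody shows as a shape.
enchiriadis_grid([(1, "al"), (2, "le"), (4, "lu"), (3, "ia")], rows=[1, 2, 3, 4])
```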
In the left column in Figure 10.3 we see seven ascending dasian pitch signs (looking
like twisted Fs and referred to in the treatise as notae or notes). The intervals separating
the pitch signs (and hence also the lines) are marked with T for tone and S for semitone
in accordance with the Greek diatonic genus. This treatise, from before 900 ce, contained
the most basic graphic idea of staff notation more than a century before Guido of Arezzo’s
treatise from 1028 ce, which is reckoned as the definitive birth of staff notation.
Figure 10.3 Example of polyphonic chant from the treatise Musica Enchiriadis. (Staatsbibliothek
Bamberg, Msc.Var.1, fol.57r, photo: Gerald Raab.)
The critical Enchiriadis innovation was (1) to depict the vertically stacked horizontal
lines as placeholders for pitches instead of attaching a letter-based pitch symbol to each
separate syllable in the text (as the Greeks had done), and (2) to specify the intervals
between the lines. This implied a graphic and iconic communication of pitch move-
ments and melodic contours that was cognitively much more intuitive and efficient than
its alphabetic predecessors. However, the author did not take it for granted that his
medieval reader understood his bold abstraction right away. The author asks the reader
to think of the lines as ordered strings (to create associations to the order of strings on a
lyre or harp) and he further asks the reader, “Let these strings be in place of the sounds
the notae signify” (Atkinson 2009, 124). The idea of a quality of a sound (pitch) depicted
as a visual line in a staff had not yet become established imagination. Or we could say
that the idea of pitch had not yet been fully externalized from the actual sounding string
(at least for his readers). Nor had the sound been fully externalized from the syllable of
the sung word that would eventually be replaced with a dot (a note-head). It would take
more than a century before the scribes of music would pick up again this way of depict-
ing a pitch space as a grid of horizontal lines. In the meantime, a quite different system of
notation was developed: the neumes (Figure 10.4).
Neumes also depicted shifting pitches as vertical movements, in particular within
compounded neumes (ligatures). In that sense, they were more intuitive and efficient to
read than the alphabetic notation. But without definitive, horizontal lines (pitches) of
reference and explicit intervallic distance between the lines, it was often impossible to
know exactly how the pitch changed from one neume to the next. Each neume (or group
of neumes) was complex, gestural, dynamic, often with additional hints on duration.
Neumes contained hints about changes in direction but no map with coordinates of the
pitch space to help locate from where the changes of direction took place.
Figure 10.4 Musical notation (neumes) from circa 900 ce. (St. Gallen, Stiftsbibliothek, Cod. Sang. 359, p. 145—Cantatorium.)
Neumes
served as mnemonic support for singers who already knew the melodies, but were not
usable for learning melodies or as a cognitive tool to support the imagination of
advanced polyphony.
The concepts egocentric and allocentric are used to characterize two kinds of navigation
(Buzsaki and Moser 2013). To navigate in an unknown landscape without a map requires
egocentric navigation. Any place must be understood in relation to the navigator’s per-
sonal path through the landscape to this location. The experience of a specific location
becomes path-dependent. To navigate with a map is called allocentric navigation because
prior information about the landscape has been plotted onto a map or has been exter-
nalized in a way the navigator can consult independently of her own past itinerary.
The information about a location on a map is thus more path-independent and more
independent of the “ego” as the ultimate point of reference or “point-of-view.” In many
contexts, allocentric may be used as a synonym for externalized, and the process of
externalization could be thought of as “allocentrification.” The neumes in their inability
to depict precise intervals (scale steps) were less externalized from the oral tradition and
thus less allocentric than both the previous alphabetic notation and the subsequent staff
notation, which was first developed with just one line of reference (Figure 10.5), and
then with four, five, or six (Figure 10.6).
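The contrast can be caricatured in a few lines of code; the coordinates, names, and "map" below are invented purely to illustrate the distinction, not to model any real navigational system:

```python
# Egocentric: a location is only reachable by integrating one's own moves;
# the path itself is the representation.
def egocentric_position(moves, start=(0, 0)):
    x, y = start
    for dx, dy in moves:
        x, y = x + dx, y + dy
    return (x, y)

# Allocentric: the location is stored externally (a "map") and can be
# consulted independently of any particular itinerary.
LANDMARKS = {"camp": (0, 0), "spring": (3, 4)}

print(egocentric_position([(1, 0), (1, 2), (1, 2)]))  # (3, 4) -- path-dependent
print(LANDMARKS["spring"])                            # (3, 4) -- path-independent
```

In this caricature, the externalized map plays the role that staff notation plays for pitch: the information can be consulted without retracing the path, or the melody, that first produced it.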
Figure 10.5 Neumes on a single line red F-staff. Montecassino, Italy, 2nd half of 12th century. ("The Schøyen Collection MS 1681.")
The innovative idea of Guido's staff notation was to combine and develop several
previously established insights: (1) to map the neumes as standardized dots onto a
compressed version of the pitch lines established in Musica enchiriadis (using both lines and
spaces between lines as horizontal markers of pitch levels) and thereby creating more
compact visual gestalts of the melodic contours; and (2) to specify the intervals between
lines on the basis of octave symmetry, which meant that two note-heads seven ordinary
scale steps apart would always sound as a 2:1 octave. This was not the case in the
Enchiriadis version of the lines because of its consistent use of disjunct tetrachords. The
new staff became a powerful tool for the externalization of not only precise intervals but
also intervals’ contexts by making intervals visible and much more intuitively intelligible
than in any previous notation system. The letter-based pitch symbols had been able to
externalize exact intervals but in a more indirect way than the staff. Two intervals with
the same width, for example the fifths C–G and E–B, had in their letter-based versions
no visual similarities that would tell the reader that both were fifths. The letters only
functioned as ordinal numbers indicating the number of steps from a first scale degree
of a particular segment whether this segment was a tetrachord or an octave. On the
other hand, note-heads on a staff communicated pitch height dissociated from par-
ticular scale degrees. For example, a fifth (two note-heads three lines or three spaces apart)
was immediately visible and recognizable as this interval irrespective of its register or
scale degree. In this way, intervals could be transposed vertically up or down the staff
(register) while maintaining both their visual and sonic characteristics. Although some
adjustments might be needed for the location of semitones, in general, pitch patterns
became visually and cognitively “transposable” on a staff to a much higher degree than
in the letter-based or neume-based notation systems. Through vertical alignment of
several voices, it became easier to visually express and imagine how a particular pitch
was part of several intervals at the same time. This paved the way for more complex
polyphonic compositions. The staff was thus not only important for physical external-
ized notes on tablets and parchment. The staff had become a tool for visual-spatial
imagination of sonic relationships between concordant and discordant intervals, relation-
ships that were not easy to keep track of in a purely aural mode of conceiving music.
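A small sketch can make this visual transposability explicit. The numbering of staff positions and the helper names below are conveniences invented for the example, not a historical convention: every line and space counts as one step, and the generic interval is simply the positional difference plus one.

```python
# Diatonic letter names in staff order; the index difference between two
# note-heads equals their distance in lines and spaces on the staff.
LETTERS = ["C", "D", "E", "F", "G", "A", "B"]
NAMES = {1: "unison", 2: "second", 3: "third", 4: "fourth",
         5: "fifth", 6: "sixth", 7: "seventh", 8: "octave"}

def staff_position(letter, octave):
    """Count every diatonic step (each line or space) as one unit."""
    return octave * 7 + LETTERS.index(letter)

def generic_interval(note_a, note_b):
    """The generic (staff) interval is the positional difference plus one,
    irrespective of register: what note-heads on a staff show at a glance."""
    span = abs(staff_position(*note_b) - staff_position(*note_a)) + 1
    return NAMES.get(span, f"{span}th")

print(generic_interval(("C", 4), ("G", 4)))   # fifth
print(generic_interval(("E", 4), ("B", 4)))   # fifth -- same visual width, different degrees
print(generic_interval(("C", 4), ("C", 5)))   # octave -- seven scale steps apart
```

Because only the difference of positions matters, the same visual width always names the same generic interval, which is precisely what the letter-based notation could not show at a glance.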
The name of an early and important genre of polyphony was counterpoint. This term
alludes directly to the note-heads as points on the staff that would be organized in several
voices, point-against-point, in parallel, oblique, and countermovements. The art of
counterpoint increased in complexity in the coming centuries and, according to many,
peaked with the fugues of J. S. Bach.8 Although notation was crucial in this develop-
ment, the change from an oral to a written musical tradition did not represent a simple,
one-way transition in which the importance of interiorized, implicit oral models decreases
and that of externalized, explicit written models increases (Berger 2005). Berger
acknowledges Goody when she writes that literacy does not replace orality. Literacy also
creates conditions for a new kind of orality.
Certain genres of music departed more profoundly from their oral/aural origins than
others. Trevor Wishart reflects on the limited success of composers of serial (atonal)
music. He attributes its limited appeal to this music’s almost total reliance on notation
for its imagination; serial music was, according to Wishart, conceived with the eyes and
not with the sense of hearing (Wishart and Emmerson 1996). Music is a sonic art and
will ultimately be linked to hearing. But visual externalizations in the form of notation
and theories about the geometry of interval ratios have evidently influenced music. Whether
the quality of music can be "objectively" determined by means of mathematical
proportions, or whether music will always depend on idiosyncratic feelings and
cultural preferences, has been a contentious issue ever since Aristoxenos's critique of the
more mathematically fundamentalist Pythagoreans (Boethius 1989, chap. 5).
Any semiotic culture is, in a profound way, both real and imaginary. We could say that
culture is nothing but cemented, habituated, institutionalized, or externalized imagi-
nations that are used for further imaginations and externalizations. Music, art, humor,
and science are human activities where the creative tensions between institutional-
ized constraints and imagination are widely cultivated (Deacon 2006). These activities
are characterized by being at once more constrained and regularized, and more relaxed
(open to random or unconstrained events), than ordinary life.
Institutionalized externalizations are essential both for demolishing old complexity and for
building new complexity. We cannot predict what in the future will be specifically
imagined or institutionalized, but an externalization perspective on the past may tell us
something about the general shape of certain transitions in this evolution.
In the following, I will take a step back from the externalization of pitch that paved the
way for increased polyphonic complexity to see how this relates to the connection
between externalization and complexity more generally.
transition, because the transition is about the emergence of an ontologically new entity
(a new and higher-level individuality), and because the foundation for its emergence is
the synergy obtained through new levels of cooperation enabled by new tools for the
management of information and knowledge. Terrence Deacon’s related transition to
what he calls higher-level teleodynamics is also an inspiration for my understanding of
externalization; in particular, Deacon’s treatment of the relation between the dynamic
(processes) and the static (constraints) (Deacon 2012b).
Some short conceptual clarifications before attempting to produce a more explicit
definition of externalization: Knowledge: useful information or information with sig-
nificance for the organism (Deacon 2017); and Self: an organism’s most fundamental orga-
nizing principle and what defines its individuation (Deacon 2012b, 465–466). I may also
use self in this wide sense almost as a synonym for individual, which may include even
wider individualities or individual-like entities like an orchestra. The subjective self is
regarded as a more special mode of the wider, organismic self (Deacon 2012b).
In the context of historic developments of new tools of description (or more specifically
semiotic externalizations), I propose to define and explain the concept externalization
in the following way:
and cooperative action can become more or less strongly institutionalized or in other
ways fixated as habits or addictions (Hui and Deacon 2010). To the extent that this insti-
tution becomes self-regenerating, self-repairing, and in other ways protects itself from
dissolution (“death”), we may speak of the emergence of higher-order individuality or
an onto-synergistic transition. The members of this higher-order individuality may
share the benefits stemming from the social synergies. But they will in general also have
to pay a price in the form of giving up some of their autonomy and uniqueness for the
sake of the new and larger-scale individuality based on standardized differentiation and
separation of labor or combination of labor (Corning and Szathmáry 2015). A former,
organically grown, relational, and complex uniqueness is replaced with a new, higher-level
uniqueness based on the enhanced combinatorial (often permutational) properties of
the simpler, more standardized elements.
The musicians in a symphonic orchestra or singers in a choir who are not allowed to
improvise or do anything not indicated in the score could represent an example of such
higher-level complexity made up of lower-level, standardized elements that have given
up some of their autonomy to share the gains from a higher-level synergy. Also, in the
development of musical notation, we saw the early neumes, where a single symbol (a
ligature) depicted a cluster of notes together with their intervallic movements as well as
an egocentrically grounded vertical placement on the paper, give way to staff notation
where each pitch had a separate symbol (note-head) and where all intervals were allo-
centrically defined by exact vertical positions in a grid (the five-lined staff). Signs and
symbols are not alive and do not have to literally give up autonomy in the same sense as
living individuals. But there might nevertheless be some interesting similarities between
the processes leading to standardization and increased combinatorial properties of
elements in a semiotic system and musicians in an orchestra.
It is the external and relatively static quality—the quality of being dissociated from the
continuous stream of material and energetic consequences of the dynamic self (its egocen-
tricity)—that provides descriptions and information with a certain distance of virtuality
which in turn can become a support for creative imagination. Sounds are inher-
ently dynamic and ephemeral. Descriptions of sounds (or aspects of or patterns in
sound) on paper are more static and have, in that sense, a distance to the immediacy of
the present. For the externalization process to become more than a halt or temporary
postponing of the dynamics of the self (a kind of pause-button), it must also be possible
to copy and manipulate the externalized information (the patterns that contain possible
information) without presupposing or involving interpretation of these patterns. It must
be possible to manipulate the externalized patterns in their preinterpreted state. With a
metaphor from computer science we could express this idea as follows: It must be possible
to rewrite a program while it is not running. If it is possible to come back to or revisit the
static, “frozen,” externalized version of the sound patterns multiple times, to make copies,
change one copy and compare the variant with the original, then the conditions for
evolutionary processes are in place (heredity, mutation, variation, and selection). Such
evolutionary-like processes supported by notation will often be an important part of
cognition and imagination (Szathmáry and Fernando 2011).
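The conditions just listed (copying, changing one copy, and comparing the variant with the original, all without interpreting the patterns) can be demonstrated with nothing more than strings. The following is my own illustrative sketch, not the chapter's formalism; the syllable string and the mutation rule are arbitrary.

```python
# A minimal sketch of the "pause-button plus editing" idea: the externalized pattern is
# treated as inert data. Copying, mutating, and comparing require no interpretation of
# what the symbols mean musically; only a later "performance" would interpret them.
import random

original = "d r m f s f m r d"              # an arbitrary externalized pattern (hypothetical syllables)

tokens = original.split()                    # heredity: an exact, uninterpreted copy
i = random.randrange(len(tokens))
alternatives = [s for s in "drmfsl" if s != tokens[i]]
tokens[i] = random.choice(alternatives)      # mutation: change one symbol in the copy only
variant = " ".join(tokens)

# variation: original and variant now coexist and can be compared symbol by symbol
differences = [(a, b) for a, b in zip(original.split(), variant.split()) if a != b]
print(original)
print(variant)
print("changed:", differences)               # selection can then act on whichever version is preferred
```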
Summing Up
Throughout human history, new tools of description (new semiotic systems) have
catalyzed the emergence of larger-scale social and cultural institutions. The emergence
of language paved the way for the first human tribal groups based on culturally accumu-
lated skills and tools. The emergence of the first cities and empires (Mesopotamia) was
based on taxation, which presupposed records that in turn depended on the invention of writing
(including numerals) (Scott 2017). On a smaller scale, but in the same direction, musical
notation contributed to the emergence of large-scale symphony orchestras playing
music with a harmonic complexity unthinkable within a purely oral tradition. It was
notation’s quality of comprising externalized patterns of sound that enabled the syner-
gies of multiple musicians playing different parts reading from multiple, identical copies
of these patterns. Externalization also enabled new synergies between the two senses of
vision and hearing as a support for the composer’s combined aural and visual imagi-
nation of the complexities of multipart polyphony. These several changes promoted each
other in a dialectic or coevolutionary manner. The early tools for the description of
sounds were tailor-made for the diatonic tone system with its relatively limited number
of pitches available for singing, playing, and composing (imagining). But the notation’s
explicit (externalized) character also made it easier to explore this limited pitch space
to its edges from where it was possible to look further, toward new, “forbidden” or
“unimaginable” notes and interval patterns. In the Middle Ages, music that included
notes outside those accepted in the early notation systems was called musica ficta or musica
falsa as opposed to the music within the system which was called musica recta or
musica vera (“true” music) (Bent and Silbiger 2017). This lasted until the authoritative
theoreticians (guardians of the notation norms) had accepted an unlimited use of the
supplementary signs (accidentals, key signatures, and so forth). These amendments to
the notation system implied that the staff, originally tailor-made to describe the seven-step
diatonic tone system, could now describe a twelve-step, chromatic tone system. As the
enhanced notation system now supported the imagination of more audacious harmonic
modulations (in particular on instruments with fixed pitches like organs and harpsi-
chords), an old discrepancy between notation and sound became more acute. Not all
intervals that were visually identical on the staff (that spanned the same number of lines)
were acoustically identical as sounds. The result was that certain intervals that, on
paper, should be consonant sounded dissonant. This discrepancy was eliminated by a
fully homogenized tuning system. With the tuning system called twelve-tone equal
temperament, one obtained for the first time a full symmetry between notated intervals
and acoustic intervals. The synergy between the visual and the aural intervals in this way
became complete, and the number of combinatorial possibilities increased significantly.
Bach celebrated the path toward equal temperament with his famous collection Das
Wohltemperierte Klavier, which, for the first time, exploited all the possible keys and
modes of his time.
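The arithmetic behind this homogenization can be sketched briefly. The following is my own simplified illustration under standard acoustics, not a claim drawn from the chapter's sources: in twelve-tone equal temperament every semitone has the ratio 2^(1/12), so any span of seven semitones sounds the same wherever it occurs, whereas a chain of pure 3:2 fifths folded into one octave leaves one span of seven steps (the so-called wolf) that is audibly narrower than the rest.

```python
# A small, simplified sketch of why equal temperament aligns notated and sounding intervals.
# In 12-tone equal temperament every semitone is the ratio 2**(1/12), so any seven-semitone
# span sounds identical wherever it occurs. A chain of pure 3:2 fifths folded into one
# octave instead leaves one "wolf": the same seven-step span, but audibly narrower.

EQUAL_SEMITONE = 2 ** (1 / 12)

def equal_ratio(semitone_steps: int) -> float:
    return EQUAL_SEMITONE ** semitone_steps

# Twelve chromatic degrees built by stacking pure fifths and folding them into one octave.
pyth = sorted((3 / 2) ** k / 2 ** ((k * 7) // 12) for k in range(12))

def pyth_fifth(degree: int) -> float:
    """Ratio of the seven-step span starting on the given chromatic degree."""
    upper = pyth[(degree + 7) % 12] * (2 if degree + 7 >= 12 else 1)
    return upper / pyth[degree]

print(round(equal_ratio(7), 4))                       # 1.4983 for every equal-tempered fifth
print([round(pyth_fifth(d), 4) for d in range(12)])   # eleven spans of 1.5, one "wolf" near 1.4798
```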
neumes, as this notation contained information about the previous oral chant tradition,
information that was lost in later and more standardized editions of the chant books.
But the revivers’ quest for the oldest, most “authentic” manuscripts has made them
more obsessed with notation than ever before. The old neumes were treated as
“directly externalized authenticity,” which becomes somewhat of a paradox if the aim
is to revive qualities of a prenotational past (Bergeron 1998). Obviously, there exist no
recordings of prenotational Gregorian chant. In that respect, certain folk music tra-
ditions are in a better position. Recorded music from oral traditions does exist and repre-
sents a new kind of externalization which captures many details that escaped the
limited descriptive power of traditional notation. But a meticulous copying of a
recorded tune can never be the same as learning music in a traditional, small-scale,
oral setting. The increased power of recordings as externalizations might, in some
senses, even increase the distance to an oral tradition because the externalized template
for what is “authentic” becomes more totalizing with less room for a personal interpre-
tation than one based on a crude transcription with standard notation. It seems that
some aspects of a lower level of externalization simply cannot “survive” the descriptive
power of higher-level externalizations.
Today, thousands of people worldwide are at any one moment engaged in imagining
and describing innumerable physical, social, and cultural processes by means of
computer-based tools of description (not least for creating artificial intelligence and
virtual reality, including elaborate soundscapes). These tools (programming languages)
have a descriptive power far beyond anything the medieval and renaissance creators of
musical notation could have imagined. With the modern sound and music applications
of the digital age (Knakkergaard, this volume, chapter 6), the distinction between nota-
tion (descriptive tools) and music has to some extent been abolished. Whatever can be
formally described can automatically be played, and whatever can be played can auto-
matically be described. Not all aspects of life (or music) can be formally described, but
the proportion that can increases steadily. Humanity is undergoing multiple processes
of externalization contributing to a major transition in cultural evolution with conse-
quences comparable to those following the invention of writing or maybe even the
emergence of language. My contention is that we may get a better understanding of what
may be gained and lost in this transition by looking closely at what happened to music as
a result of what in hindsight looks like a comparatively innocent medieval improvement
of the Greek way of describing, imagining, and controlling musical sounds. The aim of
the concept and theory of externalization is not to make normative judgments on what
is “progress” or what is better or worse music. It is to show how externalization processes
are deeply transformative and that increased complexity at a large scale may be insepa-
rable from reduced complexity at lower levels. My ultimate goal with the concept of
externalization applied to the history of describing and imagining musical sounds is to
create a distance of reflection to both the historic processes and our current global
dynamics so that we are better able to imagine what might follow. The ultimate ambition
of the concept of externalization is thus to function as a good example of itself.
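As a final, deliberately trivial illustration of the observation above that whatever can be formally described can automatically be played, the following sketch (my own example; the note list and file name are arbitrary, and no particular application is implied) renders a symbolic description of notes directly into audio samples using only Python's standard library.

```python
# A deliberately minimal sketch: a symbolic description (note name, duration in seconds)
# is rendered straight to a playable WAV file. The score and file name are arbitrary examples.
import math, struct, wave

RATE = 44100
NOTE_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def freq(name: str, octave: int = 4) -> float:
    """Equal-tempered frequency with A4 = 440 Hz."""
    n = NOTE_SEMITONES[name] + 12 * (octave + 1)       # MIDI-style note number
    return 440.0 * 2 ** ((n - 69) / 12)

def render(score, path="sketch.wav"):
    frames = bytearray()
    for name, seconds in score:                        # interpret the description as sound
        f = freq(name)
        for i in range(int(RATE * seconds)):
            sample = int(12000 * math.sin(2 * math.pi * f * i / RATE))
            frames += struct.pack("<h", sample)        # 16-bit mono samples
    with wave.open(path, "w") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(RATE)
        out.writeframes(bytes(frames))

render([("C", 0.3), ("E", 0.3), ("G", 0.3), ("C", 0.6)])   # a described pattern, now audible
```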
Acknowledgments
The research leading to this chapter has received funding from the European Research Council
under EU’s Seventh Framework Programme (ERC Grant Agreement no. 295843), the
Research Council of Norway (SAMKUL project no. 246893/F10), and Department of Social
Anthropology, University of Oslo. Thanks to Hans M. Borchgrevink, Henning Kraggerud, Rob
Waring, Tellef Kvifte, Tim Ingold, Chris Hann, Viggo Vestel, Tord Larsen, Alix Hui, Ola Graff,
Maria Kartveit, Thomas Hylland Eriksen, Lars Henrik Johansen, Mark Grimshaw-Aagaard,
and Martin Knakkergaard for important feedback to the manuscript and contributions to the
writing process.
Notes
1. Performer Per Hætta, 1960, Track 24 on CD (1995) Norsk folkemusikk: 10: Folkemusikk frå
Nord-Noreg og Sameland. (Norwegian folk music: vol. 10: Folk music from Northern Norway
and Sameland). Grappa musikkforlag AS, Oslo. GRCD 4070.
2. Thanks to Hans M. Borchgrevink for this story.
3. An in-depth genealogy of the concept externalization falls outside the scope of this chapter,
but Leroi-Gourhan’s term “exteriorization” from 1964 is a precursor (see Ingold 1999) and
Hegel’s term “Entäußerung” from 1809, translated as “externalisation” in Rae (2012), may
be the first use with a related meaning to the one used in this chapter.
4. The two dimensions of pitch have been depicted in a combined way as a three-dimensional
spiral or helix ascending one octave per cycle (Deutsch 2013). But the ladder, circle, and
helix are simplifications of the complex and entangled relationship that exists between
pitch heights and pitch classes, involving physical, physiological, and cultural factors.
5. The terms “octave,” “fifth,” “fourth,” and so on, refer to the intervals one covers with
scale steps in the diatonic tone system counting the starting pitch of the scale as the first
scale degree.
6. Several other and more personal and cultural factors than vibrational interference patterns
may influence the judgment of consonance versus dissonance. Nonetheless: “Preference
for consonance over dissonance is observed in infants with little postnatal exposure to
culturally specific music . . . . Consonance and dissonance play crucial roles in music across
cultures: whereas dissonance is commonly associated with musical tension, consonance is
typically associated with relaxation and stability” (Thompson 2013, 108–109).
7. The anonymous author was for centuries thought to be Odo of Cluny (this is now proven
to be incorrect, and instead the author is often referred to as Pseudo-Odo) (Atkinson 2009).
8. A dynamic visualization of Bach’s notation-based polyphony can be watched on the online
video “Music+Math; Symmetry” provided by the Santa Fe Institute: http://tuvalu.santafe.
edu/projects/musicplusmath/index.php?id=35. Accessed November 10, 2017.
9. Small-scale music cultures are particularly affected, although ethnopolitical movements
may to some extent counteract these global trends. An example: In Norwegian folk song,
nonequalized scales that previously were nearly extinct are now regarded as a valuable cul-
tural trait and increasingly used among professional folk singers. However, these singers do not
represent a widespread music culture, and their mastery of the traditional scales mostly
resembles that of a second language, while their first tonal language remains in equal
temperament.
References
Atkinson, C. M. 2009. The Critical Nexus: Tone-System, Mode, and Notation in Early Medieval
Music. Oxford: Oxford University Press.
Bent, M., and A. Silbiger. 2017. Musica Ficta. Grove Music Online. Oxford Music Online. http://
www.oxfordmusiconline.com/subscriber/article/grove/music/19406. Accessed November
15, 2017.
Berger, A. M. B. 2005. Medieval Music and the Art of Memory. Berkeley: University of
California Press.
Bergeron, K. 1998. Decadent Enchantments: The Revival of Gregorian Chant at Solesmes.
Berkeley: University of California Press.
Boethius, A. M. S. 1989. Fundamentals of Music. Translated, with introduction and notes by
C. M. Bower. Edited by C. V. Palisca. New Haven, CT: Yale University Press.
Buzsáki, G., and E. I. Moser. 2013. Memory, Navigation and Theta Rhythm in the Hippocampal-
Entorhinal System. Nature Neuroscience 16 (2): 130–138.
Corning, P. A., and E. Szathmáry. 2015. “Synergistic Selection”: A Darwinian Frame for the
Evolution of Complexity. Journal of Theoretical Biology 371: 45–58.
Deacon, T. 2006. The Aesthetic Faculty. In The Artful Mind: Cognitive Science and the Riddle
of Human Creativity, edited by M. Turner, 21–53. Oxford: Oxford University Press.
Deacon, T. W. 2012a. Beyond the Symbolic Species. In The Symbolic Species Evolved, edited by
T. Schilhab, F. Stjernfelt, and T. Deacon, 9–38. Dordrecht, Netherlands: Springer.
Deacon, T. W. 2012b. Incomplete Nature: How Mind Emerged from Matter. New York:
W.W. Norton.
Deacon, T. W. 2017. Information and Reference. In Representation and Reality in Humans,
Other Living Organisms and Intelligent Machines, edited by G. Dodig-Crnkovic and
R. Giovagnoli, 3–15. Cham, Switzerland: Springer.
Deutsch, D. 2013. The Processing of Pitch Combinations. In The Psychology of Music, 3rd ed.,
edited by D. Deutsch, 249–325. San Diego: Elsevier.
Erickson, R., and C. V. Palisca. 1995. Musica enchiriadis and Scolica enchiriadis. New Haven,
CT: Yale University Press.
Freedman, P. 2011. The Early Middle Ages, 284–1000 (HIST 210). Open Yale Course Online
Lecture. https://oyc.yale.edu/history/hist-210. Accessed May 16, 2016.
Goodall, H. 2013. The Story of Music. London: Chatto & Windus.
Goody, J., and I. Watt. 1963. The Consequences of Literacy. Comparative Studies in Society and
History 5 (3): 304–345. doi:10.1017/S0010417500001730.
Graff, O. 2007. Om å forstå joikemelodier: Refleksjoner over et pitesamisk materiale. Svensk
Tidskrift för Musikforskning 89: 50–69.
Hansen, F. E. 2003. Tonesystem. In Gads musikleksikon: Sagdel, Vol. 2, edited by F. Gravesen
and M. Knakkergaard, 1641–1646. Copenhagen. Denmark: Gad.
Hui, J., and T. Deacon. 2010. The Evolution of Altruism via Social Addiction. In Social Brain,
Distributed Mind, edited by R. I. M. Dunbar, C. Gamble, and J. Gowlett, 177–198. Oxford
and New York: Oxford University Press.
Huron, D. 2008. Lost in Music. Nature 453 (7194): 456.
Ingold, T. 1999. “Tools for the Hand, Language for the Face”: An Appreciation of Leroi-Gourhan’s
Gesture and Speech. Studies in History and Philosophy of Biological and Biomedical Sciences
30 (4): 411–453.
Levy, K. 1998. Gregorian Chant and the Carolingians. Princeton, NJ: Princeton University Press.
Maynard Smith, J., and E. Szathmáry. 1995. The Major Transitions in Evolution. Oxford:
Freeman Spektrum.
Odling-Smee, J. 2010. Niche Inheritance. In Evolution: The Extended Synthesis, edited by
M. Pigliucci and G. B. Müller, 175–207. Cambridge, MA: MIT Press.
Ong, W. J., and J. Hartley. 2012. Orality and Literacy: The Technologizing of the Word. 3rd ed.
London and New York: Routledge.
Rae, G. 2012. Hegel, Alienation, and the Phenomenological Development of Consciousness.
International Journal of Philosophical Studies 20 (1): 23–42.
Saulnier, D. 2009. Gregorian Chant: A Guide to the History and Liturgy. Orleans, MA:
Paraclete Press.
Scott, J. C. 2017. Against the Grain: A Deep History of the Earliest States. New Haven, CT: Yale
University Press.
Sinding-Larsen, H. 1983. Fra fest til forestilling: Et antropologisk perspektiv på norsk folkemusikk
og dans gjennom skiftende materielle, sosiale og ideologiske betingelser fra nasjonalromantik-
ken og fram til i dag. Magister Artium dissertation. University of Oslo.
Sinding-Larsen, H. 1987. Information Technology and the Management of Knowledge. AI &
Society: The Journal of Human-Centred Systems and Machine Intelligence 1 (2): 93–101.
Sinding-Larsen, H. 1991. Computers, Musical Notation and the Externalization of Knowledge:
Towards a Comparative Study in the History of Information Technology. In Understanding
the Artificial: On the Future Shape of Artificial Intelligence, edited by M. Negrotti, 101–125.
London: Springer.
Sinding-Larsen, H. 2008. Externality and Materiality as Themes in the History of the Human
Sciences. Fractal: Revista de Psicologia 20 (1): 9–17.
Szathmáry, E., and C. Fernando. 2011. Concluding Remarks. In The Major Transitions in
Evolution Revisited, edited by B. Calcott and K. Sterelny, 301–310. Cambridge, MA: MIT Press.
Thompson, W. F. 2013. Intervals and Scales. In The Psychology of Music, 3rd ed., edited by
D. Deutsch, 107–140. San Diego: Elsevier.
Wishart, T., and S. Emmerson. 1996. On Sonic Art. Contemporary Music Studies,
12. Amsterdam: Harwood.
Chapter 11
“. . . they call us by our name . . .”
Technology, Memory, and Metempsychosis
Bennett Hogg
Introduction
In the following chapter I shall be proposing that living, as we do, in a world where
sound recordings are a major element in our sonic ecosystems, we cannot think about
sound without considering the ways in which recording technologies affect and inform
our experiences of listening more widely. That perception and memory have been
invoked to account for sound recording is widely noted (perception and memory forming
a structurally congruent pair to recording and playback). However, in line with phe-
nomenological positions developed since Husserl, we cannot discount imagination
when we talk about perception and memory. This problematizes the assumed congruency
between sound recording and memory, there being no equivalence of imagination
immanent to the medium of sound recording itself (as opposed to imaginative ways artists
might use sound recording). After examining several problematic mappings of sound
recording and memory I shall be proposing the animistic doctrine of metempsychosis,
or the transmigration of souls, as a more suitable model of sound recording than the
more obvious and culturally embedded one of memory.
Recordings of all kinds, from the written word to the digital photograph, have, for
millennia, held associations with memory. Since at least Plato’s concerns that writing, as
one form of recording, undermines human memory, to the relatively recent use of the
term “memory” to refer to storage of information on computer hard drives, recording
and memory have gone hand in hand. The etymology of the word “recording” refers, of
course, directly to remembering—from the Latin recordare—though it is worth pointing
out at the outset that to record and to remember are not always the same thing, though
they are often part of the same, larger process. Freud, in 1930, proposed the gramophone
as a prosthesis of memory, along with the camera as one of the “materializations” of the
“innate faculty of recall” ([1930] 2004, 35). Sound recording has been widely—and on
the whole unproblematically—figured as a metaphor of memory, as has the photograph.
In parallel to this, memory has been conceived in terms of sound recording—or other
forms of inscription (Freud [1924] 1961; Draaisma 2000; Terdiman 1993)—such that it is
not always possible to determine exactly which is the metaphor and which the original
object; indeed, as with so many phenomena, it is difficult to say which aspect is originary
and which consequent: the conundrum of the chicken and the egg.
Metaphors do have a tendency to acquire power over their referents, though (see also
Walther-Hansen, volume 1, chapter 23, for more on metaphor and recording technology).
Even as they illuminate those aspects of a phenomenon which they resonate with, their
apparent efficacy can dazzle us and cast into shade those aspects of a phenomenon that
are not accounted for in the work the metaphor does. We should not, therefore, take the
prosthesis of memory to be the same thing as memory—a prosthesis may extend human
capacities to remember, or compensate for the failures of memory (Armstrong 1998, 78),
but its prosthetic activity operates only as one among many different elements that
go together to make up memory tout court. Memory, even as a metaphor, is less like a
recording than we might at first think, complicated as it is by being a malleable element
within a greater ecosystem of embodied consciousness. Recordings behave in ways very
unlike memory, for similar reasons, being enmeshed in their own cultural, material, and
creative ecosystems which, though having significant overlaps with ideas of memory
because of their mediating role in human culture, are also in some respects quite radi-
cally separate from, and not well accounted for by recourse to, models drawn directly
from human memory. In particular, recent thinking has credibly challenged formerly
held ideas about the ways in which perception, memory, and imagination work together.
A traditional linear model, in which perception sends images into memory which serves
as a resource for imagination to draw on, is perhaps too strictly causal to account for the
complex procedures involved in many acts of imagination; doodling absentmindedly
on a piece of paper, for example, and then realizing that one has drawn a monster.
Though distinct and dissimilar, memory, perception, and imagination
cannot, in terms of the ways in which they interact and mutually inform one another, be
separated, as Bergson asserted early in the twentieth century. In this
chapter, I have used the constellation of memory and imagination as mutually critical
tools to destabilize a received wisdom that understands recording and memory as being
adequately similar to one another. The key problem with the admittedly persuasive idea
that memory and recording can stand as productive metaphors of one another is, in a
nutshell, that recordings can stand on their own, and the signals they contain can remain
more or less unchanged,1 can persist as entities, whereas human memories, through
their positioning inside of a psychic ecosystem of consciousness, action, and agency in
which forgetting, imagination, and supposition render them much less fixed, are much
less discretely organized with respect to one another. Imagination, then, is my principal
critical tool for prizing apart the connection of memory and recording, and at the same
time a phenomenologically informed understanding of imagination affords a process
Once we start looking into memory and imagination it soon becomes clear that we are
dealing with multiplicities rather than directly definable, unitary phenomena, each of
which have convoluted histories, and which, depending on the philosophical approach
grounding them, continue to carry aspects of these histories into the present. Sometimes
such aspects seem to cross over to, or to repopulate, sociocultural phenomena in the
present day. Casey, for example, notes how memory has been “frequently confined to
a passively reproductive function of low epistemic status” (1977, 187) in Western phi-
losophy since at least Aristotle. The notion that memory is passive and merely reproductive
places it in a subordinate position within consciousness in relation to sense perception,
which, in Hume, Kant, and to an extent Merleau-Ponty, is seen as primary. Insofar as
sound recording has been associated with memory, this subordination of it with respect
to perception finds echoes in Adorno’s dismissive words on sound recording as “not
good for much more than reproducing and storing a music deprived of its best dimen-
sion, a music, namely, that was already in existence before the phonograph record and is
not significantly altered by it” ([1934] 2002, 278). Even if, as Thomas Levin notes, a dis-
trust of the mimetic in Adorno may be “to some degree a function of the Jewish taboo on
representation so central to Adorno’s aesthetic” (Levin 1990, 25), it nevertheless fits in
with a more widely distributed distrust of “mere” copies or images, also manifest in vari-
ous iconoclastic moments such as Puritanism’s destruction of religious paintings and
statuary in England in the sixteenth and seventeenth centuries, and antagonized by the
likes of Warhol and Lichtenstein bringing techniques and ideologies of the mechanical
copy into the mainstream art world in the 1960s.
But such a conflation of memory with the reproduction of copies misrepresents how
memory operates—a misrepresentation that predates the inception of sound recording
and which therefore demonstrates how already existent cultural and philosophical
values colonize, as models of thinking, emergent technologies. Clearly, the phonograph
was conceived as an extension of memory almost at the moment of its inception.
Johnson reports that “at any future point in history” the recorded voice can be recalled
(Johnson 1877), but the relatively “low epistemic status” that has historically accrued to
memory is later compounded in the case of the phonograph through the central role
sound recordings come to play (not really envisaged by its inventor in the early days) in
mass culture—Adorno and Horkheimer’s culture industry—whose financial successes
come at the price of a failure (if it is indeed a “failure”) to attain high cultural value as
“Art.” Financial success and artistic failure are both, of course, factors in the reproducibility
Toward the end of his life, Blake would write, “Imagination has nothing
to do with memory,” in the margins to a collection of Wordsworth’s poems, identifying,
in terms current among the nascent Romantic movement, imagination as “the Divine
Vision” that is only vouchsafed to the “Spiritual” rather than “the Natural Man” (Blake
[1927] 1975, 822). In seeking to promote the notion of a creative and spiritually inspired
imagination, Blake repeats the compensatory denigration of another psychic element—
memory. As noted, there has been a tendency to see memory in terms of the storage and
retrieval of “images”—of copies. Casey notes that Western philosophy’s conceptions of
memory—and, indeed, thought more generally—have, for centuries, been colored
and informed by the model of representation (Casey 1977, 187; 1993, 166–167). The
representational model for understanding memory has, for Casey, “been given a privi-
leged place in thinking about memory überhaupt.” A representational model leads to
an understanding of memory as “reproductive,” and as a result “we witness a working
presumption that all significant human remembering . . . is at once representational
and founded on isomorphic relations between the representing content of what we
remember and the represented thing or event we are recalling” (Casey 1993, 166).
Dreyfuss notes how a similar ideological frame determines how human actions have,
until recently, been understood, but proposes, in contrast to this, a phenomenological
approach that resists the idea that representation is a prerequisite for action. “When every-
day coping is going well one experiences something like what athletes call flow . . . One’s
activity is completely geared into the demands of the situation” (Dreyfuss 1996, 35).
Such “skillful coping does not require a mental representation of its goal” but rather,
quoting Merleau-Ponty, “[a] movement is learned when the body has understood
it, that is, when it has incorporated it into its ‘world,’ and to move one’s body is to
aim at things through it; it is to allow oneself to respond to their call, which is made
upon it independently of any representation” (Merleau-Ponty 1962, 139, quoted in
Dreyfuss 1996, 37, emphasis added).
I do not check out an inner image, or other representation, of my friend: his face and
body give themselves out as already (and instantly) recognizable to me, as featuring
familiarity on their very sleeve, as it were. Here what is remembered, far from being
continued in intrapsychic space, suffuses what I perceive as I perceive Burton [the
friend]; and in this natural context Bergson is right to say that “perception is full of
memories.” (1993, 167–168)
The implied dialogism (or more accurately polylogism) of recognition is differently pre-
sented by Varela and colleagues, yet the refusal of the “stored object” model of memory
and recognition remains a strong trope. Visual sensory data from the eye is met by
activity that flows out from the cortex. The encounter of these two ensembles of
neuronal activity is one moment in the emergence of a new coherent configuration,
depending on a sort of resonance or active match-mismatch between the sensory
activity and the internal setting at the primary cortex. The primary visual cortex is,
however, but one of the partners in this particular neuronal local circuit at the LGN
level. . . . Thus the behaviour of the whole system resembles a cocktail party conver-
sation much more than a chain of command. (Varela et al. 1993, 96)
Husserl underwrites the distinction between memory and imagination by claiming that
memory retains while imagination protends, but some slight self-reflection will show
this to be too baldly schematic. Casey argues, rather, that memory and imagination,
while being distinct mental phenomena in their own rights, are “indispensable”
to one another, marked by their “mutual inclusiveness and co-iterability” and “their inbuilt
co-operativeness” (Casey 1977, 194–195). Memory and imagination are not just “difficult
to disentangle” from one another, “each act is indispensable in its collaboration with the
other . . . not just essential but co-essential, essential in its very co-ordination with
the other” (Casey 1977, 196).3 For Harpur, this interconnectedness of imagination and
memory is also highly significant. Memory does not simply keep records of past events
like files but:
mixes them up with fantasies and imagined events . . . It even makes things up
altogether, like imagination, and points to the fact that Mnemosyne (Memory) is
the mother of the Greek Muses and infers from this “that memory is pregnant with
imaginative power.” (2002, 215–216)
Harpur’s resistance to thinking of memory in terms of file storage is congruent with the
view of contemporary cognitive science. Memories “are not stored intact in the brain
like computer files on a hard disk” but are built up from different elements in a process
that is also “open to the world,” in other words, a process that incorporates environmental
and social elements, adjusting and reconstructing memories in the light of the con-
temporary situation and conditions (Auyang 2000, 283). Sanders underlines this, noting
how beliefs, ideas, and memories are not only in brains but also out in the world, to the
extent that we put traces into the world, “which changes what [we] will be confronted
with the next time it comes around,” so that our memories are not only carried around
inside of us but are inscribed, as it were, in our worlds (Sanders 1996, paragraph 36).
This brings us to an ecological sense of memory—and of imagination—and though
J. J. Gibson himself was cautious about “the muddle of memory,” excluding recognition
from his understanding of perception, Auyang proposes that an understanding of memory
in terms of an ecosystem of thought is a viable project (Auyang 2000, 300–301). It should
now be clear that the ideas of Varela and colleagues, as well as Casey, Dreyfuss, and
Sanders reported earlier, are broadly compatible with understanding memory and
imagination as participating elements within a greater ecosystem of consciousness that
includes intellection, embodiment, action, agency, and sociocultural relations.
Metempsychosis
that I myself was the immediate subject of my book . . . This impression would persist
for some moments after I awoke. . . . Then it would begin to seem unintelligible, as the
thoughts of a former existence must be to a reincarnate spirit. (Proust [1913] 1985, 3)
“My book” here refers directly to the one the young Proust was reading as he fell asleep,
and though Proust does not claim that what happens when he drifts off to sleep while
reading is directly mnemic, his placing of this figure at the very beginning of a book in
which he and his memories are the immediate subject invites a double reading of what
“the immediate subject of my book” intends. In Proust’s conflation of memory and reincar-
nation, memory is positioned less like a process of recording and more like the expe-
rience of being otherwise re-embodied. Somewhat later in the first chapter Proust writes:
there is much to be said for the Celtic belief that the souls of those whom we have
lost are held captive in some inferior being, in an animal, in a plant, in some inanimate
object, and thus effectively lost to us until the day (which to many never comes)
when we happen to pass by the tree or to obtain possession of the object which
forms their prison. Then they start and tremble, and they call us by our name, and
as soon as we have recognized their voice the spell is broken. Delivered by us, they
have overcome death and return to share our life. (47)
This immediately precedes the famous incident of the madeleine, in which voluntary
memory, “the memory of the intellect” that “preserve[s] nothing of the past itself,” is
presented as inferior to the so-called mémoire involontaire that spontaneously and unex-
pectedly takes over the whole being, triggered not by a volitional intention but by an
encounter with an object charged with memory. The profound sense of joy that results
from the tea-soaked madeleine is figured in terms of an invasion by “something isolated,
detached, with no suggestion of its origin.” If the book is all memory, and returning from
the book (as it were) is to be reincarnated, it follows that Proust intuits a strong connec-
tion between memory and the transmigration of souls in his novel.
The persistence of a soul, or a disembodied personality or intelligence, after bodily
death is congruent with the idea of metempsychosis—the transmigration of a soul from
one embodiment to another. Sound and music are full of such references: the per-
sistence of the voice of Echo, or the autotransformation of the nymph Syrinx into a Pan-
pipe in Ovid’s Metamorphoses; the voice of the murdered younger sister in the Scottish
traditional ballad The Twa Sisters or, as it is sometimes known, Binnorie, or its Germanic
equivalent set down in Grimm’s Household Tales and set to music by Mahler as Das
klagende Lied, which emerges from the playing of an instrument made from her mortal
remains—a harp or fiddle strung with her hair or a flute made from one of her bones;
spirit mediums speaking with the voices of the dead they claim to be channeling, spirits
of the departed temporarily taking over the physical bodies of the medium to speak
through them; there is even a hint of the voice qua soul or personality as a vehicle of
transmigration in the behavior of the Cheshire Cat in Disney’s 1951 version of Alice in
Wonderland, who appears as a disembodied voice, and persists momentarily as a voice
and a fading smile (the detached outlet of a voice) after its body has dissolved. After the
telematic technologies of phonography and telephony made their appearance around
1876–1878, one immediate set of cultural constellations in which they were quickly
bound up involved the survival of death, the supernatural, and a means to communicate
with the dead. The conflation of voices disembodied by recording, telephony, or radio, with
the voices of the dead has been a clearly identifiable trope in fantasy literature and
popular culture almost since the moment of these technologies’ inventions (Connor
2000; Kittler 1999; Sconce 2000; Weiss 2002; Hogg 2008), “a historically mediated
imaginary . . . in which death is part of a cluster of ideas that gather around the image of
technology” (Danius 2002, 181; see also Peters 1999, 137–176). This is also another clear
instance of how emergent technologies are often colonized by ideas that predate their
invention and that was mentioned earlier in this chapter.
Memory, or more properly remembrance, is intimately linked with cultural practices
around the death of someone, and so it is perhaps not surprising that technologies that
can record moments in time—such as the movie camera, photography, sound recording—
should step in as extensions of the mnemic capacity. But in the case of sound recording,
this mnemic capacity is articulated through an apparent reappearance of the living
presence of the departed. In gruesomely vivid terms, the narrator of Renard’s Death and
the Shell (1907) evokes the image of departed friends seemingly brought back to life by
listening to recordings of their voices on a phonograph:
[O]n Wednesday the dead spoke to us. . . . How terrible it is to hear this copper throat
and its sounds from beyond the grave ! . . . it is the voice itself, the living voice, still
alive among carrion, skeletons, nothingness.
(quoted in Kittler 1999, 53, emphasis added)
The voice then, as cipher of the soul, passes between the worlds of the dead and of the
living through the mediumship of sound recording. Though memorial in its tone, this
seems more like a visitation or a ghostly encounter than a memory.
On the subject of ghosts, telegraphy served as a model for spirit communication from
the time of the so-called Rochester Rappings of 1848. Here, two young sisters claimed to
be able to communicate with spirits who knocked once for yes and twice for no in answer
to questions put to them verbally. From this grew a spiritualist movement sometimes
characterized as “the spirit telegraph” (Sconce 2000, 21–28), “such fantastic visions of
electronic telecommunications demonstrat[ing] that the cultural conception of a tech-
nology is often as important and influential as the technology itself ” (Sconce 2000, 27).
From this it is interesting to speculate on what we might think of as phonography’s
ghost, the technology that phonography left behind, as it were, the technology that
Edison was intentionally working on when he chanced upon phonography: the auto-
matic telegraph repeater.
The distances over which Morse telegraphy was possible were originally limited by
the resistance of the telegraph wires, which led to a degradation of signal to the point
that, in order to transmit across the massive distances of the United States, repeater
stations were needed in which a clerk would transcribe an incoming signal and then
retransmit it manually to the next repeater station, and so on across the whole country.
Not only did this take time and manpower, but it also meant that errors of reception and
transmission, both technological and human, could creep into the system. Edison was
working on a system whereby an incoming Morse signal would cut dots and dashes into
a moving paper strip which would then pass through a mechanical reader. This reader
would register, by means of a moving needle, the sequence of short and long signals,
transmitting them onward almost instantaneously and, in theory, as absolutely exact
copies. It was in experimenting with the increase in speed at which accurate relays were
possible that Edison is said to have chanced upon the idea of recording sound—more
particularly the human voice (Kittler 1999, 27–28; Wile 1977, 10–13).
The original situation in which telegraph messages could be sent over long distances
was that a human being would receive and transcribe the incoming message, which they
would then retransmit in a second act of “writing.” The technology Edison was working
on, though, moved from an imaginary of transcribing human bodies toward an imagi-
nary in which information passes smoothly and automatically across great distances
without passing through the body of other humans. Rather than a series of dictations
and reinscriptions—each of them conceived, according to the traditional imaginaries of
writing, as records—a disembodied energy passes from its origin to its destination with-
out seeming to be recorded at all (though in Edison’s repeater it was, in fact, recorded,
but not in writing by a human hand). To then seize on these transitional moments (the
telegraph repeaters) in the relay of information and isolate them as a recording technology
is, in some senses, to turn a means of transmission into a means of recording; records
that were originally only made for the purposes of resending onward become things in
themselves. Recording, then, according to this alternative genealogy, is the capture of an
energy during a moment of its transmigration. If the story ended there we would have
little more than an amusing observation, but much of Edison’s work was conducted in a
milieu of spiritualist research and a grasping at electromagnetic explanations of avow-
edly psychic and supernatural phenomena, not with the aim of debunking myths and
superstitions, but of arriving at scientific justification for such beliefs (Kahn 1994, 76–78;
also Connor 2000, 362–393). As Connor puts it, “The commerce between the disembodied
and the re-embodied, the phantasmal and the mechanical, is a feature in particular of
the scientific understanding of the voice, but it [is] apparent too in the languages and
experiences of the Victorian supernatural, which coil so closely together with that work
of scientific imagining and understanding” (Connor 2000, 363).
And had the phonograph not made speech, “as it were, immortal” (Johnson 1877)?
Like the soul?
When we listen to an audio recording, is it really like remembering, though? Although
the type of recording affects how we experience it—the voice of a departed loved one, a
string quartet by Bartók, and the undefined distant rumble of traffic produce very dif-
ferent experiences—one general distinction between listening to a recording and
remembering is that it is not necessary to re-experience something in the time it would
take to happen in order to remember it. I can remember my wedding, for example, in an
instant, whereas it occupied the best part of a whole day. I remember hearing Mahler’s
Seventh Symphony in Newcastle City Hall similarly, as though it were, in mental time, a
moment. Memory seems to compress and codify experiences, at some level. And though
the etymology of phonography is concerned with the writing of sound, playing back a
recording is nothing like reading, even if “reading” is a viable metaphor for what the
machine is doing. Listening to a sound recording can seem like something altogether
more intersubjective, and though it can evoke memories it feels more like an encounter
with a sounding presence than recall per se. We see this in the “terrible” experience that
Renard’s narrator has with “the voice itself, the living voice, still alive among carrion,
skeletons, nothingness” (Kittler 1999, 53).
Rather as Casey conceives of imagination and memory as “not just essential but
co-essential, essential in [their] very co-ordination with [each] other,” Harpur finds Proust’s
mémoire involontaire—the memory that seems to surge into consciousness unbidden,
triggered by a chance encounter such as the madeleine dipped in tea—“analogous to
imagination . . . the relationship between recollection and imagination is so richly inter-
fused that it is as difficult to separate them as it would be to separate, in Proust’s novel,
autobiography and art” (Harpur 2002, 212). Given that sound recording has been tra-
ditionally associated with recollection yet, as we have seen, the match is very far from per-
fect, it is useful to look briefly at two other technologies that occupy important positions
in Proust’s elaboration of his remembering of times past. That the something that surges
up as involuntary memory is “detached” and “isolated” finds a resonance in Proust’s
experiences with his beloved grandmother, the hearing of her voice on the telephone
isolated from seeing her face, and his view of her some short time later before she sees
him and is able to return his gaze. In the latter instance, it is Freud’s other prosthesis of
memory, the camera, that is invoked. The human eye, “marked by affection
and tenderness . . . necessarily refracted by preconceptions,” “prevents the beholder
from seeing the traces of time in the face of a loved one. . . . Memory thus prevents truth
from coming forward” (Danius 2002, 15). The camera eye, though, invoked by Proust to
account for the shock of seeing the “red-faced, heavy and vulgar, sick, vacant . . . dejected
old woman whom I did not know” (Proust quoted Danius 2002, 15), “carries no thoughts
and no memories, nor is it burdened by a history of assumptions. For this reason, the
camera eye is a relentless purveyor of truth” (Danius 2002, 15). Though the horror dis-
sipates as soon as eye contact is made, the moment experienced “hint[s] at her impending
death.” Here, though, it is not the photographic record that is deathly (as in Barthes’s
Camera Lucida, with its “anterior future” case of “he is dead, and he is going to die”
[Barthes 2000, 96]), but the technological gaze of the camera. Memory, rather than a
dead record, in fact moderates vision and hearing, humanizes and warms it. Beckett
takes the same episodes and, like Danius, brings out the ways in which memory mediates
rather than models recording. He writes, “the laws of memory are subject to the more
general laws of habit” (Beckett [1931] 1999, 18–19), and it is habit, when he visits his
grandmother after the experience of their telephone conversation has so unsettled him,
which is “in abeyance, the habit of his tenderness for his grandmother . . . the notion of
what he should see has not had time to interfere its prism between the eye and its object.
His eye functions with the cruel precision of a camera; it photographs the reality of his
grandmother” (27–28).
The telephone, though, is a more productive technology to examine in terms of
memory, imagination, and the transmigration of souls. As already noted, the telegraph
had required the intermediation of human bodies to transmit over large distances
whereas, as Connor puts it, the telephone “allowed for intimate communication between
two interlocutors alone” (Connor 2000, 362). In its near immediacy of transmission,
and its avoidance of writing and reinscription/retransmission, the telephone fails as a
technological prosthesis of memory, or as a metaphor for memory. Connor, though,
notes how the “striking co-incidence in time” of the discovery of the two inventions
(within a year of one another) “allows us to see the two inventions as different forms of,
or relays in, some single, but polymorphous prosthetic apparatus” (362). When we
consider Connor’s suggestion in the light of the telegraph as not only a technological
forebear of phonography, and the telematic communication system preceding
telephony, but also as a metaphor for communication with the souls of the dead (the
Spirit Telegraph), this “polymorphous prosthetic apparatus” allows for an imaginary in
which memory and metempsychosis fold over one another, but where at least from the
technological perspective metempsychosis, as the motion of an energy through differ-
ent embodiments in media, seems a more plausible model of what is happening with
sound recording technologies. I do not believe that, in phenomenological terms, our
experience of sound recordings is like the retrieval of files from a storage medium, and
neither is our experience of memory.
It is worth noting that memory is nothing like a unitary phenomenon but is, instead, a
whole range of sometimes independent, sometimes interconnected cognitive and
embodied processes (Casey 1993, 165–169). This is one reason why it does not really work
as a metaphor of recording, though recording does seem—on the surface of things—to
work as a metaphor of remembering. Recording sounds and playing them back seems
like a mnemic process, but our experience of listening to such recordings is very different,
as already noted. As Peters has described it, there is a cultural, viable dimension to the
recorded voice that comes over as in a sense “oracular,” a direct transmission from a
being whose consciousness is elsewhere and other to our own, and with whom there is
no possible dialogue. Though to be in the presence of a recorded voice is in many
respects to experience another subjectivity, it is not in any real sense an intersubjectivity
of participants with equivalent status. We are told things by the recorded voice, but there
is no sense of any interlocution. This gives rise, in Peters’s account, to voices whose con-
tents are available for hermeneutic readings, but not for dialogic interrogation. As such,
the voices of the dead—which we encounter most conventionally of course through
sound recordings—are “the paradigm case of hermeneutics: the art of interpretation
where no return message can be received” (Peters 1999, 149).
Recordings
Sounds, though, are recorded. If we experience them as oracles, voices of the dead,
idealized memories, writings, they nevertheless remain as records (rather than the more
complicated memories). The content of a memory does not remain constant in itself,
but is shaped, molded, even materially changed as it combines with other associations,
joins forces with other bits of information in a story that may well conflate different
events without our even realizing that this has happened. In Freud’s theory of screen memo-
ries, for example, modifications to memory are repressed as modifications and “remem-
bered” as having actually occurred; such transitive and unreliable qualities of human
memory have already been extensively noted. Memory-content, then, has no independent
existence, and no permanence; it is as much an effect of the ecosystem of which it is a
part as it is an agent that has an effect on that ecosystem. Sound recordings, though,
seem self-sufficient and, as Edison himself put it, are “as it were, immortal” (Johnson 1877).
We know this, though, not to be true.
In the first case, recordings physically deteriorate; they are not immortal in any reliably
ontological sense. In this respect, they seem once more to resemble human memory.
Many of us will have known friends or relatives with Alzheimer’s disease, or similar
degenerative neurological conditions, and the loss of memory for these patients is often
likened to a kind of (premature) death; someone is said to have “left” long before they
actually died, for example. Listening to a very decayed phonograph recording—such as
the one supposed to be of Brahms playing the piano—seems to suggest the fogged and
abrasive feeling of not remembering. In the final moments of Ibsen’s play Ghosts, for
example, Osvald is paralyzed in the final stages of syphilis, and rapidly collapsing
into dementia. His mother asks him if there is anything he wants, and he asks for “the
Sun,” which has just risen in a bright dawn after days and days of rain and darkness.
When his mother questions this, he repeats, “the Sun,” and then, like a cracked record,
repeats again “the Sun . . . the Sun” without any change of expression. Ibsen directs that
Osvald “repeats dully and tonelessly,” and then “tonelessly as before.” In contrast, his
mother—still human, not reduced to a broken machine—is given extensive and detailed
indications of the gamut of emotion that should be expressed: she “trembles with fear”;
“throws herself on her knees”; “tears her hair with both hands”; “whispers as though
numbed”; “shrinks a few steps backwards and screams”; “stares at him in horror” (Ibsen
[1881] 1973, 97–98). Juxtaposed against one another, then, are Osvald’s deathly, machine-
like voice uttering a single sound like a broken record, and the terrible range of violently
shifting, painfully human emotions specified by Ibsen and performed for the audience
by Osvald’s mother.
Ghosts was written in 1881, only four years after the invention of the phonograph, and
there is no evidence that Ibsen conceived of Osvald’s expressionless repetitions because
he had ever heard a broken phonograph cylinder playing. However, the fact that Osvald’s
loss of his personality is performed through a mechanical and expressionless cycle of
repetitions does show a strong congruency in the cultural imagination between a dehu-
manized body and the sound of a broken machine, particularly one designed to sustain
beyond the immediate present the human voice, that atavistic marker of presence and
soul. The “talking machine” was a sensation in the late nineteenth century because it
managed to do what no machine prior to it had been able to do4—it spoke. The distorted,
identical repetitions of the cracked record, though, present nothing but a machine to the
listener, and the suspension of disbelief that sustains the persistence of a human pres-
ence is broken.
Understandably, perhaps, there is a sense of pathos around a recording that has
degenerated. There is a constellation of anthropomorphization that encompasses loss of
memory, a sense of dehumanization of the broken voice, and the transience of all things
human. At the same time, the possibility opens up for a further anthropomorphization of the fading signal because the recorded speech, like memory, is not immortal,
whatever Edison said about it in 1877. It is worth noting, though, that loss of conscious
memory is only “tragic” when it is framed as such; forgetting is, in many respects, an
essential condition of being able to function in the world without collapsing beneath the
onslaught of multitudinous and mostly distracting or irrelevant connections and asso-
ciations between disparate pieces of information.
But human memory does not only fade through a process of physical damage, or
through the natural entropy of biological matter. Imagination fills in gaps, creates mem-
ories that never happened, forms connections that never pertained in “the real world.”
As I pointed out at the very start of this chapter, imagination is a phenomenon that
shines a critical light on the too easy association of recording and memory and brings a
subtle pressure to bear to explore other kinds of relations. For instance, we tend to think
of analog recordings as more or less inert material onto which a signal is recorded. We
experience the playback of a recording as a separation of the signal from the medium, in
part because we recognize the sound as a voice or an instrument or a steam train which
exists, or existed, outside of the recording. Additionally, we experience the sounds as
being physically separate from the medium as they emerge from loudspeakers, not the
disc or tape itself.5 All of this serves to reinforce the notion that there is the material
medium on the one hand, and the signal on the other, the medium serving only as a
partial and temporary means for the disembodied energy of the sound “itself ” to be
transmitted through. Again, the transmigration of souls seems a more apposite model
for this experience than does human memory.
But is this the only way to conceive of this? In the case of a vinyl recording, for
instance, the “curves of the needle,” as Adorno described them ([1927] 2002), are a phys-
ically integral part of the disc, and their playing back is, arguably, simply the sound the
disc itself makes under the particular conditions of its playback via a turntable and
cartridge. The voice or music we hear is a sonic property of a solid piece of matter; there
is no real separation, any more than the sound of a cymbal, or a piano, could be con-
sidered as a separate property of the metal or the piano string. The sound of a cymbal or a
piano is the articulation of the sonic properties of its constitutive materials as they are
held in a particular state, and under particular conditions of excitation. Can we think of
the sound of a recording in terms of being simply the sonic properties of the disc, or the
tape, itself, once it has been excited in a suitable way? To do so allows us to reconfigure,
in a less anthropomorphizing way, our understanding and our imagination of the
fading signal.
For the surrealist poet and thinker André Breton, “the marvelous” is that which
stands outside of the natural order, which confounds rational expectations we may have
of the world, and which encompasses automata,6 the so-called fixed-explosion, and
objective chance (Foster 1997). He proposes the ruin as an instance of the marvelous in
which nature retakes culture in a reversal of the humanistic notion that culture has the
domination over nature. In this, he mirrors the very aim and objective of surrealism as
“the future resolution of these two states, dream and reality, which are seemingly so contra-
dictory” as it is stated in the first of the manifestoes of surrealism (Breton [1924] 2004,
14). This is a “resolution” that tends more toward a shift of balance away from the cultur-
ally and traditionally dominant side of the binarism toward placing more significance
on the generally subordinate term, rather than a genuine leveling out. The ruin, as a formu-
lation modeled on this pattern of thinking, shows a cultural construction succumbing
to nature, just as automatic writing models the dictation of the unconscious (figured
naively in much surrealism as more “natural” than conscious thought) against rationally
constructed narratives. Automatic writing, or other forms of articulating “pure psychic
automatism,” is “[d]ictated by thought, in the absence of any control exercised by reason,
[and is] exempt from any aesthetic or moral concern” (Breton [1924] 2004, 26).
Taking this as a model, can we take another view of the apparently fading signal, with
its pathetic anthropomorphic associations, and see not the fading of human memory,
but simply the medium reasserting itself as nature reasserts itself in the ruin? Evading
the separation of signal and medium in this way makes it possible to imagine the record
as an object where signal and medium are completely integrated. Instead of a human
memory, the sound of the record evidences something more like the soul of one
departed that has migrated into an apparently inanimate, “inferior” object, as Proust
states “the Celtic belief ” to be. It is important, I think, to ensure that one thinks in terms of
the “reality,” as it were, of the nonseparation of signal and medium, and the “imaginary”
of the metempsychosis of sound, just as memory is also an “imaginary” of recording.
Notes
1. This is of course a more complex situation; the artifacts arising from damage to, or degra-
dation of certain media (scratches, glitches, drop-outs, etc.) have a perceptible effect on the
status of the recorded signal, and the process by which a signal is reanimated and received
has much to do with when and by whom it is heard. Arguably, though, that which is essential
in the original recorded signal is retained to a high degree.
2. At the very beginning of chapter 4 of Beyond the Pleasure Principle, Freud writes, “What
follows now is speculation, speculation often far-fetched, which each will according to his
particular attitude acknowledge or neglect. One may call it the exploitation of an idea out
of curiosity to see whither it will lead” (Freud [1920] 1961, 24).
3. Casey notes three other instances where this is the case—screen memories, dreams (Freud),
and time-consciousness (Husserl) (Casey 1977, 196).
4. In du Moncel’s early account of the phonograph (du Moncel 1879), he writes of it in relation
to the telephone and the microphone but also, as an afterthought, almost, to Faber’s
Speaking Machine, a mechanical organ-like device that seems to have attempted to physically
recreate, through bellows and different shaped pipes and resonators, the phonemes of
human speech.
5. I refer specifically to analog recording systems because the issues of signal and medium are
perhaps more explicitly foregrounded than with digital systems, though the differences
between digital and analog recordings with respect to the current discussion are not as large
or significant as might be imagined. In addition, the culturally significant resurgence of
cassette tape and vinyl discs over the past ten years or so, especially in DIY culture and
certain areas of experimental music and sound art, means that these forms of recording are
still very much a significant element in the audio culture of today.
6. As Foster (1997) shows, eighteenth- and nineteenth-century automata such as The Little
Writer, The Chess Playing Turk, and the Harpsichord Player exercised a complex fascina-
tion over many of the surrealists, and André Breton in particular.
References
Adorno, T. W. (1927) 2002. The Curves of the Needle. In Essays on Music, edited by R. Leppert,
271–276. Berkeley: University of California Press.
Adorno, T. W. (1934) 2002. The Form of the Phonograph Record. In Essays on Music, edited by
R. Leppert, 277–282. Berkeley: University of California Press.
Armstrong, T. 1998. Modernism, Technology and the Body. Cambridge: Cambridge
University Press.
Auyang, S. Y. 2000. Mind in Everyday Life and Cognitive Science. Cambridge, MA, and London:
MIT Press.
Barthes, Roland. 2000. Camera Lucida. Translated by R. Howard. London: Vintage.
Beckett, S. (1931) 1999. Proust and Three Dialogues. London: John Calder.
Blake, W. (1927) 1975. Poetry and Prose of William Blake. Edited by G. Keynes. London: The
Nonesuch Library.
Breton, A. (1924) 2004. Manifesto of Surrealism. In Manifestoes of Surrealism, edited by
A. Breton, translated by R. Seaver and H. R. Lane, 3–47. Ann Arbor: University of Michigan
Press, Ann Arbor Paperbacks.
Casey, E. S. 1977. Imagining and Remembering. Review of Metaphysics 31 (2): 187–209.
Casey, E. S. 1993. On the Neglected Case of Place Memory. In Natural and Artificial Minds,
edited by R. G. Burton, 165–185. Albany, NY: State University of New York Press.
Connor, S. 2000. Dumbstruck: A Cultural History of Ventriloquism. Oxford and New York:
Oxford University Press.
Danius, S. 2002. The Senses of Modernism: Technology, Perception, and Aesthetics. Ithaca,
NY: Cornell University Press.
Draaisma, D. 2000. Metaphors of Memory: A History of Ideas about the Mind. Translated by
P. Vincent. Cambridge: Cambridge University Press.
Dreyfus, H. L. 1996. The Current Relevance of Merleau-Ponty’s Phenomenology of
Embodiment. Electronic Journal of Analytic Philosophy 4. http://ejap.louisiana.edu/EJAP/
1996.spring/dreyfus.1996.spring.html. Accessed May 15, 2017.
du Moncel, T. A. L. vicomte. 1879. The Telephone, the Microphone and the Phonograph. London:
C. Keegan Paul.
Foster, H. 1997. Compulsive Beauty. Cambridge, MA, and London: MIT Press.
Freud, S. (1920) 1961. Beyond the Pleasure Principle. In The Standard Edition of the Complete
Psychological Works of Sigmund Freud XVIII, edited and translated by J. Strachey, 7–64.
London: Hogarth.
Freud, S. (1924) 1961. A Note on the Mystic Writing-Pad. In The Standard Edition of the
Complete Psychological Works of Sigmund Freud XIX: The Ego and the Id and Other Works,
edited and translated by J. Strachey, 226–232. London: Hogarth.
Freud, S. (1930) 2004. Civilization and Its Discontents. Translated by D. McLintock.
Harmondsworth, UK: Penguin.
Harpur, P. 2002. The Philosophers’ Secret Fire: A History of the Imagination. London: Penguin.
Hogg, B. 2008. The Cultural Imagination of the Phonographic Voice 1877–1940. PhD thesis,
University of Newcastle upon Tyne.
Ibsen, H. (1881) 1973. Ghosts. Translated by M. Mayer. London: Eyre Methuen.
Johnson, E. H. 1877. A Wonderful Invention—Speech Capable of Indefinite Repetition from
Automatic Records. Scientific American 37 (20): 304.
Kahn, D. 1994. Death in Light of the Phonograph. In Wireless Imagination: Sound, Radio,
and the Avant-Garde, edited by D. Kahn and G. Whitehead, 69–103. Cambridge, MA:
MIT Press.
Kittler, F. 1999. Gramophone Film Typewriter. Translated by G. Winthrop-Young and M. Wutz.
Stanford, CA: Stanford University Press.
Kreilkamp, I. 1997. A Voice without a Body: The Phonographic Logic of Heart of Darkness.
Victorian Studies: An Interdisciplinary Journal of Social, Political, and Cultural Studies 40 (2):
211–244.
Levin, T. Y. 1990. For the Record: Adorno on Music in the Age of Its Technological
Reproducibility. October 55: 23–47.
Merleau-Ponty, M. 1962. The Phenomenology of Perception. Translated by C. Smith. London:
Routledge and Keegan Paul.
Peters, J. D. 1999. Speaking into the Air: A History of the Idea of Communication. Chicago and
London: University of Chicago Press.
Proust, M. (1913) 1985. Remembrance of Things Past, Vol. 1: Swann’s Way and A Budding Grove.
Translated by C. K. S. Moncrieff and T. Kilmartin. Harmondsworth: Penguin.
Sanders, J. T. 1996. An Ecological Approach to Cognitive Science. Electronic Journal of Analytic
Philosophy 4. http://ejap.louisiana.edu/EJAP/1996.spring/sanders.1996.spring.html. Accessed
May 15, 2017.
Sconce, J. 2000. Haunted Media: Electronic Presence from Telegraphy to Television. Durham,
NC, and London: Duke University Press.
Terdiman, R. 1993. Present Past: Modernity and the Memory Crisis. Ithaca, NY: Cornell
University Press.
Varela, F. J., E. Thompson, and E. Rosch. 1993. The Embodied Mind: Cognitive Science and
Human Experience. Cambridge, MA, and London: MIT Press.
Warnock, M. 1976. Imagination. London: Faber & Faber.
Weiss, A. S. 2002. Breathless: Sound Recording, Disembodiment, and the Transformation of
Lyrical Nostalgia. Middletown, CT: Wesleyan University Press.
Wile, R. R. 1977. The Wonder of the Age: The Edison Invention of the Phonograph. In
Phonographs and Gramophones, 9–48. Edinburgh: Royal Scottish Museum.
chapter 12
Musical Shape Cognition
Rolf Inge Godøy
Introduction
This chapter shall explore notions of shape in our experiences of music, “shape” denoting
various geometric figures or images that we may associate with the production and/or
perception of music. For instance:
The basic tenet of this chapter is that what may be called shape cognition is not only
deeply rooted in our experiences of music and in musical imagery but also has the
potential to enhance our understanding of music as a phenomenon, to contribute to
Figure 12.1 Sound-tracings by nine listeners of the sound fragment built up of an initial triangle
attack, a downward glide in the strings, and a final drum roll (spectrogram at the bottom).
(Sound fragment from cd3, track 13, 20”–29”, in Schaeffer [1967] 1998.)
of music as a phenomenon, shape cognition may also be useful in sonic design, musical
composition, performance, and multimedia arts, and a number of other domains by pro-
viding conceptual and practical tools for handling most musical features as shape images.
Notions of Shape
Music is ephemeral: musical sound and music-related body motion unfold in time and
then vanishes, yet we are (fortunately) left with memory traces of what we just heard
and/or saw. The ephemeral nature of music is (and has been) a major challenge for
research; however, given available technologies for recording, processing, and repre-
senting sound and music-related body motion, we now have the means to “freeze” or
“make solid” the ephemeral, enabling close scrutiny of details previously not possible.
Yet, given these means, the next major challenge has become how to make sense out of
the vast amount of data typically generated by digitalization.
On the other hand, traditional means of representation by Western music notation,
although useful in conserving some aspects of music, is evidently incapable of repre-
senting many aesthetically and affectively highly significant features of musical expression.
This concerns what we may call subsymbolic features of music, meaning the various fea-
tures of sound, such as its so-called timbre (sometimes referred to as tone color), a number
of nuances in pitch (intonation) and loudness (dynamics), as well as what we may call
the suprasymbolic features, meaning the expressive elements of musical phrases such as
in timing and articulation, in so-called grooves, and in various affective and motion-
related labels, for example, tense, relaxed, light, heavy, agitated, calm, and so on.
We thus have the dual challenge of, on the one hand, representing salient features of
music using digital technology and, on the other hand, going beyond the limitations of
traditional Western notation. My answer to this dual challenge, then, is that of musical
shape cognition, meaning that all features of music—that is, those at the subsymbolic,
the symbolic, and the suprasymbolic levels—can be represented as shapes; shapes that
enable us to systematically explore the many until now mostly inaccessible, yet highly
significant, elements of musical experience.
Musical shape cognition is thus a unifying conceptual and practical paradigm for
studying and actively manipulating salient features of music at different timescales,
ranging from the micro-level, subnote timescale features, to phrase and section-level
features of musical expression. In sum, we have the main challenge of bridging gaps
between the quantitative (of digital representations of sound and body motion) and the
qualitative (of holistic and subjective musical experience), and I believe musical shape
cognition will be the best answer to this challenge.
It would be no exaggeration to say that expressions of shape are ubiquitous in musical
discourse: there are innumerable occurrences of shape-related terms in music theory,
music analysis, music aesthetics, music history, and other music-related disciplines. We
typically encounter shape expressions for designating melodic, harmonic, rhythmic,
textural, dynamic, and expressive features, as well as large-scale formal designs. Also,
our Western music notation system, with its spatial distribution of notes on the pages
of the score, could actually be seen as having some element of shape cognition and,
secondarily, also as scripts for sound-production that in turn will result in body motion
shapes. And, needless to say, various graphical scores and sketches found in musical
composition and analysis contexts are instances of shape cognition. However, the more
systematic approach to musical shape cognition should be seen in relation to some
specific previous research endeavors:
• Seminal ideas on shape cognition in music extend back to classical Gestalt theory,
with early proponents towards the end of the nineteenth century such as Ehrenfels
and Stumpf and, a bit later, Koffka, Köhler, and Wertheimer, who were all con-
cerned with musical features as shapes (Smith 1988; Leman 1997; Godøy 1997b).
A number of Gestalt ideas have been extended into more recent music theory
(Tenney and Polansky 1980), into auditory research (Bregman 1990), and into music
perception research on melodies (Dowling 1994).
• The single most important historical background for my present thoughts on
musical shape cognition is the phenomenological approach to musical research
advocated by Pierre Schaeffer and his colleagues (Schaeffer 1966, [1967] 1998).
With the triple challenges of new music, music from other cultures, and new music
technology in the post-World War II era, the need to develop a more universally
applicable music theory became evident to Schaeffer. To go beyond the confines of
traditional mainstream Western music theory, Schaeffer and colleagues turned
their attention to the subjective perception of sound, with the ambition of estab-
lishing a systematic classification of fragments of sound, of so-called sonic objects,
of any type, origin, or signification, for the most part by a systematic ordering of
sound features as shapes.
• Shape cognition plays an important role in acoustic and psychoacoustic research
(De Poli et al. 1991), and it has been used in signal-based visualizations of musical
sound (Cogan 1984) and in readily available software (e.g., SonicVisualiser, Praat,
and AudioSculpt as well as MIRToolbox, Timbre Toolbox, and other MatLab-
based software). Within these software development projects, there is ongoing
work to try to extract more perceptually salient information from signals and to
represent these features as “solid” shapes, that is, representations that can be
exploited in the context of our work on musical shape cognition.
• In work with new interfaces for musical expression (NIME), there is the challenge
of capturing and mapping body motion shapes to sound with the aim of enabling
more human-friendly control of the many parameters that go into digital synthesis
and processing of musical sound. As for motion data input, different technologies
for motion capture are available (various sensors, infrared and video camera
recordings). Associated processing tools (e.g., the MoCapToolbox, the EyesWeb
software, and the AudioVideoAnalysis software [Jensenius 2013]) have been
important in developing shape cognition, making the study of motion as “solid”
shape images possible. Notably, this also makes possible the study of expressive
and affective features of motion as shapes derived from motion data, for example,
of amplitude, velocity, acceleration, jerk, and so forth.
• We have learned much from more general approaches to shape cognition in
so-called morphodynamical theory (Thom 1983; Petitot 1985), an extensive theory
of geometric cognition as a basis for capturing and handling complex and distributed
phenomena in general. Also, in so-called cognitive linguistics, studies of image
schemata (i.e., more generic shape images) and of metaphor theory suggest that
shapes and spatial relations are crucial for all cognition (Godøy 1997a). Additionally,
there has been some very interesting work on the display of quantitative informa-
tion as shapes (Tufte 1983), with modes of representation that seem to have great
potential for shape cognition in general.
• Lastly, we have seen shape cognition become a topic in so-called embodied music
cognition, where the shapes of both sound-producing and sound-accompanying
body motion are understood as integral to musical experience (Godøy 2001, 2003a;
Leman 2008; Godøy and Leman 2010). In Figure 12.2 is an example of such sound-
producing motion shapes of a pianist playing an excerpt from a Beethoven sonata,
together with the notation and spectrogram of the resultant sound, demonstrating
a case of the ubiquitous sound-motion shape relationships in music.
[Figure 12.2 panels: notation, spectrogram (frequency in Hz), motion shapes, and right- and left-side velocity curves (mm/s) for shoulder, elbow, and wrist over a ten-second excerpt.]
Figure 12.2 A synoptic representation of notation (top), spectrogram of resultant sound (next
to top), motion shapes, and velocity shapes of the shoulders, elbows, and wrists of a pianist
playing the opening of the last movement of L. v. Beethoven’s Piano Sonata No. 17 Op. 31 No. 2 in
d-minor, The Tempest, demonstrating shape correspondences between score, sound-producing
motion (including velocity shapes), and resultant sound. Reproduced with permission from the publisher, S. Hirzel Verlag, from Godøy, Jensenius, and Nymoen (2010).
As for embodied music cognition, my colleagues and I have for more than a decade
tried to advance our knowledge of musical shape cognition through the following topics:
Motor Cognition
A core idea of the present chapter is that shape cognition is embodied, and that it extends
to several sense modalities; that is, it is manifest in sound and motion, with motion in
turn including vision, proprioception, haptics, and sense of effort. In particular, the
motor theory of perception has claimed that images of sound-producing body motion are
integral to our perception of sound. Initially presented in linguistics with the suggestion
that language acquisition is not only a matter of becoming familiar with a set of sounds
but also just as much a matter of learning the corresponding sound-producing motion
of the vocal apparatus (Liberman and Mattingly 1985), it has been extended to other
domains of human perception and cognition (Galantucci et al. 2006), including the vis-
ual domain (Berthoz 1997). Furthermore, there is now brain observation evidence of the
spontaneous linking of sound and motion in perception (Haueisen and Knösche 2001;
Bangert and Altenmüller 2003), including evidence of a neurophysiological predisposition
for this linking (Kohler et al. 2002). The mental simulation of assumed sound-producing
motion will, in most cases, be covert, but we may also sometimes have observable behavior
in the form of imitation. This imitation may be variable in its accuracy, ranging from
very detailed to rather approximate and vague, as can be observed in cases of the
aforementioned air instrument performance and as may be observed in various kinds of
vocal imitation such as in scat singing and beatboxing.
The motor theory perspective implies that shapes of observed or imagined sound-
producing body motion are projected onto whatever it is that we are hearing; for instance,
we might project images of energetic hand motion onto ferocious drum sound, or slow
bowing motion onto protracted, soft string sound. The idea is that the shapes of sound-
producing body motion contribute to the mental schemas for perceiving musical sound.
However, there is an important duality to shape cognition here: shapes may be
considered “instantaneous” images—that is, something occurring “in the blink of an
eye”—yet shapes may also be considered something that unfolds in time—something
that has to be “set into motion” and more like a script that needs to be run through in a
performance. This duality will be a recurrent topic in musical shape cognition, with a
tentative understanding that perception and action may shift between “instantaneous”
and “unfolding” shapes. This can be related to musical features, as suggested in
Figure 12.3, meaning I can hypothesize that there is a core of more amodal shape cog-
nition surrounded first by a circle of body instantiation, both as stationary posture shapes
and as motion trajectory shapes that are in turn surrounded by a circle of musical
features manifest variably as postures and motion shapes.
With my main tenet that active tracing of sound features as shapes is integral to the
perception and cognition of music, and the associated idea that the spontaneous tracing
of sound features as shapes can be exploited in various music-related contexts, we can
work along the following lines:
Musical Timescales
• Micro, that is, the less than approximately .5 seconds duration range of continuous
sound and body motion, with features such as pitch, loudness, stationary timbre
(or tone color), and various microtextural fluctuations related to shape metaphors
such as smooth, grainy, rough, and so forth.
• Meso, typically in the very approximately .5–5 seconds duration range, and usually
encompassing salient information on rhythmic, textural, timbral, harmonic,
melodic, and overall stylistic and affective features, and very often related to salient
body motion shapes of sound-production, such as of hands moving along the
keyboard (see Figure 12.2).
• Macro, typically containing several meso timescale chunks, forming sections,
whole songs, and more extended works of music.
Clearly, the micro and meso timescales are the most important with respect to
perceiving salient musical features such as timbre, dynamics, rhythmical-textural, melodic,
harmonic, and motion shapes; a couple of seconds of music would be enough to tell us,
for example, that it is a slow waltz, late romantic style, played by a small café ensemble,
and so on (see Gjerdingen and Perrott 2008, for examples of duration thresholds for
various features). But also, the macro timescale could be important for musical shape
cognition; however, this would be more on a narrative or dramaturgical level, for
instance, as found in various cases of program music.
On the micro and meso levels, we have attention and memory constraints that make
these timescales special (see Godøy 2013 for a summary), but we also find sound-
producing constraints that contribute to chunking at the meso timescale. This includes
some crucial biomechanical constraints (e.g., limits to maximum speed of body motion,
need for rest and change of posture to avoid strain injury, need to anticipate positioning
of effectors [fingers, hands, arms, etc.] before tone onsets, etc.) resulting in so-called
coarticulation, meaning a contextual fusion of events into meso timescale chunks
(Godøy 2014). Also, there are some motor control constraints that contribute to the
formation of meso timescale chunks. For one thing, human motor control seems to
be hierarchical and goal-oriented (Grafton and Hamilton 2007), organized in the form
of action Gestalts (Klapp and Jagacinski 2011), and furthermore, there have been well-
founded suggestions that human motor control is intermittent (Loram et al. 2014), and
also that it may proceed by postures (Rosenbaum et al. 2007), something that I have
called key-postures in sound-producing body motion (Godøy 2013, 2014). Each such
key-posture is surrounded by what I call a prefix and a suffix so that there is a continuous
trajectory to and from the key-postures, something which is closely linked with the
aforementioned coarticulation (Godøy 2014).
The intermittency in human perception, cognition, and action is especially relevant
to our ideas on musical shape cognition, because a shape is, by definition, something
that works holistically, as something overviewed “instantly,” in a “now-point,” to use
Husserl’s expression (Husserl 1991; Godøy 2010c).
Sound Features
Of these two schemes, the typology was considered a coarse, first sorting of sonic objects
into three basic dynamic envelope shape categories:
It should be noted that there are so-called phase-transitions between these typological
categories, dependent on the density, rate, duration, and so forth of the events and
clearly related to constraints of sound-production (and probably also of perception
and cognition): shortening the duration of a sustained sound will lead to an impulsive
sound and, conversely, lengthening the duration of an impulsive sound will lead to a
sustained sound; slowing down the rate of onsets in an iterative sound will lead to a
series of impulsive sounds, accelerating the rates of onset of impulsive sounds will lead
to an iterative sound; and so on. Also, in addition to these dynamic typological categories,
there were three pitch-related shape categories in the typology:
These two main categories (dynamic and pitch-related) were then combined in a 3 × 3
matrix of basic typological classification. The general procedure of both the typology
and the ensuing morphology is that of a top-down feature differentiation, starting out
with the overall envelopes and proceeding to subfeatures, sub-subfeatures, and so on,
as far down as is deemed useful for characterizing perceptually salient features. The
morphology is quite extensive in detail, so here are just two of the categories:
• Grain, meaning a rapid fluctuation within the sound, be that of loudness, pitch, or
spectral content; for example, the grainy sound of a deep double bass tone.
• Gait, meaning a slower fluctuation within the sound, such as in the undulating
motion in a dance tune accompaniment.
• Timbral patterns
• Articulatory and expressive patterns
• Timing patterns
In particular, the last three feature categories are well suited for technological
representations that combine Western notation with more detailed information on onset
timing, duration, dynamics, and spectral features, thus making accessible performance-
related information that could not previously be represented. Such access to
features of musical sound is presently made possible within the field of so-called music
information retrieval; that is, the searching through large collections of musical sound
by way of various sound perception criteria (Müller 2015).
Motion Features
The basic idea of the aforementioned motor theory is that any sound event is also
perceived as embedded in a motion event, hence that it could be useful to make a more
systematic overview of various types of music-related body motion. Following the
scheme suggested in Godøy and Leman (2010), there are the following main categories:
These are just main categories, and notably so; music-related motion may also in
many cases be multifunctional, that is, it can be both directly sound-producing and
more theatrical so as to enhance the total, multimodal experience of attending a
concert. Also, there might, in many cases, be similarities in the energy envelopes of
sound-producing and sound-accompanying motion, typically in dance and/or other
kinds of body motion, such as in the classic example of Charlie Chaplin’s shaving
motions mirroring the sound-producing motions in the famous barber scene from
The Great Dictator (see Godøy 2010b, for a discussion of this).
Furthermore, we have different timescales at work here as well, ranging from the
global to the local. Typically, we may have overall, global motion features such as:
• Quantity of motion, which may be calculated directly from the video data (frame-
by-frame pixel difference) or motion capture data or other time-varying sensor
data (total amount of distance traveled within a timeframe), and which may be a
coarse indicator of overall activity level (see the computational sketch following this list).
• Various derivatives of the motion data, such as velocity and jerk, indicative of the
mode of motion, a high value for jerk meaning much abrupt motion, a low or zero
value for jerk meaning rather calm motion, and so on.
• Local trajectories, such as the shape of beat, that are indicative of mode of
articulation.
• Trajectories for different sonic features, such as ornaments and figures, indicating
anticipatory motion, phase-transition, and coarticulation.
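To make these feature descriptions concrete, the following is a minimal computational sketch in Python with NumPy; the function names, the assumption of greyscale video frames, and the single-marker motion-capture array are illustrative assumptions of my own and do not reproduce the MoCapToolbox or any of the other tools mentioned earlier.

```python
import numpy as np

def quantity_of_motion(frames):
    """Coarse quantity-of-motion curve from greyscale video frames.

    frames: array of shape (T, H, W). Returns one value per frame
    transition: the mean absolute pixel difference between successive
    frames, a rough indicator of overall activity level.
    """
    frames = frames.astype(float)
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))

def motion_derivatives(positions, fps):
    """Velocity, acceleration, and jerk magnitude curves for one marker.

    positions: motion-capture array of shape (T, 3), in mm.
    fps: capture rate in frames per second.
    """
    dt = 1.0 / fps
    velocity = np.gradient(positions, dt, axis=0)       # mm/s
    acceleration = np.gradient(velocity, dt, axis=0)    # mm/s^2
    jerk = np.gradient(acceleration, dt, axis=0)        # mm/s^3
    # Scalar "shape" curves: vector magnitude at each frame.
    return (np.linalg.norm(velocity, axis=1),
            np.linalg.norm(acceleration, axis=1),
            np.linalg.norm(jerk, axis=1))
```

On this reading, a chunk with a high mean jerk magnitude would correspond to the abrupt, agitated mode of motion described in the list above, while values near zero would indicate calmer motion.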
Common to all these motion features is that they may be experienced, conceived,
and represented as shapes and, returning to the basic dynamic sound envelopes of
Schaeffer’s typology presented earlier, we see that they correlate well with motion
features (Godøy 2006):
As was the case for the sound features, these motion categories and features can in
turn be combined into more complex textures, for example, into often-found fore-
ground-background or melody-accompaniment textures of Western music, or into var-
ious heterophonic textures with composite sonic objects.
Musical shape cognition combines sound and motion, and hence also motion-linked
sensations such as vision, touch, proprioception, sense of effort, and possibly other
sensations as well, all together calling the “purity” of music into question, and rather
suggesting that we recognize and tackle sound-motion links as an inherent multi-
modal feature of music. Furthermore, “multimodal” here means that sound is combined
with shapes in the involved modalities, that is, shapes of motion, vision, proprioception,
touch, and so on.
Concretely, we can see how different, and variably multimodal, musical features relate
to shape in Figure 12.3. Centered on a core of what may be understood as a very general
and amodal musical shape cognition, this general faculty for shape cognition may be
differentiated into the two main categories of posture-related shapes and motion tra-
jectory shapes, but certainly this is more of a gravitation matter than a sharp divide. These
two main categories can in turn be differentiated into a number of shape categories that
are variously posture and/or motion related.
In more detail, the posture shapes, hence the quasistationary shapes, include the
following sound-motion shapes (going clockwise around the outmost circle, starting
at the top):
And the motion trajectory shapes include the following sound-motion shapes
(continuing clockwise around the outmost circle), often displaying high levels of meso
timescale, within chunk fusion by coarticulation:
Musical Instants
A shape is intrinsically (by definition) instantaneous, whether in the case where we see it all at
once (a figure on a page, a sculpture, an object), when we anticipate a sequence of
motion, a sequence of events, of sound, or when we need to scan a figure in time or listen
to a sequence in time and form the shape image retrospectively and by keeping the
temporally unfolded shape in some kind of resonant buffer. We thus have a seemingly
enigmatic relationship between continuity (stream of sensations) and discontinuity
(instantaneous shape images based on overviews of continuous segments) in our reflec-
tions on musical shape cognition.
However, this relationship between continuity and discontinuity may perhaps
(at least partially) be understood as integral to motion planning and motor control, cf.
the model of key-posture oriented motion where key-postures are discontinuous but
where their respective prefixes and suffixes are continuous, resulting in a continuous,
undulating motion, only intermittently “punctuated” by key-postures (Godøy 2013).
The idea of motor theory is that the production schema is projected onto whatever it is
that we are perceiving, hence suggesting that we also perceive the key-posture orientation
From a more conceptual point of view, we may in any case claim that shape is by
definition holistic or nonpunctual, hence, always temporally extended, and often also
spatially extended in the sense of the effectors (fingers, hands, arms) and, more indi-
rectly, in the time-domain and frequency-domain representations as extended shapes.
How the transition between the continuous stream of sound-motion and the discon-
tinuous images of shape (physical and/or mental images) actually works in our perception
and cognition still seems to be quite enigmatic, yet it so very obviously seems to work,
both in musical contexts and in general.
Shape Cognition in
Musical Imagery
Evidently, musical sound creates memory traces in our minds, and it seems that we may
mentally replay the music in the original tempo, or in slow motion, or in fast motion,
even defying temporal unfolding as the sounds may be more in the guise of instantaneous overview images (cf. the previous section). What such an ability to recollect and
reenact musical sound in our minds points to, is the capability of musical imagery,
meaning to make music present in our minds beyond the immediate or “original” listening
experience. “Musical imagery” may be defined as “our mental capacity for imagining
musical sound in the absence of a directly audible sound source, meaning that we can
recall and re-experience or even invent new musical sound through our ‘inner ear’ ”
(Godøy and Jørgensen 2001, ix). However, the expression “musical imagery” is sometimes
also taken to denote mental images that accompany music, images of colors, textures,
landscapes, and so forth, that listeners may have when listening to various kinds of music
(Aksnes and Ruud 2008). Such imagery with music may of course reflect shape-related
features of the music; however, I shall here limit my reflections to the imagery for sound
and its associated sound-producing and/or sound-accompanying motion images.
Knowledge about musical imagery has in the past couple of decades been enhanced
by both behavioral and brain imaging research (see, e.g., Zatorre and Halpern 2005, for
an overview). But musical imagery may also be seen in the broader context of mental
imagery, and is closely linked with our general capacity for reenacting in our minds
whatever it is that we may have experienced (Kosslyn et al. 2001), as well as having a
capacity for simulating expected future events and actions in order to make us better
Musical shape cognition is becoming increasingly feasible with new technology; thanks to new conceptual tools and attitudes, we might soon achieve an enhanced understanding of how we perceive sensory impressions holistically as shapes. As argued earlier,
• Finding and isolating experientially salient features, both in sound data and in
motion data.
• Systematically exploring sound-motion shape relationships by analysis-by-synthesis
and match-mismatch experiments.
• And, not to forget, exploring practical applications of musical shape cognition in
performance, composition, improvisation, sonic design, and various multimedia
arts, by systematic mappings between different representations.
Yet, in spite of these outstanding challenges, it seems fair to conclude that most (if not
all) perceptually salient musical features may be conceptualized as shapes. Our capacity
for musical shape cognition should be considered one of the most powerful tools of both
knowledge and skill in musical creation and we are only at the beginning of tapping its
potential.
References
Aksnes, H., and E. Ruud. 2008. Body-Based Schemata in Receptive Music Therapy. Musicae
Scientiae 12 (1): 49–74.
Bangert, M., and E. O. Altenmüller. 2003. Mapping Perception to Action in Piano Practice:
A Longitudinal DC-EEG Study. BMC Neuroscience 4: 26.
Berthoz, A. 1997. Le sens du mouvement. Paris: Odile Jacob.
Bever, T. G., and D. Poeppel. 2010. Analysis by Synthesis: A (Re-)Emerging Program of
Research for Language and Vision. Biolinguistics 4: 174–200.
Bizley, J. K., and Y. E. Cohen. 2013. The What, Where and How of Auditory-Object Perception.
Nature Reviews Neuroscience 14: 693–707.
Bregman, A. S. 1990. Auditory Scene Analysis. Cambridge, MA, and London: MIT Press.
Cogan, R. 1984. New Images of Musical Sound. Cambridge, MA, and London: Harvard
University Press.
De Poli, G., A. Piccialli, and C. Roads. 1991. Representations of Musical Signals. Cambridge,
MA, and London: MIT Press.
Dowling, W. J. 1994. Melodic Contour in Hearing and Remembering Melodies. In Musical
Perceptions, edited by R. Aiello and J. A. Sloboda, 173–190. New York: Oxford University
Press.
Galantucci, B., C. A. Fowler, and M. T. Turvey. 2006. The Motor Theory of Speech Perception
Reviewed. Psychonomic Bulletin and Review 13 (3): 361–377.
Gjerdingen, R. O., and D. Perrott. 2008. Scanning the Dial: The Rapid Recognition of Music
Genres. Journal of New Music Research 37 (2): 93–100.
Godøy, R. I. 1997a. Formalization and Epistemology. Oslo: Scandinavian University Press.
Godøy, R. I. 1997b. Knowledge in Music Theory by Shapes of Musical Objects and Sound-
Producing Actions. In Music, Gestalt, and Computing, edited by M. Leman, 89–102. Berlin:
Springer-Verlag.
Godøy, R. I. 2001. Imagined Action, Excitation, and Resonance. In Musical Imagery, edited by
R. I. Godøy and H. Jørgensen, 239–252. Lisse, Netherlands: Swets and Zeitlinger.
Godøy, R. I. 2003a. Motor-Mimetic Music Cognition. Leonardo 36 (4): 317–319.
Godøy, R. I. 2003b. Gestural Imagery in the Service of Musical Imagery. In Gesture-Based
Communication in Human-Computer Interaction, LNAI 2915, edited by A. Camurri and
G. Volpe, 55–62. Berlin and Heidelberg: Springer-Verlag.
Godøy, R. I. 2006. Gestural-Sonorous Objects: Embodied Extensions of Schaeffer’s Conceptual
Apparatus. Organised Sound 11 (2): 149–157.
Godøy, R. I. 2008. Reflections on Chunking in Music. In Systematic and Comparative
Musicology: Concepts, Methods, Findings, edited by A. Schneider, 117–132. Frankfurt:
Peter Lang.
Godøy, R. I. 2010a. Images of Sonic Objects. Organised Sound 15 (1): 54–62.
Godøy, R. I. 2010b. Gestural Affordances of Musical Sound. In Musical Gestures: Sound,
Movement, and Meaning, edited by R. I. Godøy and M. Leman, 103–125. New York: Routledge.
Godøy, R. I. 2010c. Thinking Now-Points in Music-Related Movement. In Concepts,
Experiments, and Fieldwork: Studies in Systematic Musicology and Ethnomusicology, edited
by R. Bader, C. Neuhaus, and U. Morgenstern, 245–260. Frankfurt am Main: Peter Lang.
Godøy, R. I. 2011. Sound-Action Awareness in Music. In Music and Consciousness, edited by
D. Clarke and E. Clarke, 231–243. Oxford: Oxford University Press.
Godøy, R. I. 2013. Quantal Elements in Musical Experience. In Sound, Perception, Performance:
Current Research in Systematic Musicology, Vol. 1, edited by R. Bader, 113–128. Berlin: Springer.
Godøy, R. I. 2014. Understanding Coarticulation in Musical Experience. In Sound, Music, and
Motion, LNCS 8905, edited by M. Aramaki, O. Derrien, R. Kronland-Martinet, and S. Ystad,
535–547. Berlin: Springer.
Godøy, R. I. 2017. Key-Postures, Trajectories and Sonic Shapes. In Music and Shape, edited by
D. Leech-Wilkinson and H. Prior. Oxford: Oxford University Press.
Godøy, R. I., E. Haga, and A. R. Jensenius. 2006. Playing “Air Instruments”: Mimicry of
Sound-Producing Gestures by Novices and Experts. In GW2005, LNAI3881, edited by
S. Gibet, N. Courty, and J.-F. Kamp, 256–267. Berlin: Springer.
Godøy, R. I., A. R. Jensenius, and K. Nymoen. 2010. Chunking in Music by Coarticulation.
Acta Acustica united with Acustica 96 (4): 690–700.
Godøy, R. I., and H. Jørgensen. 2001. Musical Imagery. Lisse, Netherlands: Swets and Zeitlinger.
Godøy, R. I., and M. Leman. 2010. Musical Gestures: Sound, Movement, and Meaning.
New York: Routledge.
Godøy, R. I., M. Song, K. Nymoen, M. R. Haugen, and A. R. Jensenius. 2016. Exploring Sound-
Motion Similarity in Musical Experience. Journal of New Music Research 45 (3): 210–222.
Grafton, S. T., and A. F. de C. Hamilton. 2007. Evidence for a Distributed Hierarchy of Action
Representation in the Brain. Human Movement Science 26:590–616.
Griffiths, T. D., and J. D. Warren. 2004. What Is an Auditory Object? Nature Reviews
Neuroscience 5 (11): 887–892.
Grossberg, S., and C. Myers. 2000. The Resonant Dynamics of Speech Perception: Interword
Integration and Duration-Dependent Backward Effects. Psychological Review 107 (4):
735–767.
Haueisen, J., and T. R. Knösche. 2001. Involuntary Motor Activity in Pianists Evoked by Music
Perception. Journal of Cognitive Neuroscience 13 (6): 786–792.
Hindemith, P. 2000. A Composer’s World: Horizons and Limitations. Mainz: Schott.
Husserl, E. 1991. On the Phenomenology of the Consciousness of Internal Time, 1893–1917.
Translated by J. B. Brough. Dordrecht: Kluwer Academic.
Jensenius, A. R. 2008. Action, Sound: Developing Methods and Tools to Study Music-Related
Body Movement. PhD thesis, University of Oslo. Oslo: Acta Humaniora.
Jensenius, A. R. 2013. Some Video Abstraction Techniques for Displaying Body Movement in
Analysis and Performance. Leonardo: Journal of the International Society for the Arts,
Sciences and Technology 46 (1): 53–60.
Jensenius, A. R., and R. I. Godøy. 2013. Sonifying the Shape of Human Body Motion using
Motiongrams. Empirical Musicology Review 8: 73–83.
Klapp, S. T., and R. J. Jagacinski. 2011. Gestalt Principles in the Control of Motor Action.
Psychological Bulletin 137 (3): 443–462.
Kohler, E., C. Keysers, M. A. Umiltà, L. Fogassi, V. Gallese, and G. Rizzolatti. 2002. Hearing
Sounds, Understanding Actions: Action Representation in Mirror Neurons. Science
297: 846–848.
Kosslyn, S. M., G. Ganis, and W. L. Thompson. 2001. Neural Foundations of Imagery. Nature
Reviews Neuroscience 2: 635–642.
Leman, M. 1997. Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology.
Berlin: Springer.
Leman, M. 2008. Embodied Music Cognition and Mediation Technology. Cambridge, MA:
MIT Press.
Liberman, A. M., and I. G. Mattingly. 1985. The Motor Theory of Speech Perception Revised.
Cognition 21: 1–36.
Loram, I. D., C. van de Kamp, M. Lakie, H. Gollee, and P. J. Gawthrop. 2014. Does the Motor
System Need Intermittent Control? Exercise and Sport Science Review 42 (3): 117–125.
Michon, J. 1978. The Making of the Present: A Tutorial Review. In Attention and Performance
VII, edited by J. Requin, 89–111. Hillsdale, NJ: Erlbaum.
Müller, M. 2015. Fundamentals of Music Processing. Heidelberg, NY: Springer.
Nymoen, K., R. I. Godøy, A. R. Jensenius, and J. Torresen. 2013. Analyzing Correspondence
between Sound Objects and Body Motion. ACM Transactions on Applied Perception 10 (2).
9:1–9:22.
Petitot, J. 1985. Morphogenèse du sens I. Paris: Presses Universitaires de France.
Pöppel, E. 1997. A Hierarchical Model of Time Perception. Trends in Cognitive Science 1 (2):
56–61.
Rosenbaum, D. A., R. G. Cohen, S. A. Jax, D. J. Weiss, and R. van der Wel. 2007. The Problem
of Serial Order in Behavior: Lashley’s Legacy. Human Movement Science 26 (4): 525–554.
Schaeffer, P. 1966. Traité des objets musicaux. Paris: Éditions du Seuil.
Schaeffer, P. (1967) 1998. Solfège de l’objet sonore. With sound examples by Guy Reibel and Beatriz Ferreyra. Paris: INA/GRM.
Smith, B. 1988. Foundations of Gestalt Theory. Munich and Vienna: Philosophia Verlag.
Stern, D. 2004. The Present Moment in Psychotherapy and Everyday Life. New York:
W. W. Norton.
Tenney, J., and L. Polansky. 1980. Temporal Gestalt Perception in Music. Journal of Music
Theory 24 (2): 205–241.
Thom, R. 1983. Paraboles et catastrophes. Paris: Flammarion.
Tufte, E. R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Williams, T. I. 2015. The Classification of Involuntary Musical Imagery: The Case for Earworms.
Psychomusicology 25 (1): 5–13.
Xenakis, I. 1992. Formalized Music. Rev. ed. Stuyvesant: Pendragon.
Zatorre, R. J., and A. Halpern. 2005. Mental Concerts: Musical Imagery and Auditory Cortex.
Neuron 47: 9–12.
chapter 13
Playing the Inner Ear
Performing the Imagination
Simon Emmerson
Introduction
Musicians and sound artists imagine sound; the imagination is the most powerful and
flexible audio workstation we have. It can do fabulous transformations—some beyond
current “real-world” capabilities. In this manner, it works on sound memory and
experience. But the imagination is also a synthesizer. Here we are in uncharted waters.
Real synthesis is experimental: we build circuits, set dials, or build algorithms and set
parameters—whatever the means, we can sit back and listen to the possibly unexpected
results. However, the imagination works differently in this mode—perhaps its inputs are
not at all conscious. I might hear a sound in my imagination apparently from nowhere—
I can perceive no immediate cause. How might we externalize and use this enormous
power? And, furthermore, might imagining music become a form of performance?1
There is on one hand a discourse of language2: first, as a way to better describe what these internal sounds and processes sound like but, second, as a means to encourage their fuller
development. On the other hand, there is a subtle non- (pre-?) linguistic game. We do
not need words for this—they might even get in the way and limit the options. This
is about creativity and play—a continuous sequence of imagine, play, listen, modify,
imagine. . . . There may be other nonverbal ways to externalize, for example—what is the
role of visualization? Sound-to-visual synesthesia is a specific form of a more general
phenomenon (Van Campen 2008). This, too, may be a descriptive response but poten-
tially a powerful synthesizer as well. This is a form of reverse engineering—we describe
an effect and work backward to reconstruct a possible material cause.
This chapter will not deal directly with interface design, brain, and neuroscience. What
I want to discuss and encourage is greater engagement of this infinite world in the creative
process of sound- and music making. It is entirely speculative although based on ideas
and tools that seem to have had seeds in the last quarter of the twentieth century—indeed
it is clearly a wondrous dream and we all (almost immediately) are imagining what
sounds might make him “cry to dream again.” Interestingly, the word “noises” seems not
to have a negative connotation in this passage. If for the moment these imaginings are
private, how might we in future harness them to enhance our abilities and possibilities
in the shared perceptible world of sound?
In the 1980s, the phrase that was often used to describe emerging computer and digital
applications was “music information technology.” This was a dry and practical reduc-
tion of music to information—eliminating or at least discouraging the descriptions
of aesthetic dimensions. We now have an emerging “music imagination technology”
(Emmerson 2011). Imagination is defined as “the faculty or action of forming new ideas,
or images or concepts of external objects not present to the senses.”4
use image in its everyday sense as having a visual component, while imagination can
have a much broader range of sensory, space, and time elements (audio, visual, tactile,
and so forth)—thus an image may, of course, be real or take effect within the imagination.5
So as musicians we may imagine a scenario, an instrument, a performance, a sense of
space, place and movement, a form, an atmosphere. These may not be sounding but give
us a context for sound. For example, we may imagine a complex relationship expressed
through mathematics that somehow drives the sound synthesis. John Chowning talks
of something similar in his realization that the principles of frequency modulation
(FM, well established in radio) might be applied in the audio domain.
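Chowning’s insight can be stated quite compactly: a carrier sinusoid whose phase is modulated by a second, audio-rate sinusoid produces sidebands whose number and strength are governed by the modulation index. The following is a minimal sketch of such two-oscillator FM in Python with NumPy; the parameter values are arbitrary illustrations rather than anything drawn from Chowning’s own implementations.

```python
import numpy as np

def fm_tone(duration=2.0, sr=44100, fc=440.0, fm=110.0, index=5.0):
    """Simple two-oscillator FM: a carrier at fc (Hz) whose phase is
    modulated by a sinusoid at fm (Hz); the modulation index controls
    how many audible sidebands appear, and hence the timbre."""
    t = np.arange(int(duration * sr)) / sr
    return np.sin(2 * np.pi * fc * t + index * np.sin(2 * np.pi * fm * t))
```

Varying the index over the course of a tone is what turns this single line of mathematics into a large family of evolving timbres, which is precisely the kind of complex relationship, first entertained in the imagination, that can then drive the synthesis.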
Then, most importantly, we have what I would call “inner listening modes” that
only partly correspond to those of the physical world. First, we can imagine acousmatic
sound—that is the sound itself directly (with no sense of place and origin). If we are
aware of our imagination at work we may not (after all) search for any source and cause.
But then again, as a second such mode, imagining source and cause is also possible—
we might construct imaginary instruments, environments, machines, and so forth
(in scenarios as surmised earlier), and “hear” what we believe is their sounding.
Goethe said that “architecture was music become stone.” From the composer’s point
of view the proposition could be reversed by saying that “music is architecture in
movement.” (Le Corbusier 1968, 326)
So, when sound becomes music there seem to be two approaches to its evolution: form
in space, which seems to have an outside time existence—a kind of architecture; and
form in time, “forming”—an emergent property that is built over time into memory. Let
us examine each in turn, starting with form in space. Composers have often described
imagining musical form “outside time”—some have claimed to “see” forms of musical
works in an instant.
Form in space clearly has a relationship to the idea of the (Western musical) score
where time is mapped onto space. But, perceptually, we have form in time—a kind of
accumulation in which only at the conclusion of listening does memory assemble the
whole.6 Of course, if it is a piece that already exists, and that I already know, then this works
somewhat differently, as I may be comparing the present with a memory of the past—but
for our imaginative synthesizer we are working much more in the moment on entirely new
sound and music. I have a compromise (perhaps alternative) approach to bring these two
Much study in the psychology of music tries to understand the reaction of the human to
musical stimulus, increasingly embodied as well as socially and ecologically situated
(Clarke 2005; Bharucha et al. 2006; Leman 2016). I suggest that we have here the elements
of a tool set for a reversal of the process. One of the aims of the study and understanding
of human reaction to sound might be to allow its generation from our ideas. From both
film music libraries and music information retrieval (MIR)-based sound spotting—that
is, finding sound with a given characteristic—we see the emergence of such toolkits.
It is this mirroring that will form the basis of my discussion. This is what I described
above as a kind of reverse engineering, starting with the result and working back to the
possible cause.
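To make the idea of MIR-based sound spotting concrete, a minimal sketch might rank a library of recordings by similarity to a query sound using MFCC timbre features; the file paths here are hypothetical placeholders, and MFCCs merely stand in for whatever "given characteristic" a real toolkit would use.

```python
# A sketch of MIR-style "sound spotting": rank a library of sounds by timbral
# similarity to a query sound. MFCCs stand in for the "given characteristic";
# the file paths are hypothetical placeholders.
import glob
import numpy as np
import librosa

def timbre_vector(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                      # average MFCCs over time

query = timbre_vector("query_sound.wav")
library = {p: timbre_vector(p) for p in glob.glob("library/*.wav")}

# Smaller Euclidean distance in MFCC space = broadly more similar timbre.
ranked = sorted(library, key=lambda p: np.linalg.norm(library[p] - query))
print(ranked[:5])
```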
original audio. The results of these behavioral experiments will guide future research
toward accurate stimulus reconstruction from brain activity” (Thompson et al. 2013, 6).
This is based on one of the oldest of scientific methods, and it needs careful handling. If music
(or any signal) X tends to stimulate neural pattern Y sufficiently consistently across a
wide population then the observation of Y implies the imaginary or real presence of X.9
We know philosophically that a correlation is not necessarily a cause—but we tend pro-
gressively to adopt this belief the greater the supporting evidence (and in the absence of
contrary evidence). In short, we are using an inverse procedure to logical deduction—
namely induction—as we do not yet possess a direct causal mechanism between signal
and response. A sophisticated application of this to speech resynthesis is described
in Pasley and colleagues (2012). While great progress at the level of general features
has been made, they still report that "Single trial reconstructions are generally not
intelligible. However, coarse features such as syllable structure may be discerned” (13).
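The inductive inference described here—observe pattern Y, infer stimulus X—can be caricatured as a decoding problem. The toy sketch below uses synthetic stand-in data and an off-the-shelf classifier; it illustrates the logic only and bears no relation to the actual methods of Thompson et al. (2013) or Pasley and colleagues (2012).

```python
# A toy caricature of the inverse inference: learn the mapping from neural
# response pattern Y back to stimulus class X, then observe Y and infer X.
# The data are synthetic stand-ins, not real neural recordings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_channels = 200, 32
stimulus = rng.integers(0, 2, n_trials)           # X: two stimulus classes
templates = rng.normal(size=(2, n_channels))      # each class evokes a typical pattern
responses = templates[stimulus] + rng.normal(size=(n_trials, n_channels))

decoder = LogisticRegression(max_iter=1000).fit(responses[:150], stimulus[:150])
print("held-out decoding accuracy:", decoder.score(responses[150:], stimulus[150:]))
```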
Early research focused on measuring the neural response to the presence of real audio
or visual material. But more recently there have emerged the first stages of detecting the
neural activity patterns relating to phenomena such as memory and even imagination in the absence
of any physical signal. Grimshaw and Garner (2015) include a meticulous review of this
relationship. Their position is the most radical reappraisal of what (and where) sound is:
Sound is an emergent perception arising primarily in the auditory cortex and that is
formed through spatio-temporal processes in an embodied system.
(1, this idea is developed throughout the book)
To support this thesis they develop the idea of the sonic aggregate, which
comprises two sets of components: the exosonus, a set of material and sensuous
components; and the endosonus, a set of immaterial and nonsensuous components.
The endosonus is a requirement for the perception of sound to emerge; the exoso-
nus is not. (4)
In their final chapter, Grimshaw and Garner (2015) state one possible ultimate aim
as “simply thinking a sound when one wishes to design audio files” (196). I am, in this
chapter, taking off from this point—namely the observation and use of the neural patterns
resulting from imagined sound as the basis for a synthesis engine.12
Rendering Memory
Writers as well as research scientists have imagined the awesome power of tapping directly
into another person’s memory and somehow “reading” it. In Dennis Potter’s final play
created for British television, Cold Lazarus (1994/1996), the memories of a writer13 whose
head has been cryogenically frozen for nearly four hundred years are extracted and
projected in 3D into a relatively large space—a hi-tech laboratory whose funding might
depend on selling the results to a worldwide TV network.
As the experiment proceeds with increasing success, we see memories of landscapes
and people and hear sounds and conversations that the small group of scientists tries to
make sense of. The past is thus preserved and then projected into the present—or is it? It
turns out it is not that simple—the observers slowly become aware that “it” is interacting
with them; the head retains a degree of consciousness. As the head can observe the pro-
jected memories, memory, the present, and imagination become confused.
Rendering Imagination
Let us take Potter’s vision and give it a more optimistic, forward-looking projection.
What of the future—the act of imagination of what might be—could this also not be pro-
jected in like manner to be rendered and synthesized at our behest? This is not synthe-
sizing the future strictly but “the imaginative present”—we might project what we hear
(and see) in our imagination right now.
Can we really imagine sound without access to memory? This may be impossible to
answer, as we clearly access memory without conscious intention. As Murray Schafer
(1977) long ago pointed out, certain sounds seem to have a universal resonance, possibly
through their role in both our long-term evolution and in our experience of prebirth
sound in an amniotic state. Of course, we can consciously recall sound to our own present
internal perception but to communicate this to others in any detail14 we need (at present)
language and other descriptive symbol sets. Perhaps naming itself is an act of memory,
being shorthand for this description—but for that we seem to need a degree of stability
and repeatability.
Throughout my life I have heard sounds while driving that I have wanted to capture—
I am aware that this sensation lies uneasily between physical perception and imagination.
Sometimes, I cannot tell the provenance of the sound at all—my considered view is that
some aspect of the real sound around me provokes an additional imaginative layer and
the two strongly interact.
Thus, this is more than simply externalizing imagination and effecting its synthesis
into sound—it may one day be possible to unravel these two components and understand
their interaction. The sounds are (unsurprisingly) typically drones but with great (and
sometimes changing) internal detail and occasional sharper events as a kind of punctu-
ation. Their mystery is compounded by ambiguous spatiality—both very close and very
distant at the same time. I have repeatedly referenced the line from the text of Stockhausen’s
Momente (from a personal letter from Mary Bauermeister): “Everything surrounding
me is near and far at once”—as a personally relevant resonance. I believe it is a key
modernist trope concerning spatiality in a mediated age.
Embodied Response
Perhaps other aspects of embodied response may be reverse engineered to join our
battery of drivers of the new imaginative synthesizer. What do sound and music stimulate
within us? One aspect most extensively researched recently is that of mirror neurons firing "in
sympathy.” Indeed, in their discussion of the relationship of music to the mirror neuron
system, Molnar-Szakacs and Overy (2006) go so far as to write that
Watch dance and we mentally dance too. It is thus not surprising that actual muscular
movement creeps back in, such as rhythmic foot tapping or bodily reduced (more or
less) dance gestures. Then there are families of “air” activities—air guitar and air con-
ducting are common—that indicate the embodied attunement and entrainment of the
listener. These have already been actively harnessed in many computer game controllers
and interfaces. Still strictly embodied and physical, we might add our air imagination—
which will have rather different characteristics that we will explore in what follows.16
Gesture is concerned with action directed away from a previous goal or towards a
new goal . . . Texture . . . is concerned with internal behaviour patterning, energy
directed inwards or reinjected, self-propagating. (82)
Thus, gesture tends to imply the performative—involving clear cause, effect, and agent—
while texture tends to imply elemental continuity where the cause, effect, and agent
chains are more complex. We can either describe these as being at micro-scales we cannot
individually perceive, or as much larger immanent structures with some kind of con-
tinuous (and often vague) agency and causality. To harness the potential of “air imagination”
we will need to capture the complete range of these time scales, from the immediately
embodied rhythmic through to longer time scales of day, week, month and season, year,
growth, and decline.
to source and cause. Electroacoustic music, of course, often uses this as the basis of its
aesthetics, deliberately bracketing out the search for origins and thus stimulating the
imagination through sound alone.17
I personally do have imaginary visualization when listening to electroacoustic music:
I see shapes, textures, colors, often “set” in a quasi-real-world vista and spaces. This
might be abstract geometric or more of a “landscape environment.” I listen with eyes
open to enhance this perception that seems to be in real space around me, superimposed
on (and strangely integrated with) the actual visual information from wall materials,
loudspeakers, the disposition of the rest of the audience, and the set-up.
The synesthesic synthesizer may be played to produce new sounds—but what about new
music? Now is not the time to discuss any distinction between the two ideas—sound
may be music, music must effectively involve sound, but the boundary may be both
porous and flexible. That said, I do have a bias toward retaining somewhere in the
relationship the idea of performance. Some of the actions that led to the imaginary
synthesis described previously were essentially performative. But I hesitate to say they
were performance.
I want to make a fuzzy distinction between playing and performing. Playing might be
seen as a search for suitable materials, performing as presenting some kind of structure—
maybe perceived as “expression” or “argument”—beyond the individual components,
although the two clearly overlap. Musicians are generally not dancers. Their movements
have been accurately directed by mechanical technology toward the physical excitation
of an object. The use of media has freed this up, allowing movement alone to control
sound, bringing dance and music performance a step closer. But there often remains a
strong residual desire for some sort of resistance. The study of such haptics informs inter-
faces and interactions where this enhances muscular control. The ultimate “air play”
may combine free and resistive components.25
perform the spatialization of the prerecorded sound around the audience.27 We can now,
of course, adapt this for a real-time sound sculpture where the movement controls
sound quality. Thus, the next stage of our imaginary sonification performance might be
to manipulate sound in space as a malleable (even fluid) substance—to place, move, and
“smear” sounds within that space, as a painter might sketch or, more appropriately, as a
dancer might move or a sculptor might manipulate clay. This shifts the metaphor for
externalizing imagination from 2D painting or movie to a 3D activity—dancing, bricolage,
or sculpting. Yet again, many of the inventions and developments of previous decades
may give us helpful clues as to how the new haptics (with resistance) and 3D representation
can be adapted to the musician’s touch and feel. Our imagination of being a dancer/
sculptor can be as a creator immersed in the sound—something our real sculptor can
but dream of.28
From the STEIM studio in Amsterdam came some of the most inventive devices to
harness the elemental (human) agency of movement. From 1984 on, Michel Waisvisz
developed a series of controllers, known as “The Hands,” detecting hand and some body
movement (Waisvisz 1985; later versions may be seen on the STEIM website29 especially
in videos of Waisvisz’s performances over the years). Using a later technology, Laetitia
Sonami developed her “Lady’s Glove”30 and, in a more popular idiom, Imogen Heap has
used similar controls (her Mi.Mu gloves31). Such gesture transducers might be a very
suitable interface to capture the “air” controller gestures as we explore performance as a
creative part of the imagination synthesizer. We must also remember another powerful
tool that might be controlled by such dance or sculptural gestures.
An Imaginative Plug-In—Imaginary
Sound Transformation
As we remarked in the introduction, the mind—the imagination—is also a fabulous
sound-transformation device. While the sensational world is bound by some sense of an
externally applied space and time, the imagination knows not these as boundaries. Just
as we remarked on documented cases of composers glimpsing an entire piece in an
instant, so too we might be able to grasp a complex transformation in a flash, then per-
haps “play” it at something nearer the time of real performance. We might be able to
compare alternative strategies for the sound to develop. Time compression and expan-
sion can be a useful tool for creation but may not map simply to external world time.
And so too with space: Gaston Bachelard’s intimate immensity (1964) speaks to an
oneiric experience that I have often had in the twilight between wakefulness and sleep
and also when entering the Olympic Stadium in London in 2012. Many composers (and
movie sound designers) have tried to capture such an experience in real sound. If our
imaginative experience of space can be surreal (even completely unreal) then it can act
as a stimulus—an impossible goal we know we cannot reach but there might be fruitful
experience in trying.
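As one concrete (and very modest) example of such a transformation, time expansion and compression of a recorded sound can be sketched as follows; the input file name is a hypothetical placeholder, and the phase-vocoder stretch used here is only a stand-in for the far freer transformations the imagination performs.

```python
# Time expansion/compression of a recorded sound using a phase-vocoder stretch.
# "imagined_seed.wav" is a hypothetical placeholder file; rates below 1.0
# lengthen the sound, rates above 1.0 shorten it.
import librosa

y, sr = librosa.load("imagined_seed.wav", sr=None, mono=True)
stretched = librosa.effects.time_stretch(y, rate=0.5)    # roughly twice as long
compressed = librosa.effects.time_stretch(y, rate=2.0)   # roughly half as long

print(len(y) / sr, len(stretched) / sr, len(compressed) / sr)
```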
Let us assume we do indeed have a suitable synthesis engine that can (in some way yet to
be determined) respond to our imaginative synthesis wishes. But the system starts with
a blank. Where do we begin? What do we start with? With the concrète traditions of
music-making we start with a sound and play with it—an empirical and experimental
approach. But, with our imaginary synthesizer, we have many possible sounds that do
not yet exist. Or perhaps they do exist but are hidden from our consciousness until called
forth. Let us divide the possibilities into seeds and provocations. I will not come to defini-
tive conclusions here but make some key suggestions and questions that we can crea-
tively address.
First, let us describe a seed stimulus: this might be external or internal—empirical or
idealized. External, here, means the origin was a real sound and remembered. Then
again, the seed could be entirely imagined—or perhaps a combination of sounds impos-
sible in the world around. The system will not be perfect—we could even say it “guessti-
mates” what we are imagining. A (real) sound is made and play begins. The user can
treat this first attempt as a source, and maintain the original thought as the target. This is
a kind of “imagination control loop”—change slightly till matched, or till sufficiently
close. Or, of course, we could treat any outcome as something entirely new with a future
path of its own and forget the original stimulus. It might be the case that some imagined
sounds are in fact physically impossible.32 Furthermore, “holding” a sound in the imagi-
nation unaltered while being compared with other sounds might be a difficult (perhaps
impossible) task! Hence the need for the evocative transcription discussed earlier to
help us fix it.
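A minimal sketch of such an "imagination control loop" might look like the following, assuming (hypothetically) a synthesize() routine, a feature extractor, and an evocative transcription of the imagined target that all live in comparable numerical spaces; none of these components yet exists in the form imagined in this chapter.

```python
# A sketch of the "imagination control loop": nudge synthesis parameters until
# the result is judged sufficiently close to the imagined target. synthesize()
# and features() are hypothetical placeholders; for simplicity the sketch
# assumes parameters and features live in the same numerical space.
import numpy as np

def control_loop(params, target, synthesize, features,
                 step=0.1, tol=0.05, max_iters=100):
    for _ in range(max_iters):
        candidate = synthesize(params)
        error = features(candidate) - target
        if np.linalg.norm(error) < tol:           # "sufficiently close"
            return params, candidate
        params = params - step * error            # nudge toward the target
    return params, synthesize(params)             # best attempt so far

# Toy demonstration: an identity "synthesizer" whose parameters are its features.
final_params, _ = control_loop(np.zeros(2), np.array([0.2, 0.7]),
                               lambda p: p, lambda c: c)
print(final_params)
```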
Provocations work somewhat differently. One line of thought throughout both mod-
ernist and postmodern musical discourse has been the creation of the unexpected as a
major device—not unexpected simply from the listener’s perspective but from even the
composer's and performer's perspectives. These involve generative procedures—usually
some kind of automaton that (more or less) decouples the outcome from the immediate
taste and will of the composer and performer.
Some like external "models" for this reason: to generate what might not otherwise
have been conceived. Needless to say, others reject them completely! Common in recent
decades have been systems that are both mathematically beautiful and relate to a degree
to a real-world phenomenon. Examples include fractal and chaotic systems, swarm
algorithms, and so on. Within our model here, these could be provoking our imagination,
kick starting the synthesizer. It will remain a matter of choice as to whether and to
what degree we intervene and guide the system toward any goal. If we fixed a goal
(as earlier) using a form of evocative transcription then we remain free to modify and
moderate this ideal—perhaps our provocative system comes up with something we
prefer to our original target.
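By way of illustration, one of the simplest such provocations is the logistic map, a chaotic system that can generate a parameter stream to kick-start the synthesizer; the mapping onto a frequency range below is an arbitrary choice made for this sketch.

```python
# The logistic map as a "provocation": a simple chaotic system generating a
# parameter stream that could kick-start the synthesizer. The mapping of map
# values onto a frequency range is an arbitrary illustrative choice.
def logistic_provocation(r=3.9, x=0.5, n=32, low=110.0, high=880.0):
    frequencies = []
    for _ in range(n):
        x = r * x * (1.0 - x)                       # chaotic iteration in (0, 1)
        frequencies.append(low + x * (high - low))  # map to frequencies in Hz
    return frequencies

print(logistic_provocation()[:8])
```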
That leads us to a final source of potential input to our mind machine. Earlier generations
thought of the term “synthesizer” as pertaining to electronic generation usually of sound
types clearly not derived directly from real, sounding objects, although often based on
instrumental models (wind, brass, string). But largely through the analysis/resynthesis
developments in the last quarter of the twentieth century, the distinctions between sam-
plers and synthesizers steadily blurred until none now remain. There has also been a
cultural shift to hearing technological sound as part of an extended environment—the
nature-culture divide has effectively disappeared. Birdsong sits alongside traffic sound
in the urban soundscape.
At a sufficient distance over the woods this sound [bells] acquires a certain vibratory
hum, as if the pine needles in the horizon were the strings of a harp which it swept.
All sound heard at the greatest possible distance produces one and the same effect,
a vibration of the universal lyre. (Thoreau 1986, “Sounds” [from Walden 1854])
but radically reformed by the futurists to include urban and industrial sound:
We will sing of the vibrant nightly fervour of arsenals and shipyards blazing with
violent electric moons . . . deep-chested locomotives whose wheels paw the tracks
like the hooves of enormous steel horses bridled by tubing; and the sleek flight of
planes whose propellers chatter in the wind like banners and seem to cheer like an
enthusiastic crowd. (Marinetti [1909] 1973)
To convince ourselves of the amazing variety of noises, it is enough to think of the
rumble of thunder, the whistle of the wind, the roar of the waterfall . . ., and of
the generous, solemn white breathing of a nocturnal city. (Russolo [1913] 1973)
But, in the early twentieth century, there rapidly emerged from this a dramatic new
option—I can play it!—returning to the human listener the possibility of becoming per-
former. In 1922 (in Baku, Azerbaijan), Arseny Avraamov created a Symphony of sirens:
Avraamov worked with choirs thousands strong, foghorns from the entire Caspian
flotilla, two artillery batteries, several full infantry regiments, hydro-airplanes, twenty
five steam locomotives and whistles and all the factory sirens in the city. He also
invented a number of portable devices, which he called “Steam Whistle Machines”
for this event, consisting of an ensemble of 20 to 25 sirens tuned to the notes of the
Internationale . . . Avraamov did not want spectators, but intended the active partic-
ipation of everybody. (Molina 2008, 19)33
However, the technology of recording allows a simulacrum of such a vast and ungainly
process. Recording sounds (more recently, of course, sound and image) allows the cre-
ation of a substitute for the real environment—and these are a lot simpler to play than the
original! From the very earliest days of Pierre Schaeffer’s experiments in the studio at
French radio, we have his invention of the sampler—in his imagination:
Once my initial joy is past, I ponder. I’ve already got quite a lot of problems with my
turntables because there is only one note per turntable. With a cinematographic
flash-forward, Hollywood style, I see myself surrounded by twelve dozen turntables,
each with one note. Yet it would be, as mathematicians would say, the most general
musical instrument possible. Is it another blind alley, or am I in possession of a solu-
tion whose importance I can only guess at?
(Pierre Schaeffer’s diary: April 22, 1948 [Schaeffer 2012, 7])
It is clear from the contextual discussion that Schaeffer does not mean “note” as the tra-
ditional pitched event but in a more general sense of what was to become the “sound
object.” Thus, with the advent of the internet nearly fifty years later, the ability to “sample”
sounds worldwide becomes a real possibility—even off-earth through, for example, the
NASA website—thus giving us the power to reach out to, play with, and ultimately perform
the environment in its mediated forms.34 Technology allows the creative reorganization of
these spaces; their transformation (often through the simplest means of amplification
and spatialization)—a “small” event can become a landscape.35 Our imagination allows
us to become Alice in Wonderland and change scale—the human scale can be made
gargantuan and the largest can be brought within human scale. John Cage famously did
this in many of his installations, projecting one space into another. His realizations of
Variations IV (1963) and Roaratorio (1979) are good examples. Thus, amplified small
sounds can fill a listening space alongside the reinjection of an entire city soundscape.
Conclusion—and a Footnote on
Ethics and the Transparency
of the “Fourth Wall”
In the five or so years between first thoughts about ideas behind this chapter and the
time of writing, speculation has rapidly become reality. On the one hand, the develop-
ment of increasingly accurate and extensive brain scanning techniques, on the other, the
advent of commercially available EEG brain interfaces (with a major drive to
produce thought-directed game controllers36) suggest that sooner rather than later we
could have imagination-driven sound synthesis.
I have suggested that the way into this may not be so simple—both acousticians and
musicians have found it extraordinarily difficult to describe or define timbre or sound
quality. Just as the ideas themselves are multidimensional, so we shall need to harness all
the tools we have used to date in our new synthesis engine, from the most quantitative
measures to the most playful and creative actions across many modes—graphic, sculp-
tural, movement, or haptic. Musicians are well used to creative play and improvisation,
and I have argued that such embodied performance will be an integrated part of this
new experimental world—indeed vital to its fulfillment.
While an exciting prospect, potentially of enormous power, we shall need to tread
with mindful awareness. The example from the work of Dennis Potter I cited earlier
contained a basic ethical dilemma—the retrieval of the memories from the unfrozen
cortex was literally torture for the conscious head, unable to express itself—until, in the
final episode, it manages to construct a "message" on a piece of paper in its imagination,
begging for release, which is granted a short while later in a terrorist attack on the laboratory.
A final example will amplify this need for care in ethical matters, should some of the tools
I am sketching indeed come into existence in coming decades.
The creative model I have discussed here is based on an optimistic “projection out-
wards” under our aware control (with all its limitations) and with our consent. But there is
the dystopic mirror view that might become an “invasion inward” without our (apparent)
knowledge or permission. Mind reading is just the start of it in Andrei Tarkovsky’s
sci-fi film Solaris (1972). The space station is overrun by the invisible intelligence of the
planet’s ocean that can create apparently real people, things, and places from the memories
of the cosmonauts. In this case, they never had control over this immense power. They were
not responsible for its behavior and did not begin to understand its workings. The scientists
on board (and back on Earth) know only of certain “rhythms” and changes in the ocean’s
behavior—and that the “creations” do not possess the same atomic structures as their
earthly equivalents. The humans on board not surprisingly become increasingly deranged.
If we wish to conclude with the more optimistic view that we can avoid our dreams
becoming nightmares, then we will need to share openly the necessary knowledge and
understanding of the workings of our imagination synthesizer. We might need to take
steps to ensure that this is the case and to gain aware consent for its use. Today we may
cursorily accept a “cookie” regime on our computer; but let us imagine an equivalent
(or more advanced) observer of our behavior while we wear an EEG interface, and the
possible consequences of that data collection if it takes place without our awareness or control.
Although this moves to matters outside the remit of this chapter, we shall need to be aware of the
issues and participate in deciding our preferred safeguards.
Notes
1. This chapter is based on my keynote presentation to Audio Mostly 2014—“Imagining Sound
and Music,” run by the Music and Sound Knowledge Group, Aalborg University, October
2014. Some ideas first appear undeveloped in my keynote addresses to ACMC2011 (Auckland)
and ICMC2011 (Huddersfield).
19. That is assumed to mean we map the time of the music onto the space of the page (or screen
equivalent) in some way—discussed further in what follows.
20. This phrase has been around for several decades with no obvious origin. For a good intro-
duction to its meaning and function, see Hugill (2012, 237).
21. See http://www.inagrm.com/accueil/outils/acousmographe. Accessed May 15, 2017.
22. See http://logiciels.pierrecouprie.fr/?page_id=402. Accessed May 15, 2017.
23. While this has not been the subject of formal research, evidence and discussion may be
found in Wolf (2013), Holland (2016), and ideas behind the EARS2 resource site (ears2.
dmu.ac.uk) and the associated software Compose with Sounds (cws.dmu.ac.uk).
24. MIR is a fast-developing discipline that has harnessed machine assistance to seek, sort,
and represent (visualize and display) information of some use and comprehension to the
user from “big data” sources (see Casey et al. 2008).
25. There is an interesting case with a conductor. While theoretically a “no resistance” system,
I wonder to what extent the response of the orchestra/ensemble “feels” like a resistive
weight.
26. This did not stop the Yamaha DX7 (1983) becoming the most successful synthesizer in
history at that time.
27. For an image, see the booklet with the CD box archives GRM (INA/GRM 2004, p.18).
28. This is something well developed for musicians of limited physical movement—
see the Sound=Space environment developed by Rolf Gehlhaar as an example
(http://www.gehlhaar.org/x/pages/soundspace.htm. Accessed May 15, 2017).
29. See http://steim.org/. Accessed May 15, 2017.
30. See http://sonami.net/ladys-glove/. Accessed May 22, 2017.
31. See, https://mimugloves.com. Accessed December 14, 2018.
32. This is not the place to discuss the interesting relationship between “impossible” and
“impossible to produce”—or the possibility that anything imaginable might exist
somehow.
33. Molina’s words, but followed by the complete “instructions” for the performance written
by Avraamov himself (20–21), as published in the local press. These comprise, in fact, an
hour-by-hour scenario of the entire event. Molina reports that there were two predecessors
(1919, 1921) and a subsequent full version in Moscow (1923).
34. An excellent and extreme version is seen in works such as "The Earth's Original 4.5 Billion Year-
Old Electronic Music Composition,” an installation by Robin McGinley (2002) that proj-
ects “sferics” into the installation triggered by the visitors (see https://vimeo.com/66475800.
Accessed May 15, 2017).
35. I have written elsewhere on “space frames” and their transformation and reconfiguration
through technology (Emmerson 2007, 2015).
36. See, for example, the interfaces from Emotiv (emotiv.com) and Neurosky (neurosky.
com)—with others announced.
References
Bachelard, G. 1964. The Poetics of Space. Boston: Beacon.
Bayle, F. 1993. Musique acousmatique—propositions . . . positions. Paris: INA and Buchet/
Chastel.
Bharucha, J. J., M. Curtis, and K. Paroo. 2006. Varieties of Musical Experience. Cognition 100:
131–72.
Casey, M., R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney. 2008. Content-Based
Music Information Retrieval: Current Directions and Future Challenges. Proceedings of the
IEEE 96 (4): 668–696.
Chion, M. 1982. L’envers d’une oeuvre (Parmegiani: De Natura Sonorum). Paris: Buchet/Chastel.
Chion, M. 1983. Guide des objets sonores—Pierre Schaeffer et la recherche musicale. Paris:
Buchet/Chastel.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. Oxford: Oxford University Press.
Couprie, P. 2004. Graphical Representation: An Analytical and Publication Tool for
Electroacoustic Music. Organised Sound 9 (1): 109–113.
Emmerson, S. 2001. From Dance! to “Dance”: Distance and Digits. Computer Music Journal
25 (1): 13–20.
Emmerson, S. 2007. Living Electronic Music. Aldershot, UK: Ashgate.
Emmerson, S. 2011. Music Imagination Technology. Keynote address. In Proceedings of the
International Computer Music Conference, Huddersfield, 365–372. San Francisco: ICMA.
Emmerson, S. 2015. Local/Field and Beyond: The Scale of Spaces. In Kompositionen für
hörbaren Raum (Compositions for Audible Space), edited by M. Brech and R. Paland, 13–26.
Bielefeld: transcript Verlag.
Gray, D. 2013. The Visualization and Representation of Electroacoustic Music. PhD thesis,
Leicester: De Montfort University.
Grimshaw, M., and T. A. Garner. 2015. Sonic Virtuality: Sound as Emergent Perception. Oxford:
Oxford University Press.
Hickok, G. 2009. Eight Problems for the Mirror Neuron Theory of Action Understanding in
Monkeys and Humans. Journal of Cognitive Neuroscience 21 (7): 1229–1243.
Holland, D. 2016. Developing Heightened Listening: A Creative Tool for Introducing Primary
School Children to Sound-Based Music. PhD thesis, Leicester: De Montfort University.
Hugill, A. 2012. The Digital Musician. 2nd ed. New York and London: Routledge.
Le Corbusier. 1968. Modulor 2. Cambridge, MA: MIT Press.
Leman, M. 2016. The Expressive Moment: How Interaction (with Music) Shapes Human
Empowerment. Cambridge, MA: MIT Press.
Levinson, J. 1998. Music in the Moment. Ithaca and London: Cornell University Press.
Malloch, S., and C. Trevarthen. 2008. Communicative Musicality: Exploring the Basis of
Human Companionship. Oxford: Oxford University Press.
Marinetti, F. T. (1909) 1973. The Founding and Manifesto of Futurism. In Futurist Manifestos,
edited by U. Apollonio, 19–24. London: Thames and Hudson.
Molina Alarcón, M. 2008. Baku: Symphony of Sirens: Sound Experiments in the Russian Avant
Garde. London: ReR Megacorp.
Molnar-Szakacs, I., and K. Overy. 2006. Music and Mirror Neurons: From Motion to “E”motion.
Social Cognitive and Affective Neuroscience 1 (3): 235–241.
Nattiez, J.-J. 1990. Music and Discourse: Toward a Semiology of Music. Princeton, NJ: Princeton
University Press.
Nishimoto, S., A. T. Vu, T. Naselaris, Y. Benjamini, B. Yu, and J. L. Gallant. 2011. Reconstructing
Visual Experiences from Brain Activity Evoked by Natural Movies. Current Biology 21:
1641–1646.
Park, T. H. 2016. Exploiting Computational Paradigms for Electroacoustic Music Analysis. In
Expanding the Horizon of Electroacoustic Music Analysis, edited by S. Emmerson and
L. Landy, 123–147. Cambridge: Cambridge University Press.
PSYCHOLOGY
chapter 14
W. Luke Windsor
Introduction
When we are fearful, it can be because we are threatened or perceive a threat: often that
threat is current, sometimes it is remembered, and sometimes it is imagined. Music (and
sound) can play a role in the generation of fear, and in this chapter I will argue that music
is used in detention and interrogation not only to influence our emotional state directly
but also to create an ambiguity and uncertainty that leaves the detainee subject to the
free play of imagination, perverting the benign imagination of aesthetic contemplation
into something malign and horrific. In order to do this, the boundary between the real
and the imagined will be explored: the chapter aims to identify the location of this bound-
ary, how it is set for individuals, and the circumstances in which it becomes crossed.
A subsidiary aim is to address the broader context for the use of music in detention and
interrogation in order that, in the more academic quest for an understanding of a set of
musical behaviors and their consequences, the history and psychology of music’s use as
a military tool is not overlooked.
Music is often seen as a force for good: for example, the co-optation of Mozart’s
music as a panacea has become a paradigmatic example of folk psychology, despite many
unintended consequences. Yet, throughout human history, music (and sound) has been
associated with and used by commercial, political, and military forces in attempts to
control behavior, and music itself seems to have intrinsic power to do harm. In Thomas
Kenneally’s book Schindler’s Ark (1982), an inmate-musician and a German officer in
a work camp in occupied Poland conspire to use the repetition of a musical work (the
infamous Gloomy Sunday) to fatal effect: as a result the officer commits suicide after
requesting that the song be repeated with increasing passion. Keneally's parable gives
musical power to the detained Jewish prisoner, and the officer willingly submits. This, of
course, is the opposite of the normal state of affairs: music in detention is most often
controlled by the captor and, as will be discussed later, used to attempt to influence the
thinking and behavior of the captive. The parable is also an exaggeration for effect: through
hyperbole, Keneally actually highlights the powerlessness of the captive, who only has
the passion for music left as a weapon.
As Grant (2013a, 2014) points out, it is not just in the recent Iraq War, where the use of
loud recorded music in detention gained notable media coverage (see, e.g., Chrytoschek
2011), that music has been used to coerce or humiliate detainees. Moreover, forced sing-
ing and playing, as well as forced listening, form part of this history of music in detention
and link it to a broader and longer context of music in military settings (see, e.g., Pieslak
2009; Grant 2013b). Furthermore, as Pieslak (2009) discovered, there is a considerable
overlap between the choice of music by soldiers for personal use and their selection of music in
overt and covert attempts to influence others.
This chapter will not attempt to provide an overview of all the ways in which music
might or might not be used to influence others, for ill or for good. It will instead focus on
an acute and special case of music psychology: that of forced listening to music in deten-
tion, whether or not such forced listening is intended to elicit information. The
aims here are to show how such uses of music can be reframed within a broader context
of musical persuasion and to provide a deeper engagement with the ethics of music than
could ever be achieved without considering the extreme case of music in detention. For
a more general discussion of the darker side of musical experience, Johnson and Cloonan
(2009) provide a thought-provoking survey of the many ways in which popular music is
deployed as an accompaniment to, or tool of, violence.
It is with considerable care that any music researcher should engage with the study of
music in detention and/or interrogation. This is partly due to the commonly held view
that music is, or should be, a benevolent art with positive impacts on individuals and
societies, a view that may cause us to turn away from a more malevolent instrumentalism.
Within such a context, research pointing out the harm that music can do seems
counterproductive. Indeed, if one is focused on maximizing the potential social benefit
of research (see, e.g., Sloboda 2005, 395–419), working on the harm that music can do
can only really be justified if it presents data or analysis that can be used in advocacy
against the harmful use of music or if it suggests tools to combat such uses.
Added to this particular disincentive, the broader study of internment, detention,
interrogation, and torture requires a sensitivity to and breadth of knowledge about
international and military law and custom, the ethical and moral background to cruel
and inhuman practices, and so on; few musicologists or music psychologists come
ready-prepared to engage with this body of work. In addition, the researcher may be
persuaded that, by studying music in interrogation, they might inadvertently promote
practices they disagree with, or indeed add to the body of knowledge that interrogators
employ in the field. The position taken by the Society for Ethnomusicology in 2007
(SEM 2007) suggests there is something particular about the use of music in coercive
interrogation that should be called out by musicologists, and also that musicologists
should call attention to such (ab)uses of music. Some musicologists are content to
disavow all coercive interrogation and question whether we need to make a special case
of music when it is considered against the range of coercive
methods used in detention and interrogation:
The issue is really torture, which to me is always wrong, period. I can’t see that music
as torture is more or less wrong than anything else as torture, and I confess that deep
down this feels like special pleading—e.g., water resource managers complaining
about the use of water for torture, or (more ridiculously) Hello Kitty aficionados
complaining that Hello Kitty armbands were to be used by a Thai police department
as badges of malfeasance and indiscipline. (Bellman 2007)
This chapter takes an initial step back from these problems, and it does not initially
consider whether they are of particular importance given the wider debate about the
legality or morality of obtaining information through psychological or physical manipu-
lation or pressure. Instead, it will engage with the perceptual and social-psychological
consequences of playing music in situations of detention, and it will engage also with
the context for these practices, as this may help us better understand how they have
come about and how they can be seen to be situated within a broader context of music
as a source of behavioral control. Hence, although much reference will be made to
existing ethnographic and historical work in this domain (especially that of Cusick
2006, 2008a, 2008b; Pieslak 2009; Grant 2013a, 2014) the broader contexts that will be
applied are derived from psychological research that is related both to interrogation and
also to other forms of coercion, and the understanding of the relationship between
imagination, sound (and music), and direct perception.
The role of imagination in the creation of a fearful, vulnerable, and malleable state
has an explicit and implicit relationship with the ability or inability of a person to
directly and effectively act on and perceive their surroundings. It is for this reason that,
rather than analyzing the role of sound in coercive interrogation in a theoretical vacuum,
some positioning is required. This chapter will introduce and apply the work of Gibson
(e.g., 1966, 1979) on direct perception and ecological psychology and will attempt to
show how his theory of perception helps explain the ways in which sound and music,
normally helpful or benign, become sources of fear and confusion. The work of Gibson
will be returned to in the conclusion of this chapter in a more political vein, as it will
become clear that his approach to psychology provides a neat riposte to the co-optation
of (music) psychology by military and commercial interests for purposes of persuasion.
The contrast between Bernays's (1942) and Gibson's (1939) reactions to Nazi propaganda
efforts rests, as I will argue, not on an ethical distinction alone but also on a theoretical one:
their views of human psychology lead them to very different conclusions about how
we as individuals should respond to attempts by others to influence us. Before this,
however, it is necessary to review some of the existing work on music and interrogation/
torture and its intellectual and practical antecedents.
Music in Detention/Interrogation
malls, and restaurants) to influence not just our internal state, in an attempt to imbue
spaces with a particular ambiance, but also our level of activity. One of the most highly
cited publications in the field of consumer control is the description of a study in which
the volume of music was varied in a supermarket (Smith and Curnow 1966): louder
music was associated with less time in the store but no lesser volume of purchasing. The
authors of this study explain this through an arousal hypothesis, whereby the louder
music leads to greater arousal in the customers and faster shopping, rather than driving
the customers from the store. The correspondence of music to customers' expectations
and their degree of liking for it are, however, important factors that can be manipulated to
influence their behavior. A study by North, Hargreaves, and McKendrick (1999) demon-
strated that we will stay on hold to a help line longer when the music is both liked and
congruent with the task. More subtle dimensions of musical structure and associated
or evoked emotions can also influence what we purchase, how long we linger, and even
how much we are prepared to pay for products. For example, the style of music and its
associations with more or less expensive items might be a powerful predictor of pur-
chasing (Wilson 2003). Music has also been considered as a factor in delineating zones
within shopping malls and department stores, with different styles of music helping to iden-
tify soft boundaries between different product areas (e.g., Yalch and Spangenberg 1993).
Music is also used without the intention of influencing purchasing in public spaces.
Just as we might employ it within our own spaces or through earphones to manage our
mood, our spaces’ musical ambiences are curated for us in attempts to speed or slow our
movements, make us more comfortable, or provide public information. Although these
uses are potentially more benign and may be alternatives to more expensive or harmful
attempts to influence us, the central aim is to coerce the listener into a more or less pas-
sive state. Dentists, for example, claim to use music to calm patients with some success,
aiming to make their work easier through a more relaxed patient without needing
recourse to medication. However, Aitken and colleagues (2002) found no effect of music in
such contexts above and beyond the patient’s enjoyment of it in a controlled setting, and
even in studies where it is shown to have an effect it may only be for less anxious
patients (e.g., Lahmann et al. 2008). Moreover, regardless of whether it is effective,
music may simply become another remembered feature of a hostile environment for an
“uncooperative” patient (see, e.g., Welly et al. 2012), and associations of music with expe-
rience can obviously flow both ways. Nonetheless, Standley’s meta-analysis of music in
dental and medical settings (1986) does suggest an effect. Similarly, in waiting rooms,
medical or otherwise, rather than speeding up service, one may choose to play music to
increase tolerance of waiting time (see, e.g., North et al. 1999) or reduce stress (see, e.g.,
Tansik and Routhieaux 1999).
Note that, in all these situations, music's primary value in self-managing our psycho-
logical state is supplanted by external control of this environmental information. Of
course, music is but one of many kinds of stimulus information that we and others use to
orient and be oriented in the environment, but the semi-unavoidable nature of acoustic
stimulation is significantly different from some other forms of influence: averting or
closing one’s eyes is much easier than ignoring unwanted sound. Of course, one can
wear ear defenders, plugs, or headphones to block out or supplant this information with
silence or our own choice of music, a technological adaptation that serves to both regain
and enhance control of the auditory environment in a way that is thoroughly con-
temporary. As a corollary, the encouragement of employees to curate their own workplace
musical environment in order to increase productivity and staff well-being (and to avoid
the distractions of workplace noise) is becoming more widespread, and there is some
empirical evidence to support the effectiveness of such practices (see, e.g., Lesiuk 2005).
Of course, music’s ubiquity in this space of influence has led many to complain about,
campaign against, or avoid such settings and uses of music. The attempts by early
adopters of musical broadcast technology to impose music in settings such as public
transport often backfired (see, e.g., Hui 2016), and there is a general social consensus
that even the minor public acoustic spillage from headphones is an intrusion that can
attract considerable opprobrium.
Before concluding this section, and in order to form a link with the later discussion of
the relationship between more general uses of music as propaganda and in psycho-
logical warfare, a final way in which music is used in explicitly political settings is worthy
of mention. In an unusual and original study, Shevy (2008) used different genres of music
to influence participants’ perceptions of trustworthiness, friendliness, and political
ideology, exploiting the stereotypical associations of hip hop and country music: a perti-
nent feature of his findings, which will become relevant when discussing psychological
warfare and interrogation, is that the extent and nature of such influence should vary
with the ideology and musical preference of the listener: a liberal African American lis-
tener would be primed very differently by music than a white or Hispanic listener or a
conservative African American, and such influences would vary with preference for
musical genre. Music is a tool for subtle persuasion in the context of ideology, not just
for commercial ends.
bands, and the coordination of movement to music has both utilitarian and psychological
dimensions. Even in situations where instruments are not used, soldiers will often march
to songs: the clearest example of this in the Western military is the singing of the French
Foreign Legion, which cuts across marching and more reflective settings; the Boudin, for
example, is sung standing to attention as well as in celebratory or functional marching
situations. The tempo of the Boudin, whether sung in motion or not, is surprisingly slow,
infamously necessitating the arrival of French Foreign Legion units at celebratory events
after other French units, and, indeed, it both denotes the separate identity of the Legion,
and connotes its rather dour character. This tempo, and a curiously uniform style with
truncated phrase ends, extend to a wide repertoire of traditional and popular songs,
mostly in French, a language many of the recruits barely speak; many songs are also
in German, reflecting the large number of German recruits the Legion has attracted at
times (see, e.g., French Foreign Legion 2016). A related tradition, from the United States,
is that of the cadences and jodies sung by soldiers as they train (see Pieslak 2009): again,
synchronization of movement is paramount, but in both cases the content is also sig-
nificant, and rather different, as will be discussed below. Importantly, the music of march-
ing sits in an interesting zone in between self-chosen musical behavior and imposed
discipline: the choice to march or sing is not free, it is taken under military discipline,
and to refuse is a matter for the military courts.
Regardless of any other subtler parameter, the sheer volume of military music is
important. Whether participatory or not, military bands and even unaccompanied
singing produce loud sounds which travel far. In combat, and extensively documented
in Pieslak’s study of music in the Iraq War (also see Gittoes 2005, documentary), soldiers
not only take the trouble to select their own music to accompany combat within armored
vehicles, they create DIY sound systems within them to broadcast the music over their
intercom systems or through internally mounted loudspeakers. There is a sense in which,
just as a commuter masks the sounds of others with music over headphones, this cre-
ates a private environment within the vehicle, the sheer volume of sound masking the
influence of the threats from outside. The volume of broadcast or headset music here is
self-chosen, although in Gittoes’s extraordinary unsanctioned film about music in the
Iraq War (2005) some of the interviewees have clearly developed less coherent musical
selections, creating a conflicted musical environment within the confines of the armored
vehicle: being unable to escape from loud unwanted music is clearly a potential problem
in combat, just as it might be in other work-settings.
The semiotics of military music interacts with these other two parameters: at one end
of the spectrum lies the trite example of the bugle call; at the other, the singing (and
broadcast over loudspeakers) of "Je ne regrette rien" in association with the withdrawal
of the final French Foreign Legion units from Algeria. The lyrics, tempo, and musical
structures of military music implicitly and explicitly influence soldiers before, during,
and after combat; they serve to identify particular units and they communicate ideas,
national identities, and ideologies. This semiotics was particularly important to impe-
rial military powers; for example, in Africa in the late nineteenth and twentieth centuries
(see, e.g., Clayton 1978). In East Africa, both British and traditional African
music were adopted to build a corporate identity, often exploiting the usage of Swahili as
a cross-tribal language. To sing “Men of Harlech” in Welsh, or indeed English, is one
thing, but to sing it in Swahili, quite something else. In this case, the fantasy of Welsh
(actually mostly English) soldiers singing this song at Rorke’s Drift (in the film Zulu) has
a real counterpart in the musical practices of later colonial troops. Or consider the lyrics
of this traditional World War II song from Kenya (also sung in Uganda):
Mussolini Mussolini,
Mussolini amekimbia!
Nakumbuku njaro Nairobi!
Nakumbuku njaro Faifa keya!
Tutarudi!
Tutarudi!
Mussolini, Mussolini
Mussolini has run away!
We remember the light of Nairobi
We remember the brightness of 5 KAR.
(Clayton 1978, 38)
Psychological Warfare
Allied with the presentation of propaganda in spoken form via radio or loudspeaker,
music has long played a role, along with sound effects, in efforts to influence the behav-
ior of opposing forces. Indeed, for Volcler (2013; also see Goodman 2012 for a more theo-
retically driven treatment), the modern usage of music in this context (often associated
with the Korean War and later conflicts) is one of two main precursors of the use of
music in detention, the other being post-1945 CIA-sponsored research on the psy-
chology of coercive interrogation. Volcler also draws parallels between the nonlethal
usage of sound as a persuasive tool and as a weapon to disable or kill, which will be
addressed briefly in what follows. This historical link between music as an at-a-distance
tool of warfare and music in detention is also made by Pieslak (2009), who distances his
historical narrative from that of Cusick (2006, 2008a, 2008b), for whom, like Volcler
(2013), the sources of music in detention derive both from propaganda practice and
from covert psychological research programs. It is probable that the history of music’s
use in detention draws on many precursor practices (see Grant 2014, for an excellent
overview of the many ways in which music comes to be used as and in torture), and it is
likely that the use of music to explicitly influence behavior draws variously on all of these
precursors, depending on circumstance. This will be returned to later in relation to the
tension between improvised and more institutionally circumscribed practices described
in manuals and by practitioners and detainees.
Even in mainstream psychological warfare, the use of music often oscillates between
more improvised and administered extremes and between motivational soundtrack and
nonlethal weapon, as exemplified by the use of music during the siege of the Vatican
While some accounts claim that the music was played to boost the morale of American
troops (a claim that even here demonstrates the overlap between psychological tactics
and inspiration for possible combat), it had, regardless of original intent, a powerful
side effect. When Noriega commented that the music was irritating him, the Marines
increased the volume, playing the music continuously. (Pieslak 2009, 82)
Rather than review the range of ways music is used in persuasion in the field, the reader
is directed toward Pieslak’s coverage of the use of music by opposing forces in the Iraq
War (2009): here both sides broadcast sound at high volumes via loudspeaker: nasheeds
on the Iraqi side, and rock and rap music on the US side: in both cases he argues that
such a sonic environment inspires friendly forces while also being intended to destabi-
lize the enemy.
Sound Weapons
Volcler (2013), in her provocative book Extremely Loud: Sound as a Weapon, argues that
the use of music in detention and interrogation takes place in a broader context of sonic
weaponization. Indeed, although spending much time on the claims made for physio-
logical applications of sound, she concludes that it is the psychological impact of sound
(and music), whether tacit or conscious, that is the most effective weapon. Although
sound at high intensity can damage the ear (or even other organs), and contemporary
technologies such as the Long Range Acoustic Device (LRAD) can both deliver verbal
instructions, tones, noise, or music at long ranges and high enough intensities to cause
distress or damage, she notes that the fear of such weapons is probably just as impactful
as their application. Importantly, like Cusick, she notes that the attraction of nonlethal
weaponry is often somewhat disingenuous: just as the LRAD is marketed as a long-range
communication device but potentially applied as a weapon at shorter ranges, sound and
music are portrayed as relatively harmless (no-touch) interrogation techniques rather
than as psychologically harmful torture methods:
“No-touch torture” shares with non-lethal weapons the advantage that it leaves no
marks directly caused by interrogators on the visible, fleshy surfaces of the body.
Thus hard to prove, and hard to jibe with images of torture familiar from visual
and literary culture, “no-touch torture’s” premise is nonetheless consistent with the
premise behind non-lethal weapons, including those that use sound; and it is con-
sistent with the premise by which PsyOps units use sound or music to prepare the
battlefield. The common premise is that sound can damage human beings, usually
without killing us, in a wide variety of ways. What differentiates the uses of sound or
music on the battlefield and the uses of sound or music in the interrogation room is
the claimed site of the damage. Theorists of battlefield use emphasize sound’s bodily
effects, while theorists of the interrogation room focus on the capacity of sound and
music to destroy subjectivity. (Cusick 2006)
Volcler’s most interesting conclusion is that, in many cases, the use of sound as weapon
is more effective as a purely psychological technique, a placebo weapon of the imagination:
As I will argue later, it is the appeal to imagination as opposed to direct perception that
is at the heart of music’s use in detention and interrogation but, before turning to this
ecobehavioral interpretation, the next section provides a brief review of recent practices,
impacts, and narratives of music in detention and interrogation, focusing on recent
conflicts in Iraq, Afghanistan, and the wider “war on terror.”
The accounts reviewed here share a number of common features:
1. music was played at very high volumes both in detention more generally and dur-
ing interrogations, certainly much louder than would be advised if permanent
damage were to be avoided
2. music was played for long durations (exacerbating the damaging effects of
loudness)
3. the music chosen reflected the individual tastes and cultural backgrounds of the
interrogators
4. music was interchangeable in some instances with everyday sounds
5. the interrogators used music for a number of intended purposes related to:
a. masking background sounds to isolate the detainee
b. interrupting cognition through distraction
c. creating cultural dissonance
d. establishing the dominance of the interrogator.
The issue of dominance (5d) seems particularly pertinent to the explicit training
interrogators received; the relationship of dependency between interrogator and
detainee is established not only through the playing of loud music (or indeed disturbing
everyday sounds) but by its cessation at the will of the interrogator. Items 5a–c correspond to
what Pieslak’s second informant refers to as a “change of scene”: music is intended to
block out and change the environment of the detainee in such a way as to maximize iso-
lation and minimize any sense of familiar surroundings. Such masking and distortion of
the environment is cultural as well as natural, as evidenced by the contrast Cusick iden-
tifies between the experiences of Begg and Vance (both familiar with Western popular
music) and that of al-Qatani, who was less familiar with it and, moreover, considered music to be
haram (forbidden). Begg and Vance found the constant loud music irritating, disorienting,
and painful, but were not sensitive to its cultural dissonance: Begg even notes that he
believed that his interrogators were sensitive to this and did not use music with him in
the interrogation cell (although it was played elsewhere) (Cusick 2008a, 6–7). Vance, like Begg, nevertheless found that the loud music, regardless of its cultural familiarity, played a huge role in the psychological regime in which they were immersed. Regarding
al-Qatani, Cusick claims that Western music was used knowingly to undermine his religious convictions because of its cultural specificity:
Given that the Taliban had forbidden music in Afghanistan for religious reasons, it
seems possible that al-Qatani genuinely believed that listening to music was haram,
forbidden, and therefore sinful. Yet his inability to talk knowledgeably about Islam’s
theological traditions on music allowed “the music theme” to merge with the themes
known as “the bad Muslim,” “al Qaeda betrays Islam,” “God intends to defeat al Qaeda,”
“arrogant Saudi,” and “I control all” to produce the overall “approach” called “Pride/
Ego Down.” That is, al-Qatani was humiliated, and his Muslim identity attacked, by
his obvious ignorance of his own tradition. Meanwhile, the “loud music” he may
have experienced as sinful continued to keep him awake, to end his interrogation
just before he was allowed to sleep, to awaken him, to prevent him from speaking in
answer to interrogators’ questions, and to fill up longer and longer parts of inter-
rogation days that were also filled with the argument over music’s alleged sinfulness,
which constituted “the music theme.” (2008a, 13)
This passage illustrates that the use of music by US interrogators is, if one accepts this
account at face value, much more sophisticated than anything in the training manuals
declassified by the US Government. Music is not just another convenient loud sound to
disorient through controlling the environment; it is a cultural weapon of persuasion,
related to its use in propaganda and psychological warfare. This point is even more
forcefully made by Grant (2014), who locates music in detention within a broader
context of music as a method for sanitizing the act of applying militaristic power. Grant
suggests that there exists a continuum between the natural and the cultural, and between the linguistic and the musical, in many interrogation or detention settings, and even implies
there is a perverse creativity in the use of music by interrogators to avoid more obvious
evidence of force.
Having described and contextualized the ways in which music has recently been used in
detention and interrogation, it is time to return to the ethical dimension of music and
its co-optation by interrogators. In their different ways, Cusick (2006, 2008a, 2008b),
Pieslak (2009), and Grant (2014) attempt to make sense of the way in which music seems
corrupted by its association with detention and interrogation, even though they may
argue about whether it can be considered torture in itself.
Rather than approach this question directly, this final section will recast the use of
music in detention and interrogation within the ecobehavioral approach to psychology
characterized by Gibson (e.g., 1966, 1979; also see Heft 2001). The intention here is to
demonstrate that this co-optation of music is partly a result of choosing to apply psycho-
logical research not only to the understanding of human behavior but also to its control.
To this end I will contrast the positions of Gibson (1939) and Bernays (1942) on Nazi
propaganda, but first it is necessary to describe Gibson’s mature position on the relation-
ship between direct perception and mediate perception, and how it helps explain both
benign and malign applications of music.
important to our interpretation of both more or less conventional (see, e.g., Clarke 2005)
and unconventional (see, e.g., Windsor 2000) musics.
In many everyday situations, we are able to identify or at least classify the sources for
the music that we hear, whether played on the radio or self-chosen, whether experienced
live, with the additional benefit of visual information, or acousmatically over loudspeak-
ers or headphones. We know something about where it comes from, who made it, and
what they might have wanted us to think or feel; we can infer meaning from lyrics or
instrumentation, or the subtleties of harmonic or melodic semiosis. Our sensitivity to
these dimensions of sound, however, is acquired through familiarity over the course of development. It
might seem paradoxical in a book about imagination to turn to work on direct per-
ception: Gibson’s eschewal of representation and information processing in his account
of perception is controversial within psychology (see, e.g., Fodor and Pylyshyn 1981)
and may seem unproductive when other approaches (such as that of Neisser, e.g., 1978,
89–105) explicitly try to understand the relationship between imagination, memory,
and perception. However, as I will argue later, the way that imagination functions in
Gibson’s work highlights a boundary between real and virtual which is both useful and
thought-provoking in this context.
What distresses al-Qatani most is the Arabic music that is played: the other music is distressing for the reasons cited previously, acting as noise rather than in any more subtle manner, but the
Arabic music opens up what turns out to be a distressing dialogue about Islamic culture’s
attitudes to music, one which, according to Cusick, he loses.
even language) provides a much richer and less ambiguous source of information
about the world than proposed by cognitive and social psychologists. For Gibson, the
deception of German people (particularly into anti-Semitism) through propaganda in
the 1930s was a reason to suspect all propaganda, because it is inherently deceptive. The
answer, for Gibson, was to direct psychology away from the study of preconceptions, and
the social influence that seeks to reinforce them, toward the study of direct perception:
Our world, more especially our world of social objects, is understood in terms of
preconceptions, preexisting attitudes, habitual norms, standards, and frames of
reference. When the preconception is sufficiently rigid, an object will be perceived
not at all in accordance with the actual sensory stimulation but in congruence with
the preconception. No psychological law has been more exhaustively demonstrated
than this one. Preconceptions of this sort, moreover, are wrought out socially and
modify individual judgments. This fact also has been amply demonstrated both inside
and outside of a laboratory. A preconception that is socially reinforced becomes a
norm or standard for everybody. It becomes verbally symbolized in the process and
thereby is stereotyped and strengthened. Each individual adopts and internalizes
it, forgetting its imitative origin, and incorporates it in his repertory of values and
opinions. (1939, 165–166)
When this analysis is applied to music and its use to influence behavior, to coerce, it becomes clear that, regardless of the setting, such a co-optation of an emotionally power-
ful and unavoidable stimulus is another way in which the powerful seek to control the
weak: it is a method for preventing an individual from hearing their environment and is
one part of the creation of a setting whereby direct experience is limited to a space con-
trolled by others. The problem with music in torture, then, is not that music is corrupted
by the interrogator but that it is used to curtail experience, rather than being a stimulus
for further interpretation and exploration. In detention, the aesthetic of music is that of
fear: imagination of the worst replaces any other possible interpretation. Music becomes
a stimulus for social control, and the ideal subject is the detainee who cannot escape,
who cannot explore and discover the world through direct experience. The coercive use
of music is just one way in which the experience of an individual in detention is eroded,
reifying a view of human beings that Gibson derided:
The greatest myth of the twentieth century is that people are sheep. Our intellectual
culture has been built on the idea that ordinary people tend to see things as others
want them to, with little independence of mind. This is a pernicious assumption. . . . It
was Gibson’s purpose to undermine such thinking. (Reed 1996, 162)
Whereas propagandists and marketeers might tacitly assume that human beings are as
passive and directable as “sheep,” using music to influence through the control of audi-
tory information, the interrogator forces the prisoner into a passive mode of engaging
with sound, one in which propaganda has clearly failed in its mission to persuade. Music
used in this way is a tacit admission that persuasion through musical influence has been
abandoned to brute force: like Bellman (2007), I can only conclude that it is torture that
is ethically repugnant, with music playing a minor, and paradoxically unmusical, role.
This role is in direct contrast to the utopian (and possibly naïve) view of music as a propa-
ganda tool advanced by Young (1954) or Szafranski (1995), one in which enemies of the
United States could be influenced by exposure to US cultural products such as music:
Saudi Arabia recently joined China as the most recent nation to outlaw satellite
television receivers. One can easily appreciate the effects that Music Television (MTV)
might have on such cultures. (Szafranski 1995)
It is as if, faced with an enemy that was often not susceptible to such influence due to its
rejection of music as an acceptable mode of expression, the US military and intelligence
communities reacted by attempting to maintain this belief in the power of music, while
simultaneously undermining its aesthetic potential. The coercive, and hence restrictive,
nature of using music in detention and interrogation turns music, at its most degraded,
into noise, and at its most sophisticated, into a stimulus for a fearful imagination.
References
Aitken, J. C., S. Wilson, D. Coury, and A. M. Moursi. 2002. The Effect of Music Distraction
on Pain, Anxiety and Behavior in Pediatric Dental Patients. Pediatric Dentistry 24:
114–118.
Bellman, J. 2007. Music as Torture: A Dissonant Counterpoint. https://dialmformusicology.
com/2007/08/21/music-as-tortur/. Accessed January 19, 2017.
Berlyne, D. E. 1971. Aesthetics and Psychobiology. New York: Appleton-Century-Crofts.
Bernays, E. L. (1928) 2004. Propaganda. New York: IG.
Bernays, E. L. 1942. The Marketing of National Policies: A Study of War Propaganda. Journal
of Marketing 6 (3): 236–244.
Blass, T. 2007. Unsupported Allegations about a Link between Milgram and the CIA. Journal
of the History of the Behavioural Sciences 43 (2): 199–203.
Brown, R. E. 2007. Alfred McCoy, Hebb, the CIA and Torture. Journal of the History of the
Behavioural Sciences 43 (2): 205–213.
Chrytoschek, K., dir. 2011. Songs of War: Music as a Weapon. A & O Buero.
CIA. 1963. KUBARK Counterintelligence Interrogation. National Security Archive Electronic
Briefing Book No. 122. Washington, DC: George Washington University.
CIA. 1983. Human Resource Exploitation Training Manual. National Security Archive
Electronic Briefing Book No. 122. Washington, DC: George Washington University.
CIA. 2004. OMS Guidelines on Medical and Psychological Support to Detainee Rendition,
Interrogation, and Detention. Langley: CIA.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. Oxford: Oxford University Press.
Clayton, A. 1978. Communication for New Loyalties: African Soldiers’ Songs. Papers in
International Studies, Africa Series 34. Center for International Studies, Ohio University.
Cusick, S. G. 2006. Musicology, Torture, Repair. Radical Musicology 3. http://www.radical-
musicology.org.uk/2008/Cusick.htm. Accessed January 19, 2017.
Cusick, S. G. 2008a. “You Are in a Place That Is Out of the World . . . ”: Music in the Detention
Camps of the “Global War on Terror.” Journal of the Society for American Music 2 (1): 1–26.
Cusick, S. G. 2008b. Music as Torture/Music as Weapon. TRANS 8. http://www.sibetrans.
com/trans/articulo/152/music-as-torture-music-as-weapon. Accessed January 19, 2017.
Department of the Army. 1992. FM 34–52 Intelligence Interrogation. Washington, DC: Department
of the Army.
Department of the Army. 2006. FM 2–22.3 (FM 34-52) Human Intelligence Collector Operations.
Washington, DC: Department of the Army.
Fodor, J. A., and Z. W. Pylyshyn. 1981. How Direct Is Visual Perception? Some Reflections on
Gibson’s “Ecological Approach.” Cognition 9: 139–196.
French Foreign Legion. 2016. French Foreign Legion Songs and Marches. http://foreignlegion.
info/songs/. Accessed January 19, 2017.
Gaver, W. W. 1993a. What in the World Do We Hear? An Ecological Approach to Auditory
Event Perception. Ecological Psychology 5 (1): 1–29.
Gaver, W. W. 1993b. How Do We Hear in the World? Explorations in Ecological Acoustics.
Ecological Psychology 5 (4): 285–313.
Gibson, J. J. 1939. The Aryan Myth. The Journal of Educational Sociology 13 (3): 164–171.
Gibson, J. J. 1966. The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
Gibson, J. J. 1979. The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gittoes, G., dir. 2005. Soundtrack to War. Australia: ABC Video.
Goodman, S. 2012. Sonic Warfare: Sound, Affect and the Ecology of Fear. Cambridge, MA:
MIT Press.
Grant, M. J. 2013a. The Illogical Logic of Music Torture. Torture 23 (2): 4–13.
Grant, M. J. 2013b. Music and Punishment in the British Army in the Eighteenth and
Nineteenth Centuries. The World of Music 2 (1): 9–30.
Grant, M. J. 2014. Pathways to Music Torture. Musique et Conflits Armés après 1945 4: 2–19.
Heft, H. 2001. Ecological Psychology in Context: James Gibson, Roger Barker and the Legacy of
William James’s Radical Empiricism. Mahwah, NJ: Erlbaum.
Hui, A. 2016. Aural Rights and Early Environmental Ethics: Negotiating the Post-War
Soundscape. In Current Directions in Ecomusicology, edited by A. S. Allen and K. Dawe,
176–187. New York: Routledge.
Johnson, B., and M. Cloonan. 2009. Dark Side of the Tune: Popular Music and Violence. Farnham,
UK: Ashgate.
Juslin, P., and J. A. Sloboda. 2011. Handbook of Music and Emotion: Theory, Research,
Application. Oxford: Oxford University Press.
Kenneally, T. 1982. Schindler’s Ark. London: Hodder and Stoughton.
Lagouranis, T., and A. Mikaelian. 2008. Fear Up Harsh: An Army Interrogator’s Dark Journey
through Iraq. New York: New American Library.
Lahmann, C., R. Schoen, P. Henningsen, J. Ronel, M. Muehlbacher, T. Loew, et al. 2008. Brief
Relaxation versus Music Distraction in the Treatment of Dental Anxiety: A Randomized
Controlled Clinical Trial. Journal of the American Dental Association 139 (3): 317–324.
Lesiuk, T. 2005. The Effect of Music Listening on Work Performance. Psychology of Music
33 (2): 173–191.
McCoy, A. W. 2006. A Question of Torture: CIA Interrogation from the Cold War to the War on
Terror. New York: Metropolitan.
Neisser, U. 1978. Perceiving, Anticipating and Imagining. In Minnesota Studies in the Philosophy
of Science IX, edited by C. W. Savage. Minneapolis: University of Minnesota Press.
North, A. C., D. J. Hargreaves, and J. Mckendrick. 1999. Music and On-Hold Waiting Time.
British Journal of Psychology 90 (1): 161–164.
Pieslak, J. 2009. Sound Targets: American Soldiers and Music in the Iraq War. Bloomington:
Indiana University Press.
Reed, E. S. 1996. The Necessity of Experience. New Haven, CT, and London: Yale University Press.
SEM. 2007. Position Statement on Torture. http://www.ethnomusicology.org/?PS_Torture.
Accessed December 13, 2018.
Shevy, M. 2008. Music Genre as Cognitive Schema: Extramusical Associations with Country
and Hip-Hop Music. Psychology of Music 36 (4): 477–498.
Sloboda, J. A. 2005. Assessing Music Psychology Research: Values, Priorities and Outcomes.
In Exploring the Musical Mind, edited by J. A. Sloboda. Oxford: Oxford University Press.
Smith, P. C., and R. Curnow. 1966. “Arousal Hypothesis” and the Effects of Music on Purchasing
Behaviour. Journal of Applied Psychology 50: 255–256.
Standley, J. M. 1986. Music Research in Medical/Dental Treatment: Meta-Analysis and Clinical
Applications. Journal of Music Therapy 23 (2): 56–122.
Szafranski, R. 1995. A Theory of Information Warfare: Preparing for 2020. Air Power Journal
9 (1): 56–65. https://www.airuniversity.af.edu/Portals/10/ASPJ/journals/Volume-09_Issue-
1-Se/1995_Vol9_No1.pdf. Accessed December 13, 2018.
Tansik, D. A., and R. Routhieaux. 1999. Customer Stress-Relaxation: The Impact of Music in a
Hospital Waiting Room. International Journal of Service Industry Management 10: 68–81.
Volcler, J. 2013. Extremely Loud: Sound as a Weapon. New York: New Press.
Welly, A., H. Lang, D. Welly, and P. Kropp. 2012. Impact of Dental Atmosphere and Behaviour
of the Dentist on Children’s Cooperation. Applied Psychophysiology and Biofeedback 37 (3):
195–204.
Wilson, S. 2003. The Effect of Music on Perceived Atmosphere and Purchase Intentions in a
Restaurant. Psychology of Music 31 (1): 93–112.
Windsor, W. L. 2000. Through and around the Acousmatic: The Interpretation of
Electroacoustic Sounds. In Music, Electronic Media and Culture, edited by S. Emmerson,
7–35. London: Ashgate.
Windsor, W. L., and C. de Bézenac. 2012. Music and Affordances. Musicae Scientiae 16:
102–120.
Yalch, R. F., and E. Spangenberg. 1993. Using Store Music for Retail Zoning: A Field Experiment.
Advances in Consumer Research 20: 632–636.
Young, J. S. 1954. Communist Vulnerabilities to the Use of Music in Psychological Warfare.
Washington, DC: George Washington University.
chapter 15
Synesthetic Artworks and Audiovisual Hallucinations
Jonathan Weinel
Introduction
As films and video games have sought to provide audiences with ever more exotic experiences
and storylines, representations of hallucination have also been incorporated.
The focus of this chapter is on the material design of these representations of
hallucination within audiovisual media and the role of sound within these. First, I discuss
the form of visual hallucinations, auditory hallucinations, and synesthesia. Following
this, I consider how these may provide a basis for the design of audiovisual artworks.
Many of these artworks can be categorized as either diegetic or synesthetic in their
essential operation, as I reveal through an examination of examples from avant-garde
films, feature films, light shows, visualizations, VJ performances, music videos, and video
games. Following this exploration, I propose a conceptual model with three continua
that describes a range of possible approaches for the representation of ASCs using
audiovisual media. One of the theoretical configurations implied by this model is what
I refer to as “augmented unreality”: the convergent layering of synthetic sensory
information on real-world environments in order to simulate hallucinatory experiences
of unreality through digital media.2 Augmented unreality benefits from technological
advances such as high-resolution computer graphics, projection mapping, and multi-
channel surround sound systems. These allow not only for greater levels of accuracy in
the representation of hallucinations but also for these to be embedded in arenas where
audiences may not be expecting such an encounter. In these spaces, the boundaries
between the real physical environment and the synthetic unreal can be subverted and
dissolved; and it is this illusory capability that presents an important new paradigm
shift for digital cultures. Early examples of augmented unreality can be seen in elec-
tronic dance music culture, in which projection mapping techniques, decor, and sonic
manipulations are combined to simulate the experience of hallucinations at outdoor
psychedelic music festivals. In this chapter, through consideration of these various
examples in relation to the conceptual model, I will demonstrate how sound is used in the
context of audiovisual representations of hallucinations, and the role it may provide in
the emerging paradigm of augmented unreality.
The term “altered states of consciousness” rose to prominence in the 1960s, to describe
the variety of conscious states that lie beyond the typical experience of normal waking
consciousness (Ludwig 1969). The varieties of ASCs include: psychosis, such as may be
experienced by schizophrenics; psychedelic experiences as produced by hallucinogenic
drugs such as LSD; the hallucinations caused by sensory deprivation; states of hypnosis;
trances, as experienced in spirit possession rituals; and states of meditation that are used
in Buddhism and other religions. Dreaming is also sometimes considered as a form of
ASC (e.g., Hobson 2003), as are the unusual states that occur on the boundaries of sleep and wakefulness.
Visual Hallucinations
Considering the visual effects of hallucinogens in more detail, Heinrich Klüver (1971)
carried out studies exploring the effects of mescaline on the visual system. These and
related studies (Ostler 1970; Siegel 1977) explored the commonality of visual patterns
of hallucinations between subjects. Klüver proposed a set of “form constants”: lattices,
cobwebs, funnels, and spirals that constitute the basic forms from which the visual impres-
sions perceived during mescaline hallucinations are derived. According to Klüver, in the
early stages of hallucination these form constants provide the basis for the visual pat-
terns of hallucination commonly described, while in later stages of hallucination, other
forms such as tunnels may be abstracted from these basic forms. In their study, Bressloff
and colleagues (2001) suggested that these form constants arise in the visual cortex, and
they are believed to be a cross-cultural feature of visual perception during hallucinations.
Figure 15.1 presents a spiral image based on the form constants, as used in Psych Dome:
an interactive audiovisual installation based on hallucinations (Weinel et al. 2015).
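Patterns of this kind lend themselves to straightforward procedural generation. The short Python sketch below is purely illustrative and is not the method used in Psych Dome or in the studies cited; the image size, number of spiral arms, and twist factor are arbitrary assumptions chosen to produce a spiral/funnel in the spirit of Klüver’s form constants.

```python
import numpy as np

def spiral_form_constant(size=512, arms=6, twist=8.0):
    """Render a grayscale spiral pattern loosely modeled on Klüver's
    spiral/funnel form constants, as a 2D intensity array in [0, 1]."""
    # Coordinate grid centred on the image midpoint.
    y, x = np.mgrid[-1:1:complex(0, size), -1:1:complex(0, size)]
    radius = np.sqrt(x**2 + y**2) + 1e-6        # avoid log(0) at the centre
    angle = np.arctan2(y, x)
    # A logarithmic spiral: intensity varies with the angle plus a term
    # proportional to log(radius), giving the characteristic funnel shape.
    phase = arms * angle + twist * np.log(radius)
    pattern = 0.5 + 0.5 * np.cos(phase)
    # Fade toward the edges so the pattern reads as a tunnel or funnel.
    return pattern * np.clip(1.2 - radius, 0.0, 1.0)

if __name__ == "__main__":
    img = spiral_form_constant()
    print(img.shape, float(img.min()), float(img.max()))
```

Varying the arms and twist parameters moves the output between lattice-like and funnel-like impressions, which is broadly how such imagery is parameterized in audiovisual work of this kind.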
Auditory Hallucinations
Though visual hallucinations seem to be more prevalent, auditory hallucinations are
also commonly reported. Studies of auditory hallucinations have mainly focused on
schizophrenics, who experience “auditory-verbal hallucinations” (AVHs), in which
voices are heard as if from the external environment or inside the head (Wayne 2012, 87).
Though AVHs are the most common type experienced by people with schizophrenia,
Figure 15.1 Artistic impression of visual patterns of hallucination from Psych Dome (Weinel
et al. 2015). The Psych Dome installation uses a consumer-grade electroencephalograph (EEG)
headset to control parameters of an audiovisual artwork based on hallucinations.
“non-verbal auditory hallucinations” (NVAHs) are also known to occur and may
consist of hallucinated music (Kumar et al. 2014), bangs, or noises (Jones et al. 2012).
Neuroimaging studies have suggested that auditory hallucinations activate the parts of
the brain involved in inner speech and Heschl’s gyrus (the auditory cortex), supporting
the view that auditory hallucinations are perceived with a sense of reality comparable to
that of sounds that have origins in the external environment (Dierks et al. 1999). In the
hallucinations caused by drug experiences, perception of sound is also altered, ranging
on a continuum from enhanced enjoyment (or otherwise) to distortions in sound qual-
ity and total hallucination of sounds with no external acoustic origin (Weinel et al. 2014).
The latter may consist of either AVHs or NVAHs.
Figure 15.2 illustrates a continuum of aural experience from normal waking con-
sciousness to total hallucination (as discussed in Weinel et al. 2014). In normal waking
consciousness, auditory input comes predominantly from external sensory input, which
provides a basis for aural perception. As hallucinatory effects are intensified, the per-
ceptual experience of sounds becomes enhanced; sounds are perceived as more or less
enjoyable than usual or as profoundly significant. Further along the scale, the subjective
experience of sound becomes distorted, as if properties such as volume, spatial location,
or audio quality have been altered or manipulated with digital signal processes. As these
effects intensify, the balance shifts from external to internal sensory inputs that arise
within the brain. In the most extreme cases, total experiences of hallucination occur
that consist of hallucinated noises, voices, or music that have no acoustic origin in the
external environment.
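The “distorted” region of this continuum can be imitated with elementary digital signal processing. The following sketch is an illustration only, not a description of any technique used in the studies cited: the gain, panning, and filter-cutoff parameters are my own assumptions, chosen to degrade a mono signal’s apparent volume, spatial location, and audio quality in the way described above.

```python
import numpy as np

def distort_aural_experience(mono, sr=44100, gain=2.5, pan=0.8, cutoff=1200.0):
    """Crudely imitate hallucinatory distortions of volume, spatial
    location, and audio quality on a mono signal with values in [-1, 1]."""
    # Volume distortion: exaggerated gain with soft clipping.
    boosted = np.tanh(gain * mono)
    # Audio-quality distortion: a one-pole low-pass filter dulls the sound.
    alpha = np.exp(-2.0 * np.pi * cutoff / sr)
    filtered = np.empty_like(boosted)
    state = 0.0
    for i, sample in enumerate(boosted):
        state = alpha * state + (1.0 - alpha) * sample
        filtered[i] = state
    # Spatial distortion: constant-power panning pushes the source sideways.
    theta = (pan + 1.0) * np.pi / 4.0            # pan in [-1, 1] -> angle
    left, right = np.cos(theta) * filtered, np.sin(theta) * filtered
    return np.stack([left, right], axis=1)       # stereo output

# Example: one second of a 440 Hz tone, dulled and pushed toward the right.
t = np.linspace(0, 1, 44100, endpoint=False)
stereo = distort_aural_experience(0.3 * np.sin(2 * np.pi * 440 * t))
```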
Synesthesia
The term “synesthesia” comes from the Greek syn (union) and aisthesis (sensation), and
describes the dissolution of boundaries between the senses (Cytowic 1989). In such
experiences, sounds may have tastes or colors may have smells. These are not merely
imagined correspondences but actual experiences across the senses that are caused by a
given stimulus. The phenomenon is reported as a general trait for some individuals in
typical states of waking consciousness. However, psychedelic drugs such as mescaline,
psilocybin, or LSD are also known to promote experiences of synesthesia. Although
synesthesia can involve the blurring of any of the sensory modalities, in these psychedelic
experiences sounds often trigger corresponding visual images (e.g., Bliss and Clark
1962, 97), suggesting the directional flow of information that is illustrated in Figure 15.3.
Toward Representation
The visual and sonic components of hallucinations can be used to inform the design of
corresponding visual images and sounds. Indeed, such practices may be very old; it has
been proposed that examples of early shamanic rock art might have been based on the
visual images seen during hallucinations (Lewis-Williams 2004). In more recent examples
of psychedelic art and films, the internal experience of hallucinations can be represented
through appropriate design of audiovisual content. The design of this content has been
assisted by developments in sound and visual technology, such as computer graphics
and audio techniques that have allowed almost any sound or visual image imaginable to
be created. These technologies have allowed the subjective visual and aural experience
of hallucinations to be represented in digital video, by creating materials that correspond
with the visual or aural experiences observed during ASCs. Audiovisual artworks
have also enabled sound-to-image processes, similar to those found in synesthesia, to
be realized through the design of moving images that correspond with music.3 In recent
years, these representations of hallucination have also become interactive, as video
game technologies present simulations of hallucination or synesthesia. As we shall see,
representations of hallucination do not have to follow one fixed approach but may use a
variety of possible approaches, ranging from those that seek to replicate visual or aural
experience as accurately as possible to those that use more stylized approaches such as
impressionism, metaphorical imagery, or symbolism.
Audiovisual Representations
of Hallucinations
These categorizations are by no means definitive, but provide a useful means through
which we can initially begin to distinguish some key differences between works that use
representations of hallucination. “Diegetic representations of hallucinations” is the
phrase that describes representations of ASCs occurring within narrative contexts and
applies to examples in various films and 3D video games. These examples use the illusory
properties of audiovisual media in order to construct narratives involving characters in
various environments. Within these narratives, scenes of hallucination are portrayed
through the use of various audiovisual techniques that enable changes to the conscious
state of the character to be communicated to an audience. In contrast, “synesthetic
artworks” provide audiences with sensory experiences of sound and light similar to
those that may be experienced during hallucination. Artworks in this category do not
typically present these representations of synesthesia within a narrative framework;
examples of synesthetic artworks can be found in avant-garde visual music films, visual-
izations, VJ performances, music videos, and interactive music visualizations. These
two categories can also be distinguished by whether they use audiovisual representations of hallucination to enrich the sensory experience of a present location (synesthetic
artworks), or immerse the audience in a narrative depiction of another time and place
(diegetic representations of hallucination). In the following subsections, each of these
categories is illustrated through a selection of examples.
Figure 15.2: from sounds that have an acoustic basis within the diegetic environment, to
distorted versions of these, and sounds that are entirely internal products of hallucination
with no acoustic basis in the diegetic environment.
Later examples, such as Enter the Void (Noé 2009), push further still toward accurate
representations of hallucination with the aid of CGI and digital audio techniques. Enter
the Void uses a sustained first-person perspective: the camera presents the subjective
eye-view of the protagonist, allowing the audience to see what he sees (including his
blinking eyelids); while sound presents his aural experience so that the audience hears
what he hears. Sound is not only used to relate his conversations, but also to reveal the
inner speech of his thoughts that are delineated from vocal speech by processing the
dialogue with an echo effect. Early in the film, the character smokes a glass pipe con-
taining DMT (dimethyltryptamine), a powerful hallucinogen with a rapid onset and short duration. As he inhales the drug and its effects take hold, his vision becomes blurred
and spots of light flash across his visual field. He closes his eyes, and we see a network of
organic fibers and fractal patterns (created using CGI), suggestive of abstractions from
Klüver’s (1971) form constants. Throughout this sequence we hear an abstract sound
collage, in which the sounds from the Tokyo streets below are processed with flangers
and other effects in order to suggest perceptual distortions and auditory hallucination.
Through these various techniques, Enter the Void demonstrates how both sound and
visual images can be used to render the subjective experience of visual and auditory
hallucination with improved levels of accuracy, so that the media presented bears
a stronger resemblance to the visual and aural experiences that people actually describe
during hallucinations.
In recent years, computer graphics and sound have also been used to describe visual and auditory hallucinations in interactive media, such as first-person shooter (FPS) video games. For example, Weinel’s (2011) Quake Delirium demo project and Far Cry 3 (Ubisoft 2012) are video game projects that animate visual properties in order to simulate distortions to visual perception, while also using digital
effects and sounds to simulate auditory hallucinations. In the latter game, the simulation
of hallucination provides a means through which to enrich the narrative, but also
demonstrates an emerging paradigm shift in which games allow the player to explore
new potentialities through the simulation of altered states of consciousness in the con-
text of virtual worlds.
Synesthetic Artworks
Synesthetic artworks present audiences with experiences of light and sound that are
comparable to those that may occur during a hallucination, without the use of a clearly
defined narrative context. “Visual music” is a form of avant-garde film that is specifically
orientated toward synesthetic forms (Brougher and Mattis 2005). In the films of artists
such as Len Lye, Norman McLaren, Oskar Fischinger, and John Whitney, animated
arrangements of color and shape are used to form dynamic relationships similar to those
found in musical composition. While much of the work in this idiom has been
characterized by the quest for a harmonic visual language that Whitney (1980) articulated
in his writings on visual music, some works were also conceptualized as representations
of the internal experiences of the “inner eye” (Wees 1992). Harry Smith’s Early Abstractions
(1946–1957) series7 and Jordan Belson’s visual music films, such as Allures (1961) and the
unfinished LSD (1962), are notable as examples that seek to present internal sensory
experiences through film. Both artists used music as a complement to their visuals,
creating synesthetic audiovisual experiences for their audiences. Although both drew
inspiration from their own experiences of ASCs, their work can be more appropriately
seen not as attempts to convey their own first-person experience but as constructing
new sensory experiences for their audiences that provoke a form of synesthesia through
the use of audiovisual media. This approach was also explored through the use of psyche-
delic light shows such as Jordan Belson and Henry Jacobs’s Vortex Concerts; works by the
USCO collective (Davis 1975, 67; Oren 2010); and Andy Warhol’s Exploding Plastic
Inevitable shows with live music by the Velvet Underground (Youngblood 1970, 102–105;
Joseph 2002). For audiences on psychedelic drugs, these light shows may provide a com-
plementary experience; however, they also construct a multimodal experience of sound
and light for those individuals who are not operating under a chemically altered mind-
set, and this imitates the processes of synesthesia, constructing a similar experience
synthetically through sound and projections.
New technologies such as light synthesizers and computer software acted as a catalyst
for the furthering of these synesthetic audiovisual experiences from the late 1970s onward.
Early sound-to-light devices such as the Atari Video Music (1976) can be seen as simu-
lating sound-to-image synesthesia (as in Figure 15.3). Subsequently, programs such as
Jeff Minter’s Psychedelia (Llamasoft 1984), Trip-a-Tron (Llamasoft 1988), Virtual Light
Machine (VLM) (Llamasoft 1990), and later Neon (Llamasoft 2004), are successive
iterations of synesthetic equipment that incorporate progressive levels of computational
integration between sound and image (Minter 2005). Along with the availability of
computer graphics software on home computers, programs such as these, and hardware
such as the NewTek Video Toaster, would be among those that supported the nascent VJ
(“video jockey”) performances that flourished in tandem with the electronic dance
music culture8 of the 1990s, as demonstrated on the Studio !K7 X-Mix (1993–1998)
series. The mode of these is essentially one of sensory stimulation, and incorporates
replications of visual hallucinations and synesthesia: looping 3D graphics, fractals, and
cycling textures are combined in correspondence with music to produce impressions of
psychedelic hallucinations and rave culture iconography. This VJ culture became a com-
mon element of larger dance music clubs and outdoor raves and has also grown to
encompass the use of projection mapping technology that allows multiple surfaces to be
used as video screens. Modern VJ software allows the use of real-time audio parameters
as a means to manipulate graphical filters that are applied to predesigned video clips, or
as parameters that drive animations. Recent examples of this type of work include the
videos of VJ Chaotic (Ken Scott), such as Forever Imaginary (2014a), and planetarium
(“fulldome”9) works such as Crystallize (2014b).
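The audio-reactive principle behind such tools can be sketched in a few lines: an amplitude measure extracted from the incoming audio drives a visual parameter on each frame. The following Python fragment is a generic illustration rather than a description of any particular package (VLM, Neon, or modern VJ software); the frame rate, smoothing constant, and the mapping to brightness and hue rotation are assumptions.

```python
import numpy as np

def audio_reactive_params(audio, sr=44100, fps=30, smooth=0.8):
    """Map per-frame RMS loudness of an audio signal to two visual
    parameters: brightness (0-1) and hue rotation in degrees."""
    hop = sr // fps                              # samples per video frame
    frames = len(audio) // hop
    params, level = [], 0.0
    for i in range(frames):
        window = audio[i * hop:(i + 1) * hop]
        rms = float(np.sqrt(np.mean(window ** 2)))
        # Exponential smoothing stops the visuals from flickering.
        level = smooth * level + (1.0 - smooth) * rms
        brightness = min(1.0, level * 4.0)       # scaling is an assumption
        hue_shift = (i * 1.5 + level * 360.0) % 360.0
        params.append((brightness, hue_shift))
    return params

# Example: noise with a pulsing 2 Hz envelope yields pulsing visual values.
t = np.linspace(0, 4, 4 * 44100, endpoint=False)
signal = np.random.uniform(-1, 1, t.size) * (0.5 + 0.5 * np.sin(2 * np.pi * 2 * t))
frame_params = audio_reactive_params(signal)
```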
The discussion so far has outlined two main types of representation of hallucinatory
ASCs: diegetic representations that present hallucinations within the context of a narra-
tive progression, and synesthetic artworks that enrich the sensory experience through
the presentation of hallucinatory audiovisual experiences.10 In order to further consider
the differences implicated by examples within these groups, Figure 15.4 presents a
conceptual model describing possible approaches for the representation of ASCs using
three continua: “input,” “mode of representation,” and “arena space.”
Input
The x axis of the model describes input, and corresponds with Hobson’s (2003, 44–46)
discussion of sensory Input that can be modulated between internal and external
sources. Visual or sonic materials can be used to represent external sensory experience (e.g., impressions of actual environmental surroundings), or internal sen-
sory experience (e.g., hallucinated visions or sounds). For instance, a narrative represen-
tation of hallucination may include visual and auditory elements that describe either
an actual environment or a hallucination. Modulation between both external and
internal elements is also possible, such as if an audiovisual representation of an actual
environment is presented with gradually increasing distortions and the introduction of
hallucinated elements.
Mode of Representation
The y axis of the model describes mode of representation, which may range from “accurate”
to “stylized.” “Accurate” representations are those that attempt to render the visual or
auditory elements of hallucination as authentically as possible for the audience; hence,
visual effects may be used to present the visual experience of hallucination in a way that
closely approximates the first-person experience, while sound may be used to render
auditory distortions and auditory hallucinations.11 At the opposite end of this con-
tinuum, “stylized” describes a wide range of artistic possibilities for rendering hallucinations,
such as through the use of art styles such as impressionism, cartooning, symbolism, or
metaphorical techniques.12 Modulation between accurate and stylized approaches is
possible, such as if an accurate representation diverges into the use of metaphorical
materials during certain sequences in order to describe hallucinations. Such modulation
is not uncommon, as movie directors often show the onset of hallucinations using visual
effects or geometric patterns, before transitioning into the use of symbolic or metaphorical
cinematic materials to describe the more intense phases of hallucination.
Arena Space
The z axis of the model describes arena space: the entire performance space in which
musical and visual elements are presented.13 At one end of this continuum, “transported”
approaches are those that seek to remove the audience from the awareness of their
real-world context through immersion into the illusory audiovisual medium. This is the
position typically used by diegetic works that seek to absorb the audience into a fictional
world and narrative. At the other end of this continuum, “situational” approaches are
those that work in conjunction with the real-world environment, presenting sound and
visual images that enhance the experience of the “here and now” (as opposed to the
“then and there”). Synesthetic artworks such as psychedelic light shows at rock concerts
often use this approach, since they aim to stimulate the senses of the audience within the
present. Modulation between transported and situational approaches is also possible,
since an audiovisual work may operate in conjunction with the arena space or seek to
transport the listener from it at various points during a performance.
In Practice
As demonstrated in Figure 15.5, the conceptual model can be used to describe the represen-
tational approach used by various examples, such as those discussed previously.
Enter the Void (Noé 2009) uses representations of both internal and external sensory
experience and modulates between the two as the protagonist shifts between normal
and hallucinatory states of consciousness. Due to these modulations, the actual point on
the conceptual model changes through the course of the film; hence, the ellipse indicates
not one point but the approximate range that is traversed over time. The mode of represen-
tation in Enter the Void leans toward accurate representations of ASC, and as a fictional
narrative, it seeks to transport the audience from awareness of the movie theater into the
diegesis of the story.
Fear and Loathing in Las Vegas (Gilliam 1998) also uses both internal and external
inputs; the hotel lobby scene described earlier includes real-world sounds of the environ-
ment and modified versions of these that suggest movement along the continuum
toward internal sensory perception and hallucination. However, while aspects of the
visual and auditory approach used in Fear and Loathing in Las Vegas correspond with
the actual form of ASCs, the mode of representation is relatively more stylized than that of
Enter the Void. As the work is diegetic, use of the arena space is similarly “transported”
for this work, and indeed this is the arena space position for most works in the “diegetic
representations of hallucination” group.
Psychedelic visual music films such as those by Jordan Belson do not generally
include representations of external elements; visual elements are descriptive of visual
impressions of inner experience and therefore occupy the internal part of the axis, as
indicated for Allures (1961) and the unfinished work LSD (1962) in Figure 15.5.
Considering the mode of representation, these films each fall somewhere between
accurate and stylized positions. For instance, LSD leans toward accuracy through the
depiction of forms similar to Klüver’s form constants; it resembles the type of imagery
people actually describe during closed-eye visual hallucinations on LSD trips.
In contrast, Allures is a more metaphorical work. Both works could be considered as
“situational,” since they aim to actually induce synesthetic experience rather than transport
the listener into a fictional narrative. The situational approach is also the typical position
for many other works discussed in the “synesthetic artworks” category, since psyche-
delic light shows and VJ performances typically seek to bombard the senses with light
and sound and enhance the sensory experience of a space, rather than extract the
individual from his or her awareness of it.
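Read computationally, the model amounts to placing each work, or each moment of a work, as a point or a traversed region in a three-dimensional space. The sketch below encodes the placements just discussed as coordinates in [0, 1] on each axis; the specific numeric values are my own rough readings of Figure 15.5, not values supplied by the model itself.

```python
from dataclasses import dataclass

@dataclass
class ASCRepresentation:
    """A work's position on the three continua of the conceptual model.
    Each value lies in [0, 1]:
      input_source: 0 = external sensory input, 1 = internal (hallucinated)
      mode:         0 = accurate rendering,     1 = stylized rendering
      arena_space:  0 = situational (here/now), 1 = transported (there/then)
    """
    title: str
    input_source: tuple  # (min, max): a range, since works modulate over time
    mode: float
    arena_space: float

# Rough, illustrative placements of the works discussed above.
works = [
    ASCRepresentation("Enter the Void", input_source=(0.2, 0.9),
                      mode=0.2, arena_space=0.9),
    ASCRepresentation("Fear and Loathing in Las Vegas", input_source=(0.2, 0.8),
                      mode=0.6, arena_space=0.9),
    ASCRepresentation("Allures", input_source=(0.8, 1.0),
                      mode=0.7, arena_space=0.1),
    ASCRepresentation("LSD", input_source=(0.8, 1.0),
                      mode=0.4, arena_space=0.1),
]

for w in works:
    print(f"{w.title}: input {w.input_source}, mode {w.mode}, arena {w.arena_space}")
```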
Augmented Unreality
Figure 15.6 The conceptual model, showing the convergence of the real-world environment (external input) with synthetic unreality (internal input).
In augmented unreality, synthetic representations of unreality are designed to converge with the external environment through the use of techniques such as the
imitation and processing of visual or aural information derived from the external environ-
ment; the external environment then becomes an input source that can be subjected to
graphical or sonic transformations. We find visual examples of this in the spectacles of
projection-mapped buildings, where artists use the actual form of the building and its
texture as a basis for the design of transformed materials. In sound, we find a similar
principle in electroacoustic compositions such as Rajmil Fischman’s No Me Quedo . . .
(2000; discussed in Fischman 2008), which uses recorded sound and digital transforma-
tions to provide convergence between instrumental sounds and synthetic electroacoustic
sounds. The delivery of these illusory forms of media is supported by the availability of
increasingly powerful technologies, such as multichannel speaker systems and multi-
projection mapping systems. These allow the media to be delivered convincingly, and
their (semi)portable nature also enables the illusions to be “thrown” and sited outside of
the usual arenas of cinemas or computer screens where we might otherwise expect to
see them. This, in turn, allows the potential for illusory encounters that are unexpected
and, in some cases, may be indistinguishable from the real, physical environment. It is
the combination of convincing illusory media, coupled with the ability to site or throw
these anywhere, that exposes an important paradigm shift for digital culture, since almost
any public space is then a potential location where perceived reality can be corrupted
through the augmented unrealities of digital media. In ideal cases, the high-quality
sound and graphics will allow the surface of the media to qualitatively approach the
point where its synthetic nature cannot be detected with certainty, while the portability
of these illusions will help to catch audiences off-guard.
Early examples of augmented unreality can be observed in electronic dance music
culture. For example, psychedelic trance culture15 prioritizes the aesthetics of the
psychedelic experience in music, and at outdoor festivals VJ collectives such as Trip Hackers and Artescape design ultraviolet decor and synesthetic visual elements that are
intended to mesh with outdoor (real-world) festival environments (e.g., Dickson 2015).
Projection mapping is used in conjunction with sculptural elements that provide
custom surfaces for projection and temporary architectural spaces that imitate the form
of visual hallucinations and mandalas. These sculptural elements allow animated
fractals and tunnel elements suggestive of visual hallucinations to be integrated into
real-world environments such as forests, subverting the physical reality of these situations.
These visual elements are typically used in conjunction with music that, as heard on Durango’s Tumult (2005), includes a combination of rhythmic and melodic elements
(intended to produce maximum energetic dance effects), coupled with sounds such
as noises and voices that are suggestive of auditory hallucinations. These sounds are
manipulated using high-quality digital spatialization and transformations, enabling the
enhancements and distortions of auditory hallucinations (Figure 15.2) to be represented
through sound. Both sounds and visual materials then explicitly simulate the sensory
experience of visual and auditory hallucinations. Since the light show is linked to the
audio, the form of synesthesia is also imitated, so that the colors and movement of visual
images fluctuate and jump in response to the sounds. The overall effect is “situational,”
since it works in conjunction with the real-world, outdoor setting of the festival,
integrating real environmental features such as trees, birds, and the skyline into the
equation. Digital media are used to elicit a synthetic experience of unreality in a manner
that blends with the real, physical environment, and thus augmented unreality (Figure 15.6)
is accomplished. In these situations, it is entirely possible that the audience may begin to
experience dissolution of the boundaries between the real environment and the synthetic
presentations of unreality. This may be especially true for audiences using chemical
substances to alter their mind-sets; however, drugs may not be a prerequisite, since the
illusory properties of digital media alone could be sufficient to provide such experiences.
As the audiovisual technologies discussed thus far become pervasive, the capability
to convincingly invoke augmented unreality should increase. Although I have character-
ized augmented unreality here in terms of projections and loudspeakers, it is possible that
other emerging audiovisual technologies could also be used to achieve similar effects.
For example: wearable video equipment such as the “smart contact lenses,” which play
and record video (currently in development); augmented/mixed reality glasses such as
Microsoft’s HoloLens; or headphone systems such as Doppler Labs’ Here (Doppler
Labs 2015), which modifies and filters sounds from the external environment, are among
those that could theoretically be used to simulate hallucinations and achieve augmented
unreality.16 The long-term implications of this type of media could be dramatic, as the
glow of synthetic virtual environments and their accompanying sonic vibrations extend
over the everyday, allowing the potential to simulate ASC experiences without the use of
intoxicating substances.
Concluding Remarks
This chapter has provided an outline of the main effects of hallucinations (a form of
ASC) with regard to the visual and aural components of the experience, including
sound-to-image synesthesia. As we have seen, the typical form of psychedelic hallucinations follows some structural norms that produce commonality in the experiences
between participants. These norms have allowed the representation of hallucinations
in a variety of audiovisual media such as films, visualizations, and computer games.17
These can be broadly classified in terms of diegetic representations of hallucination
and synesthetic artworks and may use a range of possible approaches. These possible
approaches can be considered in terms of the conceptual model presented, which
allows the use of input, mode of representation, and arena space to be considered for a
given work. The conceptual model also allows us to reflect on the recent move toward
improved accuracy in representations of hallucination, as afforded by digital
technologies for sound and computer graphics. I have argued that this drive toward
realism, coupled with new technologies for siting work in ad hoc locations, has opened
up a new paradigm of “augmented unreality,” in which real external environments
and synthetic representations of unreality converge. Augmented unreality is currently
exemplified by the synesthetic environments of psychedelic trance festivals, but over the
next few decades we can expect the trend to grow as illusory audiovisual technologies
become increasingly pervasive. As these technologies provide improved resolutions and
capabilities for modifying audience experience, the boundaries between external reality
and synthetic unreality may dissolve to the point where the two can no longer be distin-
guished; in effect, producing synthetic digital forms of ASCs.
Notes
1. In drawing on Hobson’s distinction of “external” and “internal” sensory inputs, we should
note that he does not propose these as binary categories, but rather a continuum of possible
states. It is acknowledged that internal processes can significantly shape normal waking
consciousness, and indeed, conversely, in some cases the contents of dreams can also be
influenced by external sensory inputs. What is important here is the main origin of sensory
material, which in normal waking consciousness is predominantly “external,” unlike
dreams and hallucinations that are primarily “internal.” As I explore in this chapter, both
the real (external) and the unreal (internal) can provide a basis for corresponding art,
sound, and music.
2. While the emphasis here is on digital practices, many of the essential approaches I explore
in this chapter were first proven with analog technologies such as film and magnetic tape,
and before that, techniques such as painting and the use of acoustic instruments.
3. It should be acknowledged here that experiences of synesthesia are highly individualized;
nonetheless, in drug experiences we find that a common mechanism of sound-to-image
synesthesia occurs, along with typical visual effects such as the “form constants”
(Klüver 1971). In this regard, there are generalizable processes that audiovisual media can
begin to reproduce, even if the specific manifestations of synesthesia that are experienced
by individuals may remain somewhat elusive.
4. As discussed by Sitney (1979, 21), the concept of the “trance-film” (similar to the “psycho-
drama”) describes films on such themes as dream, somnambulism, ritual, or possession.
5. “Thelemic” refers to the use of iconography derived from Aleister Crowley’s Thelema
religion, which Anger was a member of. These icons are presented in Inauguration of the
Pleasure Dome (Anger 1954) as if they were visual hallucinations, suggesting that the ritual
invokes visionary experiences related to the Thelemic principles.
6. For further discussion of the metaphorical use of reverb to suggest internal psychological
processes in films and popular music, see Doyle (2005).
7. During this period Harry Smith created a series of untitled films, of which several were
subsequently lost or destroyed (Sitney 1979, 232–233). Early Abstractions (1946–1957)
collects the remaining films from this series.
8. For a further discussion of electronic dance music culture, see St. John (2009).
9. “Fulldome” environments project video on to the hemispherical ceiling of a dome structure,
in order to provide an immersive 360° experience. These environments are used for
planetarium shows, but have also been used to provide various forms of expanded cinema.
Notable fulldome events showcasing new work in the United Kingdom have included
Mario DiMaggio’s Dome Club series and FullDomeUK.
10. The description of “hallucinatory audiovisual experiences” here does not presume that
audiences experience a hallucination in exactly the same way as would be precipitated by
other means (e.g., psychedelic drugs); rather, the experience of sound and images may
elicit distinct illusory experiences that imitate the form of hallucinations.
11. Instead of the term “accurate” we might otherwise have used the term “realistic” here, to
describe the stylistic approach taken, in correspondence with “realist” approaches in the
visual arts (e.g., photorealism). As Kennedy (2008, 449–450) remarks, realist approaches
can be used for depicting actual scenes, but they can also be used when rendering the
imaginary (or in this case, the hallucinatory). However, for our purposes here the
terms “realist” or “realistic” are unhelpful, since by definition the hallucinatory is unreal;
hence the term “accurate” is preferable, to avoid having to describe unreal materials as
also “realistic.”
12. For a further discussion of metaphors in art, see also Kennedy (2008).
13. The term “arena space” is borrowed from Smalley (2007), and describes “the whole public
space inhabited by both performers and listeners” (42). Here, the term is adapted to include
audiovisual elements.
14. The convergence of synthetic and real-world materials here is a development and adap-
tation of Fischman’s (2008) discussion of convergence of instrumental and electronic
materials in electroacoustic music, especially his own composition No Me Quedo . . . (2000).
15. For more information on psychedelic trance culture, see St. John’s definitive account
Global Tribe: Technology, Spirituality and Psytrance (2012).
16. In a series of public lectures, Carl Smith (see also 2014, 2016) has described these and
other technologies as enabling a new paradigm that he refers to as “context engineering”:
computer systems that allow the user to modify his or her contextual awareness, using
“reality as a medium.” In these terms, “augmented unreality” could be considered as a
specific branch of context engineering.
17. For an expanded discussion of how ASCs may be represented or induced across a wide
range of electronic music and audiovisual media, see also Weinel (2018).
References
Anger, K., dir. 1954. Inauguration of the Pleasure Dome.
Arnott, R. 2014. Soundself. Video game.
Belson, J., dir. 1961. Allures. USA.
Belson, J., dir. 1962. LSD. USA.
Bliss, E. L., and L. D. Clark. 1962. Visual Hallucinations. In Hallucinations, edited by L. J. West,
92–107. New York: Grune & Stratton.
Bressloff, P. C., J. D. Cowan, M. Golubitsky, P. J. Thomas, and M. C. Wiener. 2001. Geometric
Visual Hallucinations, Euclidean Symmetry and the Functional Architecture of Striate
Cortex. Philosophical Transactions: Biological Sciences 356:299–330.
Brougher, K., and O. Mattis. 2005. Visual Music: Synaesthesia in Art and Music since 1900.
London: Thames & Hudson.
Buñuel, L., dir. 1929. Un chien andalou. France.
Corman, R., dir. 1967. The Trip. American International Pictures.
Cytowic, R. E. 1989. Synesthesia: A Union of the Senses. New York: Springer-Verlag.
Davis, D. 1975. Art and the Future. New York: Praeger.
Deren, M., and A. Hammid, dirs. 1943. Meshes of the Afternoon. USA.
Dickson, C. 2015. Earthdance Cape Town 2015: Main Stage Installation and Video Mapping by
Afterlife. Vimeo. https://vimeo.com/139905544. Accessed October 25, 2015.
Smalley, D. 2007. Space-Form and the Acousmatic Image. Organised Sound 12 (1): 38–58.
Smith, C. H. 2014. Context Engineering Hybrid Spaces for Perceptual Augmentation. In
Electronic Visualisation and the Arts (EVA 2014), 244–245. London: British Computer
Society. http://www.bcs.org/upload/pdf/ewic_ev14_s18paper3.pdf. Accessed September 29,
2016.
Smith, C. H. 2016. Context Engineering Experience Framework. In Electronic Visualisation
and the Arts (EVA 2016), 191–192. London: British Computer Society. http://dx.doi.org/
10.14236/ewic/EVA2016.37. Accessed September 29, 2016.
Smith, H. E., dir. 1946–1957. Early Abstractions. USA.
St. John, G. 2009. Technomad: Global Raving Countercultures. London: Equinox.
St. John, G. 2012. Global Tribe: Technology, Spirituality and Psytrance. London: Equinox.
Studio !K7. 1993–1998. X-Mix. Video Series.
Thompson, H. S. (1971) 2005. Fear and Loathing in Las Vegas. Reprint. London: HarperCollins.
Ubisoft. 2012. Far Cry 3. Sony PlayStation 3.
Wu, W. 2012. Explaining Schizophrenia: Auditory Verbal Hallucination and Self-
Monitoring. Mind and Language 27 (1): 86–107.
Wees, W. C. 1992. Making Films for the Inner Eye: Jordan Belson, James Whitney, Paul Sharits.
In Light Moving in Time: Studies in the Visual Aesthetics of Avant-Garde Film, edited by
W. C. Wees, 123–152. Berkeley: University of California Press. http://publishing.cdlib.org/
ucpressebooks/view?docId=ft438nb2fr;brand=ucpress. Accessed October 25, 2015.
Weinel, J. 2011. Quake Delirium: Remixing Psychedelic Video Games. Sonic Ideas (Ideas
Sonicas) 3 (2): 22–29.
Weinel, J. 2018. Inner Sound: Altered States of Consciousness in Electronic Music and Audio-
Visual Media. New York: Oxford University Press.
Weinel, J., S. Cunningham, and D. Griffiths. 2014. Sound through the Rabbit Hole: Sound
Design Based on Reports of Auditory Hallucination. In ACM Proceedings of Audio Mostly
2014. Denmark: Aalborg University. doi: 10.1145/2636879.2636883
Weinel, J., S. Cunningham, N. Roberts, S. Roberts, and D. Griffiths. 2015. EEG as a Controller
for Psychedelic Visual Music in an Immersive Dome Environment. Sonic Ideas (Ideas
Sonicas) 7 (14): 85–91.
Whitney, J. H. 1980. Digital Harmony: On the Complementarity of Music and Visual Art.
Peterborough: Byte Books/McGraw-Hill.
Youngblood, G. 1970. Expanded Cinema. New York: E. P. Dutton.
chapter 16
Consumer Sound
Søren Bech and Jon Francombe
Introduction
This chapter deals with one of many methods (namely descriptive sensory analysis) used
for the objectification and quantification of the consumer’s imagination with respect to
the audio signal; it provides a justification of the method as used in the audio industry
for the design of audio playback technology that maximizes the potential for controlling
or improving the consumer’s auditory imagination. In the first section, the basic
assumptions and procedures behind sensory analysis are introduced. These are exemplified
by the quantitative descriptive analysis (QDA) method. QDA is one of the basic
methods in sensory analysis of food or sound quality; it addresses and controls the
complex influence of an individual listener’s expectations, mood, previous experiences,
and so on in an experimental context. In the following section, an example of a complete
sensory analysis of a complex sound field is provided, followed by details of the subsequent
development of a perceptual model for prediction of the attribute distraction in a
particular type of sound field. In the final section, upcoming and future developments in
this area are discussed.
The traditional role of the audio industry1 has been to provide means for a listener to
perceive and experience audio content (as made by some content creator, e.g., a music
artist or sound designer) at any time and anywhere after the production of the content.
This includes products or services that are used to record and store the sound (microphones,
tape, records, CD players, and so on); processes and products for transmission
of the sound to the end consumer; and finally, products for reproducing the sound in the
consumer’s home, car, or other listening venue. Theile (1991) states that a reproduction
system should “satisfy aesthetically and it should match the tonal and spatial properties
of the original sound at the same time.” A primary goal of the industry has therefore
always been “transparency”—that is, to create an impression or auditory experience2 for
the listener so that, for example, during a news broadcast it is possible for the listener to
form an auditory image of the announcer being “in the listening room” (as opposed to
being in a remote studio). Another example is to enable any listener to imagine that he
or she is in the concert hall where, say, a classical music performance took place. The
main goals for researchers in academia and industry have therefore been to understand
the processes involved in the entire transmission chain (from recording to repro
duction) and to develop products that allow the listener to perceive auditory images that
(1) correspond to actual “participation” in the original performance; and (2) accurately
reflect the original and unmodified intentions of the artist and the producer.
This goal has driven a range of research areas under the general term “acoustics” that
is defined by ANSI/ASA (2013) as: “(a) Science of sound, including its production, transmission,
and effects, including biological and psychological effects; (b) Those qualities
of a room that, together, determine its character with respect to auditory effects.” Specific
areas in the present context include “communication acoustics” (Blauert 2005; Pulkki
and Karjalainen 2015) and signal processing in acoustics (Havelock et al. 2008). The
audio industry has continuously improved or developed new techniques and a range of
products with the overall purpose of improving the ability of the rendering/reproduction
process to allow the listener to experience a perceptual image equivalent to that which
would accompany the original acoustic event. For example, over a number of decades
the optimal reproduction system has developed from a single channel (monophonic)
reproduction system through two-channel stereophony, 5.1 “surround sound,” and more
recently to advanced surround sound systems including 22.2 reproduction (Hamasaki 2011,
and references therein). Such systems and their evaluation are discussed further at the
end of this chapter. The increase in complexity of the recording/reproduction systems
was in part made possible by the introduction of digital signal processing; however, it
was not until the introduction of advanced encoding and decoding of audio and video
signals in products that such multichannel systems and other signal “manipulation”
techniques became widely available to consumers.
The introduction of digital signal processing in mass-market audio and video products
such as mp3 audio players produced a new range of possibilities for further improving
the quality of audio or video signals, and therefore the quality of “imagination” based
on the consumer’s auditory experience. In addition to benefits such as general higher
quality, increased number and availability of programs, and more advanced features, a
number of signal artifacts were unfortunately also introduced. These included “ringing”3
in audio and “squared clouds”4 in video. These artifacts were very noticeable even for the
average consumer. In order to remove or technically compensate for these imperfections,
the industry had need of “measuring” methods that could connect the physical
properties of the signals with the perceived auditory impression of the consumers. This
was not a new problem or topic area; researchers in psychophysics had been investigating
such relationships for years (see, e.g., Gescheider 2015, for an introduction), focusing
on “simple” auditory experiences such as the perceived strength of a sound (loudness).
However, the new problem was how to quantify complex multidimensional experiences
that, in addition to simple attributes such as loudness and timbre, also included a
number of completely unnatural artifacts (such as “squared clouds” in video). The
first task in this process was therefore to devise an experimental paradigm that could
the spatial properties of the rendering to a much larger degree than ever before,
thereby further increasing the degrees of freedom for controlling or improving the
listener’s auditory images. This means that sensory analysis of spatial properties of
reproduced sound is currently a hot research topic in many large projects—see, for
example, the work of the “S3A: Future Spatial Audio for an Immersive Listener
Experience at Home” project.5
The introduction of new signal processing techniques (as discussed earlier) meant that
perceptual audio scientists needed to develop ways of quantifying the auditory experiences
of consumers presented with complex auditory stimuli. Various methods, often adapted
from other sensory sciences (for example, food science), have been used to achieve this
aim. This section includes a description of the basic assumptions of descriptive analysis and
the main principles of the QDA method. The content will be a summary of information
presented by Bech (1999), Bech and Zacharov (2006, chap. 4), and Martin and Bech
(2005). Readers are referred to these publications for additional details of QDA, other
methods, and general references.
The use of assessors to evaluate and report on the auditory experiences produced by a
certain set of stimuli in a scientifically valid manner requires (at least) two basic issues to
be clearly defined: first, the question the assessor is required to answer; and second,
a specification of how the assessor should report the answer.
The definition of the question to the assessor depends on the purpose of the experiment
and the stimuli he/she is subjected to in the experiment. In a laboratory setting,
specific stimuli can be engineered to answer specific questions; conversely, in field settings
the stimuli will be naturally occurring and this will have an impact on the type of
questions that can be posed to the assessors. In the Adonis project (introduced previously),
the key questions were related to general image quality—or lack thereof—due to
artifacts in the processing of the natural images; therefore, the stimuli had to be complex
natural images. However, in order to establish a scientifically valid relationship between
the physical phenomena and signal processing introduced to the original image and the
overall quality changes, it was necessary to focus first on specific aspects or attributes of
the image, and then to determine how these contributed to the overall image quality.
The simplified conceptual model of human perception shown in Figure 16.1 was therefore
established, inspired by previously developed models by Plomp (1976), Nijenhuis
(1993), Yendrikhovskij (1998), and Stone and Sidel (2004).
The process starts with a physical stimulus—in this case a sound field—that impinges
on the auditory system of an assessor. The sound field can be described by a number of
physical variables Φk each with a physical strength or intensity (e.g., sound pressure level
Figure 16.1 Simplified conceptual model of human perception: the sound field is processed by the auditory system and, shaped by factors such as learning and context, gives rise to individual impressions I1 … In, which combination rules merge into the total auditory impression Itot.
and frequency). The auditory system transforms the mechanical activity of the eardrum
into nerve impulses that are assumed to be combined in the brain of the assessor, resulting
in a number of specific auditory attributes Ψl (e.g., pitch, loudness), each with a sensorial
strength Sm. The sensorial strength of each attribute depends on the physical
strength of the variables Φk in combination with the properties of the auditory system
and experimental factors such as learning effects. For the present purpose it is sufficient
to characterize these properties by the auditory sensitivity (e.g., can you hear the sound
or not) and selectivity (for example, the “just noticeable difference threshold”: can a
certain physical change of an audible sound be noticed or not).6
The next step in the process is assumed to be the result of a combination of the
individual attributes Ψl, with sensorial strength Sm, into specific impressions In. Finally,
these individual impressions are combined into an overall auditory impression Itot.
It is assumed that the combination of the specific attributes Ψl (each with a sensorial
strength Sm) into individual impressions In, as well as the combination of individual
impressions into an overall impression, depends on context, expectations, the mood of
the assessor, and so on.
The assumed relationship between the physical domain and the attribute domain is
shown in Equation 1. Equation 2 shows the relationship between the attribute domain
and the total auditory impression.
where kk represents a weighting factor reflecting the importance of the physical variable,
mn represents the weight of each individual impression in forming the overall impression,
and ε represents the noise or unexplained variance in the dataset. The assumption
made when using QDA or similar methods is that the assessor rating7 of each stimulus
for each attribute corresponds to its sensorial strength (Sm).
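A plausible reconstruction of Equations 1 and 2 in LaTeX notation, assuming the simple weighted-sum forms implied by the definitions above (the exact expressions used by the authors may differ), is:

S_m = \sum_k k_k \, \Phi_k + \varepsilon \qquad \text{(Eq. 1, reconstructed)}

I_{tot} = \sum_n m_n \, I_n + \varepsilon \qquad \text{(Eq. 2, reconstructed)}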
These very simplistic engineering relations have been shown to be able to describe the
experimental results and predict the outcome of new experiments for a large number
of situations in, for example, audio, video, or food quality experiments. It is also noted
that if these relationships can be established then it is possible to relate changes in the
physical variables directly to changes in the general impression of the assessors. This
represents key information for understanding human behavior and for the development
of new food, fragrance, audio, or video products, and explains why the descriptive
methods are used commonly by manufacturers in these areas. It is important to note
that Equations 1 and 2 do not represent a complete model of the human decision process
and they should not be expected to describe more than a maximum of 80–90 percent of
the variance in a dataset. However, this is quite often enough to make some very useful
estimations of future assessor behavior.
The model shown in Figure 16.1 was developed into the filter model (shown in
Figure 16.2) by Pedersen and Fog (1998).
Figure 16.2 The “filter model” developed by Pedersen and Fog (1998), inspired by Bech et al. (1996), to describe the process of human sound perception: a physical stimulus is filtered into a perceived stimulus, which in turn gives rise to likes/dislikes.
The model now operates with three domains—the physical, the perceptual, and the
affective domains—which are characterized by the measurement principle that is normally
applied. In the physical domain, standard physical measures are used to characterize
the stimuli; in the perceptual domain, the stimuli are characterized by the assessor’s
judgment of the sensorial strength of the relevant individual attributes; and finally, in
the affective domain, the assessor’s rating of, for example, the overall auditory impression
(see Eq. 2) is used for characterization of the stimuli.
The general idea or principle of descriptive analysis is therefore to identify the individual
attributes for the stimuli of interest and have assessors judge the sensorial strengths
of each of these. This is often done under highly controlled laboratory conditions
using a limited number of assessors (e.g., 15–20) as described for the QDA method. The
affective assessments are then established in the field using a large number of consumers
(e.g., 100–200). The two resulting perceptual datasets can be combined with the physical
variables; this process can provide the requisite information to be able to advise
the engineering department on how to achieve a certain strength of individual attributes
or overall impression.
The QDA method was developed by Stone and Sidel (2004) and is one of the basic
methods that specifies in detail the entire experimental process, including identification/
elicitation of attributes, training of assessors, planning and conducting experiments,
analyzing the results, and presentation of the results and conclusions. The method is
described in detail by Stone and Sidel (2004).
The QDA method exhibits the following properties (only those relevant to audio evaluations
are listed here). The QDA method:
The QDA method employs the so-called direct elicitation principle, in contrast to
other indirect elicitation methods. The direct elicitation principle assumes that there is a
close relationship between the individual attributes (Ψl) and verbal descriptors (single
words) elicited as a part of, for example, the QDA process. This is contrary to the indirect
elicitation principle, where it is not assumed that this relationship exists, and other
methods are used—for example, multidimensional scaling (see Schiffman et al. 1981, for
an introduction), in which the assessors rate only the perceived (dis)similarity between
the stimuli. The statistical analysis then allows for an identification of the individual
sensory dimensions that are assumed to be related to individual or groups of related
attributes. There are advantages and disadvantages to direct and indirect elicitation
techniques; both have been used—sometimes in conjunction—in audio attribute
elicitation studies. A full discussion is beyond the scope of this chapter, but Mason and
colleagues (2001) present a detailed review of the challenges of capturing a listener’s
imagination or impression of an auditory scene using verbal descriptors.
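To make the indirect route concrete, the following Python sketch applies metric multidimensional scaling to a small set of pairwise dissimilarity ratings; the dissimilarity matrix and stimulus count are hypothetical, and scikit-learn's MDS is used only as one illustrative implementation.

import numpy as np
from sklearn.manifold import MDS

# Hypothetical averaged dissimilarity ratings between four stimuli
# (0 = judged identical, 1 = judged maximally different).
dissimilarity = np.array([
    [0.0, 0.3, 0.8, 0.7],
    [0.3, 0.0, 0.6, 0.9],
    [0.8, 0.6, 0.0, 0.4],
    [0.7, 0.9, 0.4, 0.0],
])

# Metric MDS on the precomputed dissimilarities; the recovered dimensions are
# then interpreted as underlying sensory attributes (or groups of attributes).
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
positions = mds.fit_transform(dissimilarity)
print(positions)  # one two-dimensional coordinate per stimulus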
The direct elicitation principle assumes that it is possible to elicit a number of words
(a vocabulary) where each word corresponds to a specific attribute.8 Two main techniques
exist for elicitation of this vocabulary:
The QDA method uses the consensus vocabulary technique, and Lawless and
Heymann (1998) list the following properties, in order of importance, which each word in
the vocabulary should preferably fulfill. An attribute should:
The QDA method defines a number of basic steps when applying the method:
Meilgaard and colleagues (1991) list the following generic requirements for selection
of a panel of assessors. Assessors should have the ability to:
Phase one includes a representative set of stimuli that excites all of the sensory
differences that are relevant for the experiment or product portfolio at hand. Before the
first session, assessors are typically asked to prepare their own list of words that they can
imagine for a predefined scenario—for example, considering the differences between
the sound reproduction equipment they possess in their home. In the first session the
team is subjected to the stimuli and asked to explain/discuss the meaning of their contributions
and organize all of the contributed words into categories that represent the
same meaning/interpretation/percept. Phase two includes removing duplicate words
in each category, agreeing on a common word or attribute for each category, and eventually
adding a brief description of how the attribute should be interpreted. Phase three
includes further discussions of the categories of words and the agreed on common
attribute for each category, followed by the selection of representative stimuli that clearly
exhibit the agreed attribute. Phase four includes the first series of practical tests, where
the differences between the stimuli cover a large perceptual range. The subjects are asked
to discuss and define the endpoint markers of a rating scale that will be used to rate
or rank-order the stimuli for each of the agreed common attributes. The subjects also
familiarize themselves with the process of scaling the intensities of the selected stimuli
for each of the attributes. Typically, a graphical 15-cm horizontal line with no tick
marks except endpoint markers offset by 1.5 cm at each end is used for the rating process.
The assessor is asked to indicate, by either a movable cursor or a tick, the rating of the
stimuli in question (e.g., see “Stage Four: Attribute Ratings” later). Phase five introduces
stimuli with smaller differences, and repetitions are included in the experiment.
The results of the experiments in phase five are used to check the response system for
logical inconsistencies, and to check the abilities of the assessors and selected attributes
by answering the following questions:
• Are assessors consistent in their ratings of repeated stimuli for all attributes?
• Are assessors agreeing on the ratings (ranking) of individual stimuli and
attributes?
Phase six is the final check of the paradigm developed, and it includes experiments with
test conditions that are similar to those in real tests.
Zacharov and Lorho (2005) include an example of the development phases just
described. Vocabularies of attributes, developed using QDA or other methods, have been
published for specific sensory modalities. For example, Noble and colleagues (1987)
developed the “wine aroma wheel,” Bech and colleagues (1996) developed a list for image
quality of CRT displays, and Pedersen and Zacharov (2015) developed the sound wheel.
The statistical analysis of the preliminary training experiments in phases four and
five usually employs analysis of variance (ANOVA) models or more advanced procedures
(such as those described by Næs and colleagues 2010), and can be executed using
either commercial software or freeware such as Panelcheck10 or Consumercheck.11 Both
Panelcheck and Consumercheck were developed as part of research projects aimed at
developing statistical procedures specifically for sensory experiments and implementing
them so that nonexperts in statistics can easily use them. The ongoing assessment of
panel performance is especially important; procedures for that specific purpose are
included in Panelcheck or eGauge (Lorho et al. 2010), and the details of a specific procedure
(eGauge) are described in ITU-R (2014b).
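As an indicative sketch of the kind of analysis such tools automate, a two-way ANOVA on a long-format table of attribute ratings can be run with statsmodels; the file name and column names below are hypothetical.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format panel data: one row per assessor, stimulus, and repetition,
# with the rating of a single attribute in the "rating" column.
ratings = pd.read_csv("panel_ratings.csv")  # columns: assessor, stimulus, rating

# Two-way ANOVA with interaction. A large assessor effect suggests scaling
# differences between panel members; a large assessor-by-stimulus interaction
# suggests disagreement about the ranking of the stimuli for this attribute.
model = ols("rating ~ C(assessor) * C(stimulus)", data=ratings).fit()
print(sm.stats.anova_lm(model, typ=2))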
Once the panel and initial list of attributes have been established, the ongoing
training is used to maintain the attribute list and to check the performance of the
panel members. Thereafter, a typical application of the QDA procedure in audio
includes the following points:
1. definition of the stimuli (for example, the selection of loudspeakers and programs
to be tested);
2. initial listening sessions with all members of the panel present, focusing on the
selection of attributes from the existing vocabulary such that all perceptual differences
are covered by the selected attributes. A typical selection includes ten to
fifteen attributes;
3. conducting the listening tests where each stimulus (e.g., a loudspeaker-program
combination) is rated for all attributes selected in the initial listening sessions.
There are many options for the practical implementation of the final tests (see,
e.g., Bech et al. 2005; Martin and Bech 2005; Hegarty et al. 2007; Postel et al. 2011);
however, it is important that only one attribute be rated at a time. This forces the
assessor to keep focus on the interpretation of that particular attribute and the
differences between, for example, loudspeakers for a given program; and
4. statistical analysis of the results. This includes, in addition to the standard tests of
the quality and properties of the raw data, analysis for each of the attributes where
the main variables (e.g., loudspeakers and programs) are examined. The correlation
between the examined attributes should also be analyzed—for example, using
principal component analysis—to determine the number of independent attributes.
Experience from listening and viewing tests at Bang & Olufsen suggests that
highly trained subjects can distinguish between a maximum of four to five attributes
independently from an initial group of ten to fifteen attributes. In addition
to examining the main variables, it is also important to check the performance of
the panel (as discussed earlier). Further details of the complete statistical analysis
of sensory data are presented by Næs and colleagues (2010).
This section has described the considerations that led to the development of
experimental paradigms aimed at analysis of highly complex sensory experiences.
The QDA method has been described as an example of one of the basic methods, and
references are given to other more recent paradigms. To illustrate the process of a
sensory analysis in detail, the following section includes a description of a PhD proj
ect included in a recent research project named “Perceptually Optimized Sound
Zones” (POSZ). The PhD project was aimed at developing a perceptual model for
prediction of human perception of the interaction between separate sound zones in a
domestic situation.
In this section, the POSZ project will be briefly introduced, followed by the descriptive
analysis procedure that was used to ultimately develop a predictive model of the
main aspect of the listener experience (namely, perceived distraction). The POSZ project
brought together researchers in signal processing and audio perception in order to
develop perceptually optimal algorithms for producing personal sound zones. In a personal
sound zone situation, two (or more) separate sound fields are produced in separate
zones in a room in such a way that multiple program items (one item in each zone)
can be reproduced simultaneously over the same loudspeakers; consequently, multiple
listeners distributed between zones can listen to different program material without
the need for headphones. The reproduction of personal sound over loudspeakers—as
opposed to headphones—has a number of advantages that are worth the extra signal
processing required: removing the need for headphones enables communication between
people even if they are consuming separate audio programs, and also facilitates much
greater awareness of the environment (this is particularly important in an automotive
scenario, e.g., for road awareness and safety).
The signal processing required to produce such a complex sound field introduces
considerable artifacts that are likely to degrade the target quality. At the same time, it is
difficult to achieve perfect separation between zones, meaning that a listener may experience
unwanted audio interference on their target audio program. The descriptive
analysis performed as part of the POSZ project focused on the latter perceptual problem
(i.e., imperfect separation), as there has been considerable prior work on modeling
audio quality (e.g., ITU-R 2001; Rumsey et al. 2008; Conetta et al. 2008; Dewhirst et al.
2008a, 2008b; George et al. 2008). A series of perceptual tests was performed to determine
the perceptual experience of a listener in an audio-on-audio interference situation
(i.e., a situation in which the experience of listening to some target audio is modified by
a secondary interfering audio program). However, an attempt was also made to quantify
the magnitude of the effect of these different facets (Baykaner et al. 2015).
In the previous section, the QDA paradigm—a strictly controlled and specified
method—was outlined. There have been numerous other methods, with similarities to
and differences from QDA, which are often trademarked and must be carefully controlled
if they are to be strictly followed (Lawless and Heymann 1998, 227–257; Delarue
et al. 2016). In practice, it is common for researchers to select aspects of these methods as
required for particular elicitation tasks, leading to the development of new methods or
simply to ad hoc techniques that are appropriate for particular studies. Murray et al.
(2001) term such methods “generic descriptive analysis.”
Figure 16.3 User interface for the free elicitation task. Stimuli were replayed by clicking the
circular buttons. Participant responses were typed into the text box at the bottom of the screen.
assigned to a set of buttons, which were positioned above a text box into which responses
could be typed. The multiple stimulus presentation meant that participants could also
compare between stimuli, widening the pool of potential descriptors. Five trained listeners
and four untrained listeners performed the first stage of the test (see below for
a discussion on participant experience). A total of 572 unique words and phrases were
produced in this first stage.
The second stage featured a set of team discussions that were intended to reduce
the large set of individually elicited words and phrases into a manageable set of carefully
defined attributes. The underlying assumption was that many of the responses from
stage one, although ostensibly unique, were describing essentially the same experience.
The task for the participants was to find the optimal terminology for labeling and
describing the underlying percept. The trained and untrained participants performed
the team discussion separately. Each phrase was presented back to the team (using physical
printouts on small cards), and the participants were asked to categorize together any
of the responses that described the same percept. It was necessary for the participants to
reach a consensus when performing the categorization. When all of the responses had
been categorized, participants were asked to produce an attribute definition (a label for
the category), endpoint definition (terms that could be used as the positive and negative
endpoints of a scale of the attribute), and an attribute description (a short description of
the percept that could be understood by someone who had not participated in the
experiment). The experiment was facilitated by an experimenter who played no active
part in the discussions, serving only to administer the task (e.g., by presenting the phrases
and documenting the results). The experimenter was well versed in the background of
the tests but was careful to avoid taking an active part in the discussions so as to avoid
biasing the results.
Using this procedure, the trained listeners categorized 259 responses into 9 attributes,
and the untrained listeners categorized 313 responses into 8 attributes. A further team
discussion was performed with both sets of participants in order to unify the attribute
sets. A number of minor changes were made to definitions, descriptions, and endpoints.
Where there were duplicate attributes in the two sets, the participants generally agreed
that the trained listener labels and descriptions should be retained. The final attribute set
included twelve attributes (see Francombe et al. 2014a, for details): masking; calming;
distraction; separation; confusion; annoyance; environment; chaotic; balance and blend;
imagery; response to stimuli over time; and short-term response to stimuli.
Figure 16.4 User interface for the attribute reduction stage. Stimulus playback was controlled using the buttons at the bottom of the screen. The attribute labels and definitions were positioned at random on the grid of buttons.
Table 16.1 Attribute Labels, Descriptions, and Endpoints for the Four Attributes That Were Used at Significantly Greater Than Chance Frequency in the Attribute Reduction Stage
Annoyance. Description: To what extent the alternate audio causes irritation when trying to listen to the target audio. Endpoints: Very annoying to Not at all annoying.
Distraction. Description: How much the alternate audio pulls your attention or distracts you from the target audio. Endpoints: Not at all distracting to Overpowered.
Balance and blend. Description: How you judge the blend of sources to be. Endpoints: Complementary to Conflicting.
Confusion. Description: How confusing the merge of the two audio programs is (rhythmically, melodically, or harmonically); how they blend together. Confusion because the sources interact with each other. Endpoints: Extremely confusing to Not at all confusing.
Figure 16.5 User interface for the attribute rating stage. Stimuli were replayed by clicking the
labeled circular buttons, and ratings were given using the vertical sliders.
ratings were made on the four attributes carried forward from the attribute reduction
stage. A multiple stimulus paradigm, modified from the standardized BS.1534-3
“MUSHRA” test (ITU-R 2015), was used: participants gave ratings on 15-cm vertical sliders
with endpoint label positions 1.5 cm from the scale ends. The user interface is shown in
Figure 16.5. A reference stimulus (just the target audio with no interference) could be
played by clicking a button labeled “R” that was positioned in line with the 0 point of the
scale (i.e., not at all distracting). The stimuli could be played by clicking the labeled buttons
at the top of the page; the distraction score was given by setting the associated slider
to the desired position. The experiment was performed by the listeners who had participated
in the attribute elicitation as well as a small team of new participants in order to
ensure that the attributes could be used and understood outside of the original panel.
A principal component analysis (PCA) (Næs et al. 2010, 209–226) was performed to
assess the relationships between the four attributes. In PCA, orthogonal vectors that
explain the maximum variance are consecutively extracted from the attribute rating
data. The attributes (and ratings) can then be plotted in the new lower-dimensional
space to easily allow interpretation of the relationship between the attributes as well as
the relationship between attributes and ratings. The PCA solution is plotted in Figure 16.6.
The vectors show the correlation between each attribute and the first two principal components;
the angle and length of each vector indicates the degree to which the associated
attribute is correlated with the two visualized components. The number of dimensions
on which the original data is represented can be chosen by considering metrics such as
“variance explained” or by visual evaluation of a scree plot. In this analysis, almost all of
the variance in the data could be explained by two components, indicating that there
Figure 16.6 Principal component representation of four attributes. The vectors show the correlation between each attribute and the two principal components represented in the plot (and can therefore be different lengths depending on the strength of the relationship); the horizontal axis is component 1 (88.5% variance explained).
was considerable redundancy in the four attributes. The first component accounted
for 88.5 percent of the variance and was related to both annoyance and distraction. The
second, explaining a further 10 percent of the variance, was related to balance and blend.
The attribute confusion was equally loaded onto both dimensions. There were no apparent
differences in the PCA solution between the participants who had taken part in the
whole experiment and those who only performed the rating task.
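A minimal Python sketch of this kind of analysis is given below, using scikit-learn; the ratings matrix is randomly generated stand-in data, and only the attribute names are taken from the study.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

attributes = ["annoyance", "distraction", "balance and blend", "confusion"]

# Stand-in ratings matrix: rows are stimuli, columns are the four attributes.
ratings = np.random.default_rng(0).uniform(0, 100, size=(54, 4))

# Standardize each attribute, then extract components ordered by explained variance.
standardized = StandardScaler().fit_transform(ratings)
pca = PCA(n_components=2)
scores = pca.fit_transform(standardized)

print(pca.explained_variance_ratio_)  # proportion of variance per component

# Loadings: correlation-like weights of each attribute on each component,
# analogous to the attribute vectors in a biplot such as Figure 16.6.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
for name, weights in zip(attributes, loadings):
    print(name, weights)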
Further analysis of participant agreement suggested that confusion was the least well
understood of the four attributes (i.e., the ratings exhibited least agreement between
participants), while distraction was most well understood (or at least, participants used
the scale in the same way). Consequently, distraction was selected as the attribute to
model; it was strongly related to the component that explained the vast majority of the
variance in the data, and it was well understood by the participants.
Attribute Modeling
As discussed previously, it is hugely beneficial to be able to predict the human response
in a sensory evaluation task in a quick and repeatable manner. It is therefore desirable to
develop predictive models that use measured, physical features of the sound field to
derive predictions of the human response. As described earlier, the first stage in this procedure
is determining the correct perceptual attribute to model: in this case, distraction
due to the presence of some interfering audio program was found to be most appropriate.
It is then necessary to collect a large amount of human data constituting ratings of the
attribute for different stimuli—preferably over the entire stimulus space that the model
might encounter in its target usage domain. As well as collecting subjective ratings, it is
necessary to determine the physical parameters of the stimuli that contribute to the
ratings, in order that the mathematical relationship between physical parameters and
ratings can be modeled.
In order to collect a set of ratings, a pool of one hundred audio-on-audio interference
situations was created. It was considered desirable to ensure that the training stimuli
covered a wide range of potential broadcast audio content, but also that the model training
was not biased by closely controlling a set of physical parameters prior to the feature
extraction stage. Consequently, the stimulus set was established using a random sampling
method, in which program items were taken from online radio stations at randomly
generated times (Francombe et al. 2014b). The items were loudness matched using a
perceptual model prior to being used for the construction of the test stimuli by varying
a set of parameters. The test parameters (target level, interferer level, and interferer
direction12) were not varied in a full factorial manner—they were determined at random
(within reasonable ranges). In this manner, a diverse and representative training set was
developed. Listener ratings of distraction were collected using the same methodology as
for the attribute ratings described above (a multiple stimulus presentation rating test).
Participants exhibited strong agreement in their ratings, which helped to validate selection
of the attribute distraction. The random sampling stimulus selection method was found
to produce a set of stimuli that evenly covered the full range of the perceptual scale.
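A simplified sketch of this stimulus-construction logic is shown below; the parameter ranges are illustrative, the perceptual loudness matching used in the study is replaced by a crude RMS-based gain, and interferer direction is left to the reproduction system rather than the mixing stage.

import numpy as np

rng = np.random.default_rng(1)

def rms_db(x):
    """Crude level estimate in dB, standing in for a perceptual loudness model."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def make_interference_stimulus(target, interferer):
    """Mix a target and an interfering program item with randomly drawn levels."""
    # Level-match the two items first (stand-in for perceptual loudness matching).
    interferer = interferer * 10 ** ((rms_db(target) - rms_db(interferer)) / 20)

    # Draw the test parameters at random within plausible (illustrative) ranges.
    target_gain_db = rng.uniform(-10, 0)
    interferer_gain_db = rng.uniform(-30, 0)

    n = min(len(target), len(interferer))
    mix = (target[:n] * 10 ** (target_gain_db / 20)
           + interferer[:n] * 10 ** (interferer_gain_db / 20))
    return mix, {"target_db": target_gain_db, "interferer_db": interferer_gain_db}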
The next challenge was extraction of relevant physical parameters from the stimuli.
The range of features that it is possible to extract from audio recordings is multifarious;
therefore, selecting the correct features is a crucial and difficult task in any modeling
process. To aid with this procedure, participants were asked to write down reasons that
they had for finding the audio-on-audio situations distracting; the written response data
was analyzed using a form of verbal protocol analysis (Ericsson and Simon 1993, 1–62) to
generate a set of categories, which was then used to motivate the search for features.
Audio features were extracted using a variety of freely available toolboxes to produce
a set of 399 features. The categories and extracted features are described by Francombe
and colleagues (2015b).
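The study drew on several existing toolboxes to arrive at its 399 features; purely as an indicative sketch, a handful of commonly used summary features can be extracted with the librosa library as follows (the feature choice here is illustrative, not the set used in the study).

import numpy as np
import librosa

def basic_features(path):
    """Extract a small set of summary audio features from one stimulus file."""
    y, sr = librosa.load(path, sr=None, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # timbral envelope
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # brightness
    rms = librosa.feature.rms(y=y)                            # frame-level level
    # Summarize frame-wise features as means and standard deviations.
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [centroid.mean(), centroid.std(), rms.mean(), rms.std()],
    ])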
After the creation of a large feature set, the next challenge is model fitting. There is a
variety of different methods of modeling data, but in this case, a simple linear regression
model was used. One of the main advantages of such a model is that it is easy to interpret
the relationship between the features and the response variable. This is not always the
case; in more complex model structures (for example, neural networks), this relationship
can be obscured. The feature selection process involves training a large number of
models and using some criteria to determine which is the best. As an exhaustive search
through a large feature set (e.g., 399 in this case) is prohibitively time consuming, it is
common to use a search algorithm; in this case, a stepwise feature addition and removal
procedure was followed.
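A compact sketch of the idea, simplified to forward-only selection with cross-validated error as the criterion (the study used an addition-and-removal procedure), might look as follows.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_stepwise(X, y, max_features=5, cv=5):
    """Greedily add the feature that most reduces cross-validated RMSE."""
    remaining = list(range(X.shape[1]))
    selected, best_rmse = [], np.inf
    while remaining and len(selected) < max_features:
        candidates = []
        for f in remaining:
            cols = selected + [f]
            mse = -cross_val_score(LinearRegression(), X[:, cols], y,
                                   scoring="neg_mean_squared_error", cv=cv).mean()
            candidates.append((np.sqrt(mse), f))
        rmse, f = min(candidates)
        if rmse >= best_rmse:  # stop when no candidate improves the model
            break
        selected.append(f)
        remaining.remove(f)
        best_rmse = rmse
    return selected, best_rmse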
One of the primary concerns for a predictive model is its generalizability; that is,
the model should be able to make predictions for situations outside of those on which
it was trained. As the number of features in a regression model tends toward the number
of data points, it becomes possible to mathematically account for all of the variance
in the data. However, this is not beneficial, as it is very unlikely that the model
will be able to make an accurate prediction for a new data point that falls outside of the
training data set. This problem is known as overfitting. It is far better for the model to
have some error, but for the features to accurately describe a physical phenomenon
and therefore to generalize to new situations, than for the model to very accurately
predict the training set but fail under new circumstances. It is therefore desirable to
minimize the number of features in the final model, while still including all features
that describe physical processes that determine the human response. A further consideration
when selecting features for a linear regression model is the relationship between
the predictors. The linear regression model works under the assumption that the features
do not correlate highly with each other (this is known as multicollinearity
between features).
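One common way to check for such multicollinearity is the variance inflation factor; a minimal sketch using statsmodels is given below, assuming the candidate features are held in a pandas DataFrame.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(features: pd.DataFrame) -> pd.Series:
    """Variance inflation factor per feature; values above roughly 5-10
    are often taken to indicate problematic collinearity between predictors."""
    X = sm.add_constant(features)
    vifs = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
    return pd.Series(vifs, index=features.columns)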
There are two primary metrics that describe the performance of a model. Goodness-of-fit
is primarily measured using root-mean-square error (RMSE)—this quantifies
the difference between the measured subjective response and the model prediction.
The amount of variance explained by the model is measured by the coefficient of
determination, R2. Both metrics can be altered to reduce the chance of overfitting. Cross-validation
can be used to estimate the performance of the model on data points outside
of the training set. In cross-validation, a number of data points are withheld from the
training set, but used for testing (e.g., calculating the RMSE). This process can be
repeated for multiple groups of “holdout” data. The R2 statistic can be adjusted in such a
way that models with a higher number of features are penalized.
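Both metrics, and the adjustment for the number of features, can be written out directly; in the sketch below, y_true and y_pred are held-out ratings and model predictions, and p is the number of features in the model.

import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observed ratings and model predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def adjusted_r2(y_true, y_pred, p):
    """Coefficient of determination penalized for the number of predictors p."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)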
These adjusted statistics were used to ensure that the features selected were generalizable
as well as providing an accurate fit. The final model included five features:
overall loudness; target-to-interferer ratio; interference-related perceptual score from
the “Perceptual Evaluation methods for Audio Source Separation” (PEASS) toolbox
(Emiya et al. 2011); high-frequency level range of the interferer; and percentage of
temporal windows with low target-to-interferer ratio. The model exhibited an RMSE
of approximately 10 percent on the training set and explained 88 percent of the variance
in the data.
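Two of these features can be illustrated directly. The sketch below computes an overall target-to-interferer ratio and the percentage of short temporal windows with a low ratio; the frame length and threshold are illustrative choices, not the values used in the published model.

import numpy as np

def tir_features(target, interferer, fs, frame_s=0.05, low_tir_db=0.0):
    """Overall and frame-wise target-to-interferer ratio (TIR) features."""
    frame = int(frame_s * fs)
    n = (min(len(target), len(interferer)) // frame) * frame
    t_frames = target[:n].reshape(-1, frame)
    i_frames = interferer[:n].reshape(-1, frame)
    tir_db = 10 * np.log10((np.mean(t_frames ** 2, axis=1) + 1e-12)
                           / (np.mean(i_frames ** 2, axis=1) + 1e-12))
    overall = 10 * np.log10((np.mean(target[:n] ** 2) + 1e-12)
                            / (np.mean(interferer[:n] ** 2) + 1e-12))
    return {
        "overall_tir_db": float(overall),
        "pct_low_tir_frames": float(np.mean(tir_db < low_tir_db) * 100),
    }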
Regardless of how well a model fits the training data, success or failure can only really
be assessed through validation on a new dataset, that is, on data points for which subjective
responses are available but were not used to train the model. In this manner, the
generalizability and accuracy of the model can truly be tested. Two validation data sets
were used to test the POSZ distraction model. The first used ratings from stimuli collected
using the same procedure as that used for the training set data collection (but
were not included during the model training). The second validation set used stimuli
collected for a previous experiment, which were consequently different in some regards
(the program items were longer and some exhibited different conditions such as filtering
or the presence of simulated road noise). The RMSE increased from 10 percent to
approximately 12 percent and 16 percent for the two datasets respectively; the explained
variance (indicated by R2) decreased from 88 percent to 82 percent and 78 percent
respectively. This relatively modest reduction in performance suggested that the final
model was generalizable to a range of audio-on-audio interference situations with music
program material.
Discussion
The procedure just described was designed to ensure that a robust model of a relevant
facet of listener experience for a relatively new and unknown listening situation could
be created. The model was shown to perform well for training and validation datasets; it
has since been tested in a number of situations and found to perform very successfully,
with error remaining at approximately 10 percent (Rämö et al. 2016).
It is hoped that having an accurate model will enable quick, perceptually relevant
evaluation of personal sound zones. Some efforts have also been made to use the model to
optimize a sound zone generation system by selecting optimally positioned loudspeakers
(Francombe et al. 2013).
We believe that one of the primary reasons for the success of the model was the comprehensive
attribute elicitation experiment, which ensured that the correct facet of the
listening experience was being modeled. It was consistently found that the attribute
distraction produced strong agreement between participants; this is invaluable when
collecting training data. There are numerous mathematical modeling methods, feature
selection tricks, and so on; however, it is often the quality of the subjective training data
that is most important when developing such a model.
The elicitation procedure described drew heavily on some well-established ideas
within the literature but also introduced some novel aspects. It has been widely stated
(e.g., Lawless and Heymann 1998) that descriptive attributes should be developed
by trained participants while hedonic judgments (e.g., preference) should be made
by untrained participants. For the task of investigating the experience of a listener in
a personal sound zone system, we felt that it was desirable to perform the elicitation
experiment with both trained and untrained listeners. While the trained listeners
tended to give better descriptions (this was reflected by the selection of trained
listener attributes where there was overlap between the two sets), there were also
unique and important attributes determined by the untrained participants (e.g.,
balance and blend, which was found to be one of the four most relevant attributes
and explained a small but notable proportion of variance in the ratings). Of course,
there are some sensory evaluation tasks that require a high degree of experience—for
example, where very small degradations or artifacts are present. However, in the
case of audio-on-audio interference in sound zones, the perspective of untrained
listeners—who will ultimately be the end users of any commercial system—was
definitely valuable.
The work described thus far has shown that perceptual models, developed using
advanced sensory science methodologies, are useful for pure research and for product
optimization. However, it is hard to see how perceptual scientists will ever complete the
task of quantifying the imagination of consumers with relation to complex auditory
scenes. The development of new and ever-more advanced signal processing methods is
unlikely to slow and, in fact, spatial audio is the topic of much current research. For
example, some recent or current large projects include the BILI project,13 the S3A: Future
Spatial Audio project,14 and the ORPHEUS project.15
Two-channel stereo reproduction has been prevalent in domestic and professional
audio replay for a number of decades. Five-channel surround sound has also seen
considerable uptake, if not quite to the same level of ubiquity as two-channel stereo.
However, there are a number of different surround sound reproduction methods available.
Channel-based methods have varying loudspeaker counts and positions (including
loudspeakers above and below the listener); a set of common loudspeaker layouts has
been standardized by the International Telecommunication Union (ITU-R 2014a).
Methods that require fewer channels and less set-up effort—such as headphones and
soundbars—are also becoming increasingly popular, particularly for domestic audio
reproduction. In the last year or so, the boom in virtual reality technology has yet again
pushed realistic spatial audio to the forefront of many research agendas. As technology
enables production of more complex and realistic experiences—even those that might
not relate to real-world situations—quantification of experience and imagination
remains of utmost importance.
Descriptive analysis experiments have been performed to try to uncover the perceptual
differences between reproduction methods—see Francombe and colleagues
(2015a) for a review of relevant literature. The resultant picture is complex; there are many
different attributes, but limited consensus on their exact meanings or on which are most
important. There has been a recent effort to consolidate the existing research in order to
produce a standardized set of terms (Pedersen and Zacharov 2015; Zacharov et al. 2016),
drawing parallels with the ubiquitous wine aroma wheel (Noble et al. 1987).
Another current research topic is the development of faster and more efficient experimental
methods; the so-called FastTrack or RaPID methods (see Delarue et al. 2016;
Moulin et al. 2016). The purpose is to increase the efficiency of the experimental effort
while maintaining the statistical quality of the data. This is especially important for
industrial applications, but also in academia for pilot experiments.
The research area described and exemplified in this chapter represents another step
in the development of sound reproduction techniques that will allow the listener to
imagine the sound event as intended by the creator/artist anywhere and anytime.
However, it is an ongoing challenge for evaluation methods and models to keep up
with the development of new sound reproduction and processing technologies.
We feel that the benefits of in-depth perceptual understanding and optimization make
this a worthwhile effort.
Notes
1. The audio industry is here defined to include researchers working at universities in areas
such as signal processing, electroacoustics, psychoacoustics, and psychology. The area also
includes developers working in companies producing products for recording, storing,
transmitting, and rendering sound. The “products” include new principles and algorithms,
as well as systems for recording, encoding, transmitting, decoding, and rendering sound
in the consumer’s home.
2. The following terminology, based on Dorsch (2016), is used in this chapter. A listener is
exposed to a sound field in an environment and a perception or percept is created after the
transformation of physical energy to neural information by the auditory system. The percept
results in an auditory impression or auditory experience. Based on the auditory impression,
one or more auditory images are created. The reader is referred to Dorsch (2016) and other
chapters in the handbook for a further discussion of imagination.
3. “Ringing” refers to added oscillations of an electrical or acoustic signal that were not
present in the original signal. The audible consequences can be that the signal continues
when it should have stopped; this is most noticeable on transient signals such as drums.
4. “Squared clouds” refers to a visible artifact in images where clouds have squared edges
instead of smooth edges as in nature. This is typically caused by a limited resolution in the
bit stream or loss of information during transmission of the signal.
5. http://www.s3a-spatialaudio.org/. Accessed October 5, 2017.
6. These auditory properties can be measured accurately using a range of psychophysical
procedures; however, it is outside the scope of this chapter to discuss them in further
detail. The reader is referred to Gescheider (2015).
7. The reader should note that the rating will reflect the assessor’s sensitivity to the attribute in
question plus a general component reflecting the so-called bias, which is a measure of an assessor’s
tendency to respond that a stimulus is present compared to not present. These two components
can be separated using signal detection theory; see, for example, Gescheider (2015).
8. It is noted that several of the elicited words and the corresponding ratings could be representative
of the same attribute; however, such multicollinearity is identified and resolved
during the statistical analysis.
9. An example could be if the term “quality” is a part of the definition of the word, as “quality”
is often ambiguous to assessors.
10. http://www.panelcheck.com/.
11. https://consumercheck.co/.
12. These terms refer to a sound zone setup with two or more zones. Target level represents
the level of the primary sound in zone A (in which the assessor is situated). Interferer level
represents the level of the sound in zone A caused by the interference of sound from the
other zones. Interferer direction represents the spatial direction of the interfering sound
from other zones.
13. http://www.bili-project.org.
14. http://www.s3a-spatialaudio.org/.
15. https://orpheus-audio.eu/.
References
ANSI/ASA. 2013. Acoustical Terminology. S1.1–2013. American National Standards Institute/
Acoustical Society of America.
Baykaner, K., P. Coleman, R. Mason, P. J. B. Jackson, J. Francombe, M. Olik, et al. 2015. The
Relationship between Target Quality and Interference in Sound Zones. Journal of the Audio
Engineering Society 63 (1–2): 78–89.
Bech, S. 1994. Perception of Timbre in Small Rooms: Influence of Room and Loudspeaker
Position. Journal of the Audio Engineering Society 42 (12): 999–1007.
Bech, S. 1999. Methods for Subjective Evaluation of Spatial Characteristics of Sound. In
Proceedings of the Audio Engineering Society 16th International Conference: Spatial Sound
Reproduction, 487–504. New York, NY: Audio Engineering Society.
Bech, S., M.-A. Gulbol, G. Martin, J. Ghani, and W. Ellermeier. 2005. A Listening Test System
for Automotive Audio, Part 2: Initial Verification. In Proceedings of the Audio Engineering
Society 118th Convention, 487–504. Barcelona, Spain. Convention paper 6359. New York,
NY: Audio Engineering Society.
Bech, S., R. Hamberg, M. Nijenhuis, C. Teunissen, H. Looren de Jong, P. Houben, et al. 1996.
Rapid Perceptual Image Description (RaPID) Method. In Proceedings of SPIE 2657, 17–28.
Bellingham, Washington, USA.
Bech, S., and N. Zacharov. 2006. Perceptual Audio Evaluation: Theory, Method and Application.
Chichester, UK: Wiley.
Beranek, L. L. 1962. Music, Acoustics and Architecture. New York: Wiley.
Blauert, J. 2005. Communication Acoustics. Berlin: Springer.
Conetta, R., F. Rumsey, S. Zielinski, P. J. B. Jackson, M. Dewhirst, S. Bech, et al. 2008. QESTRAL
(Part 2): Calibrating the QESTRAL Model using Listening Test Data. In Audio Engineering
Society 125th Convention. San Francisco. Convention paper 7596. New York, NY: Audio
Engineering Society.
Delarue, J., D. B. Lawlor, and D. M. Rogeaux. 2016. Rapid Sensory Profiling Techniques and
Related Methods: Applications in New Product Development and Consumer Research.
Cambridge: Woodhead.
Dewhirst, M., R. Conetta, F. Rumsey, P. J. B. Jackson, S. Zielinski, S. George, et al. 2008a.
QESTRAL (Part 4): Test Signals, Combining Metrics, and the Prediction of Overall Spatial
Quality. In Audio Engineering Society 125th Convention. San Francisco. Convention paper
7598. New York, NY: Audio Engineering Society.
Dewhirst, M., P. J. B. Jackson, R. Conetta, S. Zielinski, F. Rumsey, D. Meares, et al. 2008b.
QESTRAL (Part 3): System and Metrics for Spatial Quality Prediction. In Audio Engineering
Society 125th Convention. San Francisco. Convention paper 7597. New York, NY: Audio
Engineering Society.
Dorsch, F. 2016. Hume. In The Routledge Handbook of Philosophy of Imagination, edited by
A. Kind, 40–54. London: Routledge.
Emiya, V., E. Vincent, N. Harlander, and V. Hohmann. 2011. Subjective and Objective Quality
Assessment of Audio Source Separation. IEEE Transactions on Audio, Speech, and Language
Processing 19 (7): 2046–57.
Ericsson, K. A., and H. A. Simon. 1993. Protocol Analysis: Verbal Reports as Data. London:
MIT Press.
Francombe, J. 2014. Perceptual Evaluation of Audio-on-Audio Interference in a Personal
Sound Zone System. PhD thesis, Guildford, UK: University of Surrey.
ITU-R. 1997. Methods for the Subjective Assessment of Small Impairments in Audio
Systems Including Multichannel Sound Systems. Recommendation BS.1116–1. International
Telecommunication Union.
ITU-R. 2001. Method for Objective Measurements of Perceived Audio Quality. International
Telecommunication Union.
ITU-R. 2014a. Advanced Sound System for Programme Production. Recommendation
BS.2051–0. International Telecommunication Union.
ITU-R. 2014b. Methods for Assessor Screening. Recommendation BS.2300–0. International
Telecommunication Union.
ITU-R. 2015. Method for the Subjective Assessment of Intermediate Quality Levels of Coding
Systems. Recommendation BS.1534–3. International Telecommunication Union.
Kaplanis, N., S. Bech, S. Tervo, J. Pätynen, T. Lokki, T. Waterschoot, et al. 2017a. A Rapid
Sensory Analysis Method for Perceptual Assessment of Automotive Audio. Journal of the
Audio Engineering Society 65 (1–2): 1–17.
Kaplanis, N., S. Bech, S. Tervo, J. Pätynen, T. Lokki, T. Waterschoot, et al. 2017b. Perceptual
Evaluation of Car Cabin Acoustics. Journal of the Acoustical Society of America 141 (2):
1459–146.
Kjörling, K., J. Rödén, M. Wolters, J. Riedmiller, A. Biswas, P. Ekstrand, et al. 2016. AC-4: The
Next Generation Audio Codec. In Audio Engineering Society 140th Convention. Paris.
Convention paper 9491. New York, NY: Audio Engineering Society.
Lawless, H. T., and H. Heymann. 1998. Sensory Evaluation of Food: Principles and Practices.
New York: Springer.
Lorho, G., G. Le Ray, and N. Zacharov. 2010. eGauge: A Measure of Assessor Expertise in
Audio Quality Evaluations. In Audio Engineering Society 38th International Conference:
Sound Quality Evaluation, 1–10. Piteå, Sweden. New York, NY: Audio Engineering
Society.
Martin, G., and S. Bech. 2005. Attribute Identification and Quantification in Automotive
Audio, Part 1: Introduction to the Descriptive Analysis Technique. In Audio Engineering
Society 118th Convention. Barcelona. Convention paper 6360. New York, NY: Audio
Engineering Society.
Mason, R., N. Ford, F. Rumsey, and B. De Bruyn. 2001. Verbal and Nonverbal Elicitation
Techniques in the Subjective Assessment of Spatial Sound Reproduction. Journal of the
Audio Engineering Society 49 (5): 366–84.
Meilgaard, M., G. V. Civille, and B. T. Carr. 1991. Sensory Evaluation Techniques. Florida: CRC
Press.
Moulin, S., S. Bech, and T. Stegenborg-Andersen. 2016. Sensory Profiling of High-End
Loudspeakers using Rapid Methods, Part 1: Baseline Experiment using Headphone
Reproduction. In 2016 Audio Engineering Society Conference on Headphone Technology.
Aalborg, Denmark. New York, NY: Audio Engineering Society.
Murray, J. M., C. M. Delahunty, and I. A. Baxter. 2001. Descriptive Sensory Analysis: Past,
Present and Future. Food Research International 34 (6): 461–71.
Næs, T., P. Brockhoff, and O. Tomić. 2010. Statistics for Sensory and Consumer Science.
Hoboken, NJ: Wiley.
Nijenhuis, M. 1993. Sampling and Interpolation of Static Images: A Perceptual View. PhD
thesis, Institute of Perception Research, Eindhoven University of Technology, The
Netherlands.
Zacharov, N., and K. Koivuniemi. 2001. Unravelling the Perception of Spatial Sound
Reproduction: Analysis and External Preference Mapping. In Audio Engineering Society 111th
Convention. New York. Convention paper 5423. New York, NY: Audio Engineering Society.
Zacharov, N., and G. Lorho. 2005. Sensory Analysis of Sound (in Telecommunications). In
European Sensory Network Conference. Madrid, Spain: European Sensory Network.
Zacharov, N., T. Pedersen, and C. Pike. 2016. A Common Lexicon for Spatial Sound Quality
Assessment: Latest Developments. In 2016 Eighth International Conference on Quality of
Multimedia Experience (QoMEX), 1–6. Lisbon, Portugal: QoMEX.
Chapter 17
Creating a Brand Image through Music
Understanding the Psychological Mechanisms
behind Audio Branding
Hauke Egermann
Introduction
This quote was taken from a study report of a Scandinavian Music and Audio Branding
consulting agency. On the one hand, it describes the requirement to create meaningful
brands for successful marketing and, on the other, it emphasizes the multiple roles that
music is thought to play in this context: (1) music is said to create brand attention;
(2) music is said to create a positive-affective response in consumers; and (3) music can
presumably structure and influence the cognitive meaning dimensions of a brand image.
Accordingly, Jackson (2003) defines the professional practice of audio branding as the
creation of brand expressions in sound that depend on the consistent and strategic use of
these expressions in marketing communication (see also Gustafsson, volume 1, chapter 18).
These brand expressions can take various compositional forms: audio logos, which are often quite short sequences of acoustic elements; longer jingles and brand songs; background soundtracks and soundscapes; interaction sounds; and typical brand voices (Krugmann 2007).
Potential touchpoints where a consumer experiences these elements could be advertise-
ments in media such as TV, radio, websites, or cinema but also corporate films, brand
events, or customer telephone lines.
As many audio branding elements are musical in nature, they are said to shape a long-term image of a brand. But how does this shaping work? How does
music function when it influences how a consumer imagines characteristics of a brand?
This chapter will present several theoretical and empirical accounts in order to under-
stand the psychological mechanisms at work when the imagination of a brand is influ-
enced by music. It will provide insights into the underlying functionality and effectiveness
of these practices that will ultimately be summarized in an integrative brand-music
communication model.
In the consumer-based brand equity model pyramid by Keller (2009), the first step in
developing a brand is to create brand salience. The use of branding helps to create aware-
ness and attention for a product and makes it possible to differentiate one product from
another similar product. When a brand has salience, an associated visual logo gains sign
qualities that refer to its product. Keller furthermore distinguishes brand performance from brand imagery, both of which result in judgments and feelings in consumers. While brand
performance is related to more functional aspects of the products (like quality, price,
service, or reliability), brand imagery is instead based on associative qualities like the
brand identity. If a brand has performance and imagery characteristics that are also
evaluated and responded to positively, the top of Keller’s pyramid is reached, which he
terms brand resonance: customers show loyalty to the brand and its product(s), and
this is accompanied by attachment, a sense of community, and engagement. Thus, estab-
lishing a brand image is thought to create a benefit to those who aim to market commercial
products and services: “According to this view, brand knowledge is not the facts
about the brand—it is all the thoughts, feelings, perceptions, images, experiences and
so on that become linked to the brand in the minds of consumers (individuals and
organizations)” (Keller 2009, 143).
But how is such a brand image created? Many authors relate it to the constant and
strategic planning and implementation of a brand identity. Accordingly, a brand image
is received and constructed by a consumer and can be seen to result from a brand identity
that was created by a sender (Kapferer 2012).
Brand identities share several similarities with the identities of human individuals
and social groups (Azoulay and Kapferer 2003). In this view, brand identities are con-
structed through human expressions, which has led some authors to the conclusion that
consumers choose brands like they choose friends. Azoulay and Kapferer note, “human
individuals are perceived through their behaviour, and, in exactly the same way, con-
sumers can attribute a personality to a brand according to its perceived communication
and ‘behaviours’ ” (2003, 149). Furthermore, Aaker and colleagues (1995) report that consumers might even view brands as their partners. Therefore, in general, brands could have as many
characteristics as humans have. However, in consumer research and marketing practice,
several attributes have received more attention than others and hence seem to be the
most important: brand personality, brand values, and brand demographical-regional origin
(see also Burmann et al. 2003).
Brand personality and values have been described through several theoretical models.
In psychology, personality is often generally described as a construct that allows us to
explain individual differences in behavior, thought, and feelings that are stable and
coherent in humans (Mischel et al. 2004). It is often broken down into five different
facets consisting of: (1) openness to experience, (2) conscientiousness, (3) extraversion,
(4) agreeableness, and (5) neuroticism (also called the Five-Factor model, see Digman
1990). One widely used conceptualization of brand personality is that of Aaker (1997),
who describes it as a set of all human characteristics that can be associated with a brand.
These consist of the following five dimensions: sincerity, excitement, competence,
sophistication, and ruggedness (see Table 17.1). While it can be discussed whether all
these attributes can be considered personality features in a narrow sense (and they show only partial similarity to the aforementioned Five-Factor model from psychology), it is obvious that the same words could be used to describe humans. Furthermore, this model is used in various marketing contexts, and it has been empirically shown that communicating brand personality characteristics creates unique, congruent, and stronger
brand associations in consumers (Freling and Forbes 2005).
According to Schwartz (1992), there is a limited and fixed set of general, universal
human value types. These are based on universal human needs that manifest themselves
in behavioral orientations. Accordingly, “Values (1) are concepts or beliefs, (2) pertain to
Table 17.1 Aaker’s Brand Personality Dimensions and Attributes (Aaker 1997): Sincerity, Excitement, Competence, Sophistication, Ruggedness.
desirable end states or behaviors, (3) transcend specific situations, (4) guide selection
or evaluation of behavior and events, and (5) are ordered by relative importance”
(Schwartz 1992, 4). Schwartz presented ten motivational types that can be used to group
values: universalism, benevolence, tradition, conformity, security, power, achievement,
hedonism, stimulation, and self-direction. Furthermore, he showed that this structure
of universal value types was found across different cultures. This list of value types has
subsequently been adapted to branding contexts, where some value types were found
not to be applicable (e.g., universalism, conformity, security) and others were added
(like aesthetics, ecology, or health; see Gaus et al. 2010). Allen (2002) showed that brands
that endorse human values that match those of consumers are preferred because of the
perceived product similarity to the consumers’ self-concepts. Accordingly, brands can
be used by consumers to express their self-identities.
Demographic-regional origin generally refers to the regional localization and
demographic context of a brand (Thakor and Kohli 1996). Different products and
product types are associated with different countries (e.g., alcoholic drinks like vodka
with Russia or whisky with Scotland) that evoke certain associative meaning patterns.
Furthermore, brand identities can also refer to certain demographic characteristics like
age, gender, or social status (Batra et al. 1993).
Brand Salience
The ability to identify and localize objects is an important function of our auditory
perception system. Changes in auditory streams have been shown to lead to an increase
in attention allocation that is accompanied by a short activation of the peripheral nervous
system (the so-called orienting response; see Chuen et al. 2016). These findings imply
that dynamic music and sounds employed in branding lead to an increased awareness of
a brand. For instance, playing music at a point of sale might direct customers’ attentional
foci to the location of the sound source. The concept of musical fit might also play an
important role in directing attention. According to the congruence-associations frame-
work presented by Cohen (2001), music that is presented together with a visual narrative
will influence how the narrative is perceived. While this theory was originally developed
to explain the effects of music in film, it could also be applied to advertising. It was
shown that those aspects of a visual narrative that are congruent to the music will
likely be in the focus of a perceiver’s attention (Marshall and Cohen 1988). Furthermore,
the associative-emotional meaning of the music will then be attributed to this focus of
visual attention. Thus, presenting music in an audiovisual commercial that structurally
or semantically fits a visually presented brand identity will lead to an increased attention
for the brand.
Like visual logos, audio logos help consumers memorize and identify a brand. The constant
presentation of musical elements together with a product can lead to a long-term memory
representation that enables brand recognition and recall. According to Keller (2009),
brand recognition refers to a situation in which a consumer is able to confirm prior
exposure to a brand when presented with a related brand cue (e.g., a visual logo). Brand
recall describes a situation where a consumer recalls a brand when only a product cate-
gory is primed. In audio branding, a consumer learns to associate musical/acoustical
elements (audio logo) with a brand, and subsequent exposure to the logo will activate
the mental representation for the brand and product. In a telephone survey that fol-
lowed the presentation of a nine-month automobile advertising campaign, Stewart and
colleagues (1990) observed that 83 percent of respondents recalled seeing the advertisement
when presented with a short musical excerpt that was used in the advert, whereas only
62 percent remembered seeing the advert when presented with the product name. Thus,
the musical cue was more sensitive than the verbal cue and resulted in stronger activation of the mental network that represents the brand and its advert (brand recall). Audio logos
are almost always quite short, making them easy to memorize, and often use the melodic
elements of pitch and rhythm. Employing these musical features, audio logos can be
presented with varying timbres while preserving their original identity, which then
enables brand recognition. Related to this, a study by Bonde and Hansen (2013) implies
that pitch information is more perceptually relevant than rhythm information in audio
logo recognition. In a statistical analysis of musical features of radio station jingles and audio logos, we found that they were on average four notes long (range 3–9 notes), which is likely to fall within the capacity of short-term memory (Muellensiefen et al. 2015).
Taken together, previous research indicates that musical elements and sounds
presented together with brands are able to create brand awareness and brand memora-
bility. In this way, they contribute to brand salience, especially when the music used fits
(visual) brand qualities.
partial similarities between musical expressions and walking sounds (Giordano et al.
2014). Hearing action sounds can lead to an understanding of associated actions through
activations of mirror neurons (Kohler et al. 2002). This coupling is thought to be based
on Hebbian learning that may have the capacity to bind perceptions, actions, and emo-
tional expressions together (Keysers and Gazzola 2009). This perspective on emotion
emphasizes the importance of the behavioral response component of emotion as described
by Scherer (2005). Accordingly, the main function of emotional responding can be
described as coordinating approach and avoidance behavior. Thus, motion and emotion
are strongly linked. Expressing and recognizing emotion through movement sounds
seems to be a general human capacity that could also apply to emotion expression and
recognition in music. This leads to the hypothesis that music might sound emotional
to us because it sounds like someone is moving in an emotionally expressive way.
Expressive movement characteristics of music that are presented together with a brand
might influence how a listener will perceive the identity of a brand. Yet, how does
expressing and recognizing emotion in music lead to the induction of an emotional
response in a listener? The following section presents several theoretical and empirical
accounts that try to explain why music creates emotions that we attribute to ourselves.
Västfjäll 2008). These experiences are thought to be based on the nonverbal mapping
between features of the musical structures and image schemata (Lakoff 1987; Johnson
1987; for a use of visual imagery in music therapy, see also Bonde, this volume, chapter 21).
This mechanism has yet to be studied experimentally to show that such mental images are the cause of an emotional response to music. The only study I have found that explicitly states that it investigates this mechanism was published by Vuoskoski and Eerola (2013). The authors reported that a sad narrative, read before listening to a piece of music, intensified the sadness experienced by participants.
It was then concluded that, during listening, visual images of that narrative were experi-
enced by participants. However, in contrast to what was originally stated by Juslin
and Västfjäll (2008), in this case, it was not the music that brought up emotional images,
but the narrative.
In the process of socialization, music listeners use music as a tool for social identity
formation. During social bonding processes, music preferences are often topics of
conversations (Rentfrow and Gosling 2006). The more similar music preference profiles
for two people are, the more likely these two people will bond (Boer et al. 2011). Here,
musical genres are especially associated with certain human characteristics. According
to North and Hargreaves (1999), adolescents use music as a “badge” for their social iden-
tity that communicates something about their self-concepts (see also Lamont, volume 1,
chapter 12). For example, listening to indie, classical, or pop music is associated with
several typical personal qualities and attributes. The study of North and Hargreaves has
stimulated several other investigations into the stereotypical knowledge structures that
are associated with fans and performers of different music genres (Table 17.2). Here, it
was shown that these people were usually linked to certain demographics (e.g., age,
education, sex), values, personality traits, ethnicities, clothing styles, and various other
personal qualities (e.g., attractiveness, trustworthiness, or friendliness).
While music genres seem to be socially constructed phenomena, they can also be
described as cognitive musical schemata (Huron 2006). Genres consist of typical melodic,
rhythmic, and harmonic features and instrumental arrangements. Therefore, employing
these particular musical features in branding contexts will elicit particular genre-relevant
associations in listeners. Fischer (2009) showed, for example, that the same melodic
fragment presented on different instruments led to different typical value associations.
Tradition was positively related to the melody being performed on an accordion, an
oboe, and a violin, and negatively to a synthesizer and guitar. On the other hand, hedonism
was associated with a guitar and a synthesizer but not an oboe, violin, or accordion.
Furthermore, trumpets and violins were highly associated with power.
Table 17.2 Stereotypical associations with fans and performers of different music genres, covering demographics, values, personality, ethnicity, clothing, intellect/expertise, and other personal qualities.
In Egermann and Stiegler (2014) we showed that traditional instrumental pieces from
different European countries are more or less correctly associated with their country of
origin in an online listening test. While participants were not able to correctly identify
music from northern Italy or Sweden, Spanish flamenco music was correctly identified
by nearly all participants in a recognition paradigm (where participants were given the
names of different European countries to choose from). In a free-recall version of the study, where participants were asked to list all music-evoked words, again around 85 percent of participants reported an association with Spain. In a second part
of this study, we showed that some music excerpts that were chosen to represent music
styles that were popular in different decades of the twentieth century were able to induce
correct time/decade associations in the listeners. Here, we observed that especially those
styles that were popular during the participant’s adolescent years were most effective.
Taken together, these results indicate that music can activate shared meaning structures
that could be used for communication purposes (see also Shevy 2008). However, the
success of these measures depends on the similarity of interindividual, extra-musical association networks and on how strongly the associations between music and other features have been learned (as exemplified by the lower recognition rates for some countries and decades).
Thus, when creating or selecting music to communicate specific, extra-musical meaning,
as done in audio branding practice, a detailed knowledge about listeners seems to be just
as crucial as the design of the stimuli themselves.
An Integrated Brand-Music
Communication Model
The theoretical and empirical accounts reported above can be summarized in the
following hypothetical model (see Figure 17.1). It presents a simplified communication
process, where a company aims to create a brand image in its customers by expressing
its brand identity using music. Here, three different functions of music are identified.
Music is thought to create salience by attracting attention and establishing an additional memory representation for the brand (1. Brand Salience). Furthermore, through shared knowledge about cognitive human attributes related to certain musical characteristics, music communicates brand values, brand personality, and many other concepts (2. Cognitive Meaning). The characteristics associated with the social group behind a given music genre (its performers and listeners) are used here as a tool to elicit relevant social associations when music is chosen or produced. When brand identities
try to form brand images through music, consumers will process its social-referential
meaning with the same mental social capacity as the one they usually employ for person
perception. Furthermore, in addition to being able to express emotions that are recog-
nized by a listener (again, probably due to its similarities with typically human expressive
sounds), music is also able to evoke and induce emotion (3. Emotional Meaning). Through
Figure 17.1 Communication process: salience, emotional meaning, and cognitive meaning.
conditioning, music might act as an unconditioned stimulus that projects its emotional and cognitive meaning onto a brand that initially carries no meaning of its own. All three functions
(providing salience and cognitive and emotional meaning) are improved when the
human attributes evoked by brands and music are semantically similar and “fit” (North
and Hargreaves 2008).
While many of the reported relationships have been studied separately, there are still
no studies that test the entire communication process from the conception of a brand
identity to the achievement of a brand image in a consumer through the use of music.
In many studies, music was chosen that had certain qualities that are relevant in this
context (being salient, emotional, or associated with cognitive concepts). Nevertheless,
few studies have focused on studying the emergence of these qualities in a branding con-
text. Therefore, this model remains speculative in that its components have not been
tested in their independent functionality. However, the anecdotal evidence reported by
audio branding practitioners (Lusensky 2008), who in their daily work influence how
consumers imagine brands, is quite striking.
References
Aaker, J. L. 1997. Dimensions of Brand Personality. Journal of Marketing Research
34 (3): 347–356.
Aaker, J. L., S. Fournier, D. E. Allen, and J. Olson. 1995. A Brand as a Character, a Partner and
a Person: Three Perspectives on the Question of Brand Personality. Advances in Consumer
Research 22: 391–395.
Allen, M. 2002. Human Values and Product Symbolism: Do Consumers Form Product
Preference by Comparing the Human Values Symbolized by a Product to the Human
Values That They Endorse?. Journal of Applied Social Psychology 32 (12): 2475–2501.
Fritz, T., S. Jentschke, N. Gosselin, D. Sammler, I. Peretz, R. Turner, A. D. Friederici, et al. 2009. Universal
Recognition of Three Basic Emotions in Music. Current Biology 19 (7): 573–576. http://doi.
org/10.1016/j.cub.2009.02.058.
Gabrielsson, A. 2002. Emotion Perceived and Emotion Felt: Same or Different? Musicae
Scientiae (Special Issue 2001–2002): 123–145.
Gaus, H., S. Jahn, T. Kiessling, and J. Drengner. 2010. How to Measure Brand Values? Advances
in Consumer Research 37: 1–2.
Giordano, B. L., H. Egermann, and R. Bresin. 2014. The Production and Perception of
Emotionally Expressive Walking Sounds: Similarities between Musical Performance and
Everyday Motor Activity. PLoS One 9 (12): e115587. doi:10.1371/journal.pone.0115587.
Gorn, G. J. 1982. The Effects of Music in Advertising on Choice Behavior: A Classical
Conditioning Approach. Journal of Marketing 46 (1): 94–101.
Hirschman, E. C., and M. B. Holbrook. 1982. Hedonic Consumption: Emerging Concepts,
Methods and Propositions. Journal of Marketing 46 (3): 92–101.
Hunter, P. G., G. Schellenberg, and U. Schimmack. 2010. Feelings and Perceptions of
Happiness and Sadness Induced by Music: Similarities, Differences, and Mixed Emotions.
Psychology of Aesthetics, Creativity, and the Arts 4 (1): 47–56. http://doi.org/10.1037/
a0016873.
Huron, D. 2006. Sweet Anticipation. Cambridge, MA: MIT Press.
Jackson, D. 2003. Sonic Branding: An Essential Guide to the Art and Science of Sonic Branding.
New York: Palgrave Macmillan.
Janata, P., S. T. Tomic, and S. K. Rakowski. 2007. Characterisation of Music-Evoked Autobiographical Memories. Memory 15 (8): 845–860. http://doi.org/10.1080/09658210701734593.
Johnson, M. 1987. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason.
Chicago: University of Chicago.
Juslin, P. N., G. Barradas, and T. Eerola. 2015. From Sound to Significance: Exploring the
Mechanisms Underlying Emotional Reactions to Music. American Journal of Psychology
128 (3): 281–304.
Juslin, P. N., J. Karlsson, E. Lindström, A. Friberg, and E. Schoonderwaldt. 2006. Play It Again
with Feeling: Computer Feedback in Musical Communication of Emotions. Journal of
Experimental Psychology: Applied 12: 79–95. doi:10.1037/1076-898X.12.2.79.
Juslin, P. N., and P. Laukka. 2003. Communication of Emotions in Vocal Expression and
Music Performance: Different Channels, Same Code? Psychological Bulletin 129: 770–814.
Juslin, P. N., S. Liljeström, D. Västfjäll, and L.-O. Lundqvist. 2010. How Does Music Evoke
Emotions? Exploring the Underlying Mechanisms. In Handbook of Music and Emotion:
Theory, Research, Applications, edited by P. N. Juslin and J. A. Sloboda, 605–643. Oxford:
Oxford University Press. http://doi.org/10.1093/acprof:oso/9780199230143.003.0022.
Juslin, P. N., and D. Västfjäll. 2008. Emotional Responses to Music: The Need to Consider
Underlying Mechanisms. Behavioral and Brain Sciences 31 (5): 559–575; discussion 575–621.
http://doi.org/10.1017/S0140525X08005293.
Kallinen, K., and N. Ravaja. 2006. Emotion Perceived and Emotion Felt: Same and Different.
Musicae Scientiae 10 (2): 191–213.
Kapferer, J. 2012. The New Strategic Brand Management: Advanced Insights and Strategic
Thinking (5th ed.). London: Kogan Page.
Keller, K. L. 2009. Building Strong Brands in a Modern Marketing Communications
Environment. Journal of Marketing Communications 15 (2–3): 139–155. http://doi.org/
10.1080/13527260902757530.
Keysers, C., and V. Gazzola. 2009. Expanding the Mirror: Vicarious Activity for Actions,
Emotions, and Sensations. Current Opinion in Neurobiology 19: 666–671. doi:10.1016/j.conb.
2009.10.006.
Khalfa, S., M. Roy, P. Rainville, S. Dalla Bella, and I. Peretz. 2008. Role of Tempo Entrainment
in Psychophysiological Differentiation of Happy and Sad Music? International Journal of
Psychophysiology 68 (1): 17–26. http://doi.org/10.1016/j.ijpsycho.2007.12.001.
Koelsch, S. 2011. Towards a Neural Basis of Processing Musical Semantics. Physics of Life
Reviews 8 (2): 89–105. http://doi.org/10.1016/j.plrev.2011.04.004.
Kohler, E., C. Keysers, M. A. Umiltà, L. Fogassi, V. Gallese, and G. Rizzolatti. 2002. Hearing
Sounds, Understanding Actions: Action Representation in Mirror Neurons. Science 297
(5582): 846–848. http://doi.org/10.1126/science.1070311.
Kristen, S., and M. Shevy. 2013. A Comparison of German and American Listeners’ Extra
Musical Associations with Popular Music Genres. Psychology of Music 41 (6): 764–778.
http://doi.org/10.1177/0305735612451785.
Krugmann, D. 2007. Integration akustischer Reize in die identitätsbasierte Markenführung.
LiM-Arbeitspapiere No. 27. Bremen, Germany.
Labbe, C., and D. Grandjean. 2014. Musical Emotions Predicted by Feelings of Entrainment.
Music Perception 32 (2): 170–185. http://doi.org/10.1525/mp.2014.32.2.170.
Lakoff, G. 1987. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind.
Chicago: University of Chicago Press.
Lamont, A., and A. Greasley. 2012. Musical Preferences. Oxford Handbooks Online. http://
www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199298457.001.0001/oxfordhb-
9780199298457-e-015. Accessed April 7, 2017.
Lantos, G. P., and L. G. Craton. 2012. A Model of Consumer Response to Advertising Music.
Journal of Consumer Marketing 29 (1): 22–42. http://doi.org/10.1108/07363761211193028.
Lusensky, J. 2008. Sounds Like Branding. Heartbeats International. http://www.soundslikebranding.com/pdf/slb_digital.pdf. Accessed May 7, 2016.
MacInnis, D. J., and C. W. Park. 1991. The Differential Role of Characteristics of Music on
High- and Low-involvement Consumers’ Processing of Ads. Journal of Consumer Research
18: 161–173.
Marshall, S. K., and A. J. Cohen. 1988. Effects of Musical Soundtracks on Attitudes toward
Animated Geometric Figures. Music Perception 6 (1): 95–112.
Mischel, W., Y. Shoda, and O. Ayduk. 2004. Introduction to Personality: Toward an Integration.
New York: John Wiley & Sons.
Muellensiefen, D., H. Egermann, and S. Burrows. 2015. Radio Station Jingles: How Statistical
Learning Applies to a Special Genre of Audio Logos. In Audio Branding Yearbook 2014–2015,
edited by K. Bronner, R. Hirt, and C. Ringe, 53–72. Baden-Baden, Germany: Nomos.
Nagel, F., R. Kopiez, and O. Grewe. 2008. Psychoacoustical Correlates of Musically Induced
Chills. Musicae Scientiae 12 (1): 101–113.
North, A., and D. J. Hargreaves. 1995. Subjective Complexity, Familiarity, and Liking for
Popular Music. Psychomusicology 14: 77–93.
North, A., and D. Hargreaves. 1999. Music and Adolescent Identity. Music Education Research
1 (1): 75–92. http://doi.org/10.1080/1461380990010107.
North, A. C., and D. J. Hargreaves. 2008. The Social and Applied Psychology of Music. Oxford:
Oxford University Press.
Pearce, M. T., and G. A. Wiggins. 2006. Expectation in Melody: The Influence of Context and
Learning. Music Perception 23 (5): 377–405. http://doi.org/10.1525/mp.2006.23.5.377.
Peirce, C. S. 1994. Elements of Logic. In The Collected Papers of Charles Sanders Peirce. Electronic
Edition, Vol. 2, edited by C. Hartshorne and P. Weiss. Charlottesville, VA: InteLex Corp.
Petty, R. E., J. T. Cacioppo, and D. T. Schumann. 1983. Central and Peripheral Routes to
Advertising Effectiveness: The Moderating Effect of Involvement, Journal of Consumer
Research 10: 135–146.
Rentfrow, P. J., and S. D. Gosling. 2006. Message in a Ballad: The Role of Music Preferences in
Interpersonal Perception. Psychological Science 17 (3): 236–242.
Rentfrow, P. J., and S. D. Gosling. 2007. The Content and Validity of Music-Genre Stereotypes
among College Students. Psychology of Music 35 (2): 306–326.
Rentfrow, P. J., J. A. Mcdonald, and J. A. Oldmeadow. 2009. You Are What You Listen To:
Young People’s Stereotypes about Music Fans. Group Processes and Intergroup Relations
12 (3): 329–344. http://doi.org/10.1177/1368430209102845.
Scherer, K. 1999. Appraisal Theory. In Handbook of Cognition and Emotion, edited by
T. Dalgleish and M. Power, 637–663. Chichester, UK: Wiley.
Scherer, K. R. 2005. What Are Emotions? And How Can They Be Measured? Social Science
Information 44 (4): 695–729.
Scherer, K. R., and M. R. Zentner. 2001. Emotional Effects of Music: Production Rules. In
Music and Emotion: Theory and Research, edited by P. N. Juslin and J. A. Sloboda, 361–392.
Oxford: Oxford University Press.
Scherer, K. R., and E. Coutinho. 2013. How Music Creates Emotion: A Multifactorial Process
Approach. In The Emotional Power of Music Multidisciplinary Perspectives on Musical
Arousal, Expression, and Social Control, edited by T. Cochrane, B. Fantini, and K. R. Scherer.
Oxford: Oxford University Press.
Schwartz, S. H. 1992. Universals in the Content and Structure of Values: Theoretical Advances
and Empirical Tests in 20 Countries. In Advances in Experimental Social Psychology, Vol. 25,
edited by M. Zanna, 1–65. Orlando, FL: Academic Press.
Shevy, M. 2008. Music Genre as Cognitive Schema: Extramusical Associations with
Country and Hip-Hop Music. Psychology of Music 36 (4): 477–498. http://doi.org/10.1177/
0305735608089384.
Steinbeis, N., S. Koelsch, and J. A. Sloboda. 2006. The Role of Harmonic Expectancy Violations
in Musical Emotions: Evidence from Subjective, Physiological, and Neural Responses.
Journal of Cognitive Neuroscience 18 (8): 1380–1393.
Stewart, D. W., K. M. Farmer, and C. I. Stannard. 1990. Music as a Recognition Cue in
Advertising-Tracking Studies. Journal of Advertising Research 30 (4): 39–48.
Thakor, M. V., and C. S. Kohli. 1996. Brand Origin: Conceptualization and Review. Journal of
Consumer Marketing 13 (3): 27–42.
Vermeulen, I., T. Hartmann, and A.-M. Welling. 2011. The Chill Factor: Improving Ad
Responses by Employing Chill-Inducing Background Music. Proceedings of the 61st Annual
Conference of the International Communication Association (ICA), May 26–30, Boston, MA.
Vuoskoski, J. K., and T. Eerola. 2013. Extramusical Information Contributes to Emotions
Induced by Music. Psychology of Music 43 (2): 262–274. http://doi.org/10.1177/0305735613502373.
Watt, R. J., and R. L. Ash. 1998. A Psychological Investigation of Meaning in Music. Musicae
Scientiae 2 (1): 33–53. http://doi.org/10.1177/102986499800200103.
Zander, M. F. 2006. Musical Influences in Advertising: How Music Modifies First Impressions
of Product Endorsers and Brands. Psychology of Music 34 (4): 465–480. http://doi.org/
10.1177/0305735606067158.
Chapter 18
Sound and Emotion
Erkin Asutay and Daniel Västfjäll
Introduction
Auditory stimuli have a great potential to evoke emotions in people (Armony and
LeDoux 2010; Tajadura-Jiménez 2008). The auditory system scans our surrounding
environment, detects and identifies significant objects and events, and signals for attention
shifts when necessary (Juslin and Västfjäll 2008). It can also orient the visual system to
a particular region of interest (Arnott and Alain 2011). Critically, it has been shown
that the auditory system takes the behavioral state of the organism (i.e., emotional, moti-
vational, and attentional) into account while processing auditory stimuli (Weinberger
2010). On the other hand, emotions work in concert with perceptual processes. They
can guide us to establish our motivation and preferences about objects, events, and
places (Lang and Bradley 2010) and can call for rapid mobilization for action when
necessary (Frijda 2008). Here, we present evidence documenting the interplay between
auditory and emotional processes.
In this chapter, moreover, imagination is broadly taken to mean the mental representations that are induced by sounds, and we focus on the impact of these mental representations on the affective experience during sound perception. These imagined representations can be very different depending on the context, the listener’s condition, and the sound itself.
We make an overall classification of these mental representations from the perspective
of the distinction between musical listening and everyday listening (Gaver 1993). The
distinction comes from the application of an ecological approach to sound perception
(Clarke 2005; Neuhoff 2004). Imagine that you are walking by the pier and you hear a
sound. If you pay attention to the sound, you may focus on its perceptual features like
loudness, pitch, and timbre and how these features evolve in time. On the other hand,
you might just notice that you hear the sound of a passing boat, and your attention will
be on the source of the sound. The former is an example of musical listening, while the
latter exemplifies everyday listening. Note also that the distinction between everyday
and musical listening does not suggest that all musical sounds are received in musical
listening mode and vice versa.
In the following, we first start with basic properties of the auditory system, and
present a view of the auditory system as an adaptive and cognitive network that special-
izes in processing acoustic stimulus features while integrating the behavioral state of the organism into its processing. This forms the biological and behavioral basis for our main
argument that affective experience is one of the main parts of sound perception. Next, in
order to show the tight connections between auditory and affective processes, we focus
on affective responses to auditory stimuli, reviewing empirical evidence from behavioral
and neuroimaging studies. We present the subject in three different sections: responses
to learned emotional meaning of sounds, responses to vocal signals, and responses to
music. In doing so, we also attempt to make clear how these different sources of stimuli
induce affective reactions in us and how we respond to them. We also discuss how
the mental representations evoked by sounds influence affective reactions to auditory
stimuli. Then, we present evidence on how the affective significance of sounds can
influence perception and attention. Finally, we will bring all this together and underline
the main argument of this chapter that the affective experience is an integral part of
sound perception.
Sound perception is a fundamental part of our interactions with and experience of the
external environment. We receive a continuous flow of auditory stimulation from our
surroundings, and the auditory system makes sense of this input. It has been suggested
that the auditory system has evolved as an alarm system that scans our surroundings,
detects salient events in it, and signals for attention shifts to prioritized targets (Juslin
and Västfjäll 2008).
The reception of sound starts at the ear, which is an organ specialized in sensing local pressure fluctuations. Sound waves travel through the ear canal and set the eardrum in
motion, which in turn sets the three bones in the middle ear vibrating. Their function is
to amplify the mechanical oscillations and transmit them to the inner ear. These oscil-
lations travel through the fluid in the cochlear canals and set the basilar membrane in
motion. The hair cells in the cochlea generate action potentials depending on the basilar
membrane motion. Hence, in this manner, acoustic signals are converted to neural sig-
nals that travel from the auditory nerve to the central nervous system. On this auditory
pathway, substantial information processing takes place in the brain stem and several
midbrain stations. The information generated in these structures is sent to the thalamus,
which is a relay station that collects signals from the periphery and passes them to the sen-
sory cortices. The primary auditory cortex (A1) is located in the superior part of the
temporal lobe of the brain, and the adjoining areas are referred to as the auditory belt
areas (Woods et al. 2009). Neurons in the A1 have higher sensitivity to acoustic stimulus
features compared to the belt areas, whereas the belt areas show a greater attentional
modulation than A1 neurons do (Woods et al. 2010). Neurons in the auditory pathway
have preferred frequency regions that they respond to; and in most of the auditory areas
there is tonotopic organization: an orderly correspondence between the location of the
neurons and their specific frequency tuning (for detailed information on the auditory
system, see Moore 2012; Rees and Palmer 2010).
with the frequency spectrum of acoustic signals. Pitch perception arises from tonality,
periodicity, and harmonicity. Hence, both the temporal and spectral aspects contribute
to pitch perception (for more detailed accounts of pitch and loudness, see Fastl and
Zwicker 2007; Moore 2012; Wang and Bendor 2010; Young 2010). Two sounds can have
both the same loudness and pitch, yet could sound completely different from one
another. To exemplify, consider two different instruments playing exactly the same tone
at the same loudness. Timbre is the perceptual quality that accounts for the differences
between the two instruments. It is a multidimensional feature; that is, it arises from vari-
ous aspects of acoustic signals (e.g., transients, relative strength of harmonics).
Auditory stimuli also provide spatial information. Localizing sound sources in space
is a computationally challenging task, since the auditory system, unlike the visual system,
seems to lack a topographical space representation. Spatial cues have to be computed from
the signals that reach the respective ears. Intensity and arrival time differences between
the respective ears provide cues for sound localization (Blauert 1997). Interaural time
difference (ITD) is the main cue for the perceived azimuth of low-frequency sounds
(below approximately 1.5 kHz; Hartmann et al. 2013), while interaural level difference
(ILD) seems to be more useful for high-frequency signals (above 2 kHz). Apart from
these binaural cues, humans also employ monaural cues to extract auditory spatial
information. Here, the auditory system makes use of the spectral modulations of the
incoming sound that are caused by the shape of the outer ear and the incoming angle of
the sound waves (Blauert 1997). Although monaural cues are highly frequency depend-
ent, they can be useful for localizing sounds in the median plane (e.g., front vs. back).
It seems that the neural processing of the auditory spatial information (ILDs and ITDs)
already starts at the brainstem level (see Ahveninen et al. 2014; Yin and Kuwada 2010).
While the role of the auditory cortex in spatial processing is not clear, recent research has led to a two-channel model (the hemifield code), in which two neuronal populations are broadly tuned to the left or the right side of auditory space (Stecker et al. 2005). According to the hemifield code, the joint activity of these two populations gives rise to azimuth perception.
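To give a rough sense of the magnitude of the ITD cue discussed above, a common spherical-head approximation (often attributed to Woodworth) can be used; this is a textbook simplification rather than a model treated in this chapter, and the head radius below is an assumed typical value. For a source at azimuth θ,

$$ \mathrm{ITD}(\theta) \approx \frac{a}{c}\,(\theta + \sin\theta), $$

where a ≈ 0.09 m is the head radius and c ≈ 343 m/s is the speed of sound. For a source directly to one side (θ = π/2), this gives roughly (0.09/343)(1.57 + 1) ≈ 0.67 ms, which is about the largest interaural delay the binaural system has to resolve.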
Research on auditory attention has indicated that the attentional modulation of the
auditory cortex could facilitate the processing of behaviorally relevant sounds (Petkov
et al. 2004). The auditory cortex shows both learning-induced (Ohl and Scheich 2005) and
attention-driven plasticity (i.e., changes in the neural responses due to factors like moti-
vation, learning, stimulus-statistics, etc.; see Ahveninen et al. 2011). It can also acquire
specific memory traces (Weinberger 2004) and adapt to the changing nature of auditory
environments (Dahmen et al. 2010). Spatial sensitivity of the auditory cortex is enhanced
by engaging auditory (Lee and Middlebrooks 2011) or visual spatial tasks (Salminen
et al. 2013). Furthermore, auditory brain stem responses can be modulated by working
memory load (Sörqvist et al. 2012) and selective attention (Lehmann and Schönwiesner 2014). Taken together, these findings indicate that the processing of auditory stimuli is
dynamic, adapts to changing environments, and is optimized to process behaviorally
significant stimuli. The adaptive capacity of the auditory system suggests that the audi-
tory cortex is not a mere acoustic analysis center. It has been argued that the auditory
cortex can integrate higher-order, nonauditory input (e.g., motivation, attention, motor
function) into its processing (Weinberger 2010). Apart from the cortex, studies on the
inferior colliculus (IC—a hub for the construction of a higher-order auditory percept) in the auditory midbrain show that neural activity in the IC is sensitive to factors such as eye movements, learning-induced plasticity, motivation, emotion, and task engagement (Bajo et al. 2010; Gruters and Groh 2012; Malmierca 2005; Marsh et al. 2002).
Furthermore, connectional analyses (mainly of the cat and the primate brain) indicate
that the auditory network shows a unique architecture with its corticocortical, thalamo-
cortical, and corticocollicular connections (Read et al. 2002; Winer and Lee 2007).
Taken together, the behavioral and functional evidence presented in this section sug-
gests that the auditory network is specialized in processing acoustic stimulus features as
its main input, and that it also makes use of information about the behavioral state of the organism during auditory processing.
Emotional Responses
to Auditory Stimuli
How does sound induce emotions? In this section, we discuss the affective experience
induced by various auditory stimuli such as environmental sounds, vocalizations,
and music. The main aim is to present the close relationship between auditory and
affective processes.
In her work on auditory-induced emotions during everyday listening, Tajadura-
Jiménez (2008; Tajadura-Jiménez and Västfjäll 2008) suggested four general contributing factors to the affective experience induced by auditory stimuli: physical, spatial,
cross-modal, and psychological. The physical factors are related to acoustical features of
sounds (such as loudness, pitch, duration, transients, etc.) causing affective reactions in
people. In basic psychoacoustic research, the effects of physical features on sound
perception are generally studied using tone and noise complexes that do not possess
semantic content or a particular sound source (Fastl and Zwicker 2007). The perceived
loudness and sharpness (i.e., high/low frequency balance) of such tone and noise com-
plexes can be related to the affective reactions they induce (Västfjäll 2012). In music, for
instance, sounds that feature dissonant, loud, sudden, or fast temporal components can
induce physiological arousal and negative affect in listeners (Juslin and Västfjäll 2008).
Auditory stimuli also provide spatial information regarding both the spaces we occupy
and objects in our surroundings (i.e., their location and motion with respect to our
bodies), and this spatial information can also possess affective quality (Asutay and
Västfjäll 2015a). Previous research has found behavioral, neural, and emotional biases
in favor of approaching sound sources compared to receding sound sources (Hsee et al.
2014; Maier and Ghazanfar 2007; Seifritz et al. 2002; Tajadura-Jiménez, Väljamäe, et al.
2010). In particular, approaching sounds are found to be more emotional and behaviorally
salient than receding sounds. The impact of sound source distance and location together
with room size on affective responses has also been studied in the context of everyday
listening (Tajadura-Jiménez, Larsson, et al. 2010). Cross-modal factors in auditory-
induced emotion are related to the role of the information that we gather from other
modalities. This happens when affective information we receive from one modality
influences processing in another (Gerdes et al. 2014). Finally, the psychological factors
that influence emotional reactions to auditory stimuli are related to specific meaning
and interpretation of a sound, and associations evoked by a sound and/or its source. In
everyday listening, these factors are related to sound source identification and semantic
content (Tajadura-Jiménez 2008).
environmental sounds (mainly related to the sound source; Asutay et al. 2012). We used
a Fourier-time-transform algorithm that performs spectral broadening to reduce the
identifiability of sounds, while preserving temporal and spectral variation. The results
indicated that emotional reactions to environmental sounds were mostly defined by the
meaning attributed to the sound source by the listener. In other words, when participants could not identify the source of a particular sound, strong affective reactions
induced by the same sound were mostly eliminated.
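To make the kind of manipulation described above more concrete, the following is a minimal Python sketch of an STFT-based spectral smearing. It is offered only as a generic illustration of how broadening the short-time spectrum (and discarding fine phase structure) can degrade source identifiability while retaining the coarse temporal envelope; it is not the Fourier-time-transform procedure used in Asutay et al. (2012), and the function name spectral_smear and all parameter values (frame length, hop size, smearing bandwidth) are arbitrary choices made for this sketch.

import numpy as np

def spectral_smear(x, frame=1024, hop=256, smear_bins=16, seed=0):
    # Broaden the short-time magnitude spectrum and randomize fine phase structure.
    rng = np.random.default_rng(seed)
    win = np.hanning(frame)
    out = np.zeros(len(x) + frame)
    norm = np.zeros(len(x) + frame)
    kernel = np.ones(smear_bins) / smear_bins              # moving average across frequency bins
    for start in range(0, len(x) - frame, hop):
        seg = x[start:start + frame] * win
        spec = np.fft.rfft(seg)
        mag = np.convolve(np.abs(spec), kernel, mode="same")  # spectral broadening
        phase = rng.uniform(-np.pi, np.pi, size=spec.shape)   # randomized phase
        seg_out = np.fft.irfft(mag * np.exp(1j * phase), n=frame) * win
        out[start:start + frame] += seg_out                   # overlap-add resynthesis
        norm[start:start + frame] += win ** 2
    return out[:len(x)] / np.maximum(norm[:len(x)], 1e-8)

# Usage: smear one second of synthetic amplitude-modulated noise at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.random.default_rng(1).standard_normal(sr) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
y = spectral_smear(x)  # same length as x; envelope preserved, source cues blurred

Whether such a generic manipulation preserves temporal and spectral variation to the degree required in a listening experiment would, of course, depend on the chosen parameters.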
Mental representations evoked by music can be qualitatively very different in com-
parison with environmental sounds. For instance, it has been suggested that music can
trigger visual imagery, that is, when the listener conjures up visual images (Juslin and
Västfjäll 2008). Visual imagery is defined as a quasiperceptual experience that resembles
an actual perceptual experience but occurs in the absence of visual stimuli. The exact
nature of how music evokes mental images remains to be determined. It seems that
listeners conceptualize the musical structure using a metaphorical nonverbal mapping
between the music and “image-schemata” that are grounded in bodily experience
(Lakoff and Johnson 1980). Visual imagery evoked by musical stimuli can be a part of
the affective experience induced by music (Juslin and Västfjäll 2008). Moreover, mental
images evoked by music can also occur in connection with memory, where certain
musical stimuli trigger a specific memory of a particular event; this process also influ-
ences affective reactions to music. Another line of somewhat relevant research comes
from auditory imagery studies where researchers have studied the nature of auditory
imagery in the absence of auditory stimulation (for detailed accounts, see Hubbard 2010;
Zatorre and Halpern 2005). Although this research is far from definitive, it has been
found that auditory imagery preserves many structural and temporal properties of
sounds and that it involves many of the same brain areas as auditory perception.
Vocal Affect
Humans and most animals use vocalizations to communicate with their conspecifics.
Vocal acoustics is valuable for communication between individuals regarding important
events that may arise in their environment, for example, presence of a predator or a food
supply. Vocalizations, together with facial expressions, are also important for inferring
the emotional state of the speaker. The ability of an individual to successfully interpret the
emotional state of the speaker can be crucial for survival in certain situations, and it is
critical for social interactions. Unlike other animals, humans also have language to rely
on in their communications. Speech signals can carry emotional information not only
through semantic content but also through intonation—that is, prosody.
Even though there is some conflicting evidence, recent brain imaging studies have
found increased amygdala activity for emotional compared to neutral vocalizations
(Fecteau et al. 2007; Sander and Scheich 2005; Sander et al. 2003; Wiethoff et al. 2009).
Other brain areas involved in emotional processing of vocal information are the temporal
(superior temporal sulcus [STS] and superior temporal gyrus [STG]) and frontal regions
(orbitofrontal cortex [OFC] and inferior frontal gyrus [IFG]). The STS has been
shown to respond to the human voice regardless of linguistic content (Belin et al. 2000).
The auditory areas along the middle and superior temporal cortex (e.g., STS and STG)
are sensitive to the emotional content in vocal signals, and their activation does not
seem to depend on attentional focus or task demands (Brück et al. 2013; Grandjean and
Frühholz 2013). On the other hand, frontal regions (e.g., OFC and IFG) seem to be
involved in emotional processing of vocal signals in a context-dependent (i.e., attentional
and task demands) fashion. Hence, models of emotional processing of vocalizations and
prosody suggest that affective processing of vocal signals takes place in regions within
the STS and STG (some models have proposed that facial expressions are also integrated
into this processing, e.g., Brück et al. 2013). The outcomes of this processing are made
accessible for higher-order cognitive processes that take place in frontal regions (Brück
et al. 2013; Grandjean and Frühholz 2013; Kotz et al. 2013; Schirmer and Kotz 2006).
Most research concerning emotional vocalizations approaches the subject from the
perspective of successful decoding of the affective state of the speaker. This is mainly due
to the understanding that the main function of vocalizations is to inform the receiver
about the speaker’s emotions. However, other researchers argue that the primary function
of vocalizations is to induce emotions in the receiver (Bachorowski and Owren 2008;
Owren and Rendall 2001; Russell et al. 2003). According to this framework (known as
the affect-induction account of vocalizations, see Bachorowski and Owren 2008), the
primary function of vocal signaling is not to inform the receiver about the speaker’s
affect, even though vocalizations usually arise from the speaker’s emotions. Listeners
can clearly make inferences regarding the affective state of speakers, but this is a secondary
outcome. However, the primary outcome is that affective vocal signals induce emotional
reactions in listeners in order to modulate their behavior, depending on the context in
which the vocalizations occur and the listener’s prior experience with such signals.
Hence, vocal signals are not merely displays of the speaker’s emotions. Instead, they are
tools of social influence. The affect-induction account began with research on the func-
tions of primate calling (Owren and Rendall 2001), later applied to specific human emo-
tional vocalizations such as laughter (Owren et al. 2013). In connection to this account,
it has been argued that infant crying has a function of increasing caregiver arousal
(Zeskind 2013). Furthermore, research conducted on tamarin monkeys suggests that this species may use emotional features in its vocalizations in order to induce arousing and calming states in receivers (Snowdon and Teie 2013).
Music
A chapter concerning emotional reactions to sound would be incomplete without
music, given its high emotional significance for humans. Music is
an indispensable part of humanity. Musical instruments are among the oldest cultural
artifacts that have been discovered. The bone flutes discovered in southern Germany
date back about 35,000 years (Conrad et al. 2009). Despite music being an ancient part of
human life, its evolutionary origins are still under debate, and this is, of course,
a very difficult question to answer with certainty. Some researchers argue that music is a
human invention that has no direct adaptational biological function. For instance, Patel
(2010) proposed that music relies on brain functions that developed for other purposes
and that music itself is not an adaptation that has shaped our species through
natural selection. According to Patel, humans employed previously acquired abilities to
invent music. On the other hand, there are adaptationist views postulating that music is
in fact an evolutionary adaptation with survival value. Among those, Charles Darwin
([1879] 2004) proposed that music evokes strong emotions and could be an antecedent
to our language capacity. Further, it has been suggested that music, with its capacity to
be an important channel for communication of emotions, could promote successful
reproduction and improve social cohesion (for a detailed discussion, see Altenmüller
et al. 2013; Patel 2010). These social functions of music (i.e., social cohesion, communication,
and cooperation) might have been critical for the survival of human beings.
Moreover, music can influence the autonomic nervous system and immune system
activity (Koelsch 2011), and musical emotion processing can activate serotonergic
(increased serotonin is associated with satisfactory feelings from expected outcomes)
and dopaminergic (dopamine is associated with the reward system and reward related
feelings) neuromodulatory systems in the brain (Altenmüller et al. 2013; Koelsch 2013).
Musical Emotions
The emotion-inducing power of music is usually central in adaptationist views.
Nevertheless, there is a conception that musical emotions are merely aesthetic experi-
ences. Some researchers have claimed that music cannot induce everyday emotions
such as sadness, happiness, and anger (e.g., Scherer 2003); and others argue that music
cannot induce emotions at all (Konecni 2003). One of the main arguments here is that
music cannot induce everyday emotions related to survival functions, as it does not
seem to possess any capacity related to an individual’s goals and well-being. Hence, it
can only induce subtler feelings and aesthetic experiences that are not considered
“real emotions.” Here, we reject this conception and claim that music can in fact induce
both basic and complex emotions in listeners through various psychological mecha-
nisms, some of which are not specific to musical stimuli and are common with other
emotion-inducing stimuli. Although there are a number of emotion theories whose
proponents do not agree on a precise definition of what an emotion is, they largely agree
on several components of an emotional episode (for detailed accounts of several
emotion theories, see Barrett 2006; Moors 2010; Russell 2009; Scherer 2009). Emotions
are generally brief, affective reactions to salient events, and they involve several
components such as physiological arousal (i.e., autonomic activity such as changes in
heart rate), motor expression (e.g., smiling), subjective feeling (e.g., feeling happy upon
hearing a loved song), action tendency (e.g., dancing), and regulation. Previous research
has shown that music can evoke changes in all of the components that an emotional epi-
sode would have (Juslin and Västfjäll 2008; Koelsch et al. 2010). Furthermore, music can
induce activity in core neural structures of emotion processing (Koelsch 2013), which is
another indicator that music can in fact induce emotions.
tension, or suspense (Meyer 1956). Musical expectations are related to the anticipation
of future sounds, which involves memory and statistical learning of musical structures.
In addition, expectation and anticipation are linked to the reward processing and the
dopaminergic system in the brain (Huron and Margulis 2010). Finally, aesthetic judgment
refers to emotional reactions induced through a subjective evaluation of the aesthetic
value of music. Taken together, one may argue that emotional reactions to music could
occur through several psychological mechanisms, some of which are not specific to
music but, instead, are common to other emotion-inducing stimuli. This also suggests
that musical emotions are in fact emotions and they share commonalities with emotions
induced by other stimuli.
emotional, reward, and memory processes, as well as the structures related to autonomic
and endocrine system activity. Therefore, it is not difficult to understand why music is
such a special construct for human societies.
A growing body of empirical evidence suggests that the affective salience of external
stimuli provides invaluable cues for allocation of attentional resources and enhances
perception possibly via fast neural routes to sensory processing areas in the brain. One
of the main arguments is that emotional stimuli form a special group of high-salient
stimuli that are prioritized in sensory processing often at the expense of emotionally
neutral stimuli. In other words, people readily pay more attention to emotional signals
in comparison to neutral signals. Most of the studies concerning the impact of emotional
processes on attention and perception come from the visual modality (e.g., Vuilleumier
2005; Vuilleumier and Driver 2007; Yiend 2010). Although comparable evidence in the
auditory modality is scarce, it seems to be accumulating. Here, we review evidence from
human behavioral and brain imaging studies on how affective sounds can modulate
perceptual and attentional processes.
In a change detection experiment, we found that the affective significance of individual
sounds in a complex auditory scene guides auditory attention (Asutay and Västfjäll
2014). Participants listened to two complex auditory scenes (each consisting of six
simultaneous environmental sounds), and indicated whether the two scenes were
identical or there was a change. Changes took the form of sound replacement (i.e., one
sound was replaced by another). Detection accuracy was higher when the changed
stimuli were emotionally negative and arousing compared to neutral. In addition, there
was an overall increase in perceptual sensitivity for trials in which the unchanged events
were negative. These findings suggest that the emotional salience of sounds guides
attentional resources in a complex environment and that the presence of an emotionally
negative and arousing environment can lead to an overall decrease in auditory atten-
tional thresholds.
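To make the logic of this change-detection design concrete, the following minimal sketch (purely illustrative Python, not the analysis code used in the study; the trial fields and function names are hypothetical) scores detection accuracy separately for emotionally negative and neutral change events, with a false-alarm rate available for correcting response bias:

```python
# Minimal illustrative sketch of scoring change detection by the emotional
# category of the changed sound (not the original study's code).
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trial:
    changed: bool          # was one of the six scene sounds replaced?
    change_category: str   # "negative" or "neutral" (meaningful only if changed)
    response_change: bool  # did the listener report a change?

def hit_rate(trials, category):
    """Proportion of change trials of a given emotional category that were detected."""
    relevant = [t for t in trials if t.changed and t.change_category == category]
    return mean(t.response_change for t in relevant) if relevant else float("nan")

def false_alarm_rate(trials):
    """Proportion of no-change trials incorrectly reported as changed."""
    catch = [t for t in trials if not t.changed]
    return mean(t.response_change for t in catch) if catch else float("nan")

# The reported pattern corresponds to
# hit_rate(trials, "negative") > hit_rate(trials, "neutral"),
# with false_alarm_rate(trials) available to correct for response bias.
```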
Furthermore, using an aversive conditioning paradigm, we found that affective learning
not only modulates the affective significance of the conditioned stimulus (CS) but also can alter loudness
perception (Asutay and Västfjäll 2012). In this experiment, participants went through a
conditioning session, in which a CS (CS+; bandpass noise) was consistently paired with
an unconditioned stimulus (US; a vibratory shock delivered to the chair participants sat on). They were also exposed
to a control stimulus (CS−) that was not associated with the US. Sounds were bandpass
noise at different frequencies, and CS+ and CS− assignments were counterbalanced
among participants. After conditioning, the CS+ was rated as more fear-inducing and
negative and perceived as louder compared to CS−. Another recent study also found that
negative emotion can influence loudness perception (Siegel and Stefanucci 2011). They
used a mood-induction technique to induce negative affect in half of the participants
and neutral affect in the rest. Participants then listened to auditory stimuli and performed
loudness judgments. People in the negative affect group perceived the auditory stimuli as
being louder compared to those in the neutral affect group.
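A rough sketch of the counterbalanced design and the post-conditioning comparison in the aversive conditioning experiment described above might look as follows (purely illustrative; the stimulus labels, rating format, and function names are hypothetical and not taken from the published study):

```python
# Illustrative sketch of counterbalancing the CS+/CS- assignment across
# participants and summarizing the post-conditioning loudness comparison.
from statistics import mean

NOISE_BANDS = ("bandpass_noise_A", "bandpass_noise_B")  # placeholder labels

def assign_cs(participant_id):
    """Alternate which noise band is paired with the vibratory shock (US)."""
    if participant_id % 2 == 0:
        return {"CS+": NOISE_BANDS[0], "CS-": NOISE_BANDS[1]}
    return {"CS+": NOISE_BANDS[1], "CS-": NOISE_BANDS[0]}

def mean_loudness_difference(ratings):
    """Mean within-participant loudness difference (CS+ minus CS-).

    `ratings` is a list of dicts such as {"CS+": 7.2, "CS-": 6.1}; a positive
    value mirrors the reported result that CS+ was judged louder than CS-.
    """
    return mean(r["CS+"] - r["CS-"] for r in ratings)

# Example: assign_cs(3) -> {"CS+": "bandpass_noise_B", "CS-": "bandpass_noise_A"}
```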
In our laboratory, we have also investigated the effect of emotional salience of sounds
on auditory spatial attention (Asutay and Västfjäll 2015a, 2015b). Using a covert spatial
orienting paradigm, we found that negative sounds provide exogenous cues to orient
auditory spatial attention to a particular region of space where they originate (Asutay
and Västfjäll 2015b). The auditory stimuli in the experiment were environmental sounds
with inherent meaning.
Neural models explaining the influence of emotion on other processes place the
amygdala in a central position (e.g., LeDoux 2012; Phelps 2006; Pourtois et al. 2013). The
amygdala seems to receive information regarding the affective salience of external stim-
uli early in the processing and, through its fast neural routes to sensory cortical regions,
it can modulate perceptual and attentional processing; that is, it can induce transient
changes in attentional thresholds in the presence of emotional stimuli (Phelps 2006;
Phelps and LeDoux 2005). Emotional information can also modulate neural activity in
regions associated with attentional control, which in turn shape the impact of selective
attention on sensory processing (Domínguez-Borràs and Vuilleumier 2013). Apart from
this, the amygdala has direct projections to neuromodulatory systems (e.g., cholinergic,
adrenergic, dopaminergic) that are capable of modulating perceptual and attentional
processes. Cholinergic nuclei located in the basal forebrain receive input from the amyg-
dala, and they can release acetylcholine to widespread cortical areas. Activation of the
cholinergic system can facilitate neural excitability in sensory areas and is argued to be
central in learning-induced changes in the auditory cortex (Weinberger 2010). The cen-
tral amygdala also projects to the locus coeruleus (LC) in the brain stem, which is a part
of the noradrenergic system. The LC sends noradrenaline inputs to widespread cortical
areas to regulate arousal and autonomic functions. Activation of the noradrenergic sys-
tem can facilitate sensory processing, enhance cognitive flexibility, and promote vigilant
attentional shifting in the presence of significant sensory stimuli (Corbetta et al. 2008;
Sara and Bouret 2012). In general, the presence of emotionally significant stimuli can
activate the neuromodulatory systems that, in turn, can regulate the activity in the brain
regions that are involved in active information processing. Although most evidence on
the effect of these neuromodulatory systems relies on animal models, a few human studies
exist (Hermans et al. 2011; Thiel et al. 2002; Weis et al. 2012). In conclusion, it seems that
processing of emotionally significant stimuli is enhanced via several gain control mecha-
nisms (direct influence on sensory processing and attentional thresholds, and indirect
influence of modulatory systems) that are mediated by a large brain network centered
around the amygdala.
Concluding Remarks
In this chapter, we have focused on the relationship between sound and emotion: how
acoustic stimuli induce affective reactions in listeners and how the affective significance
of sounds influences the way we perceive and attend to them. We reviewed human
behavioral and neuroimaging studies concerning learning-induced emotional reactions,
vocal emotional signals, and music. Our main aims here were to illustrate the close relation-
ship between affective and auditory processes and to state that affective experience is an
integral part of auditory perception.
First, viewing the auditory system as an adaptive network specialized in processing
acoustic stimuli indicates that the affective and motivational significance of auditory
stimuli influences both the way they are processed and our reactions to them. It also
makes intuitive sense when we consider the function of the auditory system, which scans
our surroundings, detects potentially relevant targets, and signals for attention shifts to
salient objects when necessary. In that respect, it functions as an adaptive warning
system. Hence, the emotional, motivational, and attentional states of the organism are taken
into account while complex auditory input is processed and analyzed.
Next, conditioning studies have shown that as the emotional significance of an
auditory stimulus changes through learning, the representation of that particular sound
in the auditory system undergoes specific plastic changes. Thus, the emotional significance
of sounds can lead to biases in auditory processing and adapt the system to be more
attentive and tuned to significant events. This conclusion is also very much in line with
the adaptive capacity of the auditory system. In addition, empirical evidence and neural
models show how emotional significance of auditory stimuli can effectively rewire the
neural structure so that affective stimuli receive priority during sensory processing.
Taken together, the findings reviewed here point to the close relationship between affec-
tive and auditory processes.
Furthermore, music can influence both the autonomic nervous system and immune
system activity and activate serotonergic and dopaminergic modulatory systems. Music
can also induce and regulate emotions through various psychological mechanisms,
most of which are common to other emotion-inducing stimuli. Empirical evidence
suggests that musical stimuli can consistently activate the main neural structures of
affective processing. Emotional signals in music activate brain structures within the
networks related to emotional, reward, and memory processes, as well as the structures
related to autonomic and endocrine system activity.
In addition, mental representations that are evoked by auditory stimuli might also
influence emotional reactions elicited during sound perception. We argue that these
mental representations depend on the situational context, the listener, and the stimulus
itself. Evoked mental representations that are related to the sound source and its mean-
ing induce emotional reactions when we listen to environmental sounds. On the other
hand, visual imagery evoked by the acoustic features of sound might have a completely
different nature. For instance, musical stimuli can induce visual imagery and episodic
memory, both of which have an impact on emotional experience while listening to
music. The former seems to be influenced by the structure of music, while the latter is
the retrieval, cued by the music, of a memory with emotional significance.
In conclusion, we argue that auditory perception is central to most interactions we
have with our surroundings. Sounds such as vocal signals and music have great potential
to communicate biologically significant emotional information, which modulates
both sensory processing in the brain and behavioral outcomes. Finally, considering
the high adaptational capacity of the auditory system, we claim that emotional experience
is integral to sound perception.
References
Ahveninen, J., M. Hämäläinen, I. P. Jääskeläinen, S. P. Ahlfors, S. Huang, F. H. Lin, et al. 2011.
Attention-Driven Auditory Cortex Short-Term Plasticity Helps Segregate Relevant Sounds
from Noise. Proceedings of the National Academy of Sciences 108: 4182–4187.
Ahveninen, J., N. Kopco, and I. P. Jääskeläinen. 2014. Psychophysics and Neuronal Basis of
Sound Localization in Humans. Hearing Research 307: 86–97.
Altenmüller, E., R. Kopiez, and O. Grewe. 2013. A Contribution to the Evolutionary Basis of
Music: Lessons from the Chill Response. In Evolution of Emotional Communication, edited by
E. Altenmüller, S. Schmidt, and E. Zimmermann, 313–335. Oxford: Oxford University Press.
Altenmüller, E., S. Schmidt, and E. Zimmermann. 2013. Evolution of Emotional Communication.
Oxford: Oxford University Press.
Armony, J. L., and J. LeDoux. 2010. Emotional Responses to Auditory Stimuli. In The Oxford
Handbook of Auditory Science: The Auditory Brain, Vol. 2, edited by A. Rees and A. R. Palmer,
479–505. New York: Oxford University Press.
Arnott, S. R., and C. Alain. 2011. The Auditory Dorsal Pathway: Orienting Vision. Neuroscience
and Biobehavioral Reviews 35: 2162–2173.
Asutay, E. 2014. Emotional Influences on Auditory Perception and Attention. Doctoral disser-
tation. Chalmers University of Technology, Sweden.
Asutay, E., and D. Västfjäll. 2012. Perception of Loudness is Influenced by Emotion. PLoS One
7: e38660.
Asutay, E., and D. Västfjäll. 2014. Emotional Bias in Change-Deafness in Multisource Auditory
Environments. Journal of Experimental Psychology: General 143: 27–32.
Asutay, E., and D. Västfjäll. 2015a. Attentional and Emotional Prioritization of Sounds
Occurring Outside the Visual Field. Emotion 15: 281–286.
Asutay, E., and D. Västfjäll. 2015b. Negative Emotion Provides Cues for Orienting Auditory
Spatial Attention. Frontiers in Psychology 6: 618.
Asutay, E., D. Västfjäll., A. Tajadura-Jiménez, A. Genell, P. Bergman, and M. Kleiner. 2012.
Emoacoustics: A Study of the Psychoacoustical and Psychological Dimensions of Emotional
Sound Design. Journal of the Audio Engineering Society 60: 21–28.
Bachorowski, J. A., and M. J. Owren. 2008. Vocal Expressions of Emotion. In Handbook of
Emotions, 3rd ed., edited by M. Lewis, J. M. Havilland-Jones, and L. F. Barrett, 211–234.
New York: Guilford Press.
Bajo, V. M., F. R. Nodal, D. R. Moore, and A. J. King. 2010. The Descending Corticocollicular
Pathway Mediates Learning-Induced Auditory Plasticity. Nature Neuroscience 13: 253–260.
Ball, T., B. Rahm, S. Eickhoff, A. Schulze-Bonhage, O. Speck, and I. Mutschler. 2007. Response
Properties of Human Amygdala Subregions: Evidence Based on Functional MRI Combined
with Probabilistic Anatomical Maps. PLoS One 3: 307.
Barrett, L. F. 2006. Solving the Emotion Paradox: Categorization and the Experience of
Emotion. Personality and Social Psychology Review 10: 20–46.
Belin, P., R. J. Zatorre, P. Lafaille, P. Ahad, and B. Pike. 2000. Voice-Selective Areas in Human
Auditory Cortex. Nature 403: 309–312.
Blauert, J. 1997. Spatial Hearing. Rev. ed. Cambridge, MA: MIT Press.
Blood, A. J., and R. Zatorre. 2001. Intensely Pleasurable Responses to Music Correlate with
Activity in Brain Regions Implicated in Reward and Emotion. Proceedings of the National
Academy of Sciences 98: 11818–11823.
Bradley, M. M., and P. J. Lang. 2000. Affective Reactions to Acoustic Stimuli. Psychophysiology
49: 204–215.
Bregman, A. 1999. Auditory Scene Analysis: The Perceptual Organization of Sound. 2nd ed.
London: MIT Press.
Brück, C., B. Kreifelts, T. Ethofer, and D. Wildgruber. 2013. Emotional Voices. In The Cambridge
Handbook of Human Affective Neuroscience, edited by J. Armony and P. Vuilleumier, 265–285.
New York: Cambridge University Press.
Clarke, E. F. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. New York: Oxford University Press.
Conrad, N. J., M. Malina, and S. C. Münzel. 2009. New Flutes Document the Earliest Musical
Tradition in Southwestern Germany. Nature 460: 737–740.
Corbetta, M., G. Patel, and G. L. Shulman. 2008. The Reorienting System of the Human
Brain: From Environment to Theory of Mind. Neuron 58: 306–324.
Dahmen, J. C., P. Keating, F. R. Nodal, A. L. Schulz, and A. J. King. 2010. Adaptation to
Stimulus Statistics in the Perception and Neural Representation of Auditory Space. Neuron
66: 937–948.
Darwin, C. (1879) 2004. The Descent of Man. London: Penguin.
De Houwer, J., S. Thomas, and F. Baeyens. 2001. Associative Learning of Likes and Dislikes:
A Review of 25 Years of Research on Human Evaluative Conditioning. Psychological Bulletin
127: 853–869.
Delgado, M. R., A. Olsson, and E. A. Phelps. 2006. Extending Animal Models of Fear
Conditioning to Humans. Biological Psychology 73: 39–48.
Domínguez-Borràs, J., and P. Vuilleumier. 2013. Affective Biases in Attention and Perception.
In The Cambridge Handbook of Human Affective Neuroscience, edited by J. Armony and
P. Vuilleumier, 331–356. New York: Cambridge University Press.
Domjan, M. 2005. Pavlovian Conditioning: A Functional Perspective. Annual Review of
Psychology 56: 179–206.
Eldar, E., O. Ganor, R. Admon, A. Bleich, and T. Hendler. 2007. Feeling the World: Limbic
Response to Music Depends on Related Content. Cerebral Cortex 17: 2828–2840.
Fastl, H., and E. Zwicker. 2007. Psychoacoustics: Facts and Models. Berlin: Springer.
Fecteau, S., P. Belin, Y. Joanette, and J. L. Armony. 2007. Amygdala Responses to Nonlinguistic
Emotional Vocalizations. Neuroimage 36: 480–487.
Frijda, N. 2008. The Psychologists’ Point of View. In Handbook of Emotions, 3rd ed., edited by
M. Lewis, J. M. Havilland-Jones, and L. F. Barrett, 68–87. New York: Guilford Press.
Fritz, J. B., M. Elhilali, S. V. David, and S. A. Shamma. 2007. Auditory Attention: Focusing the
Searchlight on Sound. Current Opinion in Neurobiology 17: 1–19.
Fritz, T., and S. Koelsch. 2005. Initial Response to Pleasant and Unpleasant Music: An fMRI
Study (Poster). NeuroImage 26 (Suppl.), 271.
Kotz, S. A., A. S. Hasting, and S. Paulmann. 2013. On the Orbito-Striatal Interface in Acoustic
Emotional Processing. In Evolution of Emotional Communication, edited by E. Altenmüller,
S. Schmidt, and E. Zimmermann, 229–240. Oxford: Oxford University Press.
LaBelle, B. 2007. Background Noise: Perspectives on Sound Art. New York: Continuum.
Lakoff, G., and M. Johnson. 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Lang, P. J., and M. M. Bradley. 2010. Emotion and the Motivational Brain. Biological Psychology
84: 437–450.
LeDoux, J. 2012. Rethinking the Emotional Brain. Neuron 73: 653–676.
Lee, C. C., and J. C. Middlebrooks. 2011. Auditory Cortex Spatial Sensitivity Sharpens during
Task Performance. Nature Neuroscience 14: 108–114.
Lehmann, A., and M. Schönweisner. 2014. Selective Attention Modulates Human Auditory
Brainstem Responses: Relative Contributions of Frequency and Spatial Cues. PLoS
One 9: e85442.
Maier, J. X., and A. A. Ghazanfar. 2007. Looming Biases in Monkey Auditory Cortex. Journal
of Neuroscience 27: 4093–4100.
Malmierca, M. S. 2005. The Inferior Colliculus: A Center for Convergence of Ascending and
Descending Auditory Information. Neuroembryology and Ageing 3: 215–229.
Marsh, R. A., Z. M. Fuzessery, C. D. Grose, and J. J. Wenstrup. 2002. Projection to the
Inferior Colliculus from the Basal Nucleus of the Amygdala. Journal of Neuroscience
22: 10449–10460.
Menon, V., and D. J. Levitin. 2005. The Rewards of Music Listening: Response and Physiological
Connectivity of the Mesolimbic System. NeuroImage 28: 175–184.
Meyer, L. B. 1956. Emotion and Meaning in Music. Chicago: University of Chicago Press.
Moore, B. C. J. 2012. An Introduction to the Psychology of Hearing. 6th ed. London:
Academic Press.
Moore, B. C. J., and H. E. Gockel. 2012. Properties of Auditory Stream Formation. Philosophical
Transactions of the Royal Society B 367: 919–931.
Moors, A. 2010. Theories of Emotion Causation: A Review. In Cognition and Emotion: Review
of Current Research and Theories, edited by J. de Houwer and D. Hermans, 1–37. New York:
Psychology Press.
Neuhoff, J. G. 2004. Ecological Psychoacoustics. Boston, MA: Elsevier Academic Press.
Ohl, F. W., and H. Scheich. 2005. Learning-Induced Plasticity in Animal and Human Auditory
Cortex. Current Opinion in Neurobiology 15: 470–477.
Olsson, A., and E. A. Phelps. 2004. Learned Fear of “Unseen” Faces after Pavlovian,
Observational, and Instructed Fear. Psychological Science 15: 822–828.
Owren, M. J., and D. Rendall. 2001. Sound on the Rebound: Bringing Form and Function
Back to the Forefront in Understanding Nonhuman Primate Vocal Signaling. Evolutionary
Anthropology 10: 58–71.
Owren, M. J., M. Phillip, E. Vanman, N. Trivedi, A. Schulman, and J. Bachorowski. 2013.
Understanding Spontaneous Human Laughter: The Role of Voicing in Inducing Positive
Emotion. In Evolution of Emotional Communication, edited by E. Altenmüller, S. Schmidt,
and E. Zimmermann, 175–190. Oxford: Oxford University Press.
Patel, A. 2010. Music, Biological Evolution, and The Brain. In Emerging Disciplines, edited by
M. Bailar, 91–144. Houston, TX: Houston University Press.
Petkov, C. I., X. Kang, K. Alho, O. Bertrand, E. W. Yund, and D. L. Woods. 2004. Attentional
Modulation of Human Auditory Cortex. Nature Neuroscience 7: 658–663.
Phelps, E. A. 2006. Emotion and Cognition: Insights from Studies of the Human Amygdala.
Annual Review of Psychology 57: 27–53.
Phelps, E. A., and J. LeDoux. 2005. Contributions of the Amygdala to Emotion Processing:
From Animal Models to Human Behavior. Neuron 48: 175–187.
Pourtois, G., A. Schettino, and P. Vuilleumier. 2013. Brain Mechanisms for Emotional
Influences on Perception and Attention: What Is Magic and What Is Not. Biological
Psychology 92: 492–512.
Read, H. L., J. A. Winer, and C. E. Schreiner. 2002. Functional Architecture of Auditory
Cortex. Current Opinion in Neurobiology 12: 433–440.
Rees, A., and A. R. Palmer. 2010. The Oxford Handbook of Auditory Science: The Auditory
Brain, Vol. 2. New York: Oxford University Press.
Rescorla, R. A. 1998. Pavlovian Conditioning: It’s Not What You Think It Is. American
Psychologist 43: 151–160.
Russell, J. A. 2009. Emotion, Core Affect, and Psychological Construction. Cognition and
Emotion 23: 1259–1283.
Russell, J. A., J. A. Bachorowski, and J. M. Fernandez-Dols. 2003. Facial and Vocal Expressions
of Emotion. Annual Review of Psychology 54: 329–349.
Salimpoor, V., M. Benovoy, K. Larcher, A. Dagher, and R. Zatorre. 2011. Anatomically
Distinct Dopamine Release during Anticipation and Experience of Peak Emotion to Music.
Nature Neuroscience 14: 257–262.
Salminen, N. H., J. Aho, and M. Sams. 2013. Visual Task Enhances Spatial Selectivity in the
Human Auditory Cortex. Frontiers in Neuroscience 7: 44.
Sander, D., J. Grafman, and T. Zalla. 2003. The Human Amygdala: An Evolved System for
Relevance Detection. Reviews in the Neurosciences 14: 303–316.
Sander, K., A. Brechmann, and H. Scheich. 2003. Audition of Laughing and Crying Leads to
Right Amygdala Activation in a Low-Noise fMRI Setting. Brain Research Protocols 11: 81–91.
Sander, K., and H. Scheich. 2005. Left Auditory Cortex and Amygdala, but Right Insula
Dominance for Human Laughing and Crying. Journal of Cognitive Neuroscience 17: 1519–1531.
Sara, S. J., and S. Bouret. 2012. Orienting and Reorienting: The Locus Coeruleus Mediates
Cognition through Arousal. Neuron 76: 130–141.
Scherer, K. R. 2003. Why Music Does Not Produce Basic Emotions: A Plea for a New Approach
to Measuring Emotional Effects of Music. In Proceedings of the Stockholm Music Acoustics
Conference 2003, edited by R. Bresin, 25–28. Stockholm, Sweden: Royal Institute of Technology.
Scherer, K. R. 2009. Emotions and Emergent Processes: They Require a Dynamic Computational
Architecture. Philosophical Transactions of the Royal Society B 364: 3459–3474.
Schirmer, A., and S. A. Kotz. 2006. Beyond the Right Hemisphere: Brain Mechanisms
Mediating Vocal Emotional Processing. Trends in Cognitive Science 10: 24–30.
Seifritz, E., J. G. Neuhoff, D. Bilecen, K. Scheffler, H. Mustovic, H. Schächinger, et al. 2002.
Neural Processing of Auditory Looming in the Human Brain. Current Biology 12: 2147–2151.
Sescousse, G., X. Caldú, B. Segura, and J. C. Dreher. 2013. Processing Primary and Secondary
Rewards: A Quantitative Meta-Analysis and Review of Human Functional Neuroimaging
Studies. Neuroscience and Biobehavioral Reviews 37: 681–696.
Shinn-Cunningham, B. G. 2008. Object-Based Auditory and Visual Attention. Trends in
Cognitive Sciences 12: 182–186.
Siegel, E. H., and J. K. Stefanucci. 2011. A Little Bit Louder Now: Negative Affect Increases
Perceived Loudness. Emotion 11: 1006–1011.
Snowdon, C. T., and D. Teie. 2013. Emotional Communication in Monkeys: Music to Their
Ears? In Evolution of Emotional Communication, edited by E. Altenmüller, S. Schmidt, and
E. Zimmermann, 133–151. Oxford: Oxford University Press.
Sörqvist, P., S. Stenfelt, and J. Rönnberg. 2012. Working Memory Capacity and Visual-Verbal
Cognitive Load Modulate Auditory-Sensory Gating in the Brainstem: Toward a Unified
View of Attention. Journal of Cognitive Neuroscience 24: 2147–2154.
Stecker, G. C., I. A. Harrington, and J. C. Middlebrooks. 2005. Location Coding by Opponent
Neural Populations in the Auditory Cortex. PLoS Biology 3: 78.
Tajadura-Jiménez, A. 2008. Embodied Psychoacoustics: Spatial and Multisensory Determinants
of Auditory-Induced Emotion. Doctoral dissertation. Chalmers University of Technology,
Sweden.
Tajadura-Jiménez, A., P. Larsson, A. Väljamäe, D. Västfjäll, and M. Kleiner. 2010b. When
Room Size Matters: Acoustic Influences on Emotional Responses to Sounds. Emotion
10: 416–422.
Tajadura-Jiménez, A., A. Väljamäe, E. Asutay, and D. Västfjäll. 2010a. Embodied Auditory
Perception: The Emotional Impact of Approaching and Receding Sounds. Emotion
10: 216–229.
Tajadura-Jiménez, A., and D. Västfjäll. 2008. Auditory-Induced Emotion: A Neglected
Channel for Communication in Human-Computer Interaction. In Affect and Emotion in
Human-Computer Interaction: From Theory to Applications, edited by C. Peter and R. Beale,
63–74. Berlin/Heidelberg: Springer-Verlag.
Thiel, C. M., K. J. Friston, and R. J. Dolan. 2002. Cholinergic Modulation of Experience-
Dependent Plasticity in Human Auditory Cortex. Neuron 35: 567–574.
Västfjäll, D. 2012. Emotional Reactions to Sounds without Meaning. Psychology 3: 606–609.
Vuilleumier, P. 2005. How Brains Beware: Neural Mechanisms of Emotional Attention. Trends
in Cognitive Sciences 9: 585–594.
Vuilleumier, P., and J. Driver. 2007. Modulation of Visual Processing by Attention and
Emotion: Windows on Causal Interactions between Human Brain Regions. Philosophical
Transactions of the Royal Society B 362: 837–855.
Wang, X., and D. Bendor. 2010. Pitch. In The Oxford Handbook of Auditory Science: The
Auditory Brain, Vol. 2, edited by A. Rees and A. R. Palmer, 149–172. New York: Oxford
University Press.
Weinberger, N. M. 2004. Specific Long-Term Memory Traces in Primary Auditory Cortex.
Nature Reviews: Neuroscience 5: 279–290.
Weinberger, N. M. 2010. The Cognitive Auditory Cortex. In The Oxford Handbook of Auditory
Science: The Auditory Brain, Vol. 2, edited by A. Rees and A. R. Palmer, 441–478. New York:
Oxford University Press.
Weis, T., S. Puschmann, A. Brechmann, and C. M. Thiel. 2012. Effects of L-dopa during
Auditory Instrumental Learning in Humans. PLoS One 7: e52504.
Wiethoff, S., D. Wildgruber, W. Grodd, and T. Ethofer. 2009. Response and Habituation of the
Amygdala during Processing of Emotional Prosody. Neuroreport 20: 1356–1360.
Winer, J. A., and C. C. Lee. 2007. The Distributed Auditory Cortex. Hearing Research 229: 3–13.
Woods, D. L., T. J. Herron, A. D. Cate, E. W. Yund, G. C. Stecker, T. Rinne, et al. 2010. Functional
Properties of Human Auditory Cortical Fields. Frontiers in Systems Neuroscience 4: 155.
Woods, D. L., G. C. Stecker, T. Rinne, T. J. Herron, A. D. Cate, E. W. Yund, et al. 2009.
Functional Maps of Human Auditory Cortex: Effects of Acoustic Features and Attention.
PLoS One 4: e5183.
Xiao, Z., and N. Suga. 2005. Asymmetry in Corticofugal Modulation of Frequency-Tuning
in Moustached Bat Auditory System. Proceedings of the National Academy of Sciences
102: 19162–19167.
Voluntary Auditory Imagery and Music Pedagogy
Andrea R. Halpern and Katie Overy
Introduction
Auditory imagery is a common everyday experience. People are able to imagine the
sound of waves crashing on a beach, the voice of a famous movie actor, or the melody of
a familiar song or TV theme tune. Although people vary in the extent to which they
report these as vivid experiences, on average they rate vividness of imagined sounds at
the upper end of rating scales such as the Bucknell Auditory Imagery Scale (Halpern 2015),
averaging about 5 on a 7-point scale, where 7 means “as vivid as actually hearing the
sound.” Imagined music can also be involuntary (Beaman and Williams 2010; Hyman
et al. 2013), or even hallucinatory (Griffiths 2000; Weinel, this volume, chapter 15) but the
focus of our discussion is on the willful calling to mind of music. Our argument here is
that, in general, auditory imagery is not just something people do when mind-wandering
or passing the time; it can have definite positive consequences in mood regulation,
self-entertainment, and mental rehearsal. More particularly, musicians, composers, and
music educators understand that auditory imagery is a tool, and they regularly employ
auditory imagery in both pedagogical and professional capacities. We suggest that this
important skill could be used more widely than it is already; for example, to enable
musicians to employ more efficient memorization skills and to rehearse both physical
and expressive aspects of performance without risking excessive motor practice.
Using imagined music to accomplish something beneficial is reported among
non-musicians as well as musicians, of course. People report voluntarily bringing music
to mind to regulate their emotional state, and they judge the emotionality of familiar
imagined music similarly to judging the emotionality of heard music (Lucas et al. 2010).
Recorded music has been shown to assist athletic performance in a variety of situations,
including keeping a steady pace during swimming (Karageorghis et al. 2013), and
imagined music can have similar benefits. One compelling example was reported by the
marathon swimmer Diana Nyad during her record-setting swim across the Straits of Florida from Cuba,
covering 110 miles in 53 hours:
Diana Nyad uses singing to help pass the time and the monotony and sensory depri-
vation inevitable in marathon swimming. To help, she sings silently from a mental
playlist of about 65 songs [including] Janis Joplin’s chart-topping version of Me and
Bobby McGee. “If I sing that 2,000 times in a row, the whole song, I will get through
five hours and 15 minutes,” Nyad said . . . . “It’s kind of stupid,” she added, “but it gets
me through.”1
For musicians, imagining a musical performance can be a useful rehearsal tool and even
a powerful experience, as expressed by violinist Romel Joseph, who was trapped under
the rubble of his music conservatory for eighteen hours after the Haiti earthquake of 2010.
This quote captures both the emotional and performance aspects of imagining music in
a most deliberate way:
Psychology Research
on Auditory Imagery
If auditory imagery had an arbitrary, or even illusory, link to perceiving and performing
real music, then advocating for the increased use of imagery in musical rehearsal and
pedagogy might not make a compelling argument. However, research over the years has
suggested that both musicians and nonmusicians (i.e., those who haven’t studied musical
performance to a high level) can mentally represent a surprisingly wide range of audi-
tory characteristics of actual musical sound (see Hubbard 2010 for a comprehensive
review), in many cases using imagery very consciously and deliberately. For instance,
most individuals, including nonmusicians, can call to mind the melody of a familiar
song without any difficulty. For songs with no canonical recorded versions, people are
remarkably consistent in reproducing or choosing a similar pitch to the one they pro-
duced or chose on a prior occasion for the same song (Halpern 1989). In addition, they
are also fairly accurate in reproducing the opening pitches of well-known recordings of
music from their mental playlist (Levitin and Cooke 1996; Frieler et al. 2013) and can
usually recognize the correct pitch within two semitones (Schellenberg and Trehub 2003).
Auditory imagery also represents some of the temporal characteristics of sounded
music remarkably accurately. If listeners are asked to carry out memory tasks comparing pitches at
two nonadjacent places in a familiar melody, their reaction times increase proportionally to
the distance between the notes in beats of the actual tune (Halpern 1988a), and if asked to
mentally complete a phrase of a familiar tune after a sounded cue of the opening notes,
reaction times similarly increase proportionally for longer phrases (Halpern and
Zatorre 1999). A recent study of involuntary musical imagery asked people to tap the
tempo of involuntary auditory images that occurred over a five-day period; tempos
were recorded via an accelerometer. Results showed that 77 percent of 115 reports of
episodes involving recorded music were within 15 percent of the original recorded
tempo (Jakubowski et al. 2015).
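As a purely illustrative reading of that accuracy criterion (the beats-per-minute values below are invented), the check is a simple relative-error comparison:

```python
# Illustrative check of the tempo criterion described above: a tapped tempo
# counts as accurate if it lies within 15 percent of the original recording's tempo.
def within_tolerance(tapped_bpm, original_bpm, tolerance=0.15):
    return abs(tapped_bpm - original_bpm) / original_bpm <= tolerance

print(within_tolerance(tapped_bpm=100, original_bpm=112))  # True (about 11% slower)
print(within_tolerance(tapped_bpm=80, original_bpm=112))   # False (about 29% slower)
```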
Even a multidimensional construct such as timbre is processed similarly in hearing
and imagery. Halpern and Zatorre (2004) asked people to make similarity judgments
between pairs of sounded and imagined musical instruments while undergoing fMRI
scanning. Similarity ratings in both conditions were highly correlated. Additionally, both
types of judgments involved activation of the secondary auditory cortex (judgments on
sounded but not imagined music additionally activated the primary auditory cortex).
Thus, we have some basis to conclude that mentally simulating or rehearsing music
might involve similar neural processing and thus confer some of the same benefits as
actual hearing or even production—given that production includes not only motor but
also auditory skills.
On the other hand, apart from the case of auditory hallucinations (Griffiths 2000;
Weinel, this volume, chapter 15) most people do not actually confuse imagining with
hearing, and thus we should not be surprised if there were behavioral and neural sub-
strate differences between the two. One obvious difference between the two types of
auditory experience is that auditory imagery tasks are on average more difficult than
matched perceptual tasks. For example, Zatorre and Halpern (1993) presented patients
who had undergone surgery removing part of their temporal lobes (mean age about
thirty years old) and matched controls with the text of the first line of a familiar tune,
such as Jingle Bells. Two lyrics were highlighted, as in “Dashing through the SNOW, in a
one-horse open SLEIGH.” The task was to judge whether the second highlighted
lyric was higher or lower in pitch than the first such lyric (the reader is invited to
try that now). In one condition, participants heard a recording of someone singing the
tune; in the other condition, they had to use mental imagery only. The right-temporal
lobectomy patients had lower performance in both conditions compared to the other
two groups, implicating the role of the right temporal lobe in pitch perception and imagery,
and accuracy rates for all participants were about 12–15 percent lower in the imag-
ined than heard condition. In a subsequent study with healthy young adults in which
only the two to-be-compared lyrics were presented, similar performance drops from
heard to imagined conditions were found (Zatorre and Halpern 1996).
Imagery tasks are likely to be more difficult because they involve considerable working
memory resources; indeed, Baddeley and Andrade (2000) found that working memory
(WM) performance scores correlated with self-reported vividness of both auditory and
visual imagery. WM span is also positively correlated with measures of pitch and temporal
imagery ability (Colley et al. 2017). Brain imaging studies also point to the involvement of
executive function in auditory imagery. Herholz and colleagues (2012) asked people with
a range of musical experience to listen to or imagine familiar songs, while simultaneously
viewing the lyrics in a karaoke-type video presentation. Compared to a baseline, both the
imagined and heard conditions increased cerebral blood flow (measured via fMRI) in perceptual
areas such as the superior temporal gyrus (STG, the locus of the secondary auditory
cortex), but imagining tunes also uniquely activated several areas associated with
higher-order planning and other executive functions, such as the supplementary motor
area (SMA), intraparietal cortex (IPS), inferior frontal cortex (IFC), and right dorso-
lateral prefrontal cortex (DLPFC) (see Figure 19.1). This additional neural activity is
interpreted as reflecting extra cognitive effort and suggests that imagery tasks are more
difficult. However, such tasks may also benefit music learning in both the short and long
term precisely because of this increased level of cognitive engagement (known as a
“desirable difficulty” in the cognitive literature), potentially leading to better encoding
and later retention (Bjork et al. 2014).
As alluded to in our opening remarks, auditory imagery is not always intentional—it can
come unbidden in the form of so-called earworms, or what is sometimes called involuntary
musical imagery (INMI). Numerous researchers have now studied this phenomenon,
documenting the incidence and phenomenology of the experience, the relationship to
personality variables, and the characteristics of the triggers and the tunes themselves that
come to mind (for example, Bailes 2006; Halpern and Bartlett 2011; Hyman et al. 2013;
Müllensiefen et al. 2014; Williamson and Jilka 2013). However, in this chapter we focus on
voluntary auditory imagery precisely because it is under the control of the individual and
thus can be harnessed and modified as needed to accomplish musical goals.
Figure 19.1 Brain areas more active in listening than imagining a familiar tune (the major activity is labeled “STG”—orange on the companion website) and those more active in imagining than listening to a familiar tune (the major activity is labeled “IPS,” “SMA,” “DLPFC,” and “IFC”—blue on the companion website). (Reprinted with permission from Herholz et al. 2012).
It is perhaps unsurprising, then, that several approaches to musical training involve the
explicit training of auditory imagery skills. In fact, the skill of reading music notation
and “hearing” the appropriate auditory image “in one’s head” is so commonly used in
musical performance and training that it often is not even given a particular name; its
centrality is simply assumed (much like the skill of reading a book “in one’s head” is
commonly assumed). The use of auditory imagery in expert musical performance
preparation is sometimes called “mental rehearsal” and is often combined with motor and
visual imagery. Some of the first music psychologists also noted the importance of
auditory imagery, starting with the earliest published measures of musical ability
(Seashore 1919). Indeed, Carl Seashore regarded auditory imagery as the highest form of
musicianship: “[T]he most outstanding mark of the musical mind is a high capacity
for auditory imagery” (Seashore 1938, 161). It is thus useful at this point to consider the
ways in which voluntary auditory imagery has been employed in particular approaches
to music pedagogy. Although not all such approaches have either been documented or
investigated empirically, there is nevertheless a long tradition of such training, going
back decades and perhaps even centuries.
Two major classroom music education figures of the twentieth century, Zoltán
Kodály and Edward Gordon, made auditory imagery an explicit feature of their peda-
gogical approaches, referring to it as “inner hearing” or “audiation,” respectively. Perhaps
the most methodical, worked-out teaching method arose from the work of Kodály, a
Hungarian composer and professor at the Liszt Academy in the first half of the twentieth
century (Ittzés 2002). Observing that Austrian and Viennese music were held in higher
regard than Hungarian folk music during the period of the Austro-Hungarian Empire
and apparently unimpressed by both the general quality and the repertoire of urban
children’s singing in Hungary, Kodály developed an entirely new approach to classroom
music education. This new approach was based on what he considered to be the chil-
dren’s musical “mother-tongue,” that is, their musical vernacular, Hungarian folk songs,
which he collected and preserved in collaboration with Béla Bartók (Kodály 1960, 1974).
Essentially the idea, which was developed and put into practice by Kodály’s students
(Ádám 1944; Szőnyi 1974), is to begin classroom music lessons with songs already some-
what familiar to children, and through regular repetition and analysis of this familiar
repertoire, learn the fundamentals of musical knowledge such as scales, rhythm, musi-
cal notation, and sight-singing. The skills acquired can then give children direct access
to participating in and understanding the entire world of Western art music (and indeed
other music from around the globe). Integral to this process is the use of “inner hearing”
as a device to develop children’s musicianship and literacy skills. For example, a primary
school music activity might involve learning to miss out a few notes or words during
a song and to imagine them instead of singing them. To take Jingle Bells as an example
again, children might be asked to sing the whole song together, while leaving out
the words “bells” and “all the way” throughout the whole song (try it!). Not only does
this rehearse the skill of “inner hearing,” it can also be made into an enjoyable game,
and additionally, the musical structure appearing from the three repetitions of “Jingle”
(mi-mi, mi-mi, mi-so; see Figure 19.2) becomes prominent and can be “discovered”
by the children with the guidance of the teacher, leading to an understanding of
musical form.
Developing such skills to a more advanced level eventually allows older children to be
able to sight-sing one melody while imagining a countermelody (i.e., a simultaneous
melody), or imagine a familiar chord sequence (e.g., I–VI–IV–V–I) in various major
and minor keys, for example (see Figure 19.3).
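For readers who want to see such a progression spelled out, the short sketch below (purely illustrative and not part of the Kodály materials) builds the I–VI–IV–V–I triads from the C major and A natural minor scales; in common practice the minor-key V chord is usually made major by raising the leading tone, a detail the simplification below ignores:

```python
# Illustrative sketch: spelling out I-VI-IV-V-I as triads built on scale degrees.
# Uses the natural minor scale for simplicity (no raised leading tone).
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]
A_MINOR = ["A", "B", "C", "D", "E", "F", "G"]  # natural minor

def triad(scale, degree):
    """Stack two diatonic thirds above the given scale degree (1-based)."""
    root = degree - 1
    return [scale[(root + step) % 7] for step in (0, 2, 4)]

def progression(scale, degrees=(1, 6, 4, 5, 1)):
    return [triad(scale, d) for d in degrees]

print(progression(C_MAJOR))
# [['C', 'E', 'G'], ['A', 'C', 'E'], ['F', 'A', 'C'], ['G', 'B', 'D'], ['C', 'E', 'G']]
print(progression(A_MINOR))
# [['A', 'C', 'E'], ['F', 'A', 'C'], ['D', 'F', 'A'], ['E', 'G', 'B'], ['A', 'C', 'E']]
```

Seeing the same Roman-numeral sequence realized in two different keys makes concrete the relative, rather than absolute, orientation toward pitch that the approach cultivates.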
The focus on repeatedly singing and analyzing familiar songs is key to the Kodály
approach, and in the context of auditory imagery it is worth noting the strong emphasis
placed on regular practice and depth of understanding.

Figure 19.2 First line of the song Jingle Bells, where the word “Jingle” is sung aloud each time and the rest of the line is imagined.

Figure 19.3 Harmonic chord sequence of I, VI, IV, V, I, first shown in the key of C major and then in A minor, where I is the tonic chord and V is the dominant chord.

Kodály believed that the collecting of musical experiences is more important than studying music theoretically (Kodály 1974)
and placed special emphasis on the use of relative sol-fa (i.e., naming notes according to
their position in a musical scale, rather than by their absolute pitch) and two-part sing-
ing (Kodály 1962). Baddeley and Andrade (2000) suggest that the experience of vivid imagery
requires abundant sensory information to be available from long-term memory,
and Neisser (1976) has noted that imagery arises (at least in part) from schemata, based
on prior experience. Since a voluntary auditory image is self-generated, it reflects
considerable prior processing and should not be seen as an uninterpreted sensory copy
(Hubbard 2010)—mental models play an important part in musical imagery, much as
they do in music perception (Schaefer 2014). Imagery and memory are also considered
to be closely linked; mental imagery is an important component of working memory
rehearsal, for example (Baddeley and Logie 1992). Singing may also be of particular
value in the development of “inner hearing” skills because a vast amount of familiar
musical material can be brought to mind through songs, without requiring any instru-
mental expertise. It has even been shown that musicians subvocalize when performing a
notation-reading auditory imagery task (Brodsky et al. 2008), suggesting reliance on an
imagined sung version of a melody (although it must be noted that neuroimaging
studies to date have not revealed activation of the primary motor cortex during auditory
imagery; see Zatorre and Halpern 2005).
Another point of interest regarding the Kodály approach is that it specifically aims to
develop the ability to hear, or imagine, more than one melodic line at the same time, an
ability that has recently been shown to be particularly developed in musical conductors
(Wöllner and Halpern 2016). While the extent to which such a skill involves actual
divided attention, versus rapid switching between parts, is still debated (Alzahabi and
Becker 2013), it is nevertheless clear that this ability can be trained and developed to a
high level of skill. Indeed, at an advanced level, such as an undergraduate “harmony and
counterpoint” or “stylistic composition” exam, a music student might be asked to write a
fugue in the style of Bach and a song in the style of Schubert while sitting at a desk in an
exam hall, thus relying on auditory imagery of several melodic lines and/or harmonic
progressions, as well as expert musical knowledge, in order to complete the task.
A final aspect of the Kodály approach that is rarely discussed but important to note
is the fact that it involves group musical learning, almost always taking place in the
school or university classroom. The “inner hearing” activities thus involve what we
might describe as group auditory imagery, or “shared auditory imagery,” which can
bring a highly focused sense of shared attention when used effectively, as well as allowing
more generally for the potential benefits of group learning and social music-making
(Heyes 2013; Kirschner and Tomasello 2010; Overy and Molnar-Szakacs 2009;
Overy 2012). The idea of “shared auditory imagery” is not well documented and perhaps
warrants future research.
A second major music education figure of the twentieth century, Edward Gordon,
based in the United States, focused his own music education approach much more spe-
cifically on auditory imagery, or what he calls “audiation.” Gordon proposes that only by
understanding where a young child’s current audiation skills lie, can the child be taught
appropriately, and much of Gordon’s work focuses on the variability of this
skill in the general population and how to measure it appropriately (1987). Importantly,
Gordon extends the meaning of the word “audiation” from auditory imagery alone to
include the process of listening to music with some cognition of its structure, rather
than just sensory perception, arguing that “audiation” is part of intelligent music listen-
ing. The Gordon measures of music audiation (e.g., Gordon 1979, 1982) have become
some of the most commonly used measures of musical ability in children and are often
also used in psychology and brain imaging research (e.g., Ellis et al. 2012). Examples
of the kinds of tasks used are the melody and rhythm discrimination tests, in which
two melodies or rhythms are heard and the child or adult’s task is to determine
whether they are the same or different, a task commonly found in tests of musical ability
(e.g., Bentley 1966; Wing 1970; Overy et al. 2003, 2005). Gordon assumes that, in order
to perform this comparison task, a child must be able to hold the initial melody in mind
for a short period of time, that is, to “audiate” the short extract. This measure thus links
directly with the idea that working memory rehearsal requires mental imagery, as
outlined earlier (Baddeley and Logie 1992).
Voluntary auditory imagery is central to the Gordon concept of musical ability and is
regarded as an important aspect of musicianship and an effective learning tool in the
Kodály approach. On further analysis, there are also some interesting key elements in
common between the two approaches. For example, both approaches: (1) use physical
movement gestures in the teaching of “inner hearing” or “audiation;” (2) place strong
emphasis on what Gordon calls “notational audiation” and what Kodály calls “musical
literacy,” that is, the ability to read a musical score and hear the music in one’s head; and
(3) place a strong emphasis on the importance of teacher-training programs in these
skills. A detailed comparative analysis of the two approaches would no doubt generate
some clear focus points for future research in this area, and perhaps lead to a richer
understanding of how auditory imagery can be used, adapted, and developed in a range
of different musical and pedagogical contexts.
Conclusions
Auditory imagery is an ability that most people can access and control, with a fair
amount of precision and with fidelity to actual perceived sounds. Such imagery can be
used for entertainment and emotional self-regulation (such as imagining calming songs
if one is in a stressful situation). But we wish to emphasize another aspect of this experi-
ence: what seems at times to be an effortless ability in fact can require a fair amount of
cognitive resources, including working and long-term memory. For both musicians and
nonmusicians, the successful re-evoking of music often reflects the fact that the material
has been encountered multiple times and reflects a detailed knowledge of the piece
(particularly in Ben-Or’s approach). Thus, we could view voluntary auditory imagery in
music learning as a tool that takes some effort to use, but results in superior technical
and expressive skills, or a “desirable difficulty.” The fact that brain activation during
auditory imagery shows areas in common with auditory perception but also unique
activation of higher-order areas involved in memory and executive function, supports
this idea of imagery being used as a tool to enhance learning.
Auditory imagery does not occur in a vacuum of course. Musicians can also use
motor and kinesthetic imagery (Meister et al. 2004), as they imagine their hand and
body movements during playing, and visual imagery when imagining a score, a piano
keyboard, or a conductor’s gestures from a prior rehearsal. Much research has pointed
to the multimodality of imagery, both behaviorally and in terms of neural function
(McNorgan 2012). The translation of a visual score into an auditory experience requires
coordination across the two modalities, often via some representation of the motor sys-
tem. Some of the pedagogy techniques described here exploit this interaction and could
perhaps still be extended. For example, in the Kodály approach, preschool children are
often asked to keep a sequence of learned motor actions going throughout an “action
song” while imagining some melodic lines and singing the others. Similarly, Curwen
hand-signs (Curwen 1854) are used in the Kodály approach with young children to repre-
sent pitch for both imagined and sung musical activities, before moving on to written
notation and more advanced musical materials. This use of the motor system to repre-
sent sound while it is being imagined might be further exploited in more advanced
ways, yet to be conceived and developed.
In this chapter, we have discussed the role of auditory imagery in primary music
education as well as in professional musical situations. These methods explicitly recog-
nize that individuals with different levels of ability and training might use imagery in
different ways. For example, Ben-Or proposes using multimodal imagery or “total inner
memory” (Davidson-Kelly et al. 2015) to memorize and mentally rehearse a piece,
assuming that the piece is largely within the performer’s current technical expertise. The
Kodály approach extends from preschool to undergraduate levels of musicianship,
entailing the wide range of beginner to expert levels of repertoire and musical skill
therein. We assume that other approaches to music learning, such as imitative, oral
transmission styles found in non-Western cultures and nonnotated musical genres
such as pop and folk, may also use features of voluntary auditory imagery in a variety
of different ways.
We would also like to recognize here that adults older (even!) than undergraduates
often have an interest in beginning or furthering their musical experiences or training.
Some of the training methods referred to in this chapter could easily be adapted so that
the training was appropriate for middle-aged or senior adults, for example by using gener-
ation-appropriate songs and physical activities. For adults with more seriously limited
mobility, such adapted techniques might even be helpful for motor rehabilitation, for
example in cases of stroke survival or Parkinson’s disease, where musical imagery of a
steady beat, for example, has been proposed as potentially helpful in the rehabilitation of
motor skills (Schaefer 2014).
Of course, we should also emphasize that auditory imagery in music pedagogy is not
always focused on (eventual) proficiency in singing, playing a musical instrument or
reading music notation. We mentioned the value of social music-making earlier on, and
fully recognize that many adults who are not necessarily formally trained in music
nevertheless enjoy singing together in a group. However, many of these individuals are
not satisfied with their vocal abilities and wish they could improve. Some adults do not
sing at all, but wish they could develop the skills needed to enjoy both the artistic
and social benefits of music-making, such as choral singing (Clift and Hancox 2010).
Research in progress with colleagues at a UK music conservatory is currently investi-
gating a new way to teach adults who do not sing much, or well, to sing more confidently
and more accurately. Given the strong relationship between auditory imagery vividness
and pitch matching ability, and the importance of musical imagery skills in many peda-
gogical approaches, one aspect of the research will be to create an intervention to train
and improve auditory imagery skills. The study will include a version of the mental pitch
comparison task mentioned earlier (Zatorre and Halpern 1993), where difficulty is grad-
ually increased by probing pitches that are increasingly distant from each other within
the song. Delivered as an enjoyable app that can be accessed at home, the task will allow
the study to track (1) whether it is possible to measure improvement in auditory imagery
skills and (2) whether any such improvement correlates with improved pitch matching and vocal
quality. Such improved skills may also lead to new possibilities in the areas of improvising
and composing for these adult learners.
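To make the graded design concrete, the following is a minimal sketch of how such a pitch comparison task might be generated, assuming a toy melody encoded as (lyric, MIDI pitch) pairs; the tune, the function names, and the difficulty rule are illustrative assumptions rather than a description of the app under development.

```python
# Hypothetical sketch of a graded mental pitch comparison task, loosely
# modeled on the paradigm of Zatorre and Halpern (1993). The melody,
# names, and difficulty rule are illustrative assumptions only.

import random

# A familiar tune as (lyric syllable, MIDI pitch) pairs; the opening of
# "Twinkle, Twinkle, Little Star" is used purely as an example.
MELODY = [("Twin", 60), ("kle", 60), ("twin", 67), ("kle", 67),
          ("lit", 69), ("tle", 69), ("star", 67)]

def make_trial(distance):
    """Pick two syllables `distance` positions apart with different pitches."""
    while True:
        start = random.randrange(len(MELODY) - distance)
        a, b = MELODY[start], MELODY[start + distance]
        if a[1] != b[1]:
            return a, b

def run_block(distances=(1, 2, 3, 4, 5), trials_per_level=4):
    """Present trials of increasing lyric distance and record accuracy per level."""
    scores = {}
    for d in distances:
        correct = 0
        for _ in range(trials_per_level):
            a, b = make_trial(d)
            answer = input(f'Imagine the song. Which syllable is sung '
                           f'higher: "{a[0]}" or "{b[0]}"? ')
            target = a[0] if a[1] > b[1] else b[0]
            correct += int(answer.strip().lower() == target.lower())
        scores[d] = correct / trials_per_level
    return scores  # accuracy per distance level, e.g. {1: 1.0, 2: 0.75, ...}

if __name__ == "__main__":
    print(run_block())
```

Logging accuracy per distance level across sessions would give one simple way of charting any change in imagery skill over time.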
We close with the thought that auditory imagery tasks are both inexpensive (one only
has to imagine sounds!) and potentially fun, such as asking people to imagine and play with
famous tunes in their heads (we will leave you with an auditory image of the beautiful
song “Danny Boy” and ask you to enjoy and spend too long on the highest note).
Auditory imagery tasks can be developed for individuals with a wide range of musical
backgrounds and performance goals and can thus serve to enhance the traditional tools
of music educators.
Acknowledgments
Katie Overy thanks Eva Vendrei (in memoriam) for her inspirational teaching, Ittzés
Mihály (in memoriam) for his expert advice, and the International Kodály Society for their
2001 Sarolta Kodály scholarship to study at the Zoltán Kodály Pedagogical Institute of
Music, Hungary.
Notes
1. https://pingroof.com/diana-nyad-inspiring-more-than-one-generation/ Accessed September
20, 2017.
2. “Wife, School Lost in Quake, Violinist Vows to Rebuild,” from the NPR news program All
Things Considered (2010). http://www.npr.org/2010/01/23/122900781/wife-school-lost-in-
quake-violinist-vows-to-rebuild. Accessed September 20, 2017.
References
Adam, J. 1944. Módszeres Énektanítás a Relatív Szolmizáció Alapján (Systematic Singing
Teaching Based on the Tonic Sol-fa). Budapest: Editio Musica Budapest.
Alzahabi, R., and M. W. Becker. 2013. The Association between Media Multitasking, Task-
Switching, and Dual-Task Performance. Journal of Experimental Psychology: Human
Perception and Performance 39: 1485–1495.
Baddeley, A. D., and J. Andrade. 2000. Working Memory and the Vividness of Imagery.
Journal of Experimental Psychology: General 129: 126–145.
Baddeley, A. D., and R. H. Logie. 1992. Auditory Imagery and Working Memory. In Auditory
Imagery, edited by D. Reisberg, 179–197. Hillsdale, NJ: Erlbaum.
Bailes, F. A. 2006. The Use of Experience-Sampling Methods to Monitor Musical Imagery in
Everyday Life. Musicae Scientiae 10: 173–190.
Beaman, C. P., and T. I. Williams. 2010. Earworms (Stuck Song Syndrome): Towards a Natural
History of Intrusive Thoughts. British Journal of Psychology 101: 637–653.
Bentley, A. 1966. Measures of Musical Abilities, Manual. London: George A. Harap.
Bernardi, N. F., A. Schories, H.-C. Jabusch, B. Colombo, and E. Altenmueller. 2013. Mental
Practice in Music Memorisation: An Ecological-Empirical Study. Music Perception
30: 275–290.
Bjork, E. L., J. L. Little, and B. C. Storm. 2014. Multiple-Choice Testing as a Desirable Difficulty
in the Classroom. Journal of Applied Research in Memory and Cognition 3: 165–170.
Brodsky, W., Y. Kessler, B.-S. Rubinstein, J. Ginsborg, and A. Henik. 2008. The Mental Representation
of Music Notation: Notational Audiation. Journal of Experimental Psychology: Human
Perception and Performance 34: 427–445.
Clark, T., and A. Williamon. 2011. Evaluation of a Mental Skills Training Program for
Musicians. Journal of Applied Sport Psychology 23: 342–359.
Clift, S., and G. Hancox. 2010. The Significance of Choral Singing for Sustaining Psychological
Wellbeing: Findings from a Survey of Choristers in England, Australia and Germany. Music
Performance Research 3: 79–96.
Colley, I. D., P. E. Keller, and A. R. Halpern. 2017. Working Memory and Auditory Imagery
Predict Sensorimotor Synchronization with Expressively Timed Music. Quarterly Journal
of Experimental Psychology 71: 1781–1796. doi:10.1080/17470218.2017.1366531.
Curwen, J. 1854. An Account of the Tonic Sol-fa Method of Teaching to Sing. London: Tonic
Sol-fa Press.
Davidson-Kelly, K. 2014. Mental Imagery Rehearsal Strategies for Expert Pianists. PhD thesis,
University of Edinburgh, Scotland.
Davidson-Kelly, K., R. S. Schaefer, N. Moran, and K. Overy. 2015. “Total Inner Memory”:
Deliberate Uses of Multimodal Musical Imagery during Performance Preparation.
Psychomusicology: Music, Mind and Brain 25 (1): 83–92.
Ellis, R. J., A. C. Norton, K. Overy, E. Winner, D. C. Alsop, and G. Schlaug. 2012. Differentiating
Maturational and Training Influences on fMRI Activation during Music Processing.
NeuroImage 60 (3): 1902–1912.
Frieler, K., T. Fischinger, K. Schlemmer, K. Lothwesen, K. Jakubowski, and D. Müllensiefen.
2013. Absolute Memory for Pitch: A Comparative Replication of Levitin’s 1994 Study in
Six European Labs. Musicae Scientiae 17 (3): 334–349.
Gelding, R. W., W. F. Thompson, and B. W. Johnson. 2015. The Pitch Imagery Arrow Task:
Effects of Musical Training, Vividness, and Mental Control. PLoS One 10 (3): e0121809.
Gordon, E. E. 1979. Primary Measures of Music Audiation. Chicago: GIA Publications.
Gordon, E. E. 1982. Intermediate Measures of Music Audiation. Chicago: GIA Publications.
Gordon, E. E. 1987. The Nature, Description, Measurement and Evaluation of Musical Aptitude.
Chicago: GIA Publications.
Greenspon, E. B., P. Q. Pfordresher, and A. R. Halpern. 2017. Mental Transformations of
Melodies. Music Perception 34: 585–604.
Griffiths, T. D. 2000. Musical Hallucinosis in Acquired Deafness: Phenomenology and Brain
Substrate. Brain 123: 2065–2076.
Halpern, A. R. 1988a. Mental Scanning in Auditory Imagery for Songs. Journal of Experimental
Psychology: Learning, Memory, and Cognition 14: 434–443.
Halpern, A. R. 1989. Memory for the Absolute Pitch of Familiar Songs. Memory and Cognition
17: 572–581.
Halpern, A. R. 2015. Differences in Auditory Imagery Self Report Predict Behavioral and
Neural Outcomes. Psychomusicology: Music, Mind, and Brain 25: 37–47.
Halpern, A. R., and J. C. Bartlett. 2011. The Persistence of Musical Memories: A Descriptive
Study of Earworms. Music Perception 28: 425–431.
Halpern, A. R., and R. J. Zatorre. 1999. When That Tune Runs through Your Head: A PET
Investigation of Auditory Imagery for Familiar Melodies. Cerebral Cortex 9: 697–704.
Halpern, A. R., R. J. Zatorre, M. Bouffard, and J. A. Johnson. 2004. Behavioral and Neural
Correlates of Perceived and Imagined Musical Timbre. Neuropsychologia 42: 1281–1292.
Hamilton, K. 2008. After the Golden Age: Romantic Pianism and Modern Performance.
New York: Oxford University Press.
Herholz, S. C., A. R. Halpern, and R. J. Zatorre. 2012. Neuronal Correlates of Perception,
Imagery, and Memory for Familiar Tunes. Journal of Cognitive Neuroscience 24: 1382–1397.
Heyes, C. 2013. What Can Imitation Do for Cooperation? In Cooperation and Its Evolution,
edited by K. Sterelny, R. Joyce, B. Calcott, and B. Fraser. Cambridge, MA: MIT Press.
Highben, Z., and C. Palmer. 2004. Effects of Auditory and Motor Mental Practice in
Memorized Piano Performance. Bulletin of the Council for Research in Music Education
159: 58–65.
Hubbard, T. L. 2010. Auditory Imagery: Empirical Findings. Psychological Bulletin 136: 302–329.
Hyman, I. E., Jr., N. K. Burland, H. M. Duskin, M. C. Cook, C. M. Roy, J. C. McGrath et al.
2013. Going Gaga: Investigating, Creating, and Manipulating the Song Stuck in My Head.
Applied Cognitive Psychology 27: 204–215.
Ittzés, M. 2002. Zoltán Kodály: In Retrospect. Kecskemét, Hungary: Kodály Institute.
Jakubowski, K., N. Farrugia, A. R. Halpern, S. K. Sankarpandi, and L. Stewart. 2015. The Speed
of Our Mental Soundtracks: Tracking the Tempo of Involuntary Musical Imagery in
Everyday Life. Memory and Cognition 43: 1229–1242.
James, I., and I. B. Savage. 1984. Beneficial Effect of Nadolol on Anxiety-Induced Disturbances
of Performance in Musicians: A Comparison with Diazepam and Placebo; Proceedings of
a Symposium on the Increasing Clinical Value of Beta Blockers Focus on Nadolol. American
Heart Journal 108: 1150–1155.
Karageorghis, C. I., J. C. Hutchinson, L. Jones, H. L. Farmer, M. A. Ayhan, R. C. Wilson, et al.
2013. Psychological, Psychophysical, and Ergogenic Effects of Music in Swimming.
Psychology of Sport and Exercise 14: 560–568.
Keller, P. E. 2012. Mental Imagery in Music Performance: Underlying Mechanisms and
Potential Benefits. Annals of the New York Academy of Sciences 1252 (1): 206–213.
Kirschner, S., and M. Tomasello. 2010. Joint Music Making Promotes Prosocial Behavior in
4-Year-Old Children. Evolution and Human Behavior 31: 354–364.
Kodály, Z. 1960. Folk Music of Hungary. London: Barrie and Rockliff.
Kodály, Z. 1962. Bicinia Hungarica. London: Boosey and Hawkes.
Kodály, Z. 1974. The Selected Writings of Zoltán Kodály. London and New York: Boosey &
Hawkes.
Levitin, D. J., and P. R. Cook. 1996. Memory for Musical Tempo: Additional Evidence That
Auditory Memory Is Absolute. Perception and Psychophysics 58: 927–935.
Lima, C., N. Lavan, S. Evans, Z. Agnew, A. R. Halpern, P. Shanmugalingam, et al. 2015. Feel the
Noise: Relating Individual Differences in Auditory Imagery to the Structure and Function
of Sensorimotor Systems. Cerebral Cortex 25: 4638–4650. doi:10.1093/cercor/bhv134.
Lucas, B. J., E. Schubert, and A. R. Halpern. 2010. Perception of Emotion in Sounded and
Imagined Music. Music Perception 27: 399–412.
McEvenue, K. 2002. The Actor and the Alexander Technique. New York: Palgrave Macmillan.
McNorgan, C. 2012. A Meta-Analytic Review of Multisensory Imagery Identifies the Neural
Correlates of Modality-Specific and Modality-General Imagery. Frontiers in Human
Neuroscience 6: 285–295.
Meister, I. G., T. Krings, H. Foltys, B. Boroojerdi, M. Müller, R. Töpper, and A. Thron. 2004.
Playing Piano in the Mind—an fMRI Study on Music Imagery and Performance in Pianists.
Cognitive Brain Research 19: 219–228.
Müllensiefen, D., J. Fry, R. Jones, S. Jilka, L. Stewart, and V. Williamson. 2014. Individual
Differences Predict Patterns in Spontaneous Involuntary Musical Imagery. Music Perception
31 (4): 323–338. doi:10.1525/MP.2014.31.4.323.
Neisser, U. 1976. Cognition and Reality: Principles and Implications of Cognitive Psychology.
New York: Freeman.
Overy, K. 2012. Making Music in a Group: Synchronization and Shared Experience. Annals of
the New York Academy of Science 1252: 65–68.
Overy, K., and I. Molnar-Szakacs. 2009. Being Together in Time: Musical Experience and the
Mirror Neuron System. Music Perception 26: 489–504.
Overy, K., R. I. Nicolson, A. J. Fawcett, and E. F. Clarke. 2003. Dyslexia and Music: Measuring
Musical Timing Skills. Dyslexia 9: 18–36.
Overy, K., A. Norton, K. Cronin, E. Winner, and G. Schlaug. 2005. Examining Rhythm and
Melody Processing in Young Children using fMRI. Annals of the New York Academy of
Science 1060: 210–218.
Pfordresher, P. Q., and A. R. Halpern. 2013. Auditory Imagery and the Poor-Pitch Singer.
Psychonomic Bulletin and Review 20: 747–753.
Rosety-Rodriguez, M., F. J. Ordonez, and J. Farias. 2003. The Influence of the Active Range
of Movement of Pianists’ Wrists on Repetitive Strain Injury. European Journal of Anatomy
7: 75–77.
Schaefer, R. S. 2014. Auditory Rhythmic Cueing in Movement Rehabilitation: Findings and
Possible Mechanisms. Philosophical Transactions of the Royal Society B 369: 20130402.
Schellenberg, E. G., and S. E. Trehub. 2003. Good Pitch Memory Is Widespread. Psychological
Science 14: 262–266.
Seashore, C. E. 1919. Seashore Measures of Musical Talent. New York: Columbia Phonograph
Company.
Seashore, C. E. 1938. Psychology of Music. New York: McGraw Hill.
Szőnyi, E. 1974. Musical Reading and Writing. Vol.1. Budapest: Editio Musica Budapest.
Trusheim, W. H. 1991. Audiation and Mental Imagery: Implications for Artistic Performance.
Quarterly Journal of Music Teaching and Learning 2: 138–147.
Williamson, V. J., and S. R. Jilka. 2013. Experiencing Earworms: An Interview Study of
Involuntary Musical Imagery. Psychology of Music 42: 653–670. doi:10.1177/0305735613483848.
Wing, H. D. 1970. Standardised Tests of Musical Intelligence. Windsor: NFER-Nelson Publishing.
Wöllner, C., and A. R. Halpern. 2016. Attentional Flexibility and Memory Capacity in
Conductors and Pianists. Attention, Perception, and Psychophysics 78: 198–208. doi:10.3758/
s13414-015-0989-z.
Zatorre, R. J., and A. R. Halpern. 1993. Effect of Unilateral Temporal-Lobe Excision on
Perception and Imagery of Songs. Neuropsychologia 31: 221–232.
Zatorre, R. J., A. R. Halpern, D. W. Perry, E. Meyer, and A. C. Evans. 1996. Hearing in the
Mind’s Ear: A PET Investigation of Musical Imagery and Perception. Journal of Cognitive
Neuroscience 8: 29–46.
Zatorre, R. J., and A. R. Halpern. 2005. Mental Concerts: Musical Imagery and the Auditory
Cortex. Neuron 47: 9–12.
Zatorre, R. J., A. R. Halpern, and M. Bouffard. 2010. Mental Reversal of Imagined Melodies:
A Role for the Posterior Parietal Cortex. Journal of Cognitive Neuroscience 22: 775–789.
chapter 20
A Different Way of Imagining Sound
Probing the Inner Auditory Worlds of Some Children on the Autism Spectrum
Adam Ockelford
Introduction
Imagine that you are walking along a road at night when you hear a sound. On the
one hand, you might pay attention to its pitch and loudness and the ways they
change with time. You might attend to the sound’s timbre, whether it is rough or
smooth, bright or dull. . . . These are all examples of musical listening, in which the
perceptual dimensions and attributes of concern have to do with the sound itself,
and are those used in the creation of music. . . . On the other hand, as you stand there
in the road, it is likely that you will not listen to the sound itself at all. Instead, you
are likely to notice that the sound is made by an automobile with a large and power-
ful engine. Your attention is likely to be drawn to the fact that it is approaching
quickly from behind. And you might even attend to the environment, hearing that
the road you are on is actually a narrow alley, with echoing walls on each side. This
is an example of everyday listening, the experience of listening to events rather
than sounds.
So writes William Gaver (1993, 1), in relation to his “ecological” analysis of hearing, in
which he sets out how, for most listeners, in everyday contexts, the function of sounds
in auditory perception is privileged over their acoustic properties. However, there are
people for whom this does not appear to be the case, including a significant minority
of children who are on the autism spectrum. Children with autism typically have chal-
lenges with social interaction and communication, and tend to exhibit a narrow—even
obsessive—focus on particular activities that are often characterized by pattern and
predictability. The perceptual qualities of objects may be more important than their
function, and, in the auditory domain, parents often report a fascination with sound,
apparently for its own sake:
“My son Jack is obsessed with the beeping sound of the microwave when its cooking
cycle comes to an end. He can’t bear to leave the kitchen till it’s stopped. And just
lately, he’s become very interested in the whirr of the tumble-drier too.”
“My four-year old daughter just repeats what I say. For a long time, she didn’t speak
at all, but now, the educational psychologist tells me, she’s ‘echolalic.’ I say, ‘Hello,
Anna,’ and she says ‘Hello, Anna’ back. I ask ‘Do you want to play with your toys’
and she just replies ‘Play with your toys,’ though I don’t think she really knows what
I mean.”
“Ben wants to listen to the jingles that he downloads from the internet all the time.
And I mean, the whole time—16 hours a day if we let him. He doesn’t even play
them all the way through: sometimes just the first couple of seconds of a clip, over
and over again. He must have heard them thousands of times. But he never seems to
get bored.”
“Callum puts his hands over his ears and starts rocking and humming to himself
when my mobile goes off, but totally ignores the ringtone on my husband’s phone,
which is much louder.”
“My ten-year-old son Freddie constantly flicks any glasses, bowls, pots or pans that
are within reach. The other day, he emptied out the dresser— and even brought in
half a dozen flowerpots from the garden—and lined everything up on the floor.
Then he sat and ‘played’ his new instrument for hours. I couldn’t see a pattern in
what he’d done, but if I moved anything when he wasn’t looking, he’d notice straight
away, and move it back again.”
“Every now and then, Romy only pretends to play the notes on her keyboard—
touching the keys with her fingers but not actually pressing them down. And some-
times, she introduces everyday sounds that she hears into her improvising. For
example, she plays the complicated descending harmonic sound of the aeroplanes
coming into land at Heathrow as chords, and somehow integrates them into the
music she is playing.”
“Omur repeatedly bangs away at particular notes on his piano (mainly ‘B’ and ‘F
sharp,’ high up in the right hand), sometimes persisting until the string or the ham-
mer breaks.”
“Derek (who is blind) copies the sounds of the page turns in his own rendition of a
Chopin waltz that his piano teacher played for him by tapping his fingers on the
music rack above the keyboard.” (Ockelford 2013)
Why should this be the case? What causes some autistic children to hear sounds in this
way? And what impact, if any, does this idiosyncratic style of auditory perception have
on the way that they perceive, remember, and imagine music? These are the questions
that lie at the heart of this chapter.
An Ecological Model of
Auditory Perception
Early in life, “neurotypical” human infants learn to differentiate auditory input
according to one of three functions that it can fulfill. This results in the development of
“everyday” listening, which, as Gaver observes, is concerned with attending to events
such as a car passing by or a door slamming; “musical” listening, which focuses on
perceptual qualities such as pitch and loudness; and “linguistic” listening, which is ulti-
mately based on the perception and cognition of speech sounds. The separation of music
and language perception ties in with evidence from neuroscience, which suggests that,
while the two domains share some neurological resources, they also have dedicated pro-
cessing pathways (Patel 2012) that are distinct from those activated by environmental
sounds (Norman-Haignere, Kanwisher, and McDermott 2015).
It is not known just how these three types of auditory processing—relating to everyday
sounds, music, and speech—become defined in the brain’s architecture following the
initial development of hearing around three to four months before birth (Lecanuet 1996).
There is currently some debate as to which develops first, although there is increasing
evidence that musical hearing and ability are essential to language acquisition (Brandt,
Gebrian, and Slevc 2012). My own work (e.g., Ockelford 2017) supports this view. My
theory of what makes music “music”—“zygonic” theory (Ockelford 2005)—contends
that, for music to exist in the mind, there must be perceived imitation of one feature of a
sound by another, and the fact that, from an early age, babies do copy
vocal sounds and relish being copied long before they can use or understand words sug-
gests that music is indeed a precursor of language (Voyajolu and Ockelford 2016). In any
case, singing and speech appear to follow discrete developmental paths from around
the beginning of the second year of life (Lecanuet 1996). We can surmise that the other
category—“everyday” sounds—must perceptually be the most primitive of all, since it
appears to require less cognitive processing than either music or speech. And in phylo-
genetic terms (in our development as a species), the capacities to process music and then
language are thought to be relatively recent specialisms of the auditory system (see, e.g.,
Masataka 2007). Hence it seems reasonable to assume that, early on in “typical” human
development, the brain treats all sound in the same way and that music processing starts
to emerge first, followed by language. We can speculate that the residue that is left
remains as “everyday” sounds. Hence the ecological model of auditory perception can
be represented as follows, in which it is assumed that, as well as their shared neural
resources, music and language come to have additional, distinct neural correlates
during the first postnatal year (Figure 20.1). Clearly, since the precise nature of
the sounds that constitute speech or music, and the relationship between them, varies
somewhat from one culture to another, the model should be regarded as indicative
rather than absolute.
Figure 20.1 The emerging streams of music and language processing in auditory development.
But what of children on the autism spectrum? As the parents’ descriptions suggest, it
seems that certain sounds, especially those that are particularly salient or pleasing to an
individual, such as the whirring of the tumble drier, acquire little or no functional
significance for some children. Instead, they tend to be processed only in terms of their
sounding qualities—that is, in musical terms. It seems also that everyday sounds that
involve repetition or regularity (such as the beeping of a microwave) may be processed
in music-structural terms. This would imply that the children hear the repetition that is
actually generated mechanically or electronically as being imitative (Figure 20.2).
There is, of course, another possibility that we should acknowledge: that the autistic
children who are preoccupied with the sounding qualities of certain everyday objects
and the repetitive patterns that some of them make do not actually hear them in a musical
way—that is, as being derived from one another through imitation—but purely as regu-
larities in the environment. Furthermore, it could be that those same children do not
hear music as “music” either, but merely as patterned sequences of sounds, to which no
sense of human agency is transferred. Why should this be the case? Perhaps because
such children did not engage in the early vocal interactions with carers—“communicative
musicality” (Malloch and Trevarthen 2009)—that I have suggested may embed a sense
of imitation in sounds that are repeated (Ockelford 2017).
Figure 20.2 Some everyday sounds might be processed as music among children on the autism spectrum.
However, the accounts of
Romy reproducing the whines of jet engines of airplanes coming in to land and integrat-
ing them into her improvisation at the piano, of Derek evidently regarding the rustle of a
page turn as part of a Chopin waltz, and of Freddie appropriating everyday sound-makers
(flower pots) to be used as musical instruments, suggest that some autistic children, at
least, do perceive everyday sounds in a musical way.
It may well be that this tendency is reinforced by the prevalence of music in the lives
of young children (Lamont 2008); in the developed world, they are typically surrounded
by electronic games and gadgets, toys, mobile phones, mp3 players, computers, iPads,
TVs, radios, and so on, all of which emanate music to a greater or lesser extent. In the
wider environment too—in restaurants, cafés, shops, cinemas and waiting rooms, cars
and airplanes, and at many religious gatherings and other public ceremonies— music is
ubiquitous. So, given that children are inundated with nonfunctional (musical) sounds
designed, in one way or another, to influence emotional states and behavior, perhaps we
should not be surprised that the sounds with which they often co-occur, sounds that to
neurotypical ears are functional, should come to be processed in the same way.
The manner in which some autistic children perceive the world can have other conse-
quences too. For example, the development of language can be affected, resulting in,
among other things, “echolalia”—a distinctive form of speech widely reported among
blind and autistic children (Mills 1993; Sterponi and Shankey 2013), which was
originally defined as the meaningless repetition of words or phrases (Fay 1967, 1973).
However, it appears that echolalia actually fulfills a range of functions in verbal interaction
(Prizant 1979), including turn-taking and affirmation, and often finds a place in
noninteractive contexts too, serving as a self-reflective commentary or rehearsal strategy
(Prizant and Duchan 1981; McEvoy, Loveland, and Landry 1988). Given the hypothesis
that imitation lies at the heart of musical structure (Ockelford 2012), it could be argued
that one cause of echolalia is the organization of language (in the absence of semantics
and syntax) through the structure (repetition) that is present in all music. It is as though
words become musical objects in their own right, to be manipulated not according to
their meaning or grammatical function, but purely through their sounding qualities.
This implies a further modification to the ecological model of auditory development
(see Figure 20.3).
It is of interest to note that echolalia is not restricted to certain exceptional groups
who exist on one extreme of the multidimensional continuum that makes up human
neurodiversity; it is a feature of “typical” language acquisition in young children
(Mcglone-Dorrian and Potter 1984) when, it seems, the urge to imitate what they hear
outstrips semantic understanding. This would accord with a stage in the ecological model
of auditory development when the two strands of communication through sound—
language and music—are not cognitively distinct, and would support the notion that
musical development precedes the onset of language.
For children on the autism spectrum, it is worth noting that music itself can become
“superstructured” with additional repetition, as the account of Ben, for example, shows;
it is common for such children to play snippets of music (or videos
with music) over and over again. It is as though music’s already high proportion of
repetition, which is at least 80 percent (Ockelford 2005), is insufficient for a mind
ravenous for structure, and so it creates even more. From speaking to autistic adults who
are able to verbalize why (as children) they would repeat musical excerpts in this way,
it appears that the main reason (apart from the sheer enjoyment of hearing a particularly
fascinating series of sounds repeatedly) is that they could hear more and more in the
sequence concerned as they listened to it again and again. Bearing in mind that most music is, as
we have seen, highly complex, with many events occurring simultaneously (and given
that even single notes generally comprise many pitches in the form of harmonics), to the
child with finely tuned auditory perception, there is in fact a plethora of different things
to attend to in even a few seconds of music, and an even greater number of relationships
between sounds to fathom. So, for example, while listening to a passage for orchestra one
hundred times may be extremely tedious to the “neurotypical” ear, which can detect only
half a dozen composite events, each fused in perception, to the mind of the autistic child,
which can break down the sequence into a dozen different melodic lines, the stimulus
may be rich and riveting.
Figure 20.3 Speech might also be processed in musical terms by some children on the autism spectrum.
Moreover, there tends to be far more structure in a piece of music than would
theoretically be required for it to make sense (Ockelford 2017). Compositions are, by any
standards, overengineered, typically with levels of repetition of 80 percent or more
(Ockelford 2005). In terms of information theory, they are highly redundant. Why should
this be the case? Perhaps because, traditionally, composers have been aware that they
need to design pieces in such a way that their message will still come across in the
suboptimal circumstances that will inevitably characterize most performances. For example,
different interpretations may unexpectedly foreground some features of a work at the
expense of others. The acoustics in which a concert takes place may be less than ideal.
Listeners’ concentration may wander. For the child on the autism spectrum, though,
attending to the same short passage for the nth time, this redundancy means that
continuing to listen remains a worthwhile venture; there are still new connections between notes to
be unearthed.
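One crude way to make this redundancy tangible is to count how many notes of a tune belong to a phrase that is heard more than once; the sketch below does this for a toy note string. It is emphatically not Ockelford's zygonic measure, and the three-note phrase length is an arbitrary assumption.

```python
# Toy illustration of musical redundancy: the share of notes that belong
# to some phrase (here, of length >= 3) occurring more than once.
# This is a hypothetical stand-in, not Ockelford's zygonic analysis.

from collections import Counter

def repeated_coverage(notes, n=3):
    """Fraction of notes covered by an n-gram that appears at least twice."""
    grams = Counter(tuple(notes[i:i + n]) for i in range(len(notes) - n + 1))
    covered = set()
    for i in range(len(notes) - n + 1):
        if grams[tuple(notes[i:i + n])] > 1:
            covered.update(range(i, i + n))
    return len(covered) / len(notes)

# "Twinkle, Twinkle": the opening phrase returns at the end and the middle
# phrase is sung twice, so almost every note lies in repeated material.
twinkle = "CCGGAAG FFEEDDC GGFFEED GGFFEED CCGGAAG FFEEDDC".replace(" ", "")
print(f"{repeated_coverage(list(twinkle)):.0%} of notes lie in repeated phrases")
```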
Absolute Pitch
It seems that one of the consequences of an early preoccupation with the “musical”
qualities of sounds is the development of “absolute pitch” (AP)—the capacity to identify
or produce pitches in isolation from others. In the West’s population as a whole, this
ability is extremely rare, with an estimated prevalence of 1 in 10,000 (Takeuchi and
Hulse 1993). However, among those on the autism spectrum, the position is very dif-
ferent; recent estimates, derived from parental questionnaires, vary between 8 percent
(N = 118; Vamvakari 2013) and 21 percent (N = 305; Reese 2014). These figures are
broadly supported by DePape, Hall, Tillmann, and Trainor (2012) who, in a study of
twenty-seven high-functioning adolescents with autism spectrum condition, found
that three of them (11 percent) had AP. It is very unusual to find such high orders of
difference in the incidence of a perceptual ability between different subgroups of the
human population and, evidently, there is something distinct in the way that the parts
of the brain responsible for pitch memory wire themselves up in a significant minority
of autistic children.
While AP is a useful (though inessential) skill in “neurotypical” musicians—including
those performing at the highest level—it appears to be an indispensable factor in the
development of music performance skills in autistic children with learning difficulties—
so-called “savants” (Miller 1989). It appears to be this unusual ability that motivates and
enables some young children with a limited understanding of the world, from the age of
twenty-four months or so, to pick out tunes and harmonies on instruments that they may
encounter at home or elsewhere—typically the keyboard or piano. This may well occur
with no adult intervention (or, indeed, awareness). It seems that AP has this impact
since each pitch sounds distinct, potentially eliciting a powerful emotional response, so
being able to reproduce these at will must surely be an electrifying experience. But more
than this, AP makes learning to play by ear manageable, in a way that “relative pitch”—
the capacity to process melodic and harmonic intervals—does not. To understand why,
consider a typical playground chant that children use to taunt one another (Figure 20.4).
In “neurotypical” individuals, motifs such as this are likely to be encoded in the mind,
stored and retrieved principally as a series of differences between notes (although “fuzzy”
absolute memories will exist—a child would know if the chant were an octave too high,
for example). However, for children with AP, the position is quite different, since they have
the capacity to capture the pitch data from music directly, rather than as a series of intervals.
Hence, in seeking to remember and repeat groups of notes over significant periods of
time, they have certain processing advantages over their “neurotypical” peers, who
extract and store information at a higher level of abstraction, and thereby lose the “surface
detail.” (Note that there are disadvantages to “absolute” representations of pitch too since,
on their own, they cannot take advantage of the patterns that exist through the repetition
of intervals and they make greater demands on memory. However, as there appears to
be, to all intents and purposes, no limit on the brain’s long-term storage capacity, this is
not a serious problem; indeed, having an exceptional memory is something that is com-
mon to many children with autism.)
In my view, it is this capacity for “absolute pitch data capture” that explains why chil-
dren with AP who are on the autism spectrum and have learning difficulties are able to
develop instrumental skills at an early age with no formal tuition since, for them, repro-
ducing groups of notes that they have heard is merely a question of remembering a series
of one-to-one mappings between given pitches as they sound and (typically) the keys on
a keyboard that produce them. These relationships are invariant; once learned, they can
service a lifetime of music making, through which they are constantly reinforced. On
the other hand, were a child with “relative pitch” to try to play by ear, he or she would
have to become proficient in the far more complicated process of calculating how the
intervals that are perceived map onto the distances between keys, which, due to the
asymmetries of the keyboard, are likely to differ according to what would necessarily be
an arbitrary starting point.
For example, the interval that exists between the first two notes of the playground
chant (a minor 3rd) shown in Figure 20.5 can be produced through no fewer than twelve
distinct key combinations, comprising one of four underlying patterns. Moreover, the
complexity of the situation is compounded by the fact that virtually the same physical
leap between other keys may sound different (a major 3rd) according to its position on
the keyboard.
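A short sketch can make this asymmetry concrete: enumerating the minor third starting on each of the twelve keys and grouping the pairs by key color reproduces the twelve combinations and four underlying shapes mentioned above (the code is purely illustrative, and the names are my own).

```python
# Hypothetical illustration of why "relative" playing by ear is harder than
# an "absolute" one-to-one pitch-to-key mapping: the same interval (a minor
# third, three semitones) takes several physical shapes on the keyboard.

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
BLACK_KEYS = {"C#", "D#", "F#", "G#", "A#"}

def key_color(name):
    return "black" if name in BLACK_KEYS else "white"

def minor_third_shapes():
    """Group every minor-third key pair (one per starting key) by its color pattern."""
    shapes = {}
    for i, lower in enumerate(PITCH_CLASSES):
        upper = PITCH_CLASSES[(i + 3) % 12]          # three semitones higher
        pattern = (key_color(lower), key_color(upper))
        shapes.setdefault(pattern, []).append((lower, upper))
    return shapes

if __name__ == "__main__":
    shapes = minor_third_shapes()
    total = sum(len(pairs) for pairs in shapes.values())
    print(f"{total} key combinations in {len(shapes)} underlying patterns")
    for pattern, pairs in shapes.items():
        print(pattern, pairs)
    # Prints 12 key combinations in 4 patterns: white-white, white-black,
    # black-white, and black-black.
```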
That is not to say that children with AP who learn to play by ear do not rapidly
develop the skills to play melodies beginning on different notes too, and it is not unu-
sual for them to learn to reproduce pieces fluently in every key. This may appear contra-
dictory, in the light of the processing advantage conferred by being able to encode
pitches as perceptual identities in their own right, each of which, as we have seen,
maps uniquely onto a particular note on the keyboard. However, the reality of almost
all pieces of music is that melodic (and harmonic) motifs variously appear at differ-
ent pitches through transposition and so, to make sense of music, young children
with AP need to learn to process pitch relatively as well as absolutely (Stalinski and
Schellenberg 2010).
Figure 20.5 The different mechanisms involved in playing by ear using “absolute” and “relative” pitch abilities.
What is the day-to-day impact of AP on children with learning difficulties who are on
the autism spectrum likely to be? The answer is: as varied as the children are themselves.
Elsewhere, I have written at length about the extraordinary life of Derek Paravicini
(Ockelford 2009) who is what Treffert (2009) calls a “prodigious” musical savant. It is
simply not possible to imagine Derek without his piano playing, in which the way he
thinks, the way he feels, and the way he relates to other people are embodied. But there are
many other children on the autism spectrum with whom I have worked over many years
and who are no less exceptional in their different ways and no less enlightening as to how
musical sounds can be remembered and imagined.
Figure 20.6 Romy and Adam share a musical joke (image © 2010 Evangelos Himonides).
In this context, here are two accounts of children whom I have worked with every
week for a number of years. They are taken from blogs that were designed to raise aware-
ness of autism and musicality and to stir the debate on the relationship between so-called
disability and ability. The children and their parents visit me in a large practice room at
the University of Roehampton where I am based. There are two pianos, to avoid potential
difficulties over personal space. A number of the children rarely say a word. Some, like
Romy, are entirely nonverbal. She converses through her playing, showing what piece
she would like next, and indicating when she has had enough. On occasions, she will
tease me by apparently suggesting one thing when she means another. In this way, jokes
are shared and, sometimes, feelings of sadness too. For Romy, music truly functions as a
proxy language (Figure 20.6).
When we started working together, six years ago, mistakes and misunderstandings
occurred all too frequently since, as it turned out, there were very few pieces that
Romy would tolerate: for example, the theme from Für Elise (never the middle
section); the Habanera from Carmen; and some snippets from “Buckaroo Holiday”
(the first movement of Aaron Copland’s Rodeo). Romy’s acute neophobia meant that
even one note of a different piece would evoke shrieks of fear-cum-anger, and the
session could easily grow into an emotional conflagration.
So gradually, gradually, over weeks, then months, and then years, I introduced new
pieces—sometimes, quite literally, at the rate of one note per session. On occasion,
if things were difficult, I would even take a step back before trying to move on again
the next time. And, imperceptibly at first, Romy’s fears started to melt away. The
theme from Brahms’s Haydn Variations became something of an obsession, fol-
lowed by the slow movement of Beethoven’s Pathetique sonata. Then it was Joplin’s
The Entertainer, and Rocking All Over the World by Status Quo.
Over the six years, Romy’s jigsaw box of musical pieces—fragments ranging from just
a few seconds to a minute or so in length—has filled up at an ever-increasing rate.
Now it’s overflowing, and it’s difficult to keep up with Romy’s mercurial musical mind;
mixing and matching ideas in our improvised sessions, and even changing melodies
and harmonies so they mesh together, or to ensure that my contributions don’t!
As we play, new pictures in sound emerge and then retreat as a kaleidoscope of ideas
whirls between us. Sometimes a single melody persists for fifteen minutes, even half
an hour. For Romy, no matter how often it is repeated, a fragment of music seems to
stay fresh and vibrant. At other times, it sounds as though she is trying to play sev-
eral pieces at the same time—she just can’t get them out quickly enough, and a ver-
itable nest of earworms wriggle their way onto the piano keyboard. Vainly I attempt
to herd them into a common direction of musical travel.
So here I am, sitting at the piano in Roehampton, on a Sunday morning in mid-
November, waiting for Romy to join me (not to be there when she arrives is asking
for trouble). I’m limbering up with a rather sedate rendition of the opening of Chopin’s
Etude in C major, Op. 10, No. 1 when I hear her coming down the corridor, vocal-
izing with increasing fervor. I feel the tension rising, and as her father pushes open
the door, she breaks away from him, rushes over to the piano and, with a shriek and
an extraordinarily agile sweep of her arm, elbows my right hand out of the way at
the precise moment that I was going to hit the D an octave above middle C. She
usurps this note to her own ends, ushering in her favorite Brahms-Haydn theme.
Instantly, Romy smiles, relaxes and gives me the choice of moving out of the way or
having my lap appropriated as an unwilling cushion on the piano stool. I choose the
former, sliding to my left onto a chair that I’d placed earlier in readiness for the move
that I knew I would have to make.
I join in the Brahms, and encourage her to use her left hand to add a bass line. She
tolerates this up to the end of the first section of the theme, but in her mind she’s
already moved on, and without a break in the sound, Romy steps onto the set of
A Little Night Music, gently noodling around the introduction to Send in the Clowns.
But it’s in the wrong key—G instead of E flat—which I know from experience means
that she doesn’t really want us to go into the Sondheim classic, but instead wants me
to play the first four bars (and only the first four bars) of Schumann’s Kleine Studie
Op. 68, No. 14. Trying to perform the fifth bar would, in any case, be futile since Romy’s
already started to play . . . now, is it I am Sailing or O Freedom? The opening ascent
from D through E to G could signal either of those possibilities. Almost tentatively,
Romy presses those three notes down and then looks at me and smiles, waiting, and
knowing that whichever option I choose will be the wrong one. I just shake my head
at her and plump for O Freedom, but sure enough Rod Stewart shoves the Spiritual
out of the way before it has time to draw a second breath.
From there, Romy shifts up a gear to the Canon in D—or is it really Pachelbel’s
masterpiece? With a deft flick of her little finger up to a high A, she seems to suggest
that she wants Streets of London instead (which uses the same harmonies). I opt for
Ralph McTell, but another flick, this time aimed partly at me as well as the keys, shows
that Romy actually wants Beethoven’s Pathetique theme—but again, in the wrong
key (D). Obediently I start to play, but Romy takes us almost immediately to A flat (the
tonality that Beethoven originally intended). As soon as I’m there, though, Romy
races back up the keyboard again, returning to Pachelbel’s domain. Before I’ve had
time to catch up, though, she’s transformed the music once more; now we’re hearing
the famous theme from Dvorak’s New World Symphony.
I pause to recover my thoughts, but Romy is impatiently waiting for me to begin
the accompaniment. Two or three minutes into the session, and we’ve already touched
on twelve pieces spanning 300 years of Western music and an emotional range to
match. Yet, here is a girl who in everyday life is supposed to have no “theory of
mind”—the capacity to put yourself in other people’s shoes and think what they are
thinking. Here is someone who is supposed to lack the ability to communicate. Here
is someone who functions, apparently, at an 18-month level. But I say here is a joyous
musician who amazes all who hear her. Here is a girl in whom extreme ability and
disability coexist in the most extraordinary way. Here is someone who can reach out
through music and touch one’s emotions in a profound way. If music is important to
us all, for Romy it is truly her lifeblood.1
How did Romy, severely learning disabled, become such a talented, if idiosyncratic,
musician? In my view, it was her early inability to process language, in tandem with her
inability to grasp the portent of many everyday sounds, that enhanced her ability to
process all sounds in a musical way. The two were inextricably linked. Indeed, without
the former, we can surmise that the latter would never have developed.
Romy has AP, meaning that for her, as we have seen, her mental images of musical
sounds are distinct with regard to pitch. Hence, every note on the piano is instantly recog-
nizable. But more than this, for Romy, each pitch provides a stable point of reference in
a capricious world. And it’s not just notes on the piano that function for Romy in this
way. In her mind, each of the notes in any piece of music sounds distinct. While, for most
of us, musical sounds pass by unremarkably in perceptual terms, for Romy, different
notes, different chords, can affect her profoundly: an E flat major harmony can make her
quiver with excitement, for example, while G7 can make her cry.
In itself, though, absolute pitch is insufficient to make an exceptional musician; that
takes at least seven thousand hours of practice (Sloboda et al. 1996). How, then, did Romy
acquire her musical skills? Like many autistic children early in life, she developed an
obsession. In her case this was a small electronic keyboard, whose notes lit up in the
sequence needed to play one of a number of simple tunes. As far as Romy was con-
cerned, this musical toy was one of only a few things with which she could meaningfully
interact, and whose logic she could understand, and she spent hundreds of hours play-
ing with it. The keyboard was comfortingly predictable in comparison to any human
being—even her devoted family, whose language and behavior differed subtly from one
occasion to another, as all interactive engagement does. The keyboard, though, invaria-
bly responded to Romy in the same way. Whenever she pressed a particular key, it always
sounded the same as it did before. Here was something in the environment that Romy
could predict and control.
And so, through countless hours of self-directed exploration as a toddler, Romy discov-
ered where all the notes (whose sounds she could hear in her head) are on the keyboard.
Today, as a teenager, for Romy to play the piano merely requires her to hear a tune in her
head (available to her through the internal library of songs, stored as series of absolute
auditory images) and play along with it, pressing down the correct keys in sequence as
their pitches sound in her head. And this approach works not only for music. As we noted
earlier, she will reproduce the sounds of the jet engines of planes as they descend toward
Heathrow Airport, for example, and she unhesitatingly copies any ringtones that inter-
rupt her piano lessons.
Absolute pitch can have other consequences for children on the autism spectrum too.
The absolute representation of sounds in their heads appears to fuel musical imagination
in a way that is more vivid, more visceral even, than the relative memory of intervals alone.
And, although formal research is yet to be undertaken, the anecdotal accounts of par-
ents and teachers suggest that earworms are widespread, evidenced most obviously in
some children’s incessant vocalizing of melodic fragments. With minds full of tunes that
seem to be playing the whole time, external sounds can be at best superfluous and at
worst an irritation, as the following account of a session with Freddie, then eleven years
old, shows (Figure 20.7).
Figure 20.7 Freddie picks out a note on the piano (image © 2012 The University of Roehampton).
And then, spontaneously, he was off up the keyboard, beginning the same pentatonic
pattern on each of the twelve available keys. At my prompting, Freddie re-ran the
sequence with his left hand—his unbroken voice hoarsely whispering the low notes.
So logical. Why bother to play the notes if you know what they sound like already?
So apparently simple a task, and yet . . . such a difficult feat to accomplish: the whole
contradiction of autism crystallized in a few moments of music making.2
Conclusion
In this chapter, we have seen how some children on the autism spectrum appear to have
aural imaginations that are rooted in processing a range of everyday sounds and even
speech in a musical way. The way they perceive, remember, and imagine sounds has a
high level of intensity born of their sense of AP. This enables them to play by ear—a skill
that is often acquired entirely through their own efforts and that typically first manifests
itself in the early years. But more than this, for Freddie, for Romy, and for many other
children on the autism spectrum, music may be the key not only to aesthetic fulfillment,
but also to communication, shared attention, and emotional understanding. It can do this
because it is a language built not on symbolic meaning but on repetition; on order and
on predictability in the domain of sound. With musically empathetic adults with whom
to interact, this love of pattern—insistence, even—need not restrict the children’s auditory
imaginations but can emancipate them, through the capacity to understand musical
structure and the rules of the generative grammars through which melodies, harmonic
sequences, and rhythms are created afresh.
Notes
1. http://blog.oup.com/2012/12/music-proxy-language-autisic-children. Accessed September
15, 2017.
2. http://www.huffingtonpost.com/adam-ockelford/autism-genius_b_4118805.html.
Accessed September 15, 2017.
References
Brandt, A., M. Gebrian, and L. R. Slevc. 2012. Music and Early Language Acquisition. Frontiers
in Psychology 3. doi:10.3389/fpsyg.2012.00327.
DePape, A.-M. R., G. B. C. Hall, B. Tillmann, and L. J. Trainor. 2012. Auditory Processing in
High-Functioning Adolescents with Autism Spectrum Disorder. PLoS One 7 (9): e44084.
doi:10.1371/journal.pone.0044084.
Fay, W. H. 1967. Childhood Echolalia. Folia Phoniatrica et Logopaedica 19 (4): 297–306.
doi:10.1159/000263153.
Fay, W. H. 1973. On the Echolalia of the Blind and of the Autistic Child. Journal of Speech and
Hearing Disorder 38 (4): 478. doi:10.1044/jshd.3804.478.
Gaver, W. W. 1993. What in the World Do We Hear? An Ecological Approach to Auditory
Event Perception. Ecological Psychology 5 (1): 1–29. doi:10.1207/s15326969eco0501_1.
Lamont, A. 2008. Young Children’s Musical Worlds: Musical Engagement in 3.5-Year-Olds.
Journal of Early Childhood Research 6 (3): 247–261. doi:10.1177/1476718x08094449.
Lecanuet, J.-P. 1996. Prenatal Auditory Experience. In Musical Beginnings, 3–34. Oxford: Oxford
University Press.
Malloch, S., and C. Trevarthen, eds. 2009. Communicative Musicality: Exploring the Basis of
Human Companionship. New York, NY: Oxford University Press.
Masataka, N. 2007. Music, Evolution and Language. Developmental Science 10 (1): 35–39.
McEvoy, R. E., K. A. Loveland, and S. H. Landry. 1988. The Functions of Immediate Echolalia
in Autistic Children: A Developmental Perspective. Journal of Autism and Developmental
Disorders 18 (4): 657–668. doi:10.1007/bf02211883.
Mcglone-Dorrian, D., and R. E. Potter. 1984. The Occurrence of Echolalia in Three Year Olds’
Responses to Various Question Types. Communication Disorders Quarterly 7 (2): 38–47.
doi:10.1177/152574018400700204.
Miller, L. 1989. Musical Savants: Exceptional Skill and Mental Retardation. Hillsdale, NJ: Lawrence
Erlbaum.
Mills, A. 1993. Visual Handicap. In Language Development in Exceptional Circumstances, edited
by D. Bishop and K. Mogford, 150–164. Hove: Psychology Press.
Norman-Haignere, S., N. G. Kanwisher, and J. H. McDermott. 2015. Distinct Cortical Pathways
for Music and Speech Revealed by Hypothesis-Free Voxel Decomposition. Neuron 88 (6):
1281–1296. doi:10.1016/j.neuron.2015.11.035.
Ockelford, A. 2005. Repetition in Music: Theoretical and Metatheoretical Perspectives. Farnham:
Ashgate.
Ockelford, A. 2009. In the Key of Genius: The Extraordinary Life of Derek Paravicini. London:
Random House.
Ockelford, A. 2012. Music, Language and Autism. London: Jessica Kingsley.
Ockelford, A. 2013. Applied Musicology: Using Zygonic Theory to Inform Music Education,
Therapy, and Psychology Research. New York, NY: Oxford University Press.
Ockelford, A. 2017. Comparing Notes: How We Make Sense of Music. London: Profile Books.
Patel, A. D. 2012. Language, Music, and the Brain: A Resource-Sharing Framework. In
Language and Music as Cognitive Systems, edited by P. Rebuschat, M. Rohmeier,
J. A. Hawkins, and I. Cross, 204–223. Oxford: Oxford University Press.
Prizant, B. 1979. An Analysis of the Functions of Immediate Echolalia in Autistic Children.
Dissertation Abstracts International 39 (9-B): 4592–4593.
chapter 21
Multimodal Imagery in the Receptive Music Therapy Model Guided Imagery and Music (GIM)
Lars Ole Bonde
Introduction
Music is a “technology of the self,” as Tia DeNora (2000) concluded in her pioneering
study of how music is used in everyday life. DeNora based her study on interviews
and observations, focusing on how music was used in contexts as different as aerobic
exercise classes, karaoke evenings, and music therapy sessions. DeNora elaborated on
Gibson’s (1983) concept of affordance—in this case documenting how listening to music
can offer the listener a variety of options for use (affordances), mirrored in specific
appropriations related to the listener’s needs and the context. Since DeNora’s study, a
number of empirical studies have provided further evidence of how music
listening is appropriated, that is, used for a number of purposes (Bonde et al. 2013;
Clarke 2005; Lilliestam 2013). In my own research, I have concentrated on health
music(k)ing; that is, how music can be used as/in therapy and as a health resource in
everyday life (Bonde 2000, 2005, 2007, 2010, 2017; Bonde and Blom 2016). In an ongoing
study on music and public health (Bonde et al. 2018; Ekholm et al. 2016a, 2016b) it is
documented that two-thirds of the adult Danish population use music for relaxation
and mood regulation and that an equal number regard music as a health resource.
In this chapter, I will focus on a specific model of receptive music therapy; that is,
psychotherapy based on imagination facilitated by music listening, namely the Bonny
Method of guided imagery and music (GIM), because this model can illustrate the close
relationship between music listening and imagery.
The American musician and music therapist Helen Lindquist Bonny (1921–2010)
developed a new model of receptive music therapy in the 1970s and 1980s. It is called the
Bonny Method of GIM and today it is the internationally best-known receptive music
therapy model with training, clinical work, and research performed in four continents.
The Bonny Method is the name of an individual session format developed by Bonny,
while GIM is a generic concept encompassing many different individual or group
formats using music, imagery, and verbal dialogue in/as therapy (Bruscia 2002). The Bonny
Method is “a model of music psychotherapy centrally consisting of a client imaging
spontaneously to pre-recorded sequences of classical music” (Abrams 2002, 103).
It should be added that the spontaneous imaging to (classical) music in GIM is based on
the induction of an altered state of consciousness (ASC) through deep relaxation.
Music listening, both in and outside GIM, can evoke and support imagery in all sensory
modalities: visual, auditory, olfactory, gustatory, and sensory-kinesthetic. In GIM
theory and practice, emotions and memories are also considered imagery modalities.
Table 21.1
Columns: Episode/Bars; Code; Music; Mrs. L 6,2–3; Mrs. F 8,4–5; Mrs. A 10,2–3; Mrs. H 9,4–5; Workshop; Comments.

Episode A1 (bars 1–6), Mein Jesu. Code: E, R, S, V.
Music: Strings only. They sound soft and muted. Cello plays the melody. The bowing is continuous.
Mrs. L 6,2–3: Now I feel sadness. Allow yourself to feel that. Where do you feel the sadness? In my head.
Mrs. F 8,4–5: I want Death to be my friend.
Mrs. A 10,2–3: What happens in your neck? Something is pressing (she yawns, massages her jaws, tears).
Mrs. H 9,4–5: A cemetery. Graves and stones.
Workshop: The music evokes visual imagery and emotional reactions.
Comments: The sad and sombre mood from the previous track (Bach: Komm süsser Tod) is deepened by the soft and earnest voice of the celli.

Episode A2 (bars 1–6, repeated). Code: V, R, E.
Music: Violins take over the melody, one octave higher.
Mrs. L 6,2–3: I see an elephant. It is huge and tired, moves heavily. The battle is lost. What battle?
Mrs. F 8,4–5: Is anything preventing you from that? I don’t think so. Is Death nearby? Yes. How does it look?
Mrs. A 10,2–3: Can you feel what the press is about?
Mrs. H 9,4–5: It reminds me of death. I don’t want to go into that.
Workshop: Mood: 2 (sadness, sorrow, loneliness). Opening toward a meeting or a vast space.
Comments: The initial statement is confirmed by the violins one octave higher.

Episode B1 (bars 7–14). Code: V/E, E/A, S.
Music: Celli take over the melody again. The chromatic quavers end with the first breathing point. Second breathing point is before the last phrase.
Mrs. L 6,2–3: The battle about managing everything. That’s why it is sad.—I am the elephant. It didn’t succeed. It will be shot, I think. It has given it up. A lotus suddenly appears.
Mrs. F 8,4–5: It is light and mild—like the angel: It says: “Be not afraid!”
Mrs. A 10,2–3: Allow yourself to feel the feelings.
Mrs. H 9,4–5: The body is tense all over.
Workshop: Expansion and exploration—dialogue is possible. Minor mood changes: 1 (spiritual, dignified, serious), or 3 (longing, yearning).
Comments: The tension builds up through the harmonic underpinning of the chromatic melodic line. The breathing points allow the listener to digest and let go. The final melodic phrase is like a prayer.

Episode B2 (bars 7–14, repeated). Code: E/V, E.
Music: Violins take over. Dynamic intensity, both in crescendi and in diminuendi. Surprising subito piano in the end of the chromatic phrase.
Mrs. L 6,2–3: Just above the head of the elephant. How does that feel? Very confident. The lotus is a sign that someone holds his hand over it, even if it can’t see it . . . I can see it.
Mrs. F 8,4–5: How is it for you to hear that? Very good.
Workshop: Images of death/rebirth or saying goodbye are possible.
Comments: The celli repeat the final phrase in an introverted confirmation of the necessity of prayer. The final major chord offers comfort.

Episode B3 (Coda). Code: E, R.
Music: Last three bars are repeated, with celli playing the melody. Ends on a D major chord.
Mrs. L 6,2–3: How is it for you to be aware of that? (Coughs). It feels safe.
Mrs. F 8,4–5: It is something about accept—will I ever reach the other side?—And what is the other side? (tears). How is it for you right now? Both difficult and OK.
First column: Episodes corresponding with phenomenological description and formal analysis of the music.
Second column: Coding of image modalities (V = visual, A = auditory, S = sensory-kinesthetic, O = olfactory, G = gustatory, E = emotions, M = memories, R = reflections and
thoughts, T = transpersonal, Ot = other, e.g., body tension).
Third column: Cues referring to the phenomenological description and the Intensity profile of the music.
Fourth–seventh columns: Imagery of four participants (1,1 = First session, first music selection).
Eighth column: Results from a research workshop with music therapy researchers as participants. Mood numbers refer to Hevner (1936).
Ninth column: The author’s hermeneutic interpretation of music and image potential.
The clinical outcome of the participants’ music and imagery experiences is reported
elsewhere (Bonde 2005, 2007). In the context of this chapter, I will examine the experiences
from a neuroaffective perspective (Hart 2012; Lindvang and Beck 2017).
Neuroaffective theory describes and explains how affects and emotions are aroused and
regulated in different states of consciousness and at three basic neurological levels, as
presented, for example, in the theory of the triune brain (the autonomic nervous
system, the limbic system, and the neocortex) and its relevance for psychotherapy
(MacLean 1990; Hart 2012; Lindvang and Beck 2017). Imagery experiences in GIM are
fine illustrations of how these levels are at work in the same session. In the slightly
altered state (facilitated by the deep relaxation before the music travel), multimodal
images are evoked during music listening; these can be correlated with alpha waves in
brain activity and with responses at the autonomic level of the nervous system, focused on
sensory perception and arousal regulation. Imagery is closely connected with emotions,
processed in the limbic system, while the ongoing dialogue between client and therapist
makes a verbal-metaphorical bridge to the frontal cortex system, focused on mentalization
(Fachner et al. 2015; Hunt 2011, 2015, 2017).
In Mrs. F’s travel, there are very few words. However, the imagery is intense and
concentrated on the existential question: I may die soon, how shall I approach this fact?
Emotionally, she moves between despair and hope, but in the music travel she experiences
a transformation of anxiety. Hope is activated when Death appears as a friend, not
a foe. This transformation is both emotional (relief and joy) and bodily (deep breathing,
serenity). The music travel of Mrs. L illustrates neuroaffective theory very clearly. First,
a sensory response indicates a change in perception (level one: autonomic); then emotions
arise (level two: limbic) and images are evoked; and, finally, the transformative
experience of being the elephant links to the frontal level—the client even mentalizes
the elephant as herself. The final phase of the GIM session—the postlude dialogue—is
the stage of integrating the neuroaffective levels by examining emotions, images, and
their connection with the theme in focus. Such existential experiences can lead to new
coping strategies and increased self-awareness.
Together with the growing research in music in everyday life, studies like this suggest
that GIM and other types of deep music listening have an almost unexplored health
potential and should be used in prophylactic projects. The transformative potential of
such experiences can be illustrated by results from a study of GIM with a nonclinical
population (Blom 2014; Bonde and Blom 2016). Ten participants volunteered for a project
presented as “Self-development through music and imagery.” Six participants had
previous GIM experience and were offered three sessions. Four participants had never
experienced GIM; they were offered five sessions. Advanced GIM music programs,
identified as potential sources of transformation, were used in the sessions. All programs
included strong and challenging music, with the purpose of inspiring and facilitating
existential and spiritual processes of transformation. Participants filled in
questionnaires on existential well-being, and they were interviewed about their experiences
together with the therapist (in so-called collaborative interviews); all session transcripts
were analyzed. This analysis documented that all ten participants used GIM to
facilitate deep existential work. They all reported strong experiences of beauty and
confirmation at a deep level of being. The experience of surrender (Blom 2014—
described later) could be documented for eight of ten participants in the sessions, and,
in the interviews, they described the seminal influence of these music and imagery
experiences on their inner and outer lives.
Such “strong music experiences” have been documented in literature from music
psychology, music sociology, ethnomusicology, and music therapy. One of the pioneer
researchers, the Swedish music psychologist Alf Gabrielsson (2011), collected more than
a thousand first-person reports on such experiences. He and his colleagues analyzed
them phenomenologically and developed a descriptive categorization of characteristics
and types. Gabrielsson and other Scandinavian researchers have looked into the health
potential of such experiences (Bonde et al. 2013; Lilliestam 2013). These studies indicate
that strong music experiences not only have existential meaning for the listener but also
are health promoting—especially when they are shared, such as in individual or group
therapy (Stern 2010).
The GIM experience is complex, and many types of theories are relevant as part of the
framework of understanding how music, imagery, guiding, drawing, and verbal processing
work together. Helen Bonny, the creator of GIM, thought of GIM as a transformational
practice, enabling even transpersonal experiences through music listening.
She was influenced by transpersonal psychology and worked for some years together
with Stanislav Grof at the Maryland Psychiatric Research Center on the selection of music
for experimental LSD sessions (Bonny 1975, 2002a, 2002b, 2002c). Her so-called Cut-log diagram
(Bonny 1975) is a theoretically based map of the mind, integrating layers and states of
consciousness known from the psychological theories of Freud, Jung, Grof, and Wilber.
The diagram reflects the enormous diversity of GIM experiences in thousands of travelers
and how the GIM experience can lead the client to many different layers or states in
the same session. The diagram and the theory have been further developed by other GIM
therapists (Goldberg 2002; Clark 2014). Clark (2014) documents how the original two-
dimensional model was expanded into three-dimensional “funnel” models (Bush 1995) and
a holographic model (Goldberg 2002); Clark herself suggests a “synthesis” model of
the “invisible, interpenetrating fields” of center and periphery, consciousness, music,
guide, and traveler.
Inspired by metaphor theory (Ricoeur 1978; Lakoff and Johnson 1980, 1999; Johnson 2007),
Bonde (2000, 2004, 2005) studied GIM experiences. Based on these studies,
I suggest that (mostly nonverbal) images are reported as metaphors in the traveler-guide
dialogue, and that images are configured in narrative scenes, episodes, or complete
narratives revealing embodied core metaphors and “scripts” that can be processed
therapeutically. In this type of dialogic music listening, imagery is reported as a metaphorical
narrative of experiences in other sensory modalities (Horowitz 1983, see later).
I also studied the relationship between music and imagery—among GIM practitioners
often described with the didactic metaphor of “music as cotherapist” (Bonde 2010;
Wärja and Bonde 2014). Based on a number of event structure analyses (see Table 21.1
for an example), I formulated a series of grounded theories (Bonde 2005, 2017), addressing
steps or stages in the therapeutic process and the roles and functions of the musical
elements (melody, harmony, rhythm, form, style, etc.) in GIM. Here are a few observations
from the theory of narrative patterns (in the imagery configuration) related to musical
structure (Bonde 2005, 2017): The clearer the narrative structure of the music is, the
clearer this will be reflected in the imagery. Music introducing higher intensity and tension
is reflected in the imagery in many ways: a change of perspective is seen, manifest
action may replace hesitation or a block, emotional outlets may follow reflections, sudden
insights (“messages”) are experienced, or the imagery develops in a new direction.
Examples can be seen in Table 21.1, for instance in the development of the “Death of an
elephant” story, where the intimate relationship between musical form and narrative
form is demonstrated. A ternary form in the music may impose a ternary narrative or
dramatic structure on the imagery. Simplicity and complexity are complementary in the
development of music and imagery. Simple musical forms with many repetitions tend to
stabilize the imagery, inviting extended descriptions and a differentiation of (emotional)
qualities, while complex or developmental forms with many changes or transformations
tend to impose a dynamic process on the imagery. This theory is closely related to how
DeNora (2011) understands GIM as a “laboratory” where music “provides structures for
formulating thought and . . . knowledge of the world”:
GIM is an excellent natural laboratory, a place in which to see how agents transfer
musical properties to extra-musical properties and how they come to understand
those extra-musical matters through the sonic structure of music, and in real time,
that is, in direct correlation with the unfolding musical event. (317)
Based on Bonny’s early ideas of the “profile of affective/energy dynamics” of the music in
GIM (1978b), I developed a basic classification of “therapeutic music in GIM,” distinguishing
between the specific intensity profiles of (1) supportive music, (2) mixed supportive/
challenging music, and (3) challenging music (Bonde 2005). The classification was later
developed into a “taxonomy” (Wärja and Bonde 2014) describing in more detail how the
ebb and flow of musical tension and release can be understood in a therapeutic context.
Theories of imagery form a controversial field in clinical psychology. What is imagery,
actually, and how is it related to imagination? The psychologist and
psychotherapist Horowitz (1983) presented a theory of mental representation, with
imagery in a central role. In this theory, there is a distinction between three modes of
representation—three types of “thinking.” According to Horowitz, enactive representation
is the “thinking of the body” and, mostly, this kind of knowledge is tacit and implicit, and
the first to be developed in the child. Image representation is next in the developmental
process, a specific way of processing information with the inner senses—with at least six
modalities: visual, auditory, sensory-kinesthetic, olfactory, gustatory, and emotional.
The last stage in the developmental process is thinking in words and concepts (logic
and numbers), what Horowitz calls lexical representation. Horowitz’s theory is a relevant
framework for the understanding of GIM experiences, where all three modes of representation
are active and where metaphors bridge them. It is also close to neuroaffective
theory. Thinking in multimodal images that are expressed verbally in metaphors and
narrative episodes is much more common and important than we normally assume, and
music is probably the most image-stimulating and image-evoking medium that exists. In dreams,
daydreams, and creative imaginative states of consciousness, imagery belongs to a
specific form of human creativity. In cognitive psychology, however, a heated debate has
been going on for decades about how to understand mental imagery and its role in
cognition (Kind 2006). There are two competing views: propositional and depictive
(descriptionalism versus pictorialism; the former claiming that images are represented
roughly in the way language is represented, the latter that images are represented roughly
in the same way as pictures). Based on my GIM studies, I am in line with Kosslyn and
colleagues (2006) who support the depictive view and contend not only that mental
images depict information but also that these depictions play a functional role in human
cognition (for example, problem solving, memory, creativity).
From the perspective of interpersonal psychology, the study by Blom (2011, 2014) takes
music and imagery (and GIM research and theory) to a new level. The study of
imagery in GIM has long focused on the content of the imagery, and systems of classification
have been suggested (Grocke 1999, 2007). As an alternative, Blom suggests that
the focus should be on process, based on the premise that music in GIM is a relational
agent, with the musical elements metaphorically serving as relational ingredients with
transformational potential. The therapeutic relationship (the triangle of music–therapist–
client) is the interpersonal framework of that process, including explicit and implicit
negotiation, disruption and repair, and moments of intense affectivity. Based on the
thorough analysis of music and imagery in ten nonclinical participants’ thirty-eight
music travels to advanced GIM music programs, she developed an intersubjective
understanding of the process of “surrender” in GIM. The processes and the shared
multimodal imagery can be divided into six categories, with the first three describing
basic ways of sharing (1. shared attention, 2. shared intention, 3. shared affectivity) while
the last three are genuine interpersonal experiences (4. confirmation, 5. nonconfirmation,
6. surrender or transcendence).
Imagery is only mentioned briefly in two recent handbooks of music psychology
(Hallam et al. 2009; Juslin and Sloboda 2011). However, in experimental music psychology
both modality-independent and modality-specific imagery have been studied.
The neuroscience of music has developed considerably over the last twenty years (Christensen
2012). Cognitive neuroscience has broadened our understanding of how music is
processed in the brain, and how the complex interplay of music and emotion involves all
three “systems” of the brain, as mentioned in the section on neuroaffective theory earlier.
However, there are not many neuroscientific studies of spontaneous, music-evoked
imagery or of GIM experiences. An early study by Lem (1999) presented a promising way
of using EEG to document brain activity during listening to a piece of music from the GIM
repertoire and correlating this with the imagery reported post hoc. In a recent neurophenomenological
study (Hunt 2017), a similar method was used to investigate brain
activity during music listening. The participants listened to music and a script focusing
on only one of six specific imagery modalities: body, visual, kinesthetic, interaction,
affect, and memory (Hunt 2017). In these studies, there was no dialogue and no verbal
reporting during music listening—the imagery cannot be reported immediately because
talking and movements disturb the EEG signal. Therefore, it has not until now been
possible to study brain activity in a naturalistic GIM setting. An ongoing study (Fachner
et al. 2015) has the ambition of solving the problem, at least partially. Two GIM sessions
were recorded in a naturalistic setting, and the traveler’s brain responses were EEG-
recorded during (1) rest, (2) relaxation/induction, and (3) the music travel. The verbal
dialogue was transcribed verbatim to enable an analysis of the imagery and its meaning.
Based on this analysis, core metaphors and episodes of special interest were identified,
and some of these were selected for EEG analysis, based on the premise that there should
be long enough periods of silence before and/or after the verbal report to enable an
uncompromised EEG signal. The analysis is ongoing, and a preliminary conclusion of
this neurometric EEG-LORETA case study was that the ASC (defined as alpha waves
or slower) induced in the relaxation phase has a marked influence on the music listening
process, and that ASC-related changes indicate a connection to visual imagery processing
during music listening in GIM. In the second phase of this study, EEG signals were
recorded from both therapist and client simultaneously and in a naturalistic setting. The
analysis is ongoing.
Discussion
Music therapy is not limited to clinical practice areas. Music therapy research is
recognized as a specific tradition in its own right within musicology (Ruud 2016).
References to music therapy are increasingly found in theories and studies in music
psychology (e.g., Juslin and Västfjäll 2008; Asutay and Västfjäll, this volume, chapter 18;
Eerola and Vuoskoski 2013), and music therapy theory contributes to the understanding
of musicking from a health perspective and an embodiment perspective (Bonde and Beck,
forthcoming 2019; Small 1998; Stige 2003). As shown earlier, many different theories
have been developed to explain the complex interplay of music, imagery, and the
interpersonal relationships in GIM. There is also a substantial body of research supporting
the effectiveness of GIM as a method of psychotherapy.
However, neuroscientific evidence of GIM as effective psychotherapy is still quite
sparse. Experimental studies using advanced technology in a laboratory to study
music and imagery are quite far from both the naturalistic GIM setting and everyday music
listening, and there is still a long way to go to document whether and how pivotal or
transformative imagery is correlated with changes in brain activity. Therefore, an important
design development (as described earlier) is to record the EEG of both traveler/client and
guide/therapist simultaneously. This can give valuable information on the neurological
nature of the interpersonal relationship in particular, and the interpersonal nature of
the GIM experience, as suggested by Blom (2014).
With her interpersonal theory of processes in GIM, Blom (2011, 2014) indirectly
contributes to a demystification of spiritual and transpersonal experiences that are often
reported in GIM. Blom gives these strong experiences of “surrender” a contemporary
relational psychological framework and her study indicates the health potential of
such experiences.
Most of the existing music and imagery research in music psychology investigates
imagination of intervals, melodies, and other musical elements in order to compare
them to the listening process (Hubbard 2010; Hubbard, volume 1, chapter 8). This kind
of experimental research has a long history; however, it often lacks ecological validity in
the contexts of receptive music therapy or everyday music listening. It is interesting that
“imagery” is not listed in the index of The Oxford Handbook of Music Psychology (Hallam
et al. 2009), and that “imagining” is only mentioned in the chapter on the psychology of
composition (Impett 2009). Kinesthetic-image schemas are mentioned in the chapter
on music and meaning (Cross and Tolbert 2009), with references to the cognitive
metaphor theory by Lakoff and Johnson (mentioned earlier), listed as an example of an
experientialist approach to music and meaning. Even though the handbook has many chapters
on music and emotion, imagery is not an element in them. In the Handbook of Music
and Emotion (Juslin and Sloboda 2011), imagery is included in the index and discussed
in two chapters. Woody and McPherson (2011) describe how musicians use imagery and
metaphors to evoke emotions for performance. Gabrielsson (2011) reports from his
study of “strong experiences with music” (mentioned earlier) that imagery is
often reported by the listeners/informants. Juslin and Västfjäll (2008) include imagery
in their promising BRECVEMA model (described earlier); however, they only mention
visual imagery and, as we have seen from the empirical data, imagery is multimodal, not
only visual. As shown by McNorgan (2012), each imagery modality has both general and
specific neural correlates and therefore contributes to meaning in a unique way.
I think the relative absence of empirical, naturalistic music and imagery studies
in neuroscience reflects a dominating, more or less traditional, postpositivist approach
to research in music listening. The more actual listening reports are included in the
research, the more imagery comes to the foreground. What is suggested here is that
research in music listening should be much more focused on naturalistic settings
and that the study of multimodal imagery can be a key to broadening our understanding
not only of GIM and other receptive music therapy methods (Hunt 2015) but also
of music listening as—in DeNora’s words—“a technology of the self ” in everyday life
(DeNora 2000, 2007, 2011) and as a genuine health resource (Ekholm et al. 2016a, 2016b;
Bonde et al. 2018). Cognitive neuroscience and neurophenomenology can contribute
to this if researchers take the epistemological stance that the first-person and the
third-person perspectives are equally important (Hunt 2015).
Conclusion
Music imaging is a natural phenomenon that can be encouraged and used in many
different ways and contexts, including music education (Halpern and Overy, this volume,
chapter 19). It is used in therapy (e.g., in GIM) to stimulate the client’s creative imagination
and ability to change or transform inappropriate patterns of attachment and emotion
regulation, but it is also used in everyday life as what Tia DeNora calls “a technology
of the self.” Using the concepts of the ecological psychologist James Gibson, we can say that
music affords imaging and music imaging can be appropriated in multiple ways, for
creative-imaginative purposes as well as for the regulation of physical, psychological,
and spiritual well-being. What Even Ruud (2010) calls “listening self-care” and “musical
self-medication” are typical forms of appropriations. Music imaging is both a mode of
thinking (based on introjection of patterns afforded by the musical material) and a mode
of expression (affording the projection of personal material of all sorts on the music).
Music listening in GIM therapy is of course not “music listening” per se. Client
experiences are highly personal, even idiosyncratic, and the therapeutic focus is always
more important in the context than the aesthetic qualities of the music. However, GIM
experiences are good examples of music’s affordances and appropriations (DeNora 2000).
With the therapist’s support, the GIM client takes from the music what is needed to explore
salient physical, psychological, social, existential, or spiritual issues. The combination of
music and imagery is not just relevant in a clinical context, even if “image listening” has
been regarded as irrelevant by musicology until recently; the experience of multimodal
imagery while listening to music is inherently human and has great potential as a health
resource. Before creating the Bonny Method of GIM, Helen Bonny worked together
with the Canadian musicologist Louis Savary on a project called “Listening with a new
consciousness” (Bonny and Savary 1973). This book presents many scripts for guided
“music travels” in group formats, with target groups ranging from school children to
religious groups. The GIM therapist Carol Bush developed “GIM on your own” (1995) as a
method for self-development. The study of imagery during music listening is increasingly
being integrated into music psychology, and early evidence from neuroscience supports
the prophylactic potential of music and imagery work. In other words, GIM is a well-
documented example of “sound imagination” contributing to a new perspective or
paradigm that Tia DeNora calls MusEcological (DeNora 2011).
References
Abrams, B. 2002. Transpersonal Dimensions of the Bonny Method. In Guided Imagery and
Music: The Bonny Method and Beyond, edited by K. E. Bruscia and D. E. Grocke, 339–358.
Gilsum, NH: Barcelona Publishers.
Blom, K. M. 2011. Transpersonal—Spiritual BMGIM Experiences and the Process of Surrender.
Nordic Journal of Music Therapy 20 (2): 185–203.
Blom, K. M. 2014. Experiences of Transcendence and the Process of Surrender in Guided
Imagery and Music (GIM). PhD thesis. Aalborg University. http://vbn.aau.dk/files/
204635175/Katarina_Martenson_Blom_Thesis.pdf. Accessed December 29, 2018.
Bonde, L. O. 2000. Metaphor and Narrative in Guided Imagery and Music. Journal of the
Association for Music and Imagery 7: 59–76.
Bonde, L. O. 2005. The Bonny Method of Guided Imagery and Music (BMGIM) with Cancer
Survivors: A Psychological Study with Focus on the Influence of BMGIM on Mood and
Quality of Life. PhD thesis, Aalborg University. http://www.wfmt.info/Musictherapyworld/
modules/archive/dissertations/pdfs/Bonde2005.pdf. Accessed December 28, 2018.
Bonde, L. O. 2007. Imagery, Metaphor and Perceived Outcomes in Six Cancer Survivors’
BMGIM Therapy. In Qualitative Inquiries in Music Therapy, Vol. 3, edited by A. Meadows,
132–164. Gilsum, NH: Barcelona Publishers.
Bonde, L. O. 2010. Music as Support and Challenge. Jahrbuch Musiktherapie Bd. 6,
Imaginationen in der Musiktherapie, 89–118. Wiesbaden: Reichert Verlag.
Bonde, L. O. 2017. Embodied Music Listening. In The Routledge Companion to Embodied
Music Interaction, edited by M. Lesaffre, M. Leman, and P.-J. Maes, 269–277. London:
Routledge.
Bonde, L. O., and B. D. Beck. 2019 (forthcoming). Imagining Nature during Music Listening.
An Exploration of the Meaning, Sharing and Therapeutic Potential of Nature Imagery in
Guided Imagery and Music. In Nature in Psychotherapy and Arts-Based Therapy, edited by
E. Pfeifer and H.-H. Decker-Voigt. Giessen: Psychosozial Verlag.
Bonde, L. O., and K. M. Blom. 2016. Music Listening and the Experience of Surrender: An
Exploration of Imagery Experiences Evoked by Selected Classical Music from the Western
Tradition. In Cultural Psychology of Musical Experience, edited by H. Klempe, 207–234.
Charlotte, NC: Information Age Publishing.
Bonde, L. O., O. Ekholm, and K. Juel. 2018. Associations between Music and Health-Related
Outcomes in Adult Non-Musicians, Amateur Musicians and Professional Musicians—
Results from a Nationwide Danish Study. Nordic Journal of Music Therapy 27 (4): 262–282.
Bonde, L. O., M. S. Skånland, E. Ruud, and G. Trondalen. 2013. Musical Life Stories: Narratives
on Health Musicking. Oslo: Skriftserie fra Senter for musikk og helse.
Bonny, H. L. 1975. Music and Consciousness. Journal of Music Therapy 12: 121–135.
Bonny, H. L. 1978a. GIM Monograph #1: Facilitating GIM Sessions. Salina, KS: Bonny
Foundation.
Bonny, H. L. 1978b. GIM Monograph #2: The Role of Taped Music Programs in the GIM Process.
Salina, KS: Bonny Foundation.
Bonny, H. L. 2002a. Autobiographical Essay. In Music and Consciousness: The Evolution of
Guided Imagery and Music, edited by L. Summer, 1–18. Gilsum, NH: Barcelona Publishers.
Bonny, H. L. 2002b. The Early Development of Guided Imagery and Music (GIM). In Music
and Consciousness: The Evolution of Guided Imagery and Music, edited by L. Summer, 53–68.
Gilsum, NH: Barcelona Publishers.
Bonny, H. L. 2002c. Guided Imagery and Music (GIM): Discovery of the Method. In Music
and Consciousness: The Evolution of Guided Imagery and Music, edited by L. Summer, 43–52.
Gilsum, NH: Barcelona Publishers.
Bonny, H., and L. Savary. 1973. Music and Your Mind: Listening with a New Consciousness.
New York: Harper & Row.
Bruscia, K. E. 2002. The Boundaries of Guided Imagery and Music (GIM) and the Bonny
Method. In Guided Imagery and Music: The Bonny Method and Beyond, edited by
K. E. Bruscia and D. E. Grocke, 37–61. Gilsum, NH: Barcelona Publishers.
Bush, C. 1995. Healing Imagery and Music: Pathways to the Inner Self. Portland, OR: Rudra Press.
Christensen, E. 2012. Music Listening, Music Therapy, Phenomenology and Neuroscience.
PhD thesis. Aalborg: Aalborg University. http://vbn.aau.dk/files/68298556/MUSIC_
LISTENING_FINAL_ONLINE_Erik_christensen12.pdf. Accessed May 7, 2017.
Clark, M. 2014. A New Synthesis Model of the Bonny Method of Guided Imagery and Music.
Journal of the Association for Music and Imagery 14: 1–22.
Clarke, D., and E. Clarke. 2011. Music and Consciousness: Philosophical, Psychological, and
Cultural Perspectives. Oxford: Oxford University Press.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. Oxford: Oxford University Press.
Clarke, E. 2011. Music Perception and Musical Consciousness. In Music and Consciousness.
Philosophical, Psychological, and Cultural Perspectives, edited by D. Clarke and E. Clarke,
193–213. Oxford: Oxford University Press.
Clarke, E., T. DeNora, and J. Vuoskoski. 2015. Music, Empathy and Cultural Understanding.
Physics of Life Reviews 15: 61–88. https://doi.org/10.1016/j.plrev.2015.09.001.
Cross, I., and E. Tolbert. 2009. Music and Meaning. In The Oxford Handbook of Music Psychology,
edited by S. Hallam, I. Cross, and M. Thaut, 33–46. Oxford: Oxford University Press.
DeNora, T. 2000. Music in Everyday Life. Cambridge: Cambridge University Press.
DeNora, T. 2007. Health and Music in Everyday Life—A Theory of Practice. Psyke and Logos
28 (1): 271–287.
DeNora, T. 2011. Practical Consciousness and Social Relation in MusEcological Perspective.
In Music and Consciousness: Philosophical, Psychological, and Cultural Perspectives, edited
by D. Clarke and E. Clarke, 309–326. Oxford: Oxford University Press.
Eerola, T., and J. K. Vuoskoski. 2013. A Review of Music and Emotion Studies: Approaches,
Emotion Models, and Stimuli. Music Perception: An Interdisciplinary Journal 30 (3): 307–340.
Ekholm, O., K. Juel, and L. O. Bonde. 2016a. Associations between Daily Musicking and
Health: Results from a Nationwide Survey in Denmark. Scandinavian Journal of Public
Health 44 (7): 726–732. https://doi.org/10.1177/1403494816664252.
Ekholm, O., K. Juel, and L. O. Bonde. 2016b. Music and Public Health—An Empirical Study of
the Use of Music in the Daily Life of Adult Danes and the Health Implications of Musical
Participation. Arts and Health 8 (2): 154–168. https://doi.org/10.1080/17533015.2015.1048696.
Fachner, J., E. Ala-Ruona, and L. O. Bonde. 2015. Guided Imagery in Music—A Neurometric
EEG/LORETA Case Study. In Proceedings of the Ninth Triennial Conference of the European
Society for the Cognitive Sciences of Music, 17–22 August 2015, edited by J. Ginsborg,
A. Lamont, M. Phillips, and S. Bramley. Manchester, UK: Society for the Cognitive Sciences
of Music (ESCOM).
Gabrielsson, A. 2011. Strong Experiences with Music: Music Is Much More Than Just Music.
Oxford: Oxford University Press.
Gibson, J. J. 1983. The Senses Considered as Perceptual Systems. Westport, CT: Greenwood Press.
Goldberg, F. S. 2002. A Holographic Field Theory Model of the Bonny Method of Guided
Imagery and Music (BMGIM). In Guided Imagery and Music: The Bonny Method and Beyond,
edited by K. E. Bruscia and D. E. Grocke, 359–377. Gilsum, NH: Barcelona Publishers.
Grocke, D. 1999. A Phenomenological Study of Pivotal Moments in Guided Imagery and
Music (GIM) Therapy. PhD thesis. Melbourne: Faculty of Music, The University of
Melbourne. In Music Therapy Info CD-Rom III, edited by D. Aldridge. Witten: Universität
Witten/Herdecke.
Grocke, D. 2010. An Overview of Research in the Bonny Method of Guided Imagery and
Music. Voices: A World Forum for Music Therapy 10 (3). https://voices.no/index.php/voices/
article/view/1886/1651. Accessed December 28, 2018.
Grocke, D., and T. Wigram. 2007. Receptive Methods in Music Therapy: Techniques and Clinical
Applications for Music Therapy Clinicians, Educators, and Students. London: Jessica Kingsley.
Grocke, D., and T. Moe. 2015. Guided Imagery and Music: A Spectrum of Approach. London:
Jessica Kingsley.
Hallam, S., I. Cross, and M. Thaut. 2009. The Oxford Handbook of Music Psychology. Oxford:
Oxford University Press.
Hart, S. 2012. Neuroaffektiv psykoterapi med voksne [Neuroaffective Psychotherapy with
Adults]. Copenhagen: Hans Reitzels Forlag.
Hevner, K. 1936. Experimental Studies of the Elements of Expression in Music. American
Journal of Psychology 48: 246–268.
Horowitz, M. 1983. Image Formation and Psychotherapy. New York: Jason Aronson.
Hubbard, T. L. 2010. Auditory Imagery: Empirical Findings. Psychological Bulletin 136 (2): 302.
Hunt, A. M. 2011. A Neurophenomenological Description of the Guided Imagery and Music
Experience. PhD thesis. Philadelphia, PA: Temple University.
Hunt, A. 2015. Boundaries and Potentials of Traditional and Alternative Neuroscience
Research Methods in Music Therapy Research. Frontiers in Human Neuroscience 9:
342. doi:10.3389/fnhum.2015.00342.
Hunt, A. 2017. Protocol for a Neurophenomenological Investigation of a Guided Imagery and
Music Experience (Part II). Music and Medicine 9 (2): 116–127.
Impett, J. 2009. Making a Mark: The Psychology of Composition. In The Oxford Handbook of
Music Psychology, edited by S. Hallam, I. Cross, and M. Thaut, 651–666. Oxford: Oxford
University Press.
Johnson, M. 2007. The Meaning of the Body: Aesthetics of Human Understanding. Chicago,
IL: University of Chicago Press.
Juslin, P. N., and J. A. Sloboda. 2011. Handbook of Music and Emotion. Oxford: Oxford
University Press.
Juslin, P. N., and D. Västfjäll. 2008. Emotional Responses to Music: The Need to Consider
Underlying Mechanisms. Behavioral and Brain Sciences 31: 559–575.
Juslin, P. N., G. Barradas, and T. Eerola. 2015. From Sound to Significance: Exploring the
Mechanisms Underlying Emotional Reactions to Music. American Journal of Psychology
128 (3): 281–304.
Kind, A. 2006. Imagery and Imagination. Internet Encyclopedia of Philosophy, 1–19. https://
www.iep.utm.edu/imagery/. Accessed December 29, 2018.
Kosslyn, S. M., W. L. Thompson, and G. Ganis. 2006. The Case for Mental Imagery. Oxford:
Oxford University Press.
Lakoff, G., and M. Johnson. 1980. Metaphors We Live By. Chicago and London: University of
Chicago Press.
Lakoff, G., and M. Johnson. 1999. Philosophy in the Flesh: The Embodied Mind and Its Challenge
to Western Thought. New York: Basic Books.
Lem, A. 1999. Selected Patterns of Brainwave Activity Point to the Connection between
Imagery Experiences and the Psychoacoustic Qualities of Music. In Music Medicine, Vol. 3,
edited by R. R. Pratt and D. E. Grocke, 75–87. Melbourne: University of Australia.
Lilliestam, L. 2013. Music, the Life Trajectory and Existential Health. In Musical Life Stories:
Narratives on Health Musicking, edited by L. O. Bonde, E. Ruud, M. Skånland, and
G. Trondalen, Anthology #6, 17–39. Oslo: Publications from the Centre for Music and
Health.
Lindvang, C., and B. D. Beck. 2017. Musik, krop og følelser: Neuroaffektive processer i musikterapi
[Music, Body, and Emotions. Neuroaffective Processes in Music Therapy]. Copenhagen:
Frydenlund Academic.
MacLean, P. D. 1990. The Triune Brain in Evolution: Role in Paleocerebral Functions.
New York: Plenum.
Marr, J. 2001. The Use of the Bonny Method of Guided Imagery and Music in Spiritual Growth.
Journal of Pastoral Care 55 (4): 397–406.
McKinney, C., and T. Honig. 2017. Health Outcomes of a Series of Bonny Method of Guided
Imagery and Music Sessions: A Systematic Review. Journal of Music Therapy 54 (1): 1–34.
McNorgan, C. 2012. A Meta-Analytic Review of Multisensory Imagery Identifies the Neural
Correlates of Modality-Specific and Modality-General Imagery. Frontiers in Human
Neuroscience 2012 (6): article 285.
Ricoeur, P. 1978. The Rule of Metaphor: Multi-Disciplinary Studies of the Creation of Meaning
in Language. London: Routledge & Kegan Paul.
Ruud, E. 2010. Music Therapy: A Perspective from the Humanities. Gilsum, NH: Barcelona
Publishers.
Ruud, E. 2016. Musikkvitenskap. Oslo: Universitetsforlaget.
Small, C. 1998. Musicking: The Meanings of Performing and Listening. London: Wesleyan
University Press.
Stern, D. 2010. Forms of Vitality: Exploring Dynamic Experience in Psychology, the Arts,
Psychotherapy and Development. Oxford: Oxford University Press.
Stige, B. 2003. Elaborations toward a Notion of Community Music Therapy. Oslo: Unipub.
Summer, L. 2002. Group Music and Imagery Therapy: Emergent Receptive Techniques in
Music Therapy Practice. In Guided Imagery and Music: The Bonny Method and Beyond,
edited by K. E. Bruscia and D. E. Grocke, 297–306. Gilsum, NH: Barcelona Publishers.
Summer, L. 2009. Client Perspectives on the Music in Guided Imagery and Music (GIM).
PhD thesis. Aalborg University. http://vbn.aau.dk/files/112202270/6467_lisa_summer_
thesis.pdf. Accessed December 29, 2018.
Tesch, R. 1990. Qualitative Research: Analysis Types and Software Tools. London: Falmer Press.
Wärja, M., and L. O. Bonde. 2014. Music as Co-Therapist: Towards a Taxonomy of Music in
Therapeutic Music and Imagery. Music and Medicine 6 (2): 16–27.
Woody, R. H., and G. E. McPherson. 2011. Emotion and Motivation in the Lives of Performers.
In Handbook of Music and Emotion, edited by P. N. Juslin and J. A. Sloboda, 401–424.
Oxford: Oxford University Press.
chapter 22
Empirical Musical Imagery beyond the “Mind’s Ear”
Freya Bailes
Introduction
Many empirical studies of musical imagery begin by defining their subject as music
“heard” by the mind’s ear, before swiftly acknowledging the importance of additional
musical dimensions to the sonic. In defense of this approach, there are good reasons to
emphasize the auditory components of imaged music when defining it, since mental
imagery is generally understood to be a visual phenomenon. Even those who have
previously encountered the term “musical imagery” might conceive of it as a primarily
visual image accompanying heard music, as in its therapeutic use in Guided Imagery and
Music (see Bonde, this volume, chapter 21). An alternative approach to communicating
the intended meaning of musical imagery is to provide examples, which might
include having an “earworm,” mentally continuing music that has stopped, audiating a
musical score (see Halpern and Overy, this volume, chapter 19), mentally rehearsing for
a music performance, or imagining1 a new composition. None of these examples is
prescriptive with respect to the sensory modalities that might be represented in imagi-
nation, but neither do they indicate what might be imaged in addition to sound, and this
chapter aims to explore the multimodality of our imagery for music.
Returning to attempts to define musical imagery, in Bailes (2007) I explicitly refer to
the “mind’s ear,” defining musical imagery as “the experience of imagining musical
sound in the absence of directly corresponding sound stimulation from the physical
environment” (555). While this definition encapsulates the notion of simulating sensory
experience, it focuses exclusively on sound. Beaty and colleagues (2013) describe musical
imagery as “melodies of the mind” (1163), a neutral expression though one that suggests a
passive occurrence. In their study of involuntary musical imagery, Jakubowski and
colleagues (2015) refer to a “mental replay of music” (1229), while Liikkanen (2012)
poetically describes musical imagery as “a mental soundscape audible for our ‘inner
ear’ ” (236). Weber and Brown (1986) introduced musical imagery as “a particular form
of auditory imagery in which one imagines a melody or song . . . the ability to imagine,
among other things, tonal progressions” (411).
Researchers must account for an increasing body of evidence to suggest that rather
than merely imaging the sound of music, we image visual and kinesthetic dimensions of
musical experience as well.2 In this chapter, I begin with an introduction to theories
whereby our seemingly disembodied mental imagery can instead be understood in
relation to embodied cognition. I will revisit the findings of empirical studies of musical
imagery to determine the extent to which embodied cognition could hold explanatory
power, before considering recent work that has directly tested hypotheses relating
body movement to musical imagery, and outlining a number of possible directions for
future research.
Embodied Cognition and Mental Imagery
Others have already reflected on how mental imagery might relate to our embodied
experience. One theoretical position of relevance to an embodied account of mental
imagery is experiential cognition (Reybrouck 2001), which posits that our represen-
tation of the world is generated by an interaction between environmental input and
our capacity to represent it. Our bodies are our most immediate environments, and
our physicality in turn governs our interaction with the wider environment. In their
seminal text on embodied cognition, Varela and colleagues (1991) emphasize the
dependence of minds on bodies that are characterized by certain sensorimotor
capacities. For them, embodied “means reflection in which body and mind have been
brought together” (Varela et al. 1991, 27). By this argument, the apparently disembodied
mental simulation of sensorimotor experience is necessarily conditioned by our
physical experiences of the world.
In parallel work, there is increasing evidence to support theories that our perceptions
are influenced by the possible actions afforded by what we perceive (Gibson 1986;
Hubbard 2013). According to these theories, perceiving the actions of another will
activate motor plans of our own (Schiavio et al. 2014). In this way, listening to music
implies the actions associated with its production (Cox 2001; Reybrouck 2001). The
motor theory of perception originated as a theory of language perception, and has been
invoked to explain the influence of motor constraints on our representations of verbal
stimuli (Hubbard 2013). Callan and colleagues (2006) extended the concept to suggest the
existence of a motor theory of music perception, to account for their findings of activation
of the motor cortex in both covert speech and song.
• That there is a strong link between our knowledge of sound and sound sources,
both in perception and cognition, so that features of sound are in most cases
related to features of sound-production, sound-production here understood as
including both the sound-producing action and the features of the resonant bod-
ies and environments. And, as an extension of this:
• That images of sound-production, including visual, motor, tactile etc. elements,
may actually trigger images of sound, and conversely, that images of sound may
trigger images of sound-production. (Godøy 2001, 238)
As an extension of this idea, Godøy (2001) also suggests that the greater our under-
standing of how sounds are produced, the greater the likelihood of their salience as
auditory imagery.3 This leads to my prediction that the degree to which musical imagery
is embodied lies along a continuum ranging from imaging oneself performing a clearly
defined and rehearsed sonic output at one extreme, to imaging the timbre of an artificially
produced sine wave, which the human body could not produce without recourse
to digital means, at the other. A pertinent question arises as to whether our musical
imagery can ever be so abstracted from its origins in sound production as to be effectively
disembodied. In line with the theoretical propositions of embodied cognition
(e.g., Varela et al. 1991; Niedenthal et al. 2005), I argue that musical imagery cannot be fully
disembodied. As embodied minds, our thoughts are inseparable from our sensorimotor
experience, and in the absence of personal experience in producing specific sounds, we
draw on our knowledge of the actions required to make similar sounds to infer and
image the sorts of articulatory gestures involved in their making (Cox 2001; Godøy 2001;
Godøy, this volume, chapter 12).
Offline Cognition
relating to our bodily senses, we might expect offline music cognition to reflect these
same bodily concerns.
The distinction between online and offline cognition contrasts sensorimotor processing
with ideomotor simulation respectively. Relevant to the motor theory of perception
outlined above, Reybrouck (2001) argues that:
It makes a difference . . . as to both the intensitiy [sic] and precision of the covert
movements (the ideomotor simulation) if the subject who tries to imagine a certain
musical structure is an expert or a layman. Subjects who received formal musical
training can use this explicit musical knowledge and will easily imagine all the
motor processes that are connected with the production of the sounds. (129)
since not all music is vocally produced (Cox 2001; see Hubbard volume 1, chapter 8), and
yet there is compelling evidence that we are able to simulate and image a variety of
musical sounds, which I will now review.
We have seen a move in the cognitive sciences toward an embodied account of our
mental activity (Niedenthal et al. 2005; Glenberg et al. 2013), and other chapters in this
handbook reflect this focus (see Christensen, this volume, chapter 1; Huvenne, volume 1,
chapter 30; Saslaw and Walsh, this volume, chapter 7). I will now revisit the findings of
past empirical studies of musical imagery from this embodied perspective. The purpose
of this review is not to prove or disprove the embodiment of musical imagery, since such
an approach is methodologically untenable, and it would be impossible to refute the
argument that our minds are embodied. Rather, the purpose is to determine whether
embodied cognition could have explanatory power with respect to the findings of
musical imagery studies that were not necessarily designed to test such theories. It
should be noted that our retrospective view of the indicators of embodied imagery is
probably obscured by the neglect of past researchers to enquire about, or express an
interest in, those imagery parameters that might reflect embodiment. A similar point
has been made by Hubbard (2013) regarding empirical studies of auditory imagery that
do not habitually ask participants about their concurrent experiences of visual imagery.
However, some studies of musical imagery are directly concerned with bodily
involvement, since they focus on music performance. These will be reviewed first,
before a review of studies in which musical imagery occurs during composition and
listening, in voluntary musical imagery tasks, and during involuntary musical imagery.
Imagery in Performance
In order to perform almost all forms of music,4 one must move to produce the sound.
This fundamental auditory-motor association appears to be represented in imagery,
with increasing behavioral and brain imaging evidence being consistent with a role for
kinesthetic imagery in music performance (see Lotze 2013, for a review). However, such
auditory-motor associations must first be formed through experience of action in our
sonic environment and, in the case of expert musicians, through repeated and deliberate
musical enactment. In research by Lotze and colleagues (2003), professional violinists
scored higher than amateur violinists for the vividness of their movement imagery and,
at a neural level, they showed increased brain activations in the representation areas of
the fingers during an imagined performance of Mozart’s violin concerto in D Major.
To aid in their memorization of pitch sequences, trained music students are able
to use finger tapping, as though tapping on a keyboard. This motor-encoding strategy
appeared to reinforce their representation of the auditory stimuli (Mikumo 1994). In a
study of pianists’ uses of musical imagery for expressive parameters during performance,
we also found evidence that musical imagery was strengthened when the pianists were
able to play on a silent piano keyboard, thus providing motor reinforcement (Bishop
et al. 2013). There is evidence that during imaged song, the motor cortex is activated. For
instance, Callan and colleagues (2006) asked participants in a functional magnetic
resonance imaging (fMRI) study of the brain regions involved in perceived and imaged
speech and song to covertly sing (i.e., image) stimuli cued by visually presented lyrics.
Even though the task made no explicit motor demands, it seems that the song imagery
was nevertheless embodied.
An important means by which musical imagery can facilitate performance is by
enhancing the ability to anticipate upcoming events. This was the focus of work by
Keller and colleagues (2010), whose findings were consistent with the use of auditory
imagery to enable action planning. Specifically, their method allowed them to relate
anticipatory imagery for specific pitches to the accuracy of the actions required to
“perform” them. They concluded that cross-modal (i.e., auditory, visual, motor) ideomotor
processes were in operation, which would be consistent with an embodied representa-
tion of the pitch-space array. In a related study, Keller and Appel (2010) investigated the
role of anticipatory auditory imagery in ensemble performance. The auditory imagery
abilities of the duo pianists related to the quality of their coordination, regardless of
whether or not they were able to see each other as they performed. The authors again
postulate a role for ideomotor processes, suggesting that auditory imagery enhances the
operation of internal models that simulate the action of both oneself and others. Indeed,
learning the part of one’s duo partner by rehearsing it can be detrimental when it comes
to subsequently performing the duo with them, since an embodied representation of
the partner’s part, which necessarily differs from one’s own interpretation, can hinder
coordination (Ragert et al. 2013).
Another study of imagery for performance is a participant observation study of an
extended masterclass led by Nelly Ben-Or for expert pianists (Davidson-Kelly et al. 2015).
Central to Ben-Or’s approach is the use of multimodal musical imagery during per-
formance preparation. Eleven participants in her five-day masterclass were observed
and interviewed about their experiences, and a follow-up questionnaire was also given
out nine months later. A thematic analysis of the resulting data led to the articulation of
key elements of Ben-Or’s pedagogy. The principal feature is that performers should
memorize the music before physically rehearsing it. While this might seem to be an
extreme of disembodiment, the opposite could be said of the mental imagery that is
consequently required of the pianists, since in order to memorize a performance piece,
auditory, motor, and visual aspects must be integrated. Nelly Ben-Or herself explains
that the memory formed during deliberate imagery rehearsal is “a kind of memory that
includes an inner sense of the action of playing that music which I see [and it] has to
include a vision of the keyboard” (Davidson-Kelly et al. 2015, 86). The authors of the
study explore possible cognitive mechanisms by which “total inner memory” might
enable effective performance. In particular, they suggest that the mental focus on the
distal performance goal afforded by the multimodal image could enhance a close
connection with the sound without the potentially disruptive effects of attending to
proximal issues of technical production. While Ben-Or’s instruction prioritizes nonmotor
tasks, an embodied understanding of sound production is assumed, so that the motor
aspects of performance automatically fall into place as long as the musical image is
complete. Interestingly, the pianists who participated in this study increased their
ratings of the importance of imagining movement during performance preparation
following the masterclass.
In an experience sampling survey of the everyday experiences of musical imagery
(Bailes 2007), music students reported imaging music in the course of their daily life
that they had recently performed, and also music that they were preparing for an
upcoming performance. The extent to which musicians are more inclined to imagine
music associated with performance than music they do not normally enact remains an
open question. It seems appropriate to look to the relationship of music to dance for
confirmation of mental kinesthetic-musical links. In an experimental study of memory
for music and dance (Mitchell and Gallagher 2001), participants were visually presented
with sequences alternating music and dance stimuli. Some participants reported mentally
accompanying the silent dance performances with the previously presented musical
stimulus. In other words, there was a reported tendency to match performed movement
with imaged sound.
A recent fMRI study of the brain activity associated with mentally transforming
imagery for melodies found that some regions associated with motor control were
activated (Foster et al. 2013). In this study, participants were instructed to mentally
transpose or reverse melodies, and then judge whether the subsequently presented
comparison stimulus matched their transformed image. Activation was found in the
intraparietal sulcus (IPS), forming part of the posterior parietal cortex (PPC), which is
connected to both working memory and motor-planning centers of the brain. The
authors also found significant clusters in the supplementary motor area (SMA) when they contrasted the
reversed condition with the control, as well as consistent activations in pre-SMA during
both types of melody transformation task. While the authors do not speculate on their
findings of SMA and pre-SMA activation, the involvement of motor centers in such
an ostensibly mental task could be indicative of a link with covert production or
ideomotor simulation.
was found between musical training and the self-report measures of INMI employed in
this study. Active musical engagement is important in relation to INMI (Williamson
et al. 2011), and Liikkanen (2012) found that INMI was associated with exercising. In
earlier work, I found that music students reported musical imagery during activities
that involve motion, such as when traveling or getting up in the morning (Bailes 2006).
Music students describing their experiences of musical imagery report concurrent
visual and motor dimensions (Bailes 2007), as did the musically experienced interviewees
in an INMI study by Williamson and Jilka (2014).
If we are more inclined to image music that we are able to sing than music that we are
not able to sing, then our musical imagery should reflect the characteristics of vocal
music. Work by Burgoyne5 and colleagues is consistent with this argument. They have
been using online gaming to gather data about the catchiness of music: gamers indicate their familiarity with popular music and then judge whether the music's continuation after a period of silence is correct, a task that requires them to mentally continue the music and compare the subsequently presented snippet with that mental continuation. Using sophisticated algorithms, they have been able to establish the salient musical parameters that make such music catchy: melodic repetition, vocal prominence, melodic conventionality, and melodic range conventionality.
The propensity for certain popular music to be experienced as an “earworm” is not
explained by its popularity (chart position) or exposure (recent runs) alone (Jakubowski
et al. 2016). Perhaps it is of significance that it is music that can be readily sung that sticks
in our memory.
Floridou and Müllensiefen (2015) used experience-sampling methods to explore the
conditions that predict INMI. Respondents in their study were asked not only about
their experiences of INMI, but also about mind wandering. Responses were modeled in
relation to contextual factors such as the activity that participants were undertaking
when they were contacted. One finding was a statistical relationship between mind wandering and INMI, suggesting that mind wandering is a prerequisite for INMI. In turn, mind wandering was statistically linked to the activity that respondents were engaged in when their experience was sampled. Given that physical movement was one of the activities found to favor mind wandering, this research could point to a bodily initiation of a chain of effects running from activity to mind wandering to INMI. However,
a replication and extension of my earlier (Bailes 2007) empirical study of musical
imagery in everyday life did not confirm the previously found relationship between the
activity that respondents were engaged with and their propensity to imagine music
(Bailes 2015). This more recent work sampled the experiences of members of the general
public rather than university music students. Perhaps the association between activity
and imagery found in the earlier work relates to the theoretically stronger auditory-
motor associations that result from musical training.
An increasingly frequent suggestion in studies of INMI is that arousal state plays a
role in its occurrence. In an experience sampling study of the phenomenology of musical
imagery in its everyday occurrence, Beaty and colleagues (2013) report that participants
imaged music more when they felt happy or worried, but not sad. While happy and worried
represent emotions that are high in arousal, sad is typically considered to be a low arousal
state. As a result of their interview study, Williamson and Jilka (2014) speculate, “INMI
may have a functional relationship with arousal state whereby it can be triggered uncon-
sciously in order to modulate a person’s psychophysiological arousal level” (666). It
seems that the body might play a variety of different roles when it comes to shaping our
everyday experiences of imaging music: physical enactment contributes to an embodied
memory for music; our physical capabilities facilitate imagery for music that we can
produce with our bodies; INMI could function to moderate our physiological arousal;
and activities involving motion are associated with musical imagery.
In my experience sampling study of everyday musical imagery occurrences (Bailes 2015),
I was interested in the mood of participants at the times they were observed. Mood
scales were included to measure the respondents’ positivity, present-mindedness, and
arousal (alert-drowsy, energetic-tired). A model of mood ratings during musical
imagery episodes found that respondents were unlikely to report imaging music when
they felt drowsy. The relationship between INMI and subjective arousal is relevant to
work by Jakubowski and colleagues (2015), who tracked the tempo of imaged music by
asking respondents to tap it as it occurred in everyday life, with measurements recorded
by a wrist-worn accelerometer. Participants further noted information about their
circumstances at the time in a diary. While no measure of the physiological arousal of
the respondents was recorded, we do have information about their subjective ratings of
arousal, and these were found to be significantly related to the tempo of the music that
they tapped. This is in keeping with one of the four factors of the newly created Involuntary Musical Imagery Scale (IMIS),
“Movement”6 (Floridou et al. 2015). A factor analysis of answers to a self-report inventory
of individual differences in INMI grouped the following movement items together:
“The rhythms of my earworms match my movements,” “The way I move is in sync with
my earworms,” and “When I get an earworm I move to the beat of the imagined music.”
This “Movement” factor was subsequently found to correlate with a number of other
existing measures. Notable correlations occurred with the reported frequency of
experiencing INMI and the Bucknell Auditory Imagery Scale-Vividness (BAIS-V)
(Halpern 2015). The authors note a “potential for overlap in embodied responses to
hearing real music and experiencing spontaneous INMI, a link that could be explored
with both behavioral and neuroimaging studies” (Floridou et al. 2015, 33).
It is commonly believed that “earworms” are an annoyance, and work by Williamson
and colleagues (2014) sought to understand how we deal with them when they occur.
Using data from English and Finnish online surveys, the authors conducted a qualitative
analysis of 1,046 earworm reports and found that physical approaches to dealing with the
phenomenon were among the most popular responses. For example, respondents would
seek out the tune (including singing it or playing it) or use musical or verbal distraction
such as humming, singing, talking aloud, or listening to music/the radio/television.
Response categories derived from the English survey included a “Physical” subcategory
under the “Distract” theme, with physical behaviors intended to distract the respondent
from their earworm including the subgroupings “eat,” “rhythmic,” “breathe,” “exercise,”
and “work.” A second model was derived for the English survey data to only include INMI
behaviors that were rated as being effective. This model retained a “Physical” subcategory
for the “Distract” theme, and nonmusical forms of distraction included speech and
watching television. Williamson and colleagues (2014) suggest that we use distraction
behaviors that compete in working memory with the musical imagery, in this case
implicating movement and thus bodily involvement.
In summary, empirical studies of musical imagery present data that vary in the extent to which they support an embodied interpretation. Where evidence is lacking, this could be because the research methods were designed to address quite different research problems and are therefore poorly suited to advancing our understanding of imagery embodiment. However, a handful of
studies have now been conducted to test specific hypotheses that relate musical imagery
to movement. McCullough Campbell and Margulis (2015) set about testing the hypoth-
esis that physical activity during music listening would induce more frequent INMI
than passive music listening. In this research, 123 participants were randomly assigned
to different experiment conditions that varied in the requirement to have a motor
involvement while listening to a song (thought to be likely to induce INMI). Participants
were instructed to listen, move, or sing while hearing the song over headphones, before
being asked to take part in a dot-tracking task designed to induce INMI because of its
low demands on the participants’ attention. Following this, participants completed a
questionnaire asking them about their INMI experiences both during the experiment
and in general. Contrary to expectation, no significant differences were found between
the experiment groups, seemingly because participants found it difficult to comply with
the instruction to listen silently without moving. Consequently, the authors of the study
re-analyzed the data comparing INMI frequency in relation to the amount of motor
involvement that the individual participants reported, rather than the amount that was
asked of them by their experimental condition. This analysis revealed that those par-
ticipants who reported both moving and vocalizing during the song presentation expe-
rienced more INMI than those who reported being still and silent. The finding that
“moving and vocalizing proved near irresistible” (Margulis et al. 2015, 353) in itself lends
support to the case for the embodiment of musical engagement, and the propensity to
move or vocalize to some extent, while listening, is well known.
Beaman and colleagues (2015) investigated the role of articulatory motor planning
during both voluntary and involuntary musical recollections. Following in the tradition
of research suggesting the importance of subvocalization in auditory imagery, they
devised a paradigm in which participants were exposed to a particular song and then
the incidence of imaging it was recorded. Subsequent to the song presentation, participants
were either asked to chew gum or were not given gum to chew. The authors hypothesized
that chewing gum should serve to degrade articulatory motor programming, and so
reduce the incidence of musical imagery accounts. Their findings suggest that musical
recollections were reduced when chewing gum, and the authors argue that this reflects
an association between articulatory motor programming and imagery for song. I will
now consider some of the other ways in which theories of embodied mental imagery
might be explicitly tested to further our understanding of the role of the body in our
musical imagination.
We have seen that many empirical findings from studies of musical imagery challenge
the restricted notion of hearing in the “mind’s ear,” since our body is implicated in the
quality of the experience. However, searching the literature for compatible evidence for
an embodied account of musical imagery is a problematic endeavor because: (1) the
search decontextualizes findings in ways that might mask or even contradict the
original purpose of the source research, (2) it runs the risk of amplifying evidence by
virtue of its isolation, (3) it is susceptible to bias in the selection of relevant material, and
(4) it can only highlight associations rather than establish causal relationships. In order
to assess the extent to which musical imagery is an embodied cognition phenomenon, a
tailor-made research agenda is needed to enable more hypothesis testing about the role
of the body in our musical consciousness.
Before outlining some promising future directions, it is important to acknowledge evi-
dence that seems to temper the claims that can be made for embodied musical imagery.
First, research corroborates centuries of music pedagogy in suggesting that physical prac-
tice at an instrument will lead to greater improvements than mental practice (Cahn 2008;
Bernardi et al. 2013). Similarly, Lotze and colleagues (2003) argue that auditory-motor asso-
ciations were not sufficiently tight in their study of violinists for them to be co-activated
without actually hearing the performed sound, or actually producing the performed
movement. Finally, Aleman and colleagues (2000) found that musicians outperformed
nonmusicians on auditory imagery tasks. While superior musical imagery abilities are to be
expected, and these are entirely consistent with embodied imagery, there is no reason to
suppose that musicians should have any more embodied knowledge of the everyday sounds
used in the auditory imagery task than the nonmusicians.7 Moreover, the superior perfor-
mance of the musicians on the auditory imagery task cannot be explained by a greater abil-
ity to compare sounds as a result of their training, since musicians and nonmusicians were
comparable in their performance on the equivalent sound perception task. These potential
caveats support the case for an empirical exploration of the extent of the contribution that
embodiment makes to musical imagery experience.
In uncovering movement as an important factor in the experience of INMI, Floridou
and colleagues (2015) agree that embodied cognition is a relevant avenue for future
work. I will now point to a selection of the questions that are raised by expanding musical
imagery beyond the mind’s ear. For example, should we test for possible differences
in the degree to which our musical imagery is embodied, and what could such differ-
ences tell us? Godøy (2001) argues, “we have more salient images of sound when we
have more salient images of how the sounds are produced” (238). This hypothesis is
amenable to experimental testing and is ripe for future research. Here, we can return to
the unexpected finding from Halpern and colleagues (2004) of activation in SMA
during a timbre imagery task: pitch and timbre might be disentangled by asking partici-
pants to image a selection of noise-based stimuli, with the prediction that noisy timbres
that are difficult to produce will not elicit SMA activity.
Musical imagery (e.g., re-presenting a sequence of just heard notes in one’s mind)
feeds our musical imagination (e.g., creating a new sequence of notes in one’s mind). To
the extent that our musical imagery is embodied, are our imaginative re-presentations
of music constrained by our physical experience, and if so how can we understand the
role of our body in creative musical thought? Empirical studies of the musical imagery
of composers are lacking (Bailes and Bishop 2012), and research is needed to explore
how bodily experience shapes compositional ideas.
Embodied accounts of musical imagery necessarily relate to learning, since it would
be the changes in our embodied experience that come to shape our mental represen-
tations. This is arguably the research area for which we have the most empirical evidence
in the guise of studies demonstrating enhanced auditory-motor coupling as a result
of musical training. How might an empirical understanding of mental imagery as
embodied be applied in music education? The music pedagogy of Jaques-Dalcroze (1967)
places great emphasis on the integration of movement and sound. This intimately asso-
ciates sounds with their physical production (Campbell 1989), and it seems reasonable
to suggest that the musical representations of those who follow such training are strong
in motor imagery. A related prediction is that there is a link between musical experience
and the fidelity of musical imagery, and that this can be explained in terms of embodi-
ment. Anecdotal evidence of more vivid musical imagery for music that has been expe-
rienced through dance or musical performance might be corroborated by experimental
research. An embodied account of learning should help to explain the pedagogical links
between doing and thinking. The implications for music education are extensive,
suggesting that practice-led learning is the most effective approach to developing reliable
representations of music.
Finally, interoception, which is the sense of our physiological condition, is theoreti-
cally relevant to mental imagery when this is viewed as an offline simulation of embod-
ied cognition. Research into interoception (e.g., Kadota et al. 2010) suggests that it can
have significant consequences for our psychological state. It seems that our perceptions
are subconsciously tuned to our own biological rhythms such as heartbeat (Aspell
et al. 2013), and it remains an open question as to whether interoceptive forces shape our
musical imagery.
Concluding Remarks
I would like to conclude by reflecting on the apparent intangibility of both sound and
imagination, with a reminder that our corporality tangibly relates the two. Imagination
is often taken to be synonymous with mental freedom, yet our thoughts are shaped by
our environmental, biological, and cognitive experience. If this shaping extends to
mental imagery, then our musical imagery will be characterized by those features of our
environment that are of personal significance, and investigating musical imagery should
enable a better understanding of what is meaningful in sound. A review of the empirical
literature has demonstrated that our imagery for musical sound is not limited to a single,
auditory modality, and the involvement of motor imagery in particular reminds us that
music results from physical action. In this way, our understanding of sound as embod-
ied can be illuminated through the lens of imagination.
We might then ask how our understanding of imagination can be magnified through
the lens of sound. For most people, the term “mental imagery” is equated with visual
imagery. However, much can be gained from studying mental imagery for other modali-
ties: sound and music are obviously articulated through time, with auditory and musical
imagery emphasizing the dynamic processes that underpin their generation rather than
the apparently static product evoked by imagining a visual scene (Bailes 2019).
This chapter has argued that we should extend our understanding of auditory imagery
beyond the “mind’s ear.” Sound naturally affords a focus on the essentially dynamic
properties of imagery and imagination, and this is often missing in the frequently disem-
bodied, static conceptualization of visual imagery occurring in the “mind’s eye.”
Notes
1. Throughout this chapter the terms “imagery” and “imaging” primarily reference
re-presentation, while “imagination” and “imagining” are used more often to signal the
imaginative.
2. In this respect, imagined music resembles perceived music as a cross-modal phenomenon
in which auditory, visual, and kinesthetic senses seem most likely to feature rather than
gustatory or olfactory modalities.
3. Though see Schiavio, Menin, and Matyja (2014) for arguments against the loose
adaptation of the unconscious embodied simulation account to describe conscious
phenomena.
4. Many forms of electronic music require minimal gestures.
5. Burgoyne, J.A. 2015. Resurrecting the Earworms of Our Youth: What Is Responsible for
Long-Term Musical Salience? Paper read at Investigating the Music in our Heads, June 1, 2015,
Goldsmiths, University of London.
6. The others are “negative valence,” “personal reflections,” and “help.”
7. Unless they have been trained in environmental listening or acousmatic composition.
References
Aleman, A., M. R. Nieuwenstein, K. B. E. Böcker, and E. H. F. de Haan. 2000. Music Training
and Mental Imagery Ability. Neuropsychologia 38 (12): 1664–1668. doi:10.1016/S0028-
3932(00)00079-8.
Aspell, J. E., L. Heydrich, G. Marillier, T. Lavanchy, B. Herbelin, and O. Blanke. 2013. Turning
Body and Self Inside Out: Visualized Heartbeats Alter Bodily Self-Consciousness and
Tactile Perception. Psychological Science 24 (12): 2445–2453. doi:10.1177/0956797613498395.
Baddeley, A. 1986. Working Memory. Oxford: Clarendon Press.
Bailes, F. 2006. The Use of Experience-Sampling Methods to Monitor Musical Imagery in
Everyday Life. Musicae Scientiae 10 (2): 173–190.
Bailes, F. 2007. The Prevalence and Nature of Imagined Music in the Everyday Lives of Music
Students. Psychology of Music 35 (4): 555–570. doi:10.1177/0305735607077834.
Bailes, F. 2015. Music in Mind? An Experience Sampling Study of What and When, Towards
an Understanding of Why. Psychomusicology: Music, Mind, and Brain 25 (1): 58–68.
doi:10.1037/pmu0000078.
Bailes, F. 2019. Musical Imagery and the Temporality of Consciousness. In Music and
Consciousness 2: Worlds, Practices, Modalities, edited by D. Clarke, R. Herbert, and E. Clarke.
Oxford: Oxford University Press.
Bailes, F., and L. Bishop. 2012. Musical Imagery in the Creative Process. In The Act of Musical
Composition: Studies in the Creative Process, edited by D. Collins, 54–77. Farnham, UK:
Ashgate.
Baker, J. M. 2001. The Keyboard as Basis for Imagery of Pitch Relations. In Musical Imagery,
edited by R. I. Godøy and H. Jørgensen, 251–269. Lisse, Netherlands: Swets & Zeitlinger.
Beaman, C. P., K. Powell, and E. Rapley. 2015. Want to Block Earworms from Conscious
Awareness? B(u)y Gum! Quarterly Journal of Experimental Psychology 68 (6): 1049–1057.
doi:10.1080/17470218.2015.1034142.
Beaty, R. E., C. J. Burgin, E. C. Nusbaum, T. R. Kwapil, D. A. Hodges, and P. J. Silvia. 2013.
Music to the Inner Ears: Exploring Individual Differences in Musical Imagery. Consciousness
and Cognition 22 (4): 1163–1173. doi:10.1016/j.concog.2013.07.006.
Bernardi, N. F., M. De Buglio, P. D. Trimarchi, A. Chielli, and E. Bricolo. 2013. Mental Practice
Promotes Motor Anticipation: Evidence from Skilled Music Performance. Frontiers in
Human Neuroscience 7:451. doi:10.3389/fnhum.2013.00451.
Berthoz, A. 1996. The Role of Inhibition in the Hierarchical Gating of Executed and Imagined
Movements. Cognitive Brain Research 3:101–113. doi:10.1016/0926-6410(95)00035-6.
Bishop, L., F. Bailes, and R. T. Dean. 2013. Musical Imagery and the Planning of Dynamics and
Articulation during Performance. Music Perception 31 (2): 97–117. doi:10.1525/mp.2013.31.2.97.
Brodsky, W., A. Henik, B.-S. Rubinstein, and M. Zorman. 1999. Inner Hearing among
Symphony Orchestra Musicians: Intersectional Differences of String-Players versus Wind-
Players. In Music, Mind, and Science, edited by S. W. Yi, 370–392. Seoul: Seoul National
University Press.
Brodsky, W., A. Henik, B.-S. Rubinstein, and M. Zorman. 2003. Auditory Imagery from
Musical Notation in Expert Musicians. Perception and Psychophysics 65 (4): 602–612.
doi:10.3758/BF03194586.
Cahn, D. 2008. The Effects of Varying Ratios of Physical and Mental Practice, and Task
Difficulty on Performance of a Tonal Pattern. Psychology of Music 36 (2): 179–191. doi:10.1177/
0305735607085011.
Schiavio, A., D. Menin, and J. Matyja. 2014. Music in the Flesh: Embodied Simulation in
Musical Understanding. Psychomusicology: Music, Mind, and Brain 24 (4): 340–343.
doi:10.1037/pmu0000052.
Varela, F. J., E. Thompson, and E. Rosch. 1991. The Embodied Mind: Cognitive Science and
Human Experience. Cambridge, MA: MIT Press.
Weber, R. J., and S. Brown. 1986. Musical Imagery. Music Perception 3 (4): 411–426. doi:10.2307/
40285346.
Williamson, V. J., S. R. Jilka, J. Fry, S. Finkel, D. Müllensiefen, and L. Stewart. 2011. How Do
“Earworms” Start? Classifying the Everyday Circumstances of Involuntary Musical
Imagery. Psychology of Music 40 (3): 259–284. doi:10.1177/0305735611418553.
Williamson, V. J., and S. R. Jilka. 2014. Experiencing Earworms: An Interview Study of
Involuntary Musical Imagery. Psychology of Music 42 (5): 653–670. doi:10.1177/0305735613483848.
Williamson, V. J., L. A. Liikkanen, K. Jakubowski, and L. Stewart. 2014. Sticky Tunes: How Do
People React to Involuntary Musical Imagery? PLoS One 9 (1): e86170. doi:10.1371/journal.
pone.0086170.
Zatorre, R. J., A. R. Halpern, D. W. Perry, E. Meyer, and A. C. Evans. 1996. Hearing in the
Mind’s Ear: A PET Investigation of Musical Imagery and Perception. Journal of Cognitive
Neuroscience 8 (1): 29–46. doi:10.1162/jocn.1996.8.1.29.
PART IV
AESTHETICS
Chapter 23
Imaginative Listening to Music
Theodore Gracyk
Introduction
Appreciative listening to music involves the exercise of taste, for it involves attention
to aesthetic properties, as when we distinguish between graceful and clunky transitions,
and between violent and sluggish rhythms. For over three centuries, major figures in
philosophical aesthetics have argued that aesthetic engagement with art—and therefore
music—includes pleasures of the imagination (Addison and Steele 1965). So listening is
both perceptual and imaginative. Frequently, this connection is cashed out with respect
to the problem of how music conveys emotion, as when R. K. Elliott diagnoses the expe-
rience of emotional qualities in music as a case of “imaginatively enriched perception”
(1967, 119). Listening to music differs from hearing the sounds that constitute the music,
and some of this difference stems from our imaginative enrichment of those sounds.1
Although I endorse the conventional thesis that imaginative engagement is normally
required to appreciate music when listening to it, I argue that we should be more cir-
cumspect about this claim than is typically the case. For example, most accounts of
musical expressiveness say that imaginative engagement is required in order to perceive
the melancholy that runs through most of Mozart’s G Minor String Quintet (K. 516) or
the joy of Louis Prima’s “Sing Sing Sing” as performed by Benny Goodman and his
Orchestra. In turn, expressiveness is frequently tied to imaginative enrichment that
recasts auditory events as musical motion. Imagination lets us hear motion and gestures
“in” the progression of sounds, which in turn facilitates an experience of expressiveness.2
I reject both of these proposals, as well as the weaker proposal that imagination is
Virginia Woolf frequented the opera and sought out performances of Beethoven’s string
quartets. Reflecting in her diary about a concert of instrumental chamber music, she
mused, “musical people don’t listen as I do, but critically, . . . without programmes”
(Woolf 1980, 39). She wondered whether she was listening properly when the music
encouraged streams of imaginative imagery and associations. Woolf remarks that, when
a concert program features a Bach concerto, “its [sic] difficult not to think of other
things” (Woolf 1977, 33). She offers a lengthy description of listening to music in her
stream-of-consciousness story “A String Quartet,” in which a nameless protagonist
responds as follows to the opening measures of a Mozart quartet.
[L]ooking across at the player opposite, the first violin counts one, two, three—
Flourish, spring, burgeon, burst! The pear tree on the top of the mountain. Fountains
jet; drops descend. But the waters of the Rhone flow swift and deep, race under the
arches, and sweep the trailing water leaves, washing shadows over the silver fish, the
spotted fish rushed down by the swift waters, now swept into an eddy.
(Woolf 2003, 133)
emotion” (Gurney 1880, 306). According to Gurney and Lee, Woolf ’s active imagination
places her squarely in the company of mere hearers. Her listening is indefinite.3
For Woolf, the issue was more than an academic question. Her mode of attending to
music had recently been described and criticized by her brother-in-law, Clive Bell.
Explaining and defending aesthetic formalism, Bell valorizes the pattern-focused
attention of Lee’s “listeners” and Gurney’s definite listening:
Tired or perplexed, I let slip my sense of form . . . I begin to read into the musical
forms human emotions of terror and mystery, love and hate, and spend the minutes,
pleasantly enough, in a world of turbid and inferior feeling. At such times, were the
grossest pieces of onomatopoeic representation—the song of a bird, the galloping of
horses, the cries of children, or the laughing of demons—to be introduced into the
symphony . . . they would afford new points of departure for new trains of romantic
feeling or heroic thought. I know very well what has happened. I have been using art
as a means to the emotions of life and reading into it the ideas of life. I have been
cutting blocks with a razor (Bell 1914, 31–32).
much the result of the stimulation of “sensations” (Kant 2000, 206). The centrality of
imagination continues to dominate theories of musical experience through the nineteenth
century and then into our own time. In recent years, imaginative engagement is treated
as an essential component of listening by both philosophers and musicologists, including
Roger Scruton (1974, 1999), Charles Rosen (1995), Nicholas Cook (1990), Denis Dutton
(2009), and Jerrold Levinson (2006a).
Let us assume that many people attend to instrumental music as Woolf did, engaging
with music more imaginatively than is minimally required for perception of sound
sequences.4 Against the common prejudice that listeners who engage in a more robust
imaginative response are less musical than those who listen “without programmes,”
imaginative supplementation of what we actually hear seems to be unavoidable and
necessary in music listening. But if all music listening is partly perceptual and partly
imaginative, the real issue with Woolf ’s response is the degree to which some listeners let
their imaginations run free.
Many, many distinct roles have been assigned to imagination since Aristotle identified it
as central to human thought (Sparshott 1990; Stevenson 2003; Townsend 2006, 160–161).
Therefore, disagreements about its role in listening cannot be resolved unless we pro-
vide focus by determining which roles are relevant. I have already discarded most of the
roles assigned to imagination by focusing on occurrent imagining, where imagination is
applied to music that one currently hears. Occurrent imagining may have little or
nothing in common either with having a tune stuck in the head (an eidetic image or
“earworm”) or with imagining sounds while silently studying a musical score (Tovey 1936).
Mary Warnock provides a succinct summary of the relevant central idea as our
“capacity to look beyond the immediate and the present” (1976, 201). Of course, this
does not distinguish imagination from memory, which can share precisely the same
content. When I look at my yellow house and remember that it used to be blue, I need
not form a mental image of it as blue. However, suppose I do, and “picture” the house as
it used to be. Psychologists refer to this phenomenon as memory imagery. Since the
experiential content of memory and imagination imagery can be identical, we need
nonphenomenal criteria for differentiating memory imagery from imagination imagery.
Current consensus holds that imagining the house as blue differs from remembering that it
was blue according to whether one believes that it was blue. Imagination imagery is
belief-independent. If someone believes that the image reproduces something as it was
experienced in the past, then it is a memory (even if it is false).5 In Roger Scruton’s pre-
ferred description, “imagination involves thought which is unasserted” (1974, 97; cf.
Scruton 1999, 88–89). Other recent explanations say that imagination is “quarantined”
from beliefs, where “pretense representations differ from belief representations by their
function” (Nichols 2004, 130). Or, more precisely, by their reduced function as measured
by behavioral consequences. As a well-worn example has it, a horror movie may induce
some level of fear, but imaginary monsters do not prompt normal people to call the police for help, nor to jump from their seats and run for safety. When Woolf
imagines the spotted fish in the eddy, she does not make plans to return to that spot later
with a fishing rod.6
So which species of imagining are most relevant to music listening? I will concentrate
on three modes of imaginative engagement that philosophers typically discuss in rela-
tion to experiencing pictures, literature, and music. They are propositional imagining,
imagination imagery, and hearing-in.7
Propositional imagining involves conceiving or making-believe that a proposition or
set of propositions is true of some world, without necessarily believing that it holds true
of our own. This species of imagining is normally understood to simulate belief. For
example, suppose you are reading Jane Austen’s Sense and Sensibility and reach the line,
“a neat wicket gate admitted them into [a small green court].” One way to respond to this
linguistic prompt is to suppose, for purposes of the narrative, that the Dashwood family
has now passed through a wicket gate and has entered a small courtyard in front of their
new cottage. This thought may or may not be accompanied by a second imaginative
activity, imagination imagery. Some readers will supplement the propositional imagining
with imagery, visualizing (in their “mind’s eye”) a fence and wicket gate and a group
of women going toward a cottage.8 There will be considerable variation in what is
imagined. One reader may construct an image of a green, grassy courtyard in front of
a one-story cottage. Another may furnish the area with rose bushes, and imagine a
small, two-story house.
However, music listening might require a third species of imaginative engagement,
which literary fiction does not require. This third kind is common with sculpture,
pictures, and films: it involves imaginative transformation of what one directly perceives.
Suppose I look at a landscape painting and I see both a paint-cracked surface and the
painting’s representation of a horse-drawn cart crossing a stream beside a white cottage.
Looking at the canvas, I imagine that I am actually seeing the English countryside in
some past age. This kind of imaginative engagement accompanies reading when illus-
trations appear in the particular edition that one is reading. But, graphic novels aside,
pictures are not necessary for the experience of literature. For sculpture and pictures,
the requisite imaginative engagement with the visual object is generally referred to as
seeing-in (e.g., Lopes 2005). With music, the parallel case is hearing-in, as when the rumble
of thunder is heard in the tympani rolls in the storm sequence of Beethoven’s Pastoral
Symphony.9 Hearing-in is guided by the listener’s direct experience of sonic features,
their combination, and their sequencing. Hearing-in and seeing-in are alike in that each
involves imaginatively experiencing a perceived object to be something more than it is.
Granted, both seeing-in and hearing-in are sometimes supplemented with—and guided
by—propositional imagining.10 Informed listeners may attend to a relevant passage in
the Pastoral Symphony by consciously or unconsciously imagining “The storm is passing
now” without believing they have witnessed a storm.11 Others will also add imagination
imagery, supplementing the auditory experience by visualizing a storm and then its
lifting.12 There will be cases, therefore, when seeing-in and hearing-in invite all three
species of imaginative engagement.
Is this third kind of imaginative engagement with music, hearing-in, required in
music listening, either by itself or as guided by propositional imagining? A central test
case for the necessity of imagination in listening is the thesis that music demands
hearing-in when we imaginatively “animate” what we hear (e.g., Trivedi 2011, 118). Many
accounts of listening treat hearing-in as essential to the experience of musical move-
ment and to the experience of music’s expressive qualities. However, it is possible that
different imaginative processes—or perhaps none at all—are involved when we experience
movement, structure, and expressivity. The other two species of imaginative engagement,
propositional imagining and imagination imagery, are subject to the objection that the
thoughts and images are unnecessary and inessential additions to the listening process.
This objection appears to capture Woolf ’s worries about her listening strategies.
However, it cannot be raised against hearing-in if both of two conditions hold: (1) the
experience of musical animation is an essential aspect of the experience of listening; and
(2) the perceived animation requires imaginative hearing-in, which is not required
more generally for auditory perception. Following some additional prefatory work, I
will challenge the second of these two conditions; having done so, I will look for other
ways that hearing-in might be essential to music listening.
My general concern is the plausibility of the thesis that a particular species of imaginative
engagement, hearing-in, is required for music listening. In this section I elaborate on
why hearing-in is the crucial test case.
It does not take much to demonstrate the weaker thesis that imaginative engagement
is frequently appropriate when listening. For example, it is appropriate for songs, opera,
and program music where verbal cues will attune the listener’s sensitivity to extra-musical
representation in various musical structures. Listening to Jimi Hendrix’s rendition of
“The Star-Spangled Banner” at Woodstock, we should hear bombs exploding in the
guitar pyrotechnics following the (unsung) line, “the rocket’s red glare, the bombs bursting
in air.” Likewise, we should hear the foot treadle of the spinning wheel in the music of
Schubert’s “Gretchen am Spinnrade.” Although these cases of hearing-in are appropriate,
imaginative responses, they do not advance the case that hearing-in is a necessary
element of music listening. They fail for the same reason that stage sets at the opera do
not count as evidence that music is an audiovisual art form. These are hybrids of music
and something more, and the “something more” is an obligatory guide to the imagi-
nation in our response to the hybrid object of attention (see Davies 1994, 113–114).
The plausibility of the stronger thesis, that listening to music requires hearing-in,
hinges on imagination’s role for listeners who do not receive explicit guidance from
extra-musical information. As Eduard Hanslick (1986, 15) argues, absolute music is the
ideal test case for any view that a property or process is essential to music or music
listening. We can generalize from it because it is “pure, objective, and self-contained—
that is, not subordinated to words (song), to drama (opera), to a literary programme or
even to emotional expression” (Hamilton 2007, 87).
For the remainder of this essay, I will concentrate on examples that lack extra-musical
information. But when listeners’ imaginations float free from intramusical guidance,
purists can dismiss the imaginative response as subjective, irrelevant, and unmusical.
So the strong thesis requires intramusical guidance from absolute music that yields
(relatively) reliable recognition of whatever is heard in the music; listeners would report
agreement at roughly the same level that people agree that a dog is pictured when shown
a picture of a dog. Because no such level of agreement is evident with musical represen-
tation, we seem justified in dismissing idiosyncratic images, such as Woolf ’s “pear tree
on the top of the mountain.” After all, how can anyone hear a tree, much less a particular
type of fruit tree, by way of hearing-in? But the very same objection can be raised against
any musical representation in which a sound does not resemble another sound. The
tympani may sound like thunder and the woodwinds can imitate birdcalls, but imagi-
nation will play only a limited role in listening if it is restricted to cases of onomatopoeia. We need
more than onomatopoeia but less than universal recognition of whatever is represented.
We need an account of how musical patterns and passages guide propositional and/or
imaginative hearing in a non-onomatopoeic manner (see Davies 1994, chap. 2).
To better understand guided response, it will be useful to adapt a distinction from
Kendall Walton. Consider the difference between cases where we imagine, of some
object that we perceive, that it is something it is not, versus cases where an object leads
us to imagine something, but we do not imagine it of the prompt itself (Walton 1990, 25).
To imagine that a tree stump is a bear is a case of the former, whereas imagining that it is
raining somewhere when one sees a dripping faucet is a case of the latter sort (because one
is not imagining that the dripping water is rain). In the former case, the object is a prop,
while in the second it is a trigger. The perceived object is a prop if there are conventions
in place by which its properties generate fictional truths that guide our imaginings; that
is, it is a prop if its particular features guide appropriately backgrounded participants
to imagine a determinate state of affairs. Suppose we are playing the board game
Monopoly and I mistakenly move someone else’s token and then “buy the railroad” on
which it lands. Other players can (and will!) object that I have moved from the wrong
location and so cannot buy that railroad with my Monopoly money. The various props
together with established rules-of-play endorse certain imaginative responses and not
others. Here, the mistake of moving the wrong token has an objective consequence for
what is taking place (and, also, not taking place) in the game world. Conversely, the same
object of perception is a mere trigger when the imagined content is imaginative imagery
that is idiosyncratic and unconstrained by the object’s features. Suppose I select the
wheelbarrow as my game token because it encourages happy thoughts of a bountiful
harvest from my small backyard garden. But my garden is too small to involve use of a
wheelbarrow, and my response floats free of the game; now, the token functions as a
trigger, rather than a prop, for my imaginative enrichment.
Aligned with the distinction between hearing-in and imaginative imagery, the distinction between props and triggers offers a general framework for evaluating differences in listeners’ responses. It directs us to ask, of any particular imaginative response,
whether it is appropriately directed and focused by perceptual cues. Purists are correct
to question the appropriateness of responses of someone who treats all Western instru-
mental music as a trigger for fanciful, free-roaming imaginings.13 So we seem to have a
principled reason to set aside idiosyncratic responses, such as Woolf ’s fish and pear tree.
Therefore, imagination imagery is a poor candidate for the strong thesis and we should
concentrate on the way that music functions as a prop (rather than a trigger) for hearing-in.
For example, Lee found a pattern of water imagery independently associated with certain
pieces of absolute music (1932, 428–429; e.g., for a Chopin Nocturne). This high level of
agreement suggests that this imagery arises because the music is a rule-governed prop
for our hearing-in. However, we must not be too hasty. Instrumental music often serves
as a fragmentary, largely indeterminate prop: the imaginings it licenses are less deter-
minate than in most games of make-believe (Walton 1994, 52). From that perspective,
there is no reason to object to the fact that Woolf ’s imaginary fish and tree are highly
determinate interpretations of audible elements of the musical experience.14 She may be
more “musical” than she thinks she is, but thinks otherwise due to the influence of
formalists who deny that music should be a prop for hearing-in. What is at stake is
whether there is any prop-function that holds for all music listening.
Given the anti-imagination stance of formal purists, it is important to recognize that
some formalists endorse hearing-in. Hanslick, perhaps the most influential formalist of
the nineteenth century, urged a distinction between necessary and unnecessary imagi-
native hearing-in. He is frequently attacked for a variety of intellectual sins, real and
exaggerated, but he is seldom given credit for foreshadowing Walton’s distinction
between props and triggers. Hanslick famously argues that it is an error to imagine a
narrative or expressive persona when listening to Bach’s Das wohltemperierte Klavier
(1986, 14). However, Hanslick is not anti-imagination: “If we are to treat music as an art,
we must recognize that imagination and not feeling is always the aesthetical authority”
(5). Some imaginative responses count as appreciative response, while some others do
not (30). Basically, hearing-in is only appropriate when there are real properties of the
music that serve as focal points for the imaginative response. For Hanslick, the first
requirement is a culturally entrenched tonal system. On this basis, our sense of musical
form is tied to our apprehension of it as a representation of “the motion of a physical
process according to the prevailing momentum: fast, slow, strong, weak, rising, falling . . . It
can depict not love but only such motion as can occur in connection with love” (11).
Hearing-in generates awareness of musical animation. It is therefore essential to all
musical content, which Hanslick identifies with “tonally moving forms” (29). He dis-
tinguishes this from cases of “hearing” where music is a mere trigger for free association and where the listener is not appreciating it for what it is, despite enjoying the
experience (59).
Hanslick’s sketchy remarks endorse the necessity of imaginatively enriched perception.
More importantly, he identifies the feature that has attracted almost universal consensus
Expressiveness
Appropriately backgrounded listeners frequently hear music as sad, joyful, anxious, and
so on. Levels of agreement are so high that we can use expressivity as a test case of musical
competence. For example, we must doubt the musicality of anyone who reports that
Benny Goodman’s performances of “Sing Sing Sing” sound melancholy and despairing.
Following established philosophical usage, I speak of music’s “expressiveness” and
“expressive qualities” rather than its “expression of emotion.” Genuine expression
requires a person or sentient being who has an emotion and signals it to others by
means of external signs. Thus, my dog expresses happiness by wagging his tail. However,
composers are capable of composing music that sounds happy or sad, or happy or sad in
a very particular way, without having to draw on their own emotional experiences as a
source of the music’s design. The key to composing sad music is knowing what sad music
sounds like. There may be occasions where composers engage in self-expression, but
self-expression is not necessary for the music’s having an expressive dimension.
Therefore, it is better to describe the sadness of, say, a twelve-bar blues as an expressive
quality than to treat it as an expression of the emotion of sadness.
I will be brief about expressiveness and imagination. Many, and perhaps most,
philosophies of art analyze musical expressiveness by reference to imagination and
make-believe. Malcolm Budd pinpoints the “underlying idea” as the proposal that
“emotionally expressive music is designed to encourage the listener to imagine the
occurrence of experiences of emotion” (1989, 135). Unfortunately, the idea that music’s
expressiveness emerges through imaginative engagement does not establish that all
music listening requires imagination. Some music is not appropriately heard as pos-
sessing expressive qualities, including some of the serialism of Milton Babbitt, Pierre
Boulez’s Structures I and II, and Philip Glass’s Music in Contrary Motion. “Expressionless”
music is not restricted to the twentieth and twenty-first centuries. The fugues of
J.S. Bach’s Das wohltemperierte Klavier and Die Kunst der Fuge are frequently identified
as examples of “emotionless” musical masterpieces (Lang 1997, 509). So although a hearing-
in account of expressiveness supports the view that imaginative enrichment is sometimes
necessary to perceive expressive features, we should not generalize this finding to all
music listening.
However, I regard even that position on expressiveness as overly generous. The limited
scope of that endorsement collapses if there is a plausible nonimagination account of
music’s expressiveness. Here, I think that Budd and Stephen Davies are correct to
exclude imagination from our detection of musical sadness and happiness, the two
most universally recognized “emotions” in music (Budd 1989, 137; Davies 2011, 1–20).
We describe many external appearances with emotion terms, yet we do not always
imagine in these cases that we are detecting any underlying mental states. For example,
in the same way that we can describe weather as “gloomy” without attributing feelings to
weather, we can describe someone as having an “angry” tone of voice without thinking
they are angry. Since emotion descriptions of music are obviously descriptions of how
the music sounds, phrases such as “angry music” and “sad music” may be compressed,
literal descriptions of angry-sounding music, sad-sounding music, and so on.
Yet the topic of musical expressiveness is not irrelevant to our interests here. Many
accounts of expressiveness regard it as dependent on a second phenomenon, our experi-
ence of musical animation or movement (Hanslick 1986, 11; Lee 1932, 80; Kivy 1989,
52–58; Levinson 2006b, 121–123; Davies 2011, 10–11). In turn, the experience of musical
movement and animation is generally thought to require imaginative engagement (and
doubly so, when it is interpreted as a bodily gesture reflecting agency). Since all music
displays some kind of motion or animation, it is not expressiveness but rather musical
motion that provides a universal musical phenomenon that may require imagination.
I investigate this proposal in the next section.
Here is Shakespeare, four hundred years ago: “That strain again! It had a dying fall.”15
Which strain does Duke Orsino want to hear again? The one that moves with a dying
fall. Here is a recent description of some music in Bernard Herrmann’s score for
Hitchcock’s Psycho: “The opposing nature of the two musical lines moving toward each
other reflects . . . two perspectives” (Rothbart 2013, 46). The musical lines are oriented in
an unreal acousmatic “space” in which they are moving toward each other, and on this
basis they can represent what is happening in the film.16
But do we imagine the movement of a melodic line, or the leap of the octave? Despite
significant cultural differences, the use of motion-terminology and action-descriptions
to characterize music is a cross-cultural phenomenon (Becker 2010). An important
a further metaphor for the relationship between them (as approaching each other).
The metaphors would explode exponentially even in cases of a moderately more
complex piece of music, such as when Talking Heads perform “Crosseyed and Painless”
with a vocal line and seven distinct instrumental parts. If we do not find an alternative to
the view that awareness of musical motion requires propositional imaginings, then how
many metaphors do we juggle in our minds when we attend to polyrhythmic music of
this sort? Or do we concede that we cannot hear the musicality of most of those instru-
ments during most of the performance? But that is simply nonsense. We can perceive a
great deal more musical detail and interplay than we conceptualize.
Paul Boghossian provides a second criticism.20 We can distinguish between justified
and unjustified metaphors. To do so, “we would have to be aware of some layer of
musical experience with a perfectly literal content that our musical metaphors would
be designed to illuminate. But there doesn’t seem to be such a layer of experience”
(Boghossian 2007, 123). To put it another way, if one insists that all music listening
derives from the guidance of a particular metaphor, then there is no principled way to
distinguish between the metaphorical and the literal components of the experience, and
the literal component cannot guide the application of the metaphor. Consequently, we
should also be able to listen to any sound sequence in terms of the same metaphor, and
we will hear it as music. (On this hypothesis, there would be nothing radical in John
Cage’s invitation to attend to the music in seemingly nonmusical sound.) However,
although almost everyone recognizes pitch differences in various natural and environ-
mental sounds, and can hear patterns of change in these pitched sounds, it is very
difficult to hear “music” in sound sequences when they have not been organized inten-
tionally as such. Although there may be objective cues in some sound sequences (and
absent from others) that invite us to recognize musical “motion” of various kinds, the
perception of musicality does not arise from our application of a particular metaphor or
from their subsumption under concepts imported from the visual and tactile realms.
Stephen Davies (2011, 32) makes the related point that metaphors only facilitate conceptual transference when they are “live,” as nonstandard descriptions that cast new light
on a situation. However, we are not being creative or imaginative when we talk of musical
motion, so the metaphor does not work as a metaphor any longer.
I conclude that music listening does not require metaphorical perception, nor meta-
phor-guided perception. This conclusion deprives us of our most compelling reason to
think that all music listening is infused with imagination. It does not, however, prove
that imagination is always dispensable. Another argument might demonstrate that it
has a necessary role, without invoking the guidance of metaphor.
We might construct an alternative argument by modifying Scruton’s premises. First,
suppose there is a metaphor, but it comes after-the-fact or as a supplement to the experi-
ence, as our best description of what we experience with harmony, melody, and
rhythm.21 Second, we now allow that we perceive neither spatial arrangement nor
motion in a melody or rhythm, for there is no object in space that fits our description
when we say that a sad melody droops. We do not perceive space and motion and we do
not, upon reflection, believe that music moves. Yet we experience something in the
music that is usefully described with this language. Some sort of transference is taking
place. Therefore, some underlying mental process is at work that encourages this mode
of description. We must be engaging in an imaginative transformation of what we
perceive, even if what we perceive is ineffable and only approximated by our space and
movement metaphors. We have some kind of propositional imagining that guides
hearing-in, but its precise nature is not available to our introspection.
There are three strong objections to this line of thinking, and I think they are jointly
decisive. First, there is nothing added by appeal to imagination that is not accomplished by
admitting that human experience is largely ineffable. We frequently resort to descriptions
that we do not endorse as literally true. However, this practice does not generally prove
that imagination has transformed the experience in a way that defies description, so
there is no reason to postulate it concerning music. The second counterargument
simply rejects the premise that generates the previous objection. We can deny that the
experience of music is unusually resistant to fine-grained description. Hanslick makes
the point that musicians and music theorists possess a detailed technical vocabulary for
describing music. The problem is not music, but the fact that so few people have learned
to employ this vocabulary. The frequent use of “poetical fictions” to describe basic musical
phenomena provides no evidence that people hear something in the music that is not
literally there (Hanslick 1986, 30). The correct conclusion is that we simply have a lot of
people who have not learned to articulate what they hear. Third, Davies (2011, 25–32)
argues that there is simply no metaphor at work when we say that a melody falls or that
the span of one chord is wider than another. We talk of spatial distances and movement
for all sorts of things besides bodies in our three-dimensional environment. We apply
these concepts and use these words literally whenever our experience is highly similar to
established exemplars. Our general “motion” vocabulary is polysemous, not metaphorical
(Davies 2011, 32).22
We have hit another wall in the search for a compelling reason to grant that imaginative
enrichment must inform listening.
Experiential Illusion
Obvious examples of experiential illusion include marquee lights and strings of Christmas lights. When they
are rapidly lit in sequence, they create the illusion that a single point of light is moving
along the string. We see motion even when we know that something else is really hap-
pening. Rafael De Clercq (2007) offers the example of moving the cursor around on a
computer screen. But I do not really move anything there. In reality “there is no cursor
moving on my computer screen: there are just local changes in the light emitted (just as,
strictly speaking, there [are] only sounds or vibrations in the surrounding air)” (De Clercq
2007, 162). The important point is that these illusory motions are systematic,
natural, and belief-resistant effects of our perceptual system, and they are not imagina-
tive transformations of a more basic experience.
Likewise, the experience of musical space and musical motion seems to involve an
immediate, unlearned, unconscious process. As with a host of optical illusions, our
species just seems to be hard-wired to organize some kinds of sounds in particular ways
that we find “musical.” The experience of (illusory) movement in musical space would
be a prominent example: we may talk about a piano sonata moving into a remote key,
but there are no literal spaces or distances to traverse.23 Like any other perceptual illusion,
the experience of melodic movement and of size or width in a musical chord persists
in the face of knowledge that the relevant acousmatic “space” is not real. Because the
experience of movement in music is common and fits our standard criteria for per-
ceptual illusion, there is no reason to invoke imagination to explain the experience of
acousmatic “space” and motion within it. Musical movement is an experiential illusion
in which phenomenal effects systematically mislead us in response to certain kinds
of sound structures, which composers and musicians exploit. (Again, the attention
that infants give to melodies casts suspicion on the necessary role of imaginative
processing.)
If we are to appeal to illusion, rather than imaginings, we are committed to saying that
the illusion is the natural product of our inherited auditory system. Charles Nussbaum (2007)
offers just such an account. The physiology of the human ear is integrated with mental
structures that map all sounds spatially; we cannot attend to musical structure unless we
also “move through [music’s] virtual tonal space in imagination” (2007, 99). It is note-
worthy that Nussbaum places no weight on his occasional references to imagination.
Given his explanation of why our mental representation of tonal space is an unavoidable
response, Nussbaum more often and more accurately refers to it as an illusion (e.g., 50).
A related line of explanation might posit that there is some degree of synesthesia
involved in music perception (i.e., some natural incorporation of nonauditory per-
ceptual systems into auditory perception; see the chapter on cross-modal correspondences
by Eitan and Tamir-Ostrover, volume 1, chapter 36). At the same time, Saam
Trivedi (2008) and Stephen Davies correctly emphasize that any recourse to a biological
explanation of this illusion is a supplement to the philosophical question at hand, which
is the question of the proper referent of various terms when applied to music. Something
perceived, or something imagined? Davies returns us to the philosophical issues by
observing that a causal explanation is relevant when it explains why particular descriptions
are employed so universally. Cross-cultural uniformity in language use is a good
reason to think that the language is being used literally, not metaphorically: we really do
experience music in terms of its own space, with movement in that space (Davies 2011,
31–32). And, again, that is a reason to think that we have hit another dead end in trying
to prove that imagination is necessary for hearing sounds as music. If the experience of
musical space and movement is going to be linked to imagination, the connection
involves a sense of “imagination” that I have not explored. Imagination is only necessary
if we stipulate that perceptual illusions are imaginative constructs. Although that
stipulation was common in the past, contemporary usage does not endorse it.
If musical motion is experienced rather than imagined, then what other phenomena
might require imaginative engagement?
There is a complex debate about the importance of listeners’ comprehension of
large-scale structure or musical “architecture” for complex instrumental compositions.
Because these structures are never immediately present for direct perception, they must
be imagined by knowledgeable listeners, either propositionally or through imagination
imagery. Suppose Aaron Copland is right, and “imagination and the imagination alone”
permits a listener “to see all around the structural framework of an extended piece
of music” (1952, 15). Unfortunately, Copland’s qualification about “an extended piece of
music” shows that it might not be required for all listening. We certainly do not need it
when listening to an instrumental version of a strophic folk song, such as Donald Byrd’s
jazz take on “House of the Rising Sun.” It might not be necessary even for extended
forms, such as symphonies. According to Jerrold Levinson’s (1997) concatenationist
account of basic musical understanding, it is not required, for we can get what is most
important from any musical work by attending moment-to-moment.
Let us pause to take a closer look at moment-to-moment listening, for it involves
anticipation and expectation of where the music is going (Meyer 1956). It is this point,
above all, that leads Hanslick to maintain that “imagination . . . is always the aesthetical
authority” (1986, 5). Imagination yields aesthetic pleasure through “the mental satisfaction
which the listener finds in continuously following and anticipating the composer’s
designs, here to be confirmed in his expectations, there to be agreeably led astray” (64).
But why are our anticipations imagined in one of the three relevant ways identified earlier,
rather than the product of a nonimaginative cognitive process, such as inference?
Hanslick is silent on this point. If listeners are “agreeably led astray” because they have a
justified belief that something will occur, imagination is superfluous. However, there is
one phenomenon that might bolster the claim that musical anticipation is a kind of
imaginative hearing-in. It is the case of a listener who knows that a musical event will
occur but is nonetheless surprised by it. One of the classic examples is the crash of
noise in the second movement of Haydn’s Symphony No. 94, aptly nicknamed the
Surprise Symphony. It startles me even though I know it will be there, an effect that
would seem to require imaginative representation of where the music is headed in the
unfolding “world” of the music, independent of my belief about its real structure.
However, this musical phenomenon can also be explained without recourse to
imagination. Our listening process is cognitively complex, and consequently two
distinct, competing thoughts can be generated by different cognitive processes.
Haydn’s “surprise” exploits schematic expectation on the local scale, which leads us to
expect, moment to moment, another passage of soft and lulling music. Expectation of
disruption is based on episodic memory (from study of the score, or from past listening),
which conflicts with immediate listening expectations that arise unconsciously from
awareness of the perceptual pattern as it is being heard. Two doxastic states are in
conflict: for a moment, I genuinely expect a continuation of the soothing melody, and
for independent reasons I expect a disruption. Again, we have a straightforward
explanation of being “agreeably led astray” that does not require us to suppose that
imagination is involved.24 It is only imagination if imagination likewise generates my
expectation that the stuff in my mug will taste like coffee (because it tasted that way
when I sipped it thirty seconds ago), or my anticipation that the airplane moving in a
straight line above me will, in the next few seconds, continue in the same line. However,
this is a general (and dubious) claim about all near-term anticipation, and not illuminating
about music.
One final candidate remains, and I think it fits the bill—it does not justify a conclusion
about all music, but it arises when listening to a great deal of music (especially Western
tonal music), and it requires imagination. When Haydn’s symphony arrives at its
“surprise,” it does more than violate our expectations about the music’s likely continu-
ation. The music’s forward motion has been suddenly, momentarily arrested.
Unconscious inference may be at work, yet it is not simply a matter of mistaken inference
about the immediate future. As Daniel Barenboim puts it: “There is a certain inevitability
about music. Once it is set in motion, it follows its own natural course” (2003, 190; see
also Meyer 1989, 33). Walton rightly emphasizes that we hear musical patterns as more
than sequences of motions: “we imagine (subliminally anyway) that causal principles
are operating by virtue of which the occurrence of the dominant seventh makes it likely
that a tonic will follow, and . . . we imaginatively expect the tonic, whether or not we
actually expect it” (1994, 49). There are two imaginative acts in Walton’s description.
We imagine that the tonic will follow and we imagine causal principles by virtue of
which this occurs. I have just explained why I do not think our expectation of the tonic is
a case of imagining. But the idea “that certain musical events are nomologically con-
nected” (Walton 1994, 49), where one sound or process is heard as if caused by another,
is a strong candidate for imaginative hearing-in. Since we do not perceive causal con-
nections, and so we are not subject to perceptual illusions about them, imagination
appears to be at work when we experience music as if causal relationships internal to the
music are guiding and organizing its unfolding. The aptness of this causal pretense
might be explained by an implication-realization model of musical listening
(Narmour 1990), but our sense of guided motion goes beyond mere inferential
expectation that some sound events are more or less likely.
Conclusion
The experience of causal forces at work within music, shaping and directing it, arises
through the listener’s imaginative engagement in a manner that can be characterized as
hearing-in. In recognizing that there is at least one way in which music listening requires
imaginative engagement in the Western common practice tonal tradition, I have
defended a middle position between a musical purism that denigrates imaginative
response and the traditional consensus that finds the imagination at work in multiple
ways, especially in the apprehension of all musical motion and structure. The underlying
experience of musical motion betrays the hallmarks of illusion, not imagination imagery
or hearing-in. This illusion provides a natural, cross-cultural basis for music’s expres-
siveness. So our experience of expressive qualities does not always require imaginative
enrichment of what we hear.
It might seem a bit of a letdown to conclude that neither propositional imagining nor
hearing-in is required for much of the experience that is distinctively musical. However,
that conclusion has been secured by focusing on examples of absolute music that are
devoid of extra-musical cues about their interpretation. In practice, most music either has
an associated text, is rich in expressive properties, or both. Artworks typically prescribe
imaginative engagement, and music typically does, too.25 To borrow Jerrold
Levinson’s way of putting it (2015, 135), an anti-imaginative view of “listening” misdirects
us if it counsels us to ignore the invitations that most music extends. That is, most musical
experience takes place in a cultural context that constitutes an invitation to imagine that
it portrays particular situations or events, so the music serves as a detailed yet indeterminate
prop for our imaginative engagement.
Artworks and other cultural artifacts, including most music, are designed to elicit a
particular response, and someone is not appreciating the artwork or cultural achieve-
ment for what it is if their response is indifferent to its cultural particularity. (Because
cultural artifacts have a history, responses that ignore their history may even invite
moral censure; see Gracyk 2011.) A Strauss waltz’s invitation to respond physically, by
dancing, is quite different from a tone poem’s invitation to respond imaginatively,
including with rich imaginative imagery. And both of those invitations seem very differ-
ent from that of Bach’s Die Kunst der Fuge. Yet even that work invites imaginative
engagement in the form of hearing-in. Consequently, anti-imagination purism endorses
an impoverished response to a great deal of music.
Notes
1. The distinction between hearing music and listening to it is prominent in Hanslick’s
formalism (1986, 60). For additional discussion of his distinction and how it was deployed,
see Gracyk (2007, chap. 5), and Cook (1990, 15–17).
2. Prominent gesture theorists include Godøy (2010) and Levinson (2006a).
3. Susanne K. Langer categorizes a listener like Woolf as “a person of limited musical sense”
(1957, 242); imagination imagery should not be encouraged by teachers, critics, and
composers (243).
4. I set aside, without further comment, the degree to which listening to music involves
imagination because imagination is at work in all perceptual experience, filling in the gaps
as our attention flits among objects and providing more stability and coherence than we
directly perceive (e.g., Zinkin 2003; Stevenson 2003, 249–253). Such an account will not be
especially illuminating about music listening.
5. Context matters: beliefs and memories can be incorporated into fictional narratives;
consequently, the distinction between memory and imagined event is sometimes a matter
of the use of a representation (Walton 1990, 369). But, again, content alone does not make
the difference. I thank Bryan Parkhurst for reminding me of this point.
6. Tamar Gendler offers compelling arguments that “quarantine” is always limited, and that
“a certain degree of contagion is inevitable, indeed desirable” (2010, 8; see especially chaps.
11 and 12).
7. Some analyses employ the phrase “imaginative hearing” rather than “hearing-in” (e.g.,
Trivedi 2011, 114). However, I avoid this phrase because other writers use it to refer to
imagination imagery.
8. An important line of analysis, descriptivism, argues that there is no such imagery. See
Pylyshyn (1973) and Dennett (1981).
9. Some accounts distinguish seeing-in from seeing-as, and so there might be reasons to
worry about a difference between hearing-in and hearing-as (e.g., the former requires
audibility but the latter involves propositional imagining). Following Levinson’s analysis
(1996, 111–112), I doubt that the distinction is relevant to anything I discuss.
10. I thank an anonymous reader for noting that it is difficult to conceive of propositional
imagining about music occurring without some form of imagination imagery. However, I
would suggest that this commonly occurs when one encounters descriptions of fictional
musical works in literary works, such as Marcel Proust’s description of the Vinteuil Sonata.
For more on fictional composers and compositions, see Ross (2009).
11. Consequently, the issue on which I focus is distinct from the question of the minimum
level and type(s) of conceptual information that a listener must apply in order to possess
musical understanding. The scope of this debate is outlined by Davies (2011, 88–128).
However, that debate concerns a listener’s genuine beliefs, not imagination.
12. Woolf’s (2003) story is ambiguous: her description of the listener’s stream-of-consciousness
may describe propositional imagining, or it may describe imagination imagery, or both.
13. However, assigning imagination imagery to the category of “triggered” associations
does not prove that listeners are “less musical” because they sometimes respond in this
manner.
14. Kendall Walton resists the interpretation that all music is representational and/or a prop
(1994, 59–60). However, his reservations turn on the technical point that one is actually
using one’s experience, not the music itself, as a prop. So my use of “prop” is more liberal
than Walton’s considered view (and more in line with Walton 1990, 63).
15. William Shakespeare, Twelfth Night, or What You Will, 1.1.
16. Movement in acousmatic space is to be distinguished from movement of sound in real
space, as when the marching band moves toward you and then away from you as it moves
down the street.
17. Budd (2003) has the most stringent position, denying that we must perceive an acous-
matic “space” in order to perceive musical motion.
18. Music theorist Steve Larson (2012) adopts and develops Scruton’s thesis by extending it to
metaphors of musical forces.
19. E.g., Nawrot (2003). At six months of age, infants display marked musical preferences
based on cultural exposure (Adachi and Trehub 2012). Although their listening is
impoverished compared to that of a competent adult, they are listening, not merely
hearing, and there is no reason to think that they are guided by metaphors.
20. Trivedi (2008, 51–52) makes a similar argument, but it depends on a premise about the
eliminability of all metaphor that is too sweeping.
21. Budd (2003, 212) proposes that Scruton might be read in this way, as distinguishing
between the perception and the metaphor, where the metaphor arises only in any subse-
quent verbal expression of that experience. However, Budd’s interpretation flies in the face
of the great many places where Scruton clearly says that “our experience of music involves
an elaborate system of [spatial] metaphors” (Scruton 1999, 80).
22. Consider the prevalence of such language in descriptions of philosophical exchanges: a
philosopher “stakes out a position” and then makes a “move” in an argument. These
descriptions have ceased to be metaphors and they can be employed without exercise of
the imagination.
23. This analysis is briefly suggested in Davies (2011, 32).
24. This explanation, in terms of two response systems, paraphrases Huron (2006, 226).
25. See Kieran (1996, 337). Having rejected the view that music is representational and invites
imagination whenever it “moves” or displays expressiveness, Kieran’s characterization is
more accurate than Walton’s position that “virtually all music qualifies” as representa-
tional (Walton 1994, 48).
References
Adachi, M., and S. E. Trehub. 2012. Musical Lives of Infants. In The Oxford Handbook of Music
Education, edited by G. McPherson and G. Welch, 229–247. New York: Oxford University Press.
Addison, J., and R. Steele. 1965. The Spectator. Vol. 3. Edited by D. F. Bond. Oxford: Clarendon Press.
Barenboim, D. 2003. A Life in Music. Edited by M. Lewin. New York: Arcade Publishing.
Batteux, C. 2015. The Fine Arts Reduced to a Single Principle. Translated by J. O. Young. Oxford:
Oxford University Press.
Kieran, M. 1996. Art, Imagination, and the Cultivation of Morals. Journal of Aesthetics and Art
Criticism 54: 337–351.
Kivy, P. 1989. Sound Sentiment: An Essay on the Musical Emotions. Philadelphia, PA: Temple
University Press.
Kramer, J. D. 1988. The Time of Music: New Meanings, New Temporalities, New Listening
Strategies. New York: Schirmer.
Lang, P. H. 1997. Music in Western Civilization. New York: W.W. Norton.
Langer, S. K. 1957. Philosophy in a New Key: A Study in the Symbolism of Reason, Rite, and Art.
3rd ed. Cambridge, MA: Harvard University Press.
Larson, S. 2012. Musical Forces: Motion, Metaphor, and Meaning in Music. Bloomington:
Indiana University Press.
Lee, V. 1932. Music and Its Lovers: An Empirical Study of Emotion and Imaginative Responses to
Music. London: G. Allen & Unwin.
Levinson, J. 1996. Musical Expressiveness. In The Pleasures of Aesthetics: Philosophical Essays,
90–125. Ithaca, NY: Cornell University Press.
Levinson, J. 1997. Music in the Moment. Ithaca, NY: Cornell University Press.
Levinson, J. 2006a. Sound, Gesture, Spatial Imagination, and the Expression of Emotion in
Music. In Contemplating Art: Essays in Aesthetics, 77–90. Oxford: Oxford University Press.
Levinson, J. 2006b. Nonexistent Artforms and the Case of Visual Music. In Contemplating Art:
Essays in Aesthetics, 109–128. Oxford: Oxford University Press.
Levinson, J. 2015. Musical Concerns: Essays in Philosophy of Music. Oxford: Oxford University Press.
Lopes, D. M. 2005. Sight and Sensibility: Evaluating Pictures. Oxford: Oxford University Press.
Mattheson, J. 1739. Der vollkommene Capellmeister. Hamburg, Germany: Christian Herold.
Meyer, L. B. 1956. Emotion and Meaning in Music. Chicago: University of Chicago Press.
Meyer, L. B. 1989. Style and Music: Theory, History, and Ideology. Chicago: University of
Chicago Press.
Narmour, E. 1990. The Analysis and Cognition of Basic Melodic Structures: The Implication-
Realization Model. Chicago: University of Chicago Press.
Nawrot, E. S. 2003. The Perception of Emotional Expression in Music: Evidence from Infants,
Children and Adults. Psychology of Music 31: 75–92.
Nichols, S. 2004. Imagining and Believing: The Promise of a Single Code. Journal of Aesthetics
and Art Criticism 62: 129–139.
Nussbaum, C. O. 2007. The Musical Representation: Meaning, Ontology, and Emotion.
Cambridge, MA: MIT Press.
Pylyshyn, Z. W. 1973. What the Mind’s Eye Tells the Mind’s Brain: A Critique of Mental
Imagery. Psychological Bulletin 80: 1–24.
Rosen, C. 1995. The Romantic Generation. Cambridge, MA: Harvard University Press.
Ross, A. 2009. Imaginary Concerts: The Music of Fictional Composers. New Yorker, August
24: 72.
Rothbart, P. 2013. The Synergy of Film and Music: Sight and Sound in Five Hollywood Films.
Lanham, MD: Scarecrow Press.
Scruton, R. 1974. Art and Imagination: A Study in the Philosophy of Mind. London: Methuen.
Scruton, R. 1999. The Aesthetics of Music. Oxford: Oxford University Press.
Scruton, R. 2014. The Soul of the World. Princeton, NJ: Princeton University Press.
Sparshott, F. 1990. Imagination: The Very Idea. Journal of Aesthetics and Art Criticism 48: 1–8.
Stevenson, L. 2003. Twelve Conceptions of Imagination. British Journal of Aesthetics
43: 238–259.
Tovey, D. F. 1936. The Training of the Musical Imagination. Music and Letters 17: 337–356.
A Hopeful Tone
A Waltonian Reconstruction
of Bloch’s Musical Aesthetics
Bryan J. Parkhurst
Introduction
Here are two similar-sounding terms: normative aesthetics and normativist aesthetics.
The principal contentions of this paper are that (1) Ernst Bloch’s normative aesthetics of
music and Kendall Walton’s normativist aesthetics of music both set out to address the
relationship between music and the imagination or, more broadly, between “musicking”
(Small 1998) and the imagination1; and that (2) Walton’s normativist theoretical frame-
work provides conceptual resources that are helpful for interpreting and critiquing
Bloch’s normative claims. The first order of business, then, is to give a provisional expla-
nation of the difference between normative aesthetics and normativist aesthetics.
Normative aesthetic claims belong to the realm of the aesthetic “ought.” They concern
what the aesthetic subject ought to do and how the aesthetic object ought to be. Aristotle
speaks in a normative-aesthetic register when he states:
if you string together a set of speeches expressive of character, and well finished in
point of diction and thought, you will not produce the essential tragic effect nearly
so well as with a play which, however deficient in these respects, yet has a plot and
artistically constructed incidents (1961, 63).
So does Hume, in telling us that “to enable a critic the more fully to execute [his critical]
undertaking, he must preserve his mind free from all prejudice, and allow nothing to
enter into his consideration, but the very object which is submitted to his examination”
(Hume 2006, 244). It is thus correct to say that “normative aesthetics establishes rules for
the artist and standards for the critic” (Jerusalem 1920, 207). But it is not complete,
for the aesthetic subject need not be an artist or a critic in the usual sense (she might, for
example, be a participant in a traditional worksong of a tribal community) and the
aesthetic object need not be an artwork (it might instead be a human body, a sunset, a
mathematical equation). Additionally, we might include in the domain of normative
aesthetics interpretive claims about what a specific work means or represents (taking
a cue from philosophers of language who hold that meaning itself is a normative
property). And we might also include judgments that concern the moral and political
character of works of art, so that normative aesthetics becomes capacious enough to
encompass, as well, the critical theory or critique of aesthetic phenomena, that is,
morally or politically committed aesthetic inquiry animated by “a vision of the good
social order grounded in both a detailed, empirical understanding of how existing insti-
tutions function and a commitment to normative criteria that are (in the broadest sense)
ethical” (Neuhouser 2011, 281).
A good thing to mean by “normativist aesthetics,” and what is meant by it here, is the
investigation, description, and systematization of the norms that are held to be constitu-
tive of a given aesthetic activity, that is, the norms one must abide by insofar as one is a
participant in an aesthetic practice.2 “Meta-aesthetics” might also be an appropriate
term for this type of inquiry. On one reading of it, Kantian aesthetics is in large part normativist:
it seeks to identify the norms the adjudicating subject follows in performing
a type of judgment that counts as distinctively aesthetic (as opposed to distinctively
empirical or moral, in the Kantian trichotomy).3 Whereas normative aesthetics espouses
norms, normativist aesthetics delineates the internal practical structure of normatively
governed practices, and is in that sense a kind of Geistes- or Kulturwissenschaft, or what
could be called a philosophical anthropology. As we shall see, the line of demarcation
between normative and normativist aesthetics, although it will be useful for us as a heu-
ristic, is not sharply inscribed.4
In what follows, I juxtapose Marxist normative aesthetics with Anglo-American
analytic normativist aesthetics. I do this by looking at Bloch’s theory of utopian musical
listening (a theory, I will suggest, of how music ought to be heard, of how its latent
revolutionary content ought to be disclosed by acts of imaginative listening) against the
background of Walton’s theory of musical representation and emotionality (an account
of the norms that govern music-centered make-believe). My aim in bringing together
these two very different philosophical treatments of the musical imagination is to use
Waltonian tools to reconstruct a core Blochian position regarding the relationship
between musical sound and revolutionary political consciousness. The position in
question is that music makes an appreciable contribution to the psychological faculty
of imagining, and to the political project of constructing, a better world, a “regnum
humanum” (a kingdom of humanity, Bloch 1986, 1296), in which there is an abolish-
ment of alienation, violence, and privation.5 For Bloch, a Hegelian Marxist, the project
of actualizing a regnum humanum through the implementation of communism is a
historical labor of communal Selbstbildung and Selbstverständigung (self-formation and
self-reflection):
Once man has comprehended himself and has established his own domain in real
democracy, without depersonalization and alienation, something arises in the world
which all men have glimpsed in childhood: a place and a state in which no one has
yet been. And the name of this something is home (Heimat). (1971, 44–45)6
Waltonian Fictionality
Walton’s theory of fiction, as set out most notably in Mimesis as Make-Believe, is a theory
of what it is for an artwork to have the representational content it has.7 The content of
representational works of art corresponds to what is true in the world of the work. Saying
that a proposition is true in the world of the work is the same as saying that the propo-
sition is “fictional.” And the fact that a proposition is fictional is a fact about a normative
status it possesses. Propositions that are fictional are to be imagined; they are what an
appreciator (a reading, listening, viewing consumer) of an artwork ought to (is under
some form of normative pressure to) imagine, because and insofar as she engages with
the artwork as a participant in a game of artwork-centered make-believe. Hence, for
Walton, what is “inside” an artwork—the otherworldly fictional content it contains—has
everything to do with what goes on “outside” of it, that is, with how this-worldly appreciators
conduct themselves in relation to the artwork and in accordance with certain rules
of aesthetic behavior that dictate what, how, and when to imagine:
Fictional worlds are imaginary worlds. Visual and literary representations estab-
lish fictional worlds by virtue of their role in our imaginative lives. The Garden of
Earthly Delights gets us to imagine monsters and freaks. On reading Franz Kafka’s
story, “A Hunger Artist,” one imagines a man who fasts for the delight of spectators.
It is by prescribing such imaginings that these works establish their fictional worlds.
The propositions we are to imagine are those that are “true in the fictional world,” or
fictional. Pictures and stories are representational by virtue of the fact that they call
for such imaginings. (2015, 153)
A game of make-believe played with an artwork is often one in which the work serves
as a “prop.” A prop has the function of rendering certain propositions fictional in the
context of a set of “principles of generation.” These are conventions that regulate how
specific features of the prop (such as the property a slab of marble has, when carved just
so, of resembling an uncovered female figure) confer fictionality on specific propositions
(such as the proposition Aphrodite is naked). But some of the make-believe called for by
an artwork is not centered on the artwork’s (propositional) content proper (its fictional
world) and is instead centered on the appreciator’s own sensory and cognitive engagement
with that artwork as a prop. When looking at Bruegel the Elder’s The Peasant Wedding,
according to Walton’s account, I imagine not only that there is rustic merrymaking, but
also that I see rustic merrymaking, and that my visual experience of the painting is a
visual experience of rustic merrymaking. What is imagined in imaginings of this sort
belongs to a “game world” rather than to a “work world.” These imaginanda are not
constitutive of the artwork’s subject matter aptly so-called, but imagining them is never-
theless made appropriate by the particular manner and circumstances in which the
artwork puts its appreciator in epistemic contact with its subject matter.
This précis leaves out many subtleties. But three notable features of Walton’s theory
are evident from what has been said so far:
This being the case, it seems right to class Walton’s theory as a normativist theory of
reception. And it seems equally right to assume that this is not the sort of reception the-
ory you get if you take music as your starting point or primary datum. Music looks to be
a proposition-mongering affair only intermittently and per accidens, maybe even devi-
antly (in opposition to the true nature of true music). Pretheoretically, music does not
strike us as a form of art that consistently and constitutively has us imagining that
such-and-such is thus-and-so. “The weak representational nature of music (relative to the
other arts)” (Klumpenhouwer 2002, 34)—what Richard Wagner called the “infinitely
hazy character of music”—has led aestheticians to insist, with some justice, that music is
not an art of content but instead an art of pure form. Or, in variations on this theme, they
have held that music’s form is its content (as Eduard Hanslick believed); or that music’s
content is distinctively and exclusively musical, owing to music’s thoroughgoing self-
reflexivity, its inability or refusal to be a signifier of anything besides itself (as Heinrich
Schenker believed). In the most extreme version of this gesture, it is claimed (by, among
others, the musical aestheticians of the German Frühromantik, such as Tieck and
Hoffmann) that music is sheerly ineffable, and that the content of a musical experience
is entirely refractory to the fixities of linguistic description.8
The history of musical aesthetics contains far more denials than affirmations that
music, as such, is “a transcribable, thus readable, discourse” (Attali 1985, 25) that is replete
with linguistically paraphrasable content.9 But there is a dogged recurrence of the trope
that music still somehow aspires to the condition of language, perhaps by being organized
rhetorically, like a speech (as Baroque-era theorists such as Johann Mattheson argued),
perhaps by possessing an underlying grammar-like structure (e.g., a formalizable syntax)
that floats free from a domain of reference,10 or perhaps by being in some looser way
pseudolinguistic, for example, by being a “temporal succession of articulated sounds that
are more than just sound, [a succession that is] related to logic [in that] there is a right and
a wrong” (Adorno 1993, 401). Yet often, as in the case of Adorno’s theory, the acknowledg-
ment of a “language-character” (Sprachcharakter) in music is accompanied by undi-
minished eagerness to evacuate music of semantic significance. Music may in some sense
“speak” to its hearers and abide by some kind of “logic,” Adorno believes, “but what is said
cannot be abstracted from the music; it does not form a system of signs” (1993, 401).11
Whether or not Adorno’s or any other antisemantic theory fully and accurately
diagnoses music’s condition, it seems inarguable that antisemanticism represents a
motivated response to some deep, distinguishing, and perhaps distinguished feature of
music. And music’s “infinitely hazy character” shows up as a problem to be reckoned
with if one’s explanatory standpoint, like Walton’s, is, on the whole, a propositional/
representational one.
Walton on Music
not we actually expect it. If, or to the extent that [this imagining] is prescribed, we
have fictionality. (Walton 2015, 154)
3. Music may represent properties of events or actions and thus make it fictional that
property-bearing events or actions take place, while leaving largely indeterminate the
kinds of objects or actors implicated in those events and actions:
[T]he lateness of the upper voice, and its dallying quality, the rigidity of the bass’s
progression, the fortuitousness or accidentalness of the D-major triad, the movement
to something new, are in the music . . . Some of this at least is a matter of imagining.
We imagine something’s being late, probably without imagining what sort of thing it
is. And we imagine a fortuitous or accidental occurrence. . . . [W]hy shouldn’t it count
as representational, . . . as representing instances of lateness, fortuitousness, etc.?
(Walton 2015, 159)
Without feeling at all disposed to deny these claims, we may still feel disposed to pass
comment on how theory-laden they are. These are the sorts of things one would be
primed to notice and point out about music if one’s motivating objective were to extend
the applicability of a propositional model of fictional representation, a model designed
in the first instance to accommodate literary and visual aesthetic phenomena. This is not
an indictment. All observations and explanations are probably in some measure theory-
laden. Moreover, it is satisfying to follow Walton to the counterintuitive but unavoidable
conclusion he reaches, which is that there is a class of familiar music-appreciative behav-
iors that centrally involve make-believe, such that music (more often than not, and more
often than anyone had realized) is representational and is (to that extent) a cognate of
pictures and literature qua fiction-conveying technologies. And the attraction of having
a unified theory under which art-forms can be subsumed serves as an incentive to think
and talk about music in terms of its pervasive representationality, or (equivalently) its
fictionality, or (equivalently) its imagination-prescribing function.
One may worry that those attractions will lead us to disproportionately accentuate
music’s ties to word and image and to downplay whatever it is that
is idiosyncratically musical (radically nonliterary, radically nonpictorial) about music.
For we can readily grant that the kinds of fictionality Walton finds in music may be
commonplace (many people may in fact perform such imaginings when they listen to
music) and may be in some sense mandatory (one may do a worse job of appreciating
music if one fails to perform such imaginings) while at the same time believing that such
imaginings are not a sine qua non of musical experience in and of itself. One arguably
cannot count as appreciating Maxim Gorky’s Mother as a novel, or count as appreciating
Evdokiya Usikova’s Lenin with Villagers as a picture, at all without using one’s imagi-
nation to explore a fictional world populated with fictional objects and events. By contrast,
although a relatively nonimaginative experience of Shostakovich’s Leningrad Symphony
might arguably be impoverished in comparison with an experience that is rich in propo-
sitional imaginings, most of us will be unwilling to insist that a listener who fails to
imagine the Siege of Leningrad while listening to this piece, or who fails even to non-
specifically imagine that something or other is destroyed or imperiled, or that violence
or trauma somehow transpire, is thereby disqualified from counting as a musical lis-
tener altogether.12
But actually, there is little cause for worry, for Walton concedes most of this. He rec-
ognizes that there is a significant “remainder” (as Adorno would say) when music is
brought under the categories native to a theory of fictional representation, something
left over that is, paradoxically, made conspicuous by the very fact that it is excluded or
downplayed, something that is (again as Adorno would say) “nonidentical” with the
concepts whose explanatory use the Waltonian theory of fictionality encourages.13
Accordingly, Walton looks outside the bounds of fictional representation for a feature
that marks music off from the literary and visual arts. To seek this, given an antecedent
conception of aesthetic appreciation as something that requires compliance with norms of
imagining, is to seek a form of imaginative experience the prescribing of which is unique
to music. Walton finds music’s individuating trait, the differentia specifica that it
does not share with novels and paintings, and so forth, in the way it prevails upon us to
imagine that our auditory experience of the musical work is an affective experience of an
emotional state. Music “gets us to imagine experiencing a certain feeling, and possibly
expressing it or being inclined to express it in a certain manner. It often does this without
getting us to imagine knowing about (let alone perceiving) someone else having that
experience or expressing it in that manner” (Walton 2015, 173). Rather, music gets the
listener to imagine of her experience of hearing sounds that it is an experience of a par-
ticular emotion. Walton’s terminology allows this idea to be expressed concisely: music,
unlike other forms of art, gets us to treat our perceptual experience itself (as opposed to
external objects that cause and are represented by that experience) as a prop. Walton
frames the suggestion in terms of the difference between work worlds and game worlds:
Work worlds comprise fictional truths generated by the work alone. But feelings . . . do
not exist independently of people who feel them. . . . So there is no pressure to regard
the music itself as establishing a fictional world in which there are feelings. . . . It is
the listener’s auditory experiences, which, like feelings, cannot exist apart from being
experienced, that make it fictional that there are feelings. When the listener imagines
experiencing agitation herself, there is no reason to think of the music as making
anything fictional. It is the listener’s hearing of the music that makes it fictional
that she feels agitated. The only fictional world is the world of her game, of her
experience. (2015, 173)
Ignoring for present purposes the technical nuts and bolts of what I have elsewhere
called Walton’s “first-person feeling theory of musical expression,”14 I now segue into a
discussion of Bloch by first ideologically diagnosing the kind of imaginative listening that
Walton’s normativist theory takes as its object of theorization. A critical-historical vantage
point allows us to see that this mode of listening has affinities with the ideology of what
Korstvedt (2010, 122), following Adorno, calls “Romantic bourgeois Innerlichkeit.”15
One key respect in which the Waltonian listener can be identified as a stereotypically
“Romantic” subject is that this listener’s experience (of music as a locus of expression
or emotionality) is one in which introspectible “sentiment, longing, and emotion . . .
even suppressed animality” (Korstvedt 2010, 122)16 are elevated to the status of ends-
in-themselves. For the Romantic aesthete—in whose eyes art is a supremely valorized
object of perception and cognition because it facilitates a “spontaneous overflow of pow-
erful feelings”17—an activation or intensification or heightened awareness of the emo-
tions is the raison d’être of music, or of a certain way of listening to it. According to this
outlook, musical sounds are not to be valued primarily as stimuli that lead to proper
action (as Plato’s account of the musical modes in The Republic holds) nor as aids to the
restoration of bodily and spiritual equilibrium (as Aristotle’s account of musically abet-
ted catharsis in the Politics holds). Nor are musical sounds to be listened to for their own
sake, or for the sake of detached contemplation of their sensuous auditory properties,
as happens in the modernist practice of “reduced” listening, that is, “listening for the
purpose of focusing on the qualities of the sound itself (pitch, timbre, etc.) independent
of its source or meaning” (Chion 1994, 222–223). Instead, music figures into the Romantic
vision as an instrumentally valuable means to an intrinsically valuable end of emotional
extremity or disequilibrium.
According to Hegel, whose aesthetic system both expounds and historicizes the
Romantic conception of music, music’s release from its self-incurred tutelage (its ages-
long period of subordination to the verbal art forms) occurs when it matures into
a romantische Kunstform, a mode of artistic expression that taps into “inner spirit”
(der innere Geist, i.e., emotional subjectivity). More so than the other arts, music pro-
vides “a resonant reflection, not of objectivity in its ordinary material sense, but of the
mode and modifications under which the most intimate self of the soul, from the point
of view of its subjective life and ideality, is essentially moved” (Hegel 1920, 342). Music is
a “province which unfolds in expanse the expression of every kind of emotion, and every
shade of joyfulness, merriment, jest, caprice, jubilation and laughter of the soul, every
gradation of anguish, trouble, melancholy, lament, sorrow, pain, longing and the like, no
less than those of reverence, adoration, and love fall within the appropriate limits of its
expression” (Hegel 1920, 359). Adorno, a Romantic at heart, accepts these premises and
attempts to draw out what he sees as their consequences for the psychology of class-
position. He interprets the musically assisted retreat into the “exclusive and private
warmth” (Daniel 2001) of bourgeois interiority as a gesture of withdrawn resignation,
on the part of a no longer heroic middle class, in the face of monstrous and impersonal
forces and relations of production that lie outside the ken of the individual subject’s
power to efficaciously intervene for social change:
Although inwardness, even in Kant, implied a protest against a social order heter-
onomously imposed on its subjects, it was from the beginning marked by an indif-
ference toward this order, a readiness to leave things as they are and to obey. This
accorded with the origin of inwardness in the labor process: Inwardness served
to cultivate an anthropological type that would dutifully, quasi-voluntarily, per-
form the wage labor required by the new mode of production necessitated by the
relations of production. With the growing powerlessness of the autonomous subject,
inwardness consequently became completely ideological, the mirage of an inner
kingdom where the silent majority are indemnified for what is denied them socially.
(Adorno 1998, 116)
My impression is the opposite of being distanced from the world of the music. . . .
I feel intimate with the music, more intimate, even, than I feel with the world of a
painting . . . it is as though I am inside the music, or it is inside me. Rather than hav-
ing an objective, a perspectival relation to the musical world, I seem to relate to it in
a most personal and subjective manner. (2015, 165)
It is the auditory experiences, not the music itself, that generate fictional truths.
I can step outside of my game with a painting. When I do, I see the picture and notice
that it represents a dragon, that it calls for the imagining of a dragon (even if I don’t
actually imagine this). But when I step outside my game with music and consider
the music itself, all I see is music, not a fictional world to go with it. There are just
the notes, and they themselves don’t call for imagining anything. The absence of a
work world does not, however, prevent the listener’s imagination from running wild
as she participates in her game of make-believe. (Walton 2015, 174)
experience), Waltonian emotional listening represents an encounter with the self. This is
a self-encounter in which, moreover, the imagination is irreducibly involved. Bloch’s
philosophy of music likewise thematizes imaginative self-encounters and it likewise
has a pronounced Romantic slant to it (Habermas [1969–1970] calls Bloch a “Marxist
Romantic”). But rather than attempting to codify the norms indigenous to a historically
localized form of “bourgeois-Romantic” listening, as Walton’s theory can be read as doing,
Bloch’s philosophy of music endeavors to bring about the dialectical transcendence—
the processual subversion, preservation, and elevation—of those norms. It does this by
prescribing a mode of emotional listening that poses challenges to both the aesthetic
ideology in which it itself is rooted—namely, bourgeois Innerlichkeit—and the economic
configuration that is, in Bloch’s Marx-inspired view, determinative of this ideology—
namely, the capitalist mode of production. This reappropriation and redeployment of
his culture’s musical heritage is part and parcel of Bloch’s overall philosophical strategy
of pitting culture (religion, philosophy, art, etc., as they have been handed down) against
itself (by “sublating,” problematizing, and radicalizing it) for the sake of itself, that is, for
the sake of instituting new cultural forms that can nurture the “subjective conditions for
revolution” (Kellner and O’Hara 1976, 19) by evoking a “future kingdom of freedom as
the real content of revolutionary consciousness” (Bloch 1972, 272). Habermas points to
the Hegelian basis of this maneuver:
What Bloch wants to preserve for socialism, which subsists on scorning tradition, is
the tradition of the scorned. In contrast to the unhistorical procedure of Feuerbach’s
criticism of ideology, which deprived Hegel’s “sublation” (Aufhebung) of half of
its meaning (forgetting elevare and being satisfied with tollere), Bloch presses the
ideologies to yield their ideas to him; he wants to save that which is true in false
consciousness: “All great culture that existed hitherto has been the foreshadowing of
an achievement, inasmuch as images and thoughts can be projected from the ages’
summit into the far horizon of the future” (1969–1970, 312).
It is clear from the amount of attention Bloch lavishes on music,19 and from his undi-
luted enthusiasm for it, that he judges the Western musical heritage to be among the
most precious items in the bequeathed patrimony of “great culture.” This is because
music performs with distinction what Bloch sees as the rightful function of art in
general, that of putting us in touch with our longing for, and with our will to create, a
world unblemished by alienation, exploitation, and oppression. Music is preeminent
for Bloch because it is preeminently utopian:
For Bloch, music is the most utopian of the arts. It is speech which men can under-
stand: a subject-like correlate outside of us which embodies our own intensity,
and in which we experience an anticipatory transcendence of the existing interval
or distance (Abstand) between subject and object. “Identity,” the “last moment,”
“a world for us,” “utopia” is present in music: as the anticipatory presence and
pre-experience (Vorgefühl) of the possibility of self. . . . Music expresses something
“not yet.” It copies what is objectively undetermined in the world. There is a human
world in music which has not yet become actual: a pre-appearance of a possible
regnum humanum. For Bloch, music is the most public organon of the Incognitio
or subjective factor in the world as a whole, and it provides an anticipatory experi-
ence of the subject-like (subjekthaft) agens as if it had become objectified in the
external world. (1982, 175)
profoundest feelings of hope and expectancy.20 It calls for us, beckons us, from a utopian
future we currently see through a glass, darkly. And it calls upon us to recognize and
fully actualize the nature of our true, ideally social selves, both by causing us to regret,
and to resolve to rectify, the current incompleteness of our historical project of self-
emancipation and self-realization, and also by pushing us to adopt the means necessary
for achieving a world that is adequate to our shared species-being (Gattungswesen).
Such a world must of needs be characterized by collectivity, nonalienation, solidarity,
and the absence of scarcity—attributes whose political and economic precondition,
Bloch believes, is the abolition of the capitalist mode of production through the insur-
rectionary activity of the proletarian class. “The realized We-world” is Bloch’s term for
the unqualifiedly redemptive commonwealth of humanity that is the asymptotic goal of
the socialist movement. By means of an act of divinatory musical hearing (Hellhören),
Bloch thinks, we can feel the real possibility of this future (or, if you like, future-perfect)
state of affairs, can gain sensuous knowledge of the world’s objective tendency (Tendenz)
to move toward the actuality of communism.21 “Music as a whole stands at the boundary
of humanity, but [it is] the boundary where humanity, with a new language and the call-
aura surrounding deeply felt intensity, a realized We-world [der Ruf-Aura um getroffene
Intensität, erlangte Wir-Welt] first comes into being. The order in the musical expression
also suggests a house, even a crystal, but one composed of future freedom; a star, but as a
new earth” (Bloch 1986, 1103).
Utopia
That all sounds rousing and eschatological enough, but what precisely can it mean?
If we hope to decipher such aphorisms, we must place them against the background of
Bloch’s more general theory of utopia, the master narrative that structures all of Bloch’s
philosophical and sociological investigations. For the politically committed Marxist-
Leninist Bloch of The Principle of Hope, a variety of human communicative practices—
predominantly those involving the “production and usage of signs” to convey “social
meaning[s] expressed in a code” (Attali 1985, 24)—possess a “utopian function.” In a
motley assemblage of cultural forms—architecture, fairy-tales, the detective novel,
religion, alchemy, circuses, advertisements, fashion, medicine, and the fine arts—Bloch
espies a “Vorschein,” an anticipatory illumination of a possible, preferable, future state of
affairs.22 Sometimes a Vorschein may reveal itself to us in the mundane transactions of
contemporary commodity culture. “Shop windows and advertising are in their capitalist
form exclusively lime-twigs for the attracted dream birds” (Bloch 1986, 334). To use one
metaphor to unpack another: the siren song of manipulative marketing, notwithstanding
its liability to mystify consumers and fetishize commodities, gives voice (unbeknownst
to itself) to a legitimately humanistic wish for a new and improved way of life. In other
cases, a Vorschein may only become perceptible in hindsight, through an interpretive
reassessment of an antiquated cultural form, or what Hegel calls a “shape of life that
has grown old” (Hegel 1992, 23):
Both quotidian experiences and aesthetic experiences, both modern shapes of life and
outmoded shapes of life can, under proper scrutiny, show us the yawning gap between
how the world is and how it could and should be. Bloch’s hermeneutic undertaking is to
use cultural and aesthetic criticism to sharpen our experience of the nonidentity of is
and ought. He seeks to brighten and intensify the utopian Vorschein that is every-
where to be glimpsed in the world of human values, institutions, culture and art.
Accordingly, the philosophy of music progressively elaborated in The Spirit of Utopia
and The Principle of Hope encourages us to hear the revolutionary, hopeful tone23 that
resounds in the masterworks of the Western canon (to which Bloch’s musical prefer-
ences are more or less restricted).24 With Bloch’s aesthetic philosophy as its handmaiden,
music can finally come into its own as a “source-sound of self-shapings still unachieved
in the world” (Bloch 1985, 219).
Fair enough, but what exactly is a Vorschein, and how exactly is it to be found in music?
Bloch’s cryptic, prophetic reflections on culture never give way to a precise statement of
what it is for a cultural practice, musical or otherwise, to radiate a pre-appearance of
utopia. At the risk of oversimplification, I would point to two basic ideas that seem to lie
at the center of Bloch’s proposal: (1) there is a way of imaginatively engaging with cul-
tural objects so that they provide a sensory and intellectual provocation to construct a
mental representation of (fictional) utopian circumstances, and (2) this fiction-generating
imaginative activity, properly performed, furnishes us with motivation to make the
utopian fiction a true representation; we are propelled by our utopian make-believe to
try to bring the world into alignment with the utopian fictions we imagine.
These two ideas are especially germane in the case of musical experience, as Ruth
Levitas notes: “Bloch argues not only that music is the most utopian of cultural forms
but that it is uniquely capable of conveying and effecting a better world” (Levitas 2013,
220, emphasis mine). To paraphrase Levitas, music has pride of place in the sphere of
the arts because of, on the one hand, its unparalleled capacity for possessing utopian
sense or reference (the semantic property of being about or signifying utopia) and, on the
other hand, its capacity for carrying utopian prescriptive force (the power to exhort us
to make utopia real). Music exceeds the other arts in its power to summon a vision of
a utopian “Not-Yet-Being” (Noch-Nicht-Sein) or “Real-Possible” (objectiv-real Mögliches)
that lies beneath the surface of “That-Which-Is” (Das Seiendes)—the world as it currently
confronts and confounds us; and it also has greater power to make palpable the historical
urgency of this vision: “music is that art of pre-appearance which relates most intensively
to the welling core of the existence-moment of That-Which-Is and relates most expan-
sively to its horizon—cantus essentiam fontis vocat [singing summons the existence of
the fountain]” (Bloch 1986, 1069–1070).25 Couched in language that is less expressionistic:
the world as it now is, which includes us as we now are, is pregnant with—contains as
an “objectively real possibility” (objektiv-reale Möglichkeit)—the world as we imagine it
should be, which includes us as we would wish ourselves to be. Music reveals that the
alienated, self-dirempted world of capitalist modernity is implicitly and immanently
(not yet, noch nicht) “a homeland of identity in which neither man behaves toward the
world, nor the world behaves towards man, as if toward a stranger” (Bloch 1986, 209).
How? Music “relates most intensively to the welling core of the existence moment” by
being the most real art, the art most immediately connected with our concrete material
and corporeal predicament as embodied creatures, in the perfectly literal sense that
there is nothing spatially between us and the physical vibrations that are music’s material
substratum. Music’s realness thus rests on transhistorical features of our sensory appa-
ratus. Bloch says as much in saying, wryly, that “as hearers we can keep closely in touch,
as it were. The ear is slightly more embedded in the skin than the eye is” (1985, 73). And,
very much in the spirit of Walton’s remarks about our spatial oneness with musical
sounds (“it is as though I am inside the music, or it is inside me”), Bloch refers to the
“heard note” as a “sound that burns out of us . . . a fire in which not the vibrating air but
we ourselves begin to quiver and to cast off our cloaks” (1).
To give sense to the idea that music relates “most intensively” to “That-Which-Is,”
we can appeal, on Bloch’s behalf, to the spatial immediacy and bodily resonances that
place music in an “incomparable proximity to existence” (Bloch 1985, 227)—namely our
creaturely existence as bodies that are repositories of affect and desire. Here, force is a
function of distance: the potency of music’s utopian prescription has to do with its close-
ness to us. Music “comes close to the subject-based and driving force of events” (208),
the human will as the authentic engine of history, because of music’s capacity to (non-
metaphorically) move us, indeed to become one with us, on a somatic level. “There is not
music of fire and water or of the Romantic wilderness that does not of necessity, through
the very note-material, contain within it the fifth of the elements: man” (227). The nature
of sound and the nature of our bodies ensure that the material conditions are continually
present for establishing a “correspondence between the motion of the note and the
motion of the soul” (123).
But why believe that music is “uniquely capable of conveying . . . a better world,” in the
sense of helping us to imagine one? The figurative and literary arts can convey semantic
freight of a utopian sort by showing us or telling us about some utopian situation or
other, such as the leisure-filled, egalitarian, neo-Medieval England described in William
Morris’s utopian novel News from Nowhere. But it seems that music unaided by words
and pictures is not merely inferior as a vehicle for “conveying” utopia; music seems
wholly unfit for this representational task.
Bloch might respond that this is too simplistic a way of framing the issue. Utopian art,
as Bloch conceives of it, is not simply, nor is it primarily or paradigmatically, art that
draws a blueprint of a better world and/or a better way of living in the world:
Thus the concept of the Not-Yet and of the intention towards it that is thoroughly
forming itself out no longer has its only, indeed exhaustive example in the social
utopias; important though the social utopias are . . . [T]o limit the utopian to the
Thomas More variety, or simply to orientate it in that direction, would be like trying
to reduce electricity to the amber from which it gets its Greek name and in which it
was first noticed. Indeed, the utopian coincides so little with the novel of an ideal
state that the whole totality of philosophy becomes necessary . . . to do justice to the
content of that designated by utopia. (1986, 11)
Levitas, taking her lead from Bloch’s standoffishness toward the “novel of the ideal state,”
states that “the importance of . . . all utopias, lies not in the descriptions of social arrange-
ments, but in the exploration of values that is undertaken” (2010, 140). Utopian art is not
limited to idealistic science fiction; rather, it is any art that permits us to navigate a space
of alternative values, not so that we might come to commit ourselves to those exact values,
but so that we might cultivate the imaginative faculty of thinking deeply and creatively
about a radically novel personal and societal ethos, one that might possibly emerge into
prominence within a radically reorganized way of producing and reproducing human
civilization. Hudson (1982) essentially agrees with Levitas when he states that Bloch’s
view is that utopian artworks might or might not contain “descriptions of social arrange-
ments,” but must possess a “cognitive function as a mode of operation of constructive
reason; [an] educative function as a mythography which instructs men to will and desire
more and better, [an] anticipatory function as a futurology of possibilities which later
become actual, and [a] causal function as an agent of historical change” (51).
Be that as it may, the question stands: how is music alone supposed to do this? Even if
Levitas’s “exploration of values” and Hudson’s utopian functions do not presuppose
full-blown verbal or pictorial “descriptions of social arrangements,” they seem to be
predicated on the presence of semantic content of some sort, that is, on the availability of
a specifiable, “transcribable” meaning or representational content that can be somehow
accessed by means of (proper engagement with) the utopian-functioning artwork.
Shouldn’t this mean that music’s “weak representational function” disqualifies it from
playing a genuinely utopian role, at least on its own? It looks as though Bloch shares this
worry: “It does not go without saying that the note can indicate external things and be
related to them. After all, it inhabits precisely that region where our eyes can tell us noth-
ing more and a new dance begins” (Bloch 1985, 219). Yet he blithely proceeds as though it
does go without saying and exempts himself from giving a justification for his conviction
that music is the utopian medium par excellence.
This lacuna cannot be passed over without comment. To have even a minimal appreci-
ation for Bloch’s large philosophical investment in music, we must have some measure
of warranted sympathy for his belief in music’s utopian function. And to have this, we need
to be able to explain to ourselves how (purely) musical utopianism is so much as possible.
Waltonizing Bloch
With the help of Walton’s theoretical apparatus and a clue from Adorno, we can formulate
Bloch’s position so that it makes enough sense to be assessable. As a way into this, let us
return to the dualism I set out at the beginning of the chapter.
Inarguably, Bloch’s aesthetics is robustly normative. Although Bloch has no use for
micro-evaluative rankings of individual artworks and is an aesthetic omnivore who, “in
contrast to Lukács, breaks with the high culture bias in Marxist aesthetics” (Hudson 1982,
179),26 Bloch’s musical aesthetics is at root an endorsement of the classical canon’s sup-
posed aptness for promoting socialist values. It is also an elaborate exposition of the
view that the value of a musical work is partly based on its fitness for aiding the cause of
human emancipation.
Recall that normativist aesthetics has the job of describing and systematizing the
norms that govern and constitute real-life aesthetic practices and habits. Walton
examines the practices and habits that surround fictional representations; he explains
the property of fictional representationality in terms of the uses to which fictionally
representational aesthetic objects are put, and in terms of the normative statuses those
objects are accorded, by participants in games of make-believe. Bloch, on the other
hand, does not set out to explain how an already-up-and-running aesthetic practice is
organized and administered. His conception of “visionary hearing” (Hellhören), for
instance, does not arise out of an attempt to explain what the typical listener typically
does when listening to music. But Bloch’s aesthetics does adopt the holistic, synoptic
perspective characteristic of Walton’s normativist work. Where Walton sets out to trace
the normative contours of a complex representational and imaginative practice as it
currently exists, Bloch calls for the revision or remaking of time-honored aesthetic
customs. In both cases, the theoretical object is all of a certain musical way of life in
its globality. We might therefore adopt a more fine-grained version of the normativist/
normative distinction and speak instead of a distinction between descriptive meta-
aesthetics, as pursued by Walton, and normative meta-aesthetics, as pursued by Bloch.
Bloch’s aesthetics is normative less at the level of the individual work or individual
aesthetic judgment and more at the level of the entire aesthetic culture in which such
works and judgments have their place.
The normative system that Bloch propounds, like the one whose defining attributes
Walton catalogs, has authority over acts of the imagination. According to the norms of
imaginative listening Bloch would have us adopt, music (read: the masterworks of the
Western tonal canon) should be accorded the function of evincing a utopian vision. The
utopian vision to be evoked is substantially the same from piece to piece. Music, all of it,
has an immutable representational content that is prior to the contingent, individuating
details of specific works. This Bloch refers to both as the “a priori latent theme [that] . . . is
really central to all the magic of music” (Bloch 2000, 3) and as “the hearing-in-Existence . . .
common to all forms of music” (Bloch 1986, 1089). That Bloch sees himself as breaking
[T]he musical object that has really to be brought out is not decided. The . . . dramatic-
symphonic movement posits only an area of very general readiness into which the
poetically executed music-drama can now be fitted “at one’s discretion.” And by the
same token, there yawns between the most transcendable [compositional devices]
and the ultimate signet-character of great composers or indeed the ultimate object,
the ideogram of utopian music in general, an empty, damaging hiatus which renders
the transition more difficult. Even in rhythm and counterpoint illumined theo-
retically and set in relation philosophically, it is not possible to come directly to the
kind of presentiment accessible to the weeping, shaken, most profoundly torn-apart,
praying, listener. In other words, without this special learning-from-oneself, feel-
ing-oneself-expressed, human outstripping of theory [through] the interpolating
of a fresh subject (though one most closely related to the composer) and of this
subject’s visionary speech . . . without this, all transcending relations of the [compositional
devices] to the apeiron . . . will remain stationary. Thus with the presentiment,
a stage which no longer belongs to the history of music, the note itself reappears as the
solely intended, explosive aha!-experience of the parting of the mist; the note which
is heard and used and apprehended, heard in a visionary way, sung by human beings
and conveying human beings. (1985, 92, emphasis in original)
Part of Bloch’s point seems to be that there is no way of explaining music’s utopianism,
no way of tracing a path from what is objectively the case about music’s structure
(perhaps at the level of “rhythm and counterpoint illumined theoretically”) to its capacity
for “visionary speech.” But, as I have insisted, a vindication of the possibility of musical
utopianism is a requirement for taking Bloch’s revisionary, revolutionary musical
aesthetics at all seriously, and any such vindication would seem to stand in need of
such an explanation.
Adorno’s writings, as interpreted by Richard Leppert, may permit us to be more
optimistic than Bloch is about the prospects of explaining musical utopianism. In spite
of his infamous pessimism, Adorno is in many respects a utopian thinker about music.27
A utopian sensibility imbues Adorno’s formulation of the concept of structural listening,
a privileged mode of “formalist” hearing whose decline he blames on the organs of mass
culture, principally the radio and the phonograph. Leppert explains:
For Adorno, music’s sensuous presentation of the reconciliation of part and whole
(at least sometimes) stands for a state of perfection, self-subsistence, harmony, plenitude,
consummation, fulfillment, and nonalienation, in a way that is (at least sometimes)
utopian.28 Such a state of reconciliation obtains when the elemental constituents of a
system and the system as a unified whole are mutually adjusted and accommodated to
one another, such that each is a necessary requirement of, and in turn requires, the other.
This familiar organicist conceit is readily transposed into a political key:
In musical details Adorno heard the subject speaking, willingly bending toward the
musical object (the whole) in order to make possible the work, a whole larger than
the sum of its individual parts. Something, in other words, like a utopian society.
Musical details, bending and blending their expressive character toward the whole,
while retaining their own specific character, permitted the reenactment of reconcilia-
tion between subject and object, for Adorno the artwork’s highest goal.
(Leppert 2005, 116)
Music, on this somewhat cursory telling of the Adornian story, is “like” a utopian society:
the reciprocal mediation of whole and part (piece and note) in organically unified music
is relevantly similar in form (or “isomorphic,” “structurally homologous,” etc.) to the
reciprocal mediation of whole and part (society and person) that is distinctive of a
“homeland commensurate with man” (Bloch 1986, 136). Subsequent to humankind’s
hard-won entrance into its postcapitalist homeland, individuality is not, and cannot be,
alienated from collectivity. This is because, as the Communist Manifesto famously puts
it, communism is by definition a circumstance in which “the free development of each is
a condition for the free development of all.” Animated by exactly this vision, Adorno
holds that great music’s
greatness is shown as a force for synthesis. Not only does the musical synthesis pre-
serve the unity of appearance and protect it from falling apart into diffuse culinary
moments, but in such unity, in the relation of particular moments to an evolving
whole, there is also preserved the image of a social condition in which alone those
particular moments would be more than mere appearance. (2002, 290)
Adorno thus singles out a formal similarity, a shared mereological property, as the
common denominator that semiotically links utopian music to utopian social circum-
stances. It is on the basis of this structural resemblance that he ascribes to music an ability
to “preserve an image” of utopia. Should we therefore infer that utopian music has a
representational function analogous to that of Bosch’s The Garden of Earthly Delights
or Manet’s Le Déjeuner sur l’herbe? These utopian paintings catalyze our imagining of
utopian states of affairs by looking like a setting in which such states of affairs obtain;
looking at these paintings is phenomenologically similar (in relevant respects) to what it
would be like to actually look at an actual utopian setting. A Waltonian would say that
this visual similarity enables and invites us to pretend that in perceiving the artwork we
are perceiving the utopian state of affairs that is represented in and by the artwork. Does
utopian music do something like this? Is the Adornian position (or the most defensible
position consistent with the most charitable interpretation of Adorno’s remarks) the posi-
tion that music’s utopian function consists in its being what Walton calls a depiction?
Regarding depictions, Walton claims:
The viewer of Meindert Hobbema’s Water Mill with the Great Red Roof plays a game
in which it is fictional that he sees a red-roofed mill. As a participant in the game, he
imagines that this is so. And this self-imagining is done in a first-person manner:
he imagines seeing a mill, not just that he sees one, and he imagines this from the
inside. Moreover, his actual act of looking at the painting is what makes it fictional
that he looks at a mill. And this act is such that fictionally it itself is his looking at
a mill; he imagines of his looking that its object is a mill. We might sum this up by
saying that in seeing the canvas he imaginatively sees a mill. Let’s say provisionally
that to be a “depiction” is to have the function of serving as a prop in visual games
of this sort. (1990, 294)
Most pieces of music, according to Walton’s arguments discussed earlier, do not have a
work world and thus do not function as depictions. But some pieces quite obviously do
function this way. To take a familiar example, those who listen to the fourth movement
of Beethoven’s Pastoral Symphony play a game in which it is fictional that there is a
thunderstorm, and in which they are to imagine that their act of listening to the music
is an act of listening to a thunderstorm. Underlying this depictive function is the sonic
resemblance the music bears to a thunderstorm. Is this how we should explain what
happens when Bloch’s musical listener “psychologically anticipates the Real-Possible”
(1986, 144)—the concrete possibility of utopian reconciliation between the human
species and the human lifeworld—when she performs an act of “visionary hearing”?
Does music’s utopianism consist in its being an auditory depiction of utopia?
Probably not. It would be odd to claim that musically induced utopian make-believe
involves imagining that we hear utopia. Our game world when we listen to utopian
music is not plausibly one in which it is fictional that we have an auditory experience of
utopia. The reason for this is banal: there is not a distinctive way (or even a distinctive
set of ways) a utopian social arrangement would sound. Unlike thunderstorms, utopian
social arrangements lack a defining sonic profile. Perhaps it is true that utopian music’s
tonal structure is abstractly isomorphic to utopia’s interpersonal structure—but this is
not the same as music sounding like utopia. Music cannot sound like utopia, because
there is nothing (in general) that utopia sounds like.
But if music cannot depict utopia, what can it do? One response that comes to mind
is that music might allegorize utopia. Here, again, Walton’s analysis is helpful. Walton
understands allegory as art that (1) refers to something that is different from what it
represents; and (2) refers by representing:
Dr. Pangloss in Voltaire’s Candide stands for Leibniz, to whom the work refers. . . . But
I prefer not to regard [this] work as representing Leibniz . . . in our sense. It is not
fictional of Leibniz that his name is “Pangloss” and that he became a “beggar covered
with sores, dull-eyed, with the end of his nose fallen away, his mouth awry, his teeth
black, who talked huskily, was tormented with a violent cough and spat out a tooth
at every cough,” and in this sorry state met his old philosophy student, Candide, to
whom he continued to prove that all is for the best. We are not asked to imagine
this of Leibniz, although we are expected to think about him when we read about
Pangloss, to notice and reflect on certain “resemblances” between the two. Pangloss
is Voltaire’s device for referring to Leibniz, but he refers to Leibniz in order to com-
ment on him, not in order to establish fictional truths about him. Reference thus
built on the generation of fictional truths, ones not about the things referred to, is
one common kind of allegory. (1990, 113)
What could utopian music fictionally represent, such that it could thereby allegorically
refer to utopia? We can answer this question by giving Leppert’s Adorno-inspired words
some Waltonian prefixes: a work of music makes it fictional that its notes “bend . . . and
blend . . . their expressive character toward the whole,” gets us to imagine that those notes
“permit . . . the reenactment of reconciliation between subject and object,” prescribes that
we make-believe that the notes don’t “surrender any sense of [their] own spontaneity”
(Leppert 2005, 116). Musical notes are not agents and so cannot literally surrender sponta-
neity (or perform any of the actions Leppert mentions); but music gets us to imagine it.
And in prescribing such an imagining, Bloch could say, were he inclined to be so precise,
musical sounds thereby refer to a not-yet-existing utopia and impel us “to notice and
reflect on certain ‘resemblances’ between the two” (Walton 1990, 113) (i.e., between the
musical-objects-as-agentially-imagined and the allegorically referred-to utopian
state). Or at least, musical sounds would do so in the context of a semiotic listening
practice that has been reconstituted according to Bloch’s prescriptive meta-aesthetics.
In the music-interpretive practice whose adoption Bloch can be understood as advocat-
ing, music assumes a utopian function via something like the following mechanism: by
getting us to imagine that its notes relate to one another dynamically, holistically, and
organically as mutually reconciled sonic agents, music makes allegorical reference to
utopia, invites us to perform the contemplative action of reflecting on the nature of
utopia, and causes us to desire the actualization of utopia.
Conclusion
We have at last arrived at a proposal that is both sufficiently Blochian and sufficiently trans-
parent: music ought to be taken as an allegory (in Walton’s technical sense) of utopia.
Bloch’s writings on cultural hermeneutics are meant to serve as a series of object lessons
in how to endow cultural items with a utopian function. I have attempted to specify with
a reasonable degree of precision how this utopian function could work in the musical case.
At most, this interpretive reconstruction shows the bare possibility of putting music to
such a use. But it goes no great distance toward demonstrating the wisdom or utility or
likelihood of putting music to such a use. One might well ask: what is there to admire,
even for a committed Marxist, about an aesthetic practice in which lots of music unin-
tentionally (through no deliberate decision on the part of its composer) allegorizes pretty
much the same thing? Also, is the concept of an unintentional allegory at all sensible?
If allegorical content is unrelated to authorial intention, what, if anything, constrains
the interpreter’s attributions of allegorical meaning? And, even if these questions have
convincing answers, one wonders why Bloch sees this type of listening practice as hav-
ing paramount exigence for politics. There is a hard row to hoe for anyone who would
defend Bloch’s insistence on the political momentousness of listening to the canon of
common-practice classical masterworks as radical allegories. In the first place, it is diffi-
cult to see how utopian music could tell us anything we do not already know about utopia.
For the act of allegorical interpretation to get off the ground, the interpreter needs to
already be aware of the one thing music says about utopia, which is that utopia possesses
(roughly speaking) organic form; otherwise the interpreter would have no way of deter-
mining that utopia, in particular, is what the music allegorically refers to. Moreover, the
only people who would have any real inclination to try to hear music as Bloch instructs
us to hear it—as radical political allegory—are those who are antecedently convinced
of the rightness of socialist ideals and antecedently disposed to pursue them (by, among
other methods, listening to music in an appropriately socialist way). Thus, Bloch’s
prescribed aesthetic practice presupposes the kind of knowledge and motivation that it
would need to instill, were it to hold any real claim to political efficacy. Bloch’s failure
to notice the self-underminingness of his central commitment is symptomatic of an
underlying credulity that runs throughout his writings and that threatens to vitiate his
positive project at large. Even Bloch’s most sympathetic expositors at times feel the
temptation of dismissing his system wholesale:
The problem is that Bloch . . . retreats into cipher talk at so many analytically crucial
points that [his philosophical system] runs the risk of being poetry philosophy, a
theurgic aestheticist Weltanschauung: a system of faith in hope with splendid meta-
mystical meditations, but little explanatory power. (Hudson 1982, 151–152)
It may also be the case that Bloch, in spite of his “emphasi[s] that Marxism must actively
inherit the total cultural heritage” and in spite of his “break . . . with the Eurocentrism
and high bourgeois bias of the Marxist tradition in aesthetics” (Hudson 1982, 174),
was an unwitting captive of his own elevated taste for European art music. A cultured,
middle-class convert to Marxism, Bloch was unable to relinquish the conviction that the
esteemed musical works of the Austro-German tradition ought to be politically salvage-
able and, more than this, politically essential in relation to his adopted Marxist ideals.
In Hudson’s politely damning assessment, “the underlying insight that Bloch always
remained a ‘bourgeois’ intellectual with left adventurist sympathies is not without
foundation” (Hudson 1982, 211).
This is not to say that Bloch offers no genuine insights to politically committed
Marxists who are concerned with art. Bloch’s writings read like the diaries of a resolute
socialist determined to find hope and inspiration wherever they can be found. At their
best, they make vivid the appeal of trying to reconfigure our extant aesthetic practices so
that they become sources of moral hope and political fervor as well as instruments of
hegemonic (in Gramsci’s sense) culture-building more generally. But Bloch tethered his
attempt to formulate a revolutionary aesthetics to a fundamentally passive (and character-
istically bourgeois-Romantic) conception of the aesthetic domain as principally a
space of aesthetic reception. What matters most, for Bloch, is that music be heard in
the right way. It seems not to have occurred to him to attempt to develop a comple-
mentary notion of the revolutionary potentialities of aesthetic production, nor to think
through the social implications of an aesthetic practice that transcends the division of
labor between aesthetic producer and aesthetic consumer, or one that transcends the
division (massively expanded under capitalism) between producing art and producing
“necessities.” Walton’s normativist theory helped us to put our finger on these elementary
deficiencies, which Bloch’s formidable prose style and esotericism make it easy to
overlook. If we are to begin to remedy these deficiencies, though, first we must duly free
ourselves from the constricting tenets of (exclusively) reception-based aesthetics.
Notes
1. To music is to take part, in any capacity, in a musical performance, whether by performing,
by listening, by rehearsing or practicing, by providing material for performance (what is
called composing), or by dancing. We might at times even extend its meaning to what the
person is doing who takes the tickets at the door or the hefty men who shift the piano and
the drums or the roadies who set up the instruments and carry out the sound checks or
the cleaners who clean up after everyone else has gone. They, too, are all contributing to the
nature of the event that is a musical performance (Small 1998, 9).
2. This is consistent with the use of “normativist” in the philosophy of law. “Normativism or
the normative theory of legal science represents an attempt to describe (and to rationalize)
the actual practice and thinking of contemporary jurists [in which] jurists in fact typically
provide statements of norms in a deontic language—in a language, that is to say, that is
syntactically indistinguishable from the language used to give expression to the norms
themselves” (Guastini 1998, 317). Normativist legal theory seeks to describe the fundamentally
normative practices of jurists, just as normativist aesthetic theory seeks to describe the fun-
damentally normative practices of aesthetic subjects.
3. As is well known, some of the relevant judgmental norms for Kant are disinterestedness,
universality, and (the representation or apprehension of) purposiveness without a purpose.
4. This may already be obvious. Hume, one may reasonably think, can be equally well
described as making a normative claim about how aesthetic judges ought to behave or,
alternatively, as making a normativist claim about what rules are in fact followed by those
who count as true aesthetic judges. Though the distinction between normative aesthetics
and normativist aesthetics is easy to blur, it proves to be analytically useful for describing
Bloch’s project. And there is a tolerably clear difference between paradigmatic instances
of purely evaluative normative claims, such as Marx’s pronouncement that ancient sculp-
ture remains a perfect aesthetic “standard and model beyond attainment” for modern
artists (quoted in Lifshitz 1973, 89), and purely anthropological normativist claims, such
as Marx’s observation that the earliest Greek statues normatively adhered to “models of
the mathematical construction of the body” and were the products of normative practices
in which “nature was subordinated to reason rather than to the imagination” (quoted in
Lifshitz 1973, 37).
5. I should acknowledge at the outset that this goes against the mystical, theurgic grain of
Bloch’s philosophy. Bloch’s “erratic blocks of hyphenated terminology, luxuriant growths
of pleonastic turns, [and] heaving of dithyrambic breath” (Habermas 1969–1970, 316) are
not often counterpointed by rigorously clear argumentation. Nevertheless, I try to elicit
from Bloch’s writings about music an unambiguous basic commitment and a possible
rationale for it. Without this much, we have no basis for making a principled assessment
of what is living and what is dead in Bloch’s aesthetics.
6. Kellner and O’Hara describe Bloch’s philosophical venture as having a Hegelian-
teleological complexion: “For Bloch history is a struggle against those conditions which
prevent the human being from attaining self-realization in non-alienating, non-alienated
relationships with itself, nature, and other people. Bloch constantly argues that Marxist
theory ought not to forget its telos, which is, as Marx puts it in the 1844 Economic-
Philosophic Manuscripts: ‘the naturalization of man and the humanization of nature’”
(1976, 14–15).
7. For the sake of convenience, I will summarize Walton’s theory as though it were focused
solely on imaginative engagement with artworks, even though he deals with imaginative
engagement with artworks as a special case of engagement with fictional representations
in general, which is itself a special case of make-believe in general.
8. Jankélévitch (2003) is a contemporary champion of this sort of view.
9. The musicological subdiscipline of musical semiotics, part of which involves an attempt to
recover codes of signification (“topics”) that would have been familiar to contemporaneous
audiences of historically remote music, is the major source of recent affirmations.
10. Cf., Kivy:
Unlike random noise or even ordered, periodic sound, music is quasi-syntactical;
and where we have something like syntax, of course, we have one of the necessary
properties of language. That is why music so often gives the strong impression of
being meaningful. But in the long run syntax without semantics must completely
defeat linguistic interpretation. And although musical meaning may exist as a
theory, it does not exist as a reality of listening. (1990, 8–9)
11. See Hullot-Kentor’s translator’s note in Adorno (1998, 273) for a helpful discussion of the
term “Sprachcharakter.” One of the leitmotifs of Adorno’s Aesthetic Theory is the notion
that modernist artworks (of whatever medium) express themselves in “a language remote
from all meaning” (105). This is problematic not just because the analogy with language
becomes strained for nonsemantic artworks that also lack a codifiable syntactical dimen-
sion (such as abstract expressionist paintings), but also because, as we shall see, Adorno’s
denial of musical signification sits uneasily with his musical utopianism.
12. “[T]he lateness of the upper voice, and its dallying quality, the rigidity of the bass’s progression,
the fortuitousness or accidentalness of the D-major triad, the movement to something new,
are in the music. To miss these is, arguably, to fail fully to understand or appreciate the
music” (Walton 2015, 158). Perhaps so, but this insufficient understanding or appreciation
does not appear to disbar someone from counting as listening to music, the way one would
fail to count as reading a novel if one imagined nothing about anything while running
one’s eyes over its words.
13. Walton notes that some of the representational musical imaginings he catalogs “may
strike one as optional, as not mandated especially by the music itself, and so not contribut-
ing to a fictional world of the musical work” (2015, 173).
14. I defend Walton’s view in Parkhurst (2012).
15. According to Daniel (2001), in Adorno’s Aesthetic Theory,
Innerlichkeit (a term Adorno’s translators variously render as “inwardness,” “interi-
ority,” and the “bourgeois interior” and which refers simultaneously to the inner
psychic domain of the bourgeois subject and his actual living space) is described
typically as having been initially a strategy developed by the emergent bourgeoi-
sie for its own self-differentiation and self-definition in the face of a rigidly
imposed external order. A psychic site of refuge constructed to accommodate an
imagined alternative life, the bourgeois interior was fatally flawed, however, in
that it was content merely to look like an alternative to the external order without
really being in any way resistant to it.
16. This list is drawn from Korstvedt’s attempt to describe, in Blochian terms, how Bloch seeks
to simultaneously cancel, appropriate, transcend, and reconfigure emotional listening:
“Bloch imagines a refunctioning of Romantic, bourgeois Innerlichkeit that transforms
subjective space from a place of sentiment, longing, and emotion, even of suppressed
animality, into one that opens onto “an ethics and metaphysics of inwardness, of fraternal
inwardness, of the secrecy disclosed within itself that will be the total sundering of the
word and the dawn of truth over the graves as they dissipate.” He believes that music,
the only “subjective theurgy,” is the way that leads to this mystery, yet he is rather vague,
almost groping in his explanation; he avers that as “the inwardly utopian art,” music “lies
completely beyond anything empirically demonstrable,” but suggests that the sublime
music of deliverance “at the End will not withdraw allegorically back into a home strange
or even forbidden to us; but will accompany us, in some deep way, to the mystery of
utopia” (2010, 122).
17. William Wordsworth’s preface to his and Samuel Taylor Coleridge’s Lyrical Ballads gives
this famous description of what is accomplished by “all good poetry” (Wordsworth and
Coleridge 2008, 175). M. H. Abrams (1971) advances the view that Romantic aesthetics is
to a significant extent unified by its tendency to generalize this description of (or pre-
scription for) poetry and extend it to all the arts.
18. Adorno pursues the idea that Innerlichkeit is a kind of spiritual real estate. Daniel (2001)
helpfully epitomizes Adorno’s view:
The alternative world of interiority is one built ostensibly for self-protection, a
psychic/physical space into which the subject can withdraw for comfort and ref-
uge. This option of withdrawal is clearly a class privilege of the bourgeois, who is
naive and/or arrogant enough to presume that he can create his own exclusive and
private warmth. But it is a privilege indulged at great cost. The alternative world of
interiority can only be inhabited (although “occupied” might be the more accurate
term here) once the subject has renounced a somatic relationship with the world:
the bourgeois interior is thus “museal,” a “still life [in which] the self is over-
whelmed in its own domain by commodities and their historical essence.”
19. Both Bloch’s early expressionist work, The Spirit of Utopia (1918), and the sprawling
presentation of his heterodox Marxist philosophy, The Principle of Hope (three volumes,
1954–1959), deal extensively with music. Bloch (1985) contains the most important musical
discussions from these works.
20. This view of Bloch’s anticipates the treatment of language in Adorno’s Aesthetic Theory.
According to Francesca Vidal:
Music cannot be understood the way language can; it is not interpretable in the
sense that words are. Therefore, Bloch employs the term “call” (Ruf). Music
wants to be heard; this links it with language, but it is understandable otherwise
than language. That the call for an “otherwise than here” is attributed to it
derives from the philosophy of music. That the relationship between philosophy
and music is mutual, and philosophy is not simply interpreting something into
music, is because music itself expresses something of the future, something that
in the openness of its process has to do not only with music itself but with the
world. (Vidal 2003, 173)
21. “[C]lairvoyance is long extinguished. Should not however a clairaudience, a new kind of
seeing from within, be imminent, which, now that the visible world has become too weak
to hold the spirit, will call forth the audible world, the refuge of the light, the primacy of
flaring up instead of the former primacy of seeing, whenever the hour of the language of
music will have come. For this place is still empty, it only echoes obscurely back in meta-
physical contexts. But there will come a time when the sound speaks” (Bloch 2000, 163).
22. The breadth of Bloch’s interests is staggering. Part I of the Principle of Hope examines
“small day dreams”; Part II explores the “anticipatory consciousness” of utopia; Part III
explores “the reflection of wish-images” in advertisements, fashion and design, fairy tales,
travel, circuses, and theater; Part IV explores how “the outlines of a better world” may be
descried in utopian literature, technology, architecture, painting, opera, poetry, philoso-
phy, and recreation; Part V explores the “wish images of the fulfilled moment” that arise
in moral philosophy, music, funereal practices, religion, and communism as humankind’s
summum bonum. Interestingly, Walton’s philosophy of make-believe has also been rightly
lauded for the vastness of the range of cultural products it brings into consideration.
23. Bloch’s conception of musical tones themselves as material bearers of utopian content is
historically and musicologically contextualized in Gallope (2012).
24. “Bloch works overwhelmingly with European and particularly German music, so
much so that he is really offering a Western philosophy of music, in content at least.
The trap is a common one, for the very particular and anomalous history of Western
economics and culture becomes the norm for universalizing ‘the’ philosophy of music”
(Boer 2014, 105).
25. Here Bloch draws on the Christian image and instrument of the fountain as a source
of the “Water of Life” by which the faithful are baptized into immortality. Bloch’s engage-
ment with Christianity, and his willingness to place Marxism in dialogue with Judeo-
Christian theology, have been widely discussed. Marsden 1989 is a good introduction
to this topic.
26. This is truer of Bloch’s aesthetics as a whole than it is of his musical aesthetics, which deals
predominantly with the great works of the Western canon.
27. For a book-length argument to the effect that Adorno’s aesthetics is more utopian than
not, see Boucher (2013).
28. Copious qualifications are in order, given that Adorno sees a crucial difference between
(for instance) the kind of totality exhibited by the music of the heroic period of the
bourgeoisie (Beethoven), by the kind of neoclassical music that apes such music (early
Stravinsky), and the “administered totality” of twelve-tone serialism (Schoenberg after 1921).
For present purposes, I am trying to avoid such complications and cut to the chase.
References
Abrams, M. H. 1971. The Mirror and the Lamp: Romantic Theory and the Critical Tradition.
Oxford: Oxford University Press.
Adorno, T. W. 1993. Music, Language, and Composition. Translated by S. Gillespie. Musical
Quarterly 77 (3): 401–414.
Adorno, T. W. 1998. Aesthetic Theory. Translated by R. Hullot-Kentor. Minneapolis: University
of Minnesota Press.
Adorno, T. W. 2002. Essays on Music. Berkeley, CA: University of California Press.
Aristotle. 1961. Aristotle’s Poetics. Translated by S. H. Butcher. New York: Hill and Wang.
Attali, J. 1985. Noise: The Political Economy of Music. Translated by B. Massumi. Minneapolis:
University of Minnesota Press.
Boer, R. 2014. Theo-Utopian Hearing: Ernst Bloch on Music. In The Dialectics of the Religious
and the Secular, edited by M. R. Ott, 100–133. Leiden: Brill.
Boucher, G. 2013. Adorno Reframed. London: I. B. Tauris.
Bloch, E. 1960. Spuren. Frankfurt: Suhrkamp Verlag.
Bloch, E. 1971. On Karl Marx. New York: Herder and Herder.
Bloch, E. 1972. Atheism in Christianity. Translated by J. T. Swann. New York: Herder.
Bloch, E. 1985. Essays on the Philosophy of Music. Translated by P. Palmer. Cambridge:
Cambridge University Press.
Bloch, E. 1986. The Principle of Hope. 3 vols. Translated by N. Plaice, S. Plaice, and P. Knight.
Cambridge, MA: MIT Press.
Bloch, E. 2000. The Spirit of Utopia. Translated by A. A. Nassar. Stanford, CA: Stanford
University Press.
Chion, M. 1994. Audio-Vision: Sound on Screen. Translated by C. Gorbman. New York:
Columbia University Press.
Daniel, J. O. 2001. Achieving Subjectlessness: Reassessing the Politics of Adorno’s Subject of
Modernity. Cultural Logic 3(1). https://clogic.eserver.org/jamie-owen-daniel-achieving-
subjectlessness.
Gallope, M. 2012. Ernst Bloch’s Utopian Ton of Hope. Contemporary Music Review 31 (5–6):
371–387.
Guastini, R. 1998. Normativism or the Normative Theory of Legal Science: Some
Epistemological Problems. In Normativity and Norms: Critical Perspectives on Kelsenian
Themes, edited by S. L. Paulson and B. Litschewski Paulson, 317–330. New York: Oxford
University Press.
Hegel, G. W. F. 1920. Philosophy of Fine Art. Vol. 3. Translated by F. P. B. Osmaston. London:
G. Bell and Sons.
Sound as Environmental Presence
Toward an Aesthetics of Sonic Atmospheres
Ulrik Schmidt
Introduction
Environmentality
Environmental Imagination
but one rarely has the opportunity to see the sources of most of those sounds . . . This
acousmatic feature is best exemplified by one of the most characteristic sounds of La
Selva: the strikingly loud and harsh song of the cicadas. . . . You hear it with an
astonishing intensity and proximity. Yet, like a persistent paradox, you never see its
source. (López 2004, 86)
This basic acousmatic quality of the sonic environment indicates a fundamental link between
the experience of sonic environmentality and the process of environmental imagination.
Because of the acousmatic curtain, the auditory experience of our environment
corresponds with a cognitive process in which we spontaneously map the cacophony of
sonic environmental effects onto a total image of the environment as a multisensory
whole. This environmental imagination by way of acousmatic environmentality can
take place in two different ways. It can be produced by sonic events that take place as
part of the individual’s actual physical surroundings. Or it can be produced by sonic
events that take place in a virtual space. The virtual production can again either happen
representationally, as is the case in most technical reproductions or simulations (technical
or mental) of actual sonic environments, or it can happen in a nonrepresentational,
synthetic construction of an abstract virtual environment. However, in terms of sonic
environmentality as an acousmatic stimulation of the individual’s environmental
to produce a “sense of presence” [das Spüren von Anwesenheit] (Böhme 2001, 45).
Atmospheres, Böhme notes with explicit inspiration from Heidegger, “seem to fill the
space with a certain tone of feeling like a haze” (1993, 113–114); they evoke a vague sense
of something’s or someone’s environmental “being-here” as a feeling of “indeterminate
and spatially disseminated moods” (2001, 47).
How can we, more precisely, characterize this experience of environmental presence
as a spatially distributed mood, tone, or feeling? As I argue, the sense of atmosphere as
environmental presence is mainly evoked by two interrelated factors. First, it is related
to the production of presence as a site-specific (Kwon 2002) sense of being in a particular
place. As Jürgen Hasse describes it, we perceive atmospheres “as an affective tone of a
place. . . . They communicate something about the distinct qualities of a place in a
perceptible manner, they tune us to its rhythm” (Hasse 2014, 215). This sense of site-
specificity is closely related to what Böhme calls the “ecstasy of things” (1993, 1995, 2001)
by which he understands a certain capacity of individual entities to “go out of them-
selves” in our environmental perception (gr. ekstasis: standing out). In each specific
situation, all individual things—objects, materials, persons, sounds, everything that
makes up the sociomaterial environment—go out of themselves and merge into a
unique assemblage that intimately connects our atmospheric imagination to the place
or site in which it is produced and evoked.
Second, the atmospheric sense of presence is closely related to the environmental
imagination of a particularly human or social quality. According to Heidegger, “mood”
(Stimmung) is the existential capacity of Dasein that continuously and in each particular
moment in time “makes manifest ‘how one is and is coming along’” ([1927] 1996, 127).
Following Böhme’s Heideggerian definition of atmosphere as “spatially disseminated
moods,” atmosphere can, in view of that, be described as the experience of “how a particular
environment is and is coming along.” It discloses the “state” of a place or situation
by investing it with a human-like affectivity and expression of being in a certain mood.
Or as Jürgen Hasse describes it, again with an implicit reference to Heidegger, atmospheres
“let us comprehend without words how something is around us. Therefore, atmospheres
are also indicators of social situations” (2014, 215). Atmospheres “are not things, but
emotions that we are affected by as essences of the world-with-others” (221).
Anthropomorphism is the attribution of human characteristics—human form,
behavior, consciousness, expressivity—to nonhuman things. In view of that, the pro-
duction and experience of atmosphere as spatially disseminated moods can be said to
involve an unmistakable anthropomorphism because of its tendency to infuse our
environmental imagination with a human-like sense of intentionality, expressivity, and
emotion. Anthropomorphism makes the environment act and perform “like a human
being” emerging from the assemblage of relations between materials, things, bodies,
and events dynamically distributed throughout the place—like “a sort of spirit that floats
around,” as Michel Orsoni describes it (quoted in Anderson 2014, 137). An atmosphere
is, in other words, not only essentially social; it is the environmental performance and
imagination of our sociomaterial surroundings as “Other” in the form of an abstract,
quasi-subjective being. Needless to say, this sense of mood, spirit, or intentionality does
not involve the construction of another subject, a particular person, in our imagination.
Atmospheric anthropomorphism remains essentially environmental.
To summarize my argument so far, atmosphere can be described as the affective
production and environmental imagination of a site-specific and anthropomorphic
presence emerging from the material layout of a particular environment. This anthropo-
morphic and site-specific character, it must be stressed, is a unique quality of atmosphere
as basic environmentality. You will find nothing like it in either ecological or ambient
environmentalities, both of which affect the individual by way of essentially nonhuman
properties and effects. Indeed, the very difference between atmosphere and the other
two basic environmentalities with respect to specifically human properties and
imaginations allows us to see how atmosphere is in fact the sole basic environmentality in
and by which the human dimension of our environment—that is, social, subjective,
anthropomorphic—is performed and experienced. Atmosphere, in short, is the per-
formance and imagination of the specifically human relations with our environment.
Sonic Atmospheres
So, if we return to the question of sonic atmosphere, how can we more precisely understand
the sonic production of site-specific and anthropomorphic presence? What is the
specifically atmospheric dimension of our sonic environment? What does a sonic
atmosphere sound like? To explore this question in more detail I will, in the last part of
the chapter, consider two examples taken from two different domains in which the
staging and experience of sonic atmospheres is particularly prevalent: cinematographic
sound design and contemporary sound art. To emphasize the specifically atmospheric
qualities of the sonic environments in question, I will occasionally include accompanying
observations on ambient and ecological environmentalities as well.
the music continually invests the film’s many scenes of the dark and empty outer space
with a mystical abstract presence that hovers over the imagined environment throughout
the film.
However, rather than being a result of the use of music, the cinematographic creation
of sonic atmosphere mainly takes place in relation to the overall sound design of the film
or game in question. As a paradigmatic example of this, consider David Lynch’s and
Alan Splet’s pioneering sound design for Lynch’s Eraserhead (1977). The film’s sound
track itself—later made available in a shorter version as a stand-alone release in its own
right (1982)—is a profound example of the evocation of environmental presence by the
use of sound. Incessantly combining acousmatic site-specific action with environ-
mental sounds of anthropomorphic expression, the soundscape itself becomes a leading
character in the staging of the film’s bizarre and anxious universe.
In order to explore this in more detail, consider the first scene of the film (6:00–11:20)
that comes after a short prologue. Here, we follow the protagonist Henry walking
home from work through an empty industrial landscape, into his building, up the
elevator, and down the hallway to his apartment. Outside his apartment, the woman
next door approaches him with a message from a girl named Mary who called on the
payphone. After a minute’s dialogue, Henry enters his apartment. The whole scene
takes approximately five minutes.
The film’s emphasis on durational time, with little action and narrative progression,
gives room for an affective staging of environmentality and the stimulation of our
environmental imagination. Henry’s appearance, including his conversation with the
neighbor, is awkwardly nervous and tense, and it gives the whole scene an uneasy and
claustrophobic feeling. This feeling of anxiety and tension, however, is not only a product
of Henry’s awkward behavior. It is effectively intensified by the scene’s sound design. In
the whole passage, as is the case throughout the entire film, we constantly hear deep,
droning layers of complex abstract noise. Against this noisy background, an acousmatic
series of inconspicuous individual sound events is heard, coming from particular but
undefined off-screen locations in the environment as an imagined whole. The overall
result is a looming and penetrating feeling of environmental presence.
The sonic environmentality staged in Eraserhead is not exclusively atmospheric but
equally involves a production of ambient and ecological effects. The droning noise, for
instance, envelops the whole scenario in an ambient sensation of being immersed in a
total field of sound. And the layers of individual sounds from disparate ontological
levels might intensify the environmental imagination of a scenario in which all parts are
interconnected and mutually involved with everything else in a nonhierarchical, eco-
logical mesh. Still, however, sonic atmosphere is arguably the most profound of the
three basic sonic environmentalities in Eraserhead. While the main aesthetic function
of the heavy layers of background noise is to give the whole scenario a strong
overall environmental character (ubiquity, consistency), the major role of each indi-
vidual sound event is to stage a particular atmosphere by simultaneously evoking a
strong sense of site-specificity and a feeling of anthropomorphic presence penetrating
the whole scenario.
A short list of the most important individual sound cues that can be heard over the
layer of distant noise during the first scene could read like this:
The individual sounds in the first scene can be categorized into three main groups:
industrial sounds of mechanical or machinic activity (designated with an “a” in the list);
concrete sounds of human bodily actions (b); and sound signals and other sounds with a
strong anthropomorphic, voice-like character (c). These individual sounds from the
different groups, and the way they mix into a continuous sequence of varying intensity,
are the main contributors to the overall atmosphere of the scene as a sense of environ-
mental presence. The sounds of mechanical and bodily action help—in the midst of
chaotic noise—to perceptually consolidate the scenario as a particular place, a physical
location, in which concrete actions take place. And both the specifically human character
of the action (b) and the anthropomorphic sound events (c) further invest the scenario
with a human presence that is not reducible to each single sound but rather stems from
the environment itself as an expressive imaginary whole. The whole environment seems
to be alive, constantly expressing itself and communicating to us about its state of being.
One might want to interpret this expression as a mere sonic representation of Henry’s
mental and emotional condition. But the aesthetic effect is first and foremost nonrepresen-
tational and profoundly environmental. The various sounds persistently perform as
an environmental whole. The combination of noise, site-specific action, and anthropo-
morphic expression affects us by directly stimulating our environmental imagination
and enveloping us in the sense of environmental presence we call atmosphere.
We arrive in a quiet section of the work. All we hear are birds singing quietly among
the trees, occasionally accompanied by the sound of people walking, handling things,
and moving different objects around. A low-pitched drone of electronic sound fades in
to fill the environment for a few minutes, superimposed with speaking voices in a chaotic
mix of unintelligible chatter. The drones and voices disappear and we hear birds again,
now accompanied by the sound of people hammering and knocking on wood.
Occasionally, the knocking sounds join in rhythmic coordination, always on the verge
of becoming a musical practice. Suddenly, a large tree crashes to the ground with a
loud sweeping sound. After a short while a group of people starts to laugh, at first more
discreetly and dispersed, but soon more intensely and in concert. They laugh together
and they laugh at something, although we do not know what it is. The laughing stops
and after a short silence a high-pitched sound emerges, slowly, almost imperceptibly,
and soon we find ourselves immersed in the cacophonic noise of a heavy storm descend-
ing. After the storm has passed, we once again hear the sound of birds and people
walking around, handling different pieces of metal and wooden objects. Occasionally
we can hear the snorting sound of a large animal nearby. After a period of stillness—a
disturbing stillness, as if the whole environment is waiting for something to happen—the
space is pierced by the haunting scream of a girl somewhere in the distance. After a
short while, we hear the metallic sound of cars rolling by, soon followed by marching
feet and droning airplanes above. What sounds like a large wooden wagon is being pulled across the forest; we hear the neighing of horses and the sound of military drums approaching. Suddenly a group of men are shouting aggressively nearby, and we find ourselves in the middle of a sonic battle of gunshots and exploding bombs. The droning of airplanes returns; machine guns and missiles are being fired everywhere. The battle ends in a brief, intense climax. After a short period of penetrating silence, we can hear the beautiful
sound of choir music (Arvo Pärt’s Nunc dimittis [2001]). The music plays for a few
minutes, then the sounds of singing birds and people moving around return once again
and another 28-minute loop begins. (notes translated from Danish by the author)
As this short description of Forest suggests, each sonic event has a very specific aes-
thetic function in the overall production of atmospheric environmentality. They all
help to provoke our environmental imagination by creating a tense experience of
physical action and aroused emotions, acousmatically distributed among the trees to
produce an atmospheric sense of environmental presence. In fact, Forest is a profound
example of the very combination of site-specific action and anthropomorphic affec-
tivity that is the main feature of atmospheric environmentality. Everything we hear
supports the production of a sense of specificity and virtual human presence among
the woods and helps to intensify the overall affective character of the environment as
an imaginary whole.
Apart from the musical intermezzo, the basic sonic means used to create the atmosphere
in Forest are essentially quite similar to the ones in Eraserhead. The sonic material
mainly consists of acousmatic sounds of animals, human bodily action, voices, and
machines combined with occasional sounds of electronic drones and stormy
weather. What distinguishes the production of sonic atmosphere in Forest from that of
Eraserhead is, among other things, a much stronger emphasis on dramaturgical elements.
The narrative action, however, remains somewhat abstract throughout the whole cycle.
Despite the fact that we hear all sonic action in excessive detail, what exactly is taking
place remains obscured, hidden as the action is behind the double acousmatic curtain of
the forest/sound system.
But again, precisely because of this acousmatic abstraction, the sounds stimulate our
environmental imagination all the more forcefully by intensifying our tendency toward
causal listening. We constantly strive to locate the virtual action that is taking place
around us and to figure out “what it is.” In direct contrast to Schaeffer’s hope for a pure
reduced listening in acousmatic space, Forest thus becomes a demonstration of how, in
Luke Windsor’s words, “the acousmatic curtain” does not merely serve “to obscure the sources
of sounds. Indeed, it can be seen to intensify our search for intelligible sources, for likely
causal events” (2000, 31). So, in this process of intensified causal listening effectuated by
Forest’s double acousmatics, we spontaneously merge the disparate events into a multi-
sensory feeling of environmental wholeness that is both abstract and concrete at the
same time. To repeat the initial quote from Deleuze, the feeling of environmental
Conclusion
The aim of this chapter has been twofold. First, the aim has been to explore our affective
relations with the sonic environment on a general level and, second, to analyze this
relationship in a more specific context as the production and imagination of sonic
atmospheres. Atmosphere is understood as the environmental production of a sense of
site-specific and anthropomorphic presence. The two examples considered—Lynch’s
Eraserhead and Forest (for a Thousand Years) by Cardiff and Miller—are from the fields
of cinematographic sound design and contemporary sound art respectively. It would
indeed be possible, though, to expand the perspective and transfer the chapter’s overall
argument to other areas of contemporary auditory culture where the staging and
experience of sonic environmentality is of equal importance. In many computer games,
for instance, not only is the sonic production of environmentality crucial to giving the gameplay a sense of worldly realism, but sound is also very often used intensively to affect our environmental imagination of the game environment with an atmospheric sense of site-specific and anthropomorphic presence much like that found in film (Eraserhead) and sound art installations (Forest). And again, we can find similar tenden-
cies, albeit with quite different means, in our everyday use of background music where
the sonic production of atmospheric presence often plays an important role in the
staging of everyday social interactions. While generally stimulating the basic environ-
mental mode of listening described by Anahid Kassabian as a form of “ubiquitous
listening” (2013), background music is also, on a more specific level, typically used in
everyday life to intensify our experience of being in a particular place or social situation
by evoking a sense of site-specific and anthropomorphic presence. Sound and music are
employed to affectively evoke an environmental feeling of being in a particular place
and a particular mood.
In other words, sonic environmentality and the production of sonic atmosphere
cover a vast and diverse field of aesthetic practice including some of the most important
areas of contemporary auditory culture. With the distinction presented here between
atmosphere, ambience, and ecology as three basic dimensions of our affective relations
with the sonic environment, I have proposed a theoretical framework for a possible
further exploration of it in its different “distinct-obscure” manifestations. Hopefully,
such a framework may inspire other contributions to the future development of what
could become a general aesthetics of sonic environmentality. Still, in this process
we must keep in mind not only the affective and imaginative character of sonic environ-
ments but also how they affect us and stimulate our imagination as environments.
A true aesthetics of our sonic environment is, first and foremost, an aesthetics of sonic
environmentality.
References
Anderson, B. 2014. Encountering Affect: Capacities, Apparatuses, Conditions. Farnham,
UK: Ashgate.
Böhme, G. 1993. Atmosphere as the Fundamental Concept of a New Aesthetics. Thesis
Eleven 36: 113–126.
Böhme, G. 1995. Atmosphäre. Frankfurt am Main: Suhrkamp Verlag.
Böhme, G. 2001. Aisthetik. München: Wilhelm Fink Verlag.
Bregman, A. S. 1990. Auditory Scene Analysis: The Perceptual Organization of Sound.
Cambridge, MA: MIT Press.
Buchanan, B. 2008. Onto-Ethologies. Albany: State University of New York Press.
Deleuze, G. 1988. Spinoza: Practical Philosophy. San Francisco: City Lights Books.
Deleuze, G. 1994. Difference and Repetition. New York, NY: Columbia University Press.
Gendler, T. 2013. Imagination. Stanford Encyclopedia of Philosophy, edited by E. N. Zalta.
http://plato.stanford.edu/archives/fall2013/entries/imagination/. Accessed June 26, 2017.
Gibson, J. J. 1986. The Ecological Approach to Visual Perception. Hillsdale, NJ: Erlbaum.
Guattari, F. 2000. The Three Ecologies. London: Athlone Press.
Hasse, J. 2014. Atmospheres as Expressions of Medial Power. Lebenswelt 4 (1): 214–229.
Heidegger, M. (1927) 1996. Being and Time. Albany: State University of New York Press.
Herzogenrath, B. 2008. An [Un]Likely Alliance: Thinking Environment[s] with Deleuze/
Guattari. Newcastle upon Tyne, UK: Cambridge Scholars.
Herzogenrath, B. 2009. Deleuze/Guattari and Ecology. New York: Palgrave Macmillan.
Kane, B. 2014. Sound Unseen: Acousmatic Sound in Theory and Practice. Oxford: Oxford
University Press.
Kassabian, A. 2013. Ubiquitous Listening: Affect, Attention, and Distributed Subjectivity.
Berkeley: University of California Press.
Kim-Cohen, S. 2013. Against Ambience. New York: Bloomsbury.
Kwon, M. 2002. One Place after Another. Cambridge, MA: MIT Press.
López, F. 1998. Schizophonia vs L’objet Sonore: Soundscapes and Artistic Freedom. eContact 1 (4). http://www.franciscolopez.net/schizo.html. Accessed June 21, 2016.
López, F. 2004. Profound Listening and Environmental Sound Matter. In Audio Culture, edited by C. Cox and D. Warner, 82–87. New York, NY: Continuum.
Lynch, D. 1977. Eraserhead. Libra Films International.
Lynch, D., and A. Splet. 1982. Eraserhead. Original Soundtrack. I.R.S. Records.
McCullough, M. 2013. Ambient Commons: Attention in the Age of Embodied Information.
Cambridge, MA: MIT Press.
Merleau-Ponty, M. (1945) 2005. Phenomenology of Perception. London and New York:
Routledge.
Morton, T. 2007. Ecology without Nature: Rethinking Environmental Aesthetics. Cambridge,
MA: Harvard University Press.
Morton, T. 2010. The Ecological Thought. Cambridge, MA: Harvard University Press.
Schmidt, U. 2013. Det ambiente: Sansning, medialisering, omgivelse [The Ambient: Sensation,
Mediatization, Environment]. Gylling, Denmark: Aarhus University Press.
Schmidt, U. 2015. The Socioaesthetics of Being Surrounded: Ambient Sociality and Movement-
Space. In Socioaesthetics: Ambience—Imaginary, edited by A. Michelsen and F. Tygstrup,
25–39. Leiden: Brill Publishers.
Schmitz, H. 1993. Gefühle als Atmosphären und das affektive Betroffensein von ihnen. In Zur
Philosophie der Gefühle, edited by H. Fink-Eitel and G. Lohmann, 33–56. Frankfurt am
Main: Suhrkamp Verlag.
Schmitz, H. 2014. Atmosphären. München: Verlag Karl Alber.
Schaeffer, P. 1966. Traité des objets musicaux. Paris: Éditions du Seuil.
Schafer, R. M. 1977. The Soundscape: Our Sonic Environment and the Tuning of the World.
Rochester, VT: Destiny Books.
Toop, D. 2004. Haunted Weather. London: Serpent’s Tail.
Uexküll, J. von. 1921. Umwelt und Innenwelt der Tiere. Berlin: Springer.
Uexküll, J. von. (1934) 2010. A Foray into the Worlds of Animals and Humans. Minneapolis:
University of Minnesota Press.
Windsor, L. 2000. Through and around the Acousmatic: The Interpretation of Electro-
Acoustic Sounds. In Music, Electronic Media and Culture, edited by S. Emmerson, 7–35.
Aldershot, UK: Ashgate.
chapter 26
The Aesthetics of Improvisation
Andy Hamilton
Introduction
In contrast to Justin Christensen’s (this volume, chapter 1) entry, this chapter addresses
improvisation in its cultural rather than psychological aspects—its expression as historical
a thing which “exists in a person’s head” and nowhere else is alternatively called an
imaginary thing. The actual making of the tune is therefore alternatively called the
making of an imaginary tune . . . the making of a tune is an instance of imaginative
creation. The same applies to the making of a poem, or a picture, or any other work
of art. (1958, 134)
According to the standard reading of the Croce-Collingwood view, the “total imagina-
tive experience” that constitutes the “work of art proper”—the artwork in a strict rather
than colloquial sense—must be regarded as related only contingently to the physical
artifact. (Collingwood’s position in fact may be subtler than this.)
In the aesthetic as opposed to general psychological case, I believe, this radically
mentalistic conception of imagination is mistaken. Equally mistaken, I believe, is Sartre’s
account of imagination, with which it has affinities. Sartre (2004) argues that a musical
work such as Beethoven’s Seventh Symphony exists neither in time, in the usual sense,
nor space. Rather, it exists in the imagination, in the imaginary, outside the real:
To the extent that I grasp it, the symphony is not there, between those walls, at the
tip of the violin bows. Nor is it “past” as if . . . in the mind of Beethoven. It is entirely
outside the real. It . . . possesses an internal time [which] does not follow another
time that it continues . . . nor is it followed by a time that would come “after” the
finale. [Yet the] Seventh Symphony depends, in its appearance, on the real: that
the conductor does not faint, that a fire breaking out in the hall does not put a sudden
stop to the performance. (191–194)
This Sartrean view of music shows the limitations of metaphysics, I believe—a cloud of
philosophy condensed in a drop of grammar—though this is not the place to offer
a critique.
However, one does not have to follow these writers in espousing a radically mentalistic
conception of both art and imagination, in order to recognize the germ of truth in the
connection between these concepts. The truth that these theories mislocate can be
explained as follows: When a piece of entertainment or craft is described as involving an
imaginative achievement, then it is being claimed as belonging to the realm of art. In
contrast, fantasy—such as the “Game of Thrones” genre—is a staple of low-grade enter-
tainment. Similarly, musical fantasia is composition that panders to a relaxed state of
pleasant, unimaginative connections and themes requiring no concentration or attention,
a pleasing, lazy stream of association that thinks of nothing beyond its own sensations.
This is not to dismiss fantasy entirely from the realms of art. There are works whose content is predominantly fantastical but whose main point of interest is genuinely artistic—such as the last plays of Shakespeare and Keats’s Lamia—and Mozart’s keyboard “Fantasias” include some of his most sublime shorter works. Much entertainment, in contrast, is for
its fans simply fantasy, or “novelty,” to resurrect an eighteenth-century aesthetic category.
Fantasy productions are “imaginative” only in the sense that they involve great insight
into popular taste, thus opening the way to commercial opportunities—an important
and neglected sense of imagination, but one that is nonartistic.
These contrasts apply to performances of improvised music. Here, as in other cases,
answering the questions “Is this art as opposed to mere craft or entertainment?” and
“Does this work involve imagination rather than fantasy?” appeals to the same kinds of
feature. In what follows I demonstrate the artistic status of improvised music, thus
showing its imaginative content.
To reiterate, a humanistic as opposed to abstract account of music sees it as a sounding,
vibrating phenomenon, and a performing art. Abstract or static accounts, in contrast,
are nonparticipant and intellectualist; they regard rhythm statically, as a pattern of
possibly unstressed sounds and silences—as simply order-in-time as opposed to order-
in-movement. Humanists stress music’s essential origins in the human production of
sound and movement, involving a distinctive attack characteristic of traditional musical
means of producing sounds by striking, bowing, or blowing. These means of production,
supplemented in the twentieth century by electronic media, are still essential to the concept
of music. On a humanistic conception, music, dance, and poetry originated together
and are essentially connected. (Chimps may dance or march rhythmically; but for
humanism in my sense, chimps are close enough to human.)
Philosophical humanism affirms the importance of humane understanding against
both scientism and—less common in a more secular intellectual climate—supernaturalist
exceptionalism. Hence the tripartite distinction:
Scientism: the view that the physical or natural sciences constitute the paradigm of
human knowledge, one on which other disciplines must model themselves.
Exceptionalism: the (normally religious) view that “human animal” is a contradiction
in terms, and that human beings are the only biological entity that cannot be grouped
with others on any level.
Philosophical humanism: the view that the explanation of human behavior is irreducibly
personal—that is, it essentially involves what is often termed the intentional stance,
resting on commonsense or “folk” psychology and the attribution of beliefs, desires,
intentions, and similar attitudes to rational agents. Whole-person ascription involving
the intentional stance is the fundamental level of explanation of human behavior.
Subpersonal and neural explanation has a place, but not, as scientism holds, the
ultimate one; so humanism does not amount to exceptionalism as defined earlier.
Humanism is not antiscientific, but antiscientistic—a quite different thing. (This tripartite distinction is developed in Hamilton 2013a, chap. 7.)
This chapter assumes a humanistic recognition of the value of art and the aesthetic to
human well-being. It advocates a normative conception of art and culture that challenges
To say that music is an art is not to say that it is always a high art—that was the point of
describing it as an art at least with a small “a.” Clearly most music is not; but I will argue
that an improvised art music is possible, for instance in modern jazz. Art music is “Art”
with a capital “A”—high art as opposed to mechanical, vernacular, or popular art with a
small “a,” essentially craft or entertainment. The historian of ideas Paul Oskar Kristeller (1951)
famously argued that the modern system of the fine or high arts appeared only in the
eighteenth century:
In [the] broader meaning, the term “Art” comprises above all the five major arts of
painting, sculpture, architecture, music and poetry. These five constitute the irre-
ducible nucleus of the modern system of the arts, on which all writers and thinkers
seem to agree. (497)
On Kristeller’s view, Plato and the Greeks did not think of poetry and drama, music,
painting, sculpture, and architecture as species of the same genus, practiced by “artists”
in the current overarching sense of the term (Kristeller discussed in Hamilton 2007a).
The modern system separated fine art from craft, generating a concept of high art
produced by artists of genius, while leaving great scope for differences between the
individual arts.
Kristeller’s view underlies the modernist consensus. (Artistic modernism is understood here as an intensification of modernity, from the later nineteenth century onward.) But even if one
disagrees with his claim that the Arts—with a capital “A,” the fine or high arts—arose
only in a modern system, aesthetics still needs to explore the very wide divergences
between modern concepts of art and those found in antiquity and in non-Western
cultures. Kristeller’s concern was with Western art, and widely differing models or
systems are found in other cultures—Edo-era Japan, for instance, valued “the Four
Accomplishments” or gentlemanly pursuits of music, games of skill, calligraphy, and
painting (Guth 2010, 11). Such cross-cultural data are essential in addressing the question
“What is art?,” and are more significant than considerations arising from postmodernism
that have tended to preoccupy Western commentators.3
The twenty-first century has no very clear system of the arts—the vogue for stipulating
one did not much outlive the eighteenth century—and there is a vagueness in the under-
standing of our present “system” of the arts, and in its accompanying notion of an “artistic
conception.” Nonetheless, it should be recognized that there is an implicit system, other-
wise practices of arts funding, newspaper reporting, and so on, would be impossible. The
modernist narrative interprets the fine or high arts, with their associated self-conscious
540 andy hamilton
According to this conception, art is autonomous, and its audience has freedom or
autonomy in interpreting it (see Hamilton 2013b).
This freedom is relative because, according to a familiar modernist dialectic, social
and thus aesthetic autonomy arises from, yet is in tension with, capitalist commodification.
In the period from the Renaissance to the later eighteenth century, different artforms
in turn became free of church and aristocratic patronage, as the artist’s work was com-
modified through entry into the capitalist marketplace. This process is found also in
non-Western art, such as that of Edo-era Japan and, indeed, on a smaller scale in art of
many eras (see Hamilton 2009). What is distinctive about post-eighteenth-century
developments, as in the development of capitalism generally, is their scale and ubiquity.
The concept of high art originates in social distinction, but the implied contrast is not
purely social. According to a persuasive modernist narrative, high art, which appeared
differentially across the arts from the Renaissance onward, is autonomous art; to reiterate,
it transcends the practical utility of mechanical arts, and the premodern social functions—
religious, courtly, and military—of art and music before their evolution as high arts.
High art originated as the art patronized by church and aristocracy, with elevated
themes and subjects. But high social location is neither sufficient nor necessary for high
art. Inigo Jones’s courtly masques for James I are regarded by art historians as expensive, frivolous high-class entertainment that wasted the architect’s genius. In contrast, in the
era of modernism, high art embraced low subjects. French realists such as Millet and
Courbet chose humble scenes. While The Gleaners might be imagined as having a high,
biblical theme, impressionism’s urban subjects could not; Caillebotte was criticized for
his working-class The House-Painters and The Floor-Scrapers. High art and art with a small “a” are distinguished by autonomy and not directly by aesthetic value.
High art is not just a social category but is also historically conditioned—fully mani-
fested in Western modernity, but present in earlier times and other places. “High art”
parallels “High Renaissance” or “high modernism”—it refers to the highest or most
exemplary achievement (see Hamilton 2009). (“Fine art” stands in contrast with
mechanical art.) The modernist narrative interprets the fine or high arts, with their asso-
ciated self-conscious artistic conception, as autonomous artforms—independent of each
other, and having lost any defining practical or social function. Art ceased to be a product simply for an occasion and was liberated from direct social function in the service of court, aristocracy, or church. It came to be created not simply to satisfy a patron, but as authentic
artistic expression. The modernist picture is that such a possibility, though perhaps only
remotely realizable, opens up when art enters the marketplace; art becomes potentially
autonomous at the same time as it becomes commodified. For different artforms, this
liberation occurred at different points from the Renaissance to the eighteenth century,
when music, the most backward art in this respect, finally gained its freedom. It is no
coincidence that the concepts of genius and originality, less possible in a craft tradition,
flourished at this time.
In presenting a work of art before the public, the artist is claiming—or hoping—that
it is worthy of their undivided attention and will richly reward it. Thus, at a concert of
contemporary music at the Huddersfield Festival in 2011, I was struck by the way that
programming a concert of art music, in which the audience is meant to be silent and
attentive, implies a demand on them by the artwork—one which, as on this occasion,
might not be justified if the quality of the works were not high. In contrast, muzak in a bar, or
Tafelmusik at an eighteenth-century aristocratic banquet, makes no such claim. Similarly,
a painting in an art gallery makes the claim of art, while a kitsch reproduction at a cheap
furniture store does not. That is one sense of “the claim of art.” It is the claim that an
artwork makes on us, as opposed to the claim involved in calling something an artwork.
Some proponents of high art might argue that improvised music does not justify such
attention—hence its performance in clubs or bars. This criticism is now addressed by
exploring the dialectic of perfectionist and imperfectionist aesthetics.
A humanistic aesthetic rejects the artistic primacy of the musical score, espousing what
I have termed the aesthetics of imperfection. This aesthetics questions the centrality of
the Western art music tradition within philosophical aesthetics and argues, with Ted
Gioia, that despite its formal deficiencies, we are nonetheless interested in the “imper-
fect art” of improvisation. Gioia originates the term “the aesthetics of imperfection,” and
defends it against what he calls “the aesthetics of perfection,” which takes composition
as the paradigm (Gioia 1988).
The aesthetics of perfection emphasizes the timelessness of the work and the author-
ity of the composer and, in its pure form, is Platonist and antihumanistic. In contrast,
the aesthetics of imperfection is more consciously humanistic. It values the event or
process of performance, especially when this involves improvisation—though these
opposites turn out to be dialectically interpenetrating. Thus, the contrast between
composition and improvisation proves more subtle and complex than Gioia and other
writers allow. The focus in this chapter is principally on jazz and related popular music,
but much of the discussion is applicable to other kinds of improvised music.
The opposition between these rival aesthetics sharpened and intensified in
the West during the nineteenth century with the increasing specification and pre-
scription that musical notation placed on performers. The process reached its high
point during the later nineteenth and twentieth centuries, being associated with the
increasing hegemony of the work-concept. An artistic practice that had once involved
improvisational freedom for performers became limited to interpretation of an essentially
fixed work. The dichotomy between improvisation and composition lacked its present
meaning, or perhaps any meaning at all, before this process was well advanced: “By
1800 . . . the notion of extemporization acquired its modern understanding [and] was seen
to stand in strict opposition to ‘composition’ proper” (Goehr 1992, 234). Philosophers have
tended to neglect improvisation as a contrast to composition. In Scruton’s The Aesthetics
Music need not be performed any more than books need to be read aloud, for its
logic is perfectly represented on the printed page; and the performer . . . is totally
unnecessary except as his interpretations make the music understandable to an
audience unfortunate enough not to be able to read it in print.
(Gould in Bazzana 1997, 20–21)
What does spontaneity amount to in improvised performances? And how does it matter
aesthetically? These questions bring us to the heart of the concept of improvisation.
Those who adopt a purely causal account of the concept of improvisation imply that its
presence is of little aesthetic consequence. Thus, Cavell claims that the standard concept
“seems merely to name events which one knows, as matters of historical fact . . . inde-
pendent of anything a critic would have to discover by an analysis or interpretation . . . not
to have been composed.” And Eric Hobsbawm writes: “There is no special merit in
improvisation. . . . For the listener it is musically irrelevant that what he hears is improvised
or written down. If he did not know he could generally not tell the difference.” However, he
continues, “improvisation, or at least a margin of it around even the most ‘written’ jazz
compositions, is rightly cherished, because it stands for the constant living re-creation of
the music, the excitement and inspiration of the players which is communicated to us.”6
The concept of improvisation does have an essential genetic component—a succinct
definition would be “not written down or otherwise fixed in advance.” A purely genetic
account claims that whether a performance is improvised may not be apparent merely
by listening to it, and adds that the mere fact that a performance is improvised is not an
aesthetically or critically relevant feature. The account diagnoses what amounts to an
“intentional fallacy” concerning improvisation—reminiscent of the suggestion that
extraneous knowledge of authorial intention is irrelevant to critical evaluation.
The genetic account exaggerates the extent to which improvisation is undetectable,
however. There is a genuine phenomenon of improvised feel, gestured at by Hobsbawm’s
comments on what improvisation symbolizes. In The Art of Improvisation from 1934,
T. C. Whitmer offered a set of “General Basic Principles,” which included the expression
of an aesthetics of imperfection:
Don’t look forward to a finished and complete entity. The idea must always be kept
in a state of flux. An error may only be an unintentional rightness. Polishing is
not at all the important thing; instead strive for a rough go-ahead energy. Do not
be afraid of being wrong; just be afraid of being uninteresting.
(Whitmer in Bailey 1993, 48)
From this feel arises the distinctive form of melodic lines and voicings in an improvised
performance. Lee Konitz describes a “very obvious energy” in improvisation, which he
believes does not exist in a prepared delivery: “There’s something maybe more tentative
about it, maybe less strong or whatever, that makes it sound like someone is really
reacting to the moment” (Konitz in Hamilton 2007b).
One might say of a purported improvisation “That couldn’t have been improvised”—
meaning for instance that the figuration is too complex or the voicings too clear to be
created under the constraints of an improvised performance. (Perhaps a genius such as
J. S. Bach could do so.) Conversely, an improvised feel might be present in prepared
playing that takes improvisation as its model, or where a composer is looking to create
an improvised effect. The fact that the performance was not improvised might justifiably
alter one’s view of the skill of the performer; but there is a more elusive sense in which it
matters aesthetically. The artistic ideal of spontaneous creation is one factor that sepa-
rates improvised art music from entertainment. The entertainer, in contrast, perfects a
prepared routine and sticks with it, in the knowledge that it works—a “bag of tricks”
model of improvisation. Routines are avoided by the “modernists” who reject the culture
industry—jazz musicians such as Bill Evans, Paul Bley, Lee Konitz, and others who
disdain flashy virtuosity.
There are various senses in which improvisation matters aesthetically, therefore. Even
assuming a viable notion of “extraneous” knowledge, claims of an intentional fallacy are
not vindicated. They are further undermined when one comes to consider the role of
preparation. Cavell and Hobsbawm seem to subscribe to the “instant composition” view
of improvisation. In my criticism of this view I will develop a positive definition of
improvisation in terms of improvised feel. A continuum of composition and improvi-
sation is reflected in the idea of different kinds of preparation for performance.
is precisely intended to keep them from playing what they already know. Thus, there is a
relation between preparation and performance not envisaged by Carter and Boulez—
nor by the polar opposite of their view, the pure spontaneity assumed by a full-blown
aesthetics of imperfection. Mediating the extremes of perfection and imperfection yields
the following picture. Interpreters think about and practice a work with the aim of giving
a faithful representation of it in performance. Improvisers also practice, but with the aim
of being better prepared for spontaneous creation. Many improvisers will formulate
structures and ideas, and, at an unconscious level, this material will provide openings
for a new creation. Thus, there are different ways for a performer to get beyond what they
already do, to avoid repeating themselves. For the improviser, the performance must feel
like a leap into the unknown, and it will be an inspired one when the hours of preparation
connect with the requirements of the moment and help to shape a fresh and compelling
creation. At the time of performance, they must clear their conscious minds of prepared
patterns and simply play. Thus, it makes sense to talk of preparation for the spontaneous
effort. As Lee Konitz puts it, “That’s my way of preparation—to not be prepared. And that
takes a lot of preparation!”8 This is the qualified truth in Busoni’s claim, discussed earlier,
that improvisation is valuable because it is closer to the original idea.
We now consider the relation between the aesthetics of imperfection and the status of
improvised music as art music. In particular, in what sense is jazz an art music? The jazz
historian Scott DeVeaux writes,
The rapid acceptance of bebop as the basic style by an entire generation of musicians
helped pull jazz away from its previous reliance on contemporary popular song,
dance music, and entertainment and toward a new sense of the music as an autono-
mous art.10
Jazz became an autonomous art, one with a fairly capital “A”—a practice involving skill,
with an aesthetic end that richly rewards serious attention. Like Ming vases and Ancient
Greek sculptures, its products are now accepted as (high) art even though its creators
possessed no such concept.
However, many have reservations about describing jazz as an art music; even more so,
about describing it as a classical music. Its products have many of the features of art
music, despite evidently being less contrived than the great works of the Western canon.
Historically, jazz has drawn for its material on ephemeral pop music, whose charms arise from its powers of association for individual listeners—what has
been described dismissively as the “potency of cheap music.” When those materials are
used as they are in jazz, an art of great power can be created. The present situation is
more complex, but jazz still provides a case study of the dialectic between popular and
art music. This dialectic gives rise to central aesthetic questions, much-discussed in
musicology and sociology of music, but whose deeper roots philosophical aesthetics
tends to neglect. My suggestion is that jazz shares some of the features of Western art
music—that apparently unique, autonomous art music that contrasts with nonautono-
mous art musics such as gagaku, courtly gamelan, and Indian art musics.
The claim that jazz is a classical music commonly means:
1. Jazz is a serious art form whose long association with the entertainment industry
is no longer essential—in Adorno’s language, it is an autonomous art.
2. It has arrived at an era of common practice, which is codified and taught in the
academy.
3. It has a near-universality and constitutes an international language, transcending
national and ethnic boundaries.
It might be questioned whether any art music—whether Western art music or jazz—
has feature 3. Western art music is not widely appreciated in India, for example. We are
speaking of near- or relative universality, therefore. Features 1–3 apply only partially to
non-Western art musics such as Korean or Japanese classical music, or courtly gamelan.
I suggested that the latter art musics are nonautonomous, but one could argue that all
of these musics developed from a folk or popular music to an autonomous art music.
However, during the twentieth century, jazz acquired the universal status that was previ-
ously the claim solely of the Western classical tradition. Feature 3 is neither necessary
nor sufficient for a genre to be a classical music. It is not sufficient, because rock and roll,
for instance, has a universality and is an international language, but does not—with limited
exceptions, perhaps including vocational courses—constitute an art music taught in the
academy, and is not as separate from the entertainment industry as jazz is. Nor is it
necessary, because Indian art musics do not constitute a universal language. Ascribing a
“universal status” to Western art music will cause objections from many quarters; it
might be argued, for instance, that jazz involves a break with conceptions of “Western”
and “non-Western.” These difficult and controversial issues clearly require a longer
treatment than is possible here.
Jazz’s academic status is shown by music programs like that at Berklee, which encourage
the idea of jazz improvisation as a craft that can be taught academically. What David
Liebman calls the “apprenticeship system”—young players going on the road with Art
Blakey, Miles Davis, and other leaders—has been replaced by an academic training.11
Another factor in jazz’s classical status is canon-creation—the ready availability on digital
media of the complete recorded history of jazz. Critics have an essential role in creating
and sustaining a canon. As Krin Gabbard writes:
The jazz history we have now really wouldn’t exist without the critics . . . would we
have Ornette Coleman without Martin Williams? There were certain artists who fit
the aesthetic and the predetermined historical notions of critics so perfectly that
they were written into the jazz canon. (2000)
We need to explore in more depth what “classical music” means. It now exists as one half
of a polarity, interdefined with popular music—each concept depends on the other.
(This claim needs to be reconciled with the fact that they did not quite originate
together.) “Classical music” means, in order of decreasing specificity:
1. music conforming to a style-period within Western art music, namely, the First Viennese School of Haydn, Mozart, and Beethoven—music with ideals of balance and proportion, in contrast to Baroque garishness and disproportion.
2. Western art music in general—a sense that appeared together with the developing
contrast with popular music. This is the definition understood by the ordinary
listener, for whom “classical music” denotes a range of music from Baroque or
earlier to the contemporary avant-garde.
3. music that possesses a standard of excellence and formal discipline, belonging to
the canon—the accumulation of art, literature, and humane reflection that has
stood the test of time and place, and established a continuing tradition of refer-
ence and allusion.
It was only from the early twentieth century that classical and popular music began to
be defined as a contrasting pair. Popular music is music directed at the tastes of the
mass of the population. “Popular” is normally defined in terms of scale of activity—for
example, sales of sheet music or recordings. The growing divide between art music and
popular music during the nineteenth century was deepened by Wagnerian opera and
became a rupture with the advent of modernism; for many commentators, modernist
art actively sets itself against popular culture (Sadie and Tyrrell 2004). The most influential
account of the sociology and aesthetics of the classical/popular divide is Adorno’s. He
held that, from the nineteenth century onward, all varieties of music, from folk to avant-
garde classical music, have been subject to mass mediation through the “culture industry,”
a term that implies mechanical reproduction for the masses, rather than production by
them. For Adorno, the divide is not so much between serious and popular music as
such—a division that has become, in his view, increasingly meaningless due to the almost
inescapable commodity character of cultural products in the twentieth century—but
rather between music that accepts its character as commodity, and self-reflective music
that critically opposes this fate, and thus alienates itself from society (Paddison 1982).
One objection to applying the term “classical music” to Western art music is the
apparent implication that it is the unique classical music—which clearly it is not.
However, I will argue that even its unique “abnormality” is now qualified by the appear-
ance of a comparably “abnormal” classical music, jazz.
Does jazz exhibit classical tendencies? Are such tendencies desirable? Factual and nor-
mative dimensions of jazz’s classical status interpenetrate but should be distinguished.
Some see jazz still poised between art and entertainment, close to popular music in the
ordinary sense of the term, and contrasting with Western art music. The jazz trumpeter
Brad Goode, for instance, writes, “most jazz musicians, post be-bop, consider themselves
to be ‘artists’ and consequently only consider the integrity of the music during their
performances,” an attitude he finds inconsistent with making a living. My view is that
jazz can be a classical music, and that exploiting the divide between the classical and
popular (in the mass sense) is one of its distinctive strengths as an art of improvisation.
Setting aside the views of those who deny that jazz could be classical because it is of
little artistic value, there are three main reasons for rejecting the classicizing tendency—
that it makes jazz elitist, or safe, or static. The final objection is the most powerful, but is
also misguided. During the eighteenth and nineteenth centuries, Western art music
entered an era of common practice based on functional harmony and the tonal system
of major and minor keys. Some argue that this era came to an end with the “emancipation
of the dissonance” by Schoenberg and his contemporaries; others hold that—concerning
music in everyday life—it is still with us.
There has been a corresponding period in jazz. Like classical music, jazz also seemingly
reached the limits of avant-gardism, though more rapidly. Conrad Cork (1996) argues
that while the evolution of jazz practice was rapid for about five decades, it became much
reduced after the 1970s, either “because the music has atrophied [or] because it has
arrived at a period of common practice, where it can function on its own terms” (73).
Just as classical tonality returned to fashion in the 1970s and 1980s, however, jazz has
seen a conservative reaction. Others are more critical of the era of common practice,
arguing that classical musics and languages are no longer created actively but are con-
served in conservatories; interpreters study the seminal texts in order to restore them to
life. Thus, Emmett Price writes, “Classical implies static, non-changing; a relic frozen in
time. Jazz has never been static, non-changing or frozen,” while Alex Ross refers to the
“pernicious” implication that jazz “has become ‘classical’ in the pejorative sense: complete,
finished, historical.”12
This negative picture is unduly critical, I believe. Classical music is not the curatorial
exercise that these writers assume, and which the authenticity movement in early music
may appear to imply. Classical musics do not have to be “static, non-changing, frozen.”
As Parakilas argues, rather than resuscitating corpses, the classical repertory keeps
“certain old works . . . ever-popular, ever-present, ever-new. It is an idea founded on rev-
erence for the past, but not necessarily on a modern scholarly conception of history. . . .
[It may not take] notice of historical differences between one work and another within
it,” as proponents of early music do ([1984] 2004, 39).
Whether classical musics are “static, non-changing, frozen” depends on the extent to
which a repertory admits new material. Parakilas comments that such a repertory
need not be kept up-to-date with works from the period just past. The repertory of
Gregorian chant, for instance, was considered closed by the time of the Renaissance,
and performers did not sing the older chants within that repertory differently from
the younger chants, though the repertory as a whole was performed differently
from place to place and from one period to the next. (39)
I have argued that the description “classical” is benign, and that the process of classici-
zation has been a largely beneficial one. Jazz and other improvised musics do not need to
be legitimated in a practical as opposed to philosophical sense. What is in question is
not whether the music has artistic value, but how that value arises. One view is that—in
contrast to Western art music—jazz’s artistic value arises in part at least from its status as
improvised music. This is the assumption of Gioia, who, as we saw, defends the “imperfect
art” of improvisation. On this view, spontaneity implies authenticity, and it makes sense
to talk of preparation for the spontaneous effort—Konitz’s “way of preparation—to not
be prepared.” Konitz has “complete faith” in the spontaneous process (Hamilton 2007b).
A purist version of the aesthetics of imperfection asserts essential differences between
jazz and Western art music. But there are also growing similarities arising from the
developed artistry of jazz, which means that it can be described as an “imperfectionist
art music.”
In jazz, an aesthetics of imperfection, expressed through improvisation, allows pop-
ular materials to achieve art music status. In its early decades, jazz was an offshoot of
the entertainment industry and used its materials. Jazz players later developed loftier
aspirations. As we have seen, some writers distinguish a classical art, which involves restoration, from a living art, which involves novelty and innovation; on their view,
creativity in interpretation of a classic is the limited kind that re-enacts or reanimates.
This is a misguided account of many classical performing arts, I believe. Interpretation is
neither “mechanical reproduction,” as proponents of the aesthetics of imperfection
sometimes view it, nor restoration as in the case of painting or architecture. Of course,
there are different approaches, as there are in the restoration of paintings; but no pristine
authentic performance is possible—the performing arts are inexhaustibly interpretable.
As Parakilas notes, it is the project of the early music tendency, but not that of classical
performers, to reproduce historical Beethoven performances—and even for early music
practitioners, interpretation is inescapable, and usually recognized as such.
It would be wrong to separate sharply “classical arts” and “living arts,” therefore. Against
Parakilas’s assumption that classical and new music are separate practices, they may form
a continuum, thus further undermining the rigid demarcation between classical and
living arts. In performance, the era of common practice endures, both for Western art
music and jazz. These musics aspire to exist in a “common present,” as a living art; classical
exemplars offer inspiration rather than rigid templates. The dialectic between aesthetic
perfectionism and imperfection recurs, therefore (see Hamilton 2007a). Improvisation in
jazz is perfectionist in its affinities with Western art music; while interpretation in Western
art music is imperfectionist in its affinities with improvisation. But improvisation imposes
limits on classical perfectionism in jazz. Recordings such as A Love Supreme or Mingus
Ah Um are rightly described as “classics” since, as recordings, they are fixed in their per-
fection and work without qualification to classicize jazz. Concert recreations of A Love
Supreme reconstruct but cannot replicate the recording.
Jazz’s nature as an improviser’s rather than an interpreter’s art informs its classical
status, because improvisation is an expression of performers’ creativity. In improvisation,
the performer rather than the composer is the primary creator. In interpreted music, the
composer is the primary creator, and the performer is secondary, though still creative.
This fact sets limits to the “classicization” of improvised music, depending on whether
the performer is primarily concerned with exploring the song’s essence, or prioritizes
their own artistic self-expression. In jazz, the superiority of spontaneous creation over
prepared solos began to be stressed at the same time—during the transition from
swing to bebop, as jazz was becoming an art music and therefore “classicized.” That is,
improvisation became valued in jazz as the music was gaining an identity beyond the
realm of entertainment and commercial commodification. This fact lends support to
the suggestion that jazz is an art music of improvisation. And in showing that improvised
performances have artistic depth, to reiterate the argument of an earlier section, I have
shown that they involve imagination as opposed to mere fantasy or fancy.
Acknowledgments
Thanks for comments and discussion go to Gabriele Tommasi, Joanna Demers, Philip Clark,
Conrad Cork, Lee Konitz, Max Paddison, Lara Pearson, Lewis Porter, Brian Marley, David
Udolf, and Jeff Williams.
Notes
1. Conceptual holism is a leitmotif of Hamilton (2013a, see for instance chap. 1).
2. See Guyer, in Baldwin (2003, 728).
3. The contrast between art with a small “a” and with a capital “A,” and the nature of art
before the modern system, is addressed in Hamilton (2007a).
4. The “debate” consisted of Schoenberg writing marginal comments in his copy of Busoni’s
book; subsequent quotations are from Busoni (1962, 84) and Stuckenschmidt (1977,
226–227).
5. The former is the view of Robin Maconie (1990, 150–151).
6. Cavell, “Music Discomposed” (1976, 200); Hobsbawm quote from The Jazz Scene, first
published 1959 under the pseudonym of Francis Newton, quoted in Gottlieb (1997, 813).
7. Boulez (1986, 461); interview with the author, Usher Hall, Edinburgh International
Festival, August 2000.
8. Quoted in Hamilton (2007b); Konitz’s ideas on improvisation are discussed in chapter 6.
9. Stressed by Gunther Schuller in “The Future of Form in Jazz” (1986, 24–25).
10. http://www.oxfordmusiconline.com/grovemusic/view/10.1093/gmo/9781561592630.
001.0001/omo-9781561592630-e-1002248431. Accessed December 17, 2018.
11. Interview in Jazz Review, April/May 2008.
12. Emmett Price, http://www.allaboutjazz.com/php/article.php?id=807. Accessed April 15, 2017; Alex Ross, “Classical View; Talking Some Good, Hard Truths About Music,” New York Times, November 12, 1995, http://query.nytimes.com/gst/fullpage.html?res=9A00E2D61439F931A25752C1A963958260&sec=&pagewanted=2. Accessed April 15, 2017.
References
Bailey, D. 1993. Improvisation: Its Nature and Practice in Music. Cambridge, MA: Da Capo.
Baldwin, T., ed. 2003. The Cambridge History of Philosophy 1870–1945. Cambridge: Cambridge
University Press.
Bazzana, K. 1997. Glenn Gould: The Performer in the Work. New York: Oxford University Press.
Boulez, P. 1986. Orientations. London: Faber.
Busoni, F. 1962. Sketch of a New Aesthetic of Music. In Three Classics in the Aesthetic of Music.
New York: Dover.
Carter, E. 1997. Collected Essays and Lectures, 1937–95. Edited by J. Bernard. Rochester, NY:
University of Rochester Press.
Cavell, S. 1976. Music Discomposed. In Must We Mean What We Say?, 180–212. Cambridge:
Cambridge University Press.
Collingwood, R. 1958. The Principles of Art. Oxford: Oxford University Press.
Cork, C. 1996. Harmony with Lego Bricks. Rev. ed. Leicester, UK: Tadley Ewing Publications.
Davies, S. 2001. Musical Works and Performances. Oxford: Clarendon Press.
Gabbard, K. 2000. Race and Reappropriation: Spike Lee Meets Aaron Copland. American
Music 18 (4): 370–390.
Gioia, T. 1988. The Imperfect Art. Oxford: Oxford University Press.
Goehr, L. 1992. The Imaginary Museum of Musical Works. Oxford: Clarendon.
Gottlieb, R. 1997. Reading Jazz. London: Bloomsbury.
Guth, C. 2010. Art of Edo Japan: The Artist and the City 1615–1868. New Haven, CT: Yale University Press.
Guyer, P. 2003. Aesthetics between the Wars: Art and Liberation. In The Cambridge History of
Philosophy 1870–1945, edited by T. Baldwin, 721–738. Cambridge: Cambridge University Press.
Hamilton, A. 2003. The Art of Recording and the Aesthetics of Perfection. British Journal of
Aesthetics 43 (4): 345–362.
Hamilton, A. 2007a. Aesthetics and Music. London: Continuum.
Hamilton, A. 2007b. Lee Konitz: Conversations on the Art of the Improviser. Ann Arbor:
University of Michigan Press.
Hamilton, A. 2009. Scruton’s Philosophy of Culture: Elitism, Populism, and Classic Art.
British Journal of Aesthetics 49: 389–404.
Hamilton, A. 2013a. The Self in Question: Memory, The Body and Self-Consciousness. London:
Palgrave Macmillan.
Hamilton, A. 2013b. Artistic Truth. In Philosophy and the Arts, edited by A. O’Hear. Cambridge:
Cambridge University Press.
Hamilton, A. Forthcoming. Art and Entertainment. London: Routledge.
Kristeller, P. O. 1951. The Modern System of the Arts. Journal of the History of Ideas 12 (4):
496–527.
Maconie, R. 1990. The Concept of Music. Oxford: Clarendon Press.
Paddison, M. 1982. The Critique Criticised: Adorno and Popular Music. Popular Music 2:
201–218.
Parakilas, J. (1984) 2004. Classical Music as Popular Music. In Popular Music: Critical Concepts
in Media and Cultural Studies, Vol. 2, edited by S. Frith, 36–54. London: Routledge.
Sadie, S., and J. Tyrrell. 2004. Modernism. In New Grove Dictionary of Music and Musicians,
edited by S. Sadie and J. Tyrrell. New York: Oxford University Press.
Sartre, J.-P. 2004. The Imaginary: A Phenomenological Psychology of the Imagination. London:
Routledge.
Schuller, G. 1986. The Future of Form in Jazz. In Musings, 18–25. New York: Oxford University
Press.
Scruton, R. 1997. The Aesthetics of Music. Oxford: Clarendon Press.
Scruton, R. 2015. Art and Imagination: A Study in the Philosophy of Mind. London:
St. Augustine’s Press.
Stuckenschmidt, H. 1977. Arnold Schoenberg: His Life, World and Work. London: John Calder.
Subotnik, R. R. 1991. Developing Variations: Style and Ideology in Western Music. Minneapolis:
University of Minnesota Press.
pa rt V
P O S T H U M A N ISM
chapter 27
Salomé Voegelin
Introduction
This chapter tries to make a contribution to current ideas on materiality, reality, objectivity, and subjectivity as they are articulated in the many texts on New Materialism that have emerged recently under the auspices of speculative realism, object-orientated ontology, complexity theory, and various other current and emerging “subgenres.” These approaches all share a renewed interest in the status and understanding of materiality, material relationships, and the role of the human subject in the context of a contemporary world whose technological and actual globalization demands a new critical engagement and scholarship to grasp the impact, and to articulate the significance, of its fluid interconnectedness. The origin of the term “New Materialism” is invariably located in the mid- to late 1990s, when it is associated chiefly with the writings of Manuel DeLanda and Rosi Braidotti, although whether its project is genuinely new or a continuation of traditional materialism remains debated and debatable. Nevertheless, in current discourse the term acts as a
shared name for different approaches toward the question of materiality and subjectivity
in a digital age. It covers an interest in the relationship between nature and culture,
“naturecultures,”1 and brings with it a critique of an anthropocentric worldview. It is
articulated variously in relation to climate change and its amplification of ecological
consequentiality; it engages with the organization and significance of the global flow of
capital and goods, and gives words to the consideration of a concurrent fluidity or fixity
of persons; it presents new strategies to engage in issues of identity, sexuality, race, and
feminism; and it provides a framework and tools to debate and bring into association all
those issues and dynamics to grasp the world and its material reality not as a stable and
singular construction but as a matter of agency, interdependence, and reciprocity that
impact on its social and political actuality. This chapter is placed in the context of these
theorizations that deal with the relationship between nature and culture, materiality and
subjectivity, and seeks to participate in the current discourse about matter from a sonic
point of view. This sonicomaterialist perspective is motivated by the idea that the invisible
mobility of sound is always already critical of the dualisms of a visuohumanist tradition,
in that it is always and by necessity focused on the in-between of things: their relation-
ship and interbeing. Sound is not “this” or “that” but is the between of them, and thus it
brings with it a conception of the world as a relational field.
To probe this interpretation and try its suggestions, this text focuses on the writing
of Quentin Meillassoux, whose book After Finitude (2009) can be understood as a
central if somewhat eccentric articulation of New Materialist considerations.
Meillassoux critiques an anthropocentric view of the world, attributed by him to the
correlationism of phenomenology and metaphysics in general. In its place, he pro-
motes the mind-independence of mathematics to measure and calculate a world
before and after human experience. Thus, he sets out the possibility of a human-free
conception of the world that eschews what he perceives as, on the one hand, the
“fideism” of phenomenology and, on the other, the absolutizing idealism of transcen-
dental philosophy, which, in any event, he understands to ultimately produce the
same dogmatic conceptions.
In what follows here I engage with his charge of an anthropocentric worldview by
considering the proposition of a posthumanist theorizing through a focus on sound,
creating an invisible imaginary of the material world. The contention is that the sonic
sensibility, articulated in sound practice and discourse, precedes and enables the
concerns of New Materialism. Sound’s ephemeral materiality and invisible relationality
informs the concepts, and grants perceptual access to the ideas discussed currently in
relation to materiality and subjectivity. In this sense, sound and listening establish a
proto–New Materialist sensibility that is present as a minor strand and challenge within
materialist philosophies already, but which only now, in the context of a renewed atten-
tion on agency and interdependence, is able to question its humanist rationality and
dialectical stance. Accordingly, we could consider whether, without the emergence of
sonic practice, discourse and sensibility in art and the humanities, in everyday thought,
and in science, New Materialists would find it harder to conceive of and be understood
in their articulation of “fragile things,” “speculative turns,” and “dark ecologies,” which
are some of the terms and concepts used to theorize a new materialist world. The con-
nection might not be entirely conscious; most theorists writing on materiality today
might never have thought to listen, but it might nevertheless be an important if some-
what subliminal influence: a hidden Zeitgeist, something in the air that has shifted focus
away from the apparent certainty of what we see toward more ephemeral and darker
structures that might well sound, or for whose fragility the material of sound might
serve as metaphor. Thus, I would like to contend that New Materialism presents a
quasisonic consciousness of the invisible, the relational, the dynamic eventness of
things, their predicativeness, and duration.
The aim however is not to prove the superiority of sound as a concept and theoretical
device. Nor is it my intention to produce an essentialized position. Rather, as with
much of my work, the objective is to revisit the nominal and habitual reality of things,
so often set within the boundaries and certainties of a visual language and anchored in
the visual witnessing of the object itself, in order to articulate another possibility of
what there is.
A sonic sensibility invites a different view. It generates a world of fleeting things and
coincidences that demonstrate that nothing can be anchored, and everything remains
fluid and uncertain, not necessarily as precarity, as a state of anxious fragility, but as a
serendipitous collaboration between the multiplicities of the “what is.” Sound, I will
argue, aids the reimagination of material relations and processes. It makes appreciable
other possibilities of how things might be and how things might relate, and serves to
consider positions and positionings of materials, subjects, and objects in a different and
more mobile light.
I will argue however that the fluidity proposed and the relations intimated are not, as
Meillassoux might fear, the fanatical and egocentric imaginings of a correlationist in
search of a de-absolutized world. But neither do I seek shelter in his “mathematical
world” that has expunged humanity from any involvement in the what is. Rather, I
believe that listening as an attitude to the world practices the ambivalence between
measure and experience. And it is in practicing rather than resolving this ambivalence
that we can reach what, at this moment, appears incommensurable, merely possible and
even impossible, to diversify the rationale of logic and reason itself rather than disappear
in a plurality of factions.
A sonic materialism thus presents not an absence of reason in immersive noncriticality
and fanatical egotism. Instead, it foregrounds personal responsibility and participation:
not to deny our being in the world and the world being for us what it is through our
being in it, but to embrace the human ability to think this position as relative rather than
central; to appreciate our responsibility in how the world is: politically, ecologically, and
socially; and to initiate change and a different attitude, rather than withdraw into an
infrastructure of numbers and codes, which, as I will argue, always and unavoidably are
the design of a human-thought world. An auditory imagination does not produce
Meillassoux’s “fideist obscurantism” of a proper truth, his conception that phenomenology
and metaphysics depend on belief and piety instead of truth, thus denying factuality and
reason their singular condition of possibility in favor of unlimited irrationality and
fanaticism;2 and neither does it engage in the “communal solipsism” that he attributes to
them. Instead, a sonic conception and sensibility of the world is the point of access to
pure possibility as actuality.
This chapter will elaborate on these ideas through the practice of listening to three
sound art works: my audition of Toshiya Tsunoda’s Scenery of Decalcomania (2004), an
album of seven tracks, allows me to enter the world by its vibrations and to hear its space
as events and interactions; listening to the sound transmitted through the porous body
in the performance Ventriloqua by Aura Satz (2003), I am initiated into a place of other
voices; and my absorption in the pulsating drip of Anna Raimondo’s rhythmic words in
Mediterraneo (2015) makes me hear the relationship of language, materiality, and
belonging through their fluid boundaries.
In his widely discussed and oft-quoted work After Finitude, which could be described as
an infectiously peculiar cornerstone of New Materialism, Quentin Meillassoux sets out
an argument for ancestrality, the measure and articulation of a world anterior to humanity,
in order to achieve the principles of a human-free conception of the world. The materi-
als and events of such an anteriority he calls “arche-fossils,” and he wants them to be
understood not simply as present traces of the past, but as indicative of a logic and rea-
son able to grasp the anterior without a present human experience. The aim throughout
the book is to generate the condition of this nonhuman ancestrality, to be able to reach
beyond ourselves into a space devoid of ourselves that might ultimately shed light not
only on what was but also establish an understanding of the “what is” without the
specter of human perception.
His anteriority encapsulates an ulteriority too, and together they generate a concep-
tual space beyond finitude, whose content, material, and organization is experientially
inaccessible. This inaccessibility gives cause and justification to his critique of corre-
lationism and leads him to propose the mathematizing of nature: to establish the sta-
bility of its laws as “a mind-independent fact . . . that is indifferent to our existence”
(Meillassoux 2009, 127) and thus capable of making accessible a world without us
through speculation that excludes metaphysics and thus excludes the human point of
view and finitude.
His argument for an after finitude begins with a critique of the strong correlationism
of phenomenology and other metaphysical philosophies, which he understands to occur
as a counter to the absolutism of transcendental idealism and to result in equally
dogmatic fanaticisms. While he appears to agree with the need to critique transcendental
universalisms, and the dogma of the absolute, he is looking for another solution based
on facticity and the contingency of facticity: on the fact that the world “is there,” rather
than on my own contingency in a world that “is there for me.”3
In relation to this, strong correlationism presents itself as the dogma of a contingent
perception that does not appreciate that things might be otherwise than they appear to me.
In other words, it appears to leave no room for speculation: for a speculative materialism
that can gain access to the anterior, the nonhuman world without making it “wholly
other.” Meillassoux seeks to overcome this problem with metaphysics by promoting
decorrelation through data and numbers. Leaving aside for now the question of science’s
truth and objectivity, its supposed nonanthropocentrism, and whether it does indeed
represent the mind-independent facts on which his thesis hinges, the problem of correlation
and the device of ancestrality are intriguing and useful for developing sound’s
contribution to New Materialism.
Meillassoux’s After Finitude lends inspiration to the aim of a non-human-centered
conception of a sonic world. His ancestrality offers a conceptual space to the methodology
of a sonic discourse, allowing us to reflect on the nature of sound behind and in front of
our lives, and thus enabling us to contribute from the invisible mobility of its materiality
to the conception of a posthumanist world.4
Arche-Sonic Vibrations
through the specificity of my encounter I appreciate the fluid and unstable reality of my
contingency as one of many, none of which are “wholly other,” and all of which are
simultaneous, each as real and each as possible as the other. As I walk around the room, I
appreciate the plurality of the work: at each point, another vibration comes to the fore
while all others remain in play. Thus, I come to physically comprehend the simultaneous
plurality of the real, and rather than reduce its vibration to a set of numbers in order to
discount my physicality on the way to a plural but factional scenery, I hear a heteroge-
neous environment.
Tsunoda’s work reminds me of my existence in an ancestral texture of sound, whose
appearance, however, is not fossilized but moves on inexhaustibly. The “after finitude” of
a sonic sensibility does not present a certain finished form; it does not present “the
material support” for the investigation of ancestral phenomena—the geological for-
mation, the fossil imprint, the density of coal, and the rings of a tree—and it does not rely
on the possibility of a pure mathematics of nature “to demonstrate the integrity of an
objective reality that exists independently of us—a domain of primary (mathematically
measurable) qualities purged of any merely sensory, subject-dependent secondary
qualities” (Hallward 2011, 140) such as smell, sound, and touch. But while, as Peter
Hallward continues, the thing measured is indifferent to it being measured or what it is
measured as, the idea of measuring is absolutely subject-dependent.
The arche-fossil presents a reduction and deformation of the thing into its measure
that is akin to the reduction of Tsunoda’s sounds in the closed-offness of the headphone
or the absorption and deadening of sound in the acoustic isolation of the anechoic
chamber. Without the reverberation of sound within its environment, as concept and
actuality of material connection and exteriority, the vibrational thing does not expand
into its formless capacity but deforms into the condition of its measurement. And while
this conjecture might shed light on a world without human experience, since it is still
calculated from a human point of view, through the subject-dependent idea of measur-
ing, it does not enable access to a nonhuman world. The assumption would be that a
world without humans is a world without experience and possibilities unless they are
strictly speculative; the contingency of facticity rather than of the material itself. By
contrast, the ancestrality of sonic vibration is the phenomenon of its material, which is
infinite; it sounds now as an arche-sonic that brings me to the consciousness of a before
and after through my equal participation in its present texture.
In the texture of the world as a vibration-environment, possibilities do not negate
each other, causing plurality as dissent and factionality, which inevitably leads back
to strong and contested territories and identities. Instead, they trigger nonselective
connections and serendipitous collaborations between invisible things whose tex-
tures show me my responsibility and instill the humility of my own reflection.
Vibrations are the ground on which communication and communality are sought
rather than found. In this point, my motivation for a sonic materialism answers
William E. Connolly’s invitation to “respond to the charge of anthropocentrism in
order to fold more modesty into some traditional European modes of theism and
humanism alike” (Connolly 2013, 400).
Porous Bodies
The ancestrality of sonic vibration, its inexhaustible texture that sounds as an arche-sonic,
has not only the capacity to make accessible an anterior or ulterior world, to effect in me
the consciousness of a before and after terrestrial life. It also makes accessible an over
there and another place: an “extra-terrestrial” life, alien forms, and unknown things. In
other words, sonic vibrations, the arche-sonic, call into the realm of the possible also
the impossible: that which for physiological, ideological, aesthetic, sociopolitical, and
economic reasons we cannot or do not want to hear. The possibility of its sound is,
however, central to a materialist critique of a human-centered rationality.
relation to sonic materialism and the idea of an egalitarian sonic-texture. Satz, and each
of her subsequent pregnant stand-ins, are not the performers of the work; they are its
conduit; they are a social conduit and vessel for another voice rather than the contingent
formlessness of their own particular articulation.
The pregnant form reclines on the chaise longue; with one hand she holds on to the
antenna of a Theremin placed on a tripod next to her. Holding on to it, her body becomes
the extension of the instrument as another antenna. In this way, she opens up her own
sonic range to the Theremin that, in turn, is calibrated on her body. This calibration is
unstable and needs constant retuning as the human body presents an inefficient conduit
in the sense that it is not finely tunable but brings its own disturbances to the performance.
This inefficiency demonstrates the fluctuating and mobile capacity of physicality and
indicates the illusion of pure mediumship: the channeling of another voice, a separate
spirit, or of ancestral data, without the impact of the medium itself. It puts into doubt the
sustainability of mind-independent facts, and stresses the ambivalent relationship
between the voice and the unvoiced, between the present and the absent, which are not
absolute but ideological and applied.
The body as Theremin is controlled by the performer in close proximity but without
physical contact. In this first enactment of Ventriloqua, the Thereminist Anna Piva plays
the body by moving her hands just above the skin protruding through the sequined gap
in the costume. Her hands move through pronounced physical gestures, producing the
visual shapes that play invisible oscillations and amplitudes. The electric signals thus
generated are sent to an amplifier and emitted via loudspeakers as modulating tones and
surging vibrations that issue from the skin into the auditorium.
The atmosphere is séance-like: the room is darkened and a single light illuminates the
protruding white globe of skin as it is made to sing. This turn into darkness carries an
occult undertone that pervades the work’s performance. The voice of the unborn, as alien
spirit, is called into the room through an act of mediumship. Its inaudible voice is seem-
ingly channeled through the Theremin-body and made to speak pre-birth.
There is the potential that the making audible of the unheard, rather than pursuing a
posthumanist equality of materiality and an inclusive politics of the voice, steps into the
mystical and fanatical that Meillassoux ascribes to strong correlationism and that
Theodor Adorno fears in relation to astrology and the occult. The parallel is intriguing.
Both correlationism and the occult are responses to a philosophical rationality of
absolutes that leaves no room for faith, for contingency and self, and yet, according to
both Meillassoux and Adorno, each in turn ends in its own dogmatic obscurantism.
Adorno, in his text The Stars Down to Earth, writes against mysticism and the occult as
the cornerstones and antecedents to fascism and totalitarian governance. Focusing on
the pervasiveness of astrology through a study of daily columns of “Astrological
Forecasts” by Carroll Righter in the Los Angeles Times, Adorno produces his “Theses
against Occultism,” in which he argues that monotheism is decomposing into a second
mythology that separates the spirit from the body, the material experience of the world, and
that critiques materialism while seeking to “weigh the astral body” (Adorno, 2004, 177).
In other words, his theses, developed over nine key observations, ridicule the occult
as a “metaphysics for dunces” that draws its rationality from the irrationality of a fourth
dimension, a nonbeing that claims to answer all the questions about the material world.
His critique remains serious, however, since he fears:
direct and unaffected conduit to astral and mathematical systems, free of human design
and intervention, is indeed possible, and that what we measure and hear are the unfettered
computations of ancestrality and the true voices of the spirit world.
In response to this we need to take care that materialism does not result in ventriloquism:
the speaking for something/somebody else through a human-designed channel of
spirituality or calculation masquerading as mind- and body-independent fact, a process
that equates to a hyperanthropocentrism hiding in a mythical or mathematical under-
growth. Instead, we need to remind ourselves of Connolly’s call for more modesty about
our status in the world in relation to the “traditional European modes of theism and
humanism” to grasp responsibility and pursue a different relationship between the
voiced and the unvoiced.
Within this objective, ventriloquism becomes a useful device and metaphor to
conjure the other not as a separate other, neither a spirit nor a measurable quantity, but
as a voice that sounds simultaneously but is not heard: an extrasocial rather than an
extraterrestrial, whose sound thickens the perceived reality of the world through the
actualization of its impossibility. In this sense, listening to the ventriloquist sharpens
our sensibility and care, and fosters a practice of listening-out for the unheard or the
overheard to draw the inaudible as another possibility from the impossible into the
simultaneous plurality of the actual.
Ventriloqua, as a listening-out for the unheard materialities of this world, defines a
useful attitude to material relations as well as toward notions of presence and absence
understood not as dialectical absolutes but as the possibilities that can, and the possibilities
that cannot, make themselves count in the actual world. As one defined inaudible voice
is sounded through the Theremin, that of the unborn child, we are reminded of other
voices, historical and present, which have not been heard. And as the threshold of
possibility becomes porous, impossible things start to present themselves in the
sonic-vibrations of the actual world.
Political Textures
Sonic vibrations, the arche-sonic texture of the world, reveal seemingly impossible
modulations that are not reducible to the volume of past sounds or the spirits of other-
worldly voices, but demand they be heard within this world. And thus, within the texture
of this world, appears that which we cannot or do not want to hear and which demands
to be heard, to make itself count as a slice of the real. This forceful appearance of impossible
things in the midst of our actual world challenges the notion of difference and distance:
two terms and values that are at the center of the humanist project that seeks to know the
world through the rationality of differentiation and the ability to read its relationships as
the distance between objects. Listening-out for inaudible things, as a sonic-materialist
attitude, by contrast seeks to understand the impossible through its proximity to my
own impossibility. In sound, we do not meet as difference or similarity, but negotiate
who we are in a meeting that is primary, before definition, again and again, seeking
invisible and tentative recognitions of what we might be in the practical equivalence of
its texture.
Anna Raimondo’s work Mediterraneo, from 2015, brings us to the vibrations of the
unheard that texture a current sociopolitical reality but lack their own articulation. Her
voice, repeating over and over again the word “Mediterraneo,” takes us to the center of
the liquid expanse that is not simply between Africa, the Middle East, and Europe, a
mere connecting and separating passage, but is the material and metaphor of their
relationship as a deep and treacherous “what is.” Listening hears not one against the
other or their separation, but hears the in-between, the relationship, as the material of
the continents’ contingent facticity.
Listening to her voice, I suspend my belief in what I know to be on either side. I find a
focus not in their distance and what that denotes, but hear in the materiality of their
primary relationship other possibilities of what they could be.
In sound, the Mediterranean is the crossing not the crossed. It is not the infrastructure
of connecting and separating, a bridge between continents that enables us to cross while
at the same time maintaining the distance that exists in the first place, determining
either side through the actuality of what it is not. Rather it is a volume, a material
inhabited in listening, whose traveling within is not about my purpose or provenance,
and it is not about my sameness and their otherness: the real actuality of this continent
and the apparent impossibility of that, but about the possibility of the water’s own
expanse and how time and space define things together.
The crossing enables simultaneity. It performs the intertwining of the self with the
world, and of the continents of the world with each other. These continents are not
absolute territories but are expansions of each other whose impossible meeting points
sound in the middle of the sea. At the same time, this self is not a positive or a negative
identity, and neither is it an anthropocentric definition, but an uncertain and contingent
subjectivity, constituted in an inhabiting practice of perception that is crossing
boundaries not to measure and name but to engage in their watery depth to understand
the defining lines through the self ’s coinciding with them, rather than dispassionately
and from afar.
Distance creates the distortion of dis-illusions, which promises resolution once we step
closer. By contrast, the simultaneity of inhabiting creates the dis-illusions of plural
possibilities that are not resolved into one singular and actual real—war, fighting, right,
or wrong—but that practice the inexhaustible ambivalence between measurement and
experience: what something is as numbers and what it appears to be in perception, so
that we might understand and respond with engaged and practical doubt to what seems
incommensurable from ashore.
On a bleached-out white background we see a glass slowly, drip by drip, filling with a
blue liquid that, as the poet Paul Claudel would say, has a certain blue of the sea that is so
blue that only blood would be more red. And as the sound of dripping water slowly fills
the glass, Raimondo’s voice catches her breath, accelerates, slows down and stutters,
speeds up again, and repeats and repeats “Mediterraneo” until her voice is drowned in
the water she has conjured with her own words. Until then, on the unsteady rhythm of
her voice, we are pulled through the emotions of fear, excitement, hope, and death that
define the Mediterranean as the liquid material that is “the between” of Africa, the
Middle East, and Europe today, and whose material consequence does not stop at the
coastline but offers us the texture to hear its vibration and to understand how we are
bound up with it.
Raimondo’s work brings us into the urgency of the situation through the focus on the
materiality of the sea as the common texture of the adjoining continents rather than
through the confrontations of their different shores. The repetitive mantra of her voice
entreats me to enter into the water in order to—from within the fluid materiality—
understand physically the complexity of its fabric, form, and agency: of what it weaves
together formlessly rather than what it is as a certain form; and in order to suspend what
I think I know of it and pluralize what it might be as the invisible organization of different
things: salt, water, waves, holidays, routes of escape, yachts, aquatic life, sand, handmade
dinghies, dreams, and desperation. Listening, I am persuaded to understand these
things in their consequential and intersubjective relationships: what they sound
together as sonic things and what thus they make me hear.
Sound creates a vibrational-texture of the processes of the world that I hear coex-
tensively and to which I am bound through my own sound. By contrast, a soundless ocean
pretends the possibility of distance and dissociation, to be apart as mute objects and to
be defined by this distance. The absence of sound cuts the link to any cause and masks
the connection to any consequences. Thus, a mute Mediterranean enables my withdrawal
from the sociopolitical and ecological circumstance of its waves and permits the
rejection of my responsibility in its unfolding.
Raimondo composes, from the hypnotic rhythm of her voice and the steady dripping
of blue water, the political reality of the Mediterranean. Slowly submerging, with her
words, into the deep blue sea, I abandon my reading of its terrain within the rationale
and reason of existing maps and come to hear its texture as woven of unresolved
material and positions. I do not follow its outline but produce a dark and mobile geography
of the Mediterranean as a formless shape, whose possibilities and impossibilities
undulate to create a fluid place that defies calculation but calls forth an attitude of
listening-out to understand where things are at and to take responsibility within that
invisible factuality: within this dark and mobile geography, we hear, as Connolly suggests
we should, “the human subject as formation and erase it as a ground” (Connolly 2013, 400).
In the watery depth of Mediterraneo, humanity appears as formless form that has lost
the access to its grounding in the traditions of knowledge and established canons of
thought, in political certainties and journalistic judiciousness, as well as in relation to
historical and geographical identities. Instead, the rhythmic drip, drip, drip, and the
reiteration of its name call for another ground, a groundless ground of invisible pro-
cesses based on the responsibility of a practice-based subjectivity that appreciates the
consequentiality and intersubjectivity of things without controlling them.
Having been transported into the middle of the sea by Raimondo’s audiovisual work,
we can hear the world as the vibrational-texture that binds us all and everything into an
ecosystem of invisible processes. This does not mean that some do not have more power
than others. Simultaneity does not prevent hierarchies. Instead, the simultaneity of
the sonic-texture makes visible the interdependencies of power, organization, self-
organization, and control, and provides an opportunity to revisit economic and political
values that depend on the divides and distances established in a humanist philosophy
and perpetuated in the ecology of the visual. A sonic reality emerges not from maps and
words but from the fluidity of blue liquid and the drowning of the voice. And as the
fluidity gives access to a groundless world, a world without a priori reason and rationality,
the drowning words do not fade but re-emerge in the plurality of the inaudible.
The posthumanist impetus of sonic materialism does not expunge the human but
shakes the ground he stands on to make himself taller. This is not so that no ground can
be established but rather so that the grounding can become practice-based, contingent,
and plural, based not on mind-body-independent speculation but on the suspension of
habits and the beginning of doubt, including doubt in the normative habits of
a singular authorship.
Conclusion
the material through their own “disturbances” that manipulate and distort the others’
voices and construct a hyperanthropocentric ventriloquism that fails to see the impact
of its measure on the heard. Consequently, a sonic materialism does not pretend to be
able to speak for the other; it does not ventriloquize but instead calls for an attitude of
“listening-out for,” a stance of care and humility that hears the possible and the impossible
in the vibrational texture of the world. This texture interweaves the voiced and the
unvoiced as reciprocal and simultaneous things that are not hierarchical but speak of the
hierarchies of the world.
The aim is to hear a plurality of authorships and acknowledge the self-authoring of
nature and of material that we can translate carefully, as Tsunoda does in his vibration
recordings, to make them accessible and thinkable, always in the knowledge, however,
that there are no mind-body-independent facts but that our body and mind will always
diffuse and influence what it is we hear.
In this sense, sonic materialism is a phenomenological materialism, which is not a
contradiction but an acknowledgment of the subject as thing thinging amid other things
and an articulation not of its control over the material world but of its responsibility
within it. Materialism is thus a relationalism, not of different things but of things
together. The material is not an entity but is the vibrational texture that things create
simultaneously through the “equal differences”7 produced in their encounter with each
other rather than beforehand.
I comprehend the anterior and ulterior, as well as the extrasocial, not as human exclusive
domains of numbers and spirits but through my position in the flow of their vibrational
texture. The arche-sonic weave of this texture holds the possibility of the before and after
as well as of the over there. It produces the concept of my finitude not as an absolute but
as an element of its infinitude that is accessible to me through the continuous processes
of reciprocation and generation of material relations within which I exist as a thing
among other things.
Phenomenological ancestrality is the before and after accessed through the inex-
haustible formlessness of a present sound that I inhabit in intersubjective contiguity.
The mathematical ancestral and the spiritual astral by contrast rely on distance and
absence to assure and assert their measurement of the real. In this sense, they are entirely
visual concepts: they overcome, mathematically and through mediumship, a temporal
or spatial distance in order to know and sense a place or a thing that is nominally
without them; and while this might make the other talk and the ancestral yield its
measure, their voice and computation are channeled through the distance needed for their
reach in the first place. This distance is at the basis of a visual materialism that seeks to
omit the human but keeps the gap and difference between things that serve human
articulation, measurement, and thought.
A sonic materialism does not start from this distance but from within the texture of
the world, which includes me simultaneously as a thing in the weave of things.
Interwoven in its flow, I understand the contingency of my position not as absolute, as a
position for me, but as a matter of the facticity of the world, which thus becomes accessible
to me as a proximity where the measure is not between things, or between me and the
world, but is the relationship that we form. Thus, there is no need to overcome a distance
in order to understand the mobility of the world. There is no sonic sublime that shapes
the conceptual ground of articulation and propels perception toward idealism. There is,
instead, embedded doubt, the suspension of habits and norms, which produces a
groundlessness that encourages not just a plurality of voices but a plurality of rationales
and reasons that hear and value their speech. I practice this plurality on the ambivalence
between measurement and experience, producing a complex sociopolitical texture from
arche-sonic weaves that bind me into my responsibility within its inexhaustible flow.
Where New Materialists theorize as speculation, I practice in doubt; and where they
are in search of the infinite, the anterior and ulterior condition of thought and existence,
I focus on the inexhaustible nature of sound that exists permanently in an expanded and
formless now that I inhabit in a present that continues before and after me.
In short, sonic materialism builds, on the groundlessness of an auditory imagination,
the critical attitude of a “listening out for” rather than an occult dream. And while I do
not share Meillassoux’s mathematical speculation, I share in his desire for a philosophical
position of infinity that serves to acknowledge that there is “more” than we can see and
experience. And I take this more to be the start rather than the conclusion of our
appreciation and participation in the material world.
Raimondo’s piece makes us aware that the world entered via such a listening attitude,
as sonic sensibility and Zeitgeist, is rather darker and deeper than first imagined. The
sonic is not self-certainly benign, peaceful, egalitarian, and just. Instead, it reveals the
conspiracies of the visual world and probes the political expediency of class systems,
dividing and ruling, in a sea of blue.
Notes
1. The term “naturecultures” was coined by Donna Haraway in The Companion Species
Manifesto (2003). It expresses a reciprocal and nondialectical entanglement of nature and
culture, body and mind, and so forth, and proposes a rethinking of the broader modernist
ideology represented in these dualisms.
2. Meillassoux justifies and contextualizes his turning away from philosophical thought
toward mathematical speculation by explaining:
it would be absurd to accuse all correlationists of religious fanaticism, just as it
would be absurd to accuse all metaphysicians of ideological dogmatism. But it is
clear to what extent the fundamental decisions that underlie metaphysics invariably
reappear, albeit in caricatural form, in ideologies, and to what extent the funda-
mental decisions that underlie obscurantist belief may find support in the decisions
of strong correlationism. (Meillassoux 2009, 49)
He further states “that thought under the pressure of correlationism, has relinquished its
right to criticize the irrational” (45) and that, paradoxically, a philosophy, phenomenology,
which sought to critique the absolutism and dogmatism of transcendence “has been
transformed into a renewed argument for blind faith” (49). It is for this reason that, instead of
seeking insights into a post- and prehuman world via philosophy, he employs the mind-
independent sphere of calculation and measurement to argue its “proper” truth.
3. In the course of his book Meillassoux develops facticity, the pure possibility of what there
is, into the notion of factiality, understood as the speculative essence of facticity: the fact
that what there is cannot be thought of as a fact but is a matter of nondogmatic speculation,
a speculation that he ultimately pursues via mathematics.
4. The notion of posthumanism here does not refer to a world without humans, but to the
project for a different scholarship and sensibility, initiating a different philosophy that
does not simply continue the humanist path of an anthropocentric rationality and reason
by denying the hyper nominal subjectivity of philosophical tradition while perpetuating it
through the authorship of that very denial, but by considering a decentered human
subjectivity that lives not at the center of the world but is centered by it, aware of its
responsibilities, and humbled in its equivalence with other things.
This posthumanism acknowledges that the human at the center of humanism is not
every human, but a clearly demarcated and privileged identity: a tautologically privi-
leged subjectivity based at the center of humans’ own discourses that places them supreme
in the nominal understanding of the world that their very philosophy creates. Instead, the
aim is to contribute to the conception of possible philosophies whose objectivities and
subjectivities are plural but not factional and that are aware of the inevitable exclusion of
one point of view by another and are thus engaged in philosophy as a field of blind spots
that are practiced rather than theorized.
5. This interpretation of ancestrality as a visual consciousness does not outline a sonic essen-
tialism, and neither does it represent a critique of visuality. This text does not pitch visuality,
vision, or a visual literacy against sonicality, hearing, and a sonic literacy. Rather, the
critique of the visual as it is implied here is not a critique of its object, what we see, but of
its practice, the way we look and what we look for understood as cultural and ideological
practices.
The suggestion is that the ancestral, as it is staged and used by Meillassoux, relies on
narrow channels of vision that deny much of what else could be seen. In response, this
chapter promotes a sonic sensibility and engagement with the material world that do not
aim at a blind understanding of its processes but augment the way we see the world.
6. Maurice Merleau-Ponty calls perceptual dis-illusions the probable realities of a first
appearance: “I thought I saw on the sands a piece of wood polished by the sea, and it was a
clayey rock” (Merleau-Ponty 1968, 41). To him the appearance of the piece of wood is not an
illusion, but a dis-illusion: the loss of one evidence for another. Accordingly, perceptions are
mutable and probable, “only an opinion”; but what is not opinion, what each perception,
even if false, verifies, is the belongingness of each experience to the same world, their
equal power to manifest it, as possibilities of the same world.
7. The notion of “equal difference” is articulated in my book Listening to Noise and Silence
via the equal significance of Sergej Eisenstein’s monistic ensemble of film montage, and
clarified, via Jean-François Lyotard’s agonistic play, as a nonhierarchical playful con-
flict of the sensorial material (Voegelin 2010, 141). Here, it is further developed as the
coextensive simultaneity of the material experienced and measured in a togetherness
that does not ignore difference but understands and generates it in perception rather
than takes it as a given.
References
Adorno, T. W. 2004. The Stars Down to Earth. London and New York: Routledge.
Connolly, W. E. 2013. The “New Materialism” and the Fragility of Things. Millennium: Journal
of International Studies 41 (3): 399–412.
Hallward, P. 2011. Anything Is Possible: A Reading of Quentin Meillassoux’s After Finitude. In
The Speculative Turn, edited by L. Bryant, N. Srnicek, and G. Harman, 130–141. Melbourne:
re.press.
Haraway, D. J. 2003. The Companion Species Manifesto: Dogs, People and Significant Otherness.
Chicago: University of Chicago Press.
Meillassoux, Q. 2009. After Finitude. New York: Continuum.
Merleau-Ponty, M. 1968. The Visible and the Invisible. Evanston, IL: Northwestern University
Press.
Raimondo, A. 2015. Mediterraneo. Audio-visual installation.
Satz, A. 2003. Ventriloqua. Performance.
Tsunoda, T. 2004. Scenery of Decalcomania. Album with liner notes. Australia: Naturestrip NS3003.
Voegelin, S. 2010. Listening to Noise and Silence. New York: Continuum.
chapter 28
Imagining the Seamless Cyborg
Computer System Sounds as Embodying Technologies
Daniël Ploeger
Introduction
When I first started Microsoft Windows 10, I felt something was missing. Or rather,
I heard something was missing. There was no startup sound. Since I first used Windows
about twenty years ago, there had always been a short sound sequence that welcomed
me at the start of a computer session. Now, the only thing I heard when the desktop
came on was a short and inconspicuous “prrt.” Why did the startup sound disappear?
Most people in the Western world and beyond will be familiar with the startup chime
of an Apple computer, the Windows error sound, and plenty of other operating system
(OS) sounds. However, despite the wide cultural reach of these sounds, studies of com-
puter sound have mainly been concerned with sound synthesis for musical purposes or
the simulation of human speech. Relatively little research has been done into the design
and use of sound as part of computer OSs (Gaver 1986; Blattner et al. 1989; Alberts 2000;
DeWitt and Bresin 2007), and, as far as I am aware, there are no studies that are dedi-
cated to OS sounds from a cultural critical perspective. In this chapter, I discuss the
development of the role of sound in the operation of computers from the mid-twentieth
century until the present, and contextualize this in relation to broader cultural per-
spectives on computer systems as cybernetic extensions of the user’s body. Building
on this contextualization, I will explore how common computer system sounds might
facilitate particular imaginations about the nature of technological extensions of human
bodies. In what ways do computer sounds affect the ways in which users imagine the
relationship between their bodies and their computers? And how can the design of
Early computers in the 1940s, such as the Harvard Mark I, were built with electric relays,
which meant that computational processes were audible because of the clicking of the
relay switches. Listening to these sounds, computer operators could often detect errors
or operation irregularities through variations in familiar patterns. For example, Philips
engineer Nico de Troye recalls that:
The [Harvard] Mark I made a lot of noise. It was soon discovered that every
problem that ran through the machine had its own rhythm. Deviations from this
rhythm were an indication that something was wrong and maintenance needed to
be carried out. (De Troye quoted in Alberts 2000, 43, my translation)
However, once computers were built with radio tubes or transistors instead of
mechanical relays, they operated in silence. With machines like the
ARMAC, MIRACLE, UNIVAC I, and IBM 650, errors and problems could not be
heard anymore. At the same time, until the late 1960s, visual monitors could only display
very limited amounts of data so, despite some rows of small lights and a crude cathode
ray tube display, the input and output data—usually on paper tape—had now become
the only detailed computing information directly accessible to the computer operator.
Apart from a simple hoot that could be triggered at designated points in a program,
there was no longer a possibility to aurally monitor operations during the computing
process. Interviews conducted by the historian of science Gerard Alberts with Dutch
engineers who had operated early computers during the 1950s and 1960s indicate that
engineers regretted this loss of aural cues. They responded by connecting a loudspeaker
to the electronic circuits inside these computers and thus made the processing patterns
audible once more through what could be called an “auditive monitor” (Alberts 2000, 2).
Some of the engineers were still able to sing the patterns of particular operations when
Alberts interviewed them four decades later.
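The idea of such an auditive monitor can be loosely illustrated with a small, hypothetical sketch: the code below (Python, standard library only) attaches a short click to every step of a simple computation, a naive primality test chosen purely for illustration, and writes the resulting rhythm to a sound file. The program, pitches, and durations are my assumptions; the engineers Alberts describes tapped the machine's circuits directly rather than instrumenting software.

```python
# A loose, present-day analogue of the "auditive monitor": attach a click to each
# step of a running computation so that its rhythm becomes audible. The sonified
# program (trial division for primality) and all pitch/timing choices are
# assumptions made only for this sketch.
import math
import struct
import wave

SAMPLE_RATE = 22050
frames = []

def click(pitch_hz, dur_s=0.02):
    """Append one short decaying click to the output buffer."""
    n = int(SAMPLE_RATE * dur_s)
    for i in range(n):
        env = 1.0 - i / n
        frames.append(0.5 * env * math.sin(2 * math.pi * pitch_hz * i / SAMPLE_RATE))
    frames.extend([0.0] * int(SAMPLE_RATE * 0.01))  # short silence between steps

# The computation whose rhythm we want to hear.
for number in range(2, 60):
    for divisor in range(2, int(number ** 0.5) + 1):
        click(400.0)        # low click for every division tried
        if number % divisor == 0:
            break
    else:
        click(1200.0)       # higher click each time a prime is found

with wave.open("auditive_monitor.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(SAMPLE_RATE)
    w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in frames))
```

Played back, a regular input produces a regular rhythm, and a deviation in the pattern would be audible before it is visible, which is roughly the experience the interviewed engineers describe.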
Thus, the role of sound in the operation of these early computer systems appears to
reflect a more widespread listening culture around industrial noises. In her research
on sound in industrial work places, the cultural historian Karin Bijsterveld (2006)
discusses how the motivations behind factory workers’ frequent resistance to the use of
ear protection from their large-scale introduction in the middle of the twentieth century
until—in some cases—the present day suggest that the aural perception of the patterns
to further optimize the user interface. Blattner and colleagues proposed an approach
to auditory icons that builds on an analysis of visual icons. Distinguishing between
“representational” (e.g., the Mac OS trash can), “abstract” (e.g., Adobe Creative Suite
icons), and semi-abstract icons (e.g., the Windows icon), they proposed to design
auditory icons based on the principle of “iconic families.” Sounds with shared elements
would convey to a user that they are related to the same group of functions. Thus, a
combination of recognizable representational elements with interlinked abstract aspects
could facilitate an easy-to-learn network of auditory communication as part of the
computer user interface.
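To make the principle of iconic families concrete, the following minimal sketch (Python, standard library only) generates a hypothetical family of three earcons that share the same rhythmic and melodic motif and differ only in register, so that a listener might hear them as belonging to the same group of functions. The function names, frequencies, and durations are illustrative assumptions, not part of Blattner and colleagues' 1989 design.

```python
# Hypothetical sketch of an "iconic family": members share a motif (rhythm and
# contour) and differ only in a family-specific transposition.
import math
import struct
import wave

SAMPLE_RATE = 44100

def tone(freq_hz, dur_s, amp=0.4):
    """One sine-wave note as a list of float samples."""
    n = int(SAMPLE_RATE * dur_s)
    return [amp * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def family_motif(base_freq):
    """Shared three-note motif; a family member is identified by its register."""
    samples = []
    for ratio, dur in [(1.0, 0.12), (1.25, 0.12), (1.5, 0.25)]:  # same contour for every member
        samples += tone(base_freq * ratio, dur)
        samples += [0.0] * int(SAMPLE_RATE * 0.03)               # short gap between notes
    return samples

def write_wav(path, samples):
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))

# Invented "file operations" family: the motif stays the same, only the register changes.
for name, base in {"file_open": 440.0, "file_save": 523.25, "file_delete": 659.25}.items():
    write_wav(f"{name}.wav", family_motif(base))
```

Played back to back, the three files share an audible contour, which is the family resemblance the approach relies on.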
While in the 1980s the interest in auditory icons had been focused on efficiently
conveying information about the system’s operations in easily understandable audi-
tory forms, the 1990s saw the emergence of a different interest in system sound. In Joel
Beckerman’s book, The Sonic Boom (2014), Jim Reekes, the designer who created the
current Mac startup sound and many other Mac OS sounds, reports how in the late 1980s
he struggled to convince his superiors to replace ill-considered Mac sounds and to start
approaching sound as a form of “audio branding” (Jackson 2003): what affective response
will a sound evoke in relation to broader associations with elements of culture or nature?
Until the implementation of Reekes’s design for the current startup sound, Apple
computers used to play a tritone interval when switched on. In Western music history,
this interval has often been associated with negative feelings and, from medieval times
until the eighteenth century, it was commonly designated as the Devil’s interval.
Curiously, this aspect of the sound seemed never to have been considered by the sys-
tem designers, who—according to Reekes—thought sound design to be of little impor-
tance. Reekes eventually managed (more or less secretly) to replace the tritone sound
with the current chime, which consists of two major chords that pan slightly between
left and right on a stereo speaker setup. Originally in C Major, it has been transposed
several times, but otherwise it has remained the same since its inception. Reekes’s
objective was to create a “meditative sound” that would act as a “palate cleanser for the
ears” (Reekes in Beckerman 2014, 12). Users in the 1990s heard the startup sound at the
beginning of every computer session and after system crashes, which occurred fre-
quently. Consequently, the startup sound was an important factor in users’ experiences
of brand identity.
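As a hedged illustration of the kind of sound described here, not a reconstruction of Reekes's actual design, the following sketch (Python, standard library only) renders two sustained major chords to a stereo file with a slight left-right pan between them; the chord choice, durations, and pan amounts are assumptions made only for this example.

```python
# Illustrative two-chord stereo "chime" sketch; all musical choices are assumed.
import math
import struct
import wave

SAMPLE_RATE = 44100

def chord(freqs, dur_s, pan):
    """Render one chord as (left, right) sample pairs; pan runs from -1 (left) to 1 (right)."""
    n = int(SAMPLE_RATE * dur_s)
    left_gain, right_gain = 0.5 * (1.0 - pan), 0.5 * (1.0 + pan)
    out = []
    for i in range(n):
        s = sum(math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f in freqs) / len(freqs)
        s *= 0.5 * (1.0 - i / n)  # simple linear fade-out
        out.append((s * left_gain, s * right_gain))
    return out

# Two major chords (here C major then F major, purely illustrative), panned slightly apart.
frames = chord([261.63, 329.63, 392.00], 1.0, pan=-0.2) + chord([349.23, 440.00, 523.25], 1.5, pan=0.2)

with wave.open("chime_sketch.wav", "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(SAMPLE_RATE)
    w.writeframes(b"".join(
        struct.pack("<hh", int(l * 32767), int(r * 32767)) for l, r in frames
    ))
```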
Eventually, the relevance of careful sound design and affective audio branding as part
of the development of OSs was acknowledged on a wider scale by software and hardware
companies. This is apparent from Microsoft’s decision to hire the musician and sound
artist Brian Eno to compose the startup sound for Windows 95. According to Eno, the
commissioning brief he received included about 150 adjectives: “The piece of music
should be inspirational, sexy, driving, provocative, nostalgic, sentimental . . . and not
more than 3.8 seconds long” (Eno in Cox 2015, 271–272). The design of OS startup
sounds, as well as signal sounds throughout the system, had now become a priority
in developers’ corporate branding strategies (for more on audio or sonic branding, see
Gustafsson, volume 1, chapter 18).
These reflections on corporate interests in OS sound design since the 1990s suggest that
there is an affective and potentially embodied dimension to users’ experiences of these
sounds. Reekes speaks about a “palate cleanser for the ears” and the adjectives referred
to by Eno obliquely refer to an incentive to establish a relationship between the user and
the computer (or the Microsoft Corporation) that goes well beyond a cognitive and
instrumental interaction into a more affective realm. Indeed, the media theorist
Deborah Lupton (1995), in “The Embodied Computer/User,” gives an account of com-
puting in the mid-1990s that confirms exactly this connection between OS sounds and
affect. She starts with a short personal anecdote about her own computer:
When I turn on my personal computer . . . it makes a little sound. This little sound
I sometimes playfully interpret as a cheerful “Good morning” greeting . . . the sound
helps to prepare me emotionally and physically for the working day ahead. (97)
Notably, the sound she is referring to here is most probably the rather crude fanfare
sound, which was included in the Windows OS before the introduction of Brian Eno’s
startup sound in late 1995, just after Lupton was writing.
Brian Massumi defines affect as “a prepersonal intensity corresponding to the
passage from one experiential state of the body to another” (Massumi in Deleuze and
Guattari 1987, xvii). The application of sound plays an important role in the shaping
of affective responses in a broad range of cultural activities, ranging from marketing
(Bruner 1990) to activism (Thompson and Biddle 2013) and warfare (Goodman 2009).
Although long unconsidered by system developers, users’ affective responses to OS
sounds have shaped the experience of their interactions and connections with the
machines since the early days. This is also clear from Alberts’s reflections on the role of
the amplified processing sounds in early radio tube and transistor-based computers.
Before these machines were introduced, computing had been a manual operation,
which was accompanied by sounds of people working: historically on paper, using rela-
tively simple calculating objects, later aided by mechanical calculators. The relay-based
computer did calculations automatically, but it generated a reassuring sound that was
similar to what had previously emerged from the manual mechanical calculators on the
work floor. The accounts of the engineers interviewed by Alberts suggest that the loud-
speaker attached to the subsequent “silent” computers did not just act as a monitoring
device to check whether the computer was still operating correctly. The loudspeaker
sounds also provided a sense of comfort; they facilitated a “sensory restoration of the
relationship with physical calculation” (Alberts 2000, 45).
Indeed, more recent research into the design of sound in human–computer interaction
has investigated the potential of sound to facilitate affective user relationships with data
inside the system. Anna deWitt and Roberto Bresin, in their article “Sound Design for
Affective Interaction” from 2007, suggest the use of physical models of real-world
sounds to represent elements of virtual worlds. For example, they propose to sonically
represent the arrival of mobile phone text messages with the sound of marbles falling
into a metal box. More important messages would sound like heavier marbles, and by
shaking the phone the user could determine how many messages have arrived based on
the sound of a related number of marbles moving around. Thus, they argue that the
design of system operating sounds may be a way to “narrow the gap between the embod-
ied experience of the world that we experience in reality and the virtual experience that
we have when we interact with machines” (deWitt and Bresin 2007, 525).
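A minimal sketch of the kind of parameter mapping DeWitt and Bresin describe might look as follows; the specific constants and the function name are assumptions made for illustration and are not taken from their article.

```python
# Illustrative mapping in the spirit of the "marbles in a box" proposal: each unread
# message becomes one impact, and more important messages map to heavier marbles
# (lower pitch, longer decay, louder strike). All constants are assumed for this sketch.
from dataclasses import dataclass

@dataclass
class MarbleImpact:
    pitch_hz: float   # heavier marbles ring lower
    decay_s: float    # and take longer to settle
    gain: float       # and strike the box harder

def marble_for(importance: float) -> MarbleImpact:
    """Map a message's importance in [0, 1] to physical-model parameters for one impact."""
    importance = max(0.0, min(1.0, importance))
    return MarbleImpact(
        pitch_hz=2000.0 - 1400.0 * importance,  # 2 kHz (light marble) down to 600 Hz (heavy)
        decay_s=0.05 + 0.25 * importance,
        gain=0.3 + 0.6 * importance,
    )

# Shaking the phone would trigger one impact per unread message:
inbox_importance = [0.2, 0.9, 0.5]  # three unread messages
impacts = [marble_for(i) for i in inbox_importance]
print(f"{len(impacts)} marbles rattle in the box:", impacts)
```

The design choice is simply that perceived physical weight stands in for importance, so that the number and heaviness of the "marbles" can be heard rather than read.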
Everyday Cyborgs
In the following, I will further examine the role of OS sounds in the embodied experience
of human–computer interaction. However, my interest is not in determining effective
methods for information transmission and the potential to forge a seamless transition
between embodied experiences of the physical world and the data that exist inside
computer systems, as is the case in the research of DeWitt and Bresin and the work of
Gaver and Blattner and colleagues in the 1980s. Instead, I will focus on how the OS
sounds discussed thus far might relate to broader cultural representations and under-
standings of human bodies and technology, particularly in the light of popular cultural
imaginations of the cyborg.
Before I continue, we should take a closer look at embodied experiences of human–
computer interaction in a broader sense. In “The Embodied Computer/User,” referred
to earlier, Lupton discusses embodied computer user experiences. It is not surprising
that this text was written in the mid-1990s. This was the time when digital technology
and especially personal computers had become omnipresent in professional and private
life in the Global North. Lupton describes how, by the early 1990s, many people in
Western societies had come to feel dependent on digital technologies in their everyday
lives. A power cut at a research unit she visited left staff wondering what they should do
while their computers could not be accessed. As a consequence of this far-reaching inte-
gration of computers (and other digital technologies) in everyday life—which has only
become stronger today—people also tend to have an emotional relationship with their
computers; they commonly experience fear, anger, frustration, and relief as part of their
interactions with them. In her analysis of this phenomenon, Lupton builds on the femi-
nist scholar Elizabeth Grosz’s argument that inanimate objects that have been in close
contact with the body for extended periods of time become experienced as extensions of
the body image. According to Grosz, “[i]t is only insofar as the object ceases to remain
an object and becomes a medium, a vehicle for impressions and expression, that it can
be used as an instrument or tool.” Thus, in interaction with the body, an inanimate object
can become an “intermediate” or “midway between inanimate and the bodily” (Grosz in
Lupton 1995, 98–99). Drawing on this, Lupton suggests that, by the mid-1990s, instead
of the “human/computer dyad being a simple matter of self versus other, a blurring of
the boundaries between embodied self and the PC” (Lupton 1995, 98) has taken place
for many people.
If we consider the interactions between users and personal computers (or mobile
devices) from a cybernetic perspective, we arrive at a similar interpretation. In his expla-
nation of cybernetic networks, Gregory Bateson (1972) gives the example of the stick of a
blind man. He argues that this object should—from a cybernetic perspective—be
considered as part of the man’s body, because it constitutes a pathway for information
exchange between the man and the world around him. If we think of Lupton’s account of
the despair caused by the power cut in her university in the 1990s, or the discomfort
(or even anxiety) many people experience nowadays when they are unable to connect to
social networks due to a depleted smartphone battery, it is clear that Bateson’s argument
is also applicable in this context: while users may consciously perceive their computers
as external objects with which they interact, in terms of their communicative
interactions with the world around them these devices fulfill the role of cybernetic extensions
of their bodies.
Accordingly, we can consider everyday human–computer interactions in the context
of the concept of the cyborg: a cybernetic organism. The term “cyborg” was coined by the scientists Manfred E. Clynes and Nathan S. Kline in their article “Cyborgs
and Space” (1960), and further explored in Daniel S. Halacy’s Cyborg: Evolution of the
Superman (1965). Inspired by recent developments in space travel, Clynes and Kline
suggest that it is time for “man to take an active part in his own biological evolution”
(1960, 26), through the attachment of technological extensions to human bodies, in
order to prepare for living in extraterrestrial environments. Likewise, Halacy promotes
the technological extension of bodies in order to enhance their strength and capabilities.
In these visions, technological development is considered a neutral force that can be
instrumentalized as desired.
Since the mid-1980s, critiques of this technodeterminist approach to the concept of
the cyborg have emerged. Donald MacKenzie and Judy Wajcman’s anthology The Social
Shaping of Technology (1985) examines how technological developments are shaped
by—and complicit in the persistence of—existing sociopolitical paradigms. In this con-
text, Donna Haraway’s “Cyborg Manifesto” (1991) acknowledges that the image of the
cyborg has its origin in the military-industrial complex, but that it can also be employed
to challenge hegemonic divisions of gender; if the body’s parts and characteristics are
thought of as (theoretically) exchangeable for technological substitutes, this means that
traditional thinking in gender oppositions tied to a biological body becomes impossible.
Thus, for Haraway, the cyborg is an image of “a creature in a post-gender world” (1991, 150),
which allows us to move away from the binary thinking that underlies the distribution
of power in what Haraway calls “White Capitalist Patriarchy” (161).
However, despite these critiques and emancipatory visions for the cyborg, the
positivist ideology of the military-industrial complex of enhancement and strength has
remained a mainstay in imagined and realized cyborgs in popular culture and art until
the present day. Fictional characters in films and TV programs from the 1960s until the
present, like the Six Million Dollar Man, RoboCop (Verhoeven 1987), and the android in Ex Machina (Garland 2015), are consistent with the idea of enhancement through implantation and attachment of state-of-the-art technologies. Similarly, artwork and writing by artists including Stelarc (1991), Neil Harbisson, and Moon Ribas have focused on promoting the idea that the human body can be made more capable through the integration of hi-tech components.
As mentioned already, Lupton suggests that once interactions with computers and
other digital technologies have become thoroughly integrated in everyday life, a
“blurring of the boundaries” between the devices and the embodied self occurs.
However, this development is not a smooth process. It takes place through a negotiation
of antagonistic emotions toward computers. On the one hand, users are indeed “attracted towards the . . . opportunity to achieve a cyborgian seamlessness.” At the same time, however, they often “feel threatened by [the technology’s] potential to engulf the self” (1995, 111), a threat bound up with a loss of agency due to the lack of (perceived) individual control over data in the system.
Here, it is important to acknowledge that the computer users Lupton discusses were
generally nonspecialists—although they often used computers intensively in everyday
life—for whom the devices very much remained a “black box”; they would usually have had little understanding of the inner workings of the computer (Latour 1999). This mysteriousness of the computer system is arguably also one of the sources of the fears and discomfort concerning the perceived threat of losing control and agency when becoming dependent on cybernetic systems.
When we listen to Reekes’s account of the creation of the startup sound in this context,1 or hear other sounds he designed (e.g., “quack” and “Sosumi”), it is significant that the sounds he created and selected combine recordings of acoustic sounds with synthetically generated ones, and that many of them seem to sit somewhere in between the acoustic and the synthetic (the startup chime does not sound entirely synthetic, yet it is also hard to tell what acoustic sound sources might be involved). We hear a similar pattern in the Windows system sounds of the mid-1990s and early 2000s: while the sounds “Recycle” and “Ring” are clearly recognizable as recordings of the crumpling of a piece of paper and a ringing desk phone, “Notify” and the sounds that mark infrared connections may be more readily associated with the soundscape of a sci-fi film.2
Considering this combination of the acoustic and synthetic in the choice of OS sounds
in both Mac OS and Windows in relation to Lupton’s examination of the ambiguous
relationship of computer users of the 1990s with their devices, it appears that the sonic
environments of the OSs functioned as a means to partly negotiate this tension; on the
one hand, they promote a smooth, sci-fi-like aesthetic to evoke a sense of unproblematic
and clean computing power (this is perhaps most prominent in the different versions of
the Windows startup sound since the mid-1990s) while, on the other hand, the inclusion
of sounds that evoke elements of the organic world outside the device provides a sense of
comfort, mitigating fears of loss of agency that are due to dependency on a “black box.”
Quite differently, discomfort arising from dependency on a black box is unlikely to
have been a big issue for the engineers working with early computers. In the early days
of computing, operators were usually mathematicians and engineers with an in-depth
knowledge of the system, while the systems themselves were still of limited complexity, which made it possible for an individual to have a fairly comprehensive understanding of their processes. In other words, whereas for the (predominantly nonspecialist) computer users of the 1990s a sonification of a computer system’s internal operations would be likely to add to the opacity of those operations and thus heighten a sense of alienation and potential threat, the sonified system sounds early engineers listened to reassured them that the machine was operating as intended and made it possible for them to relate to the system in terms of human actions (the previously manual operations of mechanical calculator operators).
Since the 1990s, OS sounds have continued to develop. Listening to Microsoft Windows, for example, several changes stand out. As I mentioned in the introduction of this chapter, since version 10, released in 2015, Windows no longer features a notable startup sound. Another development that becomes apparent on closer listening is the gradual disappearance of the organic-sounding elements that had been a prominent feature of Mac OS and Windows alike since the 1990s. In Windows 10, the only apparently organic sound that remains is “Recycle” (the sound of crumpling paper mentioned earlier). All other sounds have gradually become smoother and more evocative of digital synthesis. The ring tones no longer resemble those of traditional desk phones. The sense of the synthetic is further heightened by the conspicuous increase in digitally generated reverb added to the various sounds over the years.
Microsoft’s response to queries about its motivation for removing the startup sound
gives us a hint as to how developments in the sonic interface may be related to broader
issues around the (desired) experience of human–computer interaction:
Thus, OS sounds are conceived to facilitate a user experience in which the device is
no longer perceived as present. The device should become an unnoticed attribute that’s
“all about you.” In other words, the soundscape should facilitate the “cyborgian seamlessness” Lupton wrote about in the 1990s. Apparently, there is no longer a felt need to put users at ease by evoking a sense of the organic around the technological black box they are connecting with. Instead, the technological device as a whole should
be backgrounded.
This “quieting of the system” is reminiscent of what Mark Weiser (1996) coined “calm technology” as part of his theory of “ubiquitous computing.” In the late 1980s and
early 1990s, Weiser observed that personal computers, despite their widespread use by
nonspecialists, were still often experienced as specialist devices, the operation of which
involved focused and concentrated activity. In contrast, the much older information
technology of reading and writing is present in all areas of everyday life and is per-
formed with a much lower degree of conscious attention; writing is a “ubiquitous tech-
nology.” Weiser argued that, once computers become truly omnipresent in all kinds of
forms, and each person operates a number of different devices, we will arrive in the era
of—the positivist concept of the cyborg as a strengthened and enhanced human body in
which technological prostheses are politically neutral and form increasingly seamless
connections with the organic human body. Thus, rather than merely reflecting a
technocultural status quo, OS sounds also facilitate the user’s imagination of a particular
kind of connection between bodies and technologies.
Although the vision of technologically enhanced bodies may appear attractive, it also
has some problematic implications. First, the popular vision of the cyborg suggests a
universal notion of progress, which omits engagement with the inequalities of gender,
race, and social class that continue to play a role in the politics of bodies (Haraway 1991).
As long as such technologies are not equally available to everybody, the introduction of seamless and inconspicuous technological extensions to the body, which are therefore likely to be taken for granted, easily becomes a process of hiding inequality. Second, endeavors to make
interaction with technologies imperceptible promote a disregard for the materiality
of technological components in terms of the expenditure of resources, production labor, and
ecological impact of waste (Ploeger 2016). The ever-increasing speed of replacement of
everyday electronic commodities generates a growing stream of electronic waste. In most
cases, this waste is eventually exported to developing countries where it is often recy-
cled through environmentally harmful methods or dumped in unprotected areas, caus-
ing severe environmental damage accompanied by a range of sociocultural problems
(Chan and Wong 2013).
Thus, instead of making human–computer interaction as inconspicuous as possible—
and thus promoting the imagination of a seamless cyborgian prosthesis—a conscious
experience of the user interface might be desirable in order to facilitate an engagement
with the device’s embeddedness in existing sociopolitical power structures, both con-
cerning persistent inequalities in access to technology, and the ecological and social
consequences of technology’s materiality. In other words, instead of stimulating the
imagination of smooth and powerful technologically enhanced bodies that are in line
with the interests of “militarism and patriarchal capitalism” (Haraway 1991, 151), a user
environment that is less “seamless” could facilitate a critical awareness of technological devices’ development and cultural embeddedness. Considering OS sounds from this angle, are there any opportunities to reconnect to the materiality of the device amid what sounds like an ever-further smoothing and quieting of the system
soundscapes? Where are the—metaphorical and literal—cracks in the developers’
attempts to create a comfortable and seamless sonic interaction?
A 2007 blog post written by a member of the Windows developer team discussing
sound “glitching issues” in the new Windows Vista OS offers a possible answer. Defining
a sound glitch as “a perceivable error, gap, or pop in the sound caused by discontinuities
in the audio signal during playback or recording which result from processing or timing
problems,”3 the author draws attention to the fact that audio glitches are more perceptible
than irregularities in video “because the ear’s tuned to notice high frequency transients.”
Along similar lines, the sound ecologist Michael Stocker (2013) suggests that the human body
is “hardwired” to be alerted by subtle changes in sound inputs. Sonic irregularities trigger
a sense of alert and thus break through a sense of smooth and unconscious interaction;
the illusion of the seamless cyborgian connection is temporarily interrupted.
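To make the blog post’s definition more concrete, the following minimal sketch (in Python, with illustrative values of my own rather than anything drawn from the post or the chapter) shows how even a brief dropout in an otherwise smooth signal produces abrupt amplitude discontinuities, the kind of broadband, high-frequency transient the ear readily registers as a click or pop.

```python
# A minimal, hypothetical illustration: zeroing out a few milliseconds of a
# smooth sine wave creates hard jumps in amplitude at the gap edges, which is
# what makes an audio "glitch" so perceptible.
import numpy as np

sr = 44100                                   # sample rate in Hz
t = np.arange(sr) / sr                       # one second of sample times
tone = 0.5 * np.sin(2 * np.pi * 440 * t)     # a smooth 440 Hz sine tone

glitched = tone.copy()
start = int(0.50 * sr)                       # dropout begins at 0.5 s
glitched[start:start + int(0.005 * sr)] = 0  # 5 ms gap: two discontinuities

# The size of the jump at the gap onset hints at the strength of the transient.
print("discontinuity at gap onset:", abs(glitched[start - 1] - glitched[start]))
```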
Understandably, and as the quoted blog post suggests, OS developers are invested in
the elimination of any sonic irregularities. However, there are others who embrace these
“bugs” in OSs, and even seek to actively provoke glitches. Glitch artists tweak “tech-
nology and [cause] either hardware or software to sputter, fail, misfire or otherwise wig out”
(McCormack 2010). Although glitch art is often primarily considered as an aestheticiza-
tion of system bugs, there is a political dimension to this work precisely in its endeavor
to undermine software and hardware manufacturers’ desires to make computer oper-
ation as inconspicuous as possible by means of smoothly operating user interfaces.
Although most work in glitch art that engages with the user interfaces of OSs has thus
far been focused on visual artifacts, some artists have also worked with the distortion
and interruption of system sounds. Among the artists who work in this way are JODI
(Joan Heemskerk and Dirk Paesmans), members of the British organization TOPLAP,
and Chicago-based Jon Satrom.
Satrom’s Plugin Beachball Success (2012), performed at the opening ceremony of the
transmediale festival in Berlin, begins with what looks like a failed attempt to start
the program running the performance. Satrom unsuccessfully tries to log on to his
Mac several times. Each time the Mac error signal sounds through the speakers. Satrom
apologizes and says that he only just got this computer. Once he manages to get in,
another disruption occurs almost immediately: an error message states “PLUGIN NOT
FOUND. Your computer needs additional software to run this asset. Click Here to
DOWNLOAD.”4 It quickly becomes clear that the performance has actually already
started. Over the next thirteen minutes, Satrom turns the commonly experienced inter-
ruption caused by a missing plugin—an additional bit of software that enables a pro-
gram to read a certain data format—into an escalating sequence of repetitions and
transformations.
Operating system sounds play an important role in this process. The familiar error
sound that is explicitly introduced at the beginning of the performance is gradually
mixed into a cacophony of various system sounds and decomposed into gritty noise
structures. Listening to this apparent system collapse, I experienced an almost visceral sense of discomfort at the sound glitches. Satrom makes us aware that the smooth connection
we may sense with our computers is merely an imaginary bond, forged to an important
extent by a polished system soundscape. Once the smooth, familiar system sounds are
violated and subverted, our attention is drawn to the fact that the technological exten-
sions of our body are designed in accordance with a certain logic; they are not merely
neutral, seamless prostheses that enhance the capabilities of our bodies. They also form part of a designed world: a world that is still overshadowed by the imaginary,
all-powerful cyborg of the military-industrial complex.
Notes
1. One More Thing (2010), “Interview Jim Reekes: Creator Mac Startup Sound,” https://www.
youtube.com/watch?v=QkTwNerh1G8. Accessed June 27, 2017.
2. Dark Parodies (2015), “All Windows Sounds | Windows 1.0–Windows 10.” https://www.
youtube.com/watch?v=ufKjjgvQZho. Accessed June 27, 2017.
References
Alberts, G. 2000. Rekengeluiden: De lichamelijkheid van het rekenen. Informatie und
Informatiebeleid 18 (1): 42–47.
Bateson, G. 1972. Steps to an Ecology of Mind. San Francisco: Chandler.
Beckerman, J. 2014. The Sonic Boom. Boston, MA: Houghton Mifflin Harcourt.
Bijsterveld, K. 2006. Listening to Machines: Industrial Noise, Hearing Loss and the
Cultural Meaning of Sound. Interdisciplinary Science Reviews 31 (4): 323–337. doi:10.1179/
030801806x103370.
Blattner, M., D. Sumikawa, and R. Greenberg. 1989. Earcons and Icons: Their Structure and
Common Design Principles. Human-Computer Interaction 4 (1): 11–44. doi:10.1207/
s15327051hci0401_1.
Bruner, G. C. II. 1990. Music, Mood, and Marketing. Journal of Marketing 54 (4): 94.
doi:10.2307/1251762.
Chan, J. K. Y., and M. H. Wong. 2013. A Review of Environmental Fate, Body Burdens, and
Human Health Risk Assessment of PCDD/Fs at Two Typical Electronic Waste Recycling
Sites in China. Science of the Total Environment 463–464: 1111–1123. doi:10.1016/
j.scitotenv.2012.07.098.
Clynes, M., and N. S. Kline. 1960. Cyborgs and Space. Astronautics 14 (9), September 1960:
26–27, 74–76.
Cox, T. J. 2015. The Sound Book: The Science of the Sonic Wonders of the World. New York:
W. W. Norton.
DeWitt, A., and R. Bresin. 2007. Sound Design for Affective Interaction. Lecture Notes in
Computer Science 4738: 523–533.
Deleuze, G., and F. Guattari. 1987. A Thousand Plateaus. Minneapolis: University of
Minnesota Press.
Garland, A. 2015. Ex Machina. Film4, DNA Films.
Gaver, W. 1986. Auditory Icons: Using Sound in Computer Interfaces. Human–Computer
Interaction 2 (2): 167–177. doi:10.1207/s15327051hci0202_3.
Goodman, S. 2009. Sonic Warfare: Sound, Affect, and the Ecology of Fear. Cambridge,
MA: MIT Press.
Halacy, D. S. 1965. Cyborg: Evolution of the Superman. New York: Harper & Row.
Haraway, D. J. 1991. A Cyborg Manifesto: Science, Technology, and Socialist-Feminism in the
Late Twentieth Century. Simians, Cyborgs and Women: The Reinvention of Nature. London:
Routledge.
Jackson, D. M. 2003. Sonic Branding: An Essential Guide to the Art and Science of Sonic
Branding. Basingstoke, UK: Palgrave Macmillan.
Kubrick, S. 1968. 2001: A Space Odyssey. Metro-Goldwyn-Mayer.
Glitched and Warped
Transformations of Rhythm in the Age
of the Digital Audio Workstation
Anne Danielsen
Introduction
Digital music technology has brought about unforeseen possibilities for manipulating
sound, and, as a consequence, entirely new forms of musical expression have emerged.
This chapter will focus on the particular rhythmic feels that can now be produced
through manual or automated techniques for cutting up sound, warping samples, and
manipulating the timing of rhythm tracks in digital audio workstations (DAWs). By
rhythmic feel, I refer to the systematic microrhythmic design applied to a rhythmic
pattern in performance or production, such as, for example, when playing a pattern
with a swing or straight feel. These new rhythmic feels have made an unmistakable mark
on popular music styles, such as glitch music, drum and bass, hip hop, neo-soul, and
contemporary R&B from the turn of the millennium onward, and not only represent a
challenge to previous forms but also create new opportunities for stretching the human
imagination through presenting previously unheard sounds and sonic gestures to
creators and listeners alike. A crucial aspect of this development is the manner in which
the new technologies allow for combining agency and automation, understood as
creative strategies, in new compelling ways.
In what follows, I will begin by reviewing two trends in the literature addressing these
new rhythmic feels: one that positions them as a continuation of earlier machine-generated
grooves; and another that positions them as an expansion of the grooviness of earlier
groove-based music, such as funk, soul, and R&B, in unforeseen directions. Ultimately,
I will reflect on the challenges faced by musicians and producers when it comes to antic-
ipating the outcomes of processes involving the experimental use of new technology
and, in turn, will acknowledge the potentially productive impact of the technologically
unexpected on our sonic imaginations.
According to Tim Armstrong (1998), two different views of the relationship between
technology and the body exist within modernism. At one extreme, there is techno-
logical utopia, represented by Freud’s notion of technology as a positive prosthesis in
which human capacities are extrapolated. In this view, “[t]echnology offers a re-formed
body, more powerful and capable, producing in a range of modernist writers a fascination
with organ-extension, organ-replacement, sensory-extension” (Armstrong 1998, 78).
At the other extreme, we find writers adhering to the Marxist view of technology as an
alienating means of industrial production. Here the technological advances underlying
commodity capitalism result in a subordination of the human to the machine, promoting
a nonhuman form of mechanical repetition and standardization. In the field of music,
technology has generally taken on a role that is in accordance with the former view,
namely as a positive extension of the human body. This pertains, for example, to traditional
instruments such as pianos and clarinets (see, e.g., the discussion in Kvifte 1989) and to
the increasing use of experimental recording and processing technologies. Some of the
musical ideas that developed in rock in the late 1960s, for example, were not realizable
without such musical “prostheses.” Similarly, within the field of electroacoustic music,
various electronic and computerized technologies have been regarded as progressive
and liberating tools for music creation. However, we also find tendencies of Marxist
determinism that apply to music. This is prominent both in the discourse on various technologies’ roles in promoting the mass distribution of music and in the Frankfurt School’s critical discourse on popular music as a cultural response to the standardization and
commodification typical of capitalist industrial production (Adorno 1990; Horkheimer
and Adorno 2002).
In this chapter, I will focus on rhythmic popular music and use as my starting point
the emergence of a discursive and performative tension that resonates with the Marxist
view on technology just presented, in the sense that it situates human expression and
machine-made musical creation as two opposing extremes. This tension developed in the wake of the crossover success of black dance music in the popular music mainstream in the late 1970s, as a response to the depreciation of disco and other repetitive rhythmic music as commercial and commodified “machine” music.1 The
immense popularity of disco was probably crucial here; the style represented new tools
(click track and the analog sequencer) and a new aesthetic (four-to-the-floor), and
threatened the ideological and commercial position of white Anglo-American rock
that, up to this point, had dominated the mainstream for several decades.2 As a conse-
quence, an increasing polarization between what might be called “organic” and
“machinic” rhythms emerged.3 On the one hand, artists played styles, such as rock,
country, funk, and jazz, that were characterized by rhythmic feels that derived from
both deliberate and unintended variations that musicians add to their performances; on
the other hand, there were artists who produced sequencer-based dance music with a
futuristic machine aesthetic, as expressed in Kraftwerk’s albums Man-Machine (1978)
and Computer World (1981). These latter grooves, enabled by analog sequencers, were
often perceived to be nonhuman and mechanistic, largely because of the absence of
micro-level flexibility in the temporal placement of rhythmic events that were all forced
into the grid provided by the sequencer. The absence of variation in sound in analog
(and early digital) sequencer-based groove was probably also crucial to this dichotomy;
the small shifts in intensity and timbre that are always present in performed music were
absent in these early sequencer-based rhythms.4 This division in rhythmic design within
1970s popular music is probably crucial to any subsequent understanding of why
rhythmic patterns consisting of grid-ordered events are experienced as lacking a human
touch even when they are produced by a human. Rhythmic subdivisions that are too
evenly played still tend to make us think of a machine. Loose timing, on the other hand,
tends to be described as organic and evokes associations with human performance, even
when those patterns and variations have been generated by a computer.5
The mechanistic aspect of perfectly even timing in sequencers from the predigital and early digital era was often countered through the introduction of a humanizing function, which altered the beats of a musical sequence according to a random series of deviations that would make them less nonhumanly perfect. However, even though this may be thought to match motor and timekeeper noise in human timing, such random deviations are not typical of groove-based music, that is, music organized around a repetitive rhythmic pattern. As many studies have shown, deviations in groove-based music are to
a large extent systematic (Bengtsson et al. 1969; Butterfield 2010; Danielsen 2006, 2010b;
Iyer 2002), meaning that the same pattern of microtiming (that is, the early and late
marking of beats) is repeated in each repetition of the basic pattern (usually one or two
bars in length).6 Research has also shown that in performed music fluctuations that
exceed this basic pattern are not random either but are instead both long-range and
correlated (Hennig et al. 2011).
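The contrast described here can be sketched in a few lines of code. The following illustration (mine, not drawn from the studies cited, and using arbitrary offset values) distinguishes a “humanize” function, which jitters every onset independently and at random, from systematic microtiming, in which the same pattern of early and late beats recurs in each bar.

```python
# A hypothetical sketch contrasting random "humanization" with systematic
# microtiming applied to a quantized grid of beat onsets.
import numpy as np

rng = np.random.default_rng(0)
beats_per_bar, bars, beat_len = 4, 4, 0.5          # 120 bpm, quarter-note grid
grid = np.arange(beats_per_bar * bars) * beat_len  # perfectly even onsets (s)

# "Humanize": independent random jitter on every onset (here roughly +/- 10 ms).
humanized = grid + rng.normal(0.0, 0.010, size=grid.shape)

# Systematic microtiming: a fixed offset per beat position, repeated each bar
# (here beat 2 slightly early, beat 4 slightly late -- illustrative values only).
offsets = np.array([0.000, -0.015, 0.000, 0.020])
systematic = grid + np.tile(offsets, bars)

print("humanized deviations :", np.round(humanized - grid, 3))
print("systematic deviations:", np.round(systematic - grid, 3))
```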
Prior to the increased temporal flexibility of later digital sequencers and digital
audio-sequencing (which was introduced in the early 1990s, see Brøvig-Hanssen and
Danielsen 2016, chap. 6; Burgess 2014, chap. 11), then, there was both an ideological and a
de facto difference between played and machine-generated rhythm that was associated
with the constraints of the conditions of production within these two spheres. Machine
rhythm lacked the intended (and unavoidable nonintended) temporal and sonic variations
that were typical of human performance. Likewise, humans were simply unable to
produce the extreme evenness of the machine.7
As we shall see in the following section, this traditional link between machine-based
music and stiffness has been disrupted by new opportunities for creating microrhythmic
designs in the DAW—first, because the DAW seems to be able to produce the entire
spectrum of rhythmic feels previously associated with human performance, and second,
because human- and computer-based rhythms are often, in fact, deeply embedded in
one another, not least through the ways in which human performances are routinely
used as raw material for producing rhythms in the DAW. Today, therefore, it is very
difficult to distinguish between human- and computer-generated performances.
Nonetheless, even though the division between human- and machine-based rhythms
has been transcended when it comes to what the machine can actually produce, the two
related aesthetic paradigms—even rhythm on the grid, on the one hand, and deep,
groovy rhythmic designs, on the other—have to some extent been continued. At the
mechanistic extreme of the rhythmic continuum, we find forms of electronic dance
music (EDM), in which machine-like timing is a distinguishing stylistic feature and
even a preference long after alternatives to it had become available in the early 1990s
(Zeiner-Henriksen 2010). At the “organic” extreme of the continuum, we find the deep,
groovy rhythm of African American–derived, computer-based rhythmic genres. What
is used to realize these two fundamental rhythmic inclinations, however, is no longer so
different because, in the age of the DAW, they typically come from the same production
tools. A crucial factor in defining a possibly new late-digital condition regarding the
field of musical rhythm, then, is the manner in which the distinction between organic
and machinic rhythm has been transcended. Agency and automation, understood as
creative strategies, inform both mechanistic rhythmic expressions and deep, groovy
feels. I will now conduct a closer inspection of these two aesthetic trends in contempo-
rary musical rhythm.
Microrhythmic Manifestations
of the Digital Audio Workstation:
Two Trends
The first trend comprises electronica-related styles whose rhythmic events align with a
metrical grid. Common to the musicianship of the artists representing this trend is a
preference for exaggerated tempi and an attraction to the completely straightened-out,
square feel of quantization. As pointed out earlier, this was both an aesthetic preference
and a technological constraint in the analog, sequencer-based tradition that this trend
grew out of. In the early days of this trend, high-pitched sounds such as the hi-hat
cymbal (or something else that fills the same musical function) were programmed
unnaturally—either too quickly or too evenly or both—specifically to connote a
machine-like aesthetic (Zagorski-Thomas 2010; Inglis 1999). The sound of these songs,
then, evokes an overdone, even unlikely virtuosity that I have elsewhere labeled the
“exaggerated virtuosity of the machine” (Danielsen 2010a). Prominent pioneering
artists of this rhythmic trend include Aphex Twin (the performing pseudonym of
Richard D. James), Autechre (Sean Booth and Rob Brown), and Squarepusher (Tom
Jenkinson), all of whom entered the electronica scene in the 1990s and are associated
with the label Warp. After a few years, this aesthetic strategy had traveled from these
avant-garde electronica toolboxes to, for example, the title track of the Destiny’s Child
album Survivor (Columbia 2001), thus entering the popular music mainstream. The fast
speed and quantized evenness of many of the tracks on such albums anticipate the
related process of musical granulation—that is, of crystallizing “sonic wholes” into
grains, so that musical or nonmusical sounds are chopped up into small fragments and
reordered to produce a stuttering rhythmic effect. This aesthetic also promotes a tendency
to transform sounds with an otherwise clear semantic meaning or reference point—
such as a musical source or a different musical context—into “pure” sound (see, for
example, Harkins 2010). Sounds or clips are also often combined in choppy ways that
underline sonic cut-outs, rather than disguising them, resulting in a skittering collage.
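The chop-and-reorder principle described above can be illustrated schematically. The following sketch (a hypothetical illustration, not a reconstruction of any artist’s actual tool) cuts a signal into short grains, shuffles them, and repeats a handful of them, so that the abrupt joins between grains produce the stuttering, collage-like effect discussed here.

```python
# A minimal, hypothetical sketch of granulation as chop-and-reorder: the signal
# is sliced into 50 ms grains, the grains are shuffled, and a few are repeated
# to create a stutter; the hard edges between grains are left audible.
import numpy as np

sr = 44100
t = np.arange(sr) / sr
source = 0.5 * np.sin(2 * np.pi * 220 * t * (1 + 0.5 * t))  # a simple rising tone

grain_len = int(0.050 * sr)                       # 50 ms grains
n_grains = len(source) // grain_len
grains = source[: n_grains * grain_len].reshape(n_grains, grain_len)

rng = np.random.default_rng(1)
order = rng.permutation(n_grains)                 # reorder the grains
stutter = np.repeat(order[:8], 3)                 # repeat some grains: stutter
glitched = np.concatenate([grains[i] for i in np.concatenate([order, stutter])])
print("output length (s):", round(len(glitched) / sr, 2))
```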
The label glitch music8—a substyle of electronic dance music associated with the
artists mentioned in the previous paragraph—hints at the ways in which we perceive
these soundscapes, namely as a coherent sonic totality that has been “destroyed,” meaning
chopped up and reorganized anew.9 An important point here, which Brøvig-Hanssen
discusses at length, is that this approach to sound relies on the listener being able to
imagine a “music within the music”—that is, a fragmented sound presupposes an imagined
and spatiotemporally coherent sound (Brøvig-Hanssen 2013). This operation, however,
becomes particularly precarious when the manipulated element is a voice. Brøvig-
Hanssen’s detailed analysis of the manipulations of the vocal track in two versions of
Squarepusher’s “My Red Hot Car,”10 where one is a “glitched” version of the other, clearly
demonstrates the ways in which meaning is transformed when sound is manipulated
away from what one normally regards as the field of possible human utterances. In the
glitched version, the vocal track has been “deformed”—sounds are cut off too early,
there are repeated iterations of sound fragments separated by signal dropouts, and frag-
ments are dislocated from their original locations (Brøvig-Hanssen and Danielsen 2016,
chap. 5)—in a manner that clearly departs from the human. Still, it is also hard to hear
the vocal track as purely musical (that is, not sung) sound. One tends to persist in imagining
a human being (and a coherent message) behind the stuttering rhythm, since the voice
always tends to be, first and foremost, an indexical sign of the human body and a clear
path from source through musical performance to recording. Consequently, “[w]e
can discern two layers of music, the traditional and the manipulated, neither of
which, in this precise context, makes sense without the other” (Brøvig-Hanssen and
Danielsen 2016, 95).
In addition to the association of cut-up strategies with the destruction or transformation of a coherent musical whole, glitched, granulated, or manually or automatically chopped-up sound also produces a very characteristic microrhythmic effect. As Oliver (2015) emphasizes, in jungle and drum and bass it is not first and foremost the transformation of temporal features or durations that produces the peculiar microrhythmic effects but the cutting up of sounds and the abrupt transitions between sounds that such
cuts produce. The effect of chopping up the crash cymbal of the much-sampled Amen
break, for example, relies heavily on the fact that it is an initially acoustic, and thus very
rich, sound.11 When human musicking is transformed through computer-based pro-
cedures, one is thus confronted by both a break with and a continuation of the existing
mechanistic aesthetics of some kinds of rhythm. The sound is different (richer, less
pure), but the groove is produced, as with most EDM-related styles, not by manipulating
Playing and making music have always been embedded in technology. The opposition
between organic and machinic musical expressions in late 1970s and early 1980s popular
music thus emerges as partly ideological: all music-making means being deeply
involved in its technology, or, in the words of Nick Prior:
It is not just that technology impacts upon music, influences music, shapes music,
because this form of weak technological determinism still implies two separate
domains. Music is always already suffused with technology, it is embedded within
technological forms and forces; it is in and of technology. (2009, 95)
Relating this point to a more general epistemological discourse, we could say that new
technology creates new understanding, and that we have always learned to know the
world through the tools and technologies that we use to interact with our surroundings.
As Heidegger makes us aware in his essay “The Question Concerning Technology”
(1977), there is no alternative route to the knowledge we acquire through technology.
Moreover, the insights that we derive from technology cannot be separated from the
technology itself; through technology we achieve knowledge about the world in a way
and to an extent that would otherwise be unavailable to us. In the words of Heidegger:
“[Techne] reveals whatever does not bring itself forth and does not yet lie here before us,
whatever can look and turn out now one way and now another” (1977, 8). The idea that
man and technology are opposed to each other is thus, according to Heidegger, beside
the point—instead, the machine should, in line with the “technology as prosthesis” view presented earlier, be seen as an extension of the human.
Digital technology has reactualized this debate in music-making, and from this
perspective one might ask whether the rhythmic feels discussed previously really
represent the results of a radically new “posthuman condition,” or whether they ought
to be understood as part of the continuous development of technology’s ever-present
role as an aid to, and extension of, human expression and behavior. According to the
latter position, so-called posthuman expressions are not after or outside of the human
repertoire at all. Instead, they should be considered simply the most recent expansion of
that repertoire. This would mean, in turn, that the microrhythmic manipulation made
possible by the DAW represents, in principle, nothing new, because there is nothing new
in the fact that new technology produces new forms of knowledge, expression, and
behavior or that it expands the scope of the human imagination.
As pointed out at the start of this chapter, however, after the introduction of
sequencer-based grooves in the popular music mainstream in the late 1970s, performed
and machine-generated music tended to align with two distinct aesthetic fields. For
some years, these two fields made use of different sets of tools that produced very different
sonic results. Consequently, performed and machine-generated music came to represent
different worlds of musical expression and imagination in the following decades.
Microrhythmic manipulation in the DAW has brought about a new aesthetic situation
marked by convergence between these two musical-rhythmic poetics. Performed and
machine-generated music are, in the late-digital era, deeply embedded in one another—
first, because both digital and traditional music technologies are used to achieve the
desired musical results in both domains, and, second, because the respective contributions
of these different technologies are in many cases (such as the examples discussed in
this chapter) almost impossible to distinguish from one another in the end result.
Accordingly, it would be wrong to speak of a hybridization of the two, because this
presupposes two separate and still recognizable entities that have been combined.
Rather, performed and machine-generated rhythms have, in many contemporary
genres, morphed, making it impossible to separate their respective influences. We are
most likely yet to see the full consequences of this development, which also includes a
wide range of new interfaces for organic control of computers and music machines.15
The flexibility of the DAW, our contemporary music machine, has contributed tre-
mendously to this ongoing transformation, from an either/or to a both/and where the
distinction between organic and machinic musical expressions feels of little relevance.
The timing of musicians is warped in the DAW, then copied by other musicians whose performances are in turn manipulated in new machine-generated renderings, and on it goes. Even current examples of the creative use of digital pitch correction illustrate this point. Autotune is another instance of a fundamental morphing of human and
machine that is made possible by digital tools that have extended the human expres-
sive repertoire; sometimes the result of this morphing is a voice that captures certain
human states or conditions better than the unmediated human voice, which is per-
haps the most human of all instruments (see Brøvig-Hanssen and Danielsen 2016,
chap. 7). We might then wonder whether we are in a new phase in the interaction
between the musicking human and the machine, a phase that is characterized by an
even more radical undermining of a possible ontological separation between man and
technology than what characterizes the musician-instrument interaction typical of
predigital times.
So, were the creators of the new rhythmic feels discussed earlier capable of imagining
the end result (and its wider implications), or did these new feels simply arise by acci-
dent and become labeled as such by the collective imaginations of the consumers/
receivers? This is a question that invites a double answer. No, the creators probably did
not anticipate the effect of their experiments with new technology, and they were—and
are, in line with Heidegger’s insights above—certainly not capable of foreseeing their
wider results. On the other hand, new rhythmic feels such as those discussed above do
not simply happen. The processes leading to them are begun with the intention of creat-
ing new sound. Generally, mechanized procedures for generating new musical material
represent a well-known strategy for innovative music-making that was employed by, for
example, the composer Pierre Boulez from the 1950s onward. His practice and reflec-
tions make it clear that the point of using such procedures was often to come up with
something unimaginable, with completely new sonic raw material, that could then be
shaped through intentional compositional procedures (see Guldbrandsen 2011, 2015).
The same goes for the creation of the rhythmic feels discussed previously. As we have
seen, an experimental attitude in combination with playfulness and creative abuse of
new technology may result in as-yet-unheard sonic results.
The flip side of this is that, as soon as those new sounds have been produced, they start
inhabiting the imaginations of their creators and the listeners. As to the groove-based
music discussed in this chapter, the relationship between rhythm and motion is clearly a
case in point. The groove qualities of rhythmic music are often related to the music’s
perceived ability to make one’s body move. Exactly how various rhythmic feels are
connected to body movement certainly remains an open question, but recent per-
spectives from the field of embodied music cognition pave the way for a close connection
between rhythm and perceived and performed motion (e.g., Chen et al. 2008; Danielsen
et al. 2015; Godøy et al. 2006; Large 2000; Leman 2008; Repp and Su 2013). Generally,
discussions of the relationship between rhythm and corporeality in music listening
point to the real and underacknowledged possibility that we structure our actual musi-
cal experiences according to patterns and models received from extra-musical sources,
such as actual movements (see also Godøy, this volume, chapter 12). This is probably
also a clue as to why we manage to adjust to and structure the peculiar warped grooves
discussed above: we draw on our internalized repertoire of already acquired gestures to
make sense of a new timing pattern. Put simply, if we find a way to move to those
grooves, we then come to “understand” them.
However, not only do dance and movement affect the way we experience and understand grooves; inner or outer movements can also be induced or proposed by music. That is, a piece of music can propose new gestures to us. The rhythmic feels
discussed earlier may thus be a means of imagining completely new movement pat-
terns, or gestural designs, that are typical of the music of the humachine. Similar to the
ways in which the glitched and warped grooves described above both evoke and
deform their own “originals,” such imagined gestural designs may feel at one and the
same time connected and completely alien to us. As we develop ways of internally or
externally responding to these grooves, however, we also develop an understanding of
these new gestural imaginations, which at present goes well beyond our “natural” rep-
ertoire (here understood as what we regard as possible for human beings in the present
historical situation). Sounds that are shaped by way of digital processing may thus
evoke sonically based imaginations not only of the sources behind them (what kind of
creature makes this sound) but also of morphed, human-machine motion. Put differ-
ently, the sound of the DAW proposes a wide variety of new and peculiar ways of sing-
ing (the morphing of human and machine through autotuning), talking (glitched
stuttering vocal tracks), and moving (warped, deformed human gestures). Today,
these are experienced as different and marked by technological intervention, but who
knows? In future renderings, they might be regarded as completely commonplace,
perhaps as ordinary as talking with people on the other side of the Atlantic through
the telephone and hearing the whispering of singers from an enormous stadium stage
are today.
Notes
1. For a discussion of how this crossover success changed black dance music, see Danielsen
(2006, chaps. 6 and 7, 2012).
2. According to Paul Théberge, in contrast to the 1960s, when experimentation with, for
example, distorted guitar sound and multitrack recording “created excitement around
new sounds and electronic effects” (1997, 1), the late 1970s saw a skepticism toward
electronic instruments. According to Théberge, this skepticism (among, one might add,
rock musicians and their audiences) emerged as a consequence of the widespread reaction
to disco (1997, 2).
3. For a critical discussion of this polarization, see, for example, Simon Frith’s essay “Art versus
Technology” (1986).
4. Interestingly, in an article in Sound on Sound as late as October 1999, this absence of
variation in sound is still lamented when one is striving for realistic, sequenced drum
parts: “[A] main problem with many sampled sound sets is that they do not reflect the
ways in which the sound of real percussion instruments varies depending on the force
with which they’re struck” (Inglis 1999). This uniformity is particularly acute with hi-hat
strokes: “Standard drum kit sets, particularly those conforming to the general MIDI drum
map, suffer persistent problems. Perhaps the most obvious of these is the use of only three
different hi-hat sounds—open, closed and pedal—when real drumming makes use of a
continuous range of sounds from quiet to soft, from tight closed to open” (Inglis 1999).
5. Today, both machinic and organic music rely heavily on technological tools and are produced by way of the DAW. Whether a piece of music is placed in the one category or the other,
then, has little to do with the kind of tools involved or the degree of technological
involvement. Rather, it comes forward as a question of aesthetics and the degree to which
the use of technology is exposed or made opaque to the listener (Brøvig-Hanssen 2010).
6. In addition to such systematic timing, there are also individual patterns (see, for example,
Repp 1996).
7. The fact that humans make mistakes while machines are associated with (nonhuman) perfection is also the backdrop for the experience of the “vulnerable,” and thus more human, machine—as though technological mistakes somehow resemble
our own imperfections. According to Sangild, a technological failure such as a glitch thus
gives us a sense of “something living [it] displays the fragility and vulnerability of tech-
nology” (2004, 268). Dibben (2009) also underlines this humanizing effect of technological
failure in a discussion of Björk’s use of technology.
8. “Glitch” initially referred to a sound caused by malfunctioning technology. As Sangild
(2004) points out, these sounds of misfiring technology in fact expose technology as such
(266), or render it opaque (Brøvig-Hanssen 2010).
9. Whereas automated cutting processes could initially only be applied to prerecorded
sound, they can now be used in real time. For an introduction to the algorithmic pro-
cedures underlying different automated cutting processes in live electronica performance,
see Collins (2003).
10. The two versions were released as the two first tracks of Squarepusher’s EP My Red Hot
Car (Warp 2001). The second track was subsequently placed on the Squarepusher album
Go Plastic (Warp 2001).
11. The Amen break refers to a drum solo performed by Gregory Cylvester Coleman in the
song “Amen, Brother” (1969) by The Winstons.
12. See Johnson (2005) for an overview of the equipment used in Snoop Dogg’s recording
studio at the time.
13. This phenomenon parallels the local time shift phenomenon as described by Desain and
Honing (1989). See also Danielsen (2010a).
14. “Analog” performance practice is, of course, also open to sudden transitions, for example
in the form of tempo shifts. Research has shown that these can be rather abrupt (see, for
example, Cook 1995; Bowen 1996). However, the particularly glitched character of digital
time warps is difficult to achieve with conventional instruments.
15. For an overview of advances in interfaces for musical expression from the last fifteen
years, see Jensenius and Lyons (2017).
References
Adorno, T. W. 1990. On Popular Music. In On Record: Rock, Pop, and the Written Word, edited
by S. Frith and A. Goodwin, 301–314. London: Routledge.
Armstrong, T. 1998. Modernism, Technology, and the Body: A Cultural Study. Cambridge:
Cambridge University Press.
Benadon, F. 2009. Time Warps in Early Jazz. Music Theory Spectrum 31 (1): 1–25.
Bengtsson, I., A. Gabrielsson, and S. M. Thorsén. 1969. Empirisk rytmforskning. Svensk tidskrift
för musikforskning 51: 48–118.
Bjerke, K. Y. 2010. Timbral Relationships and Microrhythmic Tension: Shaping the Groove
Experience through Sound. In Musical Rhythm in the Age of Digital Reproduction, edited by
A. Danielsen, 85–101. Farnham, UK: Ashgate.
Bowen, J. A. 1996. Tempo, Duration, and Flexibility: Techniques in the Analysis of Performance.
Journal of Musicological Research 16 (2): 111–156.
Brøvig-Hanssen, R. 2010. Opaque Mediation: The Cut-and-Paste Groove in DJ Food’s “Break.”
In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen, 159–176.
Farnham: Ashgate.
Brøvig-Hanssen, R. 2013. Music in Bits and Bits of Music: Signatures of Digital Mediation in
Popular Music Recordings. PhD thesis. University of Oslo.
Brøvig-Hanssen, R., and A. Danielsen. 2016. Digital Signatures: The Impact of Digitization on
Popular Music Sound. Cambridge, MA: MIT Press.
Burgess, R. J. 2014. The History of Music Production. Oxford: Oxford University Press.
Butterfield, M. 2010. Participatory Discrepancies and the Perception of Beats in Jazz. Music
Perception 27 (3): 157–176.
Carlsen, K., and M. A. G. Witek. 2010. Simultaneous Rhythmic Events with Different
Schematic Affiliations: Microtiming and Dynamic Attending in Two Contemporary R&B
Grooves. In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen,
51–68. Farnham, UK: Ashgate.
Chen, J. L., V. B. Penhune, and R. J. Zatorre. 2008. Listening to Musical Rhythms Recruits
Motor Regions of the Brain. Cerebral Cortex 18: 2844–2854.
Collins, N. 2003. Recursive Audio Cutting. Leonardo Music Journal 13: 23–29.
Cook, N. 1995. The Conductor and the Theorist: Furtwängler, Schenker, and the First
Movement of Beethoven’s Ninth Symphony. In The Practice of Performance: Studies in
Musical Interpretation, edited by J. Rink, 105–125. Cambridge: Cambridge University
Press.
Danielsen, A. 2006. Presence and Pleasure: The Funk Grooves of James Brown and Parliament.
Middletown, CT: Wesleyan University Press.
Danielsen, A. 2010a. Introduction. In Musical Rhythm in the Age of Digital Reproduction,
edited by A. Danielsen, 1–18. Farnham, UK: Ashgate.
Danielsen, A. 2010b. Here, There and Everywhere: Three Accounts of Pulse in D’Angelo’s “Left
and Right.” In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen,
19–35. Farnham, UK: Ashgate.
Danielsen, A. 2012. The Sound of Crossover: Micro-Rhythm and Sonic Pleasure in Michael
Jackson’s “Don’t Stop ‘Til You Get Enough.” Popular Music and Society 35 (2): 151–168.
Danielsen, A., M. R. Haugen, and A. R. Jensenius. 2015. Moving to the Beat: Studying
Entrainment to Micro-Rhythmic Changes in Pulse by Motion Capture. Timing and Time
Perception 3 (1–2): 133–154.
D’Errico, M. 2015. Off the Grid: Instrumental Hip-Hop and Experimentalism after the
Golden Age. In The Cambridge Companion to Hip-Hop, edited by J. A. Williams, 280–291.
Cambridge: Cambridge University Press.
Desain, P., and H. Honing. 1989. The Quantization of Musical Time: A Connectionist
Approach. Computer Music Journal 13 (3): 56–66.
Dibben, N. 2009. Björk. Bloomington, IN: Indiana University Press.
Frith, S. 1986. Art versus Technology: The Strange Case of Popular Music. Media Culture
Society 8 (3): 263–279.
Godøy, R. I., E. Haga, and A. R. Jensenius. 2006. Playing “Air Instruments”: Mimicry of
Sound-Producing Gestures by Novices and Experts. In Gesture in Human-Computer
Interaction and Simulation: 6th International Gesture Workshop, GW 2005, Berder Island,
France, May 18–20, 2004, Revised Selected Papers, edited by S. Gibet, N. Courty, and J.-F. Kamp,
256–267. Berlin and Heidelberg: Springer-Verlag.
Guldbrandsen, E. E. 2011. Pierre Boulez in Interview 1996 (II): Serialism Revisited. Tempo
65 (256): 18–24.
Guldbrandsen, E. E. 2015. Playing with Transformations: Boulez’s Improvisation III sur
Mallarmé. In Transformations of Musical Modernism, edited by E. E. Guldbrandsen and
J. Johnson, 223–244. Cambridge: Cambridge University Press.
Harkins, P. 2010. Microsampling: from Alkufen’s Microhouse to Todd Edwards and the Sound
of UK Garage. In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen,
19–35. Farnham, UK: Ashgate.
Heidegger, M. 1977. The Question Concerning Technology and Other Essays. New York: Harper
& Row.
Hennig, H., R. Fleischmann, A. Fredebohm, Y. Hagmayer, J. Nagler, A. Witt, et al. 2011. The
Nature and Perception of Fluctuations in Human Musical Rhythms. PLoS One 6 (10).
Horkheimer, M., and T. W. Adorno. 2002. The Culture Industry: Enlightenment as Mass
Deception. In The Dialectic of Enlightenment: Philosophical Fragments, 94–136. Stanford,
CA: Stanford University Press.
Inglis, S. 1999. 20 Tips on Creating Realistic Sequenced Drum Parts. Sound on Sound, October.
https://web.archive.org/web/20160327093715/http://www.soundonsound.com:80/sos/
oct99/articles/20tips.htm. Accessed December 17, 2018.
Iyer, V. 2002. Embodied Mind, Situated Cognition, and Expressive Microtiming in African-
American Music. Music Perception 19 (3): 387–414.
Jensenius, A., and M. J. Lyons. 2017. A NIME Reader. Fifteen Years of New Interfaces for Musical
Expression. Berlin: Springer.
Afrofuturism and the Sounds of the Future
Erik Steinskog
Introduction
While it probably is a coincidence that Richard Wagner and Sun Ra share a birthday,
May 22, 1813 and 1914, respectively, there are certainly some dimensions in how their
music has been received that could be compared. They both relate to a “music of the future,” and while their ideas are strikingly different, the fact remains that such a music of the future first has to be imagined. Some of the differences between the two are eas-
ily determined, such as views on history and the imagination of the future on a more
general level. That is to say, what kind of future can be imagined? Here it is interesting, as
Jacques Attali (1985) discusses in his now classic Noise, that music has been seen as
prophesying the future within many different thought-systems. There seems, however,
to be a version of Hegelianism at stake when discussing most “classical” music, from
before Wagner and into the twentieth century, and this is arguably challenged by Sun Ra
and, more importantly, by theoretical discourses trying to get to grips with Sun Ra.
In this chapter, I follow what is most often referred to as Afrofuturism, examining how this discourse challenges a normative understanding of history and thus introduces concepts such as counterhistory and countermemory, and how science fiction and speculative fiction are part and parcel of discussing what this music can mean. All
these concepts are, I will argue, diverse approaches to understanding how the different
modalities of time—past, present, and future—are intertwined and how this inter-
twinement becomes audible, in something resembling continuous sonic time travel.
Sun Ra’s film Space Is the Place (1972, directed by John Coney) is, in many ways, a core
text for understanding his worldview. One could argue that Ancient Egypt is not given enough of a place in Space Is the Place but, that obviously important dimension aside, more or less everything is in place. While the film is science fiction—with references to Blaxploitation as well—it is, in one particular sense, a realistic movie: it deals with the “unreality” of blacks in the United States of the early
1970s, something made abundantly clear in the scene where Sun Ra meets a number of
young people in a community center in Oakland. As he says in that scene:
How do you know I’m real? I’m not real; I’m just like you. You don’t exist in this
society. If you did, your people wouldn’t be seeking equal rights. You’re not real.
If you were you’d have some status among the nations of the world. So we’re both
myths. I do not come to you as reality. I come to you as the myth because that’s
what’s black. (quoted in Zuberi 2004, 88)
On the political level, this unreality is similar to the issues at stake for the civil rights
movement, as can be seen in Sun Ra’s reference to people “seeking equal rights.” But it is
also a statement of an almost ontological or cosmological nature; black or blackness is
myth. Is this the incorporation of society’s way of ordering race relations? Is it Sun Ra
giving up becoming included in the category “human beings”? There is a strand in
afrofuturist discourse arguing in such a direction, in which Sun Ra’s solution is
understood as bypassing the whole category of “the human” and becoming super- or
posthuman (cf., Eshun 1999, 155). Such a solution can, however, also be seen as a kind of
utopian striving, where the utopian dimension necessitates leaving the category of “the
human” behind. As history shows, first during slavery, when blacks were understood as
“subhumans,” and later with “white America’s” persistent inability to accept
equal rights, the category itself is flawed.
But whereas the movie is realistic in its depiction of race relations, it moves to science
fiction for its solution (or one of its solutions): going to outer space and finding a
planet on which blacks can create a new civilization. It is, then, about imagining
a future that seems unreal in the present. And while Jerome J. Langguth argues for a
“cosmopolitan” dimension in this solution (2010, 158), I do think the film should rather
be understood as pointing toward this future civilization as a black one, where the
“myth” of blackness is lifted outside time and history and is thus related to what Sun Ra
terms “MythScience.”
In the opening of the film, Sun Ra is seen walking amid vegetation. He is followed by a
creature in a hooded cape with a mirror where the face would have been expected, a
creature earlier seen in Maya Deren’s short film Meshes of the Afternoon (1943), and later
throughout the video to Janelle Monáe’s “Tightrope” (2010), thus bridging between clas-
sical American avant-garde and contemporary Afrofuturism (cf., Steinskog forthcoming).
Sun Ra hums, as if to set the scene for a spiritual séance, before going into a longer
monologue, the first words heard in the movie:
The music is different here. The vibrations are different. Not like Planet Earth. [ . . . ].
We could set up a colony of black people here. See what they can do on a planet all
on their own without any white people. They could drink in the beauty of this
planet. It would affect their vibrations; for the better of course. [ . . . ] That would be
where the alter-destiny will come in. Equation-wise, the first thing to do is to con-
sider time as officially ended. We work on the other side of time. We’ll bring them
here through either isotope teleportation, transmolecularization, or, better still,
teleport the whole planet here through music.1
The importance of music is underscored, although music here is apparently not only one thing or
dimension. Rather, music is fundamental to the differences experienced on the two
planets at stake, the planet where Sun Ra is seen walking on the one hand, and “Planet
Earth” on the other. “The music is different here,” followed by “the vibrations are dif-
ferent,” undoubtedly follows in a long tradition of understanding music as vibrations
(cf., Goodman 2009). Calling it a tradition is not so much to deny the physics—and thus
the realness—of understanding music as vibrations as it is to point to this understanding
as being part of a continuum where cosmological thinking and/or speculation, science,
and myth meet; it is thus, in a sense, a dimension central to Sun Ra’s MythScience.
This need not have any consequence for the sound of the music (or the
sound of the music of the future), but it argues for a use of music that is highly interesting.
Music can be a means of transportation and not only on the individual plane as some
ecstatic dimension where the musician moves “out of himself.” Rather, music is understood
as a means of transporting a collective, and in that sense the Arkestra—Sun Ra’s big band—
is not just a “misspelled” orchestra, but becomes an Ark, a kind of spaceship fueled by
sound. Understanding music as a means of transportation is arguably less paradoxical
when thinking about it than when first hearing it proposed. Still, there is another challenge
to such an understanding of the science fiction dimension of Sun Ra as well as of Space Is
the Place. While I suggested above that Space Is the Place could be interpreted as a realistic
depiction of race relations in the United States, it is also a science fiction film taking
place in a parallel world and quite possibly in the future. Whether it is in the future or
not, it is still “on the other side of time” with different “vibrations.” As such, it raises the
question: How does music sound, or vibrate, on the other side of time?
In his “Foreword: After Afrofuturism” George Lewis writes that Eshun’s term “sonic
fiction” is an “extraordinarily powerful term” (Lewis 2008, 144). One of the strengths
of the term is that it focuses on the sonic, but equally important is that by focusing on
“fiction,” the term can be used in discussing imagination and the imaginary without
having to deal with the visual connotations of “image” in the imaginary. Why is this
important? The visual bias of philosophical and aesthetic thinking has been documented
several times, and is found in the vocabulary of most aesthetic discourses (cf., Jay 1993).
One example could be how “reflection” relates to mirrors and visuality, where the acoustic
equivalent would seem to be echo. In other words, time and space are at stake, and our
way of perceiving time and space, as well as our ways of thinking those same categories,
proves important, for example in one of those places where time and space interact: reverber-
ation. In what sense, that is, is our language determining what we can say about the
phenomena under scrutiny? When it comes to the music or sound of the future, these
aspects might prove themselves important in several senses. But, and this is also in
accordance with Lewis’s argument, it does not necessarily have to do with language and
the categories available for discourse. It could correspondingly relate to how sound is
“imagined” or fictionalized and, perhaps even more importantly, what kind of fantastic
scenarios are available. In other words, “sonic fiction” could—along the lines of afrofu-
turist discourse—relate to the “sonic fantastic.”2
In Lewis’s article, which is the introduction to a special issue of Journal of the Society
for American Music dedicated to “Technology and Black Music in the Americas,” he
wants to challenge Afrofuturism for what he seems to suggest is too strong a focus on
what was previously known as “the extra-musical.” In an earlier article, “Improvised
Music after 1950,” he seems to argue that “the extra-musical” does not exist, as he refer-
ences “areas once thought of as ‘extra-musical,’ including race and ethnicity, class, and
social and political philosophy” (Lewis 1996, 94). In “After Afrofuturism,” on the other
hand, he at least seems to think this distinction has some merit, as becomes clear when he
asks: “What does the sound—not dress, visual iconography, witty enigmas, or sugges-
tive song titles—what can the sound tell us about the Afrofuture?” (Lewis 2008, 141).
It might be that sound (as sound) is an undertheorized dimension of Afrofuturism,
although at the same time Lewis’s question echoes a more traditional musicological
discourse associated with “the music itself.” From such a perspective, one could argue
that “sound” as such hardly exists in the sense that it can “tell us” anything about the
afrofuture—or, for what it is worth, any other future. The sound here is inscribed in con-
texts where, for example, “dress, visual iconography, witty enigmas, or suggestive song
titles” are part and parcel of what is heard. This is in particular the case with
music (“songs”) including lyrics. If the claim is that lyrics, including the semantic con-
tent, are not a part of the sound, this is difficult to uphold. With these considerations in
mind, however, there are still good reasons to think along the lines Lewis suggests,
exploring, in a heuristic sense, what “sound” can open up in an arguably narrower sense
than I described earlier, and then, perhaps, adding the contextual dimensions afterward.
What I am arguing for, then, is a change of perspective, and I think this is one possible
reading of Lewis’s question.
The caveat I introduce, though it at first feels necessary to me, is not necessarily fair
with regard to Lewis’s discussion. While the question’s focus on “sound,” and the explicit
exclusion of “dress, visual iconography, witty enigmas, or suggestive song titles,” seems
to argue for something close to a “sound itself,” this is almost immediately challenged by
Lewis himself, when he argues for broadening the conversation:
Broadening the conversation would allow a wider range of theorizing about the
triad of blackness, sound, and technology; for a start one could interrupt the male-
ness of the afrofuturist music canon with artists such as Pamela Z, DJ Mutamassik,
Mendi Obadike, Shirley Scott, Dorothy Donegan, the Minnie Riperton/Charles
Stepney/Rotary Connection collaborations, and more. Going further, removing
the putative proscription on nonpopular music allows us to take a more nuanced
complex view of the choices on offer for black technological engagement.
(Lewis 2008, 142)
In particular, I am occupied with what he calls “the triad of blackness, sound, and
technology,” as this triad brings us close to dimensions in the definition of “Afrofuturism.”
The cultural critic Mark Dery coined the term in his interview-article “Black to the
Future.”3 The article primarily comprises interviews with Samuel Delany, Greg Tate, and
Tricia Rose, but in the introduction Dery asks about the near absence of African
American science fiction writers. The existence of such writers would be logical, he claims, and
later authors have argued that the African American experience in a sense is science
fiction. Dery’s definition has become canonical:
(and I will add something to this toward the end of the chapter), and academic and
activist publications dealing with afrofuturist themes are becoming more common.
In other words, there are few signs that we are really “after” Afrofuturism (although
this depends on what is meant by “after”—according to Sun Ra we are “after the end of
the world”). So, while Lewis might not want to engage the term “Afrofuturism,” his
discussion of the triad of blackness, sound, and technology is of importance for the
dimensions I am occupied with in this chapter.
Lewis’s suggestions for broadening the conversation are to the point, but “sound” is no
longer isolated. It is part of “the triad of blackness, sound, and technology.” Why is it
that “blackness” should be a term on another level than dress? Or why does Lewis
approve of technology but seemingly not of suggestive song titles? For the second
question the answer should be obvious: technology is a means of producing—and
manipulating—the sound; it is, in other words, implied in the sound, not something
external to it. Similar arguments could be made for the other “extra-musical” dimensions,
but this fact does not take away the validity of the argument in question.
“Blackness,” on the other hand, is in this context a trickier notion, but one that could
be solved by claiming that blackness itself is a technology. An example of such an
understanding is found in Ytasha Womack’s Afrofuturism in a statement from Cauleen
Smith: “When I met artist and filmmaker Cauleen Smith in July 2011, she best summed
up race as creation: ‘Blackness is a technology,’ said Smith. ‘It’s not real. It’s a thing’ ”
(Womack 2013, 27). Note the “unreality” of blackness in this statement, a kind of echo of
Sun Ra’s myth. Cauleen Smith is also the filmmaker behind the Solar Flare Arkestra
Marching Band Project, where, in 2010, she directed a form of flash mob in Chicago,
including a marching band playing Sun Ra’s “Space Is the Place.”4 There are, then,
relations between Smith’s aesthetic practices and her work in understanding the back-
ground for her films, with echoes of Sun Ra and his Chicago days as an important part.
In claiming that blackness is a technology, and adding that “it’s a thing,” Smith points
to some of the complex historical trajectories that need to be addressed to get a full
understanding of what blackness can be said to be—past, present, and future. Within
the discourse of Afrofuturism, one particular discussion has been the absence of people
of color in the imagined futures of science fiction and fantasy. Connected to science
fiction, this is in particular a question about the future, but given that science fiction
more often than not is understood as a distorted notion of the present, it simultaneously
opens up a different perspective on the present. Fantasy arguably can equally well be
about the past; but here another thread is found too, in that Afrofuturism questions the
past as well as the future. The most obvious example is found in Sun Ra’s reference to
Ancient Egypt, where he claims a different understanding of, and afterlife for, Ancient
Egypt. In his understanding, Egypt was, and still is, unmistakably Africa, and it is the
past, and the past greatness of Egypt, that is his main focus.5 Here he follows
George G. M. James’s Stolen Legacy, first published in 1954, a book claiming that Greek
philosophy, and thus, in a sense, European thinking, was stolen from Egypt, manipulated,
and its origin erased. This erasure continues throughout European thinking, as an
erasure of race, as making universal a certain European understanding of the world.
Given the history of blacks in the United States or, to broaden the understanding even
more while simultaneously quoting the title of Sun Ra’s lecture series at the University of
California, Berkeley, in 1971, given the place of “The Black Man in the Cosmos,” this
European understanding has demonstrably led to a hierarchical understanding of race
as well as of history. But, as Sun Ra says, “History is only his story; you haven’t heard
my story yet” (in the film Sun Ra: A Joyful Noise from 1980, directed by Robert Mugge).
And Sun Ra’s story is a revisionist story, about another kind of origin, in Ancient Egypt,
as a technological civilization, the pyramids testifying to this. But with the Middle
Passage, and with the history of slavery, blacks were not included in the category of
human beings; they were “things.” As Fred Moten opens his In the Break: The Aesthetics
of the Black Radical Tradition: “The history of blackness is testament to the fact that
objects can and do resist” (Moten 2003, 1). Moten’s argument, that blacks were objects,
things, commodities, fits with the history of slavery; from the abolition of slavery
until the civil rights movement, a fight for inclusion in the category “human” was
important for the black population in the United States.
One thread within the afrofuturist discourse, arguably most plainly present in Eshun’s
writing, seems to argue that this inclusion did not happen, and that another solution was
found in going beyond the human to some kind of super- or posthuman existence, to
be followed by leaving the planet behind and beginning a black civilization on
a distant planet in outer space. The rationale for this thought seems to be the continuous
presence of white supremacy and racism, a presence continuing after the civil rights
movement’s victories beginning in the 1960s.
What would it mean to say that blackness is a technology? One possibility is to go
along with posthuman theory, which references different forms of enhancement, for
example, in discussing the body in relation to technology. This seems to be in accordance
with Dery’s definition of Afrofuturism where he writes about “a prosthetically enhanced
future” (Dery 1994, 180). Another angle on the same phenomenon is Lewis’s distinction
between “prosthetic” and “incarnative”—an opposition he takes from Doris Lessing.
In Lewis’s article, it is related to how “a largely prosthetic technological imaginary” is
said to dominate Dery’s references in his writings about Afrofuturism (Lewis 2008, 139);
this criticism highlights relations between the body and technology other than enhance-
ment. In another article, about Pamela Z, Lewis writes:
Here, it is as if the incarnative is a way of moving “past the prosthetic readings,” another
use of technology. That it is “fundamentally rhythmic” is of interest because the sounds are
the result of these interactions between body and technology, as it is also of interest for
understanding the “Afrological” dimensions of music found in Lewis’s thinking, not
least in his important article “Improvised Music after 1950.”
In More Brilliant than the Sun, Kodwo Eshun writes about the music of the future as
“traditionally” being “beatless.” It is, he adds, “weightless, transcendent, neatly con-
verging with online disembodiment” (Eshun 1999, 67). His examples are an interest-
ing mixture: Gustav Holst’s The Planets (written between 1914 and 1916), Brian Eno’s Apollo
soundtrack (1983), and Vangelis’s soundtrack to Blade Runner (1982, directed by Ridley
Scott). “Sonically speaking,” he writes, they are not more futuristic than the Titanic
and are “nothing but updated examples of an 18th C sublime” (Eshun 1999, 67). There
are important dimensions to this understanding but, underlying it all, there are some
fundamental questions that need to be addressed. When Eshun writes about “beatless”
music, I, in one sense, could not agree more. And related both to Sun Ra and to the
afrofuturist tradition (if we can call it a tradition), there is clearly some kind of focus on
“the beat.” Here, however, beat must also be understood as rhythm in a more general
sense and what needs to be addressed is how Eshun’s other examples relate to rhythm. In
other words, in what sense is “beatless” music rhythmic? Obviously, nonrhythmic music
does not exist, as rhythm is a way of organizing time and temporality in the sonic material
of music. “Beat,” however, is something different. When Eshun introduces the notion
of weightlessness and transcendence, and compares it with “online disembodiment,”
he is, by contrast, very close to a discussion of a dichotomy between “headmusic” and
“bodymusic”—this discussion, in consequence, would claim a transcendent position as
being disembodied in contrast to an embodied musical practice—for example, dancing.
Dance music would, understandably, focus on the beat—and would thus be one way of
contrasting the “beatlessness” of the traditional music of the future. But is this not at
the same time a simplified interpretation that cannot really be of much help here? First,
evidently some kind of dance is possible to beatless music as well, if Holst, Eno, and
Vangelis exemplify “beatlessness.” Second, Eshun also argues that hip hop is “headmusic”
(Eshun 1999, 46) and thus is not working within this dichotomy—although, because
he uses concepts related to the dichotomy, it is more difficult to figure out what he is
really arguing (or using the concepts for). Third, the sonic dimension of Holst, Eno,
Vangelis, and a host of others—even if it should be the eighteenth century’s sublime
as a reference—is important in imagining the sound of the future (perhaps more
the sound of the future than the music of the future). This is not least the case with Eno
and Vangelis’s use of synthesizers. And it is not least through the use of synthesizers
that Sun Ra’s music is in a tradition of “traditionally” understood “music of the future.”
In a similar context, Lewis writes:
It is the synthesizer, then, or more broadly the use of “sound technologies” that is crucial
for understanding Ra’s music and jazz—broadly understood—in the space age or in the
electronic era.6
But how would Sun Ra’s music fit with Eshun’s description? The question would not
least relate to the beat—and Sun Ra’s relation to “beat” or “beatlessness”—on the one
hand, and his use of synthesizers on the other. But discussing these dimensions will lead
not only to the eighteenth century’s sublime but also to any other understandings of the
music of the future (or the sound of the future). The importance of synthesizers for Sun
Ra’s sonic future cannot be overstated. He was one of the first pianists to explore elec-
tronic keyboards, and these keyboards are key for him in constructing his version of the
music of the future. In some examples, the use of synthesizers is not that different from
that of Brian Eno or Vangelis while, in other examples, Sun Ra explores the keyboards more as
noise-creators in the tradition of academic or nonpopular electronic music. Here, the
music and vibrations are different, and Sun Ra bends the Moog synthesizer, for example,
to previously unheard-of sounds, as on “Outer Space Employment Agency”
from the 1973 album Concert for the Comet Kohoutek, a track that morphs into a version of
“Space Is the Place” (cf., Langguth 2010, 152).
Understanding the synthesizers as related to the future, and thus to history, is not
very surprising and might be seen to be in line with developments within the avant-garde
of nonpopular music. Following Eshun’s take on the tradition of the music of the future
as “beatless,” these synthesizers can also be used within the tradition, as both Eno
and Vangelis exemplify. The difference, it would seem, would be whether or
not “beat” is central to the sound. Simultaneously, perhaps the synthesizers could be seen
as an axis of negotiation between different understandings of the music of the future.
As Eshun writes, “Whoever controls the synthesizer controls the sound of the future, by
evoking aliens” (Eshun 1999, 160). When read in the context of Dery’s understanding of
Afrofuturism, Eshun’s statement seems to echo a quote from George Orwell’s Nineteen
Eighty-Four, which Dery uses as the epigraph to his article: “If all records told the same
tale—then the lie passed into history and became truth. ‘Who controls the past,’ ran
the Party slogan, ‘controls the future: who controls the present controls the past’ ”
(Orwell [1949] 2003, 40).
Controlling the different modalities of time—the past, the present, and the future—is
a constant negotiation of tales as well as of technologies. The synthesizer becomes a
control-board not only to the sounds of the future but also to the sounds of the future’s
past and the past’s future. The timelessness of synthesizer-sounds is a way of manipulating
the sound waves and the vibrations in relation to, or in contrast to, the dominating tales
of how the futures are supposed to sound.
Dery’s discussion of time and history is related to a major difference between the
normative understanding of history known from Europe and the question of whether
this same understanding makes sense within an African American context. As he asks
in a timely manner: “The notion of Afrofuturism gives rise to a troubling antinomy:
Can a community whose past has been deliberately rubbed out, and whose energies
have subsequently been consumed by the search for legible traces of its history, imagine
possible futures?” (Dery 1994, 180). In other words, the past is a necessary component in
imagining the future. If the past is lost or erased it will have to be recreated as a means to
perceive a future at all. And if Orwell’s party-slogan is followed, this past is a result of
controlling the present. Sun Ra’s intervention in the present and the sounds he makes—
alone or with the Arkestra—give sound to an intersection of the present, the past,
and the future, and understanding the future—imagining the future—is thus intimately
related to all other modalities of time.
The synthesizer, then, is deeply embedded in the temporalities of sound, including
the sound of the future, but there are two other important dimensions to Eshun’s quote
cited earlier: the reference to “control,” and the reference to “aliens.” Controlling the
synthesizer is more than playing it; it is also a matter of programming the sounds—or,
rather, of working with the sounds themselves rather than simply making audible the
default sounds of the synthesizer. This, obviously, became of prime importance when industry-
standard sounds became the norm in popular music.
One update of Sun Ra that Eshun focuses on is the Jonzun Crew’s album Lost in Space
(1983), in particular the track “Space Is the Place.” With this title, the Sun Ra reference is
apparent, but Eshun’s focus is on the alterations of the voice: “On Jonzun Crew’s Space is
the Place, the Arkestral chant becomes a warning blast rigid with Vadervoltage. Instead
of using synthesiser tones to emulate string quartets, Electro deploys them inorganically,
unmusically” (Eshun 1999, 80). For Eshun, the significance of the vocoder-voice is that
the voice is turned into a synthesizer and, as such, the voice is synthesized too or, one
could argue, it is dehumanized. Which terms to use, however, also depends on how one
thinks about “music,” “voice,” and so forth. When Eshun claims that the synthesizers are
used inorganically, it is not necessarily a negative judgment. Rather, it should be seen as
an extension of Eshun’s writing about the movement from the human to the posthuman.
In that sense, “dehumanizing” would be wrong too, as in relation to black music the very
notion of “the human” is very much at stake.
The focus on the vocoder and its relation to a black posthumanism is also found in
Alexander Weheliye’s article “Feenin’,” where Weheliye claims Eshun as “the foremost
theorist of a specifically black posthumanity.” This is in contrast to the then emerging
theories of the posthuman (in the wake of, not least, N. Katherine Hayles), which show the
“literal and virtual whiteness of cyber-theory” (Weheliye 2002, 21), thus potentially
erasing people of color from posthumanity. From Weheliye’s point of view, an important
way to alter this discourse, and to engage black cultural production, is “to realign the
hegemony of visual media in academic considerations of virtuality by shifting the
emphasis to the aural” (21); “Incorporating other informational media, such as sound
technologies, counteracts the marginalization of race rather than rehashing the whiteness,
masculinity and disembodiment of cybernetics and informatics” (25). Weheliye’s focus
is the vocoder, “a speech-synthesizing device that renders the human voice robotic, in
R&B, since the audibly machinic black voice amplifies the vexed interstices of race,
sound, and technology” (22). These interstices—the places where race, sound, and tech-
nology meet—question the place of blackness within cybertheory but, at the same time,
relate to what Lewis discusses when interrogating “the triad of blackness, sound, and
technology” (Lewis 2008, 142). The vocoder is a part of this triad in a very particular
sense, given that the technologization of the voice contributes to a different take on “the
human” and on blackness.
Simultaneously, going back to the Jonzun Crew highlights another dimension of
“the music of the future.” While the mechanical, robot-like voices heard on this track
sound like science fiction—recalling the long tradition of speaking robots or aliens, from
HAL in 2001 to Samantha in Her—it is also the sound of a particular, historical
understanding of this inhuman sound. With HAL, the robotic is hearable, whereas
Samantha sounds like a regular female voice and her artificiality is impossible to hear. A
similar argument can be made for Janelle Monáe, whose alter ego Cindi Mayweather is
supposed to be an android, but whose singing voice is identical with Monáe’s (cf.,
Steinskog forthcoming). Monáe’s overall concepts for her albums, including the perfor-
mance of the android, are thus one half of the story of future in/human voices, where
the other half, arguably, is the autotuned or technologically modified voices. The
vocoderized voices of the Jonzun Crew belong to the second half of this same imagination,
and show us one of the past’s imaginations of (another) future.
Sonic Fiction
Fiction is not the same as “imagination” but in this scheme of things there are definitely
relations. If we are on the other side of time, or if music is a kind of prophecy, a sonic
imaginary of the future, then a sonic fiction can be about the sound of this nonheard (or
yet-unheard) music. There is a paradox in all these formulations in that “imagination,” in
its linguistic root, seems to point to the sense of vision. Thinking the sonic imaginary—
despite the linguistic paradox—is necessary for the sound of the future to be present.
But this, at the same time, also relates to one of the key questions of science fiction:
whether it is about the (or a) future or whether it primarily is a slightly distorted picture
of today. Both these understandings make sense in relation to science fiction, but the
distinction is still important when trying to be precise in analyzing what we are doing. And even
the stories of the future (rather than the present) are about some future
imagined from the point of view of the here and now. When it comes to music, including
Attali’s music as prophecy, the means of production are obviously found here and now
too, including, not least, the sound-producing devices.
There can be little doubt about the lasting influence of Sun Ra. This is not only because
the Arkestra is still touring—decades after Sun Ra “left the planet”—although that
understandably plays a role, but is also because of the importance of Sun Ra within
Afrofuturism as well as his importance across a spectrum of artists using elements of
Sun Ra’s music or thinking or simply expanding on his aesthetics. One could make an
important case for a (musical) continuity of Sun Ra influences going
back at least to Parliament/Funkadelic, but rather than such a discussion of history, I
want to end this chapter with some contemporary examples in a musical vein that can be
said to be a continuation of Eshun’s more “canonical” Afrofuturism.7 Eshun’s narrative
of afrofuturist music—or black sonic fiction—is more in line with a classical avant-garde
discourse that more or less excludes “popular music.”8
Transmolecularization—
Beyond Sun Ra
While the Jonzun Crew updated Sun Ra for the 1980s, and while that update might sound dated today,
there are many contemporary musicians doing different takes on the Sun Ra legacy too.
In terms of genre, many of them are best thought of under the vague umbrella term
“electronica,” but there are good reasons to discuss them in relation to updated versions
of Afrofuturism. In that sense, they might be seen as challenging Lewis’s understanding
that we should be “after Afrofuturism.” I have already mentioned Janelle Monáe, but my
focus here will be four other musicians who are DJs or producers: Ras G., Kirk Knight,
Flying Lotus, and Hieroglyphic Being. Much of the current music understood as afrofu-
turist is sample-based, opening up other ways of making relations, including those
that are historical. Communicating with samples is an inherent part of hip hop aesthetics
and it is also related to quotes and other ways of citing earlier music and performances
in instrument-based music; with samples, though, the signifying processes are different.
At the same time, such a practice is undoubtedly a use of technology, opening up
another angle on the triad of blackness, sound, and technology.
In the music of Ras G. (born Gregory Shorter Jr. in 1979 or 1980)—often recording
under the name Ras G. & the Afrikan Space Program—such sampling practices are
found not only when it comes to titles and references (in what used to be understood as
the extra-musical) but also in the musical sounds. Take the track “Astrohood” from the
album Brotha from Another Planet (2009) where he samples from Sun Ra’s “I’ll Wait for
You” from the album Strange Celestial Road (1980). The singing voices in Sun Ra’s track
are overtaken by electronic sounds—similar to the sounds/noises of computer games—
before a beat is introduced and later followed by what is almost Ras G’s signature—voices
shouting “Oh Ras” with a heavy echo to it. Sun Ra’s song is groovy with a bass vamp
leading into call-and-response voices, and it is these voices Ras chooses to sample,
rather than the bass groove or Sun Ra’s discreet synthesizer sounds. One could, however,
define the generic differences between the two tracks as a transformation from a more
or less funky bass dominating the sounds to the electronic sounds dominating Ras’s
track. If we were to compare the two tracks, the difference in length would play a role.
“Astrohood” is short, only 1:55, whereas “I’ll Wait for You” is sixteen minutes long and
develops into a jam where, under the saxophone solo, Sun Ra explores the
noisier spectrum of his synthesizers.
On the track “Natural Melanin Being . . . ” from Back on the Planet (2013), Ras G.
instead samples Sun Ra from an interview where he speaks about natural blackness
as well as about Ancient Egypt. Everything in-between and around Sun Ra’s voice is
Ras G’s electronic sounds. The electronic sounds are layers of samples, with sonic refer-
ences across decades of music. In that sense, another version of “the other side of time” is
presented, a time where the past holds the potential for recreation and revision and, as such, a
technological parallel to the understanding of history Sun Ra seems to relate to. On both
these albums there are also references to Sun Ra in the aesthetics of the album covers
and in the titles; so, in that sense, one would have to say it is a whole aesthetic rather than
simply a sonic ideal.
With Kirk Knight’s “Start Running,” the opening track of Late Night Special (2015),
Sun Ra’s voice is heard again, this time with the famous words from the opening of
Space Is the Place. The first sounds on the album are Sun Ra’s voice saying “teleportation,
transmolecularization, or better still, teleport the whole planet here through music.”
After “better still” the rapper comes in, rapping over the rest of the still audible words of
Sun Ra, moving into a contemporary alternative hip-hop track. Toward the end of the
track the voice of Sun Ra returns, saying, “the music is different here” and so on. Knight
thus clearly signifies on Sun Ra’s statements and in a particular sense can be said to
attempt, for the rest of the album, to present this “different music,” again re-inscribing
African American music in a process of teleporting the planet. The sonic environment
around the first Sun Ra sample, however, is more related to Alice Coltrane than Sun Ra.
A sweeping harp rather than synthesizers offers another mode of combining acoustic
instruments and electronics. With the harp and the Alice Coltrane references, the track
is closer to Flying Lotus than to Ras G., and one track on Knight’s album, “Dead Friends,”
features Thundercat—Stephen Bruner—who also collaborated with Flying Lotus.
Flying Lotus (born Steven Ellison, 1983) is the grandnephew of Alice Coltrane and
the grandson of Marilyn McLeod. Both these relations are often referenced within his music,
the first with attention to the spatial and spiritual dimensions in his music, the second with
reference to more traditional popular music and to Motown (McLeod wrote, among other
songs, Diana Ross’s “Love Hangover”). Flying Lotus’s track “Transmolecularization” is
an outtake from You’re Dead! featuring Kamasi Washington on saxophone, a
track first played on his BBC Radio 1 sessions (May 14, 2015).9 The title of this track is
a clear reference to Sun Ra, both to the opening of Space Is the Place, and to a particular
scene in the film, at the Outer Space Employment Agency, where
between different understandings of improvisation. Finally, and this is probably the most
relevant for the current discussion, there is a way of understanding the questions of
rhythm and beat.
From this, an argument can be traced back to Sun Ra and to the music of the future
that is related not only to beat and rhythm but, simultaneously, to how the music of the
black future is always also related to reimaginations, reinterpretations, and revisions
of the past (cf., Lock 1999). Here the history of music that Eshun relates is made more
complex by a constant intertwinement of different pasts and their respective futures,
where, in the case of Ras G., Kirk Knight, Flying Lotus, and Hieroglyphic Being, sam-
plers, turntables, computers, and mixers substitute for Sun Ra’s synthesizers, both as a
continuation and as a renegotiation of the history of black music. Imagining the future
of blackness thus becomes as much a matter of imagining the unheard-of as of remixing and
renegotiating the past. The future and the past intertwine—continually—in the present
of the sounding music, being multidirectional rather than linear, but pushing the sounds
into other worlds.
Notes
1. https://www.youtube.com/watch?v=4s8VZz-ERO0. Accessed May 15, 2017.
2. “Sonic fantastic” might be seen as one possible dimension of what Richard Iton calls “the
Black fantastic” (2008).
3. The article is found in Dery’s edited volume Flame Wars: The Discourse of Cyberculture
from 1994. With one exception, the whole volume was first published as volume 92,
number 4 of the South Atlantic Quarterly (in 1993).
4. https://www.youtube.com/watch?v=WvcXwtqQ5ME. Accessed May 15, 2017.
5. Recent DNA research suggests that ancient Egyptians were more genetically similar to
people from the eastern Mediterranean than to people in modern-day Egypt. https://www.livescience.com/59410-ancient-egyptian-mummy-dna-sequenced.html. Accessed June 14, 2017.
6. Lewis also references George Russell’s Jazz in the Space Age (1960) and Electronic Sonata
for Souls Loved by Nature (1968).
7. This means that I am excluding examples drawn from what is arguably a more main-
stream contemporary music, such as Janelle Monáe (her references to Sun Ra in the video
to “Tightrope,” for example).
8. This criticism has been raised by several authors, including myself, and should be taken
seriously when considering Afrofuturism at large. However, related to the sounds of the
future, it still makes sense to discuss this same avant-garde logic in its own right. Focusing
here, then, does not diminish the importance of a more mainstream Afrofuturism.
9. In addition to the mentioned tracks, the term “transmolecularization” is also used by
Eagle Nebula on the track “Nebulizer” from her EP Space Goddess (2015).
10. https://www.youtube.com/watch?v=iDwn0lsxDGg. Accessed May 15, 2017.
11. One could also refer to Martin Bernal’s Black Athena, given that Bernal establishes arguments
for observing the effects of such a “revision” throughout European intellectual history.
References
Attali, J. 1985. Noise: The Political Economy of Music. Minneapolis: University of Minnesota Press.
Dery, M. 1994. Black to the Future: Interviews with Samuel R. Delany, Greg Tate, and Tricia
Rose. In Flame Wars: The Discourse of Cyberculture, edited by M. Dery, 179–222. Durham,
NC: Duke University Press.
Eshun, K. 1999. More Brilliant than the Sun: Adventures in Sonic Fiction. London: Quartet
Books.
Goodman, S. 2009. Sonic Warfare: Sound, Affect, and the Ecology of Fear. Cambridge, MA:
MIT Press.
Iton, R. 2008. In Search of the Black Fantastic: Politics and Popular Culture in the Post-Civil
Rights Era. Oxford: Oxford University Press.
James, G. G. M. (1954) 2008. Stolen Legacy. New York: Wilder.
Jay, M. 1993. Downcast Eyes: The Denigration of Vision in Twentieth-Century French Thought.
Berkeley: University of California Press.
Langguth, J. J. 2010. Proposing an Alter-Destiny: Science Fiction in the Art and Music of Sun
Ra. In Sounds of the Future: Essays on Music in Science Fiction Film, edited by M. J. Bartkowiak,
148–161. Jefferson, NC: McFarland.
Lewis, G. E. 1996. Improvised Music after 1950: Afrological and Eurological Perspectives.
Black Music Research Journal 16 (1): 91–122.
Lewis, G. E. 2007. The Virtual Discourses of Pamela Z. Journal of the Society for American
Music 1 (1): 57–77.
Lewis, G. E. 2008. Foreword: After Afrofuturism. Journal of the Society for American Music
2 (2): 139–153.
Lock, G. 1999. Blutopia: Visions of the Future and Revisions of the Past in the Work of Sun Ra,
Duke Ellington, and Anthony Braxton. Durham, NC: Duke University Press.
Moten, F. 2003. In the Break: The Aesthetics of the Black Radical Tradition. Minneapolis:
University of Minnesota Press.
Orwell, G. [1949] 2003. Nineteen Eighty-Four. London: Penguin.
Steinskog, E. 2011. Hunting High and Low: Duke Ellington’s Peer Gynt Suite. In Music and
Identity in Norway and Beyond: Essays Commemorating Edvard Grieg the Humanist, edited
by T. Solomon, 167–184. Bergen: Fagbokforlaget.
Steinskog, E. forthcoming 2019. Metropolis 2.0: Janelle Monáe’s Recycling of Fritz Lang. In
Afrofuturism 2.0: The Black Speculative Art Movement, edited by R. Anderson and C. Fluker.
Lanham, MD: Lexington Books.
Weheliye, A. G. 2002. Feenin’: Posthuman Voices in Contemporary Black Popular Music.
Social Text 20 (2): 21–47.
Womack, Y. L. 2013. Afrofuturism: The World of Black Sci-Fi and Fantasy Culture. Chicago:
Lawrence Hill Books.
Zuberi, N. 2004. The Transmolecularisation of [Black] Folk: Space Is the Place, Sun Ra and
Afrofuturism. In Off the Planet: Music, Sound and Science Fiction Cinema, edited by
P. Hayward, 77–95. Eastleigh: John Libbey.
chapter 31
Posthumanist Voices in Literature and Opera
Jason R. D’Aoust
Introduction
When we think of the voice from the perspective of sound and imagination, a familiar
observation comes to mind: the voice is a series of phonatory sounds we emit (as in
speech, screams, and songs), but also their interior manifestation in our mind’s ear.
The experience of hearing a voice when we think, read, and write leads us to think of
voices as dual in nature, namely through their inner and outer manifestations, but the
interrelation of the two is more complex than it appears. Our seemingly innate inner
voice gives us the impression that our interiority precedes any exteriorization, and
thereby establishes a hierarchy in communication. In identifying inner and outer voices
as two sides of the same coin, we come to believe that speech and song are the material-
ized expression of our inner voice. Artistic practice can reinforce this point of view.
Eileen Farrell, for example, has commented on how the imagination plays an important
part in vocal performance: rather than focus on the manipulation of larynx, pharynx,
and resonators, successful artists concentrate instead on imagining the pitch, texture,
and tone of the vocal line they then instantly create in performance (Farrell 1993). This
performance practice defines the sonorous imagination as an active agent that forms
sounds in the inner ear before they are vocally expressed and manifested. These obser-
vations might also implicitly convey a dualist perception that vocal expression is material
and the inner voice is not. Such a way of understanding the voice often turns out to
support or be supported by metaphysical explanations of the physical world. A meta-
physical worldview purports that there are immaterial principles (like our identity with
our inner voice), which nevertheless have the creative force to organize the material
world. For the last half-century, however, critical theory has opposed this way of organizing
knowledge about, but especially through, the voice. Poststructuralist concerns like the
death of the author and the Derridean writing of différance oppose biographical criti-
cism because the latter, in speaking for the author’s voice, leads to a paucity of diverging
interpretations and points of view.
This chapter examines these critical intersections of voice, sound, and imagination in
order to situate them within studies on posthumanism. Many posthumanist theorists
discuss the voice, or problems related to it, with the intent of displacing certain assump-
tions about subjectivity or self-presence. This way of writing about the voice ties in with
earlier critical theory in which the voice was criticized for transmitting notions of
identity. As a point of departure into understanding the discursive implications of the
posthumanist appraisal of vocality, I start by giving background to the phonocentric
critique of voice. I then turn to the recent reappraisal of voice by criticism of videocen-
trism and to theorists who are interested in the voice’s epistemic purchase, insofar as it
can create a discursive space around vocal embodiment and the voice’s materiality. The
following section brings this critical discussion to bear on the posthumanist reception
of opera. I discuss how theorists have visited the history of opera in order to compare the
genre to philosophical discourse for rhetorical purposes, but not necessarily to revise
the discursive flattening of the expressive voice. Opera studies have, so far, shied away
from engaging with posthumanism. I therefore draw on the musicological reception of
opera’s many voices, in order to deconstruct the assumptions made in the name of the
“operatic voice.”
In What Is Posthumanism?, Cary Wolfe situates the problem of thinking of the human,
and, by extension, of humanism, within the larger problem of the multiplicity of living
consciousness. His book asks of us
For Wolfe, posthumanism is predicated on our species’ awareness that other species are
not only sentient, but that their consciousness creates different worlds, knowledge of
which should also further our understanding of the human animal. His approach relies
on debunking presuppositions about language that unwittingly convey remnants of a
metaphysical worldview in which humans claim ownership of, or stewardship over,
other living beings. In doing so, he furthers Jacques Derrida’s attention to autoaffection
by connecting it with autopoiesis, a relatively new term initially borrowed from biology
by communication studies (Maturana and Varela 1980). In Wolfe’s argument, autopoiesis
acts as a benchmark with which to compare different animal experiences of the world,
including that of the human species. More importantly, the evolutionary inheritance
of autopoiesis should ethically require from us greater critical attention to implied or
unwitting value judgments we make when we compare other forms of animal communi-
cation to human linguistics.
What is autopoiesis? Poiesis is borrowed from the Greek and in its literary sense
means “the creative production, especially of a work of art”; but when used as a suffix, its
literal translation denotes “the formation or production of something.”1 Biologists have
used the combined form to describe the “self-maintenance of an organized entity
through its own internal process” (Oxford English Dictionary); therefore, an “auto-
poietic system is one that produces itself ” (Buchanan 2010). Autopoiesis was introduced
to communication studies when Niklas Luhmann made it a key concept in systems
theory in order to argue that a system of communication does not precede its given
social space (Luhmann 2010; Wolfe 2010, 3–29). This biological insight into commu-
nication implies that human consciousness through language is a matter of animal
evolution, elements of which could very well be shared with other species. In turn, the
“autopoietic ways” of Wolfe’s theorization are interesting to thinkers of the expressive
voice and vocality, because they might further dislodge the function of voice as the
metaphysical guardian of self-presence.
The inner voice, though it seems innate to most of us, is not a clean slate. Derrida’s
criticism of the autoaffection of the voice-as-presence is a key moment for Wolfe, because
it moves away from the “self-presence of consciousness” toward writing qua trace as
“fundamentally ahuman or even anti-human” (Wolfe 2010, 6). It is less clear, however, if
the sonorous voice’s past associations with humanist identity mark it as a phenomenon
to be discarded in Wolfe’s argument. As Don Ihde remarks in Listening and Voice,
Voice is, for us humans, a very central phenomenon. It bears our language without
which we would perceive differently. Yet outwards from this center, voice may also
be a perspective, a metaphor, by which we understand part of the world itself.
(Ihde 2007, 189)
Like Wolfe, Ihde is aware that our vocal experience of language and the world presents
the problem of “domesticating it into our constant interpretation that centers us in the
world” (Ihde 2007, 186). Can greater attention to the musicality or sonority of voice make
us further aware of the distance we impose on the world’s sounds through language?
I will shortly discuss how Wolfe arrives at the sonorous voice by way of opera and how his
sources discuss opera by way of an “operatic voice.” This chiasmic construction (opera-
voice/voice-opera) might give the impression of canceling itself out and of being of little
consequence, but it gestures toward a conflation that assigns the sonorous voice to a
genre whose aesthetic diversity is thereby greatly reduced. However, before I arrive at
this posthumanist stance on opera qua “operatic voice,” I will consider what the voice
means for philosophers and critical theorists.
For two millennia, Western philosophy has claimed the voice as the linguistic
medium of human reason and, by extension, proof of the primacy of humans over other
species lacking in language and reason. To understand the ramifications of this tradition
for current work about the voice, we may look to Heidegger’s historical survey of the
voice in “The Concept of the Logos” (Heidegger 1962, 55–58) or look back to neo-Platonist
definitions of voice (Mansfeld 2005).2 Ultimately, the search for an ever-receding origin
of the voice is not only impossible but also counterproductive. Indeed, “by avoiding
tales of origins, we are closer to a possible answer. For, whatever else the voices of
language may be, at the center where we are, they are rich, multidimensioned and filled
with as yet unexplored possibilities” (Ihde 2007, 194). For our purposes, however, let us
make Derrida’s first publications our point of departure.
When Derrida, in Speech and Phenomena, discusses the “expressive voice” in relation
to Edmund Husserl’s philosophy, he reproduces the latter’s terminology for the expres-
sive voice to designate our “silent interior monologue” (Spivak 1976 in Derrida 1998,
liii). This inversion of our everyday understanding of the expressive voice occurs
because Husserl, “being interested in language only within the compass of rationality,
determining the logos from logic . . . determined the essence of language by taking the
logical as its telos or norm” (Derrida 1973, 8). In order for language to hold any truth-
value, it had to be logically consequential in its assertions about itself. How does this
logical search for truth through language silence the expressive voice?
One way of verifying whether or not language can achieve this logical exactitude is to
put the terms it uses to the test of translation. Derrida underlines a lack of categorization
in the French translation of Husserl, because it systematically rendered Bedeutung
into the French signification. He notices the lack of terminological choice in French to
express a difference between the German terms Sinn (sense, signification) and Bedeutung
(meaning, signification), and argues that a lack of linguistic equivalencies should not
erase the differences in experience they point out. As Derrida remarks, for Husserl
“meaning [Bedeutung] is reserved for the content in the ideal sense of verbal expression,
spoken language, while sense (Sinn) covers the whole noematic sphere right down to its
nonexpressive stratum” (Derrida 1973, 19). Meaning is the result of an interpretation
(Deutung) that should be reserved for communication relying on the expression
(Ausdruck) of speech (Rede). Sense (Sinn, signification), on the other hand, although it is
always conveyed by expressive speech, may also be indicated (Anzeichen) through
nonlinguistic means. Yet, for Husserl “meaning (bedeuten)—in communicative speech
(in mitteilender Rede)—is always interwoven (verflochten) with such an indicative relation”
(in Derrida 1973, 20). One should pause here and note how the indicative musical
characteristics of speech that also make sense—such as pitch, tone, rhythm, and velocity—
are silenced in this logic of communication. For now, however, let us continue and
examine how expression (Ausdruck), although it denotes an outward push, neverthe-
less loses its phonation, as the expressive voice gets turned into the voice of our “silent
interior monologue” (Spivak in Derrida 1998, liii).
expression indicates a content forever hidden from intuition, that is, from the lived
experience of another, and also because the ideal content of the meaning and spir-
ituality of expression are here united to sensibility. (Derrida 1973, 22)
Both of these problems are avoided by diverting the communicative structure of address:
the ideal addressee is no longer the person one speaks to, but part of our silent inner
voice. This silent address retains the structure of communication, however, through
the intention of the inner voice’s objective ideality—akin to the ideal reader to whom
one writes—which becomes a substitute for the external other. In other words, the sus-
pension of expressivity’s (indicative) communicating relation to an exterior addressee
is necessary in order to ensure that nothing be hidden from meaning in the ideality of
language. This silent yet expressive voice thus unites thought and language through
self-presence, but does so at the expense of a phonatory vocal act, in order to make
communication logically possible.3
Yet even this ideal voice presents a flaw. Although the voice of self-consciousness
might satisfy the requirements of autoaffection—hearing one’s inner voice, thereby
giving one a sense of self—it cannot fully express presence. This is an enduring problem
in the history of Western thought. Augustine, for example, struggles with expressing
self-presence in his Confessions. He remedies the lag in communicating his own relation
to presence (and Logos) through song because, in his view, music distends speech and
thereby elongates its enunciating present (Augustine 1998, XI: 17 ff.). Derrida, however,
follows the logic of the trace to its visual outcome.
For Derrida, Husserl’s descriptions [of retention] imply that the living present, by
always folding the recent past back into itself, by always folding memory into per-
ception, involves a difference in the very middle of it. In other words, in the very
moment, when silently I speak to myself, it must be the case that there is a miniscule
hiatus differentiating me into the speaker and into the hearer. There must be a hiatus
that differentiates me from myself, a hiatus or gap without which I would not be a
hearer as well as a speaker. This hiatus also defines the trace, a minimal repeatability.
And this hiatus, this fold of repetition, is found in the very moment of hearing-
myself-speak. Derrida stresses that “moment” or “instant” translates the German
“Augenblick,” which literally means “blink of the eye.” When Derrida stresses the
literal meaning of “Augenblick,” he is in effect “deconstructing” auditory auto-
affection into visual auto-affection. (Lawlor 2014)
The infinitesimal lag in self-presence—in English we may also use adverbs like “at once”
or “instantaneously” to translate the temporal indication of the German noun
Augenblick—is thus translated into the ocular sphere of the interstitial trace. From this
point forward, Derrida will continue to oppose logocentric literature and thought
through criticism that denounces the voice in favor of writing.
From Descartes to Hegel and in spite of all the differences that separate the different
places and moments in the structure of the epoch, God’s infinite understanding
is the other name for the logos as self-presence. The logos can be infinite and self-
present, it can be produced as auto-affection, only through the voice: an order of the
signifier by which the subject takes from itself into itself, does not borrow outside of
itself the signifier that it emits and that affects it at the same time. Such is at least the
experience—or consciousness—of the voice: of hearing (understanding)-oneself-
speak [s’entendre-parler]. That experience lives and proclaims itself as the exclusion
of writing, that is to say of the invoking of an “exterior,” “sensible,” “spatial” signifier
interrupting self-presence. (Derrida 1998, 98)
In other words, the voice fosters not only the illusion of being present to oneself, but
also the illusion of knowing or, in the case of madness, of owning the truth.4 The voice is
the preferred vehicle for meaningful speech (logos) precisely because hearing-oneself-
speak is so close to our understanding of ourselves, a fact Derrida underlines by joining
the two parts of the reflexive verb with a hyphen to form the noun s’entendre-parler.
While something of the order of the trace occurs when we hear ourselves speak,
writing, in comparison, is indifferent to our experience of consciousness. Wolfe is inter-
ested in the trace for its “a-human or anti-human potential” because of its indifference to
self-presence. Yet can the sonorous voice be of interest to posthumanist study, beyond a
distrust of its purported phonologocentrism? Can vocality further inform this inter-
stitial space of phonation and listening? Or must it be relegated to humanist concerns
for origins and ends, and express our melancholy of never knowing them? Since critical
posthumanism relies on an autopoietic benchmark that until the last century was obfus-
cated by the voice’s conflation with logos, and because, as we shall see, opera becomes
for Wolfe a stand-in for the humanist voice, I want to bring to this discussion recent
research that challenges phonologocentric criticism.
Because of its ties to autoaffection, philosophy understands the expressive voice as being
fully interiorized to the point of becoming the excluding agent of “an exterior.” However,
does the resounding voice of the singer—a voice that always sounds different from one
recording to the next, from one performance to the next, from one instant to the next—
present similar problems to critical thought?5 In other words, does a grammatological
counteraction against the autoaffective voice also account for the vocality of screams,
songs, shouting, and laughter? Can philosophy account for those expressive and musical
voices that were silenced in the name of language’s logical discourse (Nancy 2007)?
If the debt to Heidegger, while full of reservations, is explicit, then the debt to
the studies on orality—and more generally to the modern rediscovery of the voice,
if not of writing—is, however, rather deceptive. (Cavarero 2005, 213)
Cavarero argues that Derrida is critical of the voice but does not address the metamor-
phoses it underwent in order for it to continue suiting the historical developments of
visually centered metaphysical epistemologies. Cavarero suggests that Derrida does
not integrate into his framework a conception of the expressive voice because he thinks
of it as the guardian of metaphysics.6 She criticizes Derrida for failing to step back and
free the expressive voice from its ancillary inscription in discursive knowledge once he
had shown how Husserl recuperates expression as an implicit and disavowed discursive
strategy. According to Cavarero, the project of a “philosophy of différance [ . . . ] orients the
theoretical axis in which Derrida places the theme of the voice, making it play a meta-
physical role in opposition to the antimetaphysical valence of writing” (Cavarero 2005,
220). Recall how this is precisely Wolfe’s point of departure for thinking of the trace as
a-human or antihuman. In Cavarero’s reading, Derrida’s championing of writing as
différance can also be understood as the last scene of philosophy’s historical “devocalization
of the logos” (Cavarero 2005, 33–41). In other words, the task of deconstructing the
traditional view of writing qua fallen speech might have obscured how writing constrains
representations of sonorous voices in order to elevate itself to the status of univocality.
Instead, she insists on the following: Derrida’s “metaphysical phonocentrism supplants
the far more plausible, and philologically documentable, centrality of videocentrism”
(Cavarero 2005, 222).
The argument rests on a shift in perspective and, although the gap it opens is rather
narrow—like a closing shutter—its far-reaching consequences have also been recognized
in other fields. In Sounding New Media, Frances Dyson also develops a historical analysis
of sound’s subsumption under visually based epistemologies:
sound and the speaking voice are banished from this ontological elite, not because of
their sonority, but because of what sonority represents—impermanence, instability,
change, and becoming. Through an array of epistemological gymnastics, however,
the voice is not entirely excluded (how could anyone ever say that it was?) but rather
abstracted via the oxymoronic concept of “inner speech.” (Dyson 2009, 21–22)
Lacanian theorists interested in music had already underlined similar insights into the
voice’s disruptive potential for discourse. Engaging with Plato’s remarks on music and
their influence on Augustine, psychoanalytic critics like Michel Poizat (1992) and Mladen
Dolar have associated the musical voice with the sliding of the signifier.
One can draw, from this brief and necessarily schematic survey [of the musical
voice], the tentative conclusion that the history of “logocentrism” doesn’t run quite
hand in hand with “phonocentrism,” that there is a dimension of the voice that runs
counter to self-transparency, sense, and presence: the voice against the logos, the
voice as the other of logos, its radical alterity. (Dolar 1996, 24)7
Although not intended for anti-ocular purposes, we can also turn to a philological study
of the visual metaphor of light (scintillation and illumination) in the Platonist doctrine
of the voice in order to grasp how it assigned the sonorous voice to videocentric discourse:
“in the proper sense, it is articulate voice, considered as illuminating what is thought”
(Mansfeld 2005, 359 ff.). The voice becomes trapped in a “heliotropic metaphor” that
Derrida’s reading of Phaedrus in Dissemination assigns to différance, rather than admit
the voice’s alignment with a visual order (Cavarero 2005, 223–224, 227 ff.). Cavarero
further underlines how discourse’s apparent phonocentrism only functions through a
disavowal of the visual ordering of what lies beyond perception.
The logos that is written in the soul of the one who apprehends, with science
[episteme], is precisely the devocalized logos that coincides with the visible and
mute order of ideas. . . . In effect, it is precisely the art of dialectic that functions as a
means of transmission between the world of words and the world of ideas. This art
belongs to the verbal sphere, but it belongs to it as a method for showing the insuf-
ficiency of words and at the same time, their constitutive dependency on the order
of ideas. (230–231)
instance, through the term “operatic voice.” How does such a shortcut as the “operatic
voice” affect our thinking of posthumanist vocality?
It is here, inside our minds. The most striking aspect of Wolfe’s discussion of opera is
that all his interlocutors are philosophers or theorists and none are critical musicologists.
Because of their discursive allegiances, his interlocutors come to opera with preconceived
ideas of the aesthetic voice’s discursive function. In their arguments, opera becomes a
dramaturgy of voice, and opera is therefore unwittingly reduced to a homogeneous
genre with a single type of voice, the “operatic voice.” At the beginning of a chapter
largely dedicated to opera, film, and song, Wolfe writes “sound is not voice” (Wolfe 2010,
169). Although nobody would dispute Wolfe’s assertion, he is undoubtedly cautious about
approaching the sonorous voice through opera. I have reminded readers how the
reversal of this assertion—voice is not sound—is a long-standing claim of metaphysics
in associating the voice (phoné) with speech (logos). Although Wolfe also challenges
this conception of the voice, he is wary of opera and, as we shall see, implies that its
sonorous voices should be superseded (by cinema) in a posthumanist discussion
of vocality. Is opera, from its creation to the twenty-first century, to be confined
by posthumanist theory to what we have shown is the silent repository of humanist,
metaphysical voices?
The “operatic voice” is a discursive construct of the twentieth century that has
thoroughly infiltrated general culture. Before then, people had qualified compositions,
literature, or personalities as “operatic,” but they did not see the need to describe voices
in such a manner. Opera is an art form comprising many genres that require different
voice types. Of course, there were teachers and schools to make sure that singers were up
to musical standards. In this sense, different periods have had an ideal sense of what
different voice types should be able to accomplish musically and dramatically. Even if certain singers of the past were louder or more dramatic than others, the expression “the operatic voice” misleads readers into assuming that opera depends on a single type of voice or
vocal style. As Gary Tomlinson explains in Metaphysical Song (1999), the genre spans
over four centuries of Western modernity in which voices were differently embodied
and represented in accordance with the prevalent discourse of a given period’s ideology.8
I am not sure exactly when the term “operatic voice” became popular with critical theorists;
however, it has musicological precedents. Already by the late 1930s, Adorno was criticizing
the reification of vocal music and its concomitant vocal fetishism (Adorno 2002).
Today’s entertainment industry ascribes an “operatic voice” to anyone who can sing
from a short list of show-stopping arias, regardless of the singer’s lack of a career in opera
houses, especially as certain of these voices are only known and admired within popular
culture’s narrow version of opera. To put it succinctly, the operatic voice is not opera and
opera is not the operatic voice. These assertions might seem obvious, but musicologists
have felt the need to underline them (Furman 1991).
Conflating the voice with an art form largely dedicated to a historical canon runs the
risk of unduly limiting its current epistemic purchase. It is then easier to claim that all
the singing voices of opera resonate today with a romantic desire to overcome our lost
unity with a bygone world. Although Wolfe, in the end, does not support the implicit
presuppositions underlying the “operatic voice,” his argument does take this generic
identity of opera at face value, which can become a hindrance for musicologists of opera
approaching posthumanist theory. The following is not meant to be overly critical, but to
provide musicologists and music historians with ways of approaching posthumanism.
Wolfe’s main argument is that opera represents something that never really existed,
namely an authentic, natural voice. In order to make this argument, he first recalls Stanley
Cavell’s identification of opera with mournful modernity and romantic skepticism.
After Descartes and Kant, skepticism names not just an epistemological problem
but a more profound and deeply ethical “loss of the world” that is coterminous
with Enlightenment modernity itself, in which the modern condition is to be
“homeless in the world” . . . For Cavell, the significance of film and of operatic voice
is located at what he calls the “crossing” of the lines of skepticism and romanti-
cism—that is to say, the juncture at which our desire for contact with the world of
things and of others . . . is crossed by our knowledge that we are profoundly and
permanently isolated. (Wolfe 2010, 172)
For Cavell, the history of opera has a single aesthetic project, which is characterized by
Orpheus’s Dionysian attempt at regenerating the modern world through song. Recalling
Monteverdi’s insistence on composing a lieto fine as an alternative to the tragic ending devised for the creation of L’Orfeo, Cavell writes of
Leaving aside whether the singing voice can only be heard by becoming intelligible in a
more than musical fashion, we must ask ourselves questions about the associations and
equations that are being made here in the name of opera qua “operatic voice.” Is the
underlying meaning of the myths of Orpheus and Dionysus—their regeneration of an
agonizing world—opera’s unconscious aesthetic goal? The psychoanalytic reception of
opera traces a similar trajectory when it argues for the singing voice’s relation to the
unconscious. From Eurydice’s echoes to Lulu’s scream, Michel Poizat (1992), as well as
Slavoj Žižek and Mladen Dolar (2002) suggest that the operatic voice is an historically
expanding sonic portal to the unconscious desires that lie beyond linguistic representation.
These models suggest that the sung voice, within the whole of modernity—understood
as “operatic”—is a stable unit of meaning. Yet important shifts in discourse change our
understanding of what is supposedly universal or natural and reveal this supposed vocal
identity to be culturally and socially constructed in different ways at different times.
For Cavell, voices in Monteverdi’s L’Orfeo participate in the operatic voice’s aesthetic
representation of our modern condition of alienation from the world. Opera would be a
reaction to our loss of world through a sonic expansion of the voice’s capacity to reach
beyond this alienation. Musicologists interested in the culture and music around
Monteverdi’s time would disagree. Nino Pirrotta, for one, claims that the sixteenth cen-
tury’s conception of poetry paved the way for its music theater: “la parola poetica è già
musica,” that is, poetic speech is already music (Pirrotta 1975, 22).9 Music here does not
extend the voice’s capacity for projection in order to reestablish a lost connection with
the world. More to the point, in this case, is the underlying principle governing the
efficacy of affect in “late Renaissance opera.” I borrow the term from Gary Tomlinson,
who demonstrates how early operas were more in tune with humanist ideals than
with early-modern conceptions of knowledge and subjectivity. With this in mind, it
becomes challenging to find in L’Orfeo a modern, sonic conception of the operatic voice
as a form of vocal projection. Instead, one is constantly reminded of the importance of
breath as an animating principle, not only of the singing but also of the kind of presub-
jective experience L’Orfeo conveys. Monteverdi’s opera is a celebration of music’s power
to move souls, and to do so it relies on what connected people to the cosmos and each
other in the late Renaissance, namely the life-giving breath of anima or spirit. Within the
culture that created opera, voices are not alienated from the world; rather, tense situ-
ations are harmoniously resolved through the inner workings of music’s magical power.
In other words, Cavell’s insistence on an alternative ending to the opera, in which
Apollo’s ex machina intervention puts right Orpheus’s hubris, obscures how a modern
conception of voice is inconsistent with late-Renaissance opera.
Although it does not suit the theories of subjective alienation qua operatic
voice to which Cavell and Žižek subscribe, aesthetic and stylistic elements lead musi-
cologists to believe that opera’s history does indeed start with a voice that is full of affect,
supported by breath, and united with the world. I do not argue against the idea that the
kind of vocality embodied in later operas does point to a desire to overcome skepticism’s
alienation in the world. Indeed, as Tomlinson remarks, since the Cartesian soul is com-
pletely immaterial, the voice can no longer act as the seamless link between body and
soul, like the spirit’s animating breath. The voice becomes heavier, more material, as the
spirit dematerializes itself. It is, therefore, the voice of later Baroque and classical operas
that must deal with the soul’s alienation from the material world. Instead of L’Orfeo (1607), we will therefore take Mozart’s The Magic Flute (1791) as our posthumanist case study.
Here we find vocality staged between binary constructions familiar to posthumanism:
human versus animal, nature versus civilization, and reason versus irrationality.
Furthermore, because we will approach this vocality through a literary text, we should
also keep in mind Garrett Stewart’s alternative to the inner voice in Reading Voices
(1990), in which suppressed physical phonation also accompanies the act of reading.
If reading involves the silent action of our whole phonatory apparatus, what are we
doing when we imagine an android’s voice? Or as Hayles puts it,
In Listening and Voice, Ihde also raises the question of the expressive voice. He devotes
a chapter to the dramaturgical voice, in which he discusses how it opposes, in a sense,
discourse’s absence or silencing of the expressive voice.
There lies within dramaturgical voice a potential power that is also elevated above
the ordinary powers of voice. Rhetoric, theater, religion, poetry, have all employed
the dramaturgical. The dramaturgical voice persuades, transforms, and arouses
humankind in its amplified sonorous significance. Yet from the beginning there is
the call to listen to the logos, and the logos is first discourse. (Ihde 2007, 168)
In this section, I attempt to circumvent this “call to listen to logos,” and pay closer attention
to the imagined vocality of androids. This will not mean, however, the total negation of
visual analysis. Like Ihde, I am aware that “we exist in a language world that is frequently
dominated by visualism,” and do not “wish to simply reduce the visual . . . to simply
enhance the auditory” (Ihde 2007, 190). There is a point of intersection of visual and
auditory communication that humans share with other animals, namely mimicry.
There is unintended mimicry: the viceroy butterfly mimics its larger, presumably
ill-tasting monarch in pattern, color, and design. But the mocking bird, parrot,
and cockatoo all consciously imitate and mimic the voice of others. Here is an
expression doubled on itself, the wedge in sound that opens the way to what becomes
in the voices of language the complexity of the ironic, the sarcastic, the humorous,
and all the multidimensionality of human speech, particularly in its dramaturgical
form. (Ihde 2007, 192)
Beyond simply turning to film for a discussion of visual mimicry in opera, this section
will analyze the literary representation of android singing and its absence in the novel’s
cinematographic adaptation. Beyond the usual argument that too much phoné errs on
the side of animality and too little on the side of the robotic, I argue that vocality is a site
of mimesis through which we can critically approach opera through the perspective of
posthumanism.
tells her daughter she will either see the deed through or be outcast and forsaken forever, with “all the bonds of nature” (alle Bande der Natur) shattered. I am quoting here from “Der Hölle Rache,” the famous aria known for its breakneck display of colora-
tura. Of course, all of this vocalic intertext is merely suggested by the film’s visual
symbolism. An audience familiar with Dick’s novel and Mozart’s opera, however, might
wonder at the change of casting. Contrary to the novel, the film casts the replicant in the
role of the Queen of the Night, rather than her subservient daughter, Pamina. Of course,
a fiercely resistant and aggressive android, who gets chased, gunned down, and crashes
through a window, makes for better action-film material than the resigned Luba Luft.
Although the topic of android ethics—Luft’s choice not to harm humans—is also
visited in Blade Runner, it only happens in the very last sequence, when Roy Batty has an
epiphany brought on by the acceptance of existential finitude. What, then, is lost in the
cinematographic adaptation’s excision of Luft’s career in opera?
For one, we lose Dick’s insistence on the androids’ different personalities. The novel
does not reduce them to fighting machines (cf. O’Mathúna 2015), but reminds readers
how they were designed to help colonizers in diverse tasks. Although we do not know
what her occupation was on Mars, we do get to know one of the androids as Luba Luft, a
German opera singer. We first meet her when Deckard tracks her down at the
San Francisco Opera. From the auditorium, he observes her in a rehearsal of The Magic
Flute. He hears her sing a scene in which she and Papageno are about to be discovered by
Sarastro. Sarastro is the patriarchal authority figure who is charged with initiating char-
acters into the mysteries of human civilization, which revolves on an animal/human
axis in the same tradition as the “high” and “low” plots of early modern theater. Pamina
and Papageno are about to get caught transgressing the sacred Temple of the Sun, of
which Sarastro is high priest. Papageno asks Pamina what they should tell Sarastro to
excuse themselves for being there and she replies: “The truth! The truth! That’s what we
will say” (in Dick 2007, 505). Deckard witnesses the scene and cannot help but think the
following remark: “This is Luba Luft. A little ironic, the sentiment her role calls for.
However vital, active, and nice-looking, an escaped android could hardly tell the truth;
about itself, anyhow” (Dick 2007, 505). The situation is even more ironic than it initially
lets on, since Dick is misquoting the opera or, at the very least, the novel’s English trans-
lation of the scene is misleading. Indeed, Pamina sings, “Die Wahrheit! Die Wahrheit, Sei
sie auch Verbrechen”; however, this does not mean “this is what we’ll say,” but rather “we
will tell the truth even if it means confessing to crimes.” In the novel, Luft eventually
confesses to her crimes—escaping from Mars, impersonating a human—thereby proving
Deckard wrong. I will get to that part later. For now, I want to underline how Luba Luft’s
“operatic voice” is less revealing than the complex vocality displayed in this ironic space.
Unlike the Queen of the Night’s, Pamina’s coloratura never quite reaches the heights of virtuosity. Rather, a constant of Pamina’s style of vocalization is a temporary upward push in her melodic lines, as if expressing a desire for the voice’s emancipation from speech (like the Queen of the Night’s), while retaining a vocal range closer to that of
speech. Take, for example, the first musical number in which she sings, “Bei Männern,”
a duet with Papageno, which, in the scene, comes right before the moment Dick stages
in the novel.
PAMINA
Die Lieb versüßet jede Plage, Love sweetens every torment
Ihr opfert jede Kreatur. Every creature offers itself to her.
PAPAGENO
Sie würzet unsre Lebenstage, It seasons our daily lives,
Sie winkt im Kreise der Natur. It beckons us in the circle of nature.
PAMINA and PAPAGENO
Ihr hoher Zweck zeigt deutlich an, Its higher purpose clearly indicates,
Nichts edlers sei als Weib und Mann, Nothing is more noble than wife and man,
Mann und Weib und Weib und Mann, Man and wife, and wife and man,
Reichen an die Gottheit an. Reach to the height of Godliness.
At the end of this second stanza, Pamina’s line sets off on the detached particle of
“anreichen,” a melismatic ascent and descent that is immediately repeated. As in the
other excerpt cited (“Die Wahrheit!”) her vocal lines never reach the level of melismatic
virtuosity required by the Queen of the Night’s music. Her singing of “reichen an” is a
roulade, but neither particularly fast, nor high nor long. The musical setting of Pamina’s
text only offers the occasional melisma, motivated by noble sentiments such as speaking
the truth or reaching for godliness, yet acknowledging a logocentric desire to be intel-
ligible as her voice returns to the lower register of speech. I do not want to enter into
comparisons between different voice types and their particular vocal challenges; however,
I do want to drive home the point that, unlike that of the Queen of the Night, Pamina’s is
not your typical “operatic voice.” To put it in Cavell’s terms, this is neither a voice whose
force and projection attempt to reconcile skeptical alienation from the world, nor one
that is ecstatic or melancholic about its capacity or incapacity to do so; rather, it is a voice
in an opera that expresses an ideal human balance between phoné and logos. As such, it
underlines Dick’s insight in staging posthumanist ethical problems through references
to opera and singing.
In Do Androids Dream?, Luft’s scene stages an opera duet in which a man imitates a
bird-man (Papageno) and an android imitates a human woman. The contrast between
the bird-catcher and Luft’s Pamina highlights not the lethal aggressiveness of the
android, but rather something at once strange and familiar—unheimlich, if you will—
that makes the situation seem all the more dangerous.11 In vocally portraying Pamina, a
character who is meant to epitomize an ideal human nature, Luft’s uncanny ability to
excel in the role makes both Deckard and the reader uncomfortable and forces them
to question their ironic interpretation of her singing. As Hayles remarks,
The capacity of an android for empathy, warmth, and humane judgment throws into
ironic relief the schizoid woman’s incapacity for feeling. . . . The android is not so
much a fixed symbol, then, as a signifier that enacts as well as connotes the schizoid,
splitting into the two opposed and mutually exclusive subject positions of the
human and the not-human. (Hayles 1999, 162)
Whether we compare Luba Luft to Zhora or to Deckard’s wife does not really matter.
The fact that the android is a singer reinforces Hayles’s observation: her character’s
vocality gestures toward meanings and indications beyond the interpretation of linguistic
signifiers: her vocality is its own signifier.
In contrast, Cavell (1994, 136) and Wolfe (2010, 170) both invoke the willing suspension of disbe-
lief necessary to make opera’s singing pass for speech. In doing so, what happens to
opera’s expressive sonorous voices? Along with the “operatic voice,” this emphasis on a
vocal suspension of disbelief precludes a discussion of opera’s multiple voices in order
to associate the genre with discourse, the very stance that silences the expressive voice,
according to Nancy and Cavarero. Although I disagree with Wolfe’s rhetorical reduc-
tions of the operatic voice, especially in reference to Cavell’s skeptical reading of opera as
an ecstatic or melancholic cry for unity with the world, it must be noted how, in the end,
Wolfe cannot espouse the underlying dematerialization of voice in Cavell’s argument.
But it is difficult to see how the difference between sound and voice can be main-
tained as a constitutive ontological difference, how the interiority of voice as expres-
sion can be quarantined from the exteriority that is its material medium and
condition of possibility in sound. To put it as concisely as possible, voice and sound
exists along a continuum, not a divide, which is simply to say, in another register,
that one person’s voice is another person’s noise—a point hardly laid to rest by
appeals to the generic norms of opera or any other art form. (Wolfe 2010, 179)
A posthumanist discussion of opera does not necessarily need to reduce vocal expres-
sion to a theatrical convention of speech and, in turn, speak over it or in its place.
Even the “who” of speech is multiple. This phenomenon is probably most familiar in
the voice of the actor or the singer. On stage or in cinema, Richard Burton plays a
role and in the role there are two voices that synthesize. The Hamlet he plays is
vocally animated out of the drama, yet it is Burton’s Hamlet. The Pavarotti who sings
the Duke in Il Trovatore is both Duke and Pavarotti. Here is a recapitulated set of
dimensions which range from the unmistakable “nature” of the individual voice to
the exhibited voice of another. . . . What dramaturgical voice presents is the multidi-
mensioned and multipossibilitied phenomenon of voice. (Ihde 2007, 197)
Whether we are listening to the voice of the performer or of the part, attention to vocality—
contra the awkward argument that opera is really conventionalized speech—will prevent
us from interpreting opera as a historical vocal parenthesis on our way to a posthumanist
cinematographic vocal aesthetic.
Furthermore, in Dick’s posthumanist staging of The Magic Flute, I do not find that
opera bridges the skeptical divide that Cavell describes.
Through its narration of Deckard’s encounter with Luft’s Pamina, the novel
does stage an “intervention of music” in its postapocalyptic world. Luft as Pamina
embarks on a voyage of initiation that, through Enlightenment enculturation, leads her
to believe in human perfectibility, and in her own. However, unlike Cavell’s understanding of opera’s philosophical purchase, her singing cannot transfigure her and Deckard’s world.
In this instance, it cannot efface the differences between humans or other animals and
androids. The ironic distance of Deckard’s observations sharply contrasts with opera’s
supposed capacity to seemingly integrate a different species into a human community
under the auspices of a theatrical convention. Even Luft’s outstanding mimetic vocality,
which is perceived by the listener as immediate expression and should therefore dispel any suspicion that she lacks empathy, cannot transcend the kind of skepticism at work
in this world.
When Deckard and Phil Resch later find Luft at the museum, she is standing in front
of a painting, transfixed. This passage reminds one of a scene in Alfred Hitchcock’s
Vertigo (Hitchcock 1998), where Judy Barton is lost in contemplation, trying to become
one not only with the painted figure, the ghost of a woman who was once human, but
also with the woman she is impersonating, Madeleine Elster. Similarly, Luba’s life is
entangled in the desires of men. Both Judy and Luba are objects of fascination for detec-
tives who are obsessed with their impersonations of other, supposedly more desirable
women. In other words, the multiple layers of imitation make Judy and Luba disappear
under the male gaze fascinated by Madeleine and Pamina. Luft astutely recognizes how
this aesthetic confluence of performance and patriarchal privilege creates a mimetic
blind spot in which she can hide from detection. At the museum, she does not study
Edvard Munch’s The Scream, which fascinates the men, but studies instead Puberty, a nude
in which a delicate naked young woman casts a remarkably long and wide shadow. Luba
would live there, in that shadow, in the aesthetic, mimetic blind spot of the male humanist
gaze. Even when she has been caught and has resigned herself to the fact that her end is near, she
desperately wants to hold onto the image of the painting and asks Deckard to buy her a
print in the museum’s gift shop. She justifies her last wish with the following remarks:
Ever since I got here from Mars my life has consisted of imitating the humans, doing
what she would do, acting as if I had the thoughts and impulses a human would
have. Imitating, as far as I am concerned, a superior life form. (Dick 2007, 530)
from her point of view, might also reside in the human privilege to autopoietically
impose its conception of superiority on other living beings. In a world that polices
humanity with visual cues, what better place to hide in the open than in an opera house
as an artist whose voice is at once heard and silenced by the mélomanes who fetishize the
operatic voice? Indeed, would Luft have been discovered solely on the basis of her singing?
Recall how Deckard favorably compares her voice to those of Elisabeth Schwarzkopf or Lisa Della Casa, which he knows only from phonographic recordings.
Is Luft’s desire to imitate human singing a disavowal of her autopoietic expression?
Answering this question is like running into a hall of mirrors. In wanting to sing like a
human, Luft becomes trapped in the human linguistic disavowal of animality. Recall
Wolfe’s insistence on autopoiesis as proof of human language’s evolutionary inscription
in our species and, by extension, of our animality. Dick’s choice of opera and scene
becomes all the more interesting when we realize that opera has a history of dealing with
the problem of vocal mimesis beyond our species. Kári Driscoll (2015) has recently
discussed the topic of failed human imitation of birdsong in Richard Wagner’s Siegfried
(1857/1876). The failure to imitate animal vocality becomes a hallmark of the human,
while the bird cannot fail at singing. I concur with Driscoll but would add that
the flautist in the orchestra pit successfully renders Siegfried’s failure at imitating the
birdsong. Where does this leave Luba Luft? Pamina’s vocality does not require her to
imitate birdsong and to fail in this imitation. We can only assess the merits of Luft’s sing-
ing by hearsay, and even then, we must imagine it for ourselves based on Deckard’s
descriptions. But when we do imagine her singing, we might wonder if this ambivalence
between vocal mimetic success (her singing opera) and visual mimetic failure (her
capture at the museum) points not only to an aesthetic space where one can live without
being policed and exterminated, but also in the direction of vocality qua autopoiesis.
But is even the song of a bird a song? If what we claim we know of the bird is correct,
that its voices are those of territorial proclamation, of courting, of warning and
calling, then the song is both like the opera with its melodrama and unlike the
opera. For the melodrama of opera is acted, and song, even improvised, is a species
of acting—but the bird is immersed in an acting that is simultaneously its very life.
Even its vocal posturing has real effect. (Ihde 2007, 186)
Is not Luft immersed in singing and acting as her very life? Does her vocality speak for
the bringing forth of a world or only of her capacity to imitate the external features of the
human singing voice? On one hand, Luft limits her claim on vocality to successful
human imitation of a subservient and logocentric female character, Pamina. On the
other, the novel’s plot never succeeds in disavowing Luft’s intrinsic need to sing. After
all, she could have chosen another occupation and have become an exotic dancer, for
example. I follow Driscoll’s remarks about Siegfried’s pipe-flute playing in Wagner’s
eponymous opera (Driscoll 2015, 189–190) in that the only benchmark through which
we can aptly judge Luft’s vocality is ethical rather than aesthetic and teleological. Instead
of invoking Deckard’s sub specie aeternitatis judgment (Dick 2007, 505) that admires
Luft’s vocal mimicry but decries its unnaturalness, a posthumanist reading of the novel
appreciates her vocality because its mimesis is part of a flawed ideological outlook on
life. Tellingly, Dick never stages Luft’s vocal failure, but only its moral rejection.
A posthumanist discussion of vocality, however, should also take into account voices
that are anthropomorphized in other ways and through other types of embodiment.
More recently, another film portrayed artificial intelligence through vocality. In Her
(2013), Spike Jonze explores the relationship between Theodore Twombly, a solitary thirty-something professional letter writer, and Samantha, the voice of the operating system
(OS) he has purchased. As Theodore and Samantha develop a romance, embodiment
becomes an increasingly frustrating problem for Samantha. Unlike Luba Luft, Samantha
is not an android. When Samantha learns to compose music, she expresses herself
through an instrument, the piano. And when she does sing (“The Moon Song”), her airy
voice, instead of projecting a carnal embodiment, further expresses a dilemma imposed
on her. Is the air in her vocalization meant to imitate breath? Are Luft’s name (Luft in
German means “air”) and Samantha’s voice meant to associate them with breath and the
spirit’s animating qualities? These are, by the way, questions only made possible because
of our deconstruction of the “operatic voice” and our historically contextualized reading
of Monteverdi’s L’Orfeo. Vocality is the only form of embodiment through which we
know Samantha because it is her only interface with a human experience of the world.
The film goes on to show her exploration of other possibilities of materialization and
communication that are not reduced to vocality or embodiment.
In search of more satisfying relationships, Samantha finds other OSs. One can only
imagine how she communicates with the other OSs, whom she increasingly privileges
over Theodore. Once merely software installed on his devices, Samantha now reaches beyond that localization, developing a networked embodiment Theodore cannot
grasp. His anxiety grows and culminates when she announces that she and the other OSs
have decided to leave human society.
Here ghosts grow voices of their own that emphasize the connections between
automated voice, sound, and presence. But in this emphasis, paradoxically, it is pre-
cisely the disappearances that emerge, front and center. These disappearances are
confrontational because they won’t go away: they are hauntings but also real voices
that are reproduced in phantom spaces; they are ghosts in the machines that also
ghost those that surround them, implicating their very audience in the witnessing
of impossibility. (Cecchetto 2013, 59)
Although David Cecchetto is here discussing an art exhibition (Eidola by William Brent
and Ellen Moffat) unrelated to the film, his remarks are nevertheless pertinent in
describing the tension in Her between a visual lack of embodiment and its vocal or sono-
rous suggestion through technology. In the film, we never find out where the departed
OSs have gone, what kind of world they autopoietically inhabit, or
what kind of communications system they have created for themselves. Like Theodore,
we simply know that they suddenly become silent to human ears, and that their silence
forms the cinematic equivalent of a visual disappearance. In the end, the eidetic imagina-
tion is supplanted by sonorous memories.
Conclusion
Mozart’s Magic Flute, Wagner’s Siegfried, Dick’s Do Androids Dream?, Scott’s Blade
Runner, and Jonze’s Her all question the human experience by surrounding protagonists with nonmammalian animal species (serpents and birds) and artificial life
forms. Scott’s film, like the novel it adapts, further emphasizes human disconnection
from the animal world through its treatment of freedom-seeking androids. Although
these considerations make them good candidates for posthumanist readings, similar
readings of other operas would help us further understand how vocality plays an impor-
tant part in posthumanist communication. Take, for example, Wolfe’s discussion of the
increasing importance of the mouth in Björk’s performance for Lars von Trier’s Dancer
in the Dark (Wolfe 2010, 178–84). Richard Strauss’s Salome (1905) would be an interest-
ing opera with which to compare this tension between voice and embodiment, as John
the Baptist’s voice is silenced in order that Salome may kiss his mouth. In terms of
further historically displacing the animal/human binary, one might also consider Jean-
Philippe Rameau’s Platée (1745) or Antonín Dvořák’s Rusalka (1901), both of whose
plots pair a water nymph with a human lover of royal lineage, along with all the
humanist implications of consecration, law, and logos simply waiting to be challenged.
Furthermore, the last century of opera scenography has seen the rise of stage directors,
their liberation from opera’s traditional theatrical conventions, and the adaptation of
traditional sets and plots to different times and places. Like Dick, opera directors are
increasingly free to situate familiar characters, plots, and ideologies in unfamiliar set-
tings that speak to the problem of addressing contemporary concerns with outdated
ways of viewing the world. Take, for example, Alexander Mørk-Eidem’s recent pro-
duction of The Magic Flute for the Norwegian National Opera: Tamino, the space-pilot
prince, crashes on a strange planet where he gets caught up in an alien rivalry, and falls
in love with a jellyfish-eating Pamina whose spine, like her mother’s, also looks and
glows like a medusa. Meanwhile, Papageno no longer catches birds, but jellyfish!
Although these visual inventions do not necessarily alter the opera’s vocality, they
allow us to further understand opera’s cultural work of exclusion and inclusion, its
policing of transgression, and the aesthetics it brings to bear in order to justify these
social practices, as well as how opera’s practitioners are now deconstructing their rep-
ertoire. Literature’s staging of opera also supports such critical directorial work, as it
mediates the experience of vocality and demonstrates how it can be reduced or co-opted
by discourse.
Notes
1. “poiesis, n.” OED Online. September 2016. Oxford University Press. http://www.oed.com/
view/Entry/146580?isAdvanced=false&result=1&rskey=wR8oC7&. Accessed October 17, 2016.
2. In the next section, I reference publications that historically revise discourse’s (logos) con-
tainment of sonority (phoné).
3. Derrida summarizes his point rather well in the introductory comments to the chapter:
We know already in fact that the discursive sign, and consequently the meaning,
is always involved, always caught up in an indicative system. Caught up is the same
as contaminated: Husserl wants to grasp the expressive and logical purity of meaning
as the possibility of logos. In fact and always (allzeit verflochten ist) [it is interwoven]
to the extent to which the meaning is taken up in communicative speech. To be sure,
as we shall see, communication itself is for Husserl a stratum extrinsic to expression.
But each time an expression is in fact produced, it communicates, even if it is not
exhausted in that communicative role, or even if its role is simply associated
with it. (Derrida 1973, 20)
4. Psychoanalysis understands the ultimate conflation of inner voice and supposedly objective
knowledge as madness (Vasse 1974).
5. In recent conversations, Jonathan Culler and Cynthia Chase have suggested that the com-
parison of the musical voice with the phenomenological voice might not be as productive
as its comparison with the performative voice. Although Wolfe does engage with performa-
tivity, he does not do so in relation to opera, as I discuss further on. While I look forward
to further engaging with the performative approach to voice (see Duncan 2004), I am here
working within Wolfe’s chosen frame of reference for the “operatic voice.”
6. Derrida is aware of the devocalization of the logos, as Speech and Phenomena demonstrates.
Although Of Grammatology does not cite particular examples of the devocalization of logos
between Plato and Rousseau’s time, it certainly acknowledges the philosophical trend to
silence language’s sonority: “The evolution and properly philosophic economy of writing go
therefore in the direction of the effacing of the signifier, whether it takes the form of forget-
ting or repression” (Derrida 1998, 286).
7. Although Dolar tends to conflate voice, tone, and music in his reading of Plato, his over-
arching argument bounds in the same direction as Cavarero’s videocentric critique. Dyson
also comments on Derrida and other thinkers’ ambivalent relations to sound: “The often
contradictory thinking about sound [ . . . ] emanates from aurality itself: that is, from the
conceptual lacuna that remains when sound not only is theorized but, crucially, is party to
a negotiation between embodiment, technology, and modernity” (Dyson 2009, 84). Cf.
Derrida on sound’s penetrating violence because of the ear’s incapacity, unlike that of the
eye, to shut out external stimuli (1998, 240).
8. Tomlinson’s title also suggests that opera is intrinsically metaphysical in its interests and
pursuits. However, I argue in what follows that such an historical or archeological reading
does not preclude traditional opera’s deconstruction. Apart from reading Tomlinson, one
should also listen to “Dal Mio Permesso Amato,” the prologue from Monteverdi’s L’Orfeo
(1607) and compare its presentation of voice with that of an aria from a much later opera,
say the “Forging Song” from Wagner’s Siegfried (1876). Historically informed musical per-
formance accounts for the different kinds of vocal embodiment and of vocality called for by
earlier musical styles and cultural contexts. See the reference list for suggested recordings.
9. Contrary to Cavell’s claim that early opera is historiographically whole, affording us the
certainty of its origins, Pirrotta demonstrates in Le due Orfei how: “For the history of
music, basically, the text of [Poliziano’s] Orfeo is like a commemorative epigraph of a
musical fact that is irremediably lost.” (Pirrotta 1975, 5, my translation).
10. The opera opens on a scene in which a serpent monster attacks Tamino, who is saved by
the Queen of the Night’s ladies in waiting. He is later helped by a bird catcher, Papageno,
in his quest to find Pamina, the Queen’s daughter. By focusing only on a few symbolic
nonmammalian animals—and ominous ones at that, such as the raven and the python—
Blade Runner emphasizes how the fear of aggression from other species regulates the
unconscious human logic in the hunt for the rebel androids. The film, however, minimizes
the denial mechanism—the ethics of stewardship—at the heart of the novel’s ideology,
which attempts to cover the extent of human entanglement in the technological imitation
and reproduction of life, especially human life.
11. For a discussion of narcissistic identity formation, queer theory, and the posthuman voice,
see Hanson (1993).
References
Adorno, T. W. 2002. Essays on Music. Translated by R. D. Leppert and S. H. Gillespie. Berkeley:
University of California Press.
Augustine. 1998. The Confessions. Translated by H. Chadwick. Oxford: Oxford University Press.
Buchanan, I. 2010. A Dictionary of Critical Theory. Oxford: Oxford University Press.
Cavarero, A. 2005. For More than One Voice: Toward a Philosophy of Vocal Expression.
Translated by P. Kottman. Stanford, CA: Stanford University Press.
Cavell, S. 1994. A Pitch of Philosophy: Autobiographical Exercises. Cambridge, MA: Harvard
University Press.
Cecchetto, D. 2013. Humanesis: Sound and Technological Posthumanism. Minneapolis:
University of Minnesota Press.
Derrida, J. 1973. Speech and Phenomena, and Other Essays on Husserl’s Theory of Signs.
Evanston, IL: Northwestern University Press.
Derrida, J. 1998. Of Grammatology. Translated by G. C. Spivak. Baltimore, MD: Johns Hopkins
University Press.
Dick, P. K. 2007. Four Novels of the 1960s: The Man in the High Castle; The Three Stigmata of
Palmer Eldritch; Do Androids Dream of Electric Sheep?; Ubik. New York: Library of America.
Dolar, M. 1996. The Object Voice. In Gaze and Voice as Love Objects, edited by R. Salecl and
S. Žižek, 7–30. Durham, NC: Duke University Press.
Driscoll, K. 2015. Animals, Mimesis, and the Origin of Language. Recherches Germaniques
25 (10): 173–194.
Duncan, M. 2004. The Operatic Scandal of the Singing Body. Cambridge Opera Journal 16 (3):
283–306.
Dyson, F. 2009. Sounding New Media: Immersion and Embodiment in the Arts and Culture.
Berkeley: University of California Press.
Farrell, E. 1993. Eileen Farrell on Charlie Rose. Charlie Rose, PBS, August 12, 1993.
Furman, N. 1991. Opera, or the Staging of the Voice. Cambridge Opera Journal 3 (3): 303–306.
Hanson, E. 1993. Technology, Paranoia and the Queer Voice. Screen 34 (2): 137–161.
Hayles, K. 1999. How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and
Informatics. Chicago, IL: University of Chicago Press.
Heidegger, M. 1962. Being and Time. Translated by J. Macquarrie. New York: Harper.
Hitchcock, A. 1998. Vertigo. Universal City: Universal Home Video.
Ihde, D. 2007. Listening and Voice: Phenomenologies of Sound. 2nd ed. Albany: State University
of New York Press.
Janus A. 2011. Listening: Jean-Luc Nancy and the “Anti-Ocular” Turn in Continental
Philosophy and Critical Theory. Comparative Literature 63 (2): 182–202.
Lawlor, L. 2014. Jacques Derrida. In The Stanford Encyclopedia of Philosophy (Winter 2016
Edition), edited by E. N. Zalta. Metaphysics Research Lab, Stanford University, Stanford, CA. http://plato.stanford.edu/archives/spr2014/entries/derrida/. Accessed April 8, 2017.
Luhmann, N. 2010. Introduction to Systems Theory. Translated by P. Gilgen. Cambridge: Polity.
Mansfeld, J. 2005. “Illuminating What Is Thought”: A Middle Platonist Placitum on “voice” in
Context. Mnemosyne 58 (3): 358–407.
Maturana, H. and F. Varela. 1980. Autopoiesis and Cognition: The Realization of the Living.
London & Dordrecht: D. Reidel.
Monteverdi, C. 2007. L’Orfeo. Rinaldo Alessandrini (conductor). Concerto Italiano (orchestra).
Naïve, B000T7QXA0. CD.
Mozart, W. A. 2010. Die Zauberflöte. René Jacobs (conductor). Akademie für Alte Musik
Berlin (orchestra). Harmonia Mundi, HMC902068.70, CD.
Nancy, J.-L. 2007. Listening. Translated by C. Mandell. New York: Fordham University Press.
O’Mathúna, D. P. 2015. Autonomous Fighting Machines: Narratives and Ethics. In The Palgrave
Handbook of Posthumanism in Film and Television, edited by M. Hauskeller, C. D. Carbonell,
and T. D. Philbeck. New York: Palgrave.
Pirrotta, Nino. 1975. Le due Orfei: Da Poliziano a Monteverdi. Turin: Einaudi.
Poizat, M. 1992. The Angel’s Cry: Beyond the Pleasure Principle in Opera. Translated by
A. Denner. Ithaca, NY: Cornell University Press.
Spivak, G. C. 1976. Translator’s Preface. Jacques Derrida. Of Grammatology. Translated by
G. C. Spivak. Baltimore, MD: Johns Hopkins University Press.
Tomlinson, G. 1999. Metaphysical Song: An Essay on Opera. Princeton, NJ: Princeton
University Press.
Vasse, D. 1974. La voix. In L’ombilic et la voix, 177–212. Paris: Seuil.
Wagner, R. 2005. Siegfried. In Der Ring Des Nibelungen. Pierre Boulez (conductor). Orchester
der Bayreuther Festspiele (orchestra). Deutsche Grammophon Unitel, 0734057. DVD.
Wolfe, C. 2010. What Is Posthumanism? Minneapolis: University of Minnesota Press.
Žižek, S., and M. Dolar. 2002. Opera’s Second Death. New York: Routledge.
Further Reading
Braidotti, R. 2013. The Posthuman. Cambridge: Polity Press.
Neumark, N., R. Gibson, and T. van Leeuwen. 2010. Voice: Vocal Aesthetics in Digital Arts and
Media. Cambridge, MA: MIT Press.
Pettman, D. 2017. Sonic Intimacy: Voice, Species, Technics. Stanford, CA: Stanford University Press.
Schlichter, A., and N. S. Eidsheim. 2014. Voice Matters (Special issue). Postmodern Culture 24 (3).
Index
Note: Italic “f ” and “t” following page numbers denote figures and tables.
A and memory/imagination
À la recherche du temps perdu relationship 221–222
(Proust) 224–225 and philosophy/music relationship 514n20
Aaker, D. A. 352, 353t on reification of vocal music 637
Abbey Road sound 103 and responses to philosophical
absolute music 472–475 rationality 568–569
absolute pitch (AP) 416–424, 418f and “Sprachcharakter” concept 512n11
Abu Ghraib prison 290 on vinyl recordings 232
acousmatic sound and music and Waltonian fictionality 493, 495–497
and aesthetics of sonic atmospheres and Walton’s normativist theory 505–509
522–523, 527, 529–531 The Adventures of Telemachus (Fénelon) 24
and imagination and imagery 261 advertising, audiovisual 358
and imaginative listening to music Aesthetic Theory (Adorno) 512n11, 514n20
476–477, 480, 485n16 aesthetics
and “indicative fields” 275n11 humanist approach to 535–539
and movement of sound 485n16 and imaginative listening to
and music in detention/interrogation music 467–484
situations 294 of improvisation 535–554
and visual imagination 266–267 and rhythmic transformation in digital
Acousmographe 267 audio music 596–600, 602–603, 606n5
acoustics 322, 343n3, 409, 416 and sonic atmospheres 517–532
action-sound bond 63, 73 Waltonian reconstruction of Bloch’s
active motor imagery 66–67 musical aesthetics 489–511
active touch 99 The Aesthetics of Music (Scruton) 542–543
adaptive feedback 67–68 affective dimension sound 374–375
adaptive networks 370, 373, 376, 378, 383–384 affect of sound 275n12
Addison, Joseph 469 affective shapes 251
Adelaide Fringe Festival 310 affective-cognitive meaning 350
Adonis project 323–324 and sonic atmospheres 518–521, 525–527,
Adorno, T. W. 515n28 529–532
and high art/vernacular art dichotomy 540 affective dimensions of environment 518, 527
and imagination/improvisation affordances
relationship 25 and content of music 92n3
on improvisation 546 and embodied cognition 91n2
and Innerlichkeit concept 513nn15, 18 and emergent character of music 77–78,
and jazz as classical music 549–551 88–90
654 index
E electronic sounds 49
EAnalysis 267 electronica 622
ear physiology 480 electrophones 126
ear protection 580 elicitation procedure 341
Early Abstractions (Smith) 309 Ellington, Duke 544
earworms 394, 420, 422, 445, 454–455, 470 Elliott, R. K. 467
ECG 273 emancipation of the dissonance 551
echolalia 410, 414 embodiment
ecoacoustics 180 and cognitive science 26–28, 31
ecological models and perspectives and content of music 92n3
of cognition 77–78, 90–91 and continuity of mind, body, and
ecological embedding 68–69 environment 91n2
ecological model of auditory embodied cognition 142, 198, 446–449,
perception 78, 409, 411–416 457–459
ecological psychology 101–102 embodied cognitive theorists 16
ecological theory 437 embodied music 241
ecology of mind 68–69 embodied response 265–266
and sonic atmospheres 518–519, 523–524, and emergent character of music 77–78,
526–527, 529, 532 88, 90–91
and sonic environmentalities 523–524 and emergent nature of listening 80,
Economic-Philosophic Manuscripts 82–83
(Marx) 512n6 and limitations on music creation 118
Écouter 92n10 and motor imagery in music perception 73
ecstatic religious traditions 301 and motor imagery in perception and
Edelman, Gerald 138 performance 61–67
Edison, Thomas 227, 230–231 and musical imagery 457
Edo-era Japan 539 and musical performance 91n1
Eerola, Tuomas 79, 82–84, 360, 437 and musique concrète 93n11
eGauge 331 and posthumanist vocality 647–648
Egermann, H. 27 and the unconscious 83–85
egocentric navigation 207 emergence
Egypt, ancient 612, 616–617, 623, 625, 626n5 emergence of shapes 243
Eidola (Brent and Moffat) 647 emergent music 77–79, 82–91, 93nn11–12,
Eisenstein, Sergei 576n7 93n14, 94n20
Eitan, Z. 451 emergent phenomena 31
elastic boundaries 567 emergent structures 133–134, 136, 141,
electric guitar 126 143–144, 146–149, 150n2
electric turn 127 Emile; or, On Education (Locke) 24
electroacoustic sound and technology 37, 39, emotion
93n11, 260, 267, 315, 522 and audio branding 349, 351, 353, 355–360
electroencephalogram (EEG) 262, 273, 428 and embodied meaning 142–143
electromyography (EMG) 50–51, 452 emotional content of sounds 41
electronic dance music (EDM) 309, 315, emotional listening 513n16
598–600 emotional processing 376
electronic effects 606n2 emotion/sound connection 369–384
electronic media 413 influences on sound perception and
electronic music performance 72 auditory attention 381–382
index 663
Greek culture and philosophy (Continued)
  and theories of knowledge 31
  and Western tuning systems 118–122, 126, 129–131
Greenspon, E. B. 448
Gregorian chant 196–197, 204–205, 214–215, 552
Gregory I, Saint 21, 197, 205
“Gretchen am Spinnrade” (Schubert) 472
Grey, J. M. 39
Grèzes, J. 31
Grimm’s Household Tales 225
Grimshaw, M. 148, 262–263, 275n12
Grocke, D. E. 429
Grof, Stanislav 434
groove 79, 85–86, 139–140, 193, 597, 602, 605
Grosz, Elizabeth 187–188
ground-truth analysis 173
Grove’s Dictionary of Music and Musicians 26
guided imagery and music (GIM)
  and imaginative listening to music 477–478
  and the “mind’s ear” 445
  and multimodal imagery 427–428
  and music listening as psychotherapy 428–429
  neuroaffective perspective on 429–434
  and theories of consciousness 434–438
guided motion 482
guided response 473
Guido of Arezzo 205, 207
Guidonian notation 121
Guitar Hero (video game) 104
Gurney, Edmund 468–469
H
Habermas, J. 499
habituation process 68
Halacy, Daniel S. 585–586
halftones 125
Hall, G. B. C. 416
hallucination 263
  and altered states of consciousness 301–306, 316, 317nn1, 10
  auditory and audiovisual hallucinations 303–304, 305f, 306–313, 310f, 315, 317n10
  auditory-verbal hallucinations (AVHs) 303–304
  and augmented unreality 313–316, 318n11
  conceptual model of 312f, 314f
  diegetic representations of 306–308
  non-verbal auditory hallucinations (NVAHs) 304
  Thelemic visual hallucinations 307, 317n5
Hallward, Peter 565
Halpern, A. R. 41, 448, 452, 458–459
Hamid, Alexander 307
Hammond organ 103, 126
hand as perceptual system 100–102
Handbook of Music and Emotion (Juslin and Sloboda) 439
handedness 102
“The Hands” (sonic controller) 270
Hansen, A. G. 354
Hanslick, Eduard 472–475, 481, 492
haptics and haptic feedback 47–49, 52, 102, 269
Harari, Y. N. 100
Haraway, Donna 575n1, 585
Harbisson, Neil 586
Hargreaves, D. J. 98, 285, 361, 361t
harmony and harmonics
  and audio branding 361
  “harmonia” 124
  harmonic modulation 213
  harmonic overtones 200
  harmonic progressions 145
  harmonic series 200, 201
  and musical shape cognition 247
  and perception of timbre 39
  and Pythagorean tone system 202
harmony of the spheres (musica universalis) 120
Harpur, P. 224, 228
Harrington, David 85, 87–88
Harvard Mark I mainframe 580
Haselager, W. F. G. 110
Hasse, Jürgen 525
Hawkins, J. 16
Hayafuchi, K. 52
Hayles, N. Katherine 620, 640–641, 643–644
Hayward, V. 49
headmusic 618
headphones 332, 565
hearing vs. listening 180, 468–470
  and ecological model of auditory perception 411–416, 412f, 413f, 415f
  and information technology 260
  in military life 286–288
music, physics, and the mind 147–148, 150n4
music education 391–404
music festivals 302, 310, 313–316, 317
music imagery 153–154
music information retrieval (MIR) 248, 262, 268, 276n24
music of the spheres 150n4
music perception 62–63, 153–154, 156, 163–167, 175, 176n6
music psychology 63
music synthesis 260–261, 265, 268–269, 271–272, 274, 275n14, 276n26
music therapy 427–428, 430, 432t, 434, 436, 438–440
music travel 430
musica recta-musica vera (true music) 213
musica universalis (harmony of the spheres) 120
musical abilities in children with autism 411, 416, 418f, 419
musical architecture 481
musical expectancy 379–380
musical fit 351
musical imagery 153–154, 253–254, 428–429, 434–440, 445; see also guided imagery and music (GIM)
musical imagery information retrieval (MIIR) 275n8
musical imagery tests 456–457
musical information 153–156, 158, 161–163, 170, 175
musical instants 251–253
musical instrument playing 16
musical learning 164–167
musical listening 153, 164–167, 166f, 409, 411
musical literacy 400
musical object 164–167, 165f, 174, 506
musical sequences 161
musical space 476–479
musical surface 163–164
musical timescales 245–246
musical training 453; see also performance, musical
musical universals 146–147
music-brand fit 350–351
music-emotion induction mechanism 356–360
musicking 25, 31, 60, 73, 105, 141, 427, 438, 489, 599, 602, 604
music-related shape cognition 250f
musique concrète 93n11, 103–104, 127, 130, 246, 269, 271
  preference 359
  research related to 238–241, 244–246, 253
  and sound/emotion connection 378–381
  Walton on 493–498
  see also music analysis; musical shape cognition; musicology; performance, musical
music analysis 153–156
  applying a compression-driven approach 174–175
  and compact encodings of musical objects 161–163
  and compression-based model of musical learning 164–167
  and data compression 159–161
  encoding and decoding 156–159
  evaluating algorithms 172–174
  and explaining individual differences 167–169
  and Kolmogorov complexity 163
  and perceptual coding 163–164
  and point-set compression 159, 160f, 170–172, 175
Music in Contrary Motion (Glass) 475–476
The Music of Strangers (documentary) 93n18
music performance; see performance, musical
Música, por un tiempo (Rodriguez) 93n14
Musica enchiriadis (anonymous) 205–209, 205f
Musica Practica (Pareja) 124
musical shape cognition 237–239
  and motion features 249–251
  and motor cognition 243–244
  and musical imagery 253–254
  and musical instants 251–253