The Oxford Handbook of
SOUND AND
IMAGINATION,
VOLUME 2
Edited by
MARK GRIMSHAW-AAGAARD,
MADS WALTHER-HANSEN,
and
MARTIN KNAKKERGAARD
Oxford University Press is a department of the University of Oxford. It furthers
the University’s objective of excellence in research, scholarship, and education
by publishing worldwide. Oxford is a registered trade mark of Oxford University
Press in the UK and certain other countries.
Printed by Sheridan Books, Inc., United States of America
Contents
Acknowledgments ix
Contributors xi
The Companion Website xiii
Introduction: Volume 2 1
Mark Grimshaw-Aagaard, Mads Walther-Hansen,
and Martin Knakkergaard
PART I MUSICAL PERFORMANCE
1. Improvisation: An Ideal Display of Embodied Imagination 15
Justin Christensen
PART II SYSTEMS AND TECHNOLOGIES
6. Systemic Abstractions: The Imaginary Regime 117
Martin Knakkergaard
PART III PSYCHOLOGY
14. Music in Detention and Interrogation: The Musical
Ecology of Fear 281
W. Luke Windsor
PART IV AESTHETICS
23. Imaginative Listening to Music 467
Theodore Gracyk
PART V POSTHUMANISM
27. Sonic Materialism: Hearing the Arche-Sonic 559
Salomé Voegelin
Index 653
Acknowledgments
This handbook has been a four-year labor of, if not unconditional love, then surely a love
tempered by blood, toil, tears, and sweat. Completing the task from proposal to comple-
tion, of compiling, editing, and publishing a two-volume work of over 650,000 words,
requires dedication, attention to detail, and, at times, sheer bloody-mindedness. Here,
thanks are due to the many people without whom the book you now hold in your hand
would not have seen the light of day. Our first thanks go to our commissioning editor
Norm Hirschy and music editor Lauralee Yeary, both of Oxford University Press, who
not only had the vision to see beyond the shortcomings of our initial proposal but also
firmly and patiently guided us through the many twists and turns of putting together the
final manuscript. Additionally, we are grateful to their many nameless colleagues at the
press who have tirelessly labored over copyediting, proofing, design, indexing, and a
host of other unknown tasks that take place behind the scenes. Thanks are also due a
number of anonymous reviewers, from proposal through to draft manuscript, who were
overwhelmingly supportive of what they read while also presenting us with many sug-
gestions for expansion and improvement. Alistair Payne, Professor of Fine Art Practice
and Head of the School of Fine Art at the Glasgow School of Art, has our gratitude for
allowing us to use his magnificent diptych The Fall as the cover art. Finally, although it is
our names on the front of the handbook—and thus our responsibility for any errors that
remain—none of what you are reading would have been possible without the contribu-
tions of our authors who have neither ceased in their enthusiasm for the project nor
flagged in the face of countless e-mails from us. Our heartfelt thanks go to them; we
hope you enjoy their efforts.
Contributors
The Companion Website
www.oup.com/us/ohsi2
Oxford has created a website of images to accompany The Oxford Handbook of Sound
and Imagination, Volume 2. Readers are encouraged to consult this resource while reading
the volume as many images on the website are in color.
Introduction
Volume 2
Mark Grimshaw-Aagaard,
Mads Walther-Hansen, and
Martin Knakkergaard
A working assumption might be that imagination has its genesis in past experience,
whether that genesis is social, cultural, or individual, and this influences the interpretation
of context and directs the thinking and ideas that arise from it. This is the theme that
fundamentally constitutes the substance of this handbook: the role and effect of imagi-
nation in the development and use of sonic processes and artifacts. Whether the act of
imagination is a previously unheard sound in a science fiction movie or a new composi-
tional style, such a process always derives from, and may be discussed and made sense of
in relation to, something pre-existing; the mundane recordings of wildlife that form the
basis for the alien’s screech, for example, or a distinctive difference from other composi-
tional styles. Yet, one should not make the mistake of assuming sonic imagination is
purely to do with the creation of new artifacts; one can rehearse mentally a piece of music
or recall and imagine a previously heard sound for the silent action seen on screen.
Equally, imaginative sound processes and artifacts themselves provoke other instances
and forms of imagination often far removed from the field of sound. It is this broad
reach that the handbook endeavors to cover.
The Chapters
The handbook comprises seventy chapters (excluding this Introduction) shared across
ten parts and two volumes that broadly arc across philosophical concerns to more prac-
tical matters before returning to philosophical issues again. However, the reader should
not expect a particular part to be purely philosophical and untainted by practice or for
those parts ostensibly dealing with practice to be unsullied by philosophy. As a multidis-
ciplinary handbook, we have endeavored to maintain that ethos across all parts, meaning
that the reader, moving sequentially through the book, will, for instance, find a chapter
on the relationship of imagination to presence in the context of multimodal surfaces
juxtaposed to one dealing with the science of auditory imagery or a chapter on synes-
thetic art and hallucination abutting another detailing the process of controlling or even
excluding the listener’s imagination from auditory imagery. This is quite deliberate and
is a demonstration that particular topics within the broad theme of sound and imagi-
nation are as common to a variety of disciplines as those disciplines’ writing styles are
diverse. Yet there is a more devious method at work here: in a world where universities,
politicians, and research funding bodies all implicitly or explicitly work toward the
prioritization of certain forms and areas of research, we would rather present a handbook
structure that ignores the barriers that arise in response to such short-term, limited,
and, yes, unimaginative thinking in order to show that the conditions for new thoughts
and ideas and for the synthesis of new knowledge are best nurtured and sustained in the
absence of academic siloes. So, our advice to the reader of this handbook is to indeed
read sequentially, and, in this, we trust that inspiration will be found.
Volume 2 of the handbook comprises five parts: “Musical Performance,” “Systems and
Technologies,” “Psychology,” “Aesthetics,” and “Posthumanism.” The first part takes us into
the sphere of musical performance and imagination. The chapters here cover musicking
and meaning-making in improvisation, the construction and use of sonic imagery in per-
formance, the role of motor imagery when playing a musical instrument, the emergence of
a hidden music facilitated through embodiment and environmental affordance, and the
connection between gesture and sound, particularly the imagined sound of the air guitar.
Part 2 of the volume has as its framework sound and imagination in the context of
systems and technologies. These are systems and technologies that underpin not only
the production of sound and music but also the analysis and description of sound and
music, stressing the role of imagination in what is consequently conceived of or inter-
preted. There are chapters on the profound influence of Ancient Greek imagination on
Western tuning systems, the centrality of repetition not only to the emergence of life but
also to the experience of music, the compression of musical information as a means to
analytical musical knowledge, how the technology and interpretation of bioacoustics
impose a potentially incorrect imagining of the existences of other animal species and
our relationships to them, musical notation as an externalization of music, the reliance
of sound recording on the imagination, shape cognition and the experience of music,
and, finally, a speculative essay on a tool to extract sound imagery for musical purposes.
Part 3 concentrates on the psychology of sound and imagination. The first chapter
covers psychological warfare and interrogation practices under the influence of music,
tying this to the ethics of marketing, while the next chapter takes as its topic audiovisual
media, such as VJ events and gaming, to show how sounds can be used to evoke halluci-
natory experiences. The next four chapters deal with the areas of the control of auditory
imagination in the context of listening tests for the design of consumer audio products,
the use of music in the creation of brand identity, the role of emotions in sound perception,
and the part that musical imagery plays in music education and performance rehearsal.
The final three chapters in this part cover the relevance of autism research to the development of musical ability, the role of musical imagery in music therapy, and musical imagery within an embodied framework.
The penultimate part of the volume comprises four essays dealing with aesthetics and
sonic imagination. Topics range over hearing-in—a form of imaginative engagement
with music—the viewpoint that music can be seen as a utopian allegory, the affective aes-
thetic potential of sonic environments, and the aesthetics of imperfection in the context
of musical improvisation.
Part 5 positions the theme of sound and imagination within posthumanism. The part
begins with a chapter on sonic materialism that brings sound into the posthumanist
Musical Performance
Justin Christensen deals with the bond between improvisation and imagination in
artistic experience. Starting with a reassessment in continental philosophy both of how
imagination is conceived and can be demonstrated, Christensen observes that the con-
nection between improvisation and imagination has little value in classic aesthetic theories.
He then goes on to argue for the value of improvisation as a reflection of perception–
action coupling that is central to newer theories that favor embodied approaches to music
cognition. In the light of such theories, where perception, action, and imagination are
seen as interdependent properties, Christensen proposes a greater recognition of the
processes of musicking—including improvisation—to better understand meaning-
making and the role of imagination in musical experience.
Clemens Wöllner’s chapter deals with sonic actions in music performance. He argues
that musicians construct sonic images in the act of playing that allow them to anticipate
sonic actions and to perform without auditory feedback (for instance, when sound is
switched off during a performance). The construction of sonic images is discussed in
the context of performances on both traditional and controller-driven instruments, and
Wöllner shows how a performer’s anticipated sonic actions differ according to the type
of instrument. In relation to this, the level of detail of the imagined sound qualities
involved in auditory imagery is explored, and Wöllner considers the mappings between
gesture and sound that are required for audiences to imagine the sound emerging from
the performer’s actions.
In his chapter, Jan Schacher asks the question of what it is to imagine and initiate an
action on a musical instrument. For Schacher, the body is the central element of listen-
ing and sound perception; thus, the body, in an embodied and enactive sense, becomes
the focus for his explication of musicking on both conventional instruments and digital
instruments where, in the latter case, bodily schemata are replaced by metaphors and
instrumental representations. This last provides a significant topic of enquiry in the
chapter and the theme is explored from a number of angles, chief among which is a focus
(through the lenses of motor imagery and imagination in music) on relations between
inner and outer aspects of our ways and means of listening to and performing music and
sound. Ultimately, Schacher identifies a tension underlying digital musical performance
brought about by the fracturing of the action–sound bond that is the basis not only for
our sound perception of the natural world but also for the world of culturally ingrained
musical performance.
John Carvalho’s chapter is about the music that emerges in a skilled engagement with
an environment of sound. This music emerges in a piece of music when the embodied
skills of a composer, performer, or listener enact affordances that turn up in the environ-
ment defined by that piece of music. For Carvalho, the imagination animates these skills
and directs their embodied testing of the environment for affordances to be enacted
in making music emerge in it. To support this argument, Carvalho turns to an ecology of
mind and a taxonomy of listening that account for how composers, performers, and
listeners enact the music in a piece of music.
Marc Duby bases his exploration of sound and imagination on James J. Gibson’s affor-
dance concept. Using this concept, Duby studies how musicians benefit from real and
imagined actions in their interaction with real (e.g., pianos), virtual (e.g., MIDI con-
trollers), and air instruments (e.g., air guitars [nonexistent instruments]). In each case,
Duby explores the connection between gesture and sound and how the instruments
afford creativity. This leads to discussions of the range of imaginary possibilities that the
instruments afford musicians in the act of performing, composing, and listening, and of
how the special case of the air guitar challenges existing theories of embodied cognition.
In a chapter that explores the use within bioacoustics of technology and interpretation
of its data in order to assess human acoustic impact on nonhuman species, Mickey Vallee
introduces the term “transacoustic community” to illustrate the nefarious and transgressive uses to which those data are put. Vallee makes the charge that the bioacoustics
community hears without listening, having a different imagination of sound to other
sound-based researchers. This imagination springs not only from the specific aims of
the bioacoustics community but also from the audio technology used that ultimately
relies on visualization for its data access; thus, the requirement of a mastery of visual
interpretation, rather than a refined aurality, affects our understanding of the relationship
between humans and other species.
Henrik Sinding-Larsen presents an analysis of how new tools for the visual description
of sound revolutionized the way music was conceived, performed, and disseminated.
The Ancient Greeks had previously described pitches and intervals in mathematically
precise ways. However, their complex system had few consequences until it was com-
bined with the practical minds of Roman Catholic choirmasters around 1000 ce. From then on, melodies were depicted as note-heads on lines with precise pitch meanings and with
note names based on octaves. This graphical and conceptual externalization of patterns
in sound paved the way for a polyphonic complexity unimaginable in a purely oral/aural
tradition. However, this higher complexity also entailed strictly standardized/homoge-
nized scales and less room for improvisation in much of notation-based music. Through
the concept of externalization, lessons from the history of musical notation are general-
ized to other tools of description, and Sinding-Larsen ends with a reflection on what
future practices might become imaginable and unimaginable as a result of computer
programming.
Bennett Hogg queries the relations that sound recording has commonly been thought
to have to memory, in particular mechanistic approaches to both memory and record-
ing that see them as processes that fix things through time. Making sense of memories
as they are “laid down,” and as they are “recalled,” involves imagining novel connections between memorized materials and networks of sensory, social, and cultural
experience. Imagination, through time, subtly reworks memories, modulating their
affect, re-evaluating the significance of particular memories, mythologizing them, even.
To understand listening to recordings according to a rather reductive model of memory
risks misrepresenting the richness of the cognitive ecosystem in which listening occurs.
In looking for a new metaphor to inhabit this ecosystem of memory, imagination, and
persistence through time, Hogg proposes metempsychosis, the transmigration of souls,
as a more suggestive model.
Rolf Inge Godøy’s chapter is focused on how notions of shape, understood as geometric
figures and images stemming from body-motion, metaphors, graphic representation,
and so on, can be associated with the production and perception of music. Central to the
chapter is the understanding that shape cognition is not only deeply rooted in the human
experience of music and in musical imagery as such, but also has the potential to enhance
our understanding of music as a phenomenon in general. The chapter also discusses
how musical shape cognition, given that it is becoming increasingly feasible with
new technology, can contribute to various domains of music-related research and,
furthermore, can be highly valuable to practical applications in musical and multimedia
artistic creation.
The conceiving of an evocative synthesis engine from our imagining of sound is the
substance of Simon Emmerson’s chapter. In it, Emmerson surveys recent neurological
experiments in the synthesis of speech and music and focuses his attention on how
our imagining of sound might be synthesized at some future date. The purpose of this
speculative chapter is not to map out the design and interface of such a system but rather
to conceive of what the act of imagining sound is and how the tool to extract such sound
imagery might be used both for musical purposes and to externalize these formerly
private sounds.
Psychology
Luke Windsor focuses on enforced listening to music in detention camps and explores
the use of music in detention and interrogation while pointing to the creation of ambi-
guity and uncertainty as a central effect. Windsor engages with several cases of psycho-
logical warfare during previous wars and references interrogation practices described
by the CIA. The exploration of these cases, where music is used as a sound weapon, leads to
a broader discussion of the application of music to influence behavior and the ethics of
music application in, for instance, marketing.
The representation of hallucinations within audiovisual media forms the subject of
Jonathan Weinel’s chapter. Weinel builds his discussion around the concept of augmented
unreality, and he provides examples from films, VJ performances, digital games, and
other audiovisual media to show how sounds are used to form hallucinations. Ultimately,
Weinel points to a set of structural norms that defines psychedelic hallucinations and
the hypothesis that, with the improvement of digital technologies, the boundaries
between external reality and synthetic unreality may gradually dissolve.
Søren Bech and Jon Francombe’s chapter provides an illustration of how sensory
analysis is undertaken in the audio industry. It demonstrates how the industry attempts
to quantify the listener’s imagination—which is taken to include a range of modifiers of the
listener’s auditory experience, such as mood, expectation, and previous experience—
in order to ensure that the end result, the listener’s auditory experience or impression
after the audio transmission chain, matches the intended experience as closely as possible.
The example provided to illustrate this is of sensory analysis used for qualitative and quan-
titative evaluation of the listening experience in a personal sound zone. A perceptual
model was developed to reliably predict the listener’s sense of distraction (due to inter-
fering audio) from the experience of listening to audio intended for a particular zone.
Hauke Egermann explores the influence of music on how consumers imagine character-
istics of a brand. The chapter deals with several psychological mechanisms to outline
the associative and emotional potential of music and illustrates how music aids in estab-
lishing brand recognition and recall in consumers. Egermann elaborates on how music
can create brand attention and positive-affective responses in consumers and can affect
the cognitive meaning of a brand image. In summation, he argues for a brand-music com-
munication model that describes three different functions of music in the creation of
brand identity—brand salience, cognitive meaning, and emotional meaning.
The focus of Erkin Asutay and Daniel Västfjäll’s chapter is the relationship between
sound and emotion. Evidence from behavioral and neuroimaging studies is presented
that documents how sound can evoke emotions and how emotional processes affect
sound perception. Asutay and Västfjäll view the auditory system as an adaptive network
that governs how auditory stimuli influence emotional reactions and how the affective
significance of sound influences auditory attention. This leads to the conclusion that
affective experience is integral to auditory perception.
Andrea Halpern and Katie Overy argue that auditory imagery can be used actively as
a tool in various education and rehearsal sessions. Building on Nelly Ben-Or’s tech-
niques of mental representation for the concert pianist and the pedagogical approaches
of Zoltán Kodály and Edward Gordon, Halpern and Overy suggest that conscious and
deliberate use of auditory imagery should be exploited more in music education and
could be used with profound benefits for musicians as a rehearsal strategy. This leads to
a call for further empirical investigations of how voluntary auditory imagery might be
best used as a training method for both professional musicians and in classroom settings.
Adam Ockelford draws attention to that section of the population that does not nor-
mally engage in everyday listening; those for whom the acoustic properties of sound are
prioritized over the function of sound. In particular, Ockelford points to the listening of
autistic children for whom the perceptual qualities of sound exert an especial fascination
at the expense of the meaning that everyday listening would normally give to sound.
Through research that supports his contention that the development of musical abilities
in children precedes that of language skills, Ockelford makes the claim that the aural
imagination of those on the autistic spectrum is one that processes all sound, even speech,
for its musical structural properties, and thus it is music that is the autistic person’s
gateway to communication and empathy.
Lars Ole Bonde considers musical imagery in the context of music therapy sessions
from the tradition of the Bonny Method of guided imagery and music, which provides
well-documented examples of such imagery. While Bonde mainly focuses on listening
in clinical settings, he argues that image listening should be seen as a health resource in
everyday listening settings. Taking in perspectives from neuroaffective theory, Bonde
analyzes clinical material and evidence from the analysis of EEG data, and he shows how
music therapy theory—as a specific tradition within musicology—can contribute to
research on music listening through a greater understanding of the multimodal imagery
of such listening.
Musical imagery as a multimodal experience is also the topic of Freya Bailes’s chapter,
where embodied cognition is used as a framework to argue this. Existing empirical studies
of musical imagery are reviewed, and Bailes points to future directions for the study of
musical imagery as an embodied phenomenon. Arguing that musical imagery can never
be fully disembodied, Bailes moves beyond the idea of auditory imagery as merely a
simulation of auditory experience by “the mind’s ear.” Instead, she outlines how the imagining of sounds and music is always connected to sensory-motor processing.
Aesthetics
Theodore Gracyk takes issue with the claim that imaginative engagement is a prerequisite for the appreciation of music; that the experience of expressiveness in music
derives from an imaginative enrichment that allows music to be heard as a sequence of
motion and gestures in sound or that the expressive interpretation of music is guided by
imaginative description. While not completely rejecting an imaginative response to
music, Gracyk instead opts for a form of imaginative engagement with music described
as hearing-in. While not all music demands such engagement, hearing-in is not a trigger
for imaginative imagery but rather a musical prop that invites the listener to attend to
music’s animation in, for example, the form of musical causality and anticipation.
Bryan Parkhurst uses contemporary analytic “normativist” aesthetics as a lens through
which to view Leftist/Marxian “normative” aesthetics of music appreciation. In order to
do this, Parkhurst situates the key theses of Ernst Bloch’s theory of utopian musical lis-
tening within the framework of Kendall Walton’s theories of musical fictionality and
emotionality. The aim of this task is to make Bloch’s fundamental position perspicuous
enough that it can be assessed and evaluated. Parkhurst concludes that Bloch’s con-
tention that music should be viewed as a utopian allegory, and that the distinguished
office of (Western classical) music is to contribute to the political project of the imagining
of a better world (a “regnum humanum”), faces difficult objections.
An exploration of the affective dimension of our sonic environment forms the topic
of Ulrik Schmidt’s chapter. Schmidt poses the question, What does it mean to be affected
by the sonic environment as environment? The answer to this question involves a con-
ceptual distinction between atmosphere, ambience, and ecology. Schmidt argues that
affect and imagination are key components in the environmental production of presence,
and he provides examples of the aesthetic potentials of environments and explores how
an environment can “perform” in different ways to affect us as environment.
In his chapter on musical improvisation, Andy Hamilton deals with the cultural aspects
and historical practices of improvisation. The chapter sets out to explore the artistic status
of improvised music and this involves a discussion of the connection between imagi-
nation and art, and the contrast between composition and improvisation. These discus-
sions provide a theoretical framework to outline and defend an aesthetics of imperfection
as a contrast to an aesthetics of perfection. Finally, the artistic value of jazz as an improvised
art form is discussed, and Hamilton ponders whether jazz should be described as
classical or art music.
Posthumanism
Salomé Voegelin’s chapter contributes to current ideas on materiality, reality, objectivity,
and subjectivity as they are articulated in recent texts on New Materialism. Her chapter
makes use of the writing of Quentin Meillassoux and his posthumanist theorizing,
and it aims to contribute to the discussion through a focus on sound, as sound is seen to
support the reimagination of material relations and processes. In order to qualify and
substantiate her notion of sonic materialism, Voegelin includes narrow listenings to
three sound art works, focusing “on the inexhaustible nature of sound that exists perma-
nently in an expanded and formless now that I inhabit in a present that continues before
and after me.”
Daniël Ploeger investigates the designed sounds of operating systems (particularly
those of Apple and Microsoft computers and devices) from a cultural critical per-
spective, arguing that such sounds are cybernetic prostheses enhancing our capabilities.
In a chapter that ranges from initial conceptions of the cyborg, overshadowed by the cyborg’s roots in the military-industrial complex, to the subversion and use of operating system
sounds for creative purposes, Ploeger discusses the use and subsequent development of
such sounds—from early mainframe computers’ inherent noises to the designed sounds
of today’s computing devices—and shows how they underpin the imagining of computers
as extensions of the human body. Ultimately, for Ploeger, the recent design of operating
system sounds serves to propagate pre-existing ideological concepts of the cyborg as
evinced by our now technologically prosthetisized bodies.
Anne Danielsen’s chapter focuses mainly on the particular rhythmic feels that have
characterized many popular music styles since the 1980s and how these are produced
through the manipulation of sound samples and the timing of rhythm tracks. Initially,
Danielsen evaluates the formation of these rhythmic feels from two perspectives, an internal
and an external, but then goes on to discuss how they constitute a challenge to previous
popular music forms while, at the same time, offering new opportunities for human imagi-
nation and musical creativity. The chapter uncovers transformations across several
styles and discusses whether the technology at hand can be seen as an extension
of the human, creating musics and causing gestural movements that go beyond human-
kind’s “natural” repertoire.
A musical imagining of the future and an exposition of a challenge to the normative
historical discourse are the subjects of Erik Steinskog’s chapter on Afrofuturism. These
topics are dealt with through a discussion of “blackness” and the theoretical discourse
that addresses the musical style and polemical and political stance of afrofuturist musi-
cians such as Sun Ra and others following in his path. Steinskog suggests that afrofutur-
ist music is a form of sonic time travel that intertwines the modalities of time represented
by notions of past, present, and future, his argument being that reimaginations, reinterpre-
tations, and revisions of a normative past are represented in the technology and music of
the black future.
PART I
MUSICAL PERFORMANCE
Chapter 1
Improvisation
An Ideal Display of Embodied Imagination
Justin Christensen
Introduction
Imagination has often been considered to play a major role in perception, in the
production and appreciation of aesthetic objects, in simulation, and in fanciful creative
thought. Phenomenologists such as Husserl have claimed that imagination should not
be thought of in terms of images or description, but rather as a means of structuring
consciousness and giving meaning to phenomena. For Merleau-Ponty, imagination
brings about a perception that “arouses the expectation of more than it contains, and
this elementary perception is therefore already charged with a meaning” (Merleau-Ponty
2002, 4, italics in original). More recently, Varela and colleagues have stated, “cognition
is not the representation of a pregiven world by a pregiven mind but is rather the enact-
ment of a world and a mind on the basis of a history of the variety of actions that a being
in the world performs” (Varela et al. 1992, 9). Simply said, cognition is not independent
of the world, but instead functions as a means to guide action and perception. For
phenomenologists and for many empiricists, imagination acts to bind cognition together
with action and perception, and can also be seen to govern our perceptions, shaping and
filtering them into meaningful experiences. For instance, Merleau-Ponty has supported
linking together imagination, action and perception, stating,
our waking relations with objects and others especially have an oneiric character as
a matter of principle: others are present to us in the way that dreams are, the way
myths are, and this is enough to question the cleavage between the real and the
imaginary. (1970, 48)
As strange as it sounds, when your own behavior is involved, your predictions not
only precede sensation, they determine sensation. Thinking of going to the next
pattern in a sequence causes a cascading prediction of what you should experience
next. As the cascading prediction unfolds, it generates the motor commands neces-
sary to fulfill the prediction. Thinking, predicting, and doing are all part of the same
unfolding of sequences moving down the cortical hierarchy. (2004, 158)
that when individuals perceive the actions and the emotions produced by others, they
use the same neural mechanisms as when they produce the actions and the emotions
themselves [even though] there is no complete overlap between self- and others
representations. This would lead to confusion and chaotic social interaction. (12)
Beyond this support for a common-coding of perception and action, there is evidence
of perception–action coupling in infant development (Johnson and Johnson 2000),
childhood development (Getchell 2007), and in sports activities (Ranganathan and
Carlton 2007). Attempting to move beyond common-coding, Maes and colleagues
support a more radical embodied approach to explain action-perception coupling in
music listening and performance. They argue, “sensory-motor association learning can
be considered a central mechanism underlying the development of internal models”
(Maes et al. 2014, 2). Similarly, they point out, “the ability to predict the auditory conse-
quences of one’s actions, which is one of the core mechanisms of action-based effects on
perception, depends on previous acquired sensory-motor associations” (2). Alongside
this, they “define the concepts of temporal contiguity and probabilistic contingency as
two [of the] main principles underlying associative learning processes” (2). Furthermore,
they consider that “musical instrument playing [is] a special but highly illustrative
case of sensory-motor association learning” (2). Subsequently, Maes and colleagues
follow a dynamic systems approach to examining embodied music cognition in order
to incorporate social interaction, introspection, and expressivity alongside sensory-
motor coupling in their meta-analytic study of embodied music cognition.
Similar to these views, I will argue that this ability to imagine and simulate the actions
and emotions of others plays a major role in our reception of both written music and
improvisational practice (see Wöllner, this volume, chapter 2, on the anticipation of
sound in performance). I fully acknowledge that this is only part of the picture, as
language, with its role of describing and categorizing, also plays a large role in filtering
and shaping our imagination. To attempt a more complete picture, I propose that imagi-
nation is made up of a dynamic collaboration between nonpropositional (embodied)
and propositional (language) forms of knowledge that construct our aesthetic experi-
ences. Bowman has remarked, “When we hear a musical performance, we do not just
‘think,’ nor do we just ‘hear’: we participate with our whole bodies; we construct and
enact it” (Bowman 2004, 47; also seen in Borgo 2005, 44). This combined approach
of the nonpropositional and propositional also fits well with both Heidegger’s and
Gadamer’s theories on aesthetics. For them, art has a great impact on our experiences in
the world, through presenting us with mutually dependent disclosures and “hidden-
nesses.” These disclosures not only disclose themselves, but they also reveal the presence
of the hidden, with these revelations drawing us further into the aesthetic experience.
While some “hiddennesses” are aspects of experience that we lack the ability to con-
ceptualize through language, others are aspects of experience that just have not as of yet
reached the center of attention to be conceptualized. This phenomenological perspec-
tive proposes that the separation of the nonpropositional from the propositional forms
of thought is achieved through having prereflective and reflective forms of conscious-
ness, validating that imagination should be viewed from multiple levels. As a result,
I will argue that a dynamic and multileveled perspective of imagination is necessary for
exploring our musical experiences. Furthermore, I propose that only focusing on the
reflective, verbally reportable aspect of consciousness impoverishes our understanding
of artistic emotions.
Many, such as Daniel Dennett (1991), consider disclosure (the ability to give verbal
description) necessary before an experience can be considered a valid conscious
experience. This ignoring of the embodied aspects of the musical experience has also
permeated rationalist and positivist views on aesthetics. DeNora has stated,
listening is too often de-historicised in a way that imposes the model of the (histor-
ically specific) silent and respectful listener as a given. Within this assumption, the
body of the listener is excised. And yet, such listening involves a high degree of
bodily discipline. (2003, 84)
Borgo (2005) has pointed out the inherent tension between an embodied take on music
reception and a more traditional aesthetic view, which purports that one should disinter-
estedly examine an art object as something that is autonomous and fully separate
from oneself. Rationalist and positivist aesthetic viewpoints have also had difficulty in
making aesthetic judgments on musical experience, as the experience is ephemeral and
thus the only art object that remains to be judged is the score. This largely results from
the fact that representation has been considered vital for something to be recognized as
a fine art. For instance, owing to music’s ephemerality and lack of disclosed representation,
Kant felt compelled to consider music generally as agreeable sensations rather than a fine art (with textual vocal music being an exception to this norm). Supporting this, Kant has
stated that music, as he hears it, provides “nothing but sensation without concepts, so
that unlike poetry it leaves us with nothing to meditate about” (Kant 2007, §328). This
difficulty is exacerbated for improvisation, which lacks even a score to judge as an art
object. I propose that we need to get rid of the notion of reified art objects, and that we
instead need to re-examine imagination and improvisation, both through our past
interpretations of their usefulness and through the context of more current embodied
cognitive approaches, in order to give imagination and improvisation the important
recognition they deserve as part of artistic experience.
For a poet is an airy thing . . . he is not able to make poetry until he becomes inspired
and goes out of his mind and his intellect is no longer in him. As long as a human
being has his intellect in his possession he will always lack the power to make poetry
or sing prophecy. (Plato 1997, 942, 534b–c)
For me, this simultaneous disparagement and awe of artists is an unresolved dichotomy
in Plato’s writing. On the one hand, imitation and creative thought are reprehensibly far from the transcendental “forms,” while, on the other hand, artistic inspiration in its
excess can form a direct link to the transcendental. Even with this dichotomy, Plato’s
view of creative imagination has had an enduring influence on the arts, such as when
Shelley, in A Defence of Poetry, discussed the artist resonating to their internal and
external influences like an Aeolian lyre, drawing on divine effluence (Shelley [1840]
2010). Similarly, Samuel Taylor Coleridge gave a Platonic description of imagination
when he stated that imagination is “a repetition in the finite mind of the eternal act of
creation in the infinite I am” (Coleridge 1984, 304). Johnson has nicely summed up
Plato’s view on creative imagination in the Republic by stating, “But imagination of this
sort is not a rational faculty; rather, it is the result of a kind of demonic possession in
which the poet loses rational control” (2013, 143, italics in original). Thus, the Platonic
creative imagination in its reaching toward the divine necessarily walks a fine line
between genius and insanity. Moreover, if the Platonic creative imagination lacks this
transcendental genius, then it fails society.
The other side of imagination, which has been considered as a mediator between the
senses and thought, has often been seen to begin with Aristotle, for whom
[T]hose perceptions, which enter with most force and violence, we may name
impressions . . . By ideas I mean the faint images of these in thinking and reasoning . . .
The first circumstance, that strikes my eye, is the great resemblance betwixt our
impressions and ideas in every other particular, except their degree of force and
vivacity. ([1731] 1888, 1)
Hobbes was more extreme in his viewpoint, stating, “there is no conception in a man’s
mind, which hath not at first, totally, or by parts, been begotten upon the organs of sense.
The rest are derived from that original” ([1651] 1996, 9). For Hobbes, we are unable
to imagine anything that is completely free from the inputs of our sense apparatus.
Accordingly, Aristotelian imagination has a strong connection to an empirical philo-
sophical viewpoint and is shared by empiricists such as Hobbes, Berkeley, Locke, and
Hume, among others.
The Platonic and Aristotelian viewpoints of imagination have an inherent tension
between them, congruent with the Cartesian mind–body problem. Since Descartes
(1985) presumed that the mind and the soul are more or less the same thing, the Platonic
view of imagination conforms well to a Cartesian substance of the mind in that it can be
seen to draw inspiration from the transcendental. Similarly, the Aristotelian view of
imagination conforms to a Cartesian substance of the body immanent in the world, as
this viewpoint explores the connections between sense experience and thought. Related
to this, Mary Warnock (1976) has asked how it is that imagination can both facilitate
everyday perception and be a source of novelty. For me, the only way to resolve this
dual nature of imagination is to deny the unfounded divisions that have been made
between the mind and the body, and between the divinely inspired and the routinely
experienced.
all the places of the Temple resounded with the sounds of harmonious symphonies
as well as the concords of diverse instruments, so that it seemed not without reason
that the angels and the sounds and singing of divine paradise had been sent from
the sacred mysteries should be celebrated with utmost reverence, with both deepest
feeling toward God alone, and with external worship that is truly suitable and
becoming, so that others may be filled with devotion and called to religion . . . But
the entire manner of singing in musical modes should be calculated, not to afford
vain delight to the ear, but so that the words may be comprehensible to all.
(Canon 8, quoted and translated in Monson 2002, 9)
These requests for suitable solemnity, reverence, and comprehensibility of text seem to
suggest that sacredness be given a priority over divine inspiration. I find this tension
between sacredness and divine inspiration to have similar aspects to the tensions
that we have earlier seen involved with the Platonic and Aristotelian imaginations.
Improvisation as divine inspiration tied to altered states of consciousness and mystical
religious experiences maps well onto Plato’s need for divine inspiration in creative
thought for it to be worthwhile. Similarly, improvisation as an innovative practice with
constraints and affordances maps well onto the Aristotelian imagination that links
together cognition with perception and action.
Furthering the links between divinely inspired improvisation and Platonic imagi-
nation, there have also been accusations of demon possession related to ecstatic
experiences in the church (Edwards 1742), as people feared that they did not know
where these divine inspirations came from, whether from God or from demons
(Edwards 1746).
Outside of the church, there have also been descriptions of ecstatic responses to
improvised performances of secular music. In the sixteenth century, Jacques Descartes
de Ventemille described an improvised (free fantasia) performance of Francesco da
Milano, stating,
he continued with such ravishing skill that little by little, making the strings languish
under his fingers in his sublime way, he transported all those who were listening
into so pleasurable a melancholy that . . . they remained deprived of all senses save
that of hearing. [He left] as much astonishment in each of us as if we had been
elevated by an ecstatic transport of some divine frenzy.
(Descartes de Ventemille, quoted in Weiss and Taruskin 2007, 134)
While improvisation’s role was not only to transport people into ecstatic frenzy, I would
argue that it, showing a similar dual nature to imagination, would have been considered
to act in both the transcendental and material realms. As a result, I will argue in similar
fashion against any needless divisions between improvisation as a powerful and sublime
experience and improvisation as innovatory practice. I see these two sides of improvi-
sation as necessary to one another.
which he was aware, and so, as a result, Kant introduced the concept of synthetic a priori
knowledge (a concept is described by Kant as “something that is universal and that
serves as a rule” [Kant 1998, A106]).
Kant’s appeal for a synthetic a priori is an appeal for universally generalizable experi-
ence. Basically, if we exist as minds (in a Cartesian sense), the only possibility for us to
share experience with one another and to communicate with one another is either to
have an apparatus for connecting to the world that is universally similar or to passively
receive the information from the world. If we have a mind–body separation and we want
to escape the need for only passively accepting our perceptions of the world, then we
would need a chip in our brains that could bridge the gap, translating experience
from the body to the mind. Kant thus considered subjective experience not to be fully
subjective, and gave it the name transcendental subjectivity, as it gave individuals
some access to an apodictic experience. Through this transcendental subjectivity,
there are some universal ways to experience reality, which builds a universal (although
invisible) foundation on which we can intelligibly communicate to one another and
experience things similarly.
I find that an oversimplified but useful comparison to these hidden (transcendental
subjectivity) universals is with the hidden rules of universal generative grammar that
Chomsky has proposed are hardwired into the brain to allow us to learn languages
quickly and efficiently. Nevertheless, I would argue that neither of these viewpoints has
panned out. Chomskyan generative grammar has rules that are very commonly used
around the world but has failed to find universality (Everett 2005), while Kant’s synthetic a priori has failed to adequately bridge the gap between the substances of the mind and
body. Instead, as Johnson points out,
the rigid separation of understanding from sensation and imagination relegates the
latter to second-class status as falling outside the realm of knowledge. As a result,
judgments of taste can never, for Kant, be determinative or constitutive of experi-
ence. Neither can they be “cognitive,” for he regards the cognitive as the conceptual,
and there is no concept or rule guiding reflective judgment. (2013, 167)
In a way, Kant may have succeeded in bringing together the different types of imagi
nation. However, in doing so, he relegated the Platonic imagination to the same place on
the divided line as Plato, to the very lowest rank. Furthermore, to accomplish this he
had to give great power to transcendental subjectivity, thus partially muting the active
participatory role of the mind, and obligating our experiences and understandings to
be normalized through their universal underpinnings (Steeves 2004).
No: I want only one single creation, and I shall be quite satisfied if [the singers]
perform simply and exactly what [the composer] has written. The trouble is that
they do not confine themselves to what he has written. I deny that either singers or
conductors can “create” or work creatively.
(Verdi, quoted in Sancho-Velazquez 2001, 5)
In the spirit of the Enlightenment near the beginning of the eighteenth century, Fénelon’s
The Adventures of Telemachus presented imagination as something childish that could
be exploited when teaching children, but ought to have been eradicated by the time the
students reached adulthood. Fénelon followed the views of Plato, and thus imagination
was yet again considered inferior to reason (Lyons 2005). Romanticism is a period that
we, in retrospect, have decided celebrated creative imagination. However, I would argue
that it instead continues to celebrate divinely inspired imagination and attempts to
guard us against more fanciful imagination, while also continuing to separate the divinely
inspired from the routinely experienced.
Rousseau, an important figure for the growth of Romanticism, also wanted to suppress imagination in favor of a Lockean passively receptive “sensibility.” Rousseau’s book
Emile; or, On Education had a teacher attempting to protect Emile from his imagination,
killing his imagination through habit. Through this book, Rousseau argued that people
should strongly avoid creatively modifying or distorting the information of the senses
(1979). This viewpoint matches well with Kant’s aesthetics on reception. In his Critique
of Judgement, Kant stated, “judgements of taste” must contain four characteristic
features: (1) disinterestedness, where pleasure is derived from judging something as
beautiful, and not the inverse of judging something as beautiful as a result of finding it
pleasurable; (2) universality of this judgment; (3) necessity of this judgment, where the
beauty is intrinsic to the object itself; and (4) “purposiveness without purpose” of the
object ([1790] 2007). Even though Kant gave the mind an active and participatory
role in perception, all of these features of aesthetic judgment work to greatly reduce the
role of imagination in the aesthetic appreciation of art.
Fetis saw the task of his self-proclaimed science of the “philosophie de la musique”
to be to show how tonality was the dialectical synthesis of theory and history; music
history was the actualization of tonality, while music theory could be seen as its
“objectification.” (1996, 56)
More recently, Kivy repeatedly has stated in one way or another through his book on
musical genius, “A musical genius is one who produces supremely valuable musical
works” (2001, 178). Thus, genius is not in the process of musicking (focusing on music
as an act rather than music as an object), but rather in the production of objectifiable
aesthetic art objects. Similarly, through the twentieth century, improvisation has been
discredited by major composers such as Boulez and Berio, and by influential music the-
orists such as Adorno (Peters 2009). As musicology has become a search for a canon of masterworks, improvisation has had a difficult time staying relevant in this
changing landscape. Concurrently in the early twentieth century, due to the influence
of behaviorism, imagination was relegated to “the outer darkness of intellectual irrele-
vance” (Morley 2005, 117).
While improvisation slowly departed from classical music, it quickly spread into
other musical styles. Jazz improvisation, which highlights spontaneous and unplanned
performance practice, gained great popularity in the early twentieth century, especially
in America, well in advance of any psychological or philosophical theories that highly
value the role of spontaneous imagination in cognition. However, improvisation in jazz
was not without difficulty in its early days. Gushee quotes Colles in the 1927 Grove’s Dictionary of Music and Musicians, stating that improvisation “is therefore the primitive
act of music-making, existing from the moment that the untutored individual obeys the
impulse to relieve his feelings by bursting into song” (Colles 1927). Alongside this,
Gushee quotes a questioner in Jacobs Orchestra Monthly from 1912 asking about faking
(improvisation) even though it is “not playing correctly,” and later mentions that
improvisation, which is now considered “as an act of creative imagination, in the past it
was sometimes considered anything but, something that inadequately trained musicians
did from rote or force of habit or necessity” (1927, 265–266). Once improvisation in jazz had grown beyond its early stages, one thing that may have helped it flourish was the act of recording, which made it possible for improvisation to be fixed into an autonomous aesthetic art object (Solis 2009). Subsequently, this ability to
reify improvisation into fixed art objects has been a point of contention for improvisers.
Some, like Derek Bailey (1993), are greatly against this reification, while others, such as Gabriel Solis (2009), see a positive value in it.
Imagination in Embodied Cognitive Science
Earlier, I presented the tradeoffs that Kant felt were necessary (overvaluing normalized experiences while devaluing imagination) to allow individuals to have a participatory role in perceiving the world around them. I have also appealed several times throughout this chapter for an embodied perspective as a better means of participating in the world around us.
during the generation, imagination, as well as observation of one’s own and other’s
behavior” (4). Egermann and colleagues have stated that
animal or human individual inhabit[s] the same world as another, however close
and similar these living individuals may be . . . between my world and any other
world, there is first the time and space of an infinite difference, an interruption that
is incommensurable with all attempts to make a passage, a bridge, and isthmus, all
attempts at communication, translation, trope, and transfer that the desire for a
world or the want of a world, the being wanting a world will try to pose, impose,
propose, stabilize. There is no world there are only islands. (Derrida 2011, 31)
This agnostic position toward the other is more emphatically expressed by Derrida here
than he may actually believe, but theories following a Cartesian mind–body separation
have a very difficult time defending against this, which has led to their need to search for
essentialisms and universally innate common abilities (such as Kant’s transcendental
subjectivity) that allow for communication between these infinitely separated islands.
Furthermore, I argue, following an embodied viewpoint, that since we have a responsibility
to act in a goal-directed manner within the time constraints of real life, we actively
participate in the perception and meaning-making of our environment rather than
passively apprehend an accurate reality.
Since our predictions are situated in similar embodiments and environments to
one another, we have access to similar shared experiences. Jean-Luc Nancy in Being
Singular Plural has given an adept perspective on how embodiment can defend us
from solipsism:
That which exists, whatever this might be, coexists because it exists. The co-implication
of existing [l’exister] is the sharing of the world. A world is not something external
to existence; it is not an extrinsic addition to other existences; the world is the coex-
istence that puts these existences together. . . . Kant established that there exists
something, exactly because I can think of a possible existence: but the possible
comes second in relation to the real, because there already exists something real.
(Nancy 2000, 29)
Nancy here proposes that thinking of reality includes an immediate coexistence. Thus, it
is impossible to approach things-in-themselves as they always already exist in a reality
that is plural. In my opinion, the troubles that we have had in joining the different types
of imagination into a single framework stem mainly from one problem: the completely
unnecessary split between mind and body.
environment will ultimately change our physical states to the point we cease to exist.
And yet, biological systems seem to violate these laws [and] they occupy a small
number of states with a high probability and avoid a large number of other states. In
short, they appear to resist thermodynamic imperatives. (263, 266)
Similar principles have been discussed by Saslaw and Walsh in their chapter (this volume,
chapter 7), and more can be read on this topic there. A major part of Friston’s proposed
answer to this question is the minimization of surprise. He suggests that we do this in two ways: by altering our perceptions and by altering our actions. First, through learning, evolution, and neurodevelopment, we maximize the chances that our model works by becoming a model of our environmental niche. Then, we can change our level of surprise by changing (optimizing) either our predictions (perceptions), our expectations (our model), or our actions (Friston 2012). “This perspective suggests that we should selectively sample data (or place ourselves in relation to the world) so that we experience what we expect to experience. In other words, we will act upon the world to ensure that our predictions come true” (272).
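As a rough illustration of these two routes for minimizing surprise, the toy sketch below compares updating one's prediction with acting on the world to change the input. The Gaussian surprise measure and all of the numbers are illustrative assumptions, not Friston's formulation.

```python
# A toy sketch of the two routes for reducing surprise described above:
# (1) update the prediction to fit the sensed input, or (2) act on the world so
# that the input comes to match the prediction. The Gaussian surprise measure
# and all values are illustrative assumptions, not Friston's formulation.
import math

def surprise(observation: float, prediction: float, sigma: float = 1.0) -> float:
    """Negative log-probability of the observation under a Gaussian prediction."""
    return 0.5 * ((observation - prediction) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))

prediction, observation = 2.0, 5.0
print("initial surprise:", round(surprise(observation, prediction), 3))

# Route 1: perceptual inference -- move the prediction toward the observation.
updated_prediction = prediction + 0.8 * (observation - prediction)
print("after updating the model:", round(surprise(observation, updated_prediction), 3))

# Route 2: action -- sample the world selectively so the input approaches the prediction.
new_observation = observation - 0.8 * (observation - prediction)
print("after acting on the world:", round(surprise(new_observation, prediction), 3))
```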
It is this selective sampling of data that supports imagination’s major influence on our
agential interaction with the world. This is supported by Calvo-Merino and colleagues’
work on dance (2005), where experts had stronger neural activations in the premotor
cortex than novices. Furthermore, in this study they also saw that experts had stronger
activations in response to dance styles more similar to their own. Thus, our ability to act
can also influence how we choose to sample our perceptual data and how it directly
influences our behaviors. Related to this, Borgo quotes Evan Parker when he states,
In the end the saxophone has been for me a rather specialised bio-feedback instru-
ment for studying and expanding my control over my hearing and the motor
mechanics of parts of my skeleto-muscular system and their improved functioning
has given me more to think about. Sometimes the body leads the imagination,
sometimes the imagination leads the body. (Parker 1992; Borgo 2005, 58)
Friston suggests something very similar, but rather as a tripartite structure, where some-
times perception leads, sometimes imagination leads, and sometimes the body leads. In
this regard, I find improvisation to be an ideal representation of Friston’s free energy
principle.
categorizations (Lupyan et al. 2007), and when dementia patients lose the concept
knowledge to describe their emotions they can lose the capacity to perceive them
(Lindquist et al. 2014). Further supporting this, Richard Hilbert has found that chronic
pain that does not fit within the standard descriptions of pain causes further suffering
and social isolation in patients, as they have no language with which to describe their
pain (Hilbert 1984). On top of this, meta-analysis results from Nilsson and de López are
consistent with the theory that children with language impairment have a “substantially
lower ToM [theory of mind] performance compared to age-matched typically develop-
ing children” (2016, 143). Furthermore, there have been significant links made between
language processing and spatial representation (Richardson et al. 2003), language processing and the perception of moving objects (Meteyard et al. 2007), and language
processing and color perception (Thierry et al. 2009). Evidence is considerable and
growing that language is not an innocent tool, but rather that it has a large impact on
conscious experience. As a result, it is easy to see the power that language might have in
the reflective awareness and reflective imagination involved in improvisation.
The importance of reflective consciousness becomes more readily apparent once one
realizes all of the necessary work that is put in to prepare for “spontaneous” improvisa-
tory performances. Musicians spend years learning theory, scales, and the techniques
necessary to pull this off. As Berliner has stated, there is “a lifetime of preparation and
knowledge behind every idea that an improviser performs” (2009, 17). Similarly, musi-
cians converse with one another between and during performances, reflecting and
narrativizing on their musical experiences. Furthermore, as can be seen in Schmicking’s
chapter (volume 1, chapter 4), musicians intermittently imagine and reflect on how they
can guide the music to where they want it to go.
Even with these strong arguments for reflective thinking in improvisation, I would
argue that some thinkers such as Dennett (1991) take it too far when they consider that
the ability to provide a verbal report is necessary for an experience to be considered a
valid conscious experience. As artistic experiences both elicit and elude conceptuali-
zation according to Heidegger, I propose that only focusing on the reflective, verbally
reportable aspect of consciousness impoverishes our understanding of artistic emotions
(Harries 2011). Following this, Thompson and colleagues state, “phenomenologists
emphasize that most of experience is lived through unreflectively and inattentively, with
only a small portion being thematically or attentively given” (Thompson et al. 2005, 59).
This is well supported by the work of Al Bregman, who has spent much of his career
studying auditory scene analysis. Bregman has researched how sound perceptually
either groups together or splits apart into auditory streams, and has found that the
most transformative effect on how the auditory stream is processed is whether it is
foregrounded or backgrounded in the listener’s mind (Bregman 1990). With that said,
Bregman’s research also supports the notion that both the foreground and background
musical elements are consciously experienced by a listener. Flow (Csikszentmihalyi 1990), frequently considered to be an optimal state both for performing and listening to music, also has support for being a strong mix of reflective and prereflective states of consciousness.
Conclusion
Neither improvisation nor imagination fares well within either the classical Greek or
Enlightenment theories of knowledge and understanding. These theories have instead
privileged objective, disinterested forms of approaching reified aesthetic objects, which
neither imagination nor improvisation has to offer. In this chapter, I have argued that
this failure of imagination and improvisation to find value within aesthetic theories that
value the ontology of fixed art objects says less about the value of either imagination or
improvisation than it does about the value lacking in these aesthetic theories. Instead,
I suggest that we need to focus more on newer aesthetic theories that value the ontology
of the process of musicking. In relation to empirical research, improvisation and imagi-
nation have made a resurgence in acceptance concurrent with the rise of embodied theories that make direct links to perception and action. Researchers like Decety, Grèzes, and Friston have shown the extraordinary influence that imagination has over perception and action, and the close links that perception, action, and imagination have to one another. Furthermore, improvisation, with its reliance on a strong integration of perception, action, and imagination, can be seen as a strong reflection of this
tight interdependence. As a result, if we really want to understand our embodied, holis-
tically integrated, and time-constrained musical experiences, I feel that it is imperative
that we include the investigation of the improvisatory and participatory aspects of
meaning making that occur as part of the process of musicking.
References
Abdallah, S., and M. Plumbley. 2009. Information Dynamics: Patterns of Expectation and Surprise in the Perception of Music. Connection Science 21 (2–3): 89–117.
Aristotle. 2004. De Anima. Translated by Hugh Lawson-Tancred. Reissue edition. London:
Penguin.
Bailey, D. 1993. Improvisation: Its Nature and Practice in Music. New York: Da Capo.
Berliner, P. F. 2009. Thinking in Jazz: The Infinite Art of Improvisation. Chicago and London:
University of Chicago Press.
Boden, M. A. 2004. The Creative Mind: Myths and Mechanisms. Oxon and New York:
Psychology.
Borgo, D. 2005. Sync or Swarm: Improvising Music in a Complex Age. New York and London:
Continuum.
Bowman, W. 2004. Cognition and the Body: Perspectives from Music Education. In
Landscapes: The Arts, Aesthetics, and Education, edited by L. Bresler, 29–50. Boston,
Dordrecht, and London: Kluwer Academic.
Bregman, A. S. 1990. Auditory Scene Analysis: The Perceptual Organization of Sound.
Cambridge, MA, and London: MIT Press.
Calvo-Merino, B., D. E. Glaser, J. Grèzes, R. E. Passingham, and P. Haggard. 2005. Action
Observation and Acquired Motor Skills: An fMRI Study with Expert Dancers. Cerebral
Cortex 15 (8): 1243–1249. doi:10.1093/cercor/bhi007.
Christensen, T. 1996. Fétis and Emerging Tonal Consciousness. In Music Theory in the Age of
Romanticism, edited by I. Bent, 37–56. Cambridge: Cambridge University Press.
Coleridge, S. T. 1984. Biographia Literaria; or, Biographical Sketches of My Literary Life and
Opinions. Vol. 1. Edited by J. Engell and J. Bate. Princeton: Princeton University Press.
Colles, H. C. 1927. Grove’s Dictionary of Music and Musicians. 3rd ed. Edited by H. C. Colles.
New York: Macmillan.
Csikszentmihalyi, M. 1990. Flow: The Psychology of Optimal Experience. New York: Harper &
Row.
Damasio, A. R. 2010. Self Comes to Mind: Constructing the Conscious Brain. 1st ed. New York:
Pantheon.
Decety, J., and J. Grèzes. 2006. The Power of Simulation: Imagining One’s Own and Other’s
Behavior. Brain Research 1079 (1): 4–14. doi:10.1016/j.brainres.2005.12.115.
Dennett, D. C. 1991. Real Patterns. Journal of Philosophy 88 (1): 27–51. doi:10.2307/2027085.
DeNora, T. 2003. After Adorno: Rethinking Music Sociology. Cambridge: Cambridge
University Press.
Derrida, J. 2011. The Beast and the Sovereign. Vol. 2. Edited by M. Lisse, M.-L. Mallet, and
G. Michaud. Translated by G. Bennington. Chicago and London: University of Chicago Press.
Descartes, R. 1985. The Philosophical Writings of Descartes. Vol. 2. Translated by J. Cottingham,
R. Stoothoff, and D. Murdoch. Cambridge: Cambridge University Press.
Dufay, G. 1966. Opera Omnia. Vol. 2. Edited by H. Besseler. Rome: American Institute of
Musicology.
Edelman, G. 2001. Consciousness: The Remembered Present. Annals of the New York Academy
of Sciences 929 (1): 111–122. doi:10.1111/j.1749-6632.2001.tb05711.x.
Edwards, J. 1742. Some Thoughts Concerning the Present Revival of Religion in New-England,
and the Way in Which It Ought to Be Acknowledged and Promoted, Humbly Offered to the
Publick, in a Treatise on That Subject. Boston: S. Kneeland and T. Green.
Edwards, J. 1746. The Treatise on Religious Affections. New York: American Tract Society.
Egermann, H., M. T. Pearce, G. A. Wiggins, and S. McAdams. 2013. Probabilistic Models
of Expectation Violation Predict Psychophysiological Emotional Responses to Live
Concert Music. Cognitive, Affective, and Behavioral Neuroscience 13 (3): 533–553. doi:10.3758/
s13415-013-0161-y.
Everett, D. L. 2005. Cultural Constraints on Grammar and Cognition in Pirahã: Another
Look at the Design Features of Human Language. Current Anthropology 46 (4): 621–646.
doi:10.1086/431525.
Fellerer, K. G., and M. Hadas. 1953. Church Music and the Council of Trent. Musical Quarterly
39 (4): 576–594.
Ferand, E. 1961. Improvisation in Nine Centuries of Western Music: An Anthology. Köln:
Arno Volk Verlag.
Friston, K. J. 2012. Free Energy and Global Dynamics. In Principles of Brain Dynamics: Global
State Interactions, edited by M. I. Rabinovich, K. J. Friston, and P. Varona, 261–292.
Cambridge, MA, and London: MIT Press.
Getchell, N. 2007. Developmental Aspects of Perception-Action Coupling in Multi-Limb
Coordination: Rhythmic Sensorimotor Synchronization. Motor Control 11: 1–15.
Grout, D. J., and C. V. Palisca. 1988. A History of Western Music. London: Norton.
Harries, K. 2011. Art Matters: A Critical Commentary on Heidegger’s “The Origin of the Work
of Art.” New York: Springer.
Hawkins, J., and S. Blakeslee. 2004. On Intelligence. New York: Times Books.
Hilbert, R. A. 1984. The Acultural Dimensions of Chronic Pain: Flawed Reality Construction
and the Problem of Meaning. Social Problems 31 (4): 365–378. doi:10.2307/800384.
Hobbes, T. 1996. Leviathan. Edited by J. C. A. Gaskin. Oxford: Oxford University Press.
Hume, D. 1888. A Treatise of Human Nature. Edited by L. A. Selby-Bigge. Oxford: Clarendon
Press.
Huron, D. 2006. Sweet Anticipation: Music and the Psychology of Expectation. Cambridge:
MIT Press.
Johnson, M. 2013. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason.
Chicago and London: University of Chicago Press.
Johnson, S. P., and K. L. Johnson. 2000. Early Perception-Action Coupling: Eye Movements
and the Development of Object Perception. Infant Behavior and Development 23 (3–4):
461–483. doi:10.1016/S0163-6383(01)00057-1.
Kant, I. 1998. Critique of Pure Reason. Edited by A. W. Wood. Translated by P. Guyer. The
Cambridge Edition of the Works of Immanuel Kant. Cambridge and New York: Cambridge
University Press.
Kant, I. 2007. Critique of Judgement. Edited by N. Walker. Translated by J. C. Meredith. Oxford:
Oxford University Press.
Kivy, P. 2001. The Possessor and the Possessed: Handel, Mozart, Beethoven, and the Idea of
Musical Genius. New Haven and London: Yale University Press.
Lindquist, K. A., M. Gendron, L. F. Barrett, and B. C. Dickerson. 2014. Emotion Perception,
but Not Affect Perception, Is Impaired with Semantic Memory Loss. Emotion 14 (2):
375–387. doi:10.1037/a0035293.
Locke, J. 1803. An Essay Concerning Human Understanding. 1st American ed. Vol. 2. Boston:
David Carlisle.
Lupyan, G., D. H. Rakison, and J. L. McClelland. 2007. Language Is Not Just for Talking: Redundant Labels Facilitate Learning of Novel Categories. Psychological Science 18 (12): 1077–1083. doi:10.1111/j.1467-9280.2007.02028.x.
Lyons, J. D. 2005. Before Imagination: Embodied Thought from Montaigne to Rousseau.
Stanford, CA: Stanford University Press.
Maes, P.-J., M. Leman, C. Palmer, and M. Wanderley. 2014. Action-Based Effects on Music
Perception. Frontiers in Psychology 4:1008. doi:10.3389/fpsyg.2013.01008.
Merleau-Ponty, M. 1970. Themes from the Lectures at the Collège de France, 1952–1960.
Evanston, IL: Northwestern University Press.
34 justin christensen
Anticipated Sonic Actions and Sounds in Performance
Clemens Wöllner
Introduction
Imagine the following situation: a guitarist performs in a large concert venue together
with a singer, a bass guitarist, and a percussionist. At one point, the stage monitor speak-
ers cease functioning, and the guitarist does not hear the sound of her own instrument
or the sounds produced by the others. Only some noise from the audience reaches her
ears and some delayed feedback from the loudspeakers directed away from the stage.
Puzzled at first, the guitarist quickly glances at the fellow musicians and then continues
playing. They have performed the piece a dozen times, and she is an experienced guitarist,
so she is able to anticipate the sound outcomes of her own performance actions and to
synchronize with the imagined sounds of the others, all the while ignoring the delayed
acoustic feedback from the audience loudspeakers. Musicians such as the one described
in this situation are typically well aware of the effects of their actions, so they may
depend less on actual auditory feedback. Long training enables them to associate fine-
tuned motor behavior with sounding action outcomes, and their actions are tightly
coupled with sounds in their imagination (Keller 2012). Even when hearing sounds,
skilled musicians may experience a motor resonance of the corresponding actions
(Bangert et al. 2006). The sound waves transform into sonic actions in their auditory and
motor imagery, enabling them to perform even when auditory feedback is disrupted or
when anticipating their own and others’ sounds.
This chapter elaborates on anticipated sounds in performance, and focuses on sonic
actions in bodily representations of sounds (see also, Godøy this volume, chapter 12, on
musical shape cognition). Research into traditional instrumental performances is
discussed, and features of electroacoustically generated sounds in gestural performances
are described for controller-driven music. In all these examples, sound characteristics,
according to more restricted definitions, refer to the timbre of instrumental or vocal
sounds in a performance. Certain spectral components, as specified later, characterize
and distinguish sounds from each other. In a wider sense, the sounds of a music per-
formance include further components such as timing and dynamics that make a
performance unique and distinctive from others. These sound qualities are often related
to timbre, such that higher intensity in many acoustical instruments alters their timbre
as well, or the chosen tempo affects playing techniques, articulation, and sound quality.
Distinctions between the two concepts of sound may thus be primarily of theoretical
interest. Yet, for electronic music, an unlimited combination of sound features is pos-
sible that does not necessarily involve the aforementioned interdependencies between
timing, intensity, and timbre. This chapter focuses on both perspectives of musical sounds in actual and imagined performances and explores ways in which anticipated
sonic actions might differ between performances of physical instruments and those of
gestural, controller-based music. The main argument is that musicians construct images
of their sonic actions that permit them to perform independently from what they actu-
ally hear, while at the same time a feedback mechanism is needed for controlling the
actual sounds that the audience should perceive.
In what ways do musicians establish images of their sonic actions? Findings of research
on perception and auditory imagery, mental performances and anticipation of sound
events are discussed in the following section. The research presented here employed
physical analog instruments that allow musicians to construct multimodal images
of their actions without sound modifications or distortions that are possible with
electronic instruments.
and pedaling, listeners ascribe individual sound qualities to pianists that are reflected
in a number of performance parameters such as touch or articulation (Bernays and
Traube 2014). On the other hand, these qualities are not fixed in Western musical nota-
tion, nor can they be captured in a direct, one-dimensional way. Empirical approaches to
music performance seem to have focused primarily on timing, pitch, dynamics, and
articulation. Though these dimensions may characterize key elements of music per-
formances, other features shape musical experiences to a significant extent. For electro-
acoustic music, Stockhausen (1963) defines five musical dimensions: pitch including
harmony and melody, duration including meter and rhythm, timbre (“Tonfarbe,” literally
“tone color”), dynamics, and spatial aspects. According to Stockhausen, each dimension
should be equally important for composers, performers, and listeners, and he employed
this stance for his own compositions.
While the first three of Stockhausen’s parameters have long been described and
indicated in the scores in relatively precise terms (typically, however, not taking into account the fine nuances in performers’ microtiming or dynamics; see also Danielsen, this volume, chapter 29, on microtiming), defining the tone color seems more challeng-
ing. Musicians, theorists, and empirical researchers often employ verbal descriptions
such as full, bright, diffuse, or tense in their descriptive approaches to timbre, and these
should relate to acoustic features of the sounds. Measurements of the acoustic features,
then, focus on spectral components, including formant areas, spectral centroid, spectral
flux, or intensity of selected partials. Reuter and Siddiq (2017) present an overview of
various attempts to classify instrumental timbres by assessing their closeness or dis-
tance to each other in so-called timbre spaces. Grey (1975) was the first to construct
such a timbre space by having listeners rate the perceptual quality of synthesized sounds.
The dissimilarity of the perceived timbres was transferred into a three-dimensional
space using the statistical method of multidimensional scaling. Briefly, the further away
the timbres are in the space, the more they differ from each other in perceptual judg-
ments. Several other researchers constructed timbre spaces in the following years. In
their meta-analysis, Reuter and Siddiq found that the dimensional solutions for
describing instrumental sounds vary widely in different studies of timbre spaces. They
tested the synthesizer sounds used in these studies and compared them with the timbres
of actual instrumental sound samples from the Vienna Symphonic Library. Listeners
subsequently rated the dissimilarity of the timbres in pairwise comparison judgments.
As a result, there were large differences between the timbral qualities of the same instru-
ments across the stimulus sets used in the respective studies, indicating limits in the
comparability and generalizability of synthesized sounds. In contrast, ratings within
the set of the actual instrumental samples were more widespread (see the boxes with a
“v” as first letter, Figure 2.1). Consequently, Reuter and Siddiq suggest that a wider range
of sound samples should be investigated and that perceptual judgments should be
more firmly related to acoustical properties.
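To make the scaling procedure concrete, the sketch below shows how a small matrix of pairwise dissimilarity ratings can be projected into a three-dimensional space with multidimensional scaling. The ratings and instrument names are invented placeholders, and scikit-learn's MDS implementation stands in for the scaling procedures actually used in the studies discussed above.

```python
# A minimal sketch of deriving a timbre space from pairwise dissimilarity
# ratings, in the spirit of Grey (1975). The rating matrix and labels are
# invented, and sklearn's MDS is a stand-in for the original methods.
import numpy as np
from sklearn.manifold import MDS

instruments = ["trumpet", "clarinet", "cello", "bassoon"]

# Symmetric matrix of averaged pairwise dissimilarity judgments (0 = identical).
dissimilarity = np.array([
    [0.0, 0.7, 0.9, 0.8],
    [0.7, 0.0, 0.6, 0.5],
    [0.9, 0.6, 0.0, 0.4],
    [0.8, 0.5, 0.4, 0.0],
])

# Embed the ratings in a three-dimensional "timbre space": the further apart
# two instruments lie, the more dissimilar their timbres were judged to be.
mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coordinates = mds.fit_transform(dissimilarity)

for name, (d1, d2, d3) in zip(instruments, coordinates):
    print(f"{name:8s}  dim1={d1:+.2f}  dim2={d2:+.2f}  dim3={d3:+.2f}")
```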
If sound qualities, according to Reuter and Siddiq, are rather elusive and lead to great
variation in perceptual judgments and descriptions, in what ways are they relevant for
musical performers and composers? Even more, some Western music, especially of the
[Figure 2.1 appears here: a scatter of the labeled instrument timbres plotted across Dimensions 1, 2, and 3.]
Figure 2.1 Meta timbre space including four dimensions (fourth dimension: shades on the
gray scale).
(1) Sounds used in the study by Grey (1975): gB (bassoon), gC2 (bass clarinet), gC1 (E-flat clarinet),
gE (English horn), gF (French horn), gS1-3 (cello with different playing techniques),
gT (trombone), gP (trumpet).
(2) Sounds used in Krumhansl (1989) and McAdams et al. (1995): kB (bassoon), kC (clarinet), kE (English horn), kF
(French horn), kS (strings), kT (trombone), kP (trumpet).
(3) VSL sounds (Vienna Symphonic Library): vB (bassoon), vC (clarinet), vE (English horn), vF (French horn), vS
(strings), vT (trombone), vP (trumpet) (Reuter and Siddiq 2017, 160).
baroque period with its widespread stress on structural elements rather than performative affordances of the instruments, sounds adequate even if it is performed on a
variety of different instruments. Arrangements for new instruments are popular and com-
mon, suggesting that the essence of a piece of music may remain intact and recognizable
even with completely different timbres. On the other hand, musical arrangements also
point to the fascination that the timbres of various instruments have for the audience.
Notions that timbral qualities are hard to capture in language do not imply that they should
be less relevant. In addition, it is generally accepted, and somewhat expected from musical
performers, that timbre differs from one performer to the other, giving rise to character-
istic sound qualities of soloists and whole orchestras. For recorded music, the mixing
engineer’s and sound producer’s personal ideals of timbre further influence the sound of
a recording. It can thus be assumed that performers, producers, and listeners alike have
distinct images of sound qualities that shape their expectations and, in the case of musi-
cians, their actions. A basis for this claim certainly lies in the fact that people are indeed
able to distinguish between timbres, and that they do imagine them vividly.
Evidence for the vivid imagery of timbre stems from empirical research that addresses
differences and similarities between imagined and perceived instrumental timbres.
Halpern et al. (2004) asked ten participants with moderate musical experience to judge
the similarity of instrumental timbres in pairwise comparisons. The sounds were taken
from the McGill University Master Samples Library and included realistic sounds of
musical instruments. Participants were either acoustically presented with the samples
or had to imagine the sounds, while their brain activity was assessed with fMRI. The
pairwise comparisons, indicating the similarity or distance of the timbres, were ana-
lyzed with multidimensional scaling, resulting in two dimensions that the authors
defined as “brilliance” and “nasality.” As an example, the oboe timbre was placed highly
in both dimensions, whereas the clarinet was low in both dimensions. Interestingly,
there were roughly similar dimensional scaling solutions for the perceived and imag-
ined timbres. In addition, the similarity ratings for the timbre pairs correlated highly
between the imagined and perceived conditions, suggesting some overlap in the cog-
nitive processes of perception and imagery. This conclusion is further supported by
neuroimaging results showing that areas in the right auditory cortex were activated both
in timbre perception and in timbre imagination. In addition, there was some activation
in the supplementary motor area during timbre imagery, but not in perception.
Musicians may have accessed some unspecific motor component during imagery, or
they subvocalized the pitches of the instruments during imagery. Further research,
conversely, questions whether individuals can indeed imagine musical timbre vividly,
and suggests that it is rather their representations of timing or pitch that are more
detailed and accurate (Bailes 2007). Familiarity with a piece of music clearly aids in
imagining timbral qualities.
Despite the different systems in which timbral qualities are classified, they are
undoubtedly among the key characteristics of individual music performances and,
accordingly, one of the first features listeners may perceive when appraising musical
interpretations. Whereas the timing and dynamics unfold over time, the sound col-
ors of a performance are present immediately. Some evidence for the significance of
the sound quality for listeners’ judgments stems from research on “thin slices” of
music (Gjerdingen and Perrott 2008; Krumhansl 2010). Even when presented with
very short musical excerpts of 300 and 400 ms, listeners were able to provide judg-
ments on genre or emotional content. For well-known pieces, they could also name
the performers or release decade. Since information about other musical elements is
reduced, it can be assumed that timbral qualities have a paramount role for these
quick evaluations, especially for genre judgments. The same evaluation strategies, in
combination with other elements such as timing, dynamics, or pitch, are present of
course for longer durations.
played. It can be assumed that pianists could vividly imagine and anticipate the sounds
during sight-reading. Finney (1997) observed in a related study that manipulations of
pitch in auditory feedback interfered with pianists’ performance plans and impaired
their play. When auditory feedback was completely absent, on the other hand, their
imagery skills allowed them to perform without disruptions. The tactile and kinesthetic
feedback was evidently more important for pianists to control their performances than
the external auditory information. These findings show that employing different feed-
back modalities in empirical studies allows insights into the stability of performers’
imagery skills.
Further research investigated the learning and recall of unfamiliar music under
different feedback conditions. Highben and Palmer (2004) asked pianists to practice
without auditory feedback (i.e., playing without sound), without motor feedback
(i.e., score reading while listening to a prerecorded version of the piece), or while
practicing the music in their minds without any feedback at all. The last condition led to
worse results than the conditions where only one of the feedback modalities was absent.
Those pianists who succeeded in an auditory skill post-test were also better at learning without auditory feedback from the piano. It can be concluded that musicians with good
aural skills may benefit from more stable auditory images, supporting them in practice
and performance. In a related study (Brown and Palmer 2013), higher auditory imagery
skills allowed the pianists to recall more notes, to play with greater temporal regularity,
and to overcome distortion caused by experimental interferences. These results indicate
that the individual stability of auditory imagery aids in performance and recall, even in
situations when the sensory feedback of the actual sound is absent or altered. Auditory
imagery was not correlated with motor imagery, suggesting that both cognitive skills are
relatively independent in performers.
Multimodal imagery skills are particularly vital for mental practice, in which per-
formers play the music “in the mind’s ear” without overt muscle movements. Such mental performances are only efficient and feasible if performers possess effective imagery
skills. A study assessed timing stability in piano performance across experimental vari-
ations of performance feedback and imagery (Wöllner and Williamon 2007). Pianists
were asked to perform a memorized composition under four conditions that included a
normal performance as well as conditions without auditory feedback, without auditory
and visual feedback, and, finally, while tapping along with an imagined performance.
Analyses of the microtiming and dynamics revealed that the condition without auditory
feedback (but with haptic feedback of the MIDI piano) was relatively close to the normal
performance and deviated only about 10 percent, while tapping along with the imag-
ined performance led to strong timing deviations of up to 40 percent. In other words,
musicians had developed strong auditory-motor images of the compositions that did not
impair their play as long as they received the kinesthetic feedback of their fingers from
the piano keys. These results were not reflected in the pianists’ self-evaluations of their
practicing and memorizing strategies, in which they indicated that they had memorized
and employed aural, visual, kinesthetic, and conceptual images of the music’s structure
to a similar extent. In an associated study (Clark and Williamon 2011), the timing accuracy
of pianists’ imagined performances was related to the amount of time they had spent
practicing mentally per day, while results of a self-report imagery vividness test were
correlated only with live performances, not imagined performances. These findings sug-
gest that domain-specific practicing may enhance musical imagery skills, which are not
necessarily related to the general vividness of imagery outside the specific skills.
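As one way of making the timing results discussed above concrete, the sketch below computes the mean absolute percentage deviation of inter-onset intervals from a reference (normal) performance. The exact metric used by Wöllner and Williamon (2007) is not specified here, and the onset times are invented for illustration.

```python
# A hedged sketch of quantifying timing deviation: mean absolute percentage
# deviation of inter-onset intervals (IOIs) from a reference performance.
# The onset times below are invented, not data from the study.
import numpy as np

def ioi_deviation_percent(reference_onsets, test_onsets):
    """Mean absolute deviation of IOIs from the reference, as a percentage."""
    ref_iois = np.diff(reference_onsets)
    test_iois = np.diff(test_onsets)
    return float(np.mean(np.abs(test_iois - ref_iois) / ref_iois) * 100)

# Invented note-onset times in seconds for three conditions.
normal      = np.array([0.00, 0.50, 1.00, 1.50, 2.00, 2.50])
no_auditory = np.array([0.00, 0.52, 1.04, 1.58, 2.09, 2.62])  # close to normal
tapping     = np.array([0.00, 0.65, 1.15, 1.90, 2.35, 3.10])  # much less stable

print(ioi_deviation_percent(normal, no_auditory))  # small deviation
print(ioi_deviation_percent(normal, tapping))      # much larger deviation
```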
In summary, vivid auditory imagery may aid in performance when sounds cannot be
perceived. Studies provided evidence that auditory perception and imagery are closely
related, since the cognitive processes involved are highly comparable and both engage
the secondary auditory cortex (Daselaar et al. 2010; for a review, see Hubbard 2010). As
shown previously, when individuals evaluate the qualities of the sounds they imagine,
their judgments generally resemble those of the actually perceived timbres (Halpern
et al. 2004). Similarly, individuals are able to imagine durations and pitches in conditions
where they do not hear the sounds (Janata and Paroo 2006).
Vivid auditory, visual, and motor imagery is thus central for mentally practicing
music, but also for other performance areas such as sports or medicine in which athletes
and surgeons train “internally” a number of crucial actions, acquire new skills, and
prepare for their performance in the absence of any overt physical movements (Cocks
et al. 2014). Mental practice has been shown to be beneficial compared to no practice
(see Driskell et al. 1994; Connolly and Williamon 2004) and can aid the performers in
re-enforcing their imagery skills, which are necessary for building the auditory and motor
representations of a musical piece or of athletic movement patterns. Perhaps unsur-
prisingly then, musical training increases auditory imagery skills. In a study using a
melody-continuation paradigm in which only the beginnings of the melodies were actually
played, musicians had a more vivid imagination of the following tones of the melodies
that were not played (Herholz et al. 2008). Compared to nonmusicians, they responded
more quickly to single incorrect tones that were played during the imagined continu-
ation of the melodies and showed a neural mismatch negativity, leading the authors to
suggest that imagery shares the same neural processes with actual perception.
produce. In this way, music performance can be seen as an act of knowing or sensing
what comes next. Before producing a certain action, performers internally anticipate
the sonic outcome. There are two internal models that describe the processes of action
anticipation: a forward model (which predicts the sensory outcome of a motor command) and an inverse model (which derives the motor command needed for a desired sensory outcome); both models run a short time before action execution (Keller 2012;
Rauschecker 2011). Experienced musicians are thus able to anticipate and control the
movements they need to execute in order to reach a desired sound quality and to adjust
the force necessary for producing the sounds during play. They not only respond
to the outcome of their own actions, but also internally imagine the sonic actions of
co-performers in an ensemble (cf., Sevdalis and Keller 2014).
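As a rough illustration of how these two models divide the labor, the following toy sketch treats the motor command as a single key velocity and the sensory outcome as a loudness value. The linear mapping, its gain, and the feedback value are invented assumptions; real sensorimotor models are of course far richer.

```python
# A minimal sketch of the forward/inverse internal-model pair described above,
# reduced to a toy setting: the "motor command" is a key velocity and the
# "sensory outcome" is loudness. The linear mapping and its gain are invented.

GAIN = 0.8  # assumed velocity-to-loudness mapping

def forward_model(motor_command: float) -> float:
    """Predict the sensory outcome of a motor command before executing it."""
    return GAIN * motor_command

def inverse_model(desired_outcome: float) -> float:
    """Derive the motor command expected to yield a desired sensory outcome."""
    return desired_outcome / GAIN

desired_loudness = 60.0
command = inverse_model(desired_loudness)   # plan: which action gets me there?
predicted = forward_model(command)          # anticipate the sound of that action
actual = 62.5                               # (simulated) auditory feedback
prediction_error = actual - predicted       # mismatch that drives adjustment

print(f"command={command:.1f}, predicted={predicted:.1f}, error={prediction_error:+.1f}")
```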
Several experimental studies investigated the musicians’ anticipatory imagery that
guides their actions and the related sound outcomes. In a tapping task, Keller and
colleagues (2010) compared the timing accuracy in conditions without sound as well as
with compatible or incompatible sounds. In the silent condition, the musicians’ timing
matched the given target tempo best, and the accelerations of their finger trajectories
before tapping were highest. Thus, auditory imagery clearly aided the musicians in pla-
nning their actions and timing their tapping movements. Action anticipation was also
investigated in a study by Bishop and colleagues (2013). Twenty-nine pianists of varying
degrees of expertise were asked to play the right-hand part of relatively unknown and
simple piano compositions. After practicing the piece, some of the experimental condi-
tions included actually playing the music while, in other performance conditions, no
acoustic feedback or no auditory plus motor feedback was provided (hence, in the latter
condition, they imagined playing through the compositions in their minds). As expected,
experienced pianists produced fewer pitch and timing errors than less experienced ones
in the conditions without auditory feedback, suggesting that they had vivid imagery skills
and more stable action anticipation (cf., also studies reviewed above). Furthermore, at
several points, dynamic and articulation markings that had not been present when practicing the pieces appeared in the scores shown on a digital screen. During performances with
disrupted feedback and during imagined performances, pianists were asked whether
the newly introduced markings matched their own expressive intentions at the specific
moments. For instance, a crescendo marking appeared and pianists said “yes” if it
matched their own idea or “no” if this was not the case. As a result, they responded more
quickly than would have been expected if they had waited for the auditory feedback.
These findings suggest that pianists had access to anticipatory imagery that aided them
in performing and imagining the music without feedback. The findings further indicate
that there are relatively stable performance plans in terms of articulation and dynamics,
so that the music was vividly played “in the mind’s ear.”
Research on agency provides further evidence for the claim that musicians construct
stable representations of their actions, allowing them to imagine their own play before
carrying out an action. Agency is the awareness that actions are produced by oneself,
in other words, that someone feels an authorship of their actions (Jeannerod 2003;
Synofzik et al. 2008). Auditory feedback is highly informative for action identification,
might therefore include a number of different sources and modalities that resonate with
might therefore include a number of different sources and modalities that resonate with the perceiver’s motor system (cf., Alaerts et al. 2009).
An even greater challenge for musicians is to imagine the sonic actions of others
while performing themselves at the same time, thus integrating self and other imagined
actions as well as self auditory-motor feedback and other auditory feedback. Visual
information guides action anticipation in an ensemble. Keller and Appel (2010) inves-
tigated the relation between anticipatory imagery and dyadic synchronization. Seven
pairs of pianists performed duets of the classical piano literature together on MIDI
instruments while seeing or not seeing each other. Their body motion was recorded
with an optical motion capture system. In a second session, each pianist’s anticipatory
auditory imagery was tested with a tapping paradigm that included auditory feedback of
marimba tones or no auditory feedback. As a result, anticipatory imagery scores, calcu-
lated for each pianist separately, were correlated with average duo synchrony, suggesting
that those pianists with good imagery skills were more successful at timing their ensem-
ble play. While being in visual contact did not markedly affect results, lags in anterior-
posterior body sway between duo partners were related to synchrony, indicating that
the pianists also timed their performances via body motion. Alternatively, their body
motion might have functioned as individual time-keeping support. In this study, pianists
not only had to anticipate their own actions but also had to imagine the co-performer’s
sonic actions. Keller and Appel suggest that inverse internal models (see above) were
run slightly before producing the actions. Anticipatory auditory imagery should consist
of imagining the sounds of oneself and of the other performer, and both are then translated into adequate motor commands. In duo and ensemble performance, internal models are
thus coupled by simulating the other’s actions (cf., Gallese and Goldman 1998).
Taken together, timing information seems most important for an awareness of actions
and self-other distinctions. Evidence for the paramount importance of timing has also
been provided by studies outside the domain of music (e.g., Knoblich and Prinz 2001).
Most research on auditory action identification and sonic imagination investigated
simple acoustical stimuli or piano performances, in which the variety of sound qualities
is rather limited as compared to string or wind instruments, or the human voice. More
research is needed that addresses specific timbral qualities in the anticipatory imagi-
nation of the sounds to be produced by oneself and other performers.
Anticipation and auditory imagery of sonic actions are vital in performance areas that
do not rely on the haptic feedback of traditional instruments. A growing field in per-
formance practice employs sensors and controller systems that allow for the shaping of
sounds in new forms. The bodily movements of the performers are translated to sound
signals, so that their musical gestures become sonic actions in a somewhat direct and, in
the eyes of observers, apparently unmediated way. Producing sounds by human actions
“in the air,” without the haptic feedback of physical instruments, has fascinated per-
formers and the audience for a long time even before modern computing technology.
One of the first such instruments is the Theremin, invented in 1920, in which the spatial
positions of the two hands control volume and pitch (see Theremin 1996). The funda-
mentals of the Theremin are two metal capacitors that function as proximity sensors in
the near field for the above two performance dimensions. A boost in sensor-based music
performances has coincided with the greater availability of electronics and software
solutions since the 1980s. There are various developments in the field, including tech-
nology for digitally augmenting the sound options of acoustical instruments (which are
fundamentally still based on the playing techniques of these instruments but involve
additional electronic sounds), or controllers that turn a variety of different information
including brainwaves or human motion into sound (cf., Hugill 2012). In the following,
central conceptual issues and examples of performance systems that focus on purely
gesture-based or “open-air” controllers and their consequences for action-sound map-
pings and anticipatory imagination are discussed. At the center of the discussion will be
two types of systems that enable experiences of sonic agency for performers and audi-
ence: the conductor’s jacket and data gloves.
The disembodiment problem is also apparent when listeners perform actions in the
air to accompany music, either to follow the melodic lines and the musical structure
(e.g., Hohagen and Wöllner 2015) or by playing “air instruments” (Godøy et al. 2006;
cf., Jensenius et al. 2010; Visi et al. 2016). When mimicking or imitating the sound-
producing gestures, listeners may still imagine a link between their actions and the
sound of the original performance, and the motor involvement might augment their
auditory experience. Therefore, it can be assumed that audience members are prone to
assign visual gestures to performance sounds (see also Behne and Wöllner 2011), even if
they are aware that there is no direct mapping between the two.
were transferred into musical parameters such as sound onset and duration, tempo,
articulation, and dynamics.
Although the system was based on the movement patterns of actual conductors,
Nakra (2002) saw limits in its use for the training of conductors, which requires more
complex interactions between musicians and conductor. It should thus instead be used
as a new, stand-alone instrument that allows different sounds to be produced, and for
which specific pieces should be written or arranged. Among the pieces composed
for the conductor’s jacket, Etude 2 (Nakra 2000) used algorithms in which the EMG
signal of the right biceps alone controlled pitch, volume, and timbre at the same time.
The more the muscle contracted, the more the pitch height and the intensity level
increased. In addition, the sound spectrum was altered such that the timbre appeared to
be brighter. When the contractions of the biceps were overtly shown by arm movements,
direct mappings between gesture and sound qualities became immediately apparent
for the audience and the performer alike. In contrast, traditional conducting sometimes involves rather small gestures or blinks of an eye that can lead to large effects such as
an entrance of the whole orchestra—moments in which the audience may doubt that
the conductor “evokes” the sound.
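A rough sketch of the kind of one-to-many mapping described for Etude 2 might look as follows. The value ranges, scaling curves, and parameter names are invented for illustration and are not taken from Nakra's implementation.

```python
# A hedged sketch of a one-to-many mapping in the spirit of the Etude 2
# description above: a single (normalized) EMG amplitude drives pitch, volume,
# and brightness together. Ranges and curves are invented, not Nakra's.
def map_emg_to_sound(emg: float) -> dict:
    """Map a normalized EMG amplitude (0.0-1.0) to three sound parameters."""
    emg = max(0.0, min(1.0, emg))            # clamp to the expected range
    pitch_hz = 110.0 * (2.0 ** (emg * 3.0))  # stronger contraction -> higher pitch (3 octaves)
    volume = emg                             # stronger contraction -> louder
    brightness = 500.0 + emg * 7500.0        # e.g., a low-pass cutoff (Hz) opens up
    return {"pitch_hz": pitch_hz, "volume": volume, "brightness_hz": brightness}

for level in (0.1, 0.5, 0.9):                # relaxed, medium, strong contraction
    print(level, map_emg_to_sound(level))
```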
Besides the conductor’s jacket, there are several other performance systems using
EMG sensors (see, among others, Donnarumma and Tanaka 2014; Nymoen et al. 2015).
A large number of gesture-driven controllers employ motion-capture technology to
direct digital instruments (e.g., Dobrian and Bevilacqua’s Motion Capture Music 2003).
Among the commercial systems most widely used in various applications and per-
formance art is the Kinect system, which was developed for Microsoft’s Xbox game con-
sole. The bodily motion of one or more performers can be tracked with video technology
using an RGB color camera and a depth sensor with infrared lighting; therefore, no markers are necessary, as they are in other motion-capture systems. The advantage of the system
clearly lies in its usability, whereas caveats include the somewhat arbitrary modeling of
the body and the latency, resulting in less precise results compared to multiple-camera
motion-capture systems. Still, it is possible to capture some position and motion data
of performers and to map them to sounds. Various toolkits have been developed for
Kinect; one of the first action-to-sound applications among them is Crossole (Sentürk et al. 2012). In this digital instrument, virtual building blocks are visualized so that a performer can control them, including chord structure, arpeggios, and timbre. The
musical style is limited to harmonies of the Western classical tradition, while the tim-
bre control consists of various sound effects including delays and filters. In addition,
the contour of the melodic lines can be drawn by gestures.
While artistic approaches such as Crossole strongly rely on visual representations of
preset spatial positions for controlling the musical output, other systems allow for more
flexibility in shaping the sound qualities. For example, data gloves capture the fine-tuned
motion of the hands by use of markers or accelerometers. One of the earliest data gloves
gained information from optical finger-flex sensors and position sensors in order to train
a neural network of mappings between hand gestures and a vocabulary of 203 words
(Fels and Hinton 1993). Mapping accuracy after training was astonishingly high, and later
[Figure 2.2 appears here: the Musicglove, showing an acceleration sensor (X, Y, and Z axes) and strain sensors, one on the inner surface of the wrist.]
versions were also used for music performances (Fels et al. 2002), in which the audience
was intended to perceive metaphorical relationships between expressive gestures and sound.
A different set of gloves was constructed by Laetitia Sonami and collaborators. The
first version of their Lady’s Glove was invented in 1991, combining several transducers at
the fingers and a magnet worn on the other hand to control synthesizers via MIDI. Later
versions, mapped to MAX-MSP, allowed for more flexible sound control and intuitive
gesture-sound mappings (Rodgers 2010). Hayafuchi and Suzuki (2008) developed the
Musicglove (Figure 2.2), a device with accelerometers and strain sensors that meas-
ure the bending of the wrist and some fingers. A number of set gestures control the
musical tracks: for instance, making a fist pauses the music. Overall acceleration is
mapped to the tempo and vertical acceleration more specifically to the tempo of the
beats. Therefore, the sound of preexisting music can be controlled with the gestures.
One potential application includes the control of dance music by a DJ, where the ges-
tures could even be more convincing for an audience when compared to conventional
controlling devices such as a laptop or a turntable.
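The following sketch illustrates, in a hedged way, the style of rule-based gesture mapping just described. The sensor field names, threshold values, and tempo scaling are invented rather than taken from Hayafuchi and Suzuki's implementation.

```python
# A hedged sketch of rule-based glove control in the spirit of the Musicglove
# description above: a fist gesture pauses playback, and overall acceleration
# scales the tempo. Field names, thresholds, and scaling are invented.
from dataclasses import dataclass

@dataclass
class GloveFrame:
    finger_bend: float      # 0.0 = open hand, 1.0 = tight fist (strain sensors)
    accel_magnitude: float  # overall acceleration in g (accelerometer)

def control_playback(frame: GloveFrame, base_tempo_bpm: float = 120.0) -> dict:
    """Translate one frame of glove data into playback commands."""
    if frame.finger_bend > 0.8:              # making a fist pauses the music
        return {"paused": True, "tempo_bpm": base_tempo_bpm}
    # More vigorous motion -> faster playback, within a plausible range.
    tempo = base_tempo_bpm * (0.8 + 0.4 * min(frame.accel_magnitude, 1.0))
    return {"paused": False, "tempo_bpm": round(tempo, 1)}

print(control_playback(GloveFrame(finger_bend=0.9, accel_magnitude=0.2)))  # fist -> pause
print(control_playback(GloveFrame(finger_bend=0.1, accel_magnitude=0.9)))  # energetic -> faster
```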
Perhaps the expressive performances of Imogen Heap made controllers, and data
gloves in particular, more popular with audiences even of popular music genres. The
Mi.Mu Gloves (see Mitchell and Heap 2011) contain bend sensors, accelerometers, and a
gyroscope as well as tactile feedback to the performer via vibrations. A large number of
preset audio control options are possible that can be incorporated in various performance
genres including dance, in which the mapping between human motion and sounds may
seem particularly impressive.
Conclusions
Musical performers in a wide range of genres benefit from vivid auditory and motor
imagery. Being able to perform the music in their minds’ ears enables performers to
anticipate their own and other musicians’ sonic actions and to shape the sound quality.
In the absence of sound or in situations with altered or delayed auditory feedback, musi-
cians depend on imagery skills to control their bodily performances. In digital music
performance, with its numerous features for shaping and combining sounds, precise
imagery clearly helps in reaching the desired sound outcome. Even more, controller-driven
performances often fascinate the audience if a close mapping between gesture and
sound—an apparent fusion—is achieved. In these cases, the audience imagines the sound
to originate in the performer’s movements as sonic actions.
The anticipatory processes in auditory imagery and the level of detail of imagined
sound qualities, however, remain areas for further inquiry. Spectral components of
the sound quality are used to distinguish between performances; they may characterize
a performer’s “fingerprint,” that is, his or her individual approach to sound (Bernays and
Traube 2014). At the same time, timbral qualities still appear rather elusive in descriptive
systems. In most empirical studies of music performances, microtiming information
sufficed for distinctions between sonic actions. In addition to timing, the role of timbre
in anticipatory imagery and the shaping of sounds are particularly important for
instruments and genres that rely more strongly on nuances in sound qualities, and thus
deserve further study.
Imagery skills vary among musicians according to their background and expertise,
depending on whether or not the music is played by heart, in ensembles with or without
notation, or in improvisations. Research summarized in this chapter has shown that
pianists who typically play their repertoire in a memorized way often develop particularly vivid auditory imagery. Their performances are not interrupted if the sound
cannot be heard, for instance when it is switched off at MIDI pianos. Yet, for all musical
performers, it is paramount to imagine sonic events as an outcome of their own
actions. They need high degrees of vivid auditory imagery and motor awareness in
order to fine-tune the desired sound qualities in a performance and to adjust their
play if necessary, based on an auditory and motor error-feedback system. In this regard,
research on action–perception coupling indicated that musicians have a strong sense of
agency for their sonic actions. The neural processes underpinning imagery and per-
ception are fundamentally similar, and both may activate overlapping action networks
in performers, such that even listening to the sounds of music performances resonates
with the performer’s action systems.
Awareness of one’s own sonic actions, together with vivid auditory imagery skills, occasionally appears even more pertinent to musicians than hearing the actual acoustic sounds.
Apart from the widespread consumption of heavily compressed audio files, anecdotal
evidence suggests that experienced musicians are able to ignore shortcomings of sound
recordings and renditions; some of them may not even need high-fidelity sound systems
and can still appreciate the music to some extent by compensating for, and imagining,
the missing sound features. Similarly for performing, as discussed earlier, imagery can
become more important than actual sonic feedback during playing, especially in the
absence or alteration of sound. The relative dependencies of performers on imagery or
feedback remain a topic for more research: in other words, whether musicians primarily
concentrate on the imagined sounds during the act of performing as an intended action
outcome, or whether they rely more strongly on a variety of different feedback modali-
ties, including kinesthetic and visual components (cf., the internal models discussed
earlier). While performance plans may become more stable if they draw on multimodal
sensory stimuli, focusing one’s attention only on the sound as an external action goal,
thus concentrating less on internal bodily processes, has been shown to be efficient in
various fields (for an overview, see Wulf 2007).
The anticipation of sonic actions, as stated earlier, might rely solely on imagined
sounds that are to be produced. These processes appear to be particularly vital in gestural
performances of live electronics, in which the multisensory feedback loop of acoustical
instruments is often absent. Real-time processing, spatial placement and sound source
control are central for close mappings and perceived fusion in experiences of human
sonic actions. New developments in performance interfaces address these issues by
modifying gesture-to-sound mappings that offer captivating links between bodily actions
and sounds as an intentional, meaningful process for performers and for the perceptions
and imaginations of audiences.
References
Aglioti, S. M., and M. Pazzaglia. 2010. Representing Actions through Their Sound. Experimental
Brain Research 206: 141–151.
Agnew, M. 1922. The Auditory Imagery of Great Composers. Psychological Monographs 31:
279–287.
Alaerts, K., S. P. Swinnen, and N. Wenderoth. 2009. Interaction of Sound and Sight during
Action Perception: Evidence for Shared Modality-Dependent Action Representations.
Neuropsychologia 47: 2593–2599.
Bailes, F. 2007. Timbre as an Elusive Component of Imagery for Music. Empirical Musicology
Review 2: 21–34.
Bangert, M., T. Peschel, G. Schlaug, M. Rotte, D. Drescher, H. Hinrichs, et al. 2006. Shared
Networks for Auditory and Motor Processing in Professional Pianists: Evidence from fMRI
Conjunction. NeuroImage 30: 917–926.
Banton, L. J. 1995. The Role of Visual and Auditory Feedback during the Sight-Reading of
Music. Psychology of Music 23: 3–16.
Behne, K.-E., and C. Wöllner. 2011. Seeing or Hearing the Pianists? A Synopsis of an Early
Audiovisual Perception Experiment and a Replication. Musicae Scientiae 15: 324–342.
Bernays, M., and C. Traube. 2014. Investigating Pianists’ Individuality in the Performance of
Five Timbral Nuances through Patterns of Articulation, Touch, Dynamics, and Pedaling.
Frontiers in Psychology 5: 157.
Bishop, L., F. Bailes, and R. T. Dean. 2013. Musical Imagery and the Planning of Dynamics and
Articulation during Performance. Music Perception 31: 97–117.
Brereton, J. 2017. Music Perception and Performance in Virtual Acoustic Spaces. In Body,
Sound and Space in Music and Beyond: Multimodal Explorations, edited by C. Wöllner,
211–234. Abingdon, UK: Routledge.
Brown, R. M., and C. Palmer. 2013. Auditory and Motor Imagery Modulate Learning in Music
Performance. Frontiers in Human Neuroscience 7: 320.
Caramiaux, B., J. Françoise, N. Schnell, and F. Bevilacqua. 2014. Mapping through Listening. Computer
Music Journal 38: 34–48.
Chion, M. 1983. Guide des objets sonores: Pierre Schaeffer et la recherche musicale. Paris:
Buchet/Chastel.
Clark, T., and A. Williamon. 2011. Evaluation of a Mental Skills Training Program for
Musicians. Journal of Applied Sport Psychology 23: 342–359.
Cocks, M., C.-A. Moulton, S. Luu, and T. Cil. 2014. What Surgeons Can Learn from Athletes:
Mental Practice in Sports and Surgery. Journal of Surgical Education 71: 262–269.
Connolly, C., and A. Williamon. 2004. Mental Skills Training. In Musical Excellence: Strategies
and Techniques to Enhance Performance, edited by A. Williamon, 221–245. Oxford: Oxford
University Press.
Daselaar, S. M., Y. Porat, W. Huijbers, and C. M. Pennartz. 2010. Modality-Specific and
Modality-Independent Components of the Human Imagery System. Neuroimage 52:
677–685.
Dobrian, C., and F. Bevilacqua. 2003. Gestural Control of Music using the Vicon Motion
Capture System. In Proceedings of the New Interfaces for Musical Expression Conference,
161–163. May 22–24, 2003, Montréal, Quebec, Canada.
Donnarumma, M., and A. Tanaka. 2014. Principles, Challenges and Future Directions of
Physiological Computing for the Physical Performance of Digital Musical Instruments. In
Proceedings of the 9th Conference on Interdisciplinary Musicology (CIM14), edited by T. Klouche
and E. R. Miranda, 363–368. Berlin, Germany: Staatliches Institut für Musikforschung.
Driskell, J. E., C. Copper, and A. Moran. 1994. Does Mental Practice Enhance Performance?
Journal of Applied Psychology 79: 481–492.
Fels, S. S., A. Gadd, and A. Mulder. 2002. Mapping Transparency through Metaphor: Towards
More Expressive Musical Instruments. Organised Sound 7 (2): 109–126.
Fels, S. S., and G. E. Hinton. 1993. Glove-Talk: A Neural Network Interface between a Data-
Glove and a Speech Synthesizer. IEEE Transactions on Neural Networks 4 (1): 2–8.
Finney, S. A. 1997. Auditory Feedback and Musical Keyboard Performance. Music Perception
15: 153–174.
Gallese, V., and A. Goldman. 1998. Mirror Neurons and the Simulation Theory of Mind-
Reading. Trends in Cognitive Sciences 2: 493–501.
Gjerdingen, R. O., and D. Perrott. 2008. Scanning the Dial: The Rapid Recognition of Music
Genres. Journal of New Music Research 37 (2): 93–100.
Godøy, R. I. 2010. Gestural Affordances of Musical Sound. In Musical Gestures: Sound,
Movement, and Meaning, edited by R. I. Godøy and M. Leman, 103–125. New York:
Routledge.
Godøy, R. I., E. Haga, and A. R. Jensenius. 2006. Playing “Air Instruments”: Mimicry of
Sound-Producing Gestures by Novices and Experts. In Gesture in Human-Computer
Interaction and Simulation: 6th International Gesture Workshop, edited by S. Gibet, N. Courty,
and J.-F. Kamp, 256–267. Berlin: Springer.
Grey, J. M. 1975. An Exploration of Musical Timbre. PhD thesis, Department of Psychology,
Stanford University.
Halpern, A. R., R. J. Zatorre, M. Bouffard, and J. A. Johnson. 2004. Behavioral and Neural
Correlates of Perceived and Imagined Musical Timbre. Neuropsychologia 42: 1281–1292.
Hayafuchi, K., and K. Suzuki. 2008. Musicglove: A Wearable Musical Controller for Massive
Media Library. In Proceedings of the International Conference on New Interfaces for
Musical Expression (NIME) 8, 241–244. 5–7 June 2008, Genova, Italy.
Herholz, S. C., C. Lappe, A. Knief, and C. Pantev. 2008. Neural Basis of Musical Imagery and
the Effect of Musical Expertise. European Journal of Neuroscience 28: 2352–2360.
Highben, Z., and C. Palmer. 2004. Effects of Auditory and Motor Mental Practice in
Memorized Piano Performance. Bulletin of the Council for Research in Music Education
159: 58–65.
Hohagen, J., and C. Wöllner. 2015. Self-Other Judgements of Sonified Movements: Investigating
Truslit’s Musical Gestures. In Proceedings of the Ninth Triennial Conference of the European
Society for the Cognitive Sciences of Music, August 17–22, Royal Northern College of
Music, Manchester.
Hubbard, T. L. 2010. Auditory Imagery: Empirical Findings. Psychological Bulletin 136: 302–329.
Hugill, A. 2012. The Digital Musician. 2nd ed. New York: Routledge.
Hunt, A., M. M. Wanderley, and M. Paradis. 2003. The Importance of Parameter Mapping in
Electronic Instrument Design. Journal of New Music Research 32: 429–440.
Janata, P., and K. Paroo. 2006. Acuity of Auditory Images in Pitch and Time. Perception and
Psychophysics 68: 829–844.
Jeannerod, M. 2003. The Mechanism of Self-Recognition in Humans. Behavioural Brain
Research 142: 1–15.
Jensenius, A. R., M. M. Wanderley, R. I. Godøy, and M. Leman. 2010. Concepts and Methods
in Research on Music-Related Gestures. In Musical Gestures: Sound, Movement, and
Meaning, edited by R. I. Godøy and M. Leman, 12–35. New York: Routledge.
Kalakoski, V. 2001. Musical Imagery and Working Memory. In Musical Imagery, edited by
R. I. Godøy and H. Jørgensen, 43–55. Lisse, the Netherlands: Swets and Zeitlinger.
Keller, P. E. 2012. Mental Imagery in Music Performance: Underlying Mechanisms and
Potential Benefits. Annals of the New York Academy of Sciences 1252: 206–213.
Keller, P. E., and M. Appel. 2010. Individual Differences, Auditory Imagery, and the Coordination
of Body Movements and Sounds in Musical Ensembles. Music Perception 28: 27–46.
Keller, P. E., S. Dalla Bella, and I. Koch. 2010. Auditory Imagery Shapes Movement Timing
and Kinematics: Evidence from a Musical Task. Journal of Experimental Psychology: Human
Perception and Performance 36: 508–513.
Knoblich, G., and W. Prinz. 2001. Recognition of Self-Generated Actions from Kinematic
Displays of Drawing. Journal of Experimental Psychology: Human Perception and Performance
27: 456–465.
Krumhansl, C. 1989. Why Is Musical Timbre So Hard to Understand? In Structure and
Perception of Electroacoustic Sound and Music, edited by S. Nielzen and O. Olsson, 43–53.
Amsterdam: Elsevier.
Krumhansl, C. L. 2010. Plink: “Thin slices” of Music. Music Perception 27: 337–354.
McAdams, S., S. Winsberg, S. Donnadieu, G. de Soete, and J. Krimphoff. 1995. Perceptual
Scaling of Synthesized Musical Timbres: Common Dimensions, Specificities, and Latent
Subject Classes. Psychological Research 58 (3): 177–192.
McPherson, A., R. H. Jack, and G. Moro. 2016. Action-Sound Latency: Are Our Tools Fast
Enough? In Proceedings of the International Conference on New Interfaces for Musical
Expression, 12–14 July 2016, Brisbane, Australia.
Miranda, E. R., and M. M. Wanderley. 2006. New Digital Musical Instruments: Control and
Interaction Beyond the Keyboard. Madison, WI: A-R Editions.
Mitchell, T. J., and I. Heap. 2011. SoundGrasp: A Gestural Interface for the Performance of Live
Music. In Proceedings of the International Conference on New Interfaces for Musical
Expression (NIME), 465–468. 30 May–1 June 2011, Oslo, Norway.
Nakra, T. M. 2000. Inside the Conductor’s Jacket: Analysis, Interpretation and Musical
Synthesis of Expressive Gesture. PhD thesis, MIT.
Nakra, T. M. 2002. Synthesizing Expressive Music through the Language of Conducting.
Journal of New Music Research 31: 11–26.
Nymoen, K. H., M. R. Haugen, and A. R. Jensenius. 2015. MuMYO: Evaluating and Exploring
the MYO Armband for Musical Interaction. In Proceedings of the International Conference
on New Interfaces for Musical Expression, edited by E. Berdahl. Baton Rouge: Louisiana
State University.
Motor Imagery in Perception and Performance of Sound and Music
Jan Schacher
Introduction
Audition is one of our central senses, and listening is deeply integrated into how we
perceive and act in the world. In our perception, sound plays a central role in the
construction of a coherent, multimodal “world-view.” Hearing and listening are tied
together with all other senses in low-level, subpersonal, and prereflective relationships
that are established in interaction with others, with the goal of meaning generation
in higher cognitive functions. The central element used for making sense of acoustic
information and for interpreting sounds is the body with its indivisible being-in-the-world,
its capacity for action, and its role as substrate for cognitive processes. Sound comprises
any acoustic phenomenon we might perceive, whereas music is a constrained field, oper-
ating within combinations of sounds that are human-made and culturally coded. Sound
perception operates on numerous levels, from providing evolutionary survival cues,
to carrying core elements of interpersonal exchanges, to enabling culturally encoded
symbolic, and semantic elements subjugate the primary elements of timbre, pulse,
rhythm, consonance or dissonance, and the sonic spaces that are created with the use of
voices and instruments. Through its structuring movement and its relational meaning-
construction, music functions as an ordering principle and as a conveyor of fixed
(and therefore recognizable) sounding tropes and affects. Cultural practices—that is,
musical styles that avoid the tendency to fix all elements—have the potential to
reflect and mirror back the perception of music to a primary sensorial experience of the
ephemeral, both through listening and through perceiving by other corporeal means,
such as kinesthesia or proprioception. Other cultural practices, such as popular and
club music, leverage fixity and use expectations (Huron 2006) and musical schemata
to access a mode of perception that need not reside primarily in listening, and rather
occurs through co-performing; for example, by dancing and similar modes of physical
partaking in the music.
In all these cases, the body’s characteristics and its capacities to resonate, re-enact, and
“re-member” physically provide the foundation for the sonic experience. Even in phono-
graphically mediated music or sound experiences (Walther-Hansen 2012), the body
provides a point of reference—even if only through its absence in the perceptual field
when listening to a recording. And even sounds with clearly identified natural causes
that do not involve an active subject, be it human or nonhuman, such as water or wind
noises, provoke a physiological, affective response in the body.
Here again, the “enactive” position postulates that cognition is a function of being
bodily in the world, and that having a body is what enables us to develop experiential
structures that exhibit meaning (Noë 2004). The “autopoietic” position of embodied
action is stated clearly by Varela and colleagues:
Applied to my topic, it is fair to say that without the body’s materiality, the body’s ability
to produce sound itself, we would not be able to perceive, let alone identify and give
identity to sound, however far abstracted from its moment of production it might be, or how-
ever tenuous our perceptual link to it might be (Voegelin 2010, 82). The subjective identifi-
cation with sounds and their origins, through their presence in the same moment as
the perceiving subject, connects immediately but without fixity: it remains fluid and
fleeting, and needs to be (re-)activated continuously.
When perceiving events in sound, we perceive other bodies, agents, and actants
(Latour 2005) that are of the same kind and have the same capacities as our own body
and that the body only understands when they fit with a preexisting experience, a
predisposition to resemble, to be equivalent: a resonance. The body is the site of expe-
rience, the site of fusion between senses, perceptions, and memories, the site of cog-
nition. As a substrate and foundation, it carries cultural schemata of sounding, of
“eventing”1 (Ihde 2007, 109) with sounds in an affective, interpersonal, protolinguistic,
and even musical way. These schemata complement those of the body itself, of its imme-
diate learning and imprinting, and expose the potential to approximate relationships
between sound and event through other means.
Music performance encapsulates and charges sonic perception with cultural dimen-
sions, yet depends essentially on the perceptual and subpersonal capabilities of an
“enactive,” embodied intertwining with the sounding world, and on an experiential,
personal, and interindividual link to sound in a cultural context. In addition, personal
aspects of the performative construction of the self (Butler 1988) contribute the elements
of gender, age, social position, and other biographical factors that further color the act of
music performance and perception. The nature of musical processes is a dynamic flow,
not simply of time but also of elements constituted of bodily actions that produce
distinct sound impressions and carry immediate ecological meaning in a prereflective
corporeal domain, before rising to a protolinguistic, presemantic, or adaptive semantic
level (Reybrouck 2006). Within musical perception, the processes we are affected by,
perceive, and act out are made by dynamic chains of sound-objects as well as by action–
sound pairs or multimodal “gestural sonorous objects” (Godøy 2006). These elements
form “segregated streams and objects that lead, via the subjective sensing of the
subject’s body motion, to impressions of movement, gesture, tensions, and release of
tension” (Leman and Camurri 2006, 212–213). As musicians perform, they construct a
temporally unfolding stream of movement dynamics that the listener-viewer re-
enacts and co-performs through kinesthetic, corporeal resonances and higher-order
dynamic sensing. This state of active engagement is more akin to moving oneself than
to sounding within oneself.
The field of music psychology investigates cognitive, neural, and behavioral aspects
of music perception and performance actions. Empirical studies address the
understanding of one’s own and others’ actions (Jeannerod 2006), for example, by
investigating self-recognition (Sevdalis and Keller 2010) and co-performing with
other musicians (Keller 2008).
In his mimetic motor imagery hypothesis, Cox brings together a number of elements
that demonstrate and anchor in empirical research how imitative, simulation-based
re-enacting of sound-producing movements lies at the core of music perception.
The basis for motor imagery is given by two elements that are crucial for the acting
subject: the kinesthetic memory (Sheets-Johnstone 2009) and physical experience of
the perceptual consequences of similar earlier actions, and the ability to trigger complex
motor- or body-schemata or kinetic melodies (Luria 1973), such ability providing an
appropriate action form for the intentional image (Bergson 1939). While the memory of
a prior experience is always linked to an actually executed action, the intentional image
can become a surrogate for those perceptions that an executed action would produce
(Annett 1996). Thus, in motor imagery processes, the necessary addition of location and
timing information to a pre-established motor schema alters it in such a way as to help
inhibit or suppress the execution of the actual movement patterns.
The perceptual linking of imagery to motor patterns functions in two reciprocal paths,
one serving a representative role, the other an operational one. They are not exclusively
coupled and are independent enough to allow for an inhibition of imitative actions
as a reaction to movement or action perception (Berthoz 1997, 209), and to permit the
projection of action in motor imagery alone, without the need for it to be executed openly
(Reybrouck 2001). In addition to recognition and mimetic re-enacting for goal-under-
standing, the mechanism of motor imagery plays a crucial role in the preparation of
real actions (Glasersfeld 1996, 65) and the storage of memories of executed actions:
“motor action itself, in its prenoetic body-schematic performance, has the same tacit
and auto-affective structure that involves the retention of previous postures, and the
anticipation of future action” (Gallagher 2005, 204).
If, in the case of simulation, the impulse for the execution of an action is blocked on
the way from the cortex to the spinal cord (Decety and Chaminade 2003) then, in case
of execution, the efferent and afferent neural activation streams form a complete loop
that enables adaptive and continuous control over the action (Annett 1996) and lead,
via internal model parameters (Keller 2012, 209), to the perception of one’s own agency:
“Performative awareness that I have of my body is tied to my embodied capabilities
for movement and action . . . my knowledge of what I can do . . . is in my body, not in a
reflective or intellectual attitude” (Gallagher 2005, 74). This “sense of effort” (James 1896)
provides the tacit proprioceptive knowledge that perceptual changes are indeed the
outcome of one’s own actions. “That is, although the content of experience may be the
intended action, the sense that I am generating the action may be traced to processes
that lie between intention and performance” (Gallagher 2005, 57).
The technique of motor imagery is commonly used in conjunction with physical training
in order to optimize skillful execution, for example, of an athletic task. It is used to imprint
as a corporeal motor schema a sequence of movements in a single coherent and economi-
cal movement unit. The goal is to have at one’s disposition a complex movement pattern
that only needs to be triggered once and not consciously controlled in every aspect
throughout the entire movement trajectory. The repetitive nature of practicing coordi-
nated movements of instrumental play fulfills the function of establishing body-schemata,
“integral kinaesthetic structures” (Luria 1973, quoted in Sheets-Johnstone 2009), dynamic
patterns, or so-called kinetic melodies.
Through the practicing process, the embodied “know-how” becomes prereflective and
can later, in the right environment and circumstances, be triggered as a unit without the
necessity to individually deal with the actions that constitute it. Obtaining these motor
schemata is considered beneficial to concentration and mental preparation for extreme,
singular, and rare high-performance moments. Having integrated complex patterns
into single units allows one to shift the focus to anticipation and adaptation in complex
situations such as, for example, returning a 200-km/h tennis service.
In athletic training scenarios, on the one hand, the patterns are predefined, pretrained
units of movement that are continuously recalled and reinforced. In a musical per-
formance scenario, on the other hand, how many of the actions are pattern-based and to
what extent these patterns modulate the shaping of a performance depends on stylistic,
musical definitions and their degree of fixity.
Active motor imagery serves as a technique for training and builds on the way
movements are memorized, related, and executed on a subpersonal as well as somatic/
kinesthetic level. Athletes, as well as musicians and dancers, are known to practice
mentally and with reduced physical activation, such as when marking phrases (Kirsh
2010). The extra scaffolding obtained from executing reduced yet signifying bodily
actions represents an interesting case of a hybrid practice, which leverages motor imagery
with goal-points or “key-frames” (Godøy and Leman 2010), without fully exerting the
body. Practice exercises for musicians can have the same degree of determination as an
athlete’s movement schema and can be mentally practiced in the same way. Training of
fine-motor control and the creation of larger body-schematic movement units that can
be recalled without conscious involvement are the central occupation in a musician’s
training during the instrumental skill acquisition phases.
The body accumulates knowledge about movements, dynamics, and forces and, in the
case of traditional musical instruments, links it to the perception, the adaptation, and the
control of the desired sound-qualities, thus dealing with movement-sound conjunctions
rather than with movement and sound separately. This embodied knowledge encom-
passes the full range of the body’s motion and audition control. It is completely interde-
pendent with the environmental situation within which it is learned and acquired.
Full musical performance situations consist of a large number of perceptual tasks
that need to be negotiated and mastered with skills going beyond mere body-schematic
patterns. Yet, in order to achieve the necessary level of (hyper-)reflection (Kozel 2007),
and in order to master both the corporeal and the expressive musical demands of the
performance, it is important to be able to base the execution on previously imprinted
body-schemata, to anticipate or plan the triggering of a motor movement unit, and to
modulate in real-time the parameters of its execution. The ensuing multilevel perception
and attention during music performance are necessary to manage high-level musical
goals and succeed with timing and the expressive control of phrases (Brown et al. 2015)
or chunks, in a top-down manner (Godøy 2006, 156) without the need to put full
attentional focus on specific single task elements.
The difference between general movement tasks, such as picking up a cup and drink-
ing from it, and musical tasks lies in the acoustic, sounding component whose percep-
tion plays a crucial role in controlling the quality of execution. In musical performance
with an instrument, by modulating fine-motor actions, an adaptive feedback-loop is
created that controls sound’s central aspects such as timbre, timing, and dynamics. This
loop contains both prereflective, kinesthetic and conscious, musical, or sonic
perceptions and (re-)actions and includes all the peripheral situating elements that are
part of the performance, that is, the stage, the other players, the social situation, and so
forth. Thus, the training of instrumental playing on a traditional instrument such as
the violin or the flute consists of learning recursive motor adaptations that depend
both on the perception of physiological, corporeal elements, such as posture, breath,
tension, and force, and on the auditory perception of tonal qualities such as timbre,
pitch, resonance, and volume:
When the status of habituation is reached, the body-image retreats into the background
in order to enable the concentration on the sonic-expressive shaping of the entire
piece of music, something to which the prereflective, proprioceptive and auditory
body senses are continuously subjected. (Kim and Seifert 2010, 111, my translation)
For the musical performer, lower-level auditory processes occur on a prereflective level
and inform musical awareness on a higher level, where the musical elements become
part of the experiential content. With habituation, this prereflective perception of
musical elements gets integrated into prereflective somatic proprioception, as in the
example of “feeling” the correct intonation on a string instrument. This habituation process
shows how musical awareness plays out on a metaphorical (Lakoff and Johnson 1980)
or conceptual (Fauconnier and Turner 2003) level and blends with and informs the
sensory-motor integration of auditory adaptations in body-schematic patterns. As with
any other physical task, performing music involves the coordination of intention, goal,
perception, and adaptive feedback for adjusting the motion trajectory. This is where
motor imagery on a prereflective and subpersonal level, as well as active, intentional
imagination become fundamental to a successful performance.
Ecological Embedding
and Affordances
The affordance of something does not change as the need of the observer changes. The
observer may or may not perceive or attend to the affordance, according to his needs,
but the affordance, being invariant, is always there to be perceived. (Gibson 2015, 130)
Gibson derived his concept from “Gestalt” psychology’s terms of valence, invitation,
and demand, but was critical that its proponents used the concept in a value-free manner.
He emphasized the inherent meaning that arises out of ecological embedding:
An affordance points two ways, to the environment and to the observer. So does the
information to specify an affordance . . . this is only to reemphasize that exterocep-
tion is accompanied by proprioception—that to perceive the world is to coperceive
oneself . . . The awareness of the world and of one’s complementary relations to the
world are not separable. (132–133)
In order to understand the scope of objective affordances (Paine 2009) that arise in
playing traditional, physical musical instruments, a concept of perceptual affordances,
located in the cultural domain of music, needs to be added. On a primary level,
perceptual affordances can be defined as those types of perceptions generated when
entering into contact with the instrument but without necessarily interacting with it.
These perceptions form a multimodal field that encompasses the traditional five senses
of vision, audition, touch, taste, and smell. They arise when attentional awareness is
guided toward the instrument in any of the sensory modes. An example of such an affor-
dance is that of perceiving the tension of a drum skin while holding a frame-drum. On
a secondary level, perceptual affordances could also be seen as the potential for per-
ceptions to arise from interaction with the instrument. These secondary perceptions could
be tied to the five senses as well, if they manifest themselves within the outside per-
ceptual field and in direct relationship to the instrument. An example of this affordance
would be the sound generated from playing the instrument and contained in the audi-
tory event that arises out of an instrumental action. The perception or awareness that
originates within the player when interacting with the instrument, however, represents
a separate type of perceptual affordance that—even though it is derived from contact
and action with the instrument—does not exist independently of the cognitive or sub-
personal processes of the performer. The outer contact with the instrument is conveyed
by tactile and sometimes vibrotactile cues. In contrast, the inner effects of contact with
the instrument are based on a kind of sensing that is active within the body, such as
kinesthetic and vestibular sensing. These effects cannot be called perceptions but rather
sensations and belong to the prereflective, precognitive levels of our perceptual system.
An example of this inner type of affordance might be the level of comfort or the com-
plexity of physical adaptation an instrument demands for its proper playing position,
such as, for example, correctly lifting the hands while sitting at a piano. Or the affordance
might be the prereflective adaptations to playing due to the perception of vibrational
forces transmitted through the body, such as the modulation of a vibrato as felt through
the changes in the vibrating string. Finally, on a higher level, the sounds of an instrument
itself obtain their meaning, and therefore offer their “musical” or “cultural” affordance
in the context of their application. In that case it is less the physical aspects of the instru-
ment and more their habitual use that defines the “ecological” potential. Motor imagery
depends on internalizing the affordances, on the ability to internalize the sounding
result of a musical motor action; it therefore “involves mutuality between perception
and action at a neurobiological level” (Windsor and Bézenac 2012, emphasis added) as
well as on an experiential and cultural level, since any musical action is situated in a
cultural context and builds on prior experiences.
The literature on musical gesture provides a rich set of categorizations and classifications
that deal mainly with the types and effects of actions on musical instruments labeled
as “gestures.” Cadoz’s classification of the “gesture channel” differentiates between the
three functions of the ergotic, that is, the “material action, modification and transfor-
mation of the environment,” the epistemic, and the semiotic, and orders the instrumental
“gestures” in the three categories of excitation, modification, and selection (Cadoz 2000).
Godøy formulates the distinction between body-related and sound-related “gestures”
(Godøy and Leman 2010) that are categorized into sound-producing, communicative,
sound-facilitating, and sound-accompanying “gestures” (Jensenius et al. 2010). These
authors all take into account the bodily basis for the actions, sometimes also the per-
ceptual effects, but fail to address the prereflective effects inherent to acting and perceiving
musical agency through an instrument, in particular the new forms of technological
instrument that rely on abstract mathematical models and digital signal processing for
the production of sound.
In order for digital musical instruments to become “playable” in the proper sense of
the word, the representations of their digital processes need to occur in metaphors
(Lakoff and Johnson 1980); these processes are too complex to be grasped and acted on
directly while performing.2 The metaphors are present in visual representations, such
as the display of waveforms or spectrograms, in physical placeholders, such as levers,
wheels, knobs, and sliders, or in more encompassing analog device metaphors such
as tape-reels, patch-bays, and signal-chains. By themselves, these metaphors are useful,
and enable complex instruments to be “played”; the problem is their limiting effect on
the cognitive and perceptual capacities that could be better mobilized with richer, more
differentiated, and more process- or action-specific metaphors.
A number of conceptual models for the control of digital sound processes originate
in real-world scenarios and in existing physical devices and can therefore be cognitively
handled through actions and behaviors that are shaped by everyday experiences. The
two main models of control can be identified as the instrument (Jordà Puig 2005) and
the cockpit (Wanderley and Orio 2002). The first model is based on a traditional musi-
cal instrument’s dependence on (continuous) energy input that is necessary to produce
sound. Rather than presenting mechanisms for generating larger time-based structures,
the instrument offers a palette of sound options (or playing techniques) that need to
be actively selected, combined, and performed by the musician. The second model of
action puts the performer into an observer perspective or pilot’s cockpit, where, from a
position of overview, single control actions keep the system within the boundaries of
the intended output, while the actual sound processes produce their output without the
need for continuous excitation and control. A third and less common model is that of
dialogical communication and interaction with generative aspects that become an
integral part of the sound production processes. The most interesting manifestations of
the third model deploy some form of autonomous agents to generate an “inter-subjective”
exchange (Lewis 2000).
The types of interaction and their position on the conceptual axis, between direct
parametric control and “naturalistic interaction,” depend on the level at which the musi-
cian acts or “inter-acts” with the digital domain (Kozel 2007, 68). Different complexities
demand different tangible objects and instrumental interfaces. In the case of one-
dimensional and precise parametric control, individual objects such as knobs, sliders,
or buttons are cognitively appropriate, since they represent in their physical form the
singular dimension of the parameter and can be handled discretely. In the case of
higher-dimensional or model-based action patterns, control objects with more degrees
of freedom are required. The mode of “interaction” with more intertwined dimensions
should reflect the relationship and dependency of those degrees of freedom that are
present in the digital domain.
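To make the contrast between these control models concrete, the following sketch—written in Python purely for illustration, with hypothetical class and parameter names and invented numbers—contrasts an “instrument”-style mapping, in which sound exists only while the performer supplies gestural energy, with a “cockpit”-style mapping, in which occasional discrete adjustments steer an otherwise self-running process. It is a minimal sketch under these assumptions, not a description of any system discussed in this chapter.

"""Minimal sketch contrasting two control models for a digital sound process:
an "instrument" model (continuous energy input drives the sound) and a
"cockpit" model (occasional discrete adjustments steer a self-running process).
All names and numbers are hypothetical illustrations, not a real instrument."""

from dataclasses import dataclass


@dataclass
class InstrumentModel:
    """Sound exists only while the performer supplies gestural energy."""
    gain: float = 0.8

    def tick(self, gesture_energy: float) -> float:
        # Output amplitude is a direct, moment-to-moment function of the
        # performer's input: no energy, no sound.
        return self.gain * max(0.0, gesture_energy)


@dataclass
class CockpitModel:
    """A self-running process; the performer only nudges its parameters."""
    level: float = 0.5          # current output level of the autonomous process
    target: float = 0.5         # set-point chosen by the performer
    inertia: float = 0.9        # how slowly the process drifts toward the target

    def set_target(self, value: float) -> None:
        # A single, discrete control action (turning a knob, moving a slider).
        self.target = min(1.0, max(0.0, value))

    def tick(self) -> float:
        # The process keeps producing output on its own, gradually approaching
        # whatever set-point the performer last chose.
        self.level = self.inertia * self.level + (1.0 - self.inertia) * self.target
        return self.level


if __name__ == "__main__":
    instrument = InstrumentModel()
    cockpit = CockpitModel()
    cockpit.set_target(0.9)     # one supervisory adjustment, then hands off

    # A short gesture: energy rises, then stops entirely.
    gesture = [0.2, 0.6, 1.0, 0.4, 0.0, 0.0, 0.0]
    for step, energy in enumerate(gesture):
        print(step, round(instrument.tick(energy), 3), round(cockpit.tick(), 3))

In the printout, the instrument’s output collapses as soon as the gesture stops, whereas the cockpit’s output keeps converging toward its set-point without further excitation; that difference is precisely what distinguishes the two models.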
The most complex set of entangled degrees of freedom that we can cognitively han-
dle are those present in our entire body. Leveraging this level of complexity, at least
through the extraction of information about posture and the kinematic qualities of the body,
is attempted, for example, by camera-based motion controls for games in so-called natural
user interaction, where full-body movements are used for control. This might be an
appropriate method when the goal is to affect a virtual body that mirrors the capabilities
of the natural body in a virtual game environment. It becomes problematic, however,
when the correspondence between the actions in the physical world and the result or
reaction in the abstract digital domain is modeled after categories that originate in
the abstract domain. Empty-handed and movement-based controls in an allocentric3
frame work well for metaphors of control that reflect spatial qualities. Object-based,
instrumental actions with tangible interfaces in an egocentric4 (for example, with wear-
able sensors) or object-centric frame are effective for actions on abstract entities without
clear correspondence in the real world. Digital instrument design and interface devel-
opments oscillate between these two poles. There is, however, a tendency to shift away
from action and behavior patterns that are based on the bodily capabilities shaped by
object “interaction” with physical instruments, and to move toward symbolic and
metaphorical projection onto a disjointed digital model.
A smartphone with its touch screen, for example, gets used as—but was also designed
to become—a generalized object with a repeatable and representable repertoire of
movement patterns, the so-called gestures of pinch-to-zoom, swipe, and so on. These
action patterns were copied from the natural world. Slight dissonances within or new
interpretations of these patterns are learned and absorbed quickly when they constitute
part of the interaction vocabulary of an information device.5
Technological instruments such as the turntable or a tablet are easily integrated into a
musician’s movement and instrumental vocabulary. Turntable-ism is a prime example
of the reappropriation of a music playback device into an instrument, subverting codes
of musical style as well as social codes of “stealing” music or the disregard for the
“authentic” musician’s voice (Eshun 1998).
As the source of sounds, musical instruments with their rich cultural history and the
field of association they carry have a profound impact on our imagination of music
making. With the exception of the voice, all human-made (conventional) musical sounds
are generated by vibrating objects that exhibit specifically tuned physical properties.
Within a single culture, these modes of sound production are commonly known and
form the basis for understanding the act of music making. Sounds that have never been
heard, and do not resemble any other sounds that were experienced before, are not
easily identified, get confused with other sounds, or are simply ignored (Lemaitre et al.
2010). The prereflective auditory processes responsible for these decisions are part of
the filtering and inhibition systems that most of our perception is based on. Since
recognizing sounds is an evolutionary necessity, we are highly attuned to localizing and
identifying a sound’s origin rapidly and preconsciously, even if that means occasionally
failing. This capability is transferred to recognizing musical sounds and identifying
instruments, voices, and acoustical signals.
When considering the import that musical instruments have for our ability to imag-
ine producing sounds in a meaningful manner, the primary relationship to take into
account is that of an active body interacting with, exerting control over, and imposing
intentions onto a tool or object. The “body-object articulation” is a charged field and
contains not just the pragmatic value of its usage but also the signifiers of agency6 or,
in political terms, of the inherent power-relationship (Foucault 1977); the articulation
constitutes a body-weapon, body-tool, even body-machine complex that becomes a
relevant topic and urgent concern in technological performance practices and the way
technological and information-bearing tools pervade our current life-world and blur
the boundary between the organic and the technological (Haraway 1987).
Even though the body-object-movement relationships and kinesthetic patterns that
are offered by technological instruments exist in the same domain as the traditional
ones, culturally defined and explicitly designed motor images and interaction patterns
prevail. In order to enable the manipulation of sound with intentional actions, even a
technological instrument that is based on digital (intangible) processes to generate
sound needs a control- or performance-interface that is based on physical character-
istics; it needs to provide methods of access through proxy layers that enable physical or
gestural interactions. The way technological instruments mediate and alter the path
from an imagined and anticipated sound-event to its sonic manifestation tells as much
about the “technicity” of the instrument (Simondon 1958) as about the mechanisms for
music making we depend on. The temporal unity of an action and its sonic result, for
example, is critical to maintaining a sense of causality and agency. The translations that
are necessary to link a physical action to the production of a sound show the perceptual
boundaries of the physical properties of sounding objects, moving bodies, and the
action-sound coupling that are always present in the natural world; the immediate bond
between bodily action and sounding result is broken by the use of symbolic machines.
These computer programs with their associated graphical user interfaces are merely
executing logical or mathematical operations in order to generate sounds. Even though
technology is optimized to hide this fracture, for example, by becoming so fast as to appear
immediate and transparent, our necessary and indissociable reliance on embodied per-
ception for identifying sound sources as a matter of survival generates an inherent tension
and contradiction that undergirds and permeates any performance with technology.
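The “fracture” described here can be made tangible with a small latency-budget sketch. The stage names, millisecond figures, and the 10 ms threshold below are assumptions chosen only for illustration, not measurements of any real system; the point is that the action-to-sound path of a digital instrument is a chain of discrete processing stages whose sum can be engineered to feel immediate without ever restoring the direct physical coupling of an acoustic instrument.

"""Minimal sketch of the latency budget in a hypothetical digital instrument chain.
Stage names and figures are illustrative assumptions, not measurements."""

# Hypothetical processing stages between a bodily action and the resulting sound.
STAGES_MS = {
    "sensor sampling": 2.0,
    "gesture analysis / mapping": 3.0,
    "synthesis buffer": 5.8,      # e.g. a 256-sample buffer at 44.1 kHz
    "audio output driver": 4.0,
}

# An assumed (and debated) threshold below which the action and its sonic
# result are still perceived as a single causal event.
PERCEPTUAL_THRESHOLD_MS = 10.0

total = sum(STAGES_MS.values())
print(f"total action-to-sound latency: {total:.1f} ms")
for stage, ms in STAGES_MS.items():
    print(f"  {stage}: {ms:.1f} ms")
print("appears immediate" if total <= PERCEPTUAL_THRESHOLD_MS
      else "fracture becomes perceivable")

Under these invented figures the chain exceeds the assumed threshold, which is one way of picturing why the apparent immediacy of fast technology remains an engineered concealment of the translations rather than their abolition.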
How this tension can be fruitfully exploited to generate meaningful relationships
for performing arts is stated succinctly by Kozel (2007, 70–71): “If we create responsive
relations with others and our environments that transcend language, then by means
of intentional performance with technologies we can regard technologies not as tools,
but as filters or membranes for our encounters with others.” This statement emphasizes
the fact that musical imagination and performance are part of a deeply cultured activity
and are always already oriented toward others (Decety and Chaminade 2003). This
applies to all levels of “technicity” of instruments, even primary vocal utterances of
musical nature, and shows that current musical practices contain the reciprocal func-
tion of affectively touching the performing as well as the perceiving subject who are each
other’s “other” in the communicatively enfolded moment of “musicking.”
Notes
1. Or producing an event.
2. Even emerging live-coding practices rely on textual representations in programming
languages and widgets of graphical user interfaces to “perform” with sound-processes.
3. An outer spatial frame of reference.
4. A spatial frame of reference anchored on oneself.
5. Think of finding the power button on your smartphone; after a short period of becoming
accustomed to it, the act of switching the screen on or off becomes a pattern that does not
need extra attention, even though there is often no clear reason why the button might be in one
place or the other on the device.
6. It is interesting to consider the term “agency” in its German translation: “Handlungsmacht”
could be translated as the power to act (Stockhammer 2015).
References
Annett, J. 1996. On Knowing How to Do Things: A Theory of Motor Imagery. Cognitive Brain
Research 3 (2): 65–69.
Bergson, H. 1939. Matière et mémoire: Essai sur la relation entre le corps et l’esprit. Paris, France:
Presses Universitaires de France, Quadrige. (English: 1911, Matter and Memory. London,
UK: George Allen and Unwin.)
Berthoz, A. 1997. Le sens du mouvement. Paris, France: Odile Jacob.
Bregman, A. S. 1990. Auditory Scene Analysis: The Perceptual Organization of Sound.
Cambridge, MA: MIT Press.
Brown, R. M., R. J. Zatorre, and V. B. Penhune. 2015. Expert Music Performance: Cognitive,
Neural, and Developmental Bases. Progress in Brain Research 217: 57–86.
Butler, J. 1988. Performative Acts and Gender Constitution. Theatre Journal 40 (4): 519–531.
Cadoz, C. 2000. Gesture-Music. In Trends in Gestural Control of Music, edited by
M. M. Wanderley and M. Battier, 71–94. Paris, France: Ircam, Centre Pompidou.
Cox, A. 2001. The Mimetic Hypothesis and Embodied Musical Meaning. Musicae Scientiae
5 (2): 195–212.
Cox, A. 2011. Embodying Music: Principles of the Mimetic Hypothesis. Music Theory Online
17 (2): 1–24.
Decety, J., and T. Chaminade. 2003. When the Self Represents the Other: A New Cognitive
Neuroscience View on Psychological Identification. Consciousness and Cognition 12 (4):
577–596.
Enticott, P. G., H. A. Kennedy, J. L. Bradshaw, N. J. Rinehart, and P. B. Fitzgerald. 2010.
Understanding Mirror Neurons: Evidence for Enhanced Corticospinal Excitability During
the Observation of Transitive but Not Intransitive Hand Gestures. Neuropsychologia 48 (9):
2675–2680.
Eshun, K. 1998. More Brilliant Than the Sun: Adventures in Sonic Fiction. London: Quartet Books.
Fauconnier, G., and M. Turner. 2003. The Way We Think: Conceptual Blending and the Mind’s
Hidden Complexities. New York: Basic Books.
Foucault, M. 1977. Discipline and Punish: The Birth of the Prison. London: Vintage.
Gallagher, S. 2005. How the Body Shapes the Mind. Oxford: Clarendon.
Gaver, W. W. 1993. What in the World Do We Hear? An Ecological Approach to Auditory
Event Perception. Ecological Psychology 5 (1): 1–29.
Gibson, J. J. 2015. The Ecological Approach to Visual Perception. New York and London: Taylor
and Francis, Psychology Press.
Glasersfeld, E. 1996. Radikaler Konstruktivismus: Ideen, Ergebnisse, Probleme. Frankfurt am
Main: Suhrkamp.
Godøy, R. I. 2006. Gestural-Sonorous Objects: Embodied Extensions of Schaeffer’s Conceptual
Apparatus. Organised Sound 11 (2): 149–157.
Godøy, R. I., and M. Leman. 2010. Musical Gestures: Sound, Movement and Meaning.
New York: Routledge.
Groves, R., N. Zuniga Shaw, and S. DeLahunta. 2007. Talking about Scores: William Forsythe’s
Vision for a New Form of “Dance Literature.” In Knowledge in Motion: Perspectives of Artistic
and Scientific Research in Dance, edited by S. Gehm, P. Husemann, and K. von Wilcke,
91–100. Bielefeld, Germany: Transcript Verlag.
Haraway, D. 1987. A Manifesto for Cyborgs: Science, Technology, and Socialist Feminism in
the 1980s. Australian Feminist Studies 2 (4): 1–42.
Huizinga, J. 1955. Homo Ludens: A Study of the Play-Element in Culture. Boston: Beacon.
Huron, D. B. 2006. Sweet Anticipation: Music and the Psychology of Expectation. Cambridge,
MA: MIT Press.
Ihde, D. 2007. Listening and Voice: Phenomenologies of Sound. Albany: SUNY Press.
James, W. 1896. The Principles of Psychology. London: Macmillan.
Jeannerod, M. 2006. Motor Cognition: What Actions Tell the Self. Oxford: Oxford University
Press.
Jensenius, A., M. M. Wanderley, R. I. Godøy, and M. Leman. 2010. Musical Gestures, Concepts
and Methods in Research. In Musical Gestures, Sound, Movement and Meaning, edited by
R. I. Godøy and M. Leman, 12–35. New York: Routledge.
Johnson, M. 2007. The Meaning of the Body, Aesthetics of Human Understanding. Chicago:
University of Chicago Press.
Jordà Puig, S. 2005. Digital Lutherie: Crafting Musical Computers for New Musics’ Performance
and Improvisation. PhD thesis, Barcelona, Spain: Universitat Pompeu Fabra, Department of
Information and Communication Technologies.
Keller, P. E. 2008. Joint Action in Music Performance. Emerging Communication 10:205.
Keller, P. E. 2012. Mental Imagery in Music Performance: Underlying Mechanisms and
Potential Benefits. Annals of the New York Academy of Sciences 1252 (1): 206–213.
Kim, J. H., and U. Seifert. 2010. Embodiment musikalischer Praxis und Medialität des
Musikinstrumentes—unter besonderer Berücksichtigung digitaler interaktiver Musik
performances. In Klang (ohne) Körper, Spuren und Potenziale des Körpers in der
elektronischen Musik, edited by M. Harenberg and D. Weissberg, 105–117. Bielefeld:
Transcript Verlag.
Kirsh, D. 2010. Thinking with the Body. In Proceedings of the 32nd Annual Conference of the
Cognitive Science Society, Austin, TX, edited by the Cognitive Science Society, 2864–2869.
Mahwah, NJ: Lawrence Erlbaum.
Kozel, S. 2007. Closer: Performance, Technology, Phenomenology. Cambridge, MA: MIT Press.
Lakoff, G., and M. Johnson. 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Latour, B. 2005. Reassembling the Social. Oxford: Oxford University Press.
Lemaitre, G., O. Houix, N. Misdariis, and P. Susini. 2010. Listener Expertise and Sound
Identification Influence the Categorization of Environmental Sounds. Journal of Experimental
Psychology: Applied 16 (1): 16.
Leman, M., and A. Camurri. 2006. Understanding Musical Expressiveness using Interactive
Multimedia Platforms. Musicae Scientiae 10 (1): 209–233.
Lewis, G. E. 2000. Too Many Notes: Computers, Complexity and Culture in Voyager. Leonardo
Music Journal 10: 33–39.
Lotze, M., U. Heymans, N. Birbaumer, R. Veit, M. Erb, H. Flor, et al. 2006. Differential Cerebral
Activation during Observation of Expressive Gestures and Motor Acts. Neuropsychologia
44 (10): 1787–1795.
Luria, A. R. 1973. The Working Brain. Harmondsworth, UK: Penguin Books.
Merleau-Ponty, M. 1945. Phénoménologie de la perception. Paris: Gallimard.
Montgomery, K. J., N. Isenberg, and J. V. Haxby. 2007. Communicative Hand Gestures and
Object-Directed Hand Movements Activated the Mirror Neuron System. Social Cognitive
and Affective Neuroscience 2 (2): 114–122.
Noë, A. 2004. Action in Perception. Cambridge, MA: MIT Press.
Paine, G. 2009. Towards Unified Design Guidelines for New Interfaces for Musical Expression.
Organised Sound 14 (2): 142–155.
Music and Emergence
John M. Carvalho
Introduction
There is something that emerges in a piece of music, especially in the skilled act of
making music. What emerges is the music in that piece of music. We say we make music
when, better put, we enact it by patterning sounds that achieve or contribute to the
emergence of music in an otherwise undifferentiated field of sound. For the purposes of
this chapter, the music that emerges from our skilled engagement with an otherwise
undifferentiated field of sound will be described as afforded by that field (Gibson 1979).
The affordances that turn up in the field will depend on the skills, and on the refinement of
those skills, that someone attempting to make music deploys in her skilled engagement with a
field of sound. For the musician, sound is her environment, and the skills she has for engaging
this environment have been acquired and refined in prior engagements with sound
in this environment. Her skills are importantly embodied and also extended in the
instruments and other tools—the score, a music stand, a tuning device, and so on—she
uses in her skilled engagements with an environment of sound. Affordances in that
environment turn up for her specifically embodied and extended skills, and music
emerges from her embodied and extended engagements with those affordances. She
imaginatively tests which skills most musically pick up what is afforded by her environ-
ment and deploys the skills that enact the music virtually present there.1 The particular
music that emerges from that environment emerges for the distinctly refined skills
engaged by the composer, the performer, and the listener (recognizing that these skills
can and will overlap). The emergence of roughly the same music for a variety of musical
skill sets and refinements testifies to the way these skills are shared and the extent to
which our environment is co-constituted by a variety of musical subjects.
This chapter draws on arguments for the ecology of cognition to support its claims
about the emergence of music (Clarke 2005). The leading idea for this ecology is that our
minds are fundamentally active and interactive, in the world and not in our heads. This
thinking reverses traditional models that conceive cognition as passively receptive to
input from the environment that is processed in the form of representational content
leading to action and behavior. On the ecological model, cognition actively engages the
world, remaking the environment into an emergent field where its projects can be real-
ized. Using skills learned and refined in engagements with the world, subjects realize
their aims by enacting what the environment affords them. In the case of music, subjects
pattern sounds that turn up in the environment for their particular set of skills and the
refinement of those skills. Music emerges from affordances picked up and enacted or
realized by composers, performers, and audiences drawing from their skilled engage-
ment with prior performances of musical works, the score, this particular performance
and responses to that performance, as well as from the instruments played, the voices sung,
the venue where the music is performed, the constituency of the audience, and so on.
Again, the music that emerges will be singularly connected to specific performers and
audiences engaged in making this music, but it will also be shared in virtue of the
convergences of the skills and affordances shared by the musical subjects involved.
In this ecological model, the imagination is not treated as a discrete faculty repre-
senting mental content that differs from what can be perceived or believed about what is
represented in that content. On the philosophy of mind drawn on here, the imagination
figures as an affective valence of the always embodied engagement of the mind in an
environment of sound. The mind as it is conceived here is not separable into distinct
doxic, praxic, and pathic streams. For embodied cognition, there is no perception with-
out action and no action without a conception, including an imagination, of the end of
that action which is afforded by the environment for an actor with specific skills. The
imagination functions with cognition and action to make actual what is only virtually
present in the environment for this particular body with aims afforded for the skills
acquired and refined by that body in relation to the environment. For the musician,
embodied in a composer or a performer or a listener, as this particular composer, per-
former, or listener—does she or has she played the piano, to what degree of proficiency,
or is she a singer, a horn player, a DJ, and so forth?—the environment of sounds turns
up affordances for the emergence of music by virtue of the imaginative, perceptive,
cognitive, and active engagement of this particular musician with this particular environ-
ment of sound. The imagination is a feature of this engagement. It is not determina-
tive, but it always figures in the embodied engagement of a musician with the sonic
environment that turns up for her as she composes, performs, and listens for the music
in that environment, engaging the environment to make music emerge from it.2
On this ecological model, music will be accounted for in terms of what emerges from
the affordances subjects pick up in their skilled interactions with the environment. This
music does not have a specific representational content that can be recognized as this but
not that. It is, instead, what repeats itself and cannot but repeat itself precisely because
it has no content.3 What emerges as music is what repeats itself in the environment in
virtue of the skilled engagements with the environment by composers, performers, and
auditors. This music that repeats itself is what the composer, the performer, and auditor
find and give back to the environment by their attentive, enactive playing and listening.
Without such an engagement, there is no music but only a succession of notes more or
less adequately executed, more or less attentively heard. In this chapter, music will be
taken to be what emerges in performances of it. What emerges does not approximate an
idea or an ideal. Music is very much real in its emergence as what we hear in this per-
formance of it. We approach this controversy as well as the question of a skilled listening
to music through an underappreciated text, “Listening,” by Roland Barthes (1985a). In
his account of listening, Barthes refers to the way the unconscious, in the psychoanalytic
setting, gives an ear to what emerges from the subject the analyst listens to. We update
and substantiate Barthes with a revised taxonomy for modes of listening proposed by
Kai Tuuri and Tuomas Eerola (2012), defending Barthes’s appeal to the unconscious
against what Tuuri and Eerola call “critical” and “reflective” listening. We locate music’s
“unconscious” in its “groove” as discussed by Maria Witek (2014) and Tiger Roholt
(2014), and we encounter it in the performance of “At Last!” by Etta James. We conclude by
defending a critical as opposed to a metaphysical ontology (Goehr 2007; Neufeld 2014)
that identifies the emergence of music in an enactive performance of it.4
Listening
Among the many insights he offers us about music, Barthes says there are three ways
of listening. There is listening to an alert, listening that is a deciphering, and listening
that develops an intersubjective space where what we listen to is “a general ‘signifying’
no longer conceivable without the determination of the unconscious” (1985a, 246). The
first two ways of listening, Barthes says, we share with animals. The last, on his view, is a
distinctly human, and modern, way of listening.5 For Barthes, this distinctly human lis-
tening compares with the listening of the analyst in the psychoanalytic setting. Barthes
does not spell out the implications of this observation for listening to music. We do that
here, but ours is not a hermeneutic exercise. We do not hope to reveal what Barthes
might really have wanted to say about a listening to music that is inconceivable without
the determination of the unconscious. Instead, we hope to take advantage of what
Barthes wrote to get traction on the question, How does what is virtually present in an
environment of otherwise undifferentiated sound emerge as music in our listening to
that environment? To manage this, at least two things must be clarified: what Barthes
means by the unconscious and what Barthes means by a “general signifying,” what he
also calls “signifiance.”
Before getting to the unconscious and signifiance, however, we should note that Barthes
distinguishes listening from hearing. Hearing, on his account, is the physiological ana-
logue to the psychological act of listening. We cannot account for listening acoustically,
Barthes writes, or by reference to the anatomy of the ear and its object or goal. Rather,
listening involves the mind as well as the body, which is formed by the mind, from which
the body takes the object or goal of its listening. Hearing, on this view, is how the body
physically responds to listening’s psychological evaluation of a spatial and temporal
situation. Listening does not respond to hearing. That would lead quickly to a dualism
Barthes would not abide and we should reject. It is rather the case that listening directs
hearing while hearing supports listening. Listening is, thus, embodied in hearing just as
much as hearing is animated by listening. Barthes would likely not have been moved
from this position by the evidence now available that extends the brute anatomy of the
ear to the sophisticated psychobiology studied by cognitive neuroscience (see Schnupp
et al. 2012). Listening has for him an evaluative function that directs the hearing body to
affordances in its environment, and there are good reasons for thinking he is right. There
is no question of a dualism here. Hearing and listening are conjoined tendencies of an
entirely embodied cognition. At one pole we recognize that something is audible. At the
other, we engage what is audible in the context of the lives we enact by listening but also
singing, dancing, speaking, and so forth.
Listening “is the very sense of space and time,” Barthes writes. By “the perception of
degrees of remoteness and of regular returns of phonic stimuli” we shape our sonic
world (1985a, 246). He contends that humans identify a territory by listening to the
familiar and the unfamiliar in the sounds that constellate a more general environment.
We hear sounds, on his view, but identify a place by listening. For example, the “house-
hold symphony” of kitchen noises, plumbing, heating and air-conditioning, the sounds
of nature or the neighbors and maintenance equipment bleeding in from the outdoors
form an aural texture of background noises we hear as the basis for listening to a world
we call our home.
It is, again, as if listening “were the exercise of a function of intelligence,” Barthes
writes, taking intelligence to be a kind of “selection” (247). Listening picks things out,
it picks up affordances, but it only exercises this function against the background of
what is familiar and unobtrusive. If the background noises are too loud or unfamiliar,
listening—as a form of intelligence or selection—is precluded. Affordances turn up
because they reinforce what is familiar and because they expand creatively on what
has been familiarly afforded. If we were to restate Barthes in a contemporary idiom or,
better, if we were to draw from Barthes what will help us understand the role of the
mind in our appreciation of music today, we might say that listening enacts or achieves
the music afforded by what we hear in an environment of sounds. Let us now see how
this distinction plays out in Barthes’s taxonomy.
The alert, the first order of listening, is said to attend to what threatens to interrupt, disturb,
or positively enhance the safe sonic space that is the listener’s territory. Listening, at this
level, is a response to surprises perceived as either a menace or a need. The “raw material”
of listening on this level is what Barthes calls the “index.” The index is something singu-
lar, something that stands out because it is distinctive or exemplary in the context or the
texture of the territory or what we have called the environment. This is a type of listening
we share with animals. The sound of a can of food being opened stands out in the sonic
space of the napping cat as do unexpected footsteps in the hall, the one promising the
satisfaction of a perceived need, the other anticipating a perceived threat. In the experi-
ence of listening to music, an alert may take the form of a missed note that derails the
resolution of a melodic line or what proves to be a passing tone that opens a sonic space
for improvisation. What is important about this type of listening in the case of music is
an increasingly private affair, until the moment when the speaker, listening to what is
interior to himself, commands the attention of another’s—the priest’s, the analyst’s—
interiority. The one speaking now commands another to listen to what the speaker has
heard listening to himself.
The injunction to listen is the total interpellation of one subject by another: it places
above everything else the quasi-physical contact of the subjects (by voice and ear):
it creates transference: “listen to me” means touch me, know that I exist.8
(Barthes 1985a, 251)
Is not this what the musical performer commands? “Listen to me,” she says, through her
instrument and her song. “Touch me. Know that I exist.” She does not simply share the
music she plays in her performance of it with an audience. She commands the attention
of that audience. Hear me. Feel me. Touch me. That is our clue to how the musician listens
to herself, to what is afforded her embodied enactment of her music.
Importantly, it is the affordance she engages, interior to her, that this musician listens
to while performing and that she commands her audience to listen to in the music she
makes. Listening to her music, music that emerges from the musician’s skilled engage-
ment with those affordances, the audience listens to the musician herself. They make
her interiority theirs. Here we have the beginnings of a shared intersubjective space
of listening. Barthes describes the telephone as the archetypical instrument of this
listening, since it “collects the two partners into an ideal (and in certain circumstances
an intolerable) inter-subjectivity” (1985a, 251–252). Telephonic communication, he says,
invites the Other to “collect [the speaker’s] whole body in his voice” (252). Speaker and
listener are, thereby, embodied and extended in the telephone that connects these
modalities of their embodiment.9
So far, Barthes’s observations square with the revised taxonomy for listening proposed
by Tuuri and Eerola (2012). Tuuri and Eerola conceive listening as an action-oriented
intentional activity that finds meaning in “emerging resonances between experiential
patterns of sensation, structured patterns of recurrent sensorimotor experiences
(action-sound couplings) and the projection of action-relevant mental images” (137).
In the literature on listening taxonomies (Schaeffer 1966; Chion 1983, 1990), they discern
three listening modes: a causal mode distinguished by an intention to apprehend causal
indices, a semantic mode distinguished by an intention to comprehend meanings, and a
reduced mode distinguished by an intention to perceive the sound itself (Tuuri and
Eerola 2012, 139).10 More recent developments (Huron 2002; Tuuri et al. 2007) suggest a
division into two pre-attentive modes (reflexive and connotative), two source-oriented
modes (causal and empathetic), and three context-oriented modes (functional, seman-
tic, and critical) (Tuuri and Eerola 2012, 141). The pre-attentive modes, which capture
innate and primordial affective responses and their associations, map what Barthes
calls listening as to an alert. The source-oriented modes, which capture denotative acti-
vation systems and the perception of a sound being intentional, and the context-oriented
modes, which capture sounds’ affordances, their sociocultural conventions, and their
appropriateness, make a finer grained map of what Barthes (1985a) calls listening as a
deciphering (Tuuri and Eerola 2012, 141–142).
In their revised taxonomy, Tuuri and Eerola term the pre-attentive modes (to which
they add kinesthetic action sound couplings) “experiential.” They group the source-
oriented and context-oriented modes, stripped of critical listening, under the heading
“denotative.” Finally, they pair critical listening, judgments about the appropriateness of
a sound in a given context and of our responses to that sound, with reduced listening,
focusing on the “sound itself and its qualities,” and call this mode “reflective” (142, 147).
Again, experiential and denotative modes of listening follow what we have observed in
Barthes so far. Reflective listening, however, is not a part of Barthes’s plan. Tuuri and
Eerola attribute reflective listening in its reduced mode to an attention to qualities of the
sound apart from the denotations associated with the sound (149). To the critical mode
of reflective listening they attribute a judgment that “evokes new meaning” in the sounds
experienced “and reevaluates those [meanings] already evoked” (149). So described,
reflective listening does not appear to contribute to the making or emergence of music.
Reduced reflective listening gives us sound stripped of its relation to the listener and
the environment. Critical reflective listening makes the dynamic interplay between
cognition and the environment one-sided: the sound is given and the listener judges what
is given. In the experience we are hoping to describe, the music emerges from affordances
that turn up in the environment for the specifically embodied skills of the composer,
performer, and listener.11 We expect to find music so enacted in what Barthes has called
a distinctly human and modern mode of listening not covered by Tuuri and Eerola.
The Unconscious
Listening at Barthes’s first level transforms noise into an index. Listening at the second
level transforms the index into a sign. It also transforms the listener into a dual subject.
On this second level, interpellation becomes interlocution “in which the listener’s
silence will be as active as the locutor’s speech” (Barthes 1985a, 252). Listening speaks,
and it is at this level that the third type of listening, a distinctly human and modern
listening, a listening “no longer conceivable without the determination of the uncon-
scious,” begins to take shape. The image of the telephone cited earlier does not occur
capriciously to Barthes. It is the same image Freud used to describe the analyst’s listening
to the analysand.
The analyst must bend his own unconscious like a receptive organ toward the
emerging unconscious of the patient, must be as the receiver of the telephone to the
disc. As the receiver transmutes the electric vibrations induced by sound waves back
into sound waves, so is the physician’s unconscious mind able to reconstruct the
patient’s unconscious, which has directed his associations, from the communications
derived from it. (Freud 1963, 117–126)
In the free association of the analysand’s giving an account of himself, the unconscious
speaks: “touch me,” it says, “know that I exist.” The analyst, evenly hovering, attending to
nothing in particular, refusing to latch onto anything that would lead her to learn only
what she already knows, listens for the emerging unconscious of the analysand. She makes
her unconscious the sounding board for the unconscious of her patient or, better, a
surface where the array of her patient’s cathexes can take shape, where, in this array, her
patient’s unconscious can emerge.12
Since the unconscious is said by Freud to function at the level of images, any inter-
vention at the level of language, the language of the analyst in particular, threatens to intro-
duce a selection that revises the unconscious in advance, telling the analyst only what
she wants to hear. The evenly hovering attention of the analyst attempts to eliminate or
bracket, at least, anything that might mediate a connection between her unconscious
and the unconscious of her patient. The unconscious, while not a language, is said to be
structured like a language (Lacan 1981), that is as a constellation of gaps, lapses, dif-
ferences, and differential relations, so the patient’s words, spoken freely and associatively,
provide a medium for his unconscious to emerge. The impressions the patient’s free
associations make on the unconscious of the analyst are, ideally, recorded there for
revision only after the fact in the report the analyst writes. The “symbolic order” of lan-
guage, the language of the Other, is said to enter the equation only in this revision the
analyst gives of her patient’s account as mediated by her own unconscious. Of course,
these ideal conditions only infrequently obtain. What obtains more regularly, right
away in the account the analysand gives of himself, is the structuring influence of the
symbolic order and the Other as embodied by the analyst.
The unconscious, on these terms, should not be counted as something archaic or
archetypical in consciousness. It is not, either, what Freud called the preconscious or
what others often think of as a reservoir of repressed desires and cathexes. The uncon-
scious on these terms is rather what is afforded in the account the patient gives. It is
what emerges, when it emerges, and repeats itself in the free associations of the patient
for the analyst who skillfully bends her ear to these associations and makes her uncon-
scious a sounding board for the unconscious of the analysand. Jacques Lacan (1981)
calls this unconscious the objet a, the object cause of desire in the analysand. For
Barthes it is what he calls a general signifying, an overfullness of meaning or signifi-
ance that means nothing in particular. In musical terms, it is not an alert indicated by
a false note nor a sign interpellating a practiced listener to interpret the meaning of a
raised fourth in a blues scale. There is, in music, following Barthes, something more
than what it means. There is in music a general signifying that emerges and repeats
itself—“hear me,” “know that I exist”—and we hear that something more, that signifi-
ance, only with a listening practiced and refined on the model of the unconscious just
described. Listening in this way, engaging the affordances in an environment of sound
on the model of the unconscious of the analyst bending to the unconscious of the
analysand, the music in a piece of music, the something more in that environment,
emerges for us. This mode of listening cannot be found in any of the taxonomies
offered by Tuuri and Eerola.
Groove
This something more in music is approximated by what Maria Witek and Tiger
Roholt have called “groove.” Groove is not a colloquial term. Roholt uses it specifically
to talk about the body’s noncognitive grasp of an element in music that signifies
without signification, that is, without meaning something in particular. Groove, Roholt
writes, is something we feel, the “motor-intentional affect” lived through as part of the
effort to understand the music’s motor-intentionality through bodily movements
(Roholt 2014, 137). Groove, following Roholt, is perceived haptically by the body in its
lived corporeality. In Barthes’s terms, our embodied listening echoes haptically the
motor-intentional affect in the music itself. On our terms, this embodied listening is a
skilled engagement with what is afforded by an environment of sounds. This listening
compares with the listening in the psychoanalytic setting. It is not listening for something
it has determined in advance. It is listening for what, emerging in this environment of
sound, has no determinate content, cannot be fitted under a concept, repeats
itself and cannot but repeat itself. Groove is what repeats itself for the listener skilled at
engaging what is afforded by its general signifying, its signifiance.
This general signifying is the something more we are listening for, especially as per-
formers, in the performance of a piece of music. We are listening for the music as it
emerges in the environment of sounds we engage. We are listening for what is latent, as it
were, in the manifestly skilled execution of the piece as scored. This signifiance is the
element that turns up, emerges, slips away, and re-emerges. It is what as performers we
are striving to find and hold together (maybe by letting go of our attention to the score
and our technique to do so). It is what as auditors we are listening for in those pieces of
music we find exemplary because in them there is a groove. What Roholt calls groove
gestures to what we are describing as the intersubjective space where what we are lis-
tening for is a general signifying that repeats itself and cannot but repeat itself, and this
general signifying, this groove, is there only just in case we are there to enact or achieve
it. So, what we are getting at is not so much what Witek calls “groove music,” though no
doubt there is such a thing, but a groove in music, a general signifying that we bend
our ears toward the way the analyst bends her unconscious to the unconscious of the
analysand. This is as true of the outro of “Straight, No Chaser” as played by the second
Miles Davis Quintet on Milestones (1958)—a groove hackneyed performers struggle
to achieve—as it is for the opening passage of “The Beatitudes,” by Vladimir Martynov
(1998), rescored for Kronos Quartet (2012[2006]) and heard throughout Paolo Sorrentino’s
film The Great Beauty (2013).
In an interview with David Harrington of Kronos Quartet,13 we hear many of the
same sentiments reported by Simon Høffding (forthcoming) from his interviews with
The Danish String Quartet. After learning to execute the score, these highly skilled, pro-
fessional musicians say they have to learn the music that comes only after playing the
piece together repeatedly. In their rehearsals, they are not just listening to one another.
They are also importantly listening for the music. That music has a fleeting quality. It
cannot be heard when one or another player insists on what they have deciphered as its
secret, a secret only she or he can hear. The music emerges in the intersubjective space
created by a shared bending of the ears of every player (and often the living composer in
the case of Kronos) to the music in its emergence. These ears are obviously particular to
each of those players, but they are also general to the quartet as a whole in virtue of
the skills those players have refined by playing together and the affordances they have
shared in their performances. With ears skillfully attentive to the environment of
sound, these performers actively engage what emerges with their voices and their
instruments and make music, enact music in this performance of it.14
Consider the example of Beethoven’s late string quartet, No. 14, Opus 131 (1826),
which is played attacca (without break or pause).15 As the piece goes on for nearly forty
minutes in seven movements without stopping, the musicians are tasked with achieving
and maintaining the music through their own fatigue and their instruments’ changing
tuning. They must continuously enact the music in part by listening, with their own
musical bodies—bodies extended by their individual instruments as well as by the
bodies of those playing with them—to what is emerging in this music, what repeats
and can do nothing more than repeat itself because, in the overfullness of its own sig-
nificance, its signifiance, it cannot signify something to the exclusion of something else.
Witek and Roholt agree on the attribution of groove to music that moves the bodies
of listeners. They point to rock, rhythm and blues, funk, hip hop, and electronic dance
music as sources of listening pleasure derived from a compulsion to move and from acting
on that compulsion by moving the body. Witek focuses her more scientific study on the
impact syncopation has on the desire of listeners to move to the music. Roholt focuses
on swing and on the comprehension of a motor-intentionality in the music by listeners
who move their bodies in response to that music. Our focus has been on listeners who
are also performers. Again, we are guided in our intuitions by Barthes:
There are two musics (or so I’ve always thought): one you listen to, and one you
play. They are entirely different arts, each with its own history, sociology, aesthetics,
erotics . . . The music you play depends not so much on an auditive as a manual (hence
much more sensuous) activity . . . it is a muscular music; in it the auditive sense has
only a degree of sanction: as if the body was listening, not the “soul”; this music is not
played “by heart”; confronting the keyboard or the music stand, the body proposes,
leads, coordinates—the body itself must transcribe what it reads: it fabricates sound
and sense: it is the scriptor, not the receiver.16 (Barthes 1985b, 261)
From what we have said earlier, we know that for us the body listening is not just
the physical body but (as for Roholt) the haptic, sensuous, affected and affective body,
the body capable of enacting or achieving music not out of habit (by heart) but by a
constant, attentive bending toward the environment where music can be heard emerging
in the performance of it.
This listening is something we can fathom comfortably when it comes to playing
for ourselves or performing solo for an audience. We can also comfortably fathom how
this listening is engaged in the enactment of music by small ensembles. The examples
cited previously were the Miles Davis Quintet and Kronos Quartet. It is more difficult
to imagine this kind of listening in a large ensemble, a symphony orchestra or concert
band.17 This difficulty is not a problem for the view set out here, since in those large
ensembles it sometimes happens, for a particular passage, a particular movement, in a
particular performance, that there is a magical confluence of the music making of the
conductor, one hundred or more performers, a concert audience, and the music. These
performances are truly memorable, legendary even. We attend performances of large
ensembles, and perform in them, for the chance to achieve that magic in music. In a
small ensemble, however, such music must be achieved more regularly if that ensemble
and the music they play are to be remembered at all. It is likely most regular in string
quartets that play together for twenty-five years or more. In jazz (and rock and pop)
ensembles, too often a clash of egos or the assertion of a single ego leads to the ensemble
disbanding after only a few years, players seeking alternative intersubjective spaces
where they can achieve that general signifying conceivable only within the determi-
nation of the unconscious rather than the particular signifying of one dominant player.
This listening is not unfathomable for those who are not playing the music them-
selves. It is this listening that motivated Roholt and Witek to focus on how the bodies
of listeners are physically moved by rock, hip hop, and electronic dance music. Roholt
considers the case of classical music and ventures to guess that an attention by listeners
to the nuances expressed in some classical performance “will involve some body move-
ment that can be elucidated in terms of motor-intentionality” (Roholt 2014, 125–126).
We would venture to propose that he might discover motor-intentionality in the body
movements of classical performers (who are also listeners) themselves, especially the
body movements of performers in chamber ensembles and string quartets which extend
the ears and minds of those performers in the music that emerges from their playing or,
better, their achieving and enacting of that music.18
At Last!
In what we have said so far, at least two points deserve closer scrutiny: our apparent
commitment to the very idea of the unconscious—why introduce this element since it is
bound to arouse controversy—and the apparent commitment in the conclusions we
draw to an enactive ontology of music. In fact, neither commitment is as controversial
as it seems.
We were led to the concept of the unconscious by following Barthes’s association of
a distinctly human and modern form of listening with the listening of the analyst in
the psychoanalytic setting. We have tried to show that this is a form of listening well
known and acknowledged by performers and attentive listeners alike. Performers reg-
ularly talk about finding the music in what has been formally scored or merely sketched
out in advance. We mentioned the reports of David Harrington of Kronos Quartet
earlier. As an example of what he is getting at, consider the entrance of the second
violin in Kronos’s performance of “The Beatitudes,” by Vladimir Martynov. The crucially
embodied timing and the tonality of that entrance enacts or achieves the music in this
piece of music. Bowed a little late, a little too soon, with more or less vibrato, just slightly
sharp or flat, and the piece fails to come together. Everything about that piece of music
follows from this entrance, which must be skillfully achieved for the music of “The
Beatitudes” to emerge, and it will only be achieved if every member of the quartet, even
those not yet playing, bend their ears in the direction of that emerging music.
As another example of the same sort, take the Etta James rendering in 1960 of “At
Last!” written by Mack Gordon and Harry Warren in 1941 for the film Orchestra Wives
(directed by Archie Mayo). The timing, tonality, and timbre of the second note James
sings, as “last,” introducing and leading the band into the tune, are crucial to the emer-
gence of the music she and her accompanists achieve in this song. The entrance of the
note sounded as “last” follows a cadenza ending in a held 9th chord and, following that,
James singing “At” without accompaniment on the dominant 5th of the tune’s tonic scale.
“Last,” then, resolves the tension introduced with “At” and marks the downbeat as well
as the key that signs the song, signifying generally and abundantly the music of “At Last!”
As James holds “At” for as long as it feels right for her (the note is scored with a fermata),
she bends the sound in anticipation of the “last” that will follow (as she bends her ear to
the music emerging in her performance of the song). She increases the tension between
the dominant 5th and the tonic with the time it takes to resolve it and a grain in her
voice that enacts the blues idiom where she locates the tune (Barthes 1985c). She lands on
“last” in a way that cues the entrance of the band and, to do all these things, she must be
skillfully attentive to the affordance turning up for her, listening to the general signifying
she and those performing with her are in the course of enacting. She does not consult
a mental representation of prior performances of the song by herself or another. She
does not remember the song. What she is listening to, in advance, are the affordances
she will engage to enact the song on this occasion, the song that is realized or achieved
only by her skilled performance. No doubt she draws on skills she has acquired and
refined in prior performance of this song and others, but in this enactment of “At Last!”
she engages those skills in the context of the entirely local affordances picked up from
the particular performance of the cadenza and the anticipated skills of the band as a
whole as well as the audience for this particular performance. She bends her embodied
ear to what is afforded by this environment and, skillfully picking up those affordances,
enacts or achieves the music in this song. She brings “At Last!” to life. She deploys her
refined skills to realize and hold together the music of that song, what in that song
repeats itself and cannot but repeat itself in anticipation of her engagement with what is
emerging in the overfullness of her performance of it.
What emerges from “At Last!” in James’s performance is not a secret to be found by
tracing its origins to the musical score for a film about white entertainers (Orchestra
Wives) translated by a black blues singer for another audience and embraced by that
other audience because of what James brings to the performance of the song. Rather,
what emerges as “At Last!” is something James listens for in the song as she enacts it,
affording us the chance to listen for ourselves with the skills we have refined for enacting
the music James is performing. It may happen that this performance will result in a
missed encounter. It may happen that, on this particular occasion, James’s skills or the
supporting affordances will not be up to the task. “At Last!” will be realized only just in
case it is enacted by performers and listeners alike. Again, there is a certain magic to
these enactments. Music is not achieved just by the skilled execution of scored notes
performed in a prescribed manner for audiences skilled at evaluating such executions
(Goehr 2007). Music emerges for performers deploying their skills in the context of local
affordances for listeners deploying their skills in the context of their own local affor-
dances to achieve and enjoy not the signs of this or that song but the general signifying
that is the music itself. If there is a certain magic to these enactments, it is not something
mysterious we cannot foresee but a rare confluence of several, variable affordances
turning up and being picked up by skilled practitioners.19
If what emerges in a piece of music is not what Barthes, earlier, called its secret, what
we listen for on the second level, listening as a deciphering, it is also not what, once
heard, the musician can actively achieve by repeating it in memory. What the musician
is listening for each time she performs will be different relative to the local affordances
that vary with the musician’s honing of her skills and with the particular material cir-
cumstances of this or that performance. As we noted, it may happen that the musician’s
skills and the affordances she picks up result in a missed encounter with the general
signifying of the music she is attempting to enact. It may also happen that she encounters
a signifying she is not expecting, that she gives the song a life she did not know it had.
This can happen in the spontaneity of a live performance or it may happen in the prac-
ticed enactment of the song by the same or different performers.
Compare, for example, the renderings of “Stella by Starlight” by Ella Fitzgerald
(Verve 1961) and the Miles Davis Quintet (Columbia 1964). Both arguably achieve the
music of the song, but these are two very different lives that emerge from this same piece
of music. Fitzgerald’s “Stella,” with the help of accompaniment by Ray Brown on bass,
Lou Levy on piano, and Stan Levey on drums, swings. Against the syncopated rhythm,
we listen for the song’s title as it appears late in the lyric line and to the urgency in
Fitzgerald’s up-tempo vocals. The Davis Quintet’s “Stella” quickly dispenses with the
melody. In its place, Davis plays a moody, introspective meditation on the form of the
tune, leading the rhythm section through shifting cadences and drawing a line through
the form that enacts or achieves a music that only emerges in this particular performance
of the tune.
Both performances achieve “Stella by Starlight.” Both enact the music the song would
not have otherwise. (Both find a “groove,” in the language used earlier.) Drawing on
their refined skills as musicians, listening to the resources afforded by the circumstances
of their performances—Fitzgerald recording in a studio, Davis recorded live at Lincoln
Center—they enact the general signifying of the tune, what about the tune repeats
itself in anticipation of being engaged in this environment of sound by their skilled
musicianship.20 What each performer hears in “Stella by Starlight” is animated by an
evenly hovering attention to an intersubjective space formed from the bending of their
skilled listening to the performance of that song. From their skilled engagement to the
environment of sound turning up in that space, these performers enact what we have
been calling music. Something similar could be said of the renderings of “At Last!” by
the Glenn Miller Orchestra in 1941 and by Etta James some twenty years later. These
performers, Miller and James, each enact the music in the song, yet what emerges as a
general signifying of “At Last!” will vary relative to the specific skills of these musicians
and the affordances local to the performances they give. In effect, “At Last!” is a different
tune every time it is achieved, when it is achieved, allowing again that a given perfor-
mance may result in a missed encounter with the music of that tune.
A more traditional account of the distinction we are drawing would make James’s
an especially powerful token of the type “At Last!” (Wollheim 1980). On that view, the
type of the song would be given in the score, and the first token performance of the piece
would transfer qualities to the type substantiating it, setting a standard for future tokens.
James’s performance would be an especially powerful token that would transfer traits
to the type, thus setting a new standard against which future performances of the
tune, by Beyoncé Knowles, for example, would be judged. In fact, James came to think
of “At Last!” as her song, and Knowles’s performance (at the inaugural ball for
US President Barack Obama) has been judged against a standard supposedly set by James.
This view, however, idealizes the music of this song and music in general. It assumes
that there is a standard of correctness (the score) for measuring the achievements of
Miller and James and Knowles (and others). It would be better to say that Etta James
came to think of the song as hers because she heard a general signifying in the song
that others did not have the skills to achieve. It is not that James is more skilled, rather
her specific skills lead her to pick up affordances that do not turn up for others in
every performance of the tune that enacts the music in it.
Now, what exactly is the song whose general signifying James listens for in her
performance of it? How is the song identified, how is it heard as the song it is? On the
view defended here, “At Last!” is nothing else than the song that is enacted in every per-
formance of it, and it exists or, better, lives in those performances based on the skills of
the performer and the affordances that turn up for her specific skills and the skills of her
audience. What she listens for, that in the service of which she deploys her considerable
skills, is constituted, enacted, or achieved in all of the renderings of it that have been
performed and appreciated and that enable the music to emerge. There is no music in
the tune without a performance of it (Goehr 2007). The tune can be played and heard,
but not every playing or hearing of it achieves or enacts the music in that tune.
This view is grounded in an ecology of cognition that takes the mind to be necessarily
embodied and continuous with the environment in which it is embodied. For this
embodied mind, imagination contributes to the perception, cognition, and evaluation
of the affordances that turn up in the environment for the skills it has acquired and
refined. On this view, music is what emerges from such a skillful engagement with an
environment of sound. Music is just what is enacted and achieved in performances of it.
Performances depend on some manner of composition. Performances depend more
strictly on skilled listeners for those performances. In any case, on this ecological and
embodied view, music does not exist in an idea in the mind of the composer or in the
memory of a listener based on recordings or prior performances he has heard. Music
emerges in the skilled enactment of it on this occasion, played and sung by these musi-
cians, in this venue for this audience. Of course, it may happen, as in the “spontaneous
compositions and improvisations” of Charles Mingus or the “creative spontaneous compo-
sitions” of Steve Coleman, for example, that composer, performer, and auditor are
the same person. For an ecology of musical cognition, the problem of the ontology of
music is solved: music just is what emerges in our engaged listening and skilled enact-
ment of it (Neufield 2014).
In the film A Late Quartet, mentioned earlier, Peter (played by Christopher Walken),
the cellist and elder statesman in the group, relates the story of his encounter with the
great Pablo Casals for a master class of young musicians. As a young musician himself,
he played for Casals and, to his ear, performed miserably, but Casals inexplicably praised
him. Years later, as a mature professional, he chided Casals for what he said was Casals’s
insincerity so many years ago, and this time Casals grew angry. Didn’t you play this
figure, Casals asked, picking up his cello to demonstrate, with this fingering? It was a
novelty for me, he said. And didn’t you attack this phrase—again, demonstrating—with
an up bow? Casals emphasized the good stuff, Peter tells his students. He encouraged.
He wasn’t listening for the mistakes. The music in a piece of music is not missed when we
make mistakes. The music is missed when we fail to listen for what is emerging, what
turns up and wants to emerge, in that piece of music, when we fall short of bending our
ears and our skills to what affords us the chance to enact the music in a piece of music we
otherwise do not hear. What emerges in music is the music we make or, better, enact and
achieve by composing, listening, and especially playing skillfully what are, without our
engaged attention and refined skills, only patterns of sounds.
Notes
1. I conceive of this imagination as thoroughly embodied, as something felt about the fit of
this or that skill, as a form of affective cognition on the order of how the body feels about
attacking a snow-packed slope with a pair of skis or feels about the dish that can be made
from what is afforded by the refrigerator. Given what is afforded by an environment of
sound, the musician “imaginatively” feels the deployment of this or that skill will render
the most musical results. She has an embodied and affective sense of what to do with this
environment. This idea is developed in a paragraph below.
2. This account does not fall short of analytic specificity. It rethinks the mind as a continuous,
embodied engagement with its surrounds. It conceives of the imagination as inextricably
caught up in perception, cognition, assertion, and action as well as with evaluations
of the ethical and aesthetic value of this imagination. All of these dimensions of embodied
cognition are present in different degrees in different engagements as they are afforded by
different environments and as those affordances turn up for the skills acquired and refined
by a particular embodied mind. The musician is an especially rich example of a continuity
of mind, body, and environment that is achieved by a skilled engagement with the music
afforded by sonic material.
3. This is not to say that the music has no content but only the affordances in an environment
of sound have no content prior to being skillfully engaged by the composer, performer or
listener. Once engaged, the music afforded by that environment acquires a content relative
to the specifically embodied skills deployed in that engagement. This content will be
shared in the same way skills are shared so that it is not surprising that the content the
composer enacts in a score is picked up as affordances and enacted as content by per-
formers for listeners who find affordances in performances for a shared content. Differences
in the embodiments of the skills acquired and refined by composers, performers, and listen-
ers as well as the particular conditions in which the music is enacted will account for dif-
ferent valences in the content on each occasion.
4. I thank Aili Bresnahan, Marc Duby, Richard Eldridge, Enrique Morata, Manos Perrakis,
Martin E. Rosenberg, and Dylan van der Schyff, who read and commented on earlier
drafts of this paper.
5. By “modern” Barthes refers, as we see in more detail later, to a time after the discovery of
the unconscious, so from the late 18th century (in the work of Friedrich Schelling) to his
present day. Also, he is referring to a mode of listening and not to the modernity of what
we are listening to.
6. A piece of music heard at Stanford University many years ago as part of a recital by stu-
dents was composed for audience members to take the stage and use hammers and nails
provided to assemble random lengths of two-by-fours into unspecified arrangements. In
this piece, likely inspired by the work of La Monte Young, otherwise asynchronous sounds
and untempered pitches developed a rhythm and a tonal palette over time that we would,
today, attribute to a form of entrainment. We may suppose Barthes has such a phe-
nomenon in mind.
7. This secret is not the unconscious. It is rather the meaning or signification that the sign,
insofar as it is a sign, conceals, even as it points to it.
8. “Transference” refers to the psychoanalytic patient’s unconscious redirection of his own
affects toward the analyst.
9. It may be helpful, here, to distinguish this account of music making and appreciation from
a more traditional or standard story. It is often thought that the musician hears something
of the developing motif of the music she is making and makes music of the notes she is play-
ing or singing by her attention to this development. Memory is required, and the pattern the
music realizes in her performance is followed from recollection (Eldridge 2003, 132–133).
This makes music a mental activity heard, first, in the mind of the composer and performer,
then, communicated to the listener. On our account, there is only music in the enactment of
it from affordances picked up in the environment where that music is performed. For the
skilled musician those affordances are vast. They include the score, the instrument, past
performances by the musician herself and others, a note just played that was just slightly too
sharp, a page turned too quickly, what has been played by other musicians in the ensemble,
and on and on. The skilled musician also has a way of navigating all of these variables, effi-
ciently and creatively. She quickly cancels what will not contribute to her enacting the music
in this piece of music on this occasion. She will make it seem effortless, and when she is
successful she brings to her audience something more than a pattern of tones more or less
perfectly executed. She brings something of herself. She communicates something of the
affordances that turn up for her, and only her, in the performance of this piece of music.
10. In fact, Schaeffer posits four modes of listening: Écouter (attentive to the source of the
sound), Ouïr (nonattentive listening to the context of the sound), Entendre (selective
appreciation of the sound itself and its qualities) and Comprendre (attribution of a mean-
ing to the sound) (Schaeffer 1966). Tuuri and Eerola (2012) appear to have left out Ouïr in
their initial assessment of the literature but build it into their revised taxonomy.
11. Pierre Schaeffer’s musique concrète presents an interesting test in this context. Starting
with a type of reduced listening (écoute réduite), Schaeffer and those who followed him
attempt to find music in sounds dissociated from their sources in traditional musical
instrumentation and representation or abstraction in scores. On our terms, then, Schaeffer
and fellow electroacoustic practitioners have acquired and refined skills that allow music
to emerge from an environment of sound that is not restricted to the tradition of “serious”
music. If there is a difference between our view and theirs, it is in the volitional reduction
or bracketing of sounds from their source which is part of the skill set of electroacoustic
musicians. On our view, music emerges from an embodied engagement with the sound
environment. These intuitions, inspired by Simon Emmerson’s comments, deserve further
study elsewhere.
12. This is how it might be possible to speak about an unconscious in music but not a conscious-
ness. We tend to think of consciousness as some one thing, whatever it might be, whereas the
unconscious emerges from relations between elements, positively and negatively charged
cathexes, just as music might be said to emerge from patterns that turn up as melody, har-
mony, and rhythm in the notes played and sung without emerging as something we can
specify (see Freud 1953b). We make no claims here for an unconscious in music. Our aim is
rather to suggest an analogy between the listening of the analyst in the psychoanalytic
setting and the listening of the composer, performer, and auditor in the case of music.
13. See, for example, the interview by Don Kaplan, “Navigating a Single Note: The Kronos
Quartet’s David Harrington” at www.learningmusician.com/features/0107/DavidHarrington
(2007); “Interview with David Harrington of Kronos Quartet” at www.youtube.com/
watch?v=hxoF0wMb0Jc (July 2013); and “Spotlight on . . . David Harrington (Kronos
Quartet)” at www.youtube.com/watch?v=ibGTx4CY1VA (2013). Accessed October 5, 2017.
14. For the 2014 meeting of the American Society of Aesthetics in San Antonio, Texas, the
chamber ensemble SOLI performed and discussed their collaborations with living com-
posers. Working with Robert Xavier Rodríguez on Música, por un tiempo, the musicians
came to recommend that one of the movements be played at a tempo slightly modified
from how it was scored. Rodríguez agreed, and in that modified tempo performers
and composer together “found” the music in that score. We would say that composer and
performers, bending their ears to the environment of sound originally scored, engaged
what was afforded in that environment and contributed, through their recommendation,
to the emergence of the music in that environment by enacting it in a performance of that
music. Examples of music coming together in this way are the norm and not the exception
in the course of making music.
15. As featured in the film A Late Quartet (Yaron Silberman, US, 2012).
16. The relation between the scriptor and the receiver in music should be compared with what
Barthes elsewhere calls a “writerly” and “readerly” text (Barthes 1977).
17. The exception would be the swing bands of the 1940s headed by Count Basie, the Dorsey
brothers, Duke Ellington, Stan Kenton, and others.
18. The Music of Strangers (Morgan Neville, US, 2015) documents Yo-Yo Ma’s Silk Road
Project—a collective of musicians displaced by crises and brought together by a conviction
that music can make a difference in the world—with images of performers whose seemingly
noninstrumental flourishes are no doubt crucial to enacting or achieving the music they
References
Barthes, R. 1977. From Work to Text. In Image/Music/Text, translated by S. Heath, 155–164.
New York: Noonday.
Barthes, R. 1985a. Listening. In The Responsibility of Forms, translated by R. Howard, 245–260.
Berkeley: University of California Press.
Barthes, R. 1985b. Musica Practica. In The Responsibility of Forms, translated by R. Howard,
261–266. Berkeley: University of California Press.
Barthes, R. 1985c. The Grain of the Voice. In The Responsibility of Forms, translated by
R. Howard, 267–277. Berkeley: University of California Press.
Chion, M. 1983. Guide des objets sonores: Pierre Schaeffer et la recherche musicale. Paris: Buchet
Chastel.
Chion, M. 1990. Audio-Vision: Sound on Screen. New York: Columbia University Press.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. Oxford: Oxford University Press.
Eldridge, R. 2003. Hegel on Music. In Hegel and the Arts, edited by S. Houlgate, 119–145.
Evanston, IL: Northwestern University Press.
Freud, S. 1953a. Beyond the Pleasure Principle. In The Standard Edition of the Complete
Psychological Works of Sigmund Freud, Vol. 18, translated by J. Strachey, 1–64. London: Hogarth.
Freud, S. 1953b. The Unconscious. In The Standard Edition of the Complete Psychological Works
of Sigmund Freud, Vol. 14, translated by J. Strachey, 159–215. London: Hogarth.
Freud, S. 1963. Recommendations for Physicians on the Psychoanalytic Method of Treatment.
In Therapy and Technique, translated by Joan Riviere, edited by P. Rieff, 117–126. New York:
Collier Books.
Gibson, J. 1979. The Ecological Approach to Visual Perception. Hillsdale, NJ: Erlbaum.
Goehr, L. 2007. The Imaginary Museum of Musical Works: An Essay in the Philosophy of Music.
Oxford: Oxford University Press.
Affordances in Real, Virtual, and Imaginary Musical Performance
Marc Duby
Introduction
A growing body of literature (Clarke 2005; Barrett 2011, 2014; Krueger 2011, 2014;
Windsor 2011; Windsor and de Bezenac 2012) seeks to understand musical performance
Musical imagery has often been viewed and considered as the ability to hear or
recreate sounds in the mind even when no audible sounds are present. However,
imagery as used by musicians involves not only the melodic and temporal contours
of music but also a sense of the physical movements required to perform the music,
a “view” of the score, instrument, or the space in which they are performing, and a
“feel” of the emotions and sensations a musician wishes to express in performance
as well as those experienced during an actual performance. (2011, 352)
actions as musicians use them in real or imagined performance, attending in the first
place to the physical movements that bring music (understood tout court as “organized
sound”) into being.
Learning to play an instrument necessitates a prolonged period of acquaintanceship
to acquire technical proficiency: for instance, to learn the gradations of pressure to apply
to a violin bow to produce a particular sound quality (Cumming 2000). Through physical
engagement with the task, long-term changes in brain plasticity ensue (Schlaug 2015),
facilitating and strengthening what Heft (2001) describes as “the mutuality between the
knower and the object known” (143). One way to approach this mutuality with regard to
musical instruments is to understand them as transducers (devices for converting one
form of energy into another). This is how Baily (1992) treats them:
So saying, Baily makes explicit the mechanisms whereby body movements are trans-
formed into audible patterns of organized sound through embodied engagements. In
broad sympathy with the fundamental principle that fingers and voices move air to
bring sound into being, it is also feasible to consider musical instruments as tools1 and,
in this regard, the concept of affordances provides an opportunity for understanding
musical instruments as real and imaginary tools and a starting point for exploring the
various configurations of human–musical instrument interfaces that this concept might
illuminate.
Such interfaces range in possibilities along a spectrum from directly embodied (as
in cases of a musician generating sounds in the moment) to more or less disembodied,
as in the case of the air guitar and its related cousin, the virtual air guitar (Karjalainen
et al. 2006). Virtual instruments, in which the performer’s actions generate MIDI data
from keyboards, drum controllers, or other sources (wind controllers, guitar synthesizers,
and so on) form a middle ground by introducing a degree of arbitrariness between
actions and outcomes. Such raw data does not by itself specify the intended sound
because the receiving device (generally a digital computer) treats the incoming binary
data as “pure” information, and not sound. The performer (or composer/arranger) is
then free to assign appropriate virtual instruments to translate these data into sound.
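To make this arbitrariness concrete, here is a minimal, illustrative sketch in Python (assumed values throughout; it is not drawn from any particular MIDI implementation or from Karjalainen et al.): the same three-byte note-on message fixes only pitch and velocity, and two hypothetical “virtual instruments” render it as entirely different sounds.

```python
import math

# A raw MIDI note-on message is just three bytes: status, note number, velocity.
# Nothing in these numbers specifies a timbre; that choice is left to whatever
# virtual instrument the data happen to be routed to. (Illustrative values only.)
note_on = bytes([0x90, 60, 100])        # channel 1 note-on, middle C, velocity 100

status, note, velocity = note_on
frequency = 440.0 * 2 ** ((note - 69) / 12)   # equal-tempered pitch for note 60
amplitude = velocity / 127.0

# Two hypothetical "virtual instruments": the same data, two different sounds.
def sine_voice(t):
    return amplitude * math.sin(2 * math.pi * frequency * t)

def square_voice(t):
    return amplitude * (1.0 if math.sin(2 * math.pi * frequency * t) >= 0 else -1.0)

t = 0.001                               # one millisecond into the note
print(round(sine_voice(t), 3), round(square_voice(t), 3))
```

Nothing in the data stream itself determines the resulting sound; that assignment is made downstream, in software.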
Through the Gibsonian idea of active touch (1962), musical instruments can be
understood as a specialized subset of tools with the potential to provide agents with real
and imaginary affordances (environmental opportunities for feedback-directed action
and learning). On this view, musical instruments afford playability just as different
surfaces variously afford climbability, stability, or concealment for different creatures.
So understood, as tools whose configurations have changed over time—whether real,
virtual, or entirely nonexistent (instruments of human construction, computer-based,
The form of our mind is shaped by our handedness. The kind of mind we
exemplify is influenced by our possession of hands.
—McGinn (2015, 67, original emphasis)
Stanley Kubrick’s celebrated film 2001: A Space Odyssey (1968) opens with a scene from
prehistoric times known as “The Dawn of Man.”2 Shortly after a mysterious black mon-
olith of otherworldly origin is discovered by a group of apes on the African savannah,
Moonwatcher, the leader of the troop, grasps in a momentous instant the creative—and
destructive—potential of the bone he has to hand by using it to smash to pieces the skel-
eton from whence it came. For proto-man, from grasping the affordances of the animal
bone (as a weapon, a tool for waging war) to murdering the leader of the competing
troop is one small step, and as the conquering alpha male flings the weapon aloft in cele-
bration, the spinning bone morphs into a futuristic space station to the melodious
strains of The Blue Danube.
As Horn describes it (2015), “Kubrick’s segue from a triumphantly hurled tibia bone
weapon to a cylindrical Earth-orbiting satellite brilliantly encapsulated 4 million years
of technology into 10 seconds of film.” Through cross-fading the two events, Kubrick’s
imagery—by way of Arthur C. Clarke’s imagination—proclaims the direct links between
technology and a telescoped version of evolution.
After a combination of unforeseen factors such as prehistoric climate changes and
competition for scarce resources forced our ancestors to descend from the relative safety
of their arboreal environment (McGinn 2015), early man, as much prey as predator, was
forced by harsh and dangerous conditions to become a tool-maker. Harari (2014) notes
how, beginning approximately 2.5 million years ago, “evolutionary pressure brought
about an increasing concentration of nerves and finely tuned muscles in the palm and
fingers” (9). These evolutionary adaptations enabled humans to produce ever more
sophisticated tools, so that “the manufacture and use of tools are the criteria by which
archaeologists recognise ancient humans” (10).
Weapons such as arrows and spears enabled killing at a distance, so empowering the
hunters to attack larger prey and providing a means of self-defense against predators and
proto-human competitors. Anatomical developments such as larger brains, opposable
thumbs, and upright posture equipped our ancestors for the emergence of the new
world of the savannah, as Wallin (1991) argues:
The upright posture presented new perspectives. To see and to hear was given a new
context. The anterior limbs were made free for communicative gestures, for making
and for using tools and instruments, for combining and comparing objects which
earlier had not had any obvious relation to each other, to support the balance during
that very specific locomotion, the dance, which was released by intense sound
sequences. (493, emphasis added)
Dynamic touch is a subsystem of the haptic perceptual system, and refers to perceiving
properties of hand-held and hand-wielded objects. During the process of wielding,
one can be aware of a variety of properties of the object being wielded such as length,
orientation, and heaviness. (751)
To remain with the example of tactile perception, imagine handling an object presently
out of sight. By turning it around in one’s hand and feeling its surfaces, contours,
and edges, invariant properties of the object are revealed that specify its shape and
quite possibly its identity. The superiority of active over passive touch in shape
recognition is a robust experimental finding. (Heft 2001, 174)
The perception–action (PA) cycle (in which the notion of feedback—more precisely, degrees and types of
feedback—plays a vital role) provides a framework for comparing this experience with
those of virtual and nonexistent instruments, which also provide auditory feedback. In
the last two categories of instruments, tactile (haptic) feedback from the instrument is
unpredictable to a degree because of its mediation through MIDI (playing a MIDI guitar
may generate the sound of a saxophone, for argument’s sake) or absent altogether, as in
the case of air instruments.
To resolve the apparent category mistake4 implicit in terms such as “motor imagery”
and “auditory imagery,” one might consider the thought experiment of inverting such
terms so as to reframe “motor imagery” simply as imagined movement and, by corollary,
auditory imagery as imagined sound. Far from mere verbal sleight of hand, this exercise
restores two perspectives: first, that imagining movement may not require intermediary
representations5 to be re-implemented in action, and second that the disciplinary
procedure of considering perceptual systems in isolation necessarily overlooks their
multimodal integration in complex organisms.
Creatures actively engaged in environments, where parsimonious cognitive decisions
enable quick responses to systemic or local changes, make sense of the simultaneous multi-
sensory information available to them, whether vestibular, visual, auditory, haptic, or
taste and smell, as per Gibson’s list of perceptual systems. As this
information becomes available to the organism by way of affordances, it becomes mean-
ingful, and therefore it seems plausible to consider the hand as a special case of a perceptual
system. Handedness, as McGinn’s epigraph proclaims, remains key to shaping the human
mind and will continue to do so as humankind and technologies continue to co-evolve. On
this view, contemporary hand-held technologies (such as mobile phones and tablets) may
well spark a new cognitive revolution as fingers and thumbs adapt to these new interfaces.
The increasing sophistication of digital computers has facilitated a wide range of pos-
sibilities for musical simulations. In this section, the discussion focuses chiefly on
two aspects: the use of computing technology to simulate the behavior of analog
equipment (such as Hammond organs, Fender-Rhodes electric pianos, and guitar
amplifiers, to name a few) and its potential to recreate the soundscapes of iconic
recording studios. For a fee, the home studio enthusiast may purchase access to the
acoustic fingerprint of celebrated environments such as Abbey Road, including virtual
simulations of tape technology without the expense, weight, and inconvenience of the
original analog equipment.
The Fender Rhodes electric piano is a weighty beast whose signature sound has
graced many a recording, and it is no doubt convenient for home studios to have access
to this fingerprint without its less convenient aspects: likewise, the Hammond organ,
now in a virtual digital version where the recordist has access to its wide range of sonic
possibilities. More modern virtual instruments such as the Korg Wavestation consist
of the original digital synthesis programs used in the hardware unit, so porting over
its sonic capabilities to a computer platform. This is less a simulation than a shift of
environment from hardware (integrated circuits capable of various forms of digital
synthesis) to software, which “reads off ” the synthesizer’s behavior using the same
digital parameters as the original unit.
The advent of digital technologies has rendered virtual the spaces of costly recording
studios and their equipment, so that it is possible (at least in theory) for the recordist to
recreate the sound of these environments at home. Companies that offer software for
such simulations claim that these are meticulous physical models of their original
analog counterparts and some comparisons by reviewers seem to indicate a close degree
of similarity between the original environments and their software models.
Emulating the behavior and sound characteristics of analog equipment using digital
technology raises the problem of translating the nonlinear response characteristics of
analog technologies into the binary language used by computers, pointing to different
(if not incompatible) ways of encoding information. “Analog” implies fluctuations in
voltage as produced by changes in attack and volume of an electric musical instrument,
for argument's sake, whereas digital technologies sample such voltage changes at discrete,
very small time increments, encoding each snapshot of the system state as a bit stream of
zeros and ones.
The differences between these technologies are perhaps best exemplified by how they
handle distortion. Vacuum tube technologies found in “retro” guitar amplifiers produce
nonlinear effects (harmonic distortion) when overdriven, as opposed to digital clipping
in the case of a computer, which compromises the integrity of the original signal.
In other words, unlike analog equipment, it is impossible to overdrive a computer in a
musically pleasing way6.
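To make the contrast concrete, the sketch below (a minimal illustration, not any manufacturer's actual model) compares naive hard clipping, which flattens the waveform at the digital ceiling, with a generic smooth saturation curve of the kind that analog stages—and their software emulations—exhibit; the tanh function is used here purely as a stand-in for a tube-style transfer curve.

```python
import numpy as np

def hard_clip(signal, ceiling=1.0):
    # Digital clipping: samples beyond the ceiling are simply flattened,
    # producing abrupt corners and harsh, inharmonic artifacts.
    return np.clip(signal, -ceiling, ceiling)

def soft_saturate(signal, drive=3.0):
    # A generic "tube-like" transfer curve (illustrative only): gain rises
    # smoothly and levels off, adding mostly low-order harmonics.
    return np.tanh(drive * signal) / np.tanh(drive)

# A 110 Hz sine wave pushed well past unity gain, as when overdriving an input stage.
t = np.linspace(0, 1, 48000, endpoint=False)
overdriven = 2.0 * np.sin(2 * np.pi * 110 * t)

clipped = hard_clip(overdriven)
saturated = soft_saturate(overdriven)
```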
The brave new world of the digital computer has certainly simplified the task of
precision editing. Compare, for argument’s sake, Pierre Schaeffer’s labor-intensive tasks
in producing his original musique concrète compositions to the relative ease with which
such tasks can be performed in the digital domain. As Schaeffer (2012) notes, the tech-
nological options of the time incorporated an element of risk insofar as their results
were unpredictable:
A movement of the bow responds with dignity to the composer’s notations, to the
conductor’s baton. But the effects of a turn of a handle on the gramophone, an
adjustment of the potentiometer, are unpredictable—or at least we can’t predict
them yet. And so we reel dizzily between fumbling manipulations and erratic effects,
going from the banal to the bizarre. (79)
Consider the exercise of playing a sound backward. Using the available technology of
Schaeffer’s time, this would entail cutting a length of magnetic tape, resplicing and
reversing it so that the information was read back to front by the playback head. To
replicate this effect in the digital domain, the recordist simply changes the order of the
sample data, so that the direction of the bit stream is reversed and the computer reads
the data from back to front to produce the desired effect. As Doornbusch and Shill
(2014) note, the concept of affordances also finds application in the field of digital audio
editing: “Having audio available in a digital form can be said to be an affordance to the
editing and manipulation of that audio” (27). The advent of the digital computer has
loosened the centuries-old bonds between sound production and its outcomes, so
paving the way for new methods such as granular synthesis, of which Opie writes (2015),
“If you want to do it the original Xenakis way you will need a reel to reel tape recorder, a
razor blade, sticky tape, and a lot of time.”
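As a concrete illustration of how trivial the reversal operation has become, here is a minimal Python sketch that reverses a mono 16-bit WAV file simply by reordering its sample data; the file names are hypothetical and the script makes no claim about any particular editor's implementation.

```python
import wave
import numpy as np

def reverse_wav(infile, outfile):
    """Reverse a mono 16-bit WAV file by reordering its sample data."""
    with wave.open(infile, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16)
    reversed_samples = samples[::-1]  # read the data back to front
    with wave.open(outfile, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(reversed_samples.tobytes())

# reverse_wav("bell.wav", "bell_reversed.wav")  # hypothetical file names
```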
Physical modeling as a synthesis technique attempts to recreate the sonic behavior of
musical instruments by simulating their physical characteristics. According to Hind
(2016), “With physical modeling it is the actual physics of the instrument and its playing
technique which are modelled by the computer.” From physical modeling, the designers
of the Virtual Air Guitar adopted the Karplus-Strong algorithm, “a computationally
efficient digital wave-guide algorithm for modeling the guitar string as a single-delay
loop filter structure with parametric control of the fundamental frequency and losses in
the filter loop” (Karjalainen et al. 2006, 965). Their aim was to create “a pleasant rock
guitar sound experience” (969) (complete with a simulated vacuum tube amplifier and
distortion) where the player has a degree of control over the outcome as opposed to the
“schizophonic”7 experience of popular video games such as Guitar Hero (Miller 2009;
Katz 2012), where the would-be guitarist has to deal with a controller interface, a plastic
instrument without strings that is modeled on a Fender Stratocaster.
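For readers unfamiliar with the technique, the following is a textbook-style sketch of the Karplus-Strong idea—a burst of noise circulating in a delay loop whose length sets the fundamental—rather than the Virtual Air Guitar's actual implementation, which adds parametric control of losses, amplifier simulation, and distortion.

```python
import numpy as np

def karplus_strong(frequency, duration, sample_rate=44100, damping=0.996):
    """Minimal plucked-string model: noise excites a single-delay loop filter."""
    delay = int(sample_rate / frequency)      # delay-line length sets the pitch
    line = np.random.uniform(-1, 1, delay)    # the "pluck": a burst of noise
    out = np.empty(int(sample_rate * duration))
    for i in range(len(out)):
        out[i] = line[i % delay]
        # Loss filter: the damped average of two adjacent samples is fed back
        # into the delay line, so the tone decays and mellows over time.
        line[i % delay] = damping * 0.5 * (line[i % delay] + line[(i + 1) % delay])
    return out

# pluck = karplus_strong(196.0, 2.0)  # roughly the G string of a guitar, two seconds
```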
Miller (2009) extends Schafer’s original concept to what she terms “schizophonic per-
formance,” in which clear lines dividing live and recorded performance are blurred “by
combining the physical gestures of live musical performance with previously recorded
sound” (401). Her nuanced work raises vital questions regarding notions of authenticity
and identity, specifically that of the rock guitarist. She writes: “By giving players an immer-
sive gaming experience filled with rock-oriented cues—including the musical repertoire,
archetypal rock-star avatars, a responsive crowd, a guitar-shaped controller, and physical
performance cues—Harmonix [the designers of Guitar Hero] encouraged them to adopt a
rock-star identity.” In this respect, Katz (2012) notes how “digital music technologies—in
the form of video games and mobile phone applications—challenge traditional notions of
musicianship and amateurism” (460). Rather than framing such technologies as less than
In this shifting musical setting, rock guitar has assumed an almost “tradi-
tionalist” aura for many audiences and musicians, encased in a nostalgia for
past forms that in previous eras was reserved for more folk-based styles of
expression. At the same time, though, rock guitar has moved into more
hybridized contexts wherein the polarity between analog and digital, electric
and electronic musicianship, becomes the basis for creative fusion.
—Waksman (2003, 131)
What do the gestures of air guitar enthusiasts mean for theories of embodied cognition?
According to Godøy (2006), it is not uncommon
to see people making sound-producing gestures such as playing air drums, air guitar,
or air piano when listening to music. Our observation studies of people with different
levels of expertise, ranging from novices with no musical training to professional
musicians, playing air piano, seem to suggest that associations of sound with sound-
producing gestures is common and also quite robust even for novices. (155)
What Godøy proposes is a direct link between sound and gesture that applies equally to
all musically inclined participants; that is, independently of the expertise required to be
a professional musician. By connecting instrumental sounds to the imaginary actions
required to bring them to life, Godøy provides a lens through which to examine the
nature of cognition as embodied action. In the case of the air guitar, this is a lens that
refracts light, whose resulting picture bends reality and challenges comfortable
assumptions about human musicking. At first glance, it might seem patently absurd to
spend time discussing such an extreme case as that of the air guitar, a phenomenon
one might want easily to dismiss rather like the notorious case of the pop duo Milli
Vanilli, whom the media unmasked as charlatans after it was discovered that they had
employed session singers on their hit record and mimed onstage to a prerecorded
backtrack.8 Popular and critical outrage at their inauthentic performance tactics
prompted the withdrawal of their awards, and subsequently their career took a disas-
trous turn into obscurity.
Godøy (2004) elsewhere argues for profound links between sound and gesture
maintaining that,
Air Guitar is all about surrendering to the music without having an actual instrument.
Anyone can taste rock stardom by playing the Air Guitar. No equipment is needed, and
there is no requirement for any specific place or special skills. In Air Guitar playing all
people are equal regardless of race, gender, age, social status or sexual orientation.
The procedure is for contestants to submit a one-minute audio clip for miming to, to be
played over a “big sound system,” with the jury criteria listed as: “Originality, the ability
to be taken over by the music, stage presence, technical merit, artistic impression and
airness.” In the unlikely case of mystification, the last criterion (“airness”) is defined in
the rules of the US Air Guitar Championships (http://usairguitar.com/rules2/) as “the
extent to which a performance transcends the imitation of a real guitar and becomes an
art form in and of itself.” Hutchinson (2016) understands airness as “a term meant to
pinpoint some of those ineffable qualities that transform a competent performance into
a truly great one” (416). The notion of congruency in pantomime also comes into play in
defining the criteria for technical merit. As the site claims, “You don’t have to know what
notes you’re playing, but the more your invisible fretwork corresponds to the music
that’s playing, the better the performance.”
The air guitar phenomenon has inspired a number of websites that specialize in the
sales and marketing of these invisible instruments, with Dimitri’s Air Guitars in Sydney,
Australia, a front-runner (http://www.air-guitars.net/home/about.html). Here it is possible
to order not only electric, acoustic, and bass air guitars, but also useful—if not essential—
accessories such as air strings and plectrums. Learning the right moves for the budding
air guitar practitioner entails imitation, defined by Buccino and colleagues (2004) as
“the capacity of individuals to learn to do an action from seeing it done. Imitation implies
learning and requires a transformation of a seen action into an ideally identical motor
action done by the observer” (323, original emphases).
Describing his learning process, Heine van der Walt (aka Lord Wolmer, the South African
air guitar champion in the late 2000s) recalls drawing his original inspiration from
watching VHS tapes of bands like Iron Maiden and Megadeth: "My elder brothers were
in high school and they would watch these tapes, and there’s me, about 6 years old,
staring. As I watched the bands I thought ‘Man, that must be the best job in the world’ ”
(AuntyNexus 2014). With the credentials of a professional touring guitarist, van der
Walt might be thought to have possessed an unfair advantage over so-called nonmusicians.
Regarding his time in the contest, he notes that the number of professional musicians
taking part increased from around 30 to 50 percent.
Table 5.1 consists of a summary of the three categories of instruments in this chapter.
It forms a matrix with porous boundaries as opposed to discrete categories, and the
distinctions between the categories are understood as fluid and dynamic. In this
light, it is perhaps best understood as encompassing degrees of hybridity, because
music fields such as jazz, Western art music, karaoke, DJ’ing, videogames, and even
imaginary metal (as in the repertoire of my informant), all avail themselves of evolving
technologies of performance and improvements in interfaces (in short, affordances)
to bring musical sounds to life. Moreover, the porosity of these genres allows for
Table 5.1 Comparing Performance with Three Different Interfaces (Real, Virtual, and Air)
Column headings: Real (live performance); Virtual (live or recorded); Air (recorded)
As opposed to the official feast, one might say that the carnival celebrated tempo-
rary liberation from the prevailing truth and from the established order; it marked
the suspension of all hierarchical rank, privileges, norms, and prohibitions. Carnival
was the true feast of time, the feast of becoming, change, and renewal. (45)
Miller (2009) astutely connects the ludic aspect of guitar-oriented videogames to self-
satirizing genres such as pantomime, high camp, and kitsch, stating that, in the case of
Rock Band, “players not only choose the gender, body type, clothes, and instruments
for their avatars but also must select a physical performance style, choosing from
rock, punk, metal, and goth ‘attitudes’ that govern the avatar’s physical mannerisms,
stance, and affect” (421). As a result of the disconnect between performance and outcomes,
Miller contends, “The games invite players to make a spectacle of themselves” (421).
Yes, indeed.
If we accept this premise as the fundamental basis for cognition, then the motor actions
of professional musicians involve the acquisition over time of an exacting level of precise
control as required for the execution of highly complex music at virtuoso level. Consider
in this regard the actual changes in wiring in the corpus callosum (responsible for
coordination between left and right “sides” of the body: see, for instance, Sacks 2011;
Koelsch 2012) in professional musicians, such that Sacks claims that musicians’ brains
are distinguishable at the physical level (i.e., dissection) from those of other occupations.
The corpus callosum rewires itself exactly because it is bound up in precise control
and coordination exemplified by the performance of music at the outer limits of human
possibility by virtue of its technical difficulty, its inherent structure (Satie’s Vexations,
with its 840 repetitions lasting between eight and 24 hours to perform11), or its demands
for synchronization, as in music by Pierre Boulez that the author witnessed in performance,
in which not the least of the technical demands was to play such difficult music in time with
the rest of the ensemble.
If, as Sheets-Johnstone proposes, consciousness of self and others springs from our
own animation in the gradual process of individuation, then it is also true, as she avers,
that conventional cognitive science has largely ignored the realm of the sensorimotor.
The fact is that musicians learn not only how to move but also how to limit movement
for economy’s sake (so conserving energy) as well as deploying off-line resources such as
mental rehearsal to enhance performance in the case of athletes, dancers, musicians,
and so on. For Cook (1992), “[b]eing able to play the piano is a matter not so much of
mastering the actions required in performance as of knowing how to organize them into
a coherent motor sequence" (75). Beilock and Lyons (2015) ground their discussion of
expert performance in experts’ greater propensities for off-line preparation, a kind of
motor visualization of imagined actions/procedures supported by clinical evidence.
Between player and instruments exists a reciprocity through which each is gradually
transformed; this is how constant friction wears the original varnish down to bare
wood so that instruments acquire a patina over time, and how, in turn, such instruments
become priceless. Much more than mere tools, instruments provide avenues for
self-expression, real and imaginative affordances for creativity, and may contribute to
the enactment of a temporary sense of community among participants. As Peretz (2006)
states it, a key purpose of music and dance is to “enhance cooperation and educate the
emotions and the senses. It is a form of communion whose adaptive function is to
generate greater sensory awareness and social cooperation” (24).
One of the founders of the field of artificial intelligence maintains that “mobility,
acute vision and the ability to carry out survival-related tasks in a dynamic environment
provide a necessary basis for the development of true intelligence” (Brooks 1991, 141).
His view aims to dispense with representations as intermediaries between body and
mind in favor of direct perception and action within a robotic environment. For Brooks,
a robotic agent displays intelligence by simply acting and has no need for an internal
map inside her silicon head. Windsor and de Bezenac (2012) point to the incompatibility
of notions like representations with an ecological approach, claiming that:
Ecological approaches do not sit well with discussions of imagery and representation,
however situated or embodied these discussions may be. Although our attempt to
stretch affordances to cover a wide range of behaviours may appear speculative in
some instances, we have intentionally chosen to avoid falling back on mental pro-
cesses and representations as an explanation of behaviour in order to test how well
the concept can be extended, and we would expect that such hypotheses should
attract further empirical as well as philosophical investigation. (116)
So, instead of giving in to such representational cravings, Chemero (2016) enjoins his readers to consider
the explanatory force of sensorimotor empathy as the "implicit, sometimes unintentional,
skilful perceptual and motor coordination with objects and other people" (138).
This concept seems well suited to understanding the affordances of musical performance
as real-time sociocultural phenomena, without necessarily invoking representations as
intermediaries in such circumstances.
Acknowledgments
This material is based on work supported financially by the National Research Foundation
of South Africa. Any opinion, findings, and conclusions or recommendations expressed in
this material are those of the author(s), and therefore the NRF does not accept any liability
thereto.
Notes
1. It is worthwhile to note how composers have tended to push against the limits and con-
ventions of musical creativity by harnessing new extended instrumental techniques in
performance, so extending concepts of what musical instruments can be employed to do.
2. “Synopsis for 2001: A Space Odyssey,” 2016. http://www.imdb.com/title/tt0062622/
synopsis?ref_=tt_stry_pl. Accessed July 25, 2016.
3. “Man is known by his artifacts. He is an artisan, an artificer, an employer of the arts, an
artist, and a creator of art. Beginning with tools and fire and speech, the ‘tripod of culture,’
he went on to making pictures and images, then to the exploitation of plants and animals,
then to the exchange of goods for money, and finally to the invention of writing”
(Gibson 1968, 27).
4. The category mistake comes from Gilbert Ryle, whose example is a visitor to Oxford
who is in search of "the University," mistaking this abstract term
for the agglomeration of buildings and the personnel who staff them. The contention is that terms
such as “motor imagery” are problematic because they conflate two different modalities
of perception.
5. As Decety and Stevens put it (2015), “Because motor representation inherently involves
aspects of both body and mind, it presents as the most obvious candidate for wedding this
dichotomy” (3) [that is, that between body and mind]. No such dichotomy exists in eco-
logical psychology, which, as noted, insists on the mutuality between agent and
environment.
6. Unison technology as employed by the California-based audio company Universal Audio
offers a solution to this problem by allowing the digital information of the computer to
change the impedance characteristics of the interface, so modeling the behavior of an ana-
log channel strip.
7. Miller (2009) defines “schizophonic” as “R. Murray Schafer’s term for the split between a
sound and its source, made possible by recording technology” (400).
8. In fact, this kind of technological sleight of hand is fairly routine within the music industry.
Consider auto-tuning software, for instance, which purports to correct bad intonation, or
the wide range of “sweetening” (audio processing such as compression, EQ, and reverb)
treatments employed routinely as production techniques.
9. Speaking of strings, Rufus Reid (1974) exhorts the apprentice jazz double bass player to
develop calluses to acquire a stylistically idiomatic pizzicato sound. These are constituted
over time through friction between skin, metal strings, and wooden fingerboard. The
instrument gives and takes over time: witness the devastating psychological effects for
professional musicians of being incapacitated and unable to play. Such strokes of cruel
misfortune sunder performers from their professional and artistic selves and steal from
them a vital raison d’être.
10. M. Duby, Skype Interview: Heine van Der Walt. Pretoria, South Africa, September 29, 2016.
11. According to Sweet (2013), “An Australian pianist named Peter Evans abandoned a 1970
solo performance after five hundred and ninety-five repetitions because he claimed he was
being overtaken by evil thoughts and noticed strange creatures emerging from the sheet
music. ‘People who play it do so at their own peril,’ he said afterward.”
References
AuntyNexus. 2014. Space Has Never Held Such Terror: Boargazm Interview.
http://metal4africa.com/interviews/space-has-never-held-such-terror-boargazm-interview/.
Accessed April 6, 2017.
Baily, J. 1992. Music Performance, Motor Structure, and Cognitive Models. In European
Studies in Ethnomusicology: Historical Developments and Recent Trends: Selected Papers
Presented at the VIIth European Seminar in Ethnomusicology, Berlin, October 1–6, 1990, edited
by M. P. Baumann, A. Simon, and U. Wegner, 142–158. Wilhelmshaven: F. Noetzel.
Bakhtin, M. 1998. Rabelais and His World (1940). In Literary Theory: An
Anthology, rev. ed., edited by J. Rivkin and M. Ryan, 45–51. Oxford: Blackwell.
Barrett, M. S. 2011. Troubling the Creative Imaginary: Some Possibilities of Ecological Thinking
for Music and Learning. In Musical Imaginations: Multidisciplinary Perspectives on Creativity,
Performance and Perception, edited by D. J. Hargreaves, D. Miell, and R. MacDonald, 45–66.
Oxford: Oxford Scholarship Online. doi:10.1093/acprof.
Barrett, M. S., ed. 2014. Collaborative Creative Thought and Practice in Music. SEMPRE Studies
in the Psychology of Music. Farnham, UK: Ashgate.
Beilock, S. L., and I. M. Lyons. 2015. Expertise and the Mental Simulation of Actions. In
Handbook of Imagination and Mental Simulation, edited by K. D. Markman, W. M. P. Klein,
and J. A. Suhr, 139–159. New York: Psychology Press.
Brooks, R. A. 1991. Intelligence without Representation. Artificial Intelligence 47: 139–159.
doi:10.1016/0004-3702(91)90053-M.
Buccino, G., S. Vogt, A. Ritzl, G. R. Fink, K. Zilles, H. J. Freund, et al. 2004. Neural Circuits
Underlying Imitation Learning of Hand Actions: An Event-Related fMRI Study. Neuron
42 (2): 323–334. doi:10.1016/S0896-6273(04)00181-3.
Chemero, A. 2016. Sensorimotor Empathy. Journal of Consciousness Studies 5: 138–152.
Clark, T., A. Williamon, and A. Aksentijevic. 2011. Musical Imagery and Imagination: The
Function, Measurement, and Application of Imagery Skills for Performance. In Musical
Imaginations: Multidisciplinary Perspectives on Creativity, Performance and Perception,
edited by D. J. Hargreaves, D. Miell, and R. MacDonald, 45–66. Oxford: Oxford Scholarship
Online. doi:10.1093/acprof.
Clarke, E. F. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. Oxford: Oxford University Press.
SYSTEMS AND TECHNOLOGIES
chapter 6
Systemic Abstractions
The Imaginary Regime
Martin Knakkergaard
Introduction
Interval
At the core of the organization of music is the interval. No matter how we look at music,
the interval holds a central position as the essential building block and reference. It is by
the presence and use of defined intervals that music is distinguishable from other forms
of expressive sonorous (art) forms such as sound art and speech. Sound art and speech
obviously contain intervals (it is impossible not to), but these intervals are, contrary
to music, not tied to and dependent on specific, rigorous codes, although the use of
pitch-differences in speech, just like variations in dynamics and tempo, carries a lot
of information—in many languages they actually alter the meaning of words and
sentences—and of course can obtain a decisive function in sound art as well. Any interval
is valid in sound art and speech; intervals—no matter whether they are rhythm or pitch
intervals—are tied to language, dialect, situation, age, and so forth, and are more or less
unique to the person speaking (see Yasar, volume 1, chapter 21). Using specific, regular
intervals will either turn speech to song or maybe signal that the person in question is
not quite well. With regard to rhythm intervals or beats, as Oliver Sacks states (quoting
A. D. Patel), “The perception of synchronization of beat, Patel feels, ‘is an aspect of
rhythm that appears to be unique to music . . . and cannot be explained as a by-product
of linguistic rhythm’ ” (2006, 243).
Regarding music, we think in intervals just as we hear in—and listen for—intervals,
and the concrete intervals are set against an immanent systematization that is given in
advance depending on culture, epoch, style, and genre. From early on, the brain is simply
trained to listen for and respond to patterns of intervals that fall within certain, definite
matrixes, typically referred to as tone systems. These systems are not universal but are
culturally specific, which implies that what, from the side of the receiver, is acknowledged
as a meaningful musical utterance is dependent on the sonorous pattern's reference to
the tone system, just as what can be imagined from the perspective of the sender is guided
by, and in a way confined by, proportions of the tone system. Again, whether we are referring
to pitch or time intervals is not important: in practice, they are both always present
as long as pitch paradoxically is also allowed to include nonpitched or inharmonic
sound such as noise and many drum sounds. In this chapter I will, however, concentrate
almost entirely on pitch intervals.
The modern organization of pitch is not just a system of intervals but a finite system of
tones or pitches of fixed frequencies, albeit the determination of the frequencies—the
tuning—is relative to whatever concert pitch is currently decided (today, typically
a = 440–444 Hz). In comparison, neumes, which were in use for centuries in the Middle
Ages, only indicate relative intervals,1 and often ornamental implications too, but carry
no information about pitch (or rhythm and duration for that matter); the neume virga,
for instance, indicates a tone that is higher than both or one of the surrounding tones,
whereas clivis indicates two tones where the first is the highest and porrectus indicates
the tone sequence high-low-high, and so on. Neumes were thus primarily a descriptive
and mnemotechnical tool, to support learning and performance of music, and their use was
completely dependent on the user’s imagination, experience, and acquaintance with the
musical practices and the tone system of the time. Similarly, pitches within the fixed
notation of Gamelan music today are only relative, as they vary “considerably from one
gamelan to the next, both in absolute pitch and in relative size of intervals” (Brinner 2017),
implying that instruments of whole ensembles are not necessarily “in tune” with one
another. In other words, whereas the intervals of Western music of today are dependent
on abstract, fixed proportions, this is not at all a universal—or an ahistorical—situation,
and even though the music theory of the Middle Ages was closely tied to a strict
systematization founded on Pythagorean principles and idealization, the performance of the
music relied on oral practices.
The important truths about music were to be found instead in its harmonious reflection
of number, which was ultimate reality. As a mere temporal manifestation, the
employment of this harmonious structure in actual pieces of music was of decidedly
secondary interest. (Mathiesen 2017)
And, in the Greeks' "new sensitivity to order and form" (Burkert 1962, quoted in
Sundberg 1980, 21, my translation), the numbers provided a way to escape materiality’s
grip, as numbers “are present in the things with a reality-constructing and determining
function” (Sundberg 1980, 21, my translation).
The Ancient Greeks’ use of numbers in the generation of the tone system, where the
octave is defined by the relation 2:1, the fifth as 3:2 and the fourth as 4:3, is a direct result
of their general favoring of the musica universalis (also referred to as the harmony of the
spheres) comprising the small integers 1-2-3-4. With these four numbers, they also
constructed the triangular tetractys that, besides constraining the above ratios, also comprises
10, the sum of the four numbers and the core of the decimal system. Even though the
fundamental ratios of the tone system were supported empirically by experiments with
proportional divisions of the string of a monochord, the Greeks abstained from
continuing the sequence of ratios further, which, for instance, based on an additional readout
from the monochord, would have produced the major third by the ratio 5:4 (see later).
They insisted on “explaining” the harmonic proportions from within the reach of the
tetractys because the number 4 was considered to be sacred as it was observable
everywhere: nature and matter were made up of the four elements, there were four seasons,
four directions, four ages, and so on, and these numbers were expected to be present
in all proportions, natural as well as mystical, that warranted cosmic correspondence,
balance, and coherence. The scientific axiom that is linked to this particular use of
numbers in relation to music is, in other words, rooted in a sort of mythical understanding of
unity, simplicity, and balance as a ruling principle (see Klempe 1991; Sundberg 1980;
Knakkergaard 2016a).
Diatonic Scaling
Even though there are practically unlimited ways to divide audible sound into separate
pitches (see later), the Western practice is initially to maintain the octave as an identity
or unison interval and to partition this octave into a scale of 7 discrete steps.
Figure 6.1 The construction of a (double) diatonic tone system with a slight chromatic element.
The scale
comprises 5 (whole) tones (T) and 2 semitones (S), often represented as this series
T-T-S-T-T-T-S, thus forming the scale we today refer to as "major." Originally, the
generative structure behind this scale is the tetrachord—literally "four strings"—and the
partition of the octave is rooted in the combination of five equally constructed groups of
four tones within the compass of a fourth, called diatonic tetrachords (Figure 6.1). To
the Greeks, these five tetrachords together formed a complete tone system, and again
“against this tetrachordial thinking the fourth interval stands out as «das Lieblingskind
der griechischen Theorie» (the lovechild of Greek theory)” (Handschin 1948, quoted in
Sundberg 1980, my translation).
The internal proportions of the tetrachord were constructed2 on the basis of the three
fundamental intervals whose ratios, as mentioned earlier, are contained within the
tetractys: the octave, the fifth, and the fourth. Setting the octave to 12, the octave (2:1) below
will be 6, the fourth (4:3) below will be 9, and the fifth (3:2) below will be 8, and the
procedure thus reveals the interval 9:8, namely, between the fourth and the fifth. The Greeks
called this interval “tonos” (≈ “tension”), which was “considered to be the fundamental
tone-step” (Sundberg 1980, 112, my translation) and the Pythagoreans used it to “fill out”
the gaps (downward) between the fundamental intervals—see d and c in tetrachord 1
and f and g in tetrachord 2 in Figure 6.1.3
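The arithmetic of the previous paragraph can be checked in a few lines; the snippet below (an illustrative aside, not part of the original argument) uses Python's fractions module to reproduce the 12-9-8 proportions and the resulting 9:8 tonos.

```python
from fractions import Fraction

fifth = Fraction(3, 2)
fourth = Fraction(4, 3)

# Set the octave to 12 units: the fourth below it falls on 9, the fifth on 8.
print(12 / fourth)     # 9
print(12 / fifth)      # 8
print(fifth / fourth)  # 9/8, the "tonos" lying between the fourth and the fifth
```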
The tetrachordal understanding of this early diatonic scale came to dominate in
Western music theory until the eleventh century, when a hexachordal understanding
gradually took over, along with the development of Guidonian notation and the
mnemotechnical system called the Guidonian hand. The use of the hexachord as a
particular unit was most of all a pragmatic move, not a theoretical one. Its primary aim was
to ease and support the learning of vocal music, and it formed an intelligent approach to
the adoption of the proportions of the diatonic system of the time by the introduction of
the fixed format T-T-S-T-T, vocalized as ut-re-mi-fa-sol-la. Furthermore, it made it easy
to distinguish between the three different versions, the natural, the hard (durum), and
the soft (molle), according to their note placement, respectively c, g, and f, thus allowing
for the use of both b and bb (see Figure 6.2).
The tone-step was maintained as a fundamental proportion, and so the introduction
of the hexachord did not weaken the diatonic scale's dominant position. Quite the
contrary. By replacing the much more ambiguous neumes, the innovation in practice not only
consolidated the scale but also indicated the transition from an informed descriptive
Figure 6.2 The possible hexachords within the tone system of the Middle Ages, the gamut.
Interface
Depending on the needs and ambitions at a given time, the development of musical
instruments and the generation of tone and notation systems have to interconnect in a
process of synthesis in which they can be refined dialectically. The innovations and
changes that vocal music went through following the breakthrough of the radical new
kind of polyphony that Ars Nova brought about in the fourteenth century gradually
transformed music from an almost entirely linearly organized art form into a musical practice
that included increased attention toward the vertical parameter as well. This development
initiated the emergence of instrumental music, as a separate trajectory, and eventually
also paved the way for the emergence of the triad, which later made up a central element
in the constitution of the “functional harmony” mentioned earlier.
The central premise for the triad to emerge was the inclusion of the third as a consonant
interval. The first traces of this process go back to the twelfth century, in which
theoreticians began to refer to the purely empirical observations, secundum auditum ("by the
ear”), that were achieved when practicing and performing vocal music that displayed
polyphonic implications (Hansen 1995, 58). This development eventually led to
reconsiderations regarding the relation between consonance and dissonance, which did not
simply imply the inclusion of the third as a consonant interval but also, in fact, formed a
break with Pythagorean theory because the third's numeric ratio fell "outside the range
of the tetractys, which by the Pythagoreans assumedly had to be given some
significance" (Sundberg 1980, 107). As they did not want to exceed the restriction of 4,4
they consequently conceived of the major third by moving four fifths up and two octaves
down: (3/2)^4 × (1/2)^2 = 81/64.
In Ramos de Pareja’s treatise Musica Practica from 1482, the interval of the third is, for
the first time, described by the ratio 5:4 (today known as the “pure major third”), and
some seventy-six years later Zarlino
was the first also to include in his concept of harmony triads consisting of 5ths and
3rds; this he was able to do because, besides perfect consonances, he defined imperfect
ones—the 3rds—by means of simple, “harmonic” numerical proportions (5:4 and 6:5)
rather than by the complicated Pythagorean proportions. (Dahlhaus 2017)
From the title of his treatise, it is evident that Ramos de Pareja was concerned with
music in practice: music in its performance. Thus, his theoretical effort was apparently
motivated by the need to overcome the divide between music in theory and music
in practice that the development of polyphony in particular had led to, or, rather, had
uncovered. Just as singers and choirs in practice had probably been singing secundum
auditum all along, ensuring that intervals other than octaves, fifths, and fourths sounded
in consonance, they probably had never had any problems in “transposing” or “shifting”
from one central pitch to another (cf., hexachord, earlier)—they simply maintained the
internal proportions of the scale by ear—this obviously was, however, not the case
regarding the musical instruments of the time. The instruments were laid out and tuned
according to the principles of the Pythagoreans, to whom “harmonia”—as already
implied—was a question of codifying the (diatonic) scale and “the relationship between
those notes that constituted the framework of the tonal system” (Dahlhaus 2017) and not
the theoretical or acoustic harmony of simultaneously sounding intervals. Consequently,
instruments with fixed steps like, for instance, the organs and the early clavicembalo of
the Middle Ages, were not capable of producing triads that sounded satisfactory. The
thirds and sixths of the Pythagorean scale
do not meet medieval and Renaissance criteria of consonance implied by such terms
as “perfection” and “unity.” When used as harmonic intervals these Pythagorean
3rds and 6ths are likely to be characterized, on an organ Diapason stop for example,
by rather prominent beats; middle C–E or C–A beat more than 16 times per second
at modern concert pitch. (Lindley 2017)
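Lindley's beat figure is easy to verify. In the sketch below—assuming middle C at roughly 261.6 Hz under modern concert pitch—the beating of a major third is estimated from the clash between the fifth harmonic of the lower note and the fourth harmonic of the upper note; the Pythagorean third of 81:64 beats at roughly 16 Hz, whereas the pure third of 5:4 does not beat at all.

```python
from fractions import Fraction

MIDDLE_C = 261.63  # approximate middle C at a' = 440 Hz

def major_third_beats(ratio, fundamental=MIDDLE_C):
    # Beating of a major third arises between the 5th harmonic of the lower
    # note and the 4th harmonic of the upper note.
    return abs(4 * fundamental * float(ratio) - 5 * fundamental)

print(major_third_beats(Fraction(81, 64)))  # roughly 16 Hz: rough, prominent beats
print(major_third_beats(Fraction(5, 4)))    # 0.0 Hz: the harmonics coincide
```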
By 1600, the development of the tone system had found its final form with the division
of the octave into twelve intervals or half-steps—however, the question of tuning was
not solved at this time (and remains still unsolved to a certain degree). Equal temperament,
which was suggested by Vincenzo Galilei in 1584, where the semitone is defined as the twelfth root of 2 (2^(1/12)), is
a compromise that is not fully musically satisfying. Just or harmonic tuning, which is
based on the ratios of small integers as suggested by Pareja and Zarlino, is more pleasing
to the ear, and is typically used by voices and strings in ensembles where the performers
adjust pitch with one another by ear (secundum auditum). However, just tuning only
works ideally for one scale at a time, hence the tuning of, for example, D major is not the
same as that of E major. Today, however, music software programs such as Apple’s Logic
Pro X and Steinberg’s Cubase include Hermode tuning, which is capable of adjusting
simultaneously sounding tones of electronic instruments in real time to accommodate
to just intonation without compromising equal temperament as the overarching tuning.
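Why just tuning cannot serve every scale at once is easy to demonstrate. In the minimal example below (which has nothing to do with Hermode tuning's actual algorithm), the note A is derived in two equally "just" ways above C—as a pure third above the fourth degree, and as a pure fifth above the second degree—and the two results disagree by the syntonic comma of 81:80.

```python
from fractions import Fraction

# A above C, reached as a pure major third (5/4) above the just fourth degree F (4/3).
a_via_f = Fraction(4, 3) * Fraction(5, 4)   # 5/3

# The "same" A, reached as a pure fifth (3/2) above the just second degree D (9/8).
a_via_d = Fraction(9, 8) * Fraction(3, 2)   # 27/16

print(a_via_f, a_via_d, a_via_d / a_via_f)  # 5/3  27/16  81/80 (the syntonic comma)
```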
It is, however, interesting to consider that, by taking one’s point of departure in the
diatonic scale’s combination of tones and semitones,
there is nothing that prevents the whole tone from being more than twice as large or
less than twice as high as the halftone. If a tone system’s minimum interval equals m,
then the system using the fewest tones, is the one whose semitone is m and whole
tone 2m. This gives in all 2 × m + 5 × 2m = 12m, hence 12 tones per octave.
(Hansen 2003, 1645, my translation)
However, if the half tone is set to 2m and the whole tone to 3m we get 19 tones within
the octave, which, compared with the 12-tone system, is slightly better or more precise in
terms of pure intervals.
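The two octave divisions mentioned in this passage follow from nothing more than the diatonic pattern of five whole tones and two semitones; the toy calculation below simply tabulates the result for the two choices of unit discussed.

```python
# Diatonic pattern T-T-S-T-T-T-S: 5 whole tones and 2 semitones per octave.
for semitone, whole_tone in [(1, 2), (2, 3)]:
    divisions = 2 * semitone + 5 * whole_tone
    print(f"semitone = {semitone}m, whole tone = {whole_tone}m -> {divisions} tones per octave")
# semitone = 1m, whole tone = 2m -> 12 tones per octave
# semitone = 2m, whole tone = 3m -> 19 tones per octave
```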
That it nevertheless was the 12-tone system that prevailed is probably due to
economic and technical performative advantages, and maybe to the fact that
Western European music preferred to live with too large major thirds (implying
small semitones and thus sharp leading notes) than major thirds that are smaller
than the pure. (Hansen 2003, 1645, my translation)
This 19-tone system has not been abandoned altogether—the 19-tone-guitar is for
instance currently available5—but, from this time on, the system of 12 intervals per
octave has been the absolute dominant standard. Even though many musical instruments
can be traced back much further—and very often to non-Western cultures and coun
tries—most of the ones that exist today have been refined and developed to reach their
current shape and level of perfection within the last 300 years in order to comply with
the 12-interval tone-system. The piano and its well-known keyboard layout is in many
ways the epitome of a modern musical instrument, being, as it is, the most commonly used
instrument for music teaching and demonstration because it provides a generally
objective interface whose design is easily understood and is capable of producing many
simultaneously sounding tones. And although its tuning typically is fixed, it is possible
to alter the tuning if it is the original, acoustic instrument. Such a practice was actually
very often the case in the seventeenth century where there were many candidates for the
new tuning system that the new fully chromatic tone system required, and it was not
Interaction
Until now, all references to musical instruments in this chapter have signified traditional
acoustic musical instruments. During the first half of the twentieth century, however, a
number of electrophones were introduced, such as the Hammond organ, the electric
guitar, and the first synthesizers, and some of these instruments came to change and
expand the concept of the interface and its implications. Among the synthesizers, the
Ondes Martenot and the Theremin—which are both monophonic or one-note-at-a-time
instruments—introduced new kinds of step-less interfaces (in the case of the Ondes
computers of the time, carried out experiments with digitally produced—and partly
generated—music and sound. The application of digital technology did not just imply
an expansion of available interfaces (digital technology’s physical interfaces that are
comparable with traditional music instruments in fact came somewhat later) but addi
tionally offered new ways to control and interact with musical sound as well as new
models for musical shaping. From the start, this was only carried out on a very limited
scale since it could only be done by using mainframe computers and more or less stan
dardized command line programming. But, following the introduction of MIDI, a
standard protocol for the digital control of musical events, in the beginning of the 1980s,
together with the rapidly growing propagation of microcomputers and the development
of “graphical programming” in the same decade, the digital interface became quite
ubiquitous and so did various kinds of interaction that exceeded the, strictly speaking,
very definite forms of interactions that are possible by means of traditional musical
instruments (acoustic as well as electric). Today, computers, digital audio, and MIDI—
in the form of a vast number of music and sound applications, of which many are
specifically aimed toward particular uses, interests, and music—together have become a
more or less dominating factor and reference, making the tone system, along with the
matching notation system, accessible from a plethora of digital sources.
One of the most baffling elements that this development has brought with it is the
unification of the three separate interdependent systems—or technologies—discussed so
far: the tone system, the notation system, and the interface technology, namely, the
instruments (see also Dyndahl, volume 1, chapter 10, and Danielsen, this volume, chapter 29).
Digital technology in the form of MIDI integrates the three systems in such a way that
they appear to be one coherent and indivisible system. In this alternative and, in a way,
nonphysical world, the tone of the system, the note of the score, and the key of the
keyboard have apparently become one and any trace of theory and abstraction has practically
been obscured by the manifest totality and parallelism of the digital virtualization (for a
more detailed discussion on some of the consequences of this, see Knakkergaard 2016b).
Although digital technology logically offers unlimited ways of organizing sound into separate
tone steps, for designing interfaces and for representing sonorous events graphically or
likewise, such opportunities have nevertheless only been developed to a modest degree,
and even though the technology epitomizes a situation where most physical borders can
be crossed at will, the twelve intervals of the octave and its protagonist, the diatonic
scale, are maintained. By means of digital technology it is, for instance, much easier than
ever before to work with different tunings. In addition to the possibility for the user to
define personal, unique tunings, the music application Logic Pro X, for instance, offers
97 “standard” tunings including scientific ones such as “1/4-comma meantone with
equal beating fifths” and “12-tone Pythagorean subset of JI 17-tone scale”; historical ones
like “Ramos de Pareja (Ramos de Pareia)—Monochord, Musica practica (1482)” and
“J. S. Bach ‘well temperament,’ acc. to Jacob Breetvelt’s Tuner”; and also exotic tunings
such as “Northern Indian Gamut, modern Hindustani Gamut out of 22 or more Shrutis”
and “Gamelan Udas Mas (approx) s6,p6,p7,s1,p2,s2,p2,p3,s3,p5,s5,p5.” Due to MIDI’s
limitations, there is, however, no way to avoid the twelve steps of the octave and the
notion of the standard keyboard layout because the otherwise abstract tone in MIDI is
understood as a key-number instead of a tone-number and as such is tied to the concrete
concept of a key which is struck.6 MIDI is organized via the metaphor of the standard
Western keyboard, plain and simple, and thus, in reality, controls and regulates the way
music is created and appreciated in a much more rigid and dominating sense than was
possible before its advent, eventually nourishing the notion that the systems behind are
ontologically given a priori.
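The point about MIDI's keyboard metaphor can be made concrete with the standard mapping from key number to frequency, together with the kind of per-key retuning that note 6 alludes to. The sketch below is only a schematic illustration: the cent offsets are approximate values for 1/4-comma meantone and are not taken from any particular application, and the approach has nothing to do with the dynamic adjustment performed by Hermode tuning.

```python
def midi_note_to_hz(note_number, concert_a=440.0):
    """Standard 12-tone equal-tempered mapping from MIDI key number to frequency."""
    return concert_a * 2 ** ((note_number - 69) / 12)

def retuned_hz(note_number, cent_offsets, concert_a=440.0):
    """Apply a fixed per-pitch-class offset in cents: one crude way of laying an
    alternative tuning over the fixed grid of twelve keys per octave."""
    offset = cent_offsets[note_number % 12]
    return midi_note_to_hz(note_number, concert_a) * 2 ** (offset / 1200)

# Approximate deviations (in cents) of 1/4-comma meantone from equal temperament,
# starting from C; illustrative values only.
MEANTONE_CENTS = [0.0, -24.0, -6.8, 10.3, -13.7, 3.4, -20.5, -3.4, -27.4, -10.3, 6.8, -17.1]

print(midi_note_to_hz(60))              # middle C in equal temperament, ~261.63 Hz
print(retuned_hz(64, MEANTONE_CENTS))   # the E above, lowered toward a purer major third
```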
Thus, unless we turn to sound art and sound installations, the development seems
only to have consolidated the dominance of the implied systems and the interval has not
at all been set free. Although practices that imply procedures such as glissandi, blue
notes, and similar alterations that diverge from the fixed intervals are still widely used,
the keyboard metaphor is not really suited for the "in-betweens," and the 12-note
segmentation of the octave—and the octave itself—in MIDI is not just a prerogative but an
unavoidable premise.
Composers, performers, critics, and music thinkers, especially in the field of
contemporary music, have every so often challenged this situation. The alternative protocol, ZIPI,
which was introduced in the 1990s, is maybe the best qualified and most versatile and
least esoteric example of this. But although the proposed standard was MIDI compatible,
and thus did not imply a complete break with current equipment and practices, it never
caught on and, to date, every attempt to establish an agenda that could seriously threaten the
concepts has failed. For the time being, it seems fair to claim that music is not only
organized by means of the system’s concepts and elements, but it is also imagined
through the same conceptual formats—taking sound production of music into
consideration confirms this.
Final Remarks
Music of today—and roughly speaking of the last 2,500 years—is not just influenced but
also determined by a particular kind of metaphysical thinking of the Ancient Greeks.
Although this thinking’s strong focus on the number four in reality lost its footing long
ago, it is still the major factor behind the idiosyncratic regime of possible tone-steps
within the range of audible sound and therefore the division of the octave—and nothing
seems capable of disturbing this regime seriously. The tone steps and their tuning might
appear to us as the squares or fields of a graph paper but, in reality, they build a format
that resembles the nodes of the lines of the grid and not the squares. Thus, there are many
possible nodes in-between the ones that are preprinted, and even though they are invisible,
these alternative nodes are very often articulated and exposed in musical performances.
They are, however, brought into play in relation to the preformatted nodes that, in this
way, function not just as a reference but also as theoretical and abstract final goals. There
is no doubt that this well-organized and continuously exposed universe of pitches, and
especially the various selections that make up certain identifiable “tonics”—pentatonic,
diatonic, and familiar, or unique modes—is essential as the core premise for musical
creativity in Western cultures, not least because the “tonics”—contrary to the thoroughly
chromatic as in the case of serialism—make way for sensations of invariant musical
signs (motives, figures) that are perceptually achieved through a kind of object
permanence even where there is talk of obvious breakdowns between the different expositions
of the sign (Kjeldsen 2004). As Kjeldsen points out, it is our perception that “delivers”
the notion of identity or equivalence, just as it makes us “experience” elements such
as tension and relaxation. Thus, we really cannot hear what we are hearing: the "tonicity"
is too strong and our brains are too adapted to the patterns of the diatonic scale. The
strength of our perception of the diatonic scale—or any scale with which we can become
familiar—also makes it possible for us to ignore the quality or character of the sound
source: it does not matter whether it is a piccolo flute or a bass synthesizer that plays a melody,
we can still recognize it, just as we can even when it is transposed. The diatonic scale is a
strong regime.
Historically, there are numerous examples of alternative proposals aimed at replacing
or supplementing the ruling tone system. Ferruccio Busoni’s essay Entwurf einer neuen
Ästhetik der Tonkunst, first published in 1907, is a good—and famous—example of such
a proposal in which he, among many other things, suggests an expansion of the octave
into eighteen steps by means of a sixth-note-division of the whole tone (Busoni [1916]
1973, 40) and further claims that music’s full blossom is hindered by the instruments,
that “their range, their tone, what they can render . . . are chained fast, and their hundred
chains must also bind the creative composer” (Ruscoll 1972, 33). Many composers and
musicians in the twentieth century challenged the dominance of the systems, some by
expanding it further similar to what Busoni dreamed of, others by working with non
pitched or weakly pitched sounds in the form of samples, as introduced by the composers
of musique concrète. However, even when working with isolated sound samples, the
organization of these may take the form of a “normal” piece of music, in composition as
well as in realization. This is, for instance, the case in a decidedly "outer-space" soundscape
composed by Eric Serra for a particular scene in Luc Besson's movie The Fifth Element.
In the scene, even though the odd sounds and many "picturesque" sound effects appear
somewhat chaotic and properly stereotyped as a space-soundscape, when examined more
closely they turn out to be neatly composed in accordance with not just a discrete steady
beat but also in accordance with a selection of pitches that evokes a sense of musical
mode with references to the diatonic scale (see Knakkergaard 2009, 294). Again, the
diatonic scale is a strong regime, and, in a way, it seems fair to claim that not just our
musical practices but also our understandings and imaginations of music are subject to
the discreet hegemony of diatonism. However, this diatonism, and the abstract entities
of the reductionist tone system as a whole, have nourished the development of a firm,
highly complex and advanced basis for musical creativity and imagination. These
frameworks, which today truly are numerically regulated, have provided the prerequisites that
secure the comprehensibility of highly complex sound structures and an overwhelming
amount of highly different musical genres and styles. So, maybe the Greeks were right
after all, not in their focus on the number four, but in their vision and imagination of the
order of musical sound regulation as a means to gain insight into some of the fundamental
principles of existence. By detaching the structure of the tone system from practice
and by making its entities abstract, the path is paved for a composite, ideal system whose
elements all are theoretically balanced. Such a strategy is not unique to Western cultures:
no matter where we look in time and place, there are always basic norms, scales, and
generative principles in play in the making of sonorous musical artifacts aimed at
ceremonial and religious and eventually epistemological purposes. The question remains,
however, how is it possible to overcome the limitations that the current principles entail;
how can their imaginative spell be broken?
Notes
1. From around 1150, Byzantine neumes did, however, indicate intervals but not pitches.
2. Or reconstructed theoretically, as the tetrachords were present empirically at the time.
3. The diatonic scale can also be produced by applying the fifth as “generator interval”:
F–c–g–d’–a’–e’’–b’’, just as the pentatonic scale can be produced by stacking fifths F–c–g–d’–a’
and the chromatic scale by proceeding from the diatonic: F–c–g–d’–a’–e’’–b’’–f#’’’–c#’’’’–
g#’’’’–d#’’’’’–a’’’’’. However, before the introduction of equal temperament—which in fact
replaces the fifth with the semitone as the generator interval—these intervals would, just
like the ones produced by means of the tetrachord, not “be in tune” when folded back into
the same octave.
4. The fact that the Pythagoreans defined the interval of the whole tone as 9:8 does not
corrupt this point, as they understood the 9 as a fifth plus a fifth and the 8 as a fifth plus a
fourth, this way maintaining the limits of 4.
5. See https://en.wikibooks.org/wiki/Guitar/Print_Version. Accessed March 2017.
6. By tuning every single key-number (tone) individually, it is possible to program MIDI in
such a way that it has more than 12 tones to the octave.
References
Attali, J. 2008. Noise and Politics. In Audio Culture: Readings in Modern Music, edited by
C. Cox and D. Warner, 7–9. New York: Continuum.
Brinner, B. 2017. Indonesia. §III: Central Java. 3. Instruments and ensembles. Grove Music
Online. Oxford Music Online. Oxford University Press. Accessed October 16, 2017.
Burkert, W. 1962. Weisheit und Wissenschaft: Studien zu Pythagoras, Philolaos und Platon.
Nürnberg: Hans Carl.
Busoni, F. (1916) 1973. Entwurf einer neuen Ästhetik der Tonkunst. Hamburg: Verlag der
Musikalienhandlung. Karl Dieter Wagner.
Dahlhaus, C. 2017. Harmony. Grove Music Online. Oxford Music Online. Oxford: Oxford
University Press.
Handschin, J. 1948. Der Toncharacter. Zürich: Atlantis.
Hansen, F. E. 1995. Middelalderen. In Gads Musikhistorie, edited by S. Sørensen and B. Marschner,
15–72. Copenhagen: G.E.C. Gad.
Hansen, F. E. 1999. Musik: Logisk konstruktion eller æstetisk udtryk? In Æstetik og logik,
edited by J. Holmgaard, 151–167. Aalborg, Denmark: Medusa.
From Rays to Ra
Music, Physics, and the Mind
Introduction
Surfing the net one day led us to the discovery of a fortuitous combination of articles.
The first, by Elizabeth Hellmuth Margulis, titled “One More Time” (Margulis 2014),
deals with the crucial role of repetition in musical experience. The second article,
“A New Physics Theory of Life,” describes the work of MIT’s Jeremy England (Wolchover
2014). England’s work focuses on the second law of thermodynamics, particularly on
how entropy can be defeated locally under certain physical conditions. The signifi-
cance of repetition in both articles led us to the thought that a line could be traced
from the one to the other. That is, if repetition is crucial to the emergence of life and
to the experiencing of music, could it be that a fundamental relationship underlies
both phenomena?
We will take a moment to examine the implications of England’s work. Entropy can
be regarded as a measure of the tendency of energy to disperse over time.1 We focus on
entropy of “open” systems. Within these systems, entropy can be kept low by increasing
the entropy of their surroundings. During photosynthesis, for example, a plant uses
sunlight to maintain its own internal order while increasing overall entropy in the uni-
verse (Wolchover 2014). Jeremy England’s mathematical formula shows that more
likely evolutionary outcomes involve atoms that absorb and dissipate more energy.
Significantly, “[p]articles tend to dissipate more energy when they resonate with a driv-
ing force, or move in the direction it is pushing them, and they are more likely to move
in that direction than any other at any given moment.” For example, “clumps of atoms
surrounded by a bath at some temperature, like the atmosphere or the ocean, should
tend over time to arrange themselves to resonate better and better with the sources of
mechanical, electromagnetic or chemical work in their environments” (Wolchover 2014).2
There are two mechanisms mentioned by England that can increase efficiency of energy
use and its subsequent dissipation. These are self-replication (in nonliving or living things)
and increasing structural organization. Self-replication increases energy use and dissi-
pation by copying an already efficient entity. Structural organization will only increase,
as indicated earlier, if it results in greater energy usage. Both mechanisms are found in
life forms, but are not limited to them.3 It seemed to us that resonance with an energy
source, self-replication (a form of repetition), and increasing structural organization
were all notions that pertain to sound in general and to music in particular, both as
physical and cultural productions. A few examples may suffice at this point. Since sound
is produced by waves, resonance works most obviously in areas where waves combine:
timbre, consonance, and synchrony of constituent elements. Repetition applies to rhythm,
but also to many other facets of sound, including the creation, recognition, and memory
of pitch patterns. Increasing structural organization can be found in the development of
sonic and musical creations over time.
In this chapter, we intend to trace the role of repetition from the atomic level to the
homeostasis (stable state of equilibrium) of life forms, to the formation of culture, and to
music. We will focus on the evolutionary advantage of music in homeostasis of indi-
viduals and cultures, both sub- and supraconsciously, delineating what we call a “homeo-
static frame of reference.” We will provide short examples in music, from lullabies
to Beethoven, before examining two longer examples presenting the Afrofuturist jazz
pioneer Sun Ra’s vision and methods of expanding listeners’ homeostatic frame of
reference through his music.
We speculate that music is not simply a cultural invention or an evolutionary trait,
but rather an outcome of elementary laws governing the disposition of matter.4 It is a
product of iteration or periodicity and the natural accumulation of complexity through
variation. Just as England implies that if you shine light on random atoms long enough,
they will tend to self-replicate and organize until, eventually, you will get a plant, we
propose that if you continue shining that light you will get music.
In this section, we will examine the key components of our conjecture: self-replication,
invariance, emergent structure, homeostasis, entrainment, swarm behavior, and neural
synchronization. Each of these components will be discussed in what follows. We use
the term “homeostatic frame of reference” to refer to any group of entities that collectively maintains homeostasis. Using this concept, it is possible to theorize a continuous
process of development from England’s observations about thermodynamics, through
work on the origins of life and the development of cells, on to theories of mind and
consciousness, and even the behavior of crowds, economies, and nations.
Self-Replication
Self-replication is a necessary mechanism for counteracting entropy. According to England,
Interest in the modeling of evolution long ago gave rise to a rich literature exploring
the consequences of self-replication for population dynamics and Darwinian compe-
tition. In such studies, the idea of Darwinian “fitness” is frequently invoked in com-
parisons among different self-replicators in a non-interacting population: the
replicators that interact with their environment in order to make copies of them-
selves fastest are more “fit” by definition because successive rounds of exponential
growth will ensure that they come to make up an arbitrarily large fraction of the
future population. (England 2013)
Note that the mathematical formulas to determine reproductive fitness will apply at
any level of structure. As England puts it, to examine self-replication, first the entity
being replicated must be identified. Whatever that entity may be, however, the probability
of its replication is determined in the same way:
“Self-replication” is only visible once an observer decides how to classify the “self ”
in the system: only once a coarse-graining scheme determines how many copies of
some object are present for each microstate can we talk in probabilistic terms about
the general tendency for that type of object to affect its own reproduction . . . Whatever
the scheme, however, the resulting stochastic population dynamics must obey the
same general relationship entwining heat, organization, and durability.
(England 2013)
Invariance
Invariance is the property of identity between entities. This identity could be of any sort.
For example, in the case of two different triangles, the area or the angles or some other
aspect could be invariant. Invariant behavioral properties of starlings cause flocking.
Invariant ratios between elements in sunflower heads lead to their arrangement in
spirals. In sound, invariance of wave forms connects instruments that may be of different
sizes or composition. In music, invariant intervals, contour, and rhythm (separately
or together) connect different motive forms. In a progression toward more complex
Emergent Structure
Circumstances that create a high enough probability of an entity sharing an invariant
feature with another entity tend to lead to a multiplicity of similar entities. These entities
then may give rise to an emergent structure—a new structure that comes to exist because
of the interactions of the initial individual entities. For emergent structure to arise, one
needs an instantiation of some entity, another entity that displays invariance with the
first one, and some property of the invariance between the items that allows for con-
nectedness. For example, life on earth is largely carbon-based due to the element’s ability to
chain together to create more complex combinations. Single-celled creatures contain
genes for cell adhesion molecules that allow them to attach to their environment but
also serve to connect them to each other and create the beginnings of multicellular
organisms (Neubauer 2011, 45–49). Musical meter may be viewed as an emergent prop-
erty of rhythms that coincide at regular time intervals.
Emergent structure may be found at any level of complexity. Each phase involves
similarly behaving entities that give rise to an emergent structure. This emergent struc-
ture then takes on its own life as an entity, interacting with similar entities, and in turn
gives rise to yet another layer of emergent properties. This simple pattern of events will
continue over time. Thus, particles form into atoms, atoms into molecules, molecules
into chemicals, chemicals into cells, cells into organs, and organs into organisms.
Swarm Behavior
Organisms give rise to yet another level of complexity when they form into groups. We
probably have all seen flocks of flying birds that appear to act with a single intelligence.
In fact, swarm behavior “emerges naturally from simple rules of interaction between
neighboring members of a group” (Fisher 2009, 9).
One could view human social groups as a kind of swarm. In the words of Len Fisher,
author of The Perfect Swarm (2009):
The process by which simple rules produce complex patterns is called “self-
organization.” In nature it happens when atoms and molecules get together
spontaneously to form crystals and when crystals combine to form the intricate
patterns of seashells. It happens when wind blows across the sands of the desert to
produce the elaborate shapes of dunes. It happens in our own physical development
when individual cells get together to form structures such as a heart and a liver, and
patterns such as a face. It also happens when we get together to form the complex
social patterns of families, cities, and societies. (2)
Fisher further notes that animal and human swarms provide certain advantages to
individuals:
Swarm behavior becomes swarm intelligence when a group can use it to solve a
problem collectively, in a way that the individuals within the group cannot. Bees use
it to discover new nest sites. Ants use it to find the shortest route to a food source.
It also plays a key role, if often an unsuspected one, in many aspects of our own
society, from the workings of the Internet to the functioning of our cities. (10)
In this passage, we see how “swarm intelligence” is used to solve problems that deal
with homeostasis. For insects, finding a nest or food source contributes to a stable environ-
ment and regulation of energy intake and expenditure. In our view, this is a primary
function of musical activity. We propose that music, which can serve to bring individual
humans together into a sociocultural group or “swarm,” also aids in solving problems
required for homeostasis (the ultimate goal of which is to sustain life).
To delineate the role of the second law of thermodynamics as it pertains to life forms
we will generalize from the concept of swarm behavior to homeostatic frames of
reference.
For life that exists within a relatively narrow range of conditions, there must be a
strong tendency toward homeostasis, toward building an inner world that is buffered against fluctuations in the outer world. The changes described here are probably
not just accidental (although mutations supplied the raw material). Those groups
that came to dominance in each era evolved new ways to be independent of their
surroundings. (2011, 26)
The concept can be seen to give rise to various levels of complexity in which one might
consider the role of homeostasis. For example, if we consider biological functions, we
might derive frames of reference that expand from individual cells through organisms,
families, communities, societies, nations, and so forth. The idea could be applied on both
larger and smaller scales. We could conceivably start with atoms and build up greater
levels of complexity through the creation of molecules, chemicals, protein chains, and
Periodicity
Periodicity is the recurrence of invariant attributes. We focus here on temporal perio-
dicity, which involves recurrence of invariant temporal intervals. At the cellular level,
periodicity helps regulate basic bodily functions and controls the scanning of the
environment that leads to perception.
Periodicity is necessary in musical activity as well. The existence of a tone, for example,
requires periodicity of sound wave frequency. Temporal periodicity is also fundamental
in creating a basic pulse.
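The following sketch, offered only as an illustration, makes the first of these points concrete: a synthetic harmonic tone is generated and its period recovered by autocorrelation, a standard technique for detecting temporal recurrence in a signal. The sample rate, fundamental, and minimum-lag cutoff are arbitrary choices, not values implied by the chapter.

import numpy as np

sr = 44100                     # sample rate in Hz
f0 = 220.0                     # fundamental of the synthetic tone
t = np.arange(0, 0.1, 1.0 / sr)
# A periodic waveform: a fundamental plus two weaker harmonics.
x = (np.sin(2 * np.pi * f0 * t)
     + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
     + 0.25 * np.sin(2 * np.pi * 3 * f0 * t))

# A periodic signal correlates strongly with itself when shifted by whole periods.
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
lag = np.argmax(ac[20:]) + 20   # skip very short lags, then take the strongest peak
print("estimated period: %.4f s, estimated fundamental: %.1f Hz" % (lag / sr, sr / lag))

For the tone above, the strongest peak falls at a lag of about 200 samples, recovering a fundamental of roughly 220 Hz; without that periodicity there would be no stable peak and hence no perceived pitch.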
At the level of the individual, repetition in music lends itself to the creation of memory.
Repetition is crucial to simple identification of event sequences. In this process, the
presence of invariant features in consecutive spans of time causes neurons to fire in a
manner that aids the formation of memory. The neurobiologist Gerald Edelman has described the neuronal activity of sequencing events in the brain in comparable terms.
In a sense, then, brains are themselves “swarms,” with each neuron functioning both as
an individual and collectively according to simple rules. These rules include the activation
or inhibition of neuronal members of the “swarm” in order to optimize homeostatic
regulation.
Rodolfo R. Llinás (2002) compares neural activity to “some types of fireflies, which
synchronize their light flash activity and may illuminate trees in a blinking fashion like
Christmas tree lights. This effect of oscillating in phase so that scattered elements may
work together as one in an amplified fashion is known as resonance—and neurons do it
too” (12). This type of synchronization is also referred to as “entrainment.”
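One standard way of making such entrainment concrete is the Kuramoto model of coupled oscillators, in which every oscillator is pulled toward the average phase of the group; synchronizing fireflies and neurons are its textbook applications. The sketch below is purely illustrative, with arbitrarily chosen parameter values, and is not drawn from Llinás.

import numpy as np

rng = np.random.default_rng(0)
n, k, dt = 50, 4.0, 0.01                  # oscillators, coupling strength, time step
omega = rng.normal(2 * np.pi, 0.5, n)     # slightly spread natural frequencies (rad/s)
theta = rng.uniform(0, 2 * np.pi, n)      # random initial phases

def coherence(phases):
    """Magnitude of the mean phase vector: 0 = incoherent, 1 = fully entrained."""
    return np.abs(np.mean(np.exp(1j * phases)))

for _ in range(4000):
    # Kuramoto update: each oscillator drifts at its own rate but is pulled
    # toward the phases of the others.
    pull = np.mean(np.sin(theta[None, :] - theta[:, None]), axis=1)
    theta += dt * (omega + k * pull)

print("coherence after coupling: %.2f" % coherence(theta))

With the coupling strength well above the critical value, the phases lock and the printed coherence comes out close to 1; with the coupling set to zero it stays near the small value expected of independent oscillators.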
In human activity, it may be that entrainment serves to activate coordinated action. For
instance, work songs use metrical regularity to enable the exertion of multiple indi-
viduals to happen simultaneously. This basic function can be extended to include the idea
of groove. Margulis describes the way that groove enables human bonding:
Groove tends to make people feel as though they were “a part of the music,” pro-
viding further evidence for a link between the ability to successfully predict elements of
the musical structure and the kind of extended subjectivity that has been identified
as a hallmark of strong experiences of music. (112)
feeling that a piece is inevitable and right amounts to an appealing sense of someone else’s (the composer or performer) artistic act precisely matching our own sensibilities. It can be intoxicating to feel that a piece created by another person is fundamentally right. (113)
Live music, as compared with reading and no interaction, appears to improve the
wellbeing of young patients with cardiac and/or respiratory problems, and also to be
beneficial for their carers. It seems to be live music per se, and not the social com-
ponent of the musical interaction that attracts and distracts children, thereby helping
them to feel less in pain and more relaxed, and this seems to apply to the older children
in particular. (Longhi et al. 2015)
Ellen Dissanayake has done extensive research into the role of music in human infancy.
Music is an important mode through which parents interact and bond with their children,
leading to homeostatic benefits for both.
On the opposite end of the spectrum, music can also be used to improve the chances of
a society surviving while at war, either by using it to discourage or even to torture the enemy.
The thesis is also borne out well in tribal societies where, under the strict control
of the flourishing community, music is tightly structured, while in detribalized
areas the individual sings appallingly sentimental songs. Any ethnomusicologist
will confirm this. There can be little doubt then that music is an indicator of the age,
revealing, for those who know how to read its symptomatic messages, a means of
fixing social and even political events. For some time I have also believed that the
general acoustic environment of a society can be read as an indicator of social con-
ditions which produce it and may tell us much about the trending and evolution of
that society. (Schafer 1993, 7)
By examining the values that bind individuals into a society, we should be able to deter-
mine how these values contribute to homeostasis—how the sound environment leads to
sustenance of suitable living conditions.
state of the organism. Thus, organized sound itself (including music) contributes to
the evolution of intelligence.
It may be that composers propose new combinations of sound that then affect
swarm behavior. With his Ninth Symphony, Beethoven is attempting to affect human
history. At the moment when the baritone soloist and then the chorus announce “Alle
Menschen werden Brüder” (“all men become brothers”), the composer and poet com-
bine to expand the homeostatic frame of reference from the listener as individual to
the world as family.
describes. The primary concept here is the role that periodicity plays in creating and
enhancing adaptive systems. Neuronal phase synchronization creates a periodicity that
allows the stochastic system of potential neural connections to become the adaptive
system that is the mind. In other words, metaphors arise from the inherent tendency of
our cognitive apparatus to adapt for homeostatic regulation, and music’s periodic
character aids this process.
Sun Ra
In this section we will introduce musical examples that illustrate the principles stated
above. Our choice to examine the music and thought of Sun Ra (born Herman Poole
Blount, 1914–1993) stems from his explicit attempt to create a larger, better homeostatic
frame of reference. Many styles and genres of music overtly feature periodicity that
encourages physical entrainment and social connection: dance music of all sorts, work
songs, gospel-style hymns, marches, for instance. Minimal music, especially the early
work of Philip Glass and Steve Reich, brings attention both to entrainment and also
lack of it. However, Sun Ra’s work, at times, combines different periodicities in order
to create the emergent property of new types of entrainment, thus encouraging us to
expand our homeostatic frame of reference to a planetary scale.
In a sense, Sun Ra dedicated his career and music to imagining moving beyond the
society that held him back. Yet Sun Ra did not simply imagine what his universal utopia
would be like. He spoke and behaved as if it were reality. Sun Ra created his life and
music as a means of refusing to participate in the oppressive narrative of his time.
Sun Ra spoke of the earth as being out of tune with the universe. We take his comments
to mean that humans need to expand our homeostatic frame of reference to include a
vision of the earth as it exists in the vastness of space. Sun Ra’s thinking on this subject
seems akin to the message delivered in James Lovelock’s classic book on the environment:
We need to love and respect the Earth with the same intensity that we give to our
families and our tribe. It is not a political matter of them and us or some adversarial
affair with lawyers involved; our contract with the Earth is fundamental, for we are
a part of it and cannot survive without a healthy planet as our home. I wrote this
book when we were only just beginning to glimpse the true nature of our planet and
I wrote it as a story of discovery. If you are someone wanting to know for the first
time about the idea of Gaia, it is the story of a planet that is alive in the same way
that a gene is selfish. (Lovelock 2000, viii–ix)
Sun Ra considered his music to come from beyond Earth. For him, an envisioned music
of space would not unfold like that of Earth.
If the harmony is just what they teach you in schools, then it wouldn’t be any other
than what we’ve been hearing all along, but when the harmony’s moved the rest is
supposed to move and still fit, then you’ve got another message from another realm,
from somebody else. Superior beings definitely speak in other harmonic ways than
the earth way because they’re talking something different, and you have to have chord
against chord, melody against melody, and rhythm against rhythm; if you’ve got that,
you’re expressing something else. (Schaap 1989, cited in Szwed 1997, 128)
“Melody against melody, and rhythm against rhythm” is immediately apparent in the
composition “Space Is the Place” (Sun Ra 1973). This is perhaps Sun Ra’s best-known composition, having been performed on the television show Saturday Night Live in 1978. The bass ostinato is in 5/4 while the melody is in cut time (2/2), so their respective downbeats coincide only every five bars of cut time (or four bars of 5/4). Since the melody’s phrases move in four-bar units, the two patterns begin together only every twenty (or sixteen) bars (the melody is not identical
throughout, but the four-bar phrases continue). Adding to the different feels of the osti-
nato and melody is the fact that, although the quarter note moves at the same tempo in both, in the former the beat is the quarter note while in the latter it is the half note. In performance, percussion and other parts may add to the layers of metrical conflict. This use of
parallel streams of rhythms that articulate conflicting metric structures lures the listener
into a kind of entrainment: after multiple repetitions, the mind begins to coordinate the
5/4 bass ostinato with the cut time melody in a manner that creates a sense of flow. In
other words, the invariant quarter-note pulse, combined with the two conflicting metrical
streams, creates an emergent property of entrainment over longer spans of time—the
coordinated downbeats of the two meters. This emergent property, a new kind of entrain-
ment, reflects the lyrics’ encouragement to the audience to expand their consciousness.
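The arithmetic behind these realignments can be checked directly. Assuming, as stated above, that the quarter note is the pulse shared by both layers, the coincidence cycle is simply the least common multiple of the layers' lengths measured in quarter notes; the sketch below is only a restatement of that calculation.

from math import lcm

ostinato_bar = 5                  # quarter notes in one bar of the 5/4 bass ostinato
melody_bar = 4                    # quarter notes in one bar of the cut-time (2/2) melody
melody_phrase = 4 * melody_bar    # a four-bar melodic phrase

for label, span in [("melody bar", melody_bar), ("four-bar phrase", melody_phrase)]:
    cycle = lcm(ostinato_bar, span)
    print(f"{label}: realigns with the ostinato every {cycle} quarter notes "
          f"(= {cycle // melody_bar} bars of 2/2, or {cycle // ostinato_bar} bars of 5/4)")

Run as written, this reproduces the figures given above: the downbeats realign every twenty quarter notes (five bars of cut time, four of 5/4), and the four-bar phrases realign with the ostinato only every eighty quarter notes (twenty bars of cut time, sixteen of 5/4).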
Sometimes, “rhythm against rhythm” is found in the subtle shifting, variation, and expan-
sion of motives that confound metrical expectations, as in “Dance of the Language
Barrier,”7 which probably dates from the early 1980s.8 The titular “language barrier” refers
to the difficulty of understanding created by Sun Ra’s radical alteration of his materials,
which disguises the underlying motivic repetition, but it also refers to the problems of
communication between human beings. In terms of our discussion, the “language
barrier” represents the border between differing homeostatic frames of reference. The
“dance” aspect of the title suggests a pairing of the opposing sides, creating an expansion
of one’s social framework to include those who do not speak your language.
“Dance of the Language Barrier” is the musical realization of the difficulties of creating
entrainment between different homeostatic frames of reference. Sun Ra tried to create
a consciously challenging sense of entrainment, largely through more challenging rhyth-
mic ideas. Nevertheless, if he had thought that the language barrier was insurmountable,
he would not have written a piece of music to tackle, and, through physical and social
means, eliminate it.
Sun Ra is constantly changing the length of his motives in “Language Barrier.” While
many jazz phrases flow in units of two or four bars, there is no consistent phrase length
here. Most of the variations in motivic length in this tune are created through additions
or subtractions from the upbeat figures that begin every motive. Listeners certainly
will notice invariant elements in the phrases, especially after repeated hearings, but the
degree of variation seems much higher than in typical jazz tunes. The metrical clarity of
the tune is severely undermined by a high degree of syncopation, achieved by minimiz-
ing articulation of downbeats and emphasizing upbeats. The composition also contains
a very complicated pattern of accents. When jazz musicians are playing a tune, they
often talk about whether the “feel” of a note is “up” or “down.” “Up” notes occur as antici-
pations or delays of the beat and are played as accents, resulting in syncopation. “Down”
notes occur on the beat. In our own experience performing “Dance of the Language
Barrier,” we found that the written score does not adequately portray where to place ups
and downs. Only through listening to the tune as played by Michael Ray, former trum-
peter with the Arkestra, were we able to place the accents and shape the phrases the way
Sun Ra had taught them to his band. In a personal conversation with us in October 2000,
the woodwind player Marshall Allen, who worked with Sun Ra from the 1950s until the
latter’s death and now leads the Arkestra, discussed the difficulty of learning the accen-
tual patterns of Sun Ra’s music. Allen indicated that the main focus of rehearsals was
how to phrase the songs, how to achieve the right sound, tone, “vibration,” “voice,” or style,
how to “say” what Sun Ra wanted. This included when to slur notes, when to cut them
off, and when to “push the time forward or backwards,” “before the beat” or “after.”9
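A crude, purely illustrative proxy for the kind of "up" and "down" placement described here is simply the proportion of onsets that fall on offbeat eighth-note positions rather than on the quarter-note beats. The onset lists below are invented for the sake of the illustration; they are not a transcription of "Dance of the Language Barrier."

def offbeat_ratio(onsets_in_eighths):
    """Fraction of onsets on offbeat eighth-note positions (odd-numbered
    positions) rather than on the quarter-note beats (even-numbered positions)."""
    offbeats = sum(1 for onset in onsets_in_eighths if onset % 2 == 1)
    return offbeats / len(onsets_in_eighths)

# Hypothetical two-bar patterns (positions counted in eighth notes from the start):
on_the_beat = [0, 2, 4, 6, 8, 10, 12, 14]    # a square, on-the-beat phrase
anticipated = [0, 3, 5, 7, 9, 11, 14, 15]    # most attacks pushed off the beat

print(offbeat_ratio(on_the_beat))   # 0.0
print(offbeat_ratio(anticipated))   # 0.75

The on-the-beat pattern scores 0.0 and the anticipated one 0.75, reflecting in miniature the emphasis on upbeats and suppression of downbeats described above.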
Part of the difficulty in detecting patterning in “Dance of the Language Barrier” comes
from the relatively constant stream of fast note values in the long, irregular phrases,
punctuated by longer note values that seem to be irregularly placed. Finally, the harmonic
progressions of this tune do not reinforce a sense of regular phrase structure. In fact,
apparently there was no single progression for the tune. Sun Ra was notorious for
reharmonizing his compositions at each performance. Since he was the keyboard player,
many of his more complicated written scores do not contain chord symbols—only he
knew what harmonies he would play, and the band would follow.
All of these factors combine to underline the “Language Barrier” of the piece’s title.
However, a crucial factor serves to encourage metrical entrainment: the drum part. In
the recording, there is a clearly discernible jazz swing beat, with cymbals adding emphasis.
Thus, a listener might have great difficulty humming the tune, for the reasons outlined
previously, but would still have no trouble tapping a foot to the performance. Thus, in a
sense, the language barrier is being erected by the melody and broken by the drums of
the Sun Ra Arkestra. Or perhaps the tune/harmony is one “language” and the very tradi-
tional swing drum part is another. In any case, one would need to bring more than
just passive listening. The kind of “dance” that would be performed to this music might
well require superhuman efforts—exactly what Sun Ra wanted. In our experience, if
one makes those efforts, one enters a new understanding. The fact that this music can be
played by a big band, which was designed to be a model of cooperation and coordination,
means that the “Language Barrier” is surmountable, and rewards listeners with a cosmic
mode of dance.
From Sun Ra’s writings and communications (see Sun Ra 2005; Szwed 1997), it seems
that he hoped audiences, in a sense, would create new thought patterns that would
resonate with the advanced patterns of his music, turning away from negative, divisive
ideas toward a more inclusive frame of reference. Sun Ra envisioned an active role for
every person creating or hearing his music. He felt that “all people vibrate” (Sun Ra 2005,
460), and all their individual frequencies were important. “Each person is music himself
and he’ll have to express what he is” (476). Echoing the terms used by Jeremy England
(discussed earlier) to describe the resonance of atoms with an energy source, Sun Ra
associates particular sound and musical frequencies with the notion of race: “each color
has its own vibration. My measurement of race is rate of vibration—beams, rays” (460).
So, in his view, individuals and races not only resonate with external energy sources,
but they are themselves energy sources. Through the vibrations of his music, Sun Ra
suggests, entrainment between the audience and the performers takes place, resulting
in a higher state of being. “The real aim of this music is to coordinate the minds of people
into an intelligent reach for a better world, and an intelligent approach to the living
future” (457). Achieving an emergent entrainment different from normal experience
would result in humans actually becoming different, more natural beings (ones who
do not injure or kill their brothers and sisters—to Sun Ra, a much better homeostatic
condition). “Space music is an introductory prelude to the sound of greater infinity. . . . It
is a different order of sounds synchronized to the different order of being. . . . It is of, for
and to the Attributes of the Natural Being of the universe” (457).
Music functions at two nonconscious levels. At the level of the body, music affects
hormonal and motor systems; at the level of society these hormonal and motor systems
contribute to coordinated behavior. Given this theory, one can conclude that music does
have basic universal functions and effects. However, history and evolution have led to
the creation of differing strategies for maintaining homeostasis in different “climates.”
From these differing strategies, basic value systems emerge and develop. Differences in musical thinking between cultures then arise from these differing value systems and the behaviors they shape.
Invariance and repetition allow for predictive coherence and create the possibility of
emotive hearing in individuals and hence coordinated actions between individuals. As
an example, mating strategies10 may be viewed as related to musical taste. A society that
favors fewer children and greater parental commitment may prefer music that differs
from that in a society that favors more children and privileges the act of mating. One
might speculate that the latter society will feature a more “dance-friendly” style of music
with sharper attacks, motor rhythms, and simpler melodic forms, while the former
might place more emphasis on complex melodic shape.
Whatever the features of a musical style may be, it is likely that their emergence was
shaped by resonance, in accordance with the processes described earlier. Resonance,
according to our view, encompasses both universal laws of physics and cultural dif-
ferences, as indicated by the work of Edward W. Large (2011). He finds, for example, that:
Tonality is a universal feature of music, found in virtually every culture, but tonal
“languages” vary across cultures with learning. Here, a model auditory system, based
on knowledge of auditory organization and general neurodynamic principles, was
described and studied. The model provides a direct link to neurophysiology and,
while simplified compared to the organization and dynamics of the real auditory
system, it makes realistic predictions. . . . Analysis of the model suggests that certain
musical universals may arise from intrinsic neurodynamic properties (i.e., nonlinear
resonance). Moreover, preliminary results of learning studies suggest that different
tonal languages may be learned in such a network through passive exposure to music.
In other words, this neurodynamic theory predicts the existence of a dynamical,
universal grammar . . . of music. (123)
If, instead of imagining an open system with particles in it, as England asks us to
do, we substitute the brain for the system and potential synaptic connections for the
particles, we can see how neural synchronization functions to create the emergent
property of memory and ideas. A neuronal group that “fires together” repeatedly could
be considered equivalent to a recurring grouping of particles. As they fire, these neurons
would be absorbing and dissipating heat more efficiently than neural connections that
are not subject to periodic activation by neural synchronization. The neurophysiologist
Pascal Fries claims:
Inputs that consistently arrive at moments of high input gain benefit from enhanced
effective connectivity. Thus, strong effective connectivity requires rhythmic synchroni-
zation within pre- and postsynaptic groups and coherence between them.
(2015, 220, emphasis ours)
The mind, then, is an emergent structure that arises as a result of the effect of periodicity
on the stochastic collection of possible neural connections.
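A toy simulation can make this concrete. In the sketch below, which is only an illustration and not a model of the mechanism Fries describes, connections between units that repeatedly fire in the same time steps are strengthened by a simple Hebbian rule, while connections involving a unit that fires just as often, but at uncorrelated moments, decay away.

import numpy as np

rng = np.random.default_rng(1)
steps, eta, decay = 1000, 0.01, 0.005   # time steps, learning rate, slow weight decay

# Units A and B fire together every 10 steps (periodic, "entrained");
# unit C fires just as often, but at random, uncorrelated times.
a = np.zeros(steps); a[::10] = 1
b = np.zeros(steps); b[::10] = 1
c = np.zeros(steps); c[rng.choice(steps, steps // 10, replace=False)] = 1
spikes = np.stack([a, b, c])

w = np.zeros((3, 3))
for t in range(steps):
    s = spikes[:, t]
    # Hebbian rule: strengthen a connection whenever both units fire in the same step.
    w += eta * np.outer(s, s) - decay * w
np.fill_diagonal(w, 0.0)

print(np.round(w, 3))

By the end of the run the A–B connection is markedly stronger (roughly an order of magnitude in typical runs) than any connection involving C: only periodically synchronized firing accumulates into a durable structure.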
Conclusions
So, based on the foundation laid out earlier, we feel that we have clarified the connection
between the phenomena of repetition in music described by Margulis and England’s
theoretical work on emergent systems. Using terminology developed by Grimshaw and
Garner (2015), we could say that music is an emergent perception that is part of an acoustic
ecology arising from the interaction of two different stochastic systems, the mind and
the universe. We can describe air pressure fluctuations in the auditory range as exosonus,
and the activation of neural networks caused by these sounds as endosonus (33).
Repetition in music causes the exosonic stimulus to be better suited to forming neural
networks and thus to become endosonic activity. We have proposed that this emergent
perception is facilitated by synchronized neural activity that creates a referential time
frame in which particular sequences of synaptic connections are more likely to recur
and thus are reinforced. The neural networks themselves then function as an emergent
system that gains efficiency in taking in and dissipating energy (in the form of neural
firings). The endosonic results can be recombined through memory activation to take
Light shines on a heat bath. Its resonance creates entities that are more efficient at
processing heat, some of which then self-replicate. Eventually, the invariant relation-
ships between the replicants allow them to combine into emergent structures. Emergent
structures coordinate the actions of living beings into swarm behavior, allowing them to
better control homeostasis. Periodicity, which was already present in the system in the
form of resonance-creating energy, then serves as a basis for communication between
emergent structures by means of entrainment. This periodicity also influences neurons
to form networks, allowing consciousness to arise. Music is then a means by which
entrainment may be used for communication, allowing us to form more efficient homeo-
static frames of reference. As we have seen, Sun Ra used music to teach the human
species to see ourselves as a single homeostatic frame of reference in order for the spe-
cies to continue to exist in the vastness of space.
Notes
1. The entropy in a “closed” or isolated system can be described as follows: “[Entropy] increases
as a simple matter of probability: There are more ways for energy to be spread out than for
it to be concentrated. . . . Eventually, the system arrives at a state of maximum entropy called
“thermodynamic equilibrium,” in which energy is uniformly distributed. A cup of coffee
and the room it sits in become the same temperature, for example. As long as the cup and the
room are left alone, this process is irreversible. The coffee never spontaneously heats up
again because the odds are overwhelmingly stacked against so much of the room’s energy
randomly concentrating in its atoms” (Wolchover 2014).
2. In other words, England argues that under certain conditions, matter will spontaneously
self-organize instead of becoming more disordered. This tendency could account for the
internal order of many inanimate structures and of living things as well (see also Chaisson
2001). “Snowflakes, sand dunes and turbulent vortices all have in common that they are
strikingly patterned structures that emerge in many-particle systems” (Wolchover 2014).
3. “The underlying principle driving the whole process is dissipation-driven adaptation of
matter. . . . You start with a random clump of atoms, and if you shine light on it for long
enough, it should not be so surprising that you get a plant” (Wolchover 2014).
4. These notions have been around for millennia in the form of philosophical and religious
conceptions such as the “music of the spheres” in Western thought, as well as in ancient
Asian cosmology. Here, we add the contributions of modern physics and biology to these
earlier ideas.
5. Reentry is “a process of temporally ongoing parallel signaling between separate [neuronal]
maps along ordered anatomical connections” (Edelman 1989, 49). In other words, it is the
method by which the brain's receptors of stimuli from the world communicate and
coordinate with other neuronal activity elsewhere in brain structures. If neuronal activity
from separate receptors coincides temporally, those neural patterns are associated by
strengthening of their pathways. For example, if I hear a lullaby and simultaneously smell
baby powder, those stimuli can become associated in my brain.
6. For more on music used as a weapon, see Ross (2016); on music used in detention and torture, see Windsor, this volume, chapter 14. For more on the sound environment of war,
see Bull, volume 1, chapter 9.
7. There is only one recorded Arkestra performance of this tune, on Sun Ra (1990).
8. Personal communications, October 2000, with Michael Ray, trumpeter in the Sun Ra
Arkestra from 1977 on, and with Robert L. Campbell, coauthor of Campbell and Trent (2000).
9. Allen also said that Sun Ra would sometimes ask the band to go faster on ascending pas-
sages and slower on descending ones. “If you go up the stairs you use more energy than
going down.” Sometimes only a part of the band would be accelerating while the rest
stayed at a steady tempo.
10. Neubauer (2011) describes two contrasting mating strategies, suited to differing habitats.
“Opportunistic, or r-selected species, tend to have rapid rates of increase, small size, many
offspring, rapid development, and little parental care. They are able to colonize variable or
unpredictable habitats quickly but may also experience catastrophic mortality when con-
ditions change. Equilibrial, or K-selected, species have fewer young, with slower develop-
ment, and a lot of parental care. They often exist in more constant or predictable environments
where competition is keen and long-term survival skills, in terms of either behavioral ver-
satility or physical growth, are important” (14).
11. While this chapter was in the editing process, Damasio (2018) was published. It supports
our thesis that homeostasis plays a crucial role in the formation of cultural activity, includ-
ing music. Damasio states that “cultural activity began and remains deeply embedded in
feeling” (5), and “feelings are the mental expressions of homeostasis” (6). Damasio defines
homeostasis as “the mechanisms of life itself and . . . the conditions of its regulation” (6).
References
Campbell, R. L., and C. Trent. 2000. The Earthly Recordings of Sun Ra. 2nd ed. Redwood, NY:
Cadence Books.
Chaisson, E. J. 2001. Cosmic Evolution: The Rise of Complexity in Nature. Cambridge, MA:
Harvard University Press.
Clayton, M., R. Sager, and U. Will. 2004. In Time with the Music: The Concept of Entrainment
and Its Significance for Ethnomusicology. ESEM CounterPoint 1. http://web.stanford.edu/
group/brainwaves/2006/Will-InTimeWithTheMusic.pdf. Accessed January 19, 2016.
Cox, A. 2011. Embodying Music: Principles of the Mimetic Hypothesis. Music Theory Online
17 (2). http://www.mtosmt.org/issues/mto.11.17.2/mto.11.17.2.cox.html. Accessed May 7, 2017.
Damasio, A. 2018. The Strange Order of Things: Life, Feeling, and the Making of Cultures.
New York: Pantheon.
Dissanayake, E. 2008. If Music Is the Food of Love, What about Survival and Reproductive
Success? Musicae Scientiae 12 (1 suppl): 169–195.
Edelman, G. M. 1989. The Remembered Present: A Biological Theory of Consciousness. New
York: Basic Books.
England, J. 2013. Statistical Physics of Self-Replication. Journal of Chemical Physics 139 (121923).
doi: http://dx.doi.org/10.1063/1.4818538. Accessed May 7, 2017.
Fisher, L. 2009. The Perfect Swarm: The Science of Complexity in Everyday Life. New York:
Basic Books.
Fries, P. 2015. Rhythms for Cognition: Communication through Coherence. Neuron 88 (1):
220–235. doi:10.1016/j.neuron.2015.09.034.
Gjerdingen, R. 1988. A Classic Turn of Phrase: Music and the Psychology of Convention.
Philadelphia: University of Pennsylvania Press.
Gjerdingen, R. 2007. Music in the Galant Style. Oxford: Oxford University Press.
Grimshaw, M., and T. Garner. 2015. Sonic Virtuality: Sound as Emergent Perception. Oxford:
Oxford University Press.
Large, E. 2011. Musical Tonality, Neural Resonance and Hebbian Learning. In Mathematics
and Computation in Music, 115–125. New York: Springer.
Lestard, N. D., R. C. Valente, A. G. Lopes, and M. A. Capella. 2013. Direct Effects of Music in
Non-Auditory Cells in Culture. Noise Health 15: 307–314.
Llinás, R. 2002. I of the Vortex: From Neurons to Self. Cambridge, MA: MIT Press.
Longhi, E., N. Pickett, and D. J. Hargreaves. 2015. Wellbeing and Hospitalized Children: Can
Music Help? Psychology of Music 43 (2): 188–196.
Lovelock, J. 2000. Gaia: A New Look at Life on Earth. Oxford: Oxford University Press.
Margulis, E. H. 2013. On Repeat: How Music Plays the Mind. Oxford: Oxford University Press.
Margulis, E. H. 2014. One More Time. Aeon. https://aeon.co/essays/why-repetition-can-turn-
almost-anything-into-music. 07 March. Accessed May 7, 2017.
Neubauer, R. L. 2011. Evolution and the Emergent Self: The Rise of Complexity and Behavioral
Versatility in Nature. New York: Columbia University Press.
Perunov, N., R. Marsland, and J. England. 2014. Statistical Physics of Adaptation. http://arxiv.
org/pdf/1412.1875.pdf. Accessed May 7, 2017.
Ross, A. 2016. When Music Is Violence. New Yorker. July 4, 2016 Issue. http://www.newyorker.
com/magazine/2016/07/04/when-music-is-violence. Accessed May 7, 2017.
Saslaw, J. 1996. Forces, Containers, and Paths: The Role of Body-Derived Image Schemas in
the Conceptualization of Music. Journal of Music Theory 40 (2): 217–243.
Saslaw, J. 1997–1998. Life Forces: Conceptual Structures in Schenker’s “Free Composition” and
Schoenberg’s “The Musical Idea.” Theory and Practice 22–23: 17–34.
Schaap, P. 1989. An Interview with Sun Ra. WKCR 5:5 (January–February) 7.
Schafer, R. M. 1993. The Soundscape: Our Sonic Environment and the Tuning of the World.
Rochester, VT: Destiny Books.
Small, C. 1998. Musicking: The Meanings of Performing and Listening. Middletown, CT:
Wesleyan University Press.
Music Analysis
and Data Compression
David Meredith
Introduction
Most people are capable of imagining music, and composers can even imagine novel
music they have never heard before. This is known as musical imagery and can be distinguished from musical listening or music perception, where the music experienced results from physical sound energy being transmitted across the listener's peripheral auditory system and then transduced in the inner ear into nerve signals that are propagated to higher centers of the brain. In both music perception and musical imagery, what is experienced is actually an encoding of musical information, created by the person's brain.
Alternatively, one could adopt a less dualist stance and say that experiencing music is the
direct result of certain spatiotemporal patterns of neural firing that encode musical
information. In listening, this encoding is generated from information about sound
currently in the environment, combined with the person’s musical knowledge. In
imagery, the encoding is constructed only from the person’s musical knowledge.
Sound is thus just one particular medium for communicating musical information
and is not a prerequisite for musical experience. Indeed, trained musicians can experience
(i.e., “imagine”) music they have never previously heard while silently reading a musical
score. Musical imagery and perception therefore have a great deal in common—indeed,
there are some brain centers (especially in the right temporal lobe) that are necessary for
both (Halpern 2003).
Both the way that one perceives and understands music as well as the music that one
is capable of imagining are therefore largely determined by one’s musical knowledge that
is gained through passive exposure to music, active learning of musical skills, and/or study
of music theory and analysis. It has been proposed in psychology, information theory,
and computer science that knowledge acquisition—that is, learning—is essentially data
compression (Chater 1996; Vitányi and Li 2000): on being exposed to new data, a learning
Of course, it may be the case that there is no single way of understanding a piece or set
of pieces that allows for optimal performance on all such tasks. For example, the best way
of understanding a piece in order to be able to detect errors in a performance may not be
the best way of understanding that piece in order to determine whether some other, previously unheard, piece is by the same composer. There may also be several different ways
of understanding a given piece or set of pieces that are equally effective for carrying out a
given task. Nevertheless, it will often be the case that understanding a piece in certain
ways will allow one to carry out certain objectively evaluable tasks more effectively than
understanding the piece in certain other ways; to this extent, one can speak of some analyses as being “better than” others for carrying out specific, objectively evaluable tasks. The
goal of the work presented in this chapter is therefore that of finding those ways of understanding musical objects that allow us to most effectively carry out the musical tasks that we want to accomplish. The approach adopted is based on the hypothesis that the best possible explanations for the structure of a given musical object are those that account for as much of its structure as possible, in as much detail as possible, and in relation to as much other music as possible, while being as simple as possible.
Clearly, these goals often conflict: accounting for the structure of a piece in more
detail or in a way that relates the piece to all the music in some larger context can often
entail making one’s explanation (i.e., analysis) more complex.
This hypothesis, which forms the foundation for the work reported in this chapter, is
a form of the well-known principle of parsimony. This principle can be traced back to
antiquity2 and is known in common parlance as “Ockham’s razor,” after the medieval
English philosopher, William of Ockham (ca. 1287–1347), who made several statements
to the effect that, when presented with two or more possible explanations that account
for some set of observations, one should prefer the simplest of these explanations.
In more recent times, the parsimony principle has been formalized in various ways,
including Rissanen’s (1978) minimum description length (MDL) principle, Solomonoff ’s
(1964a, 1964b) theory of inductive inference, and Kolmogorov’s concept of a minimal
algorithmic sufficient statistic3 (Li and Vitányi 2008, 401ff; Vereshchagin and Vitányi
2004). The essential idea underpinning these concepts is that explanations for data (i.e.,
ways of understanding it) can be derived from it by compressing it—that is, by finding
parsimonious ways of describing the data by exploiting regularity in it and removing
redundancy from it. Indeed, Vitányi and Li (2000, 446) have shown that “data compression is almost always the best strategy” both for model selection and prediction.
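A widely used practical corollary of this line of work is the normalized compression distance, which approximates the similarity of two objects by how much better they compress together than separately. The sketch below uses a general-purpose compressor (zlib) on invented toy "pieces" encoded as byte strings; it illustrates the general strategy only and is not the method developed later in this chapter.

import os
import zlib

def compressed_len(x: bytes) -> int:
    """Compressed length in bytes: a rough, computable stand-in for the
    (uncomputable) Kolmogorov complexity of x."""
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: smaller values indicate that the two
    objects share more compressible structure."""
    cx, cy, cxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Invented toy "pieces": two closely related repetitive note lists and one
# patternless byte string of the same length.
piece_a = b"C4 E4 G4 C5 " * 50
piece_b = b"C4 E4 G4 B4 " * 50
piece_c = os.urandom(600)

print(round(ncd(piece_a, piece_b), 2))   # noticeably smaller: shared structure
print(round(ncd(piece_a, piece_c), 2))   # close to 1: little shared structure

With these toy inputs the first distance comes out well below the second; on real data one would of course compress full encodings of pieces rather than strings like these, but the principle, that compression exposes shared structure, is the same.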
The basic hypothesis that drives the research presented in this chapter is thus that the
more parsimoniously one can describe an object without losing information about it,
the better one explains the object being described, suggesting the possibility of automatically deriving explanatory descriptions of objects (in our case, musical objects) simply
by the lossless compression of “in extenso” descriptions of them. In the case of music,
such an in extenso description might, for example, be a list of the properties of the notes
in a piece (e.g., the pitch, onset, and duration of each note), such as can be found in a
MIDI file. Alternatively, it could be a list of sample values describing the audio signal of a
musical performance, such as can be found in a pulse-code modulation (PCM) audio
file. The defining characteristic of an in extenso description of an object is that it explicitly
specifies the properties of each atomic component of the object (e.g., a MIDI event in a
MIDI file or an audio sample in a PCM audio file), without grouping these atoms
together into larger constituents and without specifying any structural relationships
between components of the object.4 In contrast, an explanation for the structure of an
object, such as an analysis of a musical object, will group atomic components together
into larger constituents (e.g., notes grouped into phrases and chords or audio samples
grouped together into musical events), specify structural relationships between components (e.g., “theme B is an inversion of theme A”), and classify constituents into categories (e.g., “chords X and Y are tonic chords in root position,” “bars 1–4 and 16–19 are
occurrences of the same theme”). Throughout this chapter, I assume that an analysis is a
losslessly compressed encoding of an in extenso description of a musical object, even
though most musical analyses to date have typically been lossy, in that they only focus
on certain aspects of the structure of an object (e.g., harmony, voice-leading, thematic
structure, etc.). Such lossily compressed encodings of an object can also provide useful
ways of understanding it, but, because information in the original object is lost in such
encodings, they do not (at least individually) explain all of the detailed structure of the
object. In particular, such lossy encodings do not provide enough information for the
original object to be exactly reconstructed. Thus, if one is interested, for example, in
learning enough about a corpus of pieces in order to compose new pieces of the same
type, then such lossy analytical methods would not be sufficient.
In the remainder of this chapter, it is proposed that a musical analysis can fruitfully be
conceived of as being an algorithm (possibly implemented as a computer program) that,
when executed, outputs an in extenso description of the musical object being analyzed,
and thus serves as a hypothesis about the nature of the process that gave rise to that
musical object. Moreover, it is hypothesized that, if one has two algorithms or programs
that each generate the same musical object, then the shorter of these (i.e., the one that
can be encoded using fewer bits of information) will represent the better way of under
standing that object for any task that requires or benefits from musical understanding.
A model of music perception and learning will be sketched later on in this chapter,
that is based on the idea of accounting for the structure of a newly experienced piece of
music by minimally modifying a compressed encoding of previously encountered
pieces. Some recent work will then be reviewed in which these ideas have been put into
practice by devising compression algorithms that acquire musical knowledge that can
then be applied in automatically carrying out a variety of advanced musicological tasks.
Encodings, Decoders,
and Two-Part Codes
description of this program may be shorter than its output. A basic claim of this chapter
is that such a description (in the form of a program) becomes an explanation for the
structure of the object being described as soon as it is shorter than the in extenso
description of the object that it generates. In other words, a compressed encoding of an
in extenso description of an object can be considered a candidate explanation (not necessarily a “correct” one) for the structure of that object because it serves as a hypothesis
as to the nature of the process that gave rise to the object.
Moreover, it is hypothesized that the more parsimoniously one can describe an object
on some given level of detail, the better that description explains the structure of the
object on that level of detail. As discussed earlier, this is an application of Ockham’s razor
or the MDL principle (Rissanen 1978).
The following simple example serves to illustrate the foregoing ideas. Consider the
problem of describing the set of twelve points shown in Figure 8.1. One could do this by
explicitly giving the coordinates of all twelve points, thus:
P(p(0, 0), p(0, 1), p(1, 0), p(1, 1), p(2, 0), p(2, 1), p(2, 2), p(2, 3), p(3, 0), p(3, 1), p(3, 2), p(3, 3)).   (1)
In this encoding, a set of points, {p1, p2, . . . pn}, is denoted by P(p1, p2, . . . pn) and each
point within such a set is denoted by p(x,y), where x and y are the x- and y-coordinates
of the point respectively. The encoding in (1) can be thought of as being a program that
computes the set of points in Figure 8.1 simply by specifying each point individually.
Representing this set of points in this way requires one to write down twenty-four integer coordinate values. Moreover, the encoding does not represent any groupings of the
points into larger constituents, nor does it represent any structural relationships
between the points. In other words, this description is an in extenso description that
does not represent any of the structure in the point set and therefore cannot be said to
offer any explanation for it. One could go even further and say that expression (1) represents the data as though it were a random, meaningless arrangement of points with no
order or regularity.
Note that, in order to actually generate the set of twelve points, the description (1)
needs to be decoded. An algorithm that carries out this decoding is called a decoder.
In this case, such a decoder only needs to know about the meanings of the P(·) and
p(x,y) formalisms.
One can obtain a shorter encoding of the point set in Figure 8.1 by exploiting the fact that it consists of three copies, at different spatial positions, of the square configuration of points,

P(p(0, 0), p(0, 1), p(1, 0), p(1, 1)).   (2)

This allows the whole point set to be encoded as

T(P(p(0, 0), p(0, 1), p(1, 0), p(1, 1)), V(v(2, 0), v(2, 2))),   (3)
where T(P(p1, p2, . . . pn),V(v1, v2, . . . vm)) denotes the union of the point set, {p1, p2, . . . pn},
and the point sets that result by translating {p1, p2, . . . pn} by the vectors, {v1, v2, . . . vm},
where each vector is denoted by v(x,y), x and y being the x- and y-coordinates, respectively,
of the vector. Note that description (3) fully specifies the point set in Figure 8.1
using only twelve integer values—that is, half the number required to explicitly list the
coordinates of the points in the in extenso description in (1). Description (3) is thus a
losslessly compressed encoding of description (1). Description (3) thus qualifies as an
explanation for the structure of the point set in Figure 8.1, precisely because it represents
some of the structural regularity in this point set. If one perceives the point set in
Figure 8.1 in the way represented by description (3), then the twelve points are no longer
perceived to be arranged in a random, meaningless manner—they are now seen as
resulting from the occurrence of three identical squares. Moreover, it is precisely because
expression (3) captures this structure that it manages to convey all the information in (1)
while being only roughly half the length of (1).
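A decoder for both kinds of description can be sketched in a few lines. The function below, whose name and point representation are ours, interprets the compressed form as a base point set plus a list of translation vectors and expands it back into the in extenso point set.

def decode_T(points, vectors):
    """Decoder for the T(P(...), V(...)) formalism: the union of the base point
    set and its translations by each of the given vectors."""
    result = set(points)
    for vx, vy in vectors:
        result |= {(x + vx, y + vy) for (x, y) in points}
    return result

# Description (1): the in extenso encoding, with every point written out.
in_extenso = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (2, 3),
              (3, 0), (3, 1), (3, 2), (3, 3)}

# Description (3): four base points plus two translation vectors.
compressed = decode_T(points={(0, 0), (0, 1), (1, 0), (1, 1)},
                      vectors=[(2, 0), (2, 2)])

print(compressed == in_extenso)                              # True: the encoding is lossless
print(len(in_extenso) * 2, "integers versus", (4 + 2) * 2)   # 24 versus 12

The final line restates the counting argument made above: twenty-four integers for the in extenso description against twelve for the compressed one, ignoring for the moment the cost of the longer decoder, which is taken up next.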
On the other hand, in order to generate the actual point set in Figure 8.1 from the
expression in (3), the decoder now needs to be able to interpret not only the operators
P(·) and p(x,y), but also the operators T(·), V(·), and v(x,y). The decoder required to
decode description (3) is therefore itself longer and more complex to describe than the
decoder required to decode expression (1). The crucial question is therefore whether we
save enough on the length of the encoding to warrant the resulting increase in length of
the decoder. If the set of twelve points in Figure 8.1 were the only data that we ever had to
understand and the operators T(·), V(·), and v(x,y) were only of any use on this particular dataset, then the increase in the length of the decoder required to implement these
extra operators would probably exceed the decrease in the length of the encoding that
these operators make possible. Consequently, in this case, the parsimony principle
would not predict that description (3) represented a better way of understanding the
point set in Figure 8.1—the new encoding would just replace the specification of eight
random points in (1) with two random vectors in (3) and three randomly chosen new
operators to be encoded in the decoder. However, the concepts of a vector, a vector set,
and the operation of translation can be used to formulate compressed encodings of an
infinite and commonly occurring class of point sets—those containing subsets related by
translation. If we encode a sufficiently large sample of such point sets using translation-
invariance as a compression strategy, then the saving in the lengths of the resulting
encodings will more than offset the increase in the length of the decoder required to
make it capable of handling translation of point sets. This illustrates that interpreting
the point set in Figure 8.1 as being composed of three identical square configurations of
four points only makes sense if one is interpreting this point set in the broad context of
a large (in this case, infinite) class of point sets, of which the set of points in Figure 8.1 is
an example.
The foregoing example illustrates that what we are really interested in is not just the
length of an encoding but the sum of the length of the encoding and the length of
the decoder required to generate the in extenso description of the encoded object from the
encoding. We therefore think about descriptions of objects as being two-part codes in
which the first part (the decoder) represents all the structural regularity in the object
that it shares with all the members of a (typically large) set of other objects and the second part represents what is unique to the object and random relative to the decoder.5
This is why we would not, for example, be interested in a “decoder” that itself consists
solely of an in extenso description of the point set in Figure 8.1 and generates this point
set every time it is run with no input. In this case, the “encoding” of the data would be of
length zero but, because the decoder would be of length at least equal to that of the
uncompressed in extenso description of the point set, we would have no net
compression and, consequently, no explanation.
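To make this trade-off concrete, the following minimal Python sketch compares the total two-part description length, shared decoder plus per-object encodings, as the number of point sets to be explained grows. The symbol counts are purely hypothetical and are not measurements of descriptions (1) and (3):

    # One shared decoder plus one encoding per object to be explained.
    def total_cost(decoder_length, encoding_length_per_object, n_objects):
        return decoder_length + n_objects * encoding_length_per_object

    # Assumed, illustrative symbol counts: a simple decoder that understands only P(.)
    # and p(x, y), paired with 24-integer in extenso encodings, versus a richer decoder
    # that also implements T(.), V(.), and v(x, y), paired with 12-integer encodings.
    for n in (1, 10, 1000):
        print(n,
              total_cost(decoder_length=10, encoding_length_per_object=24, n_objects=n),
              total_cost(decoder_length=25, encoding_length_per_object=12, n_objects=n))

For a single small point set, the richer decoder costs more than it saves; amortized over many translationally structured point sets, it quickly pays for itself.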
If the best explanations are the shortest descriptions that account for as much data as
possible in as much detail as possible, then this suggests that the goal of music analysis
should be to find the shortest—but most detailed—description of as much music as
possible. To illustrate this, let us consider a close musical analogue of the point-set example
in Figure 8.1 discussed previously.
Figure 8.2 shows the beginning of J. S. Bach’s Prelude in C minor (BWV 871) from the
second book of Das Wohltemperierte Klavier (1742) and Figure 8.3 shows a point-set
representing this music, in which the horizontal dimension represents time in sixteenth
notes and the vertical dimension represents morphetic pitch, an integer that encodes the
pitch letter name (A–G) and octave of a note but not its alteration ( . . . ♭♭, ♭, ♮, ♯, 𝄪 . . . ),
so that, for example, D♭4, D♮4 and D♯4 all have the same morphetic pitch of 24
(Meredith 2006, 2007). The union of the three, 4-note patterns, A, B, and C, in Figure 8.3
could be described in an in extenso manner, on an analogy with description (1), as follows:
P(p(1, 27), p(2, 26), p(3, 27), p(4, 28), p(5, 26), p(6, 25), p(7, 26), p(8, 27),
p(9, 25), p(10, 24), p(11, 25), p(12, 26))   (4)
Figure 8.2 The opening notes from J. S. Bach’s Prelude in C minor (BWV 871) from the second
book of Das Wohltemperierte Klavier (1742). Patterns A, B, and C correspond, respectively, to the
patterns with the same labels in Figure 8.3 (from Meredith et al. 2002).
Figure 8.3 A point-set representation of the music in Figure 8.2. The horizontal dimension
represents time in sixteenth notes; the vertical dimension represents morphetic pitch (Meredith
2006, 2007). Patterns A, B, and C correspond, respectively, to the patterns with the same labels
in Figure 8.2. See text for further explanation (from Meredith et al. 2002).
This would require one to write down twenty-four integer coordinates. Alternatively,
on an analogy with description (3), one could exploit the fact that the set consists of
three occurrences of the same pattern at different (modal) transpositions, and describe
it more parsimoniously as follows:
T(P(p(1, 27), p(2, 26), p(3, 27), p(4, 28)), V(v(4, −1), v(8, −2))) (5)
This expression not only requires one to write down only half as many integers but also
encodes some of the analytically important structural regularity in the music—namely,
that the twelve points consist of three, 4-note patterns at different transpositions. Thus,
by seeking a compressed encoding of the data, we have succeeded in finding a
representation that gives us important information about the structural regularities in that data.
In the particular case of Figure 8.3, we can get an even more compact description by
recognizing that the vector mapping A onto B is the same as that mapping B onto C. This
means that one could represent the vector set V(v(4,−1),v(8,−2)) in description (5) as a
vector sequence consisting of two consecutive occurrences of the vector v(4,−1), where
the result of translating pattern A by the first vector in the sequence is itself translated by
the second vector in the sequence. For example, this could be encoded as V(2v(4,−1)),
where the emboldened V operator indicates that what follows is a sequence or ordered
set, not an unordered set; and where we denote k consecutive occurrences of a vector,
v(x,y), by kv(x,y). This would, of course, require a modification of the decoder so that it
could process both vector sequences and the shorthand notation for sequences
consisting of multiple occurrences of the same vector. As discussed earlier, whether or not
adding this functionality to the decoder would be worthwhile depends on whether the new
functionality allows for a sufficient reduction in encoding length over the whole class of
musical objects that we are interested in explaining. In this particular case, since the
device of musical sequence, exemplified by the excerpt in Figure 8.2, is commonly used
throughout Western music, it would almost certainly be a good strategy to allow for the
encoding of this type of structure in a compact manner. It is therefore not surprising
that most psychological coding languages that have been designed for representing
musical structure allow for multiple consecutive occurrences of the same interval or
vector to be encoded in such a compact form (Deutsch and Feroe 1981; Meredith 2012b;
Restle 1970; Simon and Sumner 1968, 1993).
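As a minimal illustration (a sketch only, not the decoder used by any of the systems cited above), the following Python fragment expands an encoding like (5) back into the in extenso point set of (4) by translating pattern A by each vector in the vector set:

    def translate(pattern, vector):
        # Translate every (time, morphetic pitch) point in the pattern by one vector.
        vx, vy = vector
        return [(x + vx, y + vy) for (x, y) in pattern]

    def decode(pattern, vectors):
        # Return the union of the pattern and its translations by each vector.
        points = set(pattern)
        for v in vectors:
            points.update(translate(pattern, v))
        return sorted(points)

    A = [(1, 27), (2, 26), (3, 27), (4, 28)]      # pattern A in expression (5)
    print(decode(A, [(4, -1), (8, -2)]))          # the twelve points of expression (4)

Encoding the vector set instead as the sequence V(2v(4, −1)) would amount to translating A by v(4, −1) and then translating the result by the same vector again, which yields the same twelve points.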
Music-Theoretical Concepts
That Promote Compact Encodings
of Musical Objects
There are a number of basic music-theoretical concepts and practices that help Western
musicians and composers to encode tonal music parsimoniously and reduce the
cognitive load required to process musical information.
One example of such a concept is that of a voice. The strategy of conceiving of music as
being organized into voices substantially reduces the amount of information about note
durations that has to be communicated and remembered by musicians. For the vast
majority of notes in a piece of polyphonic Western music, the duration is equal to the
within-voice, inter-onset interval—that is, most notes are held until the onset of the next
note in the same voice. This means that, for most notes, provided we know the voice to
which it belongs, we do not have to explicitly encode its duration—we only need to do so
if there is a rest between it and the next note in the same voice. Grouping notes together
into sequences that represent voices therefore considerably reduces the amount of
information about note durations that needs to be explicitly encoded, remembered, and
communicated.
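The saving can be sketched as follows. This is an illustration under simplifying assumptions (a single monophonic voice, no overlapping notes), not a representation scheme proposed in the chapter: a duration is written out only where a rest separates a note from its successor in the same voice, or where the voice ends.

    def encode_voice(notes):
        # notes: (onset, duration) pairs for one voice, in order of onset.
        encoded = []
        for i, (onset, duration) in enumerate(notes):
            held_to_next_onset = (i + 1 < len(notes)
                                  and notes[i + 1][0] == onset + duration)
            # None means "held until the next onset in this voice"; otherwise the
            # duration is stored explicitly (a rest follows, or the voice ends).
            encoded.append((onset, None if held_to_next_onset else duration))
        return encoded

    def decode_voice(encoded):
        return [(onset, encoded[i + 1][0] - onset if duration is None else duration)
                for i, (onset, duration) in enumerate(encoded)]

    voice = [(0, 1), (1, 1), (2, 2), (5, 1)]      # a rest falls between time 4 and time 5
    assert decode_voice(encode_voice(voice)) == voice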
The way in which pitch information is encoded in standard Western staff notation
also helps to make scores more parsimonious. Key signatures, for example, remove the
need to explicitly state the accidental for every note in a piece. Instead, accidentals only
have to be placed before notes whose pitches are outside the diatonic set indicated by the
key signature. Since most of the notes within a single piece of Western tonal music occur
within a small number of closely related diatonic sets (i.e., within a relatively limited
range on the line of fifths), accidentals are typically only necessary for a small
proportion of the notes in a score. Key signatures, therefore, provide a mechanism for
parsimoniously encoding information about pitch names in Western tonal music.
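A minimal sketch of this mechanism (ignoring, among other simplifications, the convention that an accidental persists to the end of the bar): the key signature supplies a default accidental for each letter name, and a sign is written only where a note departs from that default.

    def accidentals_to_write(notes, key_signature):
        # notes: (letter, accidental) pairs, with '' denoting natural;
        # key_signature: default accidental per letter name (absent letter = natural).
        written = []
        for letter, accidental in notes:
            default = key_signature.get(letter, "")
            written.append(None if accidental == default else accidental)
        return written

    d_major = {"F": "#", "C": "#"}                # two sharps
    melody = [("D", ""), ("F", "#"), ("G", ""), ("C", "#"), ("C", "")]
    print(accidentals_to_write(melody, d_major))
    # [None, None, None, None, '']: only the final C, which lies outside the
    # diatonic set, needs a written sign (here, a natural).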
Also, typically, Western music based on the major–minor system (or the diatonic
modes) is organized into consecutive temporal segments in which each note is
understood to have one of seven different basic tonal functions within the key in operation at
the point where the note occurs. For example, in the major–minor system, these basic
tonal functions would be {tonic, supertonic, mediant . . . leading note} and each could be
modified or qualified by being considered flattened or sharpened relative to a diatonic
major or minor scale. Staff notation capitalizes on this by providing only seven different
vertical positions at which notes can be positioned within each octave, rather than the
twelve different positions that would be necessary if the pitch of each note were
represented chromatically rather than in terms of its role within a seven-note scale. Again,
this strategy allows for pitch information to be encoded more parsimoniously, leading
to a reduction in the cognitive load on a musician reading the score.
This pitch-naming strategy leads to more parsimonious encodings by assigning simpler
(shorter) encodings to pitches that are more likely to occur in the music. Time signatures
similarly define a hierarchy of “probability” over the whole range of possible temporal
positions at which a note may start within a measure. Specifically, notes are more likely to
start on stronger beats.6 In Western classical and popular music, this results in only very
few possible positions within a bar being probable positions for the start (or end) of a note
and the notation is designed to make it easier to notate and read notes that start at more
probable positions (i.e., on stronger beats). In data compression, variable-length codes,
such as the Huffman code (Huffman 1952; Cormen et al. 2009, 431–435) or Shannon–Fano
code (Shannon 1948a, 1948b; Fano 1949), work in a closely analogous way by assigning
shorter codes (i.e., simpler encodings) to more probable symbols or symbol strings.
Huffman coding, in particular, assigns more frequent symbols to nodes closer to the root
in a binary tree, which is closely analogous to tree-based representations of musical meter
that assign stronger beats to higher levels in a tree structure (Lerdahl and Jackendoff 1983;
Temperley 2001, 2004, 2007; Martin 1972; Meredith 1996, 214–219).
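For readers unfamiliar with the construction, the standard Huffman procedure can be sketched in a few lines of Python; this is a generic textbook implementation, not code associated with the chapter, and the frequency table of metrical positions is hypothetical:

    import heapq
    from itertools import count

    def huffman_codes(frequencies):
        # frequencies: symbol -> count. Returns symbol -> bitstring, with shorter
        # codewords assigned to more frequent symbols.
        tiebreak = count()                        # keeps heap entries comparable
        heap = [(freq, next(tiebreak), {sym: ""}) for sym, freq in frequencies.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, codes1 = heapq.heappop(heap)
            f2, _, codes2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in codes1.items()}
            merged.update({s: "1" + c for s, c in codes2.items()})
            heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
        return heap[0][2]

    # Hypothetical counts of note onsets at metrical positions within a bar:
    print(huffman_codes({"downbeat": 40, "beat 3": 25, "beat 2": 15,
                         "offbeat eighth": 12, "offbeat sixteenth": 8}))

The frequent positions end up near the root of the code tree and receive codewords of one or two bits; the rare positions receive longer ones.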
It thus seems that several features of Western staff notation and certain
music-theoretical concepts have evolved in order to allow for Western tonal music to be encoded
more parsimoniously.
Kolmogorov Complexity
The work presented in this chapter is based on the central thesis that explanation is
compression. The more compressible an object is, the less random it is, the simpler it is
and the more explicable it is. This basic thesis was formalized by information theorists
during the 1960s and encapsulated in the concept of Kolmogorov complexity. The
Kolmogorov complexity of an object is a measure of the amount of intrinsic information
in the object (Chaitin 1966; Kolmogorov 1965; Solomonoff 1964a, 1964b; Li and
Vitányi 2008). It differs from the Shannon information content of an object, which is the
amount of information that has to be transmitted in order to uniquely specify the object
within some predefined set of possible objects. The Kolmogorov complexity of an object
is the length in bits of the shortest possible effective (i.e., computable) description of an
object, where an effective description can be thought of as being a computer program
that takes no input and computes the object as its only output. In other words, the
Kolmogorov complexity of an object is a measure of the complexity of the simplest
process that can give rise to the object. The more structural regularity there is in an object,
the shorter its shortest possible description and the lower its Kolmogorov complexity.
Unfortunately, it is not generally possible to determine the Kolmogorov complexity of
an object, as it is usually impossible to prove that any given description of the object is
the shortest possible. Nevertheless, the theory of Kolmogorov complexity supports the
notion of using the length of a description as a measure of its complexity and it supports
the idea that the shorter the description of a given object, the more structural regularity
that description captures. The theory has also been used to show formally that data
compression is almost always the best strategy for both model selection and prediction
(Vitányi and Li 2000). For some further comments on the relationship between music
analysis and Kolmogorov complexity, see Meredith (2012a).
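Although the Kolmogorov complexity of an object cannot be computed, any general-purpose compressor gives a crude upper bound on its description length and hence a rough index of its regularity. A minimal Python sketch (zlib is used simply because it is in the standard library, not because it is one of the compressors discussed in this chapter):

    import os
    import zlib

    def compressed_length(data: bytes) -> int:
        # Length of a zlib-compressed description: an upper bound on the
        # (uncomputable) shortest description length of the data.
        return len(zlib.compress(data, level=9))

    regular = b"ABCD" * 250          # a highly regular 1,000-byte string
    random_ = os.urandom(1000)       # 1,000 (almost certainly incompressible) random bytes

    print(compressed_length(regular), compressed_length(random_))
    # The regular string shrinks to a few dozen bytes; the random one does not,
    # mirroring the link between regularity, compressibility, and explicability.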
As stated at the outset, the work presented here is based on the assumption that the goal
of music analysis is to find the best possible explanations for musical works. This could
be recast in the language of psychology by saying that music analysis aims to find the
most successful perceptual organizations that are consistent with a given musical surface
(Lerdahl and Jackendoff 1983).
Most theories of perceptual organization have been founded on one of two principles:
the likelihood principle (Helmholtz 1867), which proposes that the perceptual system
prefers organizations that are the most probable in the world; and the simplicity principle
(Koffka 1935), which states that the perceptual system prefers the simplest perceptual
organizations.
For many years, psychologists considered the simplicity and likelihood principles to
be in conflict until Chater (1996), drawing on the theory of Kolmogorov complexity,
pointed out that the two principles are mathematically equivalent. However, Vitányi
and Li (2000) showed that, strictly speaking, the predictions of the likelihood principle
(which corresponds to Bayesian inference) and the simplicity principle (which
corresponds to what they call the “ideal MDL principle”) are only expected to converge for
individually random objects in computable distributions (Vitányi and Li 2000, 446).
They state, “if the contemplated objects are nonrandom or the distributions are not
computable then MDL [i.e., the simplicity principle] and Bayes’s rule [i.e., the likelihood
principle] may part company.”
Musical objects are typically highly regular and not at all random, at least in the sense
that randomness is defined within algorithmic information theory (Li and Vitányi 2008,
49ff.). Vitányi and Li’s conclusions therefore seem to cast doubt on whether approaches
based on the likelihood principle, commonly applied in Bayesian and probabilistic
approaches to musical analysis such as those proposed by Meyer (1956), Huron (2006),
Pearce and Wiggins (2012), and Temperley (2007), can ever successfully be used to
discover certain types of structural regularity in musical objects such as thematic
transformations or parsimonious generative definitions of scales or chords.
The approach presented in this chapter is therefore more closely aligned with
models of perceptual organization based on the simplicity principle—in particular,
theories of perceptual organization in the tradition of Gestalt psychology (Koffka 1935)
that take the form of coding languages designed to represent the structures of patterns
in particular domains. Theories of this type predict that sensory input is more likely to
be perceived to be organized in ways that correspond to shorter descriptions in a
particular coding language. Coding theories of this type have been proposed for serial
patterns (Simon 1972), visual patterns (Leeuwenberg 1971), and, indeed, musical patterns
(Deutsch and Feroe 1981; Meredith 2012b; Povel and Essens 1985; Restle 1970; Simon and
Sumner 1968, 1993).
A Sketch of a Compression-Based
Model of Musical Learning
Let us define a musical object to be any quantity of music, ranging from a single note
through to a complete work or even a collection of works. A musical object is typically
interpreted by a listener or an analyst in the context of some larger object that contains it
Figure 8.4 A Venn diagram illustrating various contexts in which a musical object might be
interpreted. A phrase (P) could be interpreted within the context of a section (S), which could
be interpreted within the context of a work (W), and so on. C = works by the same composer;
F = works in the same form or genre; I = works for the same instrumentation; T = tonal music;
M = all music.
(see Figure 8.4). In essence, the model of musical learning presented here is as follows.7
The analyst or listener explicitly or implicitly tries to find the shortest program that
computes the in extenso descriptions of a set of musical objects containing the object to be explained, along with other related objects that form a context within which it is interpreted (see Figure 8.5).
Figure 8.5 The analyst’s or listener’s understanding of a musical object (the dark gray circle—in
red on the companion website) is modeled as a program, P, that computes a set of musical objects
containing the one to be explained along with other related objects (the light gray circles—in yellow
on the companion website) forming a context within which the explanandum is interpreted.
Figure 8.6 When the listener hears a new piece (the dark gray circle—in red on the companion
website), the existing explanation (i.e., “program”) (P) for all the music previously heard is
minimally modified to produce a new program (P’) to account for the new piece in addition to
all previously encountered music. This might be achieved by discovering the simplest way of
interpreting as much of the material in the new piece as possible in terms of what is already known.
When the listener then hears a new piece, P is minimally modified to produce a new program, P′ (see Figure 8.6), which may generate the previously heard pieces in a way that differs from that in which P generates
these pieces, reflecting the fact that hearing a new piece may change the way that one
interprets pieces that one has heard before.
One can speculate that P’ is produced in a two-stage process. In the first stage, an
attempt is made to interpret as much of the new, unfamiliar piece as possible by reusing
elements and transformations that have previously been used to encode (i.e.,
understand) music. This will typically lead to a compact encoding of the new piece if it
contains material that is related to that in previously encountered music. However, after this
first stage, the global interpretation of all pieces known to the listener/analyst (including
the most recently interpreted piece) may no longer be as close to optimal as it could be.
In a second stage, therefore, the brain of the listener or analyst might carry out a more
computationally expensive “knowledge consolidation” process in which an attempt is
made to find a globally more efficient encoding of all music known to the individual.
This might, for example, occur during sleep (see Tononi and Cirelli 2014) and might
consist of a randomized process of seeking alternative encodings of individual pieces
that help to produce a more efficient global interpretation of the music known to the
individual.
On this view, music analysis, perception, and learning essentially reduce to the process
of compressing musical objects. This is, of course, an idealized model: for example,
in practice, a listener will not have internalized a model that can account in detail for all
the music they have previously heard. In other words, in reality, this learning process
would probably be based on rather lossy compression.
However, it is important to stress that, even though both the analyst and the listener
aim to find the shortest possible encodings of the music they encounter, they both
usually fail to achieve this. As Chater (1996) points out, “the perceptual system cannot, in
general, maximize simplicity (or likelihood) over all perceptual organizations. . . . It is,
nonetheless, entirely possible that the perceptual system chooses the simplest (or most
probable) organization that it is able to construct” (578). This is largely a result of the
limited processing and memory resources available to the perceptual system. For
example, we typically describe the structure of a piece of music in terms of motives, themes,
and sections, all of which are temporally compact segments, meaning that they are
patterns that contain all the events that occur within a particular time span. It could well
be that, for some pieces, a more parsimonious description (corresponding to a better
explanation) might be possible in terms of patterns containing notes and events that
are dispersed widely throughout the piece. However, listeners would normally fail to
discover such patterns because their limited memories and attention spans constrain
them to focus on patterns that are temporally compact (see also Collins et al. 2011).
The model just sketched can be applied to understanding the emergence of differences
between the ways that individuals understand the same piece. The model proposed in
the previous section consists essentially of a greedy algorithm8 that is used to construct
an interpretation for a newly encountered piece that minimally modifies an existing
“program” that generates descriptions of all the pieces in a particular context set. It was
proposed that this greedy approach might be supplemented by a computationally more
expensive process of consolidation that attempts to find a globally more efficient
encoding. Nevertheless, because such a consolidation process will not generally be
capable of consistently discovering a globally optimal encoding, the way that an individual
understands a given piece will generally depend not only on which pieces they already
know, but also on the order in which these pieces were encountered. This implication
could fairly straightforwardly be tested empirically.
A rather crude version of the foregoing model has been implemented in an algorithm
called SIATECLearn. The SIATECLearn algorithm is based on the geometric pattern
discovery algorithm, SIATEC, proposed by Meredith and colleagues (2002). The SIATEC
algorithm takes as input a set of points called a dataset and automatically discovers all the
translationally related occurrences of maximal repeated patterns in the dataset. If the
dataset represents a piece of music, with each point representing a note in pitch-time
space, then two patterns in this space related by translation correspond to two
statements of the same musical pattern, possibly with transposition. We say a pattern P is
translatable within a dataset D if there exists a vector, v, such that P translated by v gives a
pattern that is also in D. A translatable pattern is maximal for a given vector, v, in a
dataset D, if it contains all the points in the dataset that can be mapped by translation by v
onto other points in the dataset. The maximal translatable pattern (MTP) for a vector v
in a dataset D, which we can denote by MTP(v, D), can also be thought of as being the
intersection of the dataset D and the dataset D translated by –v. That is,
MTP(v, D) = D ∩ (D − v).   (6)
For each (nonempty) MTP, P, in a dataset, SIATEC finds all the occurrences of P, and
outputs this occurrence set of P. Such an occurrence set is called the translational
equivalence class (TEC) of P in D, denoted by TEC(P, D), because it contains all the patterns in
the dataset that are translationally equivalent to P. That is,
TEC(P, D) = {Q ⊆ D | Q = P + v for some vector v}.   (7)
SIATEC therefore takes a dataset as input and outputs a collection of TECs, such that
each TEC contains all the occurrences of a particular maximal translatable pattern.
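The following brute-force Python sketch illustrates the maximal translatable pattern of equation (6) on the twelve points of expression (4). It is an illustration only, and is far less efficient than the published SIATEC algorithm:

    def mtp(v, dataset):
        # MTP(v, D) = D ∩ (D − v): the points of D that are mapped onto other
        # points of D when translated by v.
        d = set(dataset)
        shifted = {(x - v[0], y - v[1]) for (x, y) in d}      # D − v
        return sorted(d & shifted)

    def all_mtps(dataset):
        # One MTP for every difference vector between an ordered pair of points.
        d = sorted(set(dataset))
        vectors = {(q[0] - p[0], q[1] - p[1]) for p in d for q in d if q > p}
        return {v: mtp(v, d) for v in vectors}

    D = [(1, 27), (2, 26), (3, 27), (4, 28), (5, 26), (6, 25), (7, 26), (8, 27),
         (9, 25), (10, 24), (11, 25), (12, 26)]               # the points of expression (4)
    print(mtp((4, -1), D))
    # The eight points that map onto other dataset points under translation by
    # v(4, −1); note that this MTP is larger than the four-point pattern A itself.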
An algorithm called SIATECCompress (Meredith 2013b, 2015, 2016) runs SIATEC on
a dataset, then sorts the found TECs into decreasing order of “quality.” Given two TECs,
the one that results in the better compression (in the sense of expressions (4) and (5),
discussed earlier) is deemed superior. If both TECs give the same degree of
compression, then the one whose pattern is spatially more compact is considered superior.
SIATECCompress then scans this list of occurrence sets and computes an encoding of
the input dataset in the form of a set of TECs that, taken together, account for or cover
the entire input dataset.
SIATECLearn runs SIATECCompress, but also stores the patterns it finds on each
run and will preferably reuse these patterns rather than newly found ones on
subsequent runs of the algorithm. Thus, when SIATECLearn is run on the twelve-point
pattern on the left in Figure 8.7, it “interprets” the dataset as being constructed from three
occurrences of the square pattern shown. This square pattern is therefore stored in its
Figure 8.7 Output of SIATECLearn when presented first with the dataset on the left and then
with the dataset on the right.
Figure 8.8 Output of SIATECLearn when presented first with the dataset on the left and then
with the dataset on the right.
“long-term” memory. When the algorithm is subsequently run on the ten-point dataset
on the right, it prefers to use the stored square pattern rather than any of the patterns
that it finds in this newly encountered dataset; it interprets the new dataset as containing
two occurrences of the square pattern along with two extra points.
Conversely, when SIATECLearn is first presented with the ten-point dataset, it
interprets the dataset as being composed from five occurrences of the two-point vertical line
configuration shown on the left in Figure 8.8. This pattern is then stored in long-term
memory, so that, when the algorithm is subsequently presented with the twelve-point
dataset, it interprets this set as consisting of six occurrences of this vertical line rather
than three occurrences of the square pattern. This very simple example illustrates how
the way in which objects are interpreted can depend on the order in which they are
presented.
Given the concept of a TEC, as defined in (7) earlier, we can define the covered set,
CS(T), of a TEC T to be the union of all the patterns in T. That is,
CS(T) = ⋃_{P ∈ T} P.   (8)
COSIATEC (Meredith et al. 2003; Meredith 2013b, 2015, 2016) is a greedy compression
algorithm based on SIATEC. The algorithm takes a dataset as input and computes a set
of TECs that collectively cover this dataset in such a way that none of the TECs’ covered
sets intersect. It also attempts to choose this set of TECs so that it minimizes the length
of the output encoding. The basic idea behind the algorithm is sketched in the
pseudocode in Figure 8.9.
As shown in Figure 8.9, the COSIATEC algorithm first finds the “best” TEC in the
output of SIATEC for the input dataset, S. The best TEC is the one that produces the best
compression. This means that it is the one that has the best compression factor, which is
the ratio of the number of points in its covered set (as defined in (8)) to the sum of the
number of points in one occurrence of the TEC’s pattern and the number of occurrences
minus 1. The reasoning behind this is that a TEC can be compactly encoded as an
ordered pair, (P,V), where P is one occurrence in the TEC and V is the set of vectors that
map P onto all the other occurrences of P in the dataset. The number of vectors in V is
therefore equal to the number of occurrences of P minus 1. The length of an in extenso
encoding of a TEC’s covered set in terms of points is simply |CS(T)| as defined in (8).
Each vector in V has approximately the same information content as a point in P, so the
length of an ordered pair encoding of a TEC, (P,V), in terms of points is approximately
|P|+|V|. The compression factor is the ratio of the length of the in extenso encoding to
the length of the compressed encoding. Thus, the compression factor of a TEC, T = (P,V),
denoted CF(T), can be defined as
CF(T) = |CS(T)| / (|P| + |V|).
Figure 8.9 Pseudocode of the COSIATEC algorithm.
COSIATEC(S)
    while S is not empty
        Find the best TEC, T, using SIATEC
        Add T to the encoding, E
        Remove the points covered by T from S
    return the encoding E
If two TECs have the same compression factor, then COSIATEC chooses the TEC in
which the first occurrence of the pattern is the more compact: the compactness of a
pattern is the ratio of the number of points in the pattern to the number of dataset points in
the bounding box of the pattern. The rationale behind this heuristic is that patterns are
more likely to be noticeable if the region of pitch-time space that they span does not also
contain many “distractor” points that are not in the pattern. These heuristics for
evaluating the quality of a TEC are discussed in more detail by Meredith and colleagues (2002),
Meredith (2015), and Collins and coauthors (2011).
As shown in Figure 8.9, once the best TEC, T, has been found for the input dataset, S,
this TEC is added to the encoding (E) and the covered set of T, CS(T), is removed from S.
Once the covered set of T has been removed from S, the process is repeated, with
SIATEC being run on the new S. The procedure is repeated until S is empty, at which
point E contains a set of TECs that collectively cover the entire input dataset. Moreover,
because the TEC that gives the best compression factor is selected on each iteration, E is
typically a compact or compressed encoding of S. COSIATEC typically produces
encodings that are more compact than those produced by SIATECCompress.
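The greedy loop of Figure 8.9 can be sketched in Python as follows. This is a simplified illustration, not the published implementation: the function siatec(points) is assumed to return TECs as (pattern, vectors) pairs, with the vectors mapping the pattern onto its other occurrences, and the compactness tie-break described above is omitted.

    def covered_set(pattern, vectors):
        # CS(T): the union of the pattern and all of its translated occurrences.
        points = set(pattern)
        for (vx, vy) in vectors:
            points.update((x + vx, y + vy) for (x, y) in pattern)
        return points

    def compression_factor(pattern, vectors):
        # CF(T) = |CS(T)| / (|P| + |V|).
        return len(covered_set(pattern, vectors)) / (len(pattern) + len(vectors))

    def cosiatec(points, siatec):
        remaining, encoding = set(points), []
        while remaining:
            # Greedily choose the TEC with the best compression factor for the
            # points that are not yet covered.
            pattern, vectors = max(siatec(remaining),
                                   key=lambda tec: compression_factor(*tec))
            encoding.append((pattern, vectors))
            remaining -= covered_set(pattern, vectors)
        return encoding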
Figure 8.10 shows the output of COSIATEC for a short Dutch folk song. The complete
piece can be encoded as the union of the covered sets of five TECs. In Figure 8.10, each
TEC is drawn in a different shade. The first TEC, drawn in red, consists of the
occurrences of a three-note, lower-neighbor-note figure. This TEC has the best compression
factor of any TEC for a maximal translatable pattern in this dataset. After these
three-note patterns have been removed from the piece, the next best TEC is the one drawn in
light green in Figure 8.10, namely the two occurrences of the four-note, rising scale
segment. The fifth TEC consists of the fourteen occurrences of a single unconnected point
in Figure 8.10. These are the points (notes) that are left over after removing the sets of
repeated patterns that give the best compression factor. This final set of “residual” points,
which cannot be compressed by the algorithm, is essentially seen by the algorithm as
being random “noise” that it cannot “explain.”
Figure 8.11 shows the analysis generated by COSIATEC for a more complex piece of
music, the Prelude in C minor (BWV 871) from book 2 of J. S. Bach’s Das Wohltemperierte
Klavier. Note that the first TEC (in red) generated by COSIATEC (i.e., the one that
results in the most compression over the whole dataset) is precisely the four-note
pattern shown in Figure 8.2, discussed earlier.
[Figure 8.10 plot: morphetic pitch (vertical axis) against time in tatums (horizontal axis), file NLB015569_01.mid.]
Figure 8.10 The set of TECs computed by COSIATEC for a short Dutch folk song, “Daar zou
er en maagdje vroeg opstaan” (file number NLB015569 from the Nederlandse Liederen Bank,
http://www.liederenbank.nl). Courtesy of Peter van Kranenburg.
[Figure 8.11 plot: morphetic pitch (vertical axis) against time in tatums (horizontal axis).]
Figure 8.11 Analysis generated by COSIATEC of J. S. Bach’s Prelude in C minor (BWV 871)
from the second book of Das Wohltemperierte Klavier (1742). Each set of pattern occurrences
(i.e., TEC) is displayed in a distinct shade of gray (see image on companion website which uses
colors). The first TEC generated, consisting of occurrences of the opening “V”-shaped motive
(indicated with triangles here and red on the companion website), is the one that has the highest
compression factor over the whole dataset. The overall compression factor of this analysis is 2.3,
and the residual point set, containing notes that the algorithm does not re-express in a compact
form, contains 3.61 percent of the notes in the piece (corresponding to 25 out of 692 notes).
In the introduction to this chapter, it was proposed that, when given two or more
different analyses of the same piece of music (or, more generally, musical object), it
may be possible to determine which of the analyses is the best for carrying out certain
objectively evaluable tasks. It is similarly possible to evaluate algorithms that compute
analyses by comparing how well the generated analyses allow certain tasks to be
performed.
In a recent paper (Meredith 2015), the point-set compression algorithms, COSIATEC
and SIATECCompress, were compared on a number of different tasks with a third
greedy compression algorithm proposed by Forth and Wiggins (2009) and Forth (2012).
The algorithms were evaluated on three tasks: folk song classification, discovery of
repeated themes and sections, and discovery of fugal subject and countersubject entries.
Although no obvious correlation was found between compression factor and
performance on these tasks, COSIATEC achieved both the best compression factor (around
1.6) and the best classification success rate (84%) on the folk-song classification task. The
pattern-discovery task on which the algorithms compared in this study were evaluated
consisted of finding the repeated themes and sections identified in the JKU Patterns
Development Database, a collection of five pieces of classical and baroque music, each
accompanied by “ground-truth” analyses by expert musicologists (Collins 2013). The
output of each algorithm was compared with these analyses. I have argued (Meredith
2015, 263–265) that these “ground-truth” analyses are not satisfactory for at least two
reasons: first, the musicologists on whose work the ground-truth analyses are based did
not consistently identify all occurrences of the patterns that they considered to be worth
mentioning; and second, there are patterns that are noticeable and important that the
Figure 8.12 Examples of noticeable and/or important patterns in Bach’s Fugue in A minor
(BWV 889) that were discovered by the algorithms tested by Meredith (2015) but were not
recorded in the “ground-truth” analyses in the JKU Patterns Development Database used for
evaluation. Patterns (a), (b), and (d) were discovered by COSIATEC. Patterns (c) and (d) were
discovered by SIATECCompress.
musicologists who created the ground-truth analyses failed to mention. Indeed, the
tested algorithms discovered not only structurally salient patterns that the analysts
omitted to mention but also exact occurrences of the ground-truth patterns that are not
recorded in the ground-truth analyses. Figure 8.12 shows some examples of structurally
important patterns in a fugue by J. S. Bach that were not recorded in the “ground-truth”
analyses used for evaluation.
Notwithstanding the foregoing methodological issues with this task, it was found that
SIATECCompress performed best on average, achieving an average F1 score of about
50 percent over the five pieces in the corpus. However, COSIATEC achieved F1 scores of
71 percent and 60 percent on the pieces by Beethoven and Mozart, respectively; and
Forth’s algorithm performed substantially better than the other algorithms on a fugue
by Bach. There was therefore no algorithm that consistently performed best on this task.
On the fugal analysis task, the algorithms performed rather less well than on the
other evaluation tasks. COSIATEC and SIATECCompress achieved a mean recall of
around 60 percent over the twenty-four fugues in the first book of J. S. Bach’s Das
Wohltemperierte Klavier. However, COSIATEC’s precision on this task was much
lower (around 10%). Overall, the best-performing algorithm was SIATECCompress,
which achieved an F1 score of around 30 percent on this fugal analysis task.
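For reference, the precision, recall, and F1 scores quoted in this section can be computed as follows in the simplest case of exact matches between discovered and ground-truth patterns; the published evaluations use more elaborate variants of these measures:

    def precision_recall_f1(discovered, ground_truth):
        # discovered, ground_truth: collections of patterns, each pattern hashable
        # (for example, a frozenset of (time, pitch) points); exact matching only.
        discovered, ground_truth = set(discovered), set(ground_truth)
        true_positives = len(discovered & ground_truth)
        precision = true_positives / len(discovered) if discovered else 0.0
        recall = true_positives / len(ground_truth) if ground_truth else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1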
In the study just discussed, the performance of the SIA-based compression
algorithms on the folk-song classification task was compared with that of the
general-purpose text compression algorithm, bzip2 (Seward 2010). On this task, bzip2 achieved a
much higher average compression factor (3.5) but a much lower classification success
rate (12.5%) than the SIA-based algorithms. At first sight, this might be interpreted as
evidence against the basic hypothesis that shorter descriptions correspond to better
explanations. In a later study, Corentin Louboutin and I therefore explored in more
Applying a Compression-Driven
Approach to the Analysis of
Musical Audio
The main concern in this chapter has been with explaining “musical objects” by
discovering losslessly compressed descriptions of these objects. The basic scheme is that one
takes an in extenso encoding of such an object and then attempts to find a short
algorithm that generates that in extenso encoding as its only output. The encoding could be
on any level of granularity and could represent any quantity of music in any possible
domain in which a musical object might be manifested—for example, an image of a
score, a symbolic encoding of a score, an audio recording or a video recording. In the
examples and evaluations presented above, the focus has been on musical objects that
are symbolic encodings of scores. In such cases, one can realistically hope to be able to
produce losslessly compressed descriptions in which we are required to consider only a
very small proportion of the information in the object to be “random” or “noise.” On the
other hand, if one were concerned with explaining the structure of a digital audio
recording of a performance of a piece produced by human performers playing from a
score, then one would expect the compression factors achievable to be lower and one
would expect to have to be satisfied with considering a larger proportion of the
information in the object as being “noise.” This is because the detailed structure of such a
recording depends not only on the score from which the players are performing, but also on
many other factors that are perhaps harder to model, such as the acoustics of the space in
which the recording was made, the precise nature of the instruments used and, most
importantly, the players themselves and their own particular ways of interpreting the score.
Summary
In this chapter, I have proposed that the goal of music analysis should be to find the
“best” ways of understanding musical objects and that two different analyses of the same
musical object can be compared objectively by determining whether one of them allows
us to more effectively perform some specific set of tasks. I have also explored the
hypothesis that, for all tasks that require an understanding of how a musical object is
constructed, the best ways of understanding that object are those that are represented by the
shortest possible descriptions of the object. I have briefly outlined how this hypothesis
relates to the theory of Kolmogorov complexity and to coding theory models of
perception. I have also briefly sketched how these ideas can form the basis of a theory of
musical learning that can potentially explain aspects of music cognition such as individual
differences. Finally, I briefly described the COSIATEC point-set compression algorithm
and reviewed the results of some experiments in which it and other related algorithms
have been used to automatically carry out musical tasks such as folk-song classification
and thematic analysis. The results achieved in these experiments generally support the
idea that the knowledge necessary to be able to successfully carry out advanced
musicological tasks can largely be acquired simply by compressing in extenso representations
of musical objects. Moreover, some of the results clearly indicate a correlation between
compression factor and success on musicological tasks. However, these experiments
also show that performance on such tasks depends heavily both on the specific types of
redundancy exploited by the compression algorithm used to generate the compressed
encodings and on the precise form of the in extenso representations used as input to
these compression-based learning methods.
Acknowledgments
The work reported in this chapter was carried out as part of the EU collaborative project,
“Learning to Create” (Lrn2Cre8). The project Lrn2Cre8 acknowledges the financial support of
the Future and Emerging Technologies (FET) programme within the Seventh Framework
Programme for Research of the European Commission, under FET grant number 610859.
Notes
1. https://www.midi.org/specifications
2. See, for example, chapter 25 of Book 1 of Aristotle’s Posterior Analytics (Bouchier 1901, 66).
3. Kolmogorov introduced the field of nonprobabilistic statistics at a conference in Tallinn,
Estonia, in 1973 and in a talk at the Moscow Mathematical Society in 1974 (Li and
Vitányi 2008, 405). Unfortunately, these talks were never published in written form.
4. See Simon and Sumner (1968, 1993) for a similar use of the term “in extenso” in the context
of music representations.
5. For a more technical discussion of two-part codes, see Vitányi and Li (2000, 447).
6. See Temperley (2007, chap. 3) for a model of rhythm and meter perception based on the idea
that simpler meters are more probable and events are more likely to occur on stronger beats.
7. This model was originally described by Meredith (2012c, 2013a).
8. A greedy algorithm attempts to solve an optimization problem by always choosing the locally
best option at each decision point in the construction of a solution. This does not always
produce a globally optimal solution, but for some problems it does (e.g., activity selection,
the construction of a Huffman code). For more details, see Cormen and colleagues
(2009, 414–450).
References
Bouchier, E. S. 1901. Aristotle’s Posterior Analytics. Oxford: Oxford University Press.
Burrows, M., and D. J. Wheeler. 1994. A Block-Sorting Lossless Data Compression Algorithm.
Palo Alto, CA: Digital Systems Research Center (now HP Labs). Technical Report SRC 124.
Chaitin, G. J. 1966. On the Length of Programs for Computing Finite Binary Sequences.
Journal of the Association for Computing Machinery 13 (4): 547–569.
Chater, N. 1996. Reconciling Simplicity and Likelihood Principles in Perceptual Organization.
Psychological Review 103 (3): 566–581.
Collins, T. 2013. JKU Patterns Development Database. http://tomcollinsresearch.net/research/
data/mirex/JKUPDD-Aug2013.zip. Accessed January 21, 2016.
Collins, T., R. Laney, A. Willis, and P. H. Garthwaite. 2011. Modeling Pattern Importance in
Chopin’s “Mazurkas.” Music Perception 28 (4): 387–414.
Cormen, T. H., C. E. Leiserson, R. L. Rivest, and C. Stein. 2009. Introduction to Algorithms.
3rd ed. Cambridge, MA: MIT Press.
Deutsch, D., and J. Feroe. 1981. The Internal Representation of Pitch Sequences in Tonal Music.
Psychological Review 88 (6): 503–522.
Fano, R. M. 1949. The Transmission of Information, Technical Report No. 65, March 17.
Cambridge, MA: Research Laboratory of Electronics, MIT.
Forth, J. 2012. Cognitively-Motivated Geometric Methods of Pattern Discovery and Models of
Similarity in Music. PhD thesis, Department of Computing, Goldsmiths, University of
London.
Forth, J., and G. A. Wiggins. 2009. An Approach for Identifying Salient Repetition in
Multidimensional Representations of Polyphonic Music. In London Algorithmics 2008:
Theory and Practice, edited by J. Chan, J. W. Daykin, and M. S. Rahman, 44–58. London:
College Publications.
Bioacoustics
Imaging and Imagining the Animal World
Mickey Vallee
Introduction
About 100 kilometers north of the Alberta Oil Sands, hidden amid the burned and
twisted shards of lumber, netted over freshly growing grass and sprouting pine, a
common nighthawk nests on the ground, unseen by our eyes despite our attempts. Its own
speckled pattern mingling with the black, ash, and tan of the land, the nighthawk sits
unseen and unheard until one of the biologists whispers to the rest of us, “got it.” I can
see it: its form emerges from its surroundings under my own eyes like an image that
grows from a magic eye test, an organism whose home only a year ago was engulfed in a
700,000-hectare forest fire. It lies there like a taxidermy prop, but breathing rapidly,
seemingly unaware that we see it, its solid black marble eyes glistening with the silence
of a life full of wait. It seems designed for this terrain, and even as I see it, I cannot say in
all comfort that I can fix it under my gaze—my vision cannot hold it in all certainty,
which the nighthawk uses to its advantage when it explodes from the ground where it
nests (when it senses that its young are under threat, the common nighthawk produces a
loud “wing-clap” as it arcs dramatically through the air away from its makeshift nest—
they do not nest in trees). This one lands in front of us, clumsily, with its plumage puffed
out and its wing bent backward, meters from its nest, in an attempt to distract us from its
young; one of the researchers slowly positions his iPhone overtop two chicks left on the
ground and takes a picture, and we leave hastily to let the mother return to her pair.
The common nighthawk is difficult to sight: it is nocturnal, it blends with its
environment, it is notoriously elusive, and it is relatively quiet save for a nasal peent! in flight,
and a sonic wing-clap as it dives (Viel 2014). Sighting nighthawks, especially while they
are nesting, is painstaking work, which is why biologists are turning increasingly to
bioacoustics technologies for the purposes of identification and location exercises: an
animal’s sonic emissions serve as a reliable route of access to their location and their
patterns of behavior (Laiolo 2010). But what to do with these sonic emissions, and
how these emissions play into the imagination of science, is my research focus in
this chapter.
In the context of the biological sciences, researchers who use bioacoustics are interested
in animals’ sounds in their ecological contexts and what those sounds might indicate
regarding the security of biodiversity and concerns over ecological depletion (this
context-based program of research is what some call “ecoacoustics”; see Sueur and
Farina 2015). Bioacoustics researchers use a variety of sound equipment to gather and
analyze data: durable autonomous recording units (ARUs) can track years of information
from within one location (Hutto and Stutzman 2009); backpack microphones strapped to
animals’ backs will track their sonic patterns as they move through space (Gill et al. 2016);
data are uploaded onto “listeners” that align the sounds with their appropriate species
(Schroder et al. 2012); and such results are uploaded for international research centers
and teams.
This last point about the digital community of nighthawks is an example of a
transacoustic community. Barry Truax defines an acoustic community as an “information
rich” system that uses “acoustic cues and signals” that play a “significant role in defining
the community spatially, temporally in terms of daily and seasonal cycles, as well as
socially and culturally in terms of shared activities, rituals, and dominant institutions”
(2001, 66). But here, a transacoustic community transcends the immediacy of place,
transgresses the boundaries of immediate community, transforms data into
international research centers, transcends the visual with auditory analysis that has a better and
higher definition, and transposes from the audible into the visible. Because the sharing
is to access signs of population depletion and biodiversity loss, imagination is a scientific
tool for intervening in avoidable and undesirable futures. Indeed, I am not so much
interested in sound here as I am in sounding as a research method. Thus, in being
interested in how researchers are implicated in the infrastructures they spontaneously
design, I work toward inverting that infrastructure, with an eye to the argument that
such encounters are almost entirely reliant on a specific form of imagination where the
image of sound overrides the evanescence that is so often ascribed to it. Throughout the
chapter, I will attempt to “open the black box” of bioacoustics by exploring the notion
that contemporary bioacoustics encourages hearing without listening: that is, when
emerging sound technologies are capable of detecting small variations in sound, they
register at a much higher accuracy than does human listening; the scientists involved in
this research must develop a technical mastery at species identification, but one that is
visually instead of audibly grounded. Characteristic of other sound-based research
units, bioacoustics researchers use sound not to understand the nature of the sonic but
instead as a means to find palpable solutions to pressing social and environmental
problems using the sonic as a mode for imaging. Bioacoustics researchers are not
intrigued by sound as an object so much as a method.
Since sound technologies and their storage devices have become (1) digitized and
(2) automated, they are capable of capturing the sounds of global populations in real time.
Sound has become an essential methodological device for identifying species, as well as
tracking the polyphonic and polyrhythmic complexities of the various landscapes that
change across ecosystems. The open conversation between disciplines and
with the public requires a more flexible and porous usage and definition of emerging sound
technologies, intended to educate members of the public in assisting with research projects.
If Henry David Thoreau celebrated the “warbling of the birds ushering in the day” (1885, 35),
and this is certainly not an antiquated attitude toward birdsong, researchers
today are more accepting of the fleeting nature of sound as “arrangements of charged
particles in the semiconductive materials of solid state “flash” memory, or the magnetic
surfaces of hard drives, tapes, and minidiscs” (Gallagher 2015a, 569). “Common
practices include,” the geographer and sound recordist Michael Gallagher writes elsewhere,
“making field recordings, including the transduction of inaudible vibrations using
devices such as hydrophones and contact microphones; making compositions from
field recordings, and distributing these via CDs, MP3s, vinyl, radio or online platforms
such as weblogs, digital audio maps and podcasts; site-specific performances and
installations; and audio walks designed for listening on portable devices whilst moving
through a particular environment” (2015b, 469).
To elucidate the specific complexity of the transacoustic community proposed, I aim
in this chapter to clarify the general complexity that sound retains throughout creative
imaging processes. Sounding, I argue, has the potential of producing interdisciplinary
and theoretically innovative knowledge that seeks new virtual spatializations of the
earth. I proceed with a description of the historical context through which bioacoustics
became a research focus for those in the biological sciences. I am especially interested in
the move from individual specimens to whole species in their ecological contexts.
I conclude with a brief discussion of the transacoustic community, borrowing from Jakob
Johann Baron von Uexküll’s notion of the Umwelt; Uexküll explored the making of
worlds from a theoretical biological perspective, his ideas about organism self-preservation
deriving from a decidedly antimechanistic perspective that asks us to attend to an
organism’s inner and external sense of events as the habituation of the codes and information
an organism uses to inhabit an environment.
Sounding Animals
well-known performances for and with a variety of species, live and recorded, continues
this tradition of linking aesthetic, sound, sense, and imagination (and, in Rothenberg’s
case, collaboration). Most of these and other examples have relied on an aesthetic of
listening, which the nineteenth-century music critic Paul Scudo once referred to as
“the divine language of sentiment and imagination” (Scudo, cited in Johnson 1995, 272).
But in this chapter I am interested in the type of imagination belonging to the biological
sciences. Before the mid-twentieth century, researchers in the biological sciences
centralized listening in their data collection and analyses, transcribing the sounds of
animals for the purposes of discovering keys to biodiversity, mating behavior, and the
anticipation of biological change. Because recording devices were too cumbersome, one
had to rely on having a musical ear to transcribe sound in the form of onomatopoeia.
Unconvinced by this method, Albert R. Brand at Cornell University’s Ornithology
Research Lab had attempted to capture bird song with “sound film” (used otherwise for
Hollywood “talkies”), which captured both the image and the sonic emissions of birds.
This he considered a more objective means of capturing sound. Brand had written:
[R]arely do two observers hear the same song in exactly the same way. The song is not
noticeably different when produced by varying members of the species, but by the
time the sound waves have affected the listeners’ hearing apparatus, and have been
transferred by the nerves to the brain, and interpreted by that organ, it has created an
entirely different sensation and impression on each individual listener. (1937, 14)
Although they were grounded in visual images and movement, Brand’s films still
relied on listening in real time to the sounds of animals. However, by the mid-twentieth
century, the spectrogram was introduced to ornithologists to visualize sonic information.
Spectrograms had a significant impact on the democratization of access to sonic data
and analysis; scientists no longer needed a musical ear but rather technical know-how.
Spectrograms contributed directly to the democratization of sound analysis, data
collection, and participation in science throughout the late twentieth century. Today
the spectrogram image (and its variations) is the most common image of an animal’s
utterance, an image that is inseparable from a new kind of work that would free up the
scientist from the burden of listening and instead place the attention on, first, placing
the equipment and, then, using it to capture the animals’ sounds. The researcher, now
liberated from their own ear, worked with the technology that could pick up the
transmission of information. Better yet, the spectrogram was equipped with a capacity for
accurate visualization, given that the vibrations from the needle on the machine would
be etched into a paper surface. The spectrogram caught not just the sound of the
organism but the whole situation within which it was situated; this transcription
of the atmosphere, of its world, allowed researchers to visualize the polyrhythmic
complexities of its environment, including its communications with other species.
The spectrogram demanded a unique, visually grounded art of its own: calligraphy,
traced on paper, meticulously teased out the upper portion of the recorded sound so as
to discover an arc represented through space (frequency) and time (duration). Birdsong
would no longer be described using words, or onomatopoeia for that matter, but had to
have a direct inscription of the ecology in which the organism was situated onto pages,
using ink and paper. These points of contact were less contiguities, mediations, or
transductions than they were direct feelings, motivations, and movements
onto the page, tracing the otherwise invisible (but real) contours of the bodies
responsible for producing them. Calligraphy thus demanded a particular visual detail of
sonic information that reduced the need for the humans involved in producing them to
listen attentively and instead to trace the contours of a sonic inscription.
The spectrogram was adept at picking up certain important information: the
environment and the ecology in which the animal was situated, whereas the onomatopoeia
transcriptions isolated song from context. What became more important, then, was less
the taxonomy of the animal than what the animals’ sounds could tell researchers about
their surroundings and their environments: how they were situated within a community
or a sound ecology.
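For readers curious about what such an image involves today, a spectrogram can be computed from a digital recording in a few lines of Python using standard short-time Fourier analysis. This is a generic sketch, not the workflow of the researchers discussed in this chapter, and the file name is hypothetical:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile
    from scipy.signal import spectrogram

    rate, samples = wavfile.read("nighthawk_example.wav")     # hypothetical recording
    if samples.ndim > 1:
        samples = samples.mean(axis=1)                         # mix stereo down to mono

    freqs, times, power = spectrogram(samples, fs=rate, nperseg=1024)
    plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-12))    # power in decibels
    plt.xlabel("Time (s)")
    plt.ylabel("Frequency (Hz)")
    plt.show()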
Bioacoustics researchers today are interested almost principally in method, technique,
and representation, owing to the rapidly expanding datasets to which they have access.
While some use multiple ARUs to triangulate the position of organisms and their return
to particular locations, others use sound to measure the amount of masking caused by
anthrophonic intrusion (see Berkaak, volume 1, chapter 15, for debates surrounding
cultural heritage). A vast array of representational methods and knowledge syntheses are
available to those interested in bioacoustics, moving well beyond the “manipulation and
playback” model of acoustic ecology, or the GIS (Geographic information system)-based
representations of landscape ecology.
Bioacoustics research is also a response to the uncertainty and anxiety around
biodiversity loss, on a global scale, and the role that anthrophonic interference is having on
the balance of ecosystems. Some research teams use “noise mapping,” reading city
decibel levels against a color-coded legend that identifies noise “hot-spots”
(Hawkins 2011). With a supposed 83 percent of the land in the United States being about
two-thirds of a mile from a road, conservation officers team up with acousticians and
sound ecologists to reduce the presence of helicopters, planes, and other means of
transporting especially tourists into natural landscapes (Powers 2016). Such high levels of
noise have inspired Gordon Hempton to locate the “quietest square inch on earth” in
(ironically enough) the United States that, he claims, has no anthrophonic interference
whatsoever for up to 20 minutes at a time (Berger 2015).
Such searches for quietude against the din of mobile humanity and expanding
urbanization have also resulted in conservationist measures to select habitats and in
the use of sonic technologies as a geoengineering strategy. Some researchers have
taken to coordinating new and better soundscapes by masking anthrophonic interference
with loudspeakers planted in natural settings, intended to "give back" the soundscape
(Berger 2015). Others use multiple recording technologies to triangulate the exact
location of species so as to expedite conservationist interventions for those deserting
their natural habitat (Donaldson 2016). (Triangulation is thus the creation of a virtual
space by means of sound, a tracing of the contours of what a place might come to
represent.) Playback, which will be discussed further below, is used to give voice back
to place.
The bioacoustics researchers with whom I worked in Northern Alberta erected mist-nets
deep in the forest, in some places accessible only by bike or all-terrain vehicle, laced with
ghetto blasters emitting nighthawk calls in order to bait and capture the birds in flight. Once
captured, the nighthawks were placed into small aluminum tubes, returned to the
research station on the gate of the researchers' pickup truck, measured, and equipped with
a small backpack microphone and a GPS device. The data subsequently recorded by the
microphone is uploaded to international research networks and measures the
"sound-event" of the organism (its heartbeat, its wing pace, its calls, etc.) against the
"sound-scene" of its habitat (the geophonic, anthrophonic, and biophonic data that
informs the backdrop against which the sound-events unfold). It was not so much the
results of this research that interested me as the infrastructural labor that went into the
capture of data. This infrastructure is consistent with the "essence of mediations" that
Bruno Latour describes as crossing the line between signs and things. Latour writes:
To be sure, we no longer portray scientists as those who abandon the realm of signs,
politics, passions and feelings in order to discover the world of cold and inhuman
things in themselves, “out there.” But that does not mean we portray them as talking
to humans only, because those they address in their research are not exactly humans
but strange hybrids with long tails, trails, tentacles, filaments tying words to things
which are, so to speak, behind them, accessible only through highly indirect
and immensely complex mediations of different series of instruments . . . Instead
of abandoning the base world of rhetoric, argumentation, calculation—much
like the religious hermits of the past—scientists began to speak in truth because
they plunge even more deeply into the secular world of words, signs, passions,
materials, and mediations, and extend themselves even further in the intimate
connections with the nonhumans they have learned to bring to bear on their
discussions. (1999, 96–97)
After one of the researchers with whom I worked strapped a backpack microphone
between the wings of a captured nighthawk, she released it only to discover that it dropped
like a stone; the microphone was not properly installed and was causing a disequilibrium
in the nighthawk's capacity for flight. Swiftly she moved to its writhing body on the
ground and removed the device with ease. As the microphones take a great
deal of labor and care to install, it comes as a disappointment when they do not work.
Sound is connected to vibration, and to the kind of ethnographic fieldwork in which sound
is the goal; but when sound is instead the method, the goal in this case being conservation,
what does that tell us about the philosophy of soundworlds and the worlds between
sounds? Answering this requires a turn to the transacoustic community as a way of imagining a
“society on display.”
The common nighthawk, the bioacoustics researchers, the technologies through which
they are measured and made sense of, globally and locally, constitute the image of a
transacoustic community. The transacoustic community is bound by an elaborate
recording/playback apparatus that is not necessarily reducible to the listenable but
expands more generally into recording as a technical and cultural set of images. The
transacoustic community is itself an image central to contemporary debates and dis-
cussions around multispecies encounters: simply, that entities open through their surfaces
onto other entities (these openings are precisely the point of interrogation for bioacou-
stics researchers). There are variegated routes of access to such a conclusion (that enti-
ties have edges that open onto other entities), a few of which I have set out to explore in
this chapter. Of course, there are many ways of doing bioacoustics research, but all these
ways converge on the creation, maintenance, and breaking through of an entity’s
contained space through its sonic emissions; technological assemblages belonging to
bioacoustics researchers are intended to create new images and imaginations for how
these breaches are done.
Entities sound. And insofar as they sound, they make up their worlds. But since entities
sound out to other entities, it is insufficient to claim that the worlds to which these entities
belong are contained. Thus, the notion that a world is not self-contained, but rather
porous and protean, makes it necessary to interrogate the underlying function of worlds
as open, as in Uexküll's philosophy of the Umwelt, which translates literally as "world
around” (Brentari 2015, 75): it describes a connecting point between an organism’s
interior and exterior sense of events, and describes the habituation of the codes and
information an organism uses to inhabit an environment (von Uexküll [1934] 2010,
126–132). Elizabeth Grosz has expanded on this position, explaining that the human
world finds its equivalent in the professional life of the architect, who brings together
things at the demarcation of boundaries, heterogeneous expressions within a space that
are given meaning through those very heterogeneous expressions (2008, 48). Grosz
accounts for the famous tick that appears early in Uexküll’s book, A Foray into the Worlds
of Animals and Humans, an example Uexküll deploys against the physiological approach to
organisms as the sum of independent reflexes, arguing instead that ticks are embedded
in affective worlds. Ticks use the smell of chemicals, the heat of the sun, and the flesh of
the mammal to complete their worlds, once they have conjoined with another organism,
such as the mammal; their world is defined by their connection to another’s world. The
tick’s world is thus complete when it attaches to the edge of another world (the world of
the unaware mammal, for instance, whose own totality the tick is equally unaware of).
While the organism’s perceptive world is an inherited, species-specific conscious per-
ception of those objects an organism perceives as outside of itself (such as the sound-event
mentioned above, the peent! of the nighthawk), its operative world completes the organ-
ism’s immersion in an environment by merging with it (such as the sound-scene, the
flutter of moth wings the nighthawk dives into to consume) (see Brentari 2015, 99). At
stake here is thus, along with the maintenance of worlds as highly dependent on the
membrane of their milieus, the indeterminate nature of worlds as they forever open
onto and into other worlds. Elizabeth Grosz writes that such worlds are musical (a com-
mon audio-based resource for those constructing idealized collective experiences):
At what point does a boundary turn into a breaking point? Or, at what point does the
edge of one boundary merge into the edge of another? Uexküll establishes his position
against physiological accounts that would see organism and interorganism behaviors as
effects of stimuli reactions between different parts of an organism. This way of perceiving
organisms was isolationist, against environment, and against the notion that an organism
possessed agency in the construction of its world, and had to have agency in order to
converge with the edge of another world. But a cycle is never on its own; it is with other
cycles, and with others, significant and otherwise. Therefore, it is not the world that the
organism creates but its coconstructive capacity for going into other worlds. For every
melodic contour there is an edge of another's world.
It is sound that accounts for the breakthrough between the edges of worlds. Sound, in
the context of this chapter, is the individuation of energies from separate worlds in
transduction, which involves at once the assemblages of vocal tissue and environmental
biotic and abiotic movements, including all other biophonic, geophonic, and anthro-
phonic crystallizations. When software has been programmed to detect a variant in
sound, a transduction, it is registered with much higher accuracy than through organic
listening and pattern identification (though researchers often test-listen samples to ensure
accuracy), which introduces a technical or mechanical, in any case inorganic, listening
into the process of exploration and discovery. The individuation of longitudinal studies,
such as those that are multisited and multimicrophoned, which record more sound than
is possible to listen to organically, and which is never stable but always in flux, is the
crystallization of the node in a transacoustic community.
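By way of illustration only, the following sketch shows the simplest form such inorganic listening can take: a frame-by-frame energy threshold that flags candidate sound-events in a recording. The function, parameters, and test signal are invented for this example and are not the software used by the research teams described here; production systems rely on far more robust techniques such as spectrogram template matching and trained classifiers.

```python
import math

def detect_events(samples, rate, frame_ms=50, threshold=0.1):
    """Naive sketch of automated call detection: slice the signal into short
    frames, compute each frame's RMS energy, and flag frames that exceed a
    threshold. The machine, not the ear, performs the first pass of listening."""
    frame = int(rate * frame_ms / 1000)
    onsets = []
    for start in range(0, len(samples) - frame, frame):
        chunk = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / frame)
        if rms > threshold:
            onsets.append(start / rate)   # onset time in seconds
    return onsets

# Synthetic test signal: one second of silence with a short "call" at 0.4 s.
rate = 8000
signal = [0.0] * rate
for n in range(int(0.4 * rate), int(0.5 * rate)):
    signal[n] = 0.5 * math.sin(2 * math.pi * 800 * n / rate)

print(detect_events(signal, rate))   # [0.4, 0.45]
```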
Imagination, to imagine, to image; these variations on a term point to the slippage
(linguistically and otherwise) of image; there are countless philosophical explorations
of image and imagination, but a question of how images come to be made through
sound is quite another matter, and one that is often grounded in routine empirics
References
Baptista, L. F., and R. A. Keister. 2005. Why Birdsong Is Sometimes like Music. Perspectives in
Biology and Medicine 48 (3): 426–443.
Berger, E. 2015. Welcome to the Quietest Square Inch in the U.S. Outside. Outside Online.
https://www.outsideonline.com/2000721/welcome-quietest-square-inch-us. Accessed
September 30, 2017.
Brand, A. R. 1937. Why Bird Song Cannot Be Described Adequately. Wilson Bulletin 49 (1): 11–14.
Brentari, C. 2015. Jakob von Uexküll: The Discovery of the Umwelt between Biosemiotics and
Theoretical Biology. New York: Springer.
Chion, M. 1999. The Voice in Cinema. New York: Columbia University Press.
Cohn, J. P. 2008. Citizen Science: Can Volunteers Do Real Research? AIBS Bulletin
58 (3): 192–197.
Donaldson, A. 2016. National Network of Acoustic Recorders Proposed to Eavesdrop on
Australian Ecosystems. ABC News. http://www.abc.net.au/news/2016-07-11/soundscape-
ecology-could-track-environmental-changes/7587354. Accessed September 30, 2017.
Gallagher, M. 2015a. Field Recording and the Sounding of Spaces. Environment and Planning
D: Society and Space 33: 560–576.
Gallagher, M. 2015b. Sounding Ruins: Reflections on the Production of an “Audio Drift.”
Cultural Geographies 22 (3): 467–485.
Gill, L. F., P. B. D’Amelio, N. M. Adreani, H. Sagunsky, M. C. Gahr, and A. Maat. 2016.
A Minimum-Impact, Flexible Tool to Study Vocal Communication of Small Animals with
Precise Individual-Level Resolution. Methods in Ecology and Evolution 7 (11): 1349–1358.
Grosz, E. 2008. Chaos, Territory, Art: Deleuze and the Framing of the Earth. New York:
Columbia University Press.
Hall, M. 2016. Soundscape Ecology: Eavesdropping on Nature. Deutsche Welle (DW). http://
www.dw.com/en/soundscape-ecology-eavesdropping-on-nature/a-19304871. Accessed
March 12, 2017.
Hawkins, D. 2011. “Soundscape Ecology”: The New Science Helping Identify Ecosystems at
Risk. Ecologist: Setting the Environmental Agenda since 1970. http://www.theecologist.org/
investigations/science_and_technology/1171165/soundscape_ecology_the_new_science_
helping_identify_ecosystems_at_risk.html. Accessed September 30, 2017.
Hill, P. 2007. Olivier Messiaen: Oiseaux exotiques. Farnham: Ashgate.
Hutto, R. L., and R. J. Stutzman. 2009. Humans versus Autonomous Recording Units:
A Comparison of Point-Count Results. Journal of Field Ornithology 80 (4): 387–398.
Johnson, J. J. 1995. Listening in Paris: A Cultural History. Berkeley: University of California Press.
Kircher, A. (1650) 1970. Musurgia Universalis: sive Ars Magna, Consoni et Dissoni. Hildesheim
and New York: Olms.
Laiolo, P. 2010. The Emerging Significance of Bioacoustics in Animal Species Conservation.
Biological Conservation 143 (7): 1635–1645.
Latour, B. 1999. Pandora’s Hope: Essays on the Reality of Science Studies. Cambridge, MA:
Harvard University Press.
Mair, M., C. Greiffenhagen, and W. W. Sharrock. 2015. Statistical Practice: Putting Society on
Display. Theory, Culture and Society 33 (3): 51–77.
Mazumder, A. 2016. Pacific North West LNG Project: A Review and Assessment of the Project
Plans and Their Potential Impacts on Marine Fish and Fish Habitat in the Skeena Estuary.
Environmental Assessment Report, Government of Canada. Minister of Environment and
Climate Change.
Pijanowski, B. C., L. J. Villanueva-Rivera, S. L. Dumyahn, A. Farina, B. L. Krause,
B. M. Napoletano, et al. 2011. Soundscape Ecology: The Science of Sound in the Landscape.
BioScience 61 (3): 203–216.
Powers, A. 2016. Preserving the Quietest Places. The California Sunday Magazine. https://
story.californiasunday.com/quietest-places-on-earth. Accessed September 30, 2017.
Rothenberg, D. 2008. Thousand-Mile Song: Whale Music in a Sea of Sound. New York:
Basic Books.
Schröder, M., E. Bevacqua, R. Cowie, F. Eyben, H. Gunes, D. Heylen, et al. 2012. Building
Autonomous Sensitive Artificial Listeners. IEEE Transactions on Affective Computing
3 (2): 165–183.
Sueur, J., and A. Farina. 2015. Ecoacoustics: The Ecological Investigation and Interpretation of
Environmental Sound. Biosemiotics 8 (3): 493–502.
Thoreau, D. 1885. The Writings of Henry David Thoreau. Vol. 6. Boston: Houghton Mifflin.
Torino, L. 2015. You Can Actually Hear the Climate Changing. Outside. https://www.outsideonline.
com/2035701/you-can-actually-hear-climate-changing. Accessed September 30, 2017.
Truax, B. 2001. Acoustic Communication. Vol. 1. Santa Barbara: Greenwood.
Vartan, S. 2016. We’re Changing the Way the World Sounds: Noise Impacts Ecosystems
in More Ways than You Might Think. Mother Nature Network. http://www.mnn.com/
earth-matters/wilderness-resources/blogs/we-are-changing-way-world-sounds. Accessed
September 30, 2017.
Viel, J. M. 2014. Habitat Preferences of the Common Nighthawk (Chordeiles Minor) in Cities
and Villages in Southeastern Wisconsin. PhD thesis, University of Wisconsin-Milwaukee.
von Uexküll, J. J. Baron. (1934) 2010. A Foray into the Worlds of Animals and Humans, with A
Theory of Meaning. Translated by J. D. O’Neil. London: University of Minnesota Press.
chapter 10
Musical Notation as the Externalization of Imagined, Complex Sound
Henrik Sinding-Larsen
Introduction
At one concrete level, this chapter is about the innovation of musical notation and how
this tool for the description of sounds affected the way new music was imagined, per-
formed, and socially organized. But there is also another, more theoretical aim: to
see how this case of imagining and describing sonic qualities and patterns can inform
and be informed by a general theory on the emergence of complexity as a result of new
tools for storing, transmitting, and processing information. A key concept in my work
with these topics is externalization. The theoretical aim of the chapter is to explore the
insights we may gain by analyzing musical notation as a case of externalization of sound
or patterns in sound. A subtheme within this endeavor focuses on imagination, and how
imagination can be supported by externalizations. Obviously, a chapter of this size
cannot provide even a brief overview of the history of musical notation
and its consequences. Thus, only selected contemporary and historical examples will be
dealt with to the extent that they serve the wider and theoretical aim.
Because of this composite aim, the text moves between quite different levels of empirical
detail and theoretical abstraction. It also draws on various academic disciplines.
Before digging into the more theoretical and conceptual issues, I will set the scene
by describing some cultural events where music was important and where musical
notation played quite different roles. The cases are built on concrete events with personal
participation as well as reflections based on other relevant events and sources. The aim is
to highlight differences that are related to musical notation as a tool for the description
of sounds.
A Symphonic Concert
to trapeze artistry without a safety net: Will someone play or sing out of tune? Will
someone miss the timing? Will each note be sufficiently distinct? Will someone play a
wrong note? Will the sum of artistic efforts match the expectations of critics? With such
premises, it is not surprising that a sense of precariousness and nervousness may be
prominent during performance with a corresponding feeling of relief when the concert
ends without flaws.
However, not every dimension of symphonic music is equally complex. If one
masked the variations in pitch, intensity, timbre, and orchestration and listened only
to the rhythmic aspects of a classical concert, then much of so-called advanced
symphonic music would be rather simple, if not boring, judged by the ideals of, for
instance, jazz or a well-played traditional fiddle tune intended for dancing. Compared
to popular music, the “romantic freedom,” so prominent in much of classical music,
reflects an inversion of priority between the melodic and rhythmic dimensions of
music. In most popular music, a regular pulse and tempo is to a larger extent treated
as an imperative premise on top of which the melody, harmony, and other pitch-
based effects can unfold. In classical music, in particular in the romantic period,
cadences but also other, local melodic events, even a single note, might trigger emo-
tional responses that “demand” more time: time that is not taken from the duration
of other notes in the same measure but that results in a net slowing down of the
tempo. The tempo is, in these cases, treated as an expressive parameter subordinate
to the “needs” of the melody or harmony. Musical “needs” of the melody and har-
mony can “with impunity” override and both accelerate and retard the integrity of a
regular pulse. Deviating from the regularity of pulse and tempo may occur in many
musical genres, also within popular music. Generally, it is most easily practiced and
achieved by a soloist. To achieve the romantic kind of “freedom” with a large orches-
tra is generally very difficult without notation-based instruction and a conductor
during performance.
In popular music, an ideal is often to extensively challenge the main beat through
subtle, improvised syncopations and other off-beat, rhythmic effects while adhering
even more strictly to the regularity of the overall pulse. This is an element in what is
called "groove," a phenomenon that is hard or impossible to capture with musical
notation (see Danielsen, this volume, chapter 29). As a contrast to this ideal, it is along the
dimensions of pitch (melody, harmony), timbre (instrumentation, orchestral texture),
and intensity (the overall, dynamic “narrative”) that we find most of the complexity of
symphonic music (in addition to agogics and rubato mentioned earlier). The aesthetics
of classical music is often expressed in disciplined, large-scale, hierarchically organized
complexity most of which is impossible to achieve without musical notation. Notation
in this context is, to a large degree, indispensable both for the music’s conception (imagi-
nation) and performance. Nevertheless, within these strict frames, set by the composer’s
notation, uniqueness and creativity are highly valued. Similar values also permeate
modern complex society in many other domains. For example, laws and contracts could
be thought of as externalizations or notation systems facilitating complex economic and
social processes.
A yoik—Complexity
on a Different Scale
It is a fruitful endeavor to compare a symphonic work with a very different genre: the
traditional yoik, the song tradition of the indigenous Sami reindeer herders in the northern
regions of Scandinavia. Yoik has shamanistic origins and was tra-
ditionally only performed by a single singer in small settings without any instruments or
at most a hand-held drum played by the singer. The melodic material and intervallic
range were very limited but could be extensively repeated—on certain occasions until a
state of trance was reached. Within these constraints, there existed a huge variation in
subtle qualities of the voice including countless pitch degrees outside those captured by
a diatonic scale and traditional musical notation. Rhythmically, a yoik alternates between a
relatively steady pulse and freer rhythms; at any moment it is susceptible to pauses for
breathing followed by restarts with no reconnection to the previous pulse (Graff 2007).
This organization of time is incompatible with long, elaborate melodic developments.
However, the range of possible qualities of the voice in yoik by far exceeds the acceptable
ones for a classically trained voice. The pitch is often mixed with expressive guttural
sounds or more relaxed, speech-like qualities which make the pitches less fixed and
“pure” and thus less combinable into polyphonic complexity.
The social and cultural values traditionally communicated through yoik are quite
different from those of a symphony orchestra. The traditional Sami society was small
scale, and personal relations to both humans and animals constituted an important part
of the social "glue" keeping society integrated. This represents a con-
trast to the society that produced symphonic music where written laws and contracts
had replaced much of the relational integration that characterized smaller scale societies.
An important function of yoik was to describe or confirm concrete relations. A yoik can
be descriptive of persons, animals, and landscapes, not as externalized descriptions of
these entities, but by performing the yoik as a kind of “speech act” or “song act” which
directly connects the singer with the “described.” An important genre of yoik is called
person yoiks. The Sami singers insist that a person yoik is not about a person; it is that
person. To some extent, a similar logic is operative vis-à-vis animals. A famous yoik
about a wolf chasing a reindeer is modeled on the sounds of howling wolves.1 The most
important intervals of the yoik's main motif are a fourth and a fifth. In the section of
the yoik "describing" the wolf's final attack on the reindeer, the fourth is, in a gliding way,
pushed toward the tritone, a particularly dissonant interval known in medieval times as
Diabolus in Musica. One possible function of the wolf yoik is to connect with the
wolf and, in that way, obtain some magic control over this dangerous predator.
Yoiks have been notated by ethnomusicologists with classical musical notation, but
only a few of the important elements in yoik are captured by this tool of description.
An example that could reveal other and subtler contrasts to a symphonic concert
would be sacred music echoing the congregational chant as it may have sounded and
functioned in the cathedrals and monasteries of medieval Europe at the time just before
some of the basic elements of modern musical notation were invented around the
eleventh century ce.
The suitability of breathing together as a metaphor for the less externalized
process of synchronization was corroborated by a story a Norwegian singer of Gregorian
chant told me about a workshop he had attended.2 The leader of the workshop, a member of
a renowned ensemble of early music, instructed the workshop participants on how to
synchronize in the spirit of early music. The goal of his exercise was to make all the par-
ticipants start to sing in full synchrony without any prior counting or visual cues, in total
blackness. The only way to achieve this was to listen to each other’s breathing, synchro-
nize the breath, and then start singing. The deeper aim of the exercise was to achieve a
relevant state of mutual attentiveness for performing music closer to an oral or—in my
terminology—a less externalized tradition.
Of course, many other factors contributed to my different experiences of sounds and
music between the mass in Krakow and the symphonic concert. In the mass, there was
no separation between performers and audience. Everyone sang the same monophonic
song except for the priest, who had a more elaborate textual part. None of the sections
in the chant was exceedingly complex. The main values that were celebrated were less
about coordinated hierarchy and excellence and more about community and inclusion
through a shared practice. Also in ecclesiastic settings, hierarchy and excellence may be
valued. But in the Dark Ages, before notation and the splendors of Gothic polyphony,
the sacred music in churches and monasteries was less about the display of artistry and
more about community and participation (Saulnier 2009). This was also reflected in the
more modest complexity of the monophonic Gregorian chant.
To understand what happened to music between the period of early Gregorian chant
and the modern symphony orchestra, it can be useful to dig deeper into the relationship
between notation and externalization.
Humans had imagined and performed music for a very long time without notation. The
purportedly oldest musical instrument is a flute made of mammoth ivory found in a
cave in Southern Germany. It has been carbon dated to between 42,000 and 43,000 bce
(Goodall 2013, 6). The oldest rudimentary musical notation is from ancient Mesopotamia
(circa 2000 bce) and the oldest efficient notation is from Western Europe around 1000 ce.
So, how and why did this need for a comprehensive notation of musical sounds emerge?
And what were the consequences? I argue that these questions must be answered in the
light of wide, historical transformations where musical notation was just one example
among many other emergent tools of description affecting various domains in society.
The quest for an efficient notation started with an alliance between the imperially
ambitious Frankish king Charlemagne (r. 774–814) and the pope, who both wanted
every Christian in Western Europe to sing the same chants authorized by the Vatican.
The century following Charlemagne saw the rise of a comprehensive project of political
unification supported by religious, educational, artistic, bureaucratic, economic,
architectural, and military standardization (Freedman 2011). Orthography and grammar
were standardized, and the small letters for writing were simplified to promote literacy.
Coins and weights were standardized to promote long-distance trade. Such was the
political and cultural climate in which the development toward an efficient system for
notating music started (Levy 1998). On the frontispiece of many of the newly stan-
dardized, liturgical chant books was a picture of Saint Gregory (pope from 590 to 604 ce)
with a dove (the Holy Spirit) whispering the chants directly into his ear, and a scribe
sitting by his side and notating (Figure 10.1).
A much earlier tool of description of vocal sounds was the phonetic alphabet with
decisive implications for the development of Greco-Roman civilization and its unprece-
dented level of complexity (Goody and Watt 1963; Ong and Hartley 2012). The emer-
gence of programming languages for computers is a recent example with possibly even
more global consequences than the phonetic alphabet. Computer programming also
includes radically new, digital approaches to the description, recording, and production
of sound (Danielsen, this volume, chapter 29; Knakkergaard, this volume, chapter 6).
Figure 10.1 Frontispiece of a chant book from the monastery of Saint Gall circa 1000 ce. (St. Gallen, Stiftsbibliothek, Cod. Sang. 390, p. 13—Antiphonarium officii [Antiphonary for liturgy of the hours].)
In
order to understand what happened in the particular case of musical notation, it would
be helpful to understand what all these histories of emergent tools of description have in
common. My main hypothesis is that all are cases of externalization, which is a concept
I have used to bring together several theories on transitions in cultural and natural his-
tory (Sinding-Larsen 1987, 1991, 2008). I have found this concept and perspective useful
for developing a more holistic understanding of cultural history as well as the relation-
ship between cultural and biological evolution.3
There is a need to pay attention to one distinction made in The Oxford English
Dictionary’s definition of externalization: “The action or process of externalizing; an
instance of this; also concr. an embodiment. externalize: To make external; to embody in
outward form.”
What is important to retain here is that the word “externalization” can be used with
two related but different meanings: (1) a process (of making external), and (2) something
concrete, an embodiment which could be the result of that process. I also use “externali-
zation” in more abstract and specialized senses that I will gradually approach through
various examples until I arrive at a more formal, in-depth discussion of the concept later
(see “Externalization and the Emergence of Complexity”).
The action of recalling a pattern of sounds from memory and writing this as a pattern
of note-heads on a staff would qualify as a process of externalization, while the concrete
result of that process, the actual manuscript with notation, could also be called an externali-
zation. In this chapter, it is the process of externalization that is of main interest,
including the larger-scale and more complex processes that follow from the fact that
even externalizations may themselves be externalized. And not only may externalizations
be externalized. On evolutionary time scales, externalizations show a tendency to
become externalized. In spite of periods of significant setbacks, both biological and
cultural evolution are characterized by a long-term tendency toward increased levels
of externalization.
Living organisms grow by capturing or diverting flows of energy and materials from the
environment into the dynamics of their bodies. These are materials and energy that
otherwise would have dispersed more directly in accordance with the law of increasing
entropy (the second law of thermodynamics) (Deacon 2012b). Life could be thought of
as an extremely indirect way of dispersing energy and materials, and an overall trend in
the evolution of life is a steady increase in the level of indirectness. A main driver of
increasing indirectness is increasing complexity in information or “informed actions”
that constrain and enable the flows of energy and materials. The ultimate function of
information is to constrain environmental (external) flows of material and energy for
the purpose of maintaining, growing, and reproducing the internal dynamics of a living
body (its interiority). All living organisms need to handle information about how they
performed helpful and harmful actions in the past (memory), a way to repeat helpful
actions in the future (heritable, functional habits/traditions), and, when needed (for
example in the face of environmental changes), a way to modify habits/traditions through
imagination, creativity, learning, and evolution.
Niche construction is a recent and increasingly important concept in evolutionary
biology (Odling-Smee 2010). Niche construction denotes organisms’ actions in con-
structing and changing their environment for their short-term benefit in a way that also
has consequences for the species’ long-term genetic selection. Beavers build dams with
logs cut by their teeth. The dams favor selection for a flat tail that is adaptive for swim-
ming in the dams. Teeth suitable for cutting trees, dams as a constructed niche, tails for
swimming, and many other features enter into a kind of dialectic or coevolutionary
process. It is argued that humans first developed a rudimentary language as a cultural
(nongenetic) adaptation. Subsequently a language community functioned as a semiotic
niche construction that favored the selection of individuals with larger brains who
processed linguistic signs more efficiently (Deacon 2012a). Further externalizations
through writing, maps, notations, and other semiotic tools have now become part of the
niche or environment in which humans grow up and live.
There is no doubt that the human semiotic externalizations we call science have vastly
increased our species’ ability to channel energy and materials from the rest of the envi-
ronment into our bodies as well as into those of domesticated plants and animals under
our control. Describing and controlling are two closely linked activities not only within
science. The same is true for describing and controlling the sound production called
music. In a wide sense, all music could be thought of as a more or less transient sonic
niche construction where the development of notation systems and tuning systems has
played an important role as semiotic externalizations.
There exist different levels or orders of externalization processes. To create a new
piece of music and write its pitches and rhythmical patterns on paper by means of musical
notation could be thought of as one order of externalization. To improve or create a new
system of notation with which it is possible to write down or externalize entirely new
kinds of sonic phenomena is an externalization process of a comparatively higher order
than just making a description with an existing tool of description. To create a new tuning
system that better matches the possibilities of a notation system could be seen as a form
of externalization that resembles sonic niche construction or at least the construction of
a sonic infrastructure. Finally, the term “externalization” may also be used to denote the
large-scale processes of societal transformation that are a result of multiple, nested and/
or more limited externalization processes. This implies that my concept of externalization
can be used for processes on several levels and often in a wider sense than the colloquial
sense that is mostly concerned with the first order of “making external.” At a basic level,
an alphabet can describe, make explicit, or externalize phonemes in a language. At a
higher level, the process of introducing an alphabet and literacy to a culture that is
without writing has been characterized as “alphabetization” which, in my terminology,
would be an externalization (in the wider sense) by means of writing. The process of
The Externalization
of Pitch and Intervals
Something radical and interesting happens when we change from speaking to singing.
The continuous type of pitch variation that characterizes prosody switches to a much
more discrete or discontinuous type of pitch variation that characterizes melodies. The
singing voice often moves in discrete steps between a limited set of pitches with a more
or less fixed pattern of intervals we call a musical scale. One important background for
our affinity toward discrete pitch steps in music is to be found in physical acoustics.
A vibrating object like a string does not vibrate only along its full length but simultane-
ously in fractions of its length. These shorter fractions vibrate at higher frequencies that
are inversely proportional to the lengths of the fractions. If the full length vibrates
at 100 cycles per second (Hz), then ½ of the length vibrates at 200 Hz, 1/3 at 300 Hz, ¼ at
400 Hz, and so on. The full-length pitch or frequency is called the fundamental frequency
or just the fundamental. Pitches from the vibrating fractions are called overtones. In a
well-crafted musical string, the most important of these fractions will vibrate at integer
multiples of the frequency of the whole length and produce pitches that are called harmonic overtones.
The collection of harmonic overtones together with its fundamental is called a harmonic
series. A tone consisting of concurrently sounding overtones from a single harmonic
series is called a complex harmonic tone. In general, overtones are fused with the funda-
mental into a single auditory image with the fundamental being perceived and labeled
as that complex tone’s only pitch. However, the overtones become important when we
judge concurrent intervals between different pitches as consonant or dissonant. An
element in how we judge the consonance or dissonance of an interval is the degree to
which the fundamentals’ overtones overlap and form a single harmonic series or not. If
the overlap is extensive, we could say that the fundamentals are closely harmonically
related. Pitches an octave apart (frequency ratio 2:1) are the most closely related
because the higher tone has no harmonic overtone that is not also present in the tone
one octave below. This is the physical basis for what we call octave equivalence. We could
think of pitches an octave apart as simultaneously being identical and different in two
different pitch spaces or pitch dimensions. One dimension is continuous and linear (the
height or register aspect of pitch) while the other dimension (variably called “pitch
class,” “tonal chroma,” or the identity aspect of pitch) varies in a discrete and cyclic way
and may be depicted as a circle. This dimension could also be called the harmonic aspect
of pitch, since it is this aspect that is the basis for creating melody and harmony. Scale
comes from the Italian word "scala," meaning ladder. A ladder ascends in a straight
line, which is a relevant metaphor for the height or register aspect of pitch. But the pitch
class (chroma or harmonic aspect of pitch) changes in steps from one pitch class to the
next until it reaches the octave, which is identical to the pitch class where the movement
started. In other words, a one-octave musical scale is simultaneously a straight ladder of
linearly ascending pitch heights and a circular, harmonic scale akin to a “soft” ladder
that is turned onto itself in a ring and where an octave is “one full circle” (Deutsch 2013).4
Many of the early challenges in developing an efficient notation system had to do with
this double nature of pitches.
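As a purely illustrative sketch of the arithmetic just described (the function name and the 2,000 Hz ceiling are choices made for this example, not the chapter's), the following lines list the harmonic partials of a 100 Hz fundamental and compare them with the partials of tones a 2:1 and a 3:2 ratio above it:

```python
def partials_up_to(fundamental_hz, ceiling_hz):
    """All harmonic partials (integer multiples of the fundamental)
    up to a common frequency ceiling."""
    return {k * fundamental_hz
            for k in range(1, ceiling_hz // fundamental_hz + 1)}

CEILING = 2000
low    = partials_up_to(100, CEILING)   # 100, 200, ..., 2000 Hz
octave = partials_up_to(200, CEILING)   # a 2:1 ratio above 100 Hz
fifth  = partials_up_to(150, CEILING)   # a 3:2 ratio above 100 Hz

# Octave (2:1): every partial of the upper tone already belongs to the
# lower tone's series -- the physical basis of octave equivalence.
print(octave <= low)                    # True
# 3:2 ratio: the two series only partly overlap, sharing every third
# partial of the lower tone, so the fundamentals remain closely related.
print(sorted(fifth & low))              # [300, 600, 900, 1200, 1500, 1800]
```

Within the chosen ceiling the upper octave adds no new partials at all, whereas the 3:2 relation shares only some of them, which is a compact way of seeing why pitches an octave apart are heard as "the same" in a way that other consonant intervals are not.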
The second most harmonically close interval after an octave has the frequency ratio
3:2 and is called a perfect fifth.5 We find this interval between the third and second
partials of a harmonic series. The potential symmetries, consonances, and dissonances that
are a part of physical and auditory acoustics have under various cultural circumstances
been exploited to create tensions and resolutions in musical themes and variations.6 To
unleash this literally epic potential one needs to create a tone system (a collection of
pitches and intervals) suitable for moving around in a tonal pitch space in harmonically
relevant and motorically/perceptually manageable scale steps. This can be done in many
ways and can be achieved in an entirely oral tradition. But a notation system coupled
with a system for producing precise and predictable intervals with tunable instruments
might provide significantly extended possibilities. How well the notation system is able
to describe the tone system and its potential harmonic symmetries will influence what
kind of music it is possible to imagine, compose, and perform.
Pythagoras was, according to legend, the first to describe (externalize) the size of intervals
by means of what in his time was a relatively new and powerful tool of description:
mathematics. Pythagoras established the basis for the idea that the length of a vibrating
string was inversely proportional to its frequency and that the size of the most consonant
and basic music-relevant intervals could be expressed as small-integer ratios between
string lengths, such as 2:1 (octave), 3:2 (perfect fifth), 4:3 (perfect fourth). Also, the inter-
vallic difference between a fourth and a fifth (with the ratio 9:8) was of particular impor-
tance to the Greeks and was called a tone. Both the arithmetic and geometry of these
ratios were important, because the Greeks by means of calculations and a compass could
construct the length of strings that would produce the theoretically established pitches
and intervals as sounds. The instrument they used for this “sonification” of their theory
was based on one string with movable bridges above a line inscribed with the appropriately
constructed geometric points. The instrument was called a monochord. With this
knowledge, the Pythagoreans generated a tone-system by repeatedly adding the interval
of a fifth to an initial pitch and then (building on the principle of octave equivalence)
subtracting surplus octaves to locate all scale steps within the first octave (Hansen 2003).
After six applications of a fifth to, for example, F, one gets F–C–G–D–A–E–B. Arranged
within one octave from C the result is C–D–E–F–G–A–B. Although the Greeks used
different note-names, they had created a sequence of seven pitches and intervals (the
diatonic scale) that was to become the backbone of Western music until today not least
because later (medieval) musical notation was specifically developed to fit this particular
scale. The Pythagoreans chose the interval of a tone (ratio 9:8) as their “atom” in the
diatonic version of the tone system. The basic interval structure of the seven diatonic
scale steps (five tones [T] and two semitones [S]) is in today’s major mode given as
follows: TTSTTTS. The cyclic character of the octave implies that the beginning and
end of this linear interval pattern TTSTTTS can be joined to form a circle. Because the
two semitones are asymmetrically located in the circle, one may obtain seven different
diatonic sequences (or scales) of tones and semitones depending on where one starts in
the circle. The Greeks identified these permutations and called them “species of the
octave” but did not relate this to the cyclic nature of the octave.
Today the idea of a musical scale is indissolubly connected to the octave as a cyclic or
symmetrically repeatable segment within the tone system. But in ancient Greece, the
foundational, symmetric scale segment was considered to be the tetrachord, which was
not symmetrically repeatable to the same extent as the octave. A tetrachord consisted of
four notes or scale degrees (three scale steps) where the two boundary notes were fixed
(a perfect fourth apart) while the two notes that separated the internal scale steps could
vary. The tetrachord in the diatonic genus (with internal steps of one semitone and two
tones) was considered to be the most ancient and natural (Atkinson 2009, 11). The Greek
diatonic genus is basically the scale that we still use and that now has attained almost
global dominance not least through the dissemination of modern notation and keyboard
instruments with their diatonic layout of the white keys.
To make the tetrachord segment work for the description of their two-octave diatonic
tone system, the Greeks had to stack the tetrachords in two different ways: one where the
highest note in one tetrachord was also the lowest in the next tetrachord (called conjunct
tetrachords—repeating each tetrachordal scale degree a fourth apart), and another (the
disjunct) repeating a fifth apart. This lack of a uniform symmetry in the stack of tetra-
chords added to the complexity of the Greek way of naming pitches. Each pitch label
consisted of two terms. One part of the pitch name referred to the location of the tetra-
chord in the overall register of tetrachords. The other part referred to the location (or
scale degree) within that tetrachord. However, notes with the same scale degree in two
consecutive tetrachords were only to a limited degree harmonically related and their
relatedness varied depending on the type of tetrachord (conjunct or disjunct). The
Greek diatonic tone-system as sounds was not basically that different from the one
we use today, but their tetrachord-based perspective and notation system provided
support for a different cognitive map with different constraints and affordances for how
symmetries and harmonic possibilities could be imagined.
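To make the two stacking principles concrete, here is a small sketch in semitone arithmetic (a modern simplification introduced only for this illustration: the Greeks reasoned in string-length ratios, and a tone is here counted as two equal semitones):

```python
# Ascending diatonic tetrachord, in semitones: semitone, tone, tone,
# spanning a perfect fourth (five semitones).
TETRACHORD = [1, 2, 2]

def stack_tetrachords(n, disjunct=False):
    """Chain n tetrachords upward from 0. Conjunct stacking reuses the top
    note of one tetrachord as the bottom of the next (corresponding degrees
    repeat a fourth apart); disjunct stacking inserts a tone of disjunction
    between them (corresponding degrees repeat a fifth apart)."""
    pitches = [0]
    for i in range(n):
        if disjunct and i > 0:
            pitches.append(pitches[-1] + 2)   # the tone of disjunction
        for step in TETRACHORD:
            pitches.append(pitches[-1] + step)
    return pitches

print(stack_tetrachords(2))                 # [0, 1, 3, 5, 6, 8, 10]  -> spans a minor seventh
print(stack_tetrachords(2, disjunct=True))  # [0, 1, 3, 5, 7, 8, 10, 12] -> spans a full octave
```

Two conjunct tetrachords fall short of an octave, while two disjunct ones exactly fill it, which helps make visible why a tetrachord-based system could not treat the octave as its uniformly repeating module.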
Aristoxenos, a pupil of Aristotle and a founding father of Greek music theory, was
well aware of the acoustics of the octave and octave equivalence. But for various reasons,
he, along with subsequent Greek philosophers, singled out the tetrachord as the ele-
mentary symmetric segment in the tone system. Some reasons were metaphysical and
related to the magic of the number four: The material world consisted of four elements
(earth, fire, air, and water), and one only needed four numbers (e.g., 16-12-9-8, see
Figure 10.2) to establish the ratios of the four foundational intervals in Greek music theory
(2:1 octave, 3:2 fifth, 4:3 fourth, and 9:8 tone). But it could also be that a music theory and
notation based on tetrachords worked sufficiently well for their basically monophonic
melodies that included intervals smaller than a semitone (in the enharmonic genus)
that in any case were less suited for our kind of polyphony. It could also be that the
symmetry-obsessed Greeks found the lack of symmetry within an entire diatonic
octave (with its two, asymmetrically placed semitones) to be incompatible with the
status of a foundational entity in their tone-system. In any case, the tetrachordal con-
ception of the tone-system represented serious limitations for medieval music scholars
with the ambition of creating an efficient notation system that could support an emerging
Christian interest in more advanced polyphony.
Figure 10.2 The numerical basis for Pythagoras's harmonic scale. (Detail from woodcut on page 18 in the 1492 treatise Theorica musice Franchini Gafuri laudensis. Source: Bibliothèque nationale de France.)
structure of tones and semitones that did not fit the notation system. These melodies
needed a semitone interval where the tone system did not provide one. At times this
resulted in melodies that did not fit being simply suppressed or altered to fit the system
(Atkinson 2009, 244). The integrity of the notation system became at this stage more
important than preserving an oral and divine tradition that Saint Gregory purportedly
had received directly from the Holy Spirit! This tells us something about the cultural
power of semiotic conventions. Eventually, the notation system, as well as the diatonic
tone-system, was developed with the addition of more notes and particular signs until
the system comprised twelve semitones in an octave.
The Greek names and symbols (inverted/distorted letters) for pitches were sufficiently
functional for theoretical treatises on music and the storage of melodic shapes for
“archival” purposes but were not for practical playing and sight-singing or the imagi-
nation (composition) of new, polyphonic complexity.
In particular, the Greeks did not use graphics in their notation system to visualize
with iconic resemblance the melodic pitch movements (scale steps) of actual melodies.
This idea appeared for the first time in the ninth century in the context of the Carolingian
renaissance. The treatise Musica enchiriadis (The music handbook) from the second
half of the century is an influential and early example (Erickson and Palisca 1995). The
anonymous author described his pitches with the long Greek note names and also with a
version of an ancient Greek collection of signs called the dasian sign system. But more
importantly, he transferred these signs into a kind of coordinate system where the pitches
of syllables of text were placed on lines ascending on the vertical axis while the temporal
sequence of the same syllables was laid out along the horizontal axis.
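As a rough sketch of that layout principle only, the toy function below prints syllables on a grid with pitch ascending up the vertical axis and time running left to right. The syllables and the contour are invented for the illustration; they are not the treatise's own example:

```python
def enchiriadis_grid(melody, rows):
    """Print a crude version of the Enchiriadis layout: each row stands for
    one pitch, each column for the next sung syllable, so the melodic
    contour becomes visible as a shape on the page."""
    width = max(len(syl) for _, syl in melody) + 2
    for pitch in reversed(rows):                       # highest pitch on top
        cells = [(syl if p == pitch else "").ljust(width) for p, syl in melody]
        print(f"{pitch:>2} | " + "".join(cells))

# Invented fragment: (scale degree, syllable) pairs in temporal order.
# Prints four rows in which the syllables climb from degree 1 to 4 and
# fall back to 3, so the rise and fall of the melody shows as a shape.
enchiriadis_grid([(1, "al"), (2, "le"), (4, "lu"), (3, "ia")], rows=[1, 2, 3, 4])
```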
In the left column in Figure 10.3 we see seven ascending dasian pitch signs (looking
like twisted Fs and referred to in the treatise as notae or notes). The intervals separating
the pitch signs (and hence also the lines) are marked with T for tone and S for semitone
in accordance with the Greek diatonic genus. This treatise, from before 900 ce, contained
the most basic graphic idea of staff notation more than a century before Guido of Arezzo’s
treatise from 1028 ce, which is reckoned as the definitive birth of staff notation.
Figure 10.3 Example of polyphonic chant from the treatise Musica Enchiriadis. (Staatsbibliothek
Bamberg, Msc.Var.1, fol.57r, photo: Gerald Raab.)
The critical Enchiriadis innovation was (1) to depict the vertically stacked horizontal
lines as placeholders for pitches instead of attaching a letter-based pitch symbol to each
separate syllable in the text (as the Greeks had done), and (2) to specify the intervals
between the lines. This implied a graphic and iconic communication of pitch move-
ments and melodic contours that was cognitively much more intuitive and efficient than
its alphabetic predecessors. However, the author did not take it for granted that his
medieval reader understood his bold abstraction right away. The author asks the reader
to think of the lines as ordered strings (to create associations to the order of strings on a
lyre or harp) and he further asks the reader, “Let these strings be in place of the sounds
the notae signify” (Atkinson 2009, 124). The idea of a quality of a sound (pitch) depicted
as a visual line in a staff had not yet become established imagination. Or we could say
that the idea of pitch had not yet been fully externalized from the actual sounding string
(at least for his readers). Nor had the sound been fully externalized from the syllable of
the sung word that would eventually be replaced with a dot (a note-head). It would take
more than a century before the scribes of music would pick up again this way of depict-
ing a pitch space as a grid of horizontal lines. In the meantime, a quite different system of
notation was developed: the neumes (Figure 10.4).
Neumes also depicted shifting pitches as vertical movements, in particular within
compounded neumes (ligatures). In that sense, they were more intuitive and efficient to
read than the alphabetic notation. But without definitive, horizontal lines (pitches) of
reference and explicit intervallic distance between the lines, it was often impossible to
know exactly how the pitch changed from one neume to the next. Each neume (or group
of neumes) was complex, gestural, dynamic, often with additional hints on duration.
Neumes contained hints about changes in direction but no map with coordinates of the
pitch space to help locate from where the changes of direction took place.
Figure 10.4 Musical notation (neumes) from circa 900 ce. (St. Gallen, Stiftsbibliothek, Cod. Sang. 359, p. 145—Cantatorium.)
Neumes
served as mnemonic support for singers who already knew the melodies, but were not
usable for learning melodies or as a cognitive tool to support the imagination of
advanced polyphony.
The concepts egocentric and allocentric are used to characterize two kinds of navigation
(Buzsaki and Moser 2013). To navigate in an unknown landscape without a map requires
egocentric navigation. Any place must be understood in relation to the navigator’s per-
sonal path through the landscape to this location. The experience of a specific location
becomes path-dependent. To navigate with a map is called allocentric navigation because
prior information about the landscape has been plotted onto a map or has been exter-
nalized in a way the navigator can consult independently of her own past itinerary.
The information about a location on a map is thus more path-independent and more
independent of the “ego” as the ultimate point of reference or “point-of-view.” In many
contexts, allocentric may be used as a synonym for externalized, and the process of
externalization could be thought of as “allocentrification.” The neumes in their inability
to depict precise intervals (scale steps) were less externalized from the oral tradition and
thus less allocentric than both the previous alphabetic notation and the subsequent staff
notation, which was first developed with just one line of reference (Figure 10.5), and
then with four, five, or six (Figure 10.6).
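The contrast can be caricatured in a few lines of code; the coordinates, names, and "map" below are invented purely to illustrate the distinction, not to model any real navigational system:

```python
# Egocentric: a location is only reachable by integrating one's own moves;
# the path itself is the representation.
def egocentric_position(moves, start=(0, 0)):
    x, y = start
    for dx, dy in moves:
        x, y = x + dx, y + dy
    return (x, y)

# Allocentric: the location is stored externally (a "map") and can be
# consulted independently of any particular itinerary.
LANDMARKS = {"camp": (0, 0), "spring": (3, 4)}

print(egocentric_position([(1, 0), (1, 2), (1, 2)]))  # (3, 4) -- path-dependent
print(LANDMARKS["spring"])                            # (3, 4) -- path-independent
```

In this caricature, the externalized map plays the role that staff notation plays for pitch: the information can be consulted without retracing the path, or the melody, that first produced it.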
Figure 10.5 Neumes on a single line red F-staff. Montecassino, Italy, 2nd half of 12th century. ("The Schøyen Collection MS 1681.")
The innovative idea of Guido's staff notation was to combine and develop several
previously established insights: (1) to map the neumes as standardized dots onto a
compressed version of the pitch lines established in Musica enchiriadis (using both lines and
spaces between lines as horizontal markers of pitch levels) and thereby creating more
compact visual gestalts of the melodic contours; and (2) to specify the intervals between
lines on the basis of octave symmetry, which meant that two note-heads seven ordinary
scale steps apart would always sound as a 2:1 octave. This was not the case in the
Enchiriadis version of the lines because of its consistent use of disjunct tetrachords. The
new staff became a powerful tool for the externalization of not only precise intervals but
also intervals’ contexts by making intervals visible and much more intuitively intelligible
than in any previous notation system. The letter-based pitch symbols had been able to
externalize exact intervals but in a more indirect way than the staff. Two intervals with
the same width, for example the fifths C–G and E–B, had in their letter-based versions
no visual similarities that would tell the reader that both were fifths. The letters only
functioned as ordinal numbers indicating the number of steps from a first scale degree
of a particular segment whether this segment was a tetrachord or an octave. On the
other hand, note-heads on a staff communicated pitch height dissociated from par-
ticular scale degrees. For example, a fifth (two note-heads three lines or three spaces apart)
was immediately visible and recognizable as this interval irrespective of its register or
scale degree. In this way, intervals could be transposed vertically up or down the staff
(register) while maintaining both their visual and sonic characteristics. Although some
adjustments might be needed for the location of semitones, in general, pitch patterns
became visually and cognitively “transposable” on a staff to a much higher degree than
in the letter-based or neume-based notation systems. Through vertical alignment of
several voices, it became easier to visually express and imagine how a particular pitch
was part of several intervals at the same time. This paved the way for more complex
polyphonic compositions. The staff was thus not only important for physical external-
ized notes on tablets and parchment. The staff had become a tool for visual-spatial
imagination of sonic relationships between concordant and discordant intervals, relation-
ships that were not easy to keep track of in a purely aural mode of conceiving music.
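A small sketch can make this visual transposability explicit. The numbering of staff positions and the helper names below are conveniences invented for the example, not a historical convention: every line and space counts as one step, and the generic interval is simply the positional difference plus one.

```python
# Diatonic letter names in staff order; the index difference between two
# note-heads equals their distance in lines and spaces on the staff.
LETTERS = ["C", "D", "E", "F", "G", "A", "B"]
NAMES = {1: "unison", 2: "second", 3: "third", 4: "fourth",
         5: "fifth", 6: "sixth", 7: "seventh", 8: "octave"}

def staff_position(letter, octave):
    """Count every diatonic step (each line or space) as one unit."""
    return octave * 7 + LETTERS.index(letter)

def generic_interval(note_a, note_b):
    """The generic (staff) interval is the positional difference plus one,
    irrespective of register: what note-heads on a staff show at a glance."""
    span = abs(staff_position(*note_b) - staff_position(*note_a)) + 1
    return NAMES.get(span, f"{span}th")

print(generic_interval(("C", 4), ("G", 4)))   # fifth
print(generic_interval(("E", 4), ("B", 4)))   # fifth -- same visual width, different degrees
print(generic_interval(("C", 4), ("C", 5)))   # octave -- seven scale steps apart
```

Because only the difference of positions matters, the same visual width always names the same generic interval, which is precisely what the letter-based notation could not show at a glance.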
The name of an early and important genre of polyphony was counterpoint. This term
alludes directly to the note-heads as points on the staff that would be organized in several
voices, point-against-point, in parallel, oblique, and countermovements. The art of
counterpoint increased in complexity in the coming centuries and, according to many,
peaked with the fugues of J. S. Bach.8 Although notation was crucial in this develop-
ment, the change from an oral to a written musical tradition did not represent a simple,
one-way transition in which the importance of interiorized, implicit oral models decreases
and that of externalized, explicit written models increases (Berger 2005). Berger
acknowledges Goody when she writes that literacy does not replace orality. Literacy also
creates conditions for a new kind of orality.
Certain genres of music departed more profoundly from their oral/aural origins than
others. Trevor Wishart reflects on the limited success of composers of serial (atonal)
music. He attributes its limited appeal to this music’s almost total reliance on notation
for its imagination; serial music was, according to Wishart, conceived with the eyes and
not with the sense of hearing (Wishart and Emmerson 1996). Music is a sonic art and
will ultimately be linked to hearing. But visual externalizations in the form of notation
and theories about the geometry of interval ratios have evidently influenced music. Whether
the quality of music can be "objectively" determined by means of mathematical
proportions, or whether music will always depend on idiosyncratic feelings and
cultural preferences, has been a contentious issue ever since Aristoxenos's critique of the
more mathematically fundamentalist Pythagoreans (Boethius 1989, chap. 5).
Any semiotic culture is, in a profound way, both real and imaginary. We could say that
culture is nothing but cemented, habituated, institutionalized, or externalized imagi-
nations that are used for further imaginations and externalizations. Music, art, humor,
and science are human activities where the creative tensions between institutional-
ized constraints and imagination are widely cultivated (Deacon 2006). These activities
are characterized by being at once more constrained and regularized, and more relaxed
(open to random or unconstrained events), than ordinary life.
Institutionalized externalizations are essential both for demolishing old complexity and for
building new complexity. We cannot predict what in the future will be specifically
imagined or institutionalized, but an externalization perspective on the past may tell us
something about the general shape of certain transitions in this evolution.
In the following, I will take a step back from the externalization of pitch that paved the
way for increased polyphonic complexity to see how this relates to the connection
between externalization and complexity more generally.
transition, because the transition is about the emergence of an ontologically new entity
(a new and higher-level individuality), and because the foundation for its emergence is
the synergy obtained through new levels of cooperation enabled by new tools for the
management of information and knowledge. Terrence Deacon’s related transition to
what he calls higher-level teleodynamics is also an inspiration for my understanding of
externalization; in particular, Deacon’s treatment of the relation between the dynamic
(processes) and the static (constraints) (Deacon 2012b).
Some short conceptual clarifications before attempting to produce a more explicit
definition of externalization: Knowledge: useful information or information with sig-
nificance for the organism (Deacon 2017); and Self: an organism’s most fundamental orga-
nizing principle and what defines its individuation (Deacon 2012b, 465–466). I may also
use self in this wide sense almost as a synonym for individual, which may include even
wider individualities or individual-like entities like an orchestra. The subjective self is
regarded as a more special mode of the wider, organismic self (Deacon 2012b).
In the context of historic developments of new tools of description (or more specifically
semiotic externalizations), I propose to define and explain the concept externalization
in the following way:
and cooperative action can become more or less strongly institutionalized or in other
ways fixated as habits or addictions (Hui and Deacon 2010). To the extent that this insti-
tution becomes self-regenerating, self-repairing, and in other ways protects itself from
dissolution (“death”), we may speak of the emergence of higher-order individuality or
an onto-synergistic transition. The members of this higher-order individuality may
share the benefits stemming from the social synergies. But they will in general also have
to pay a price in the form of giving up some of their autonomy and uniqueness for the
sake of the new and larger-scale individuality based on standardized differentiation and
separation of labor or combination of labor (Corning and Szathmáry 2015). A former,
organically grown, relational, and complex uniqueness is replaced with a new, higher-level
uniqueness based on the enhanced combinatorial (often permutational) properties of
the simpler, more standardized elements.
The musicians in a symphonic orchestra or singers in a choir who are not allowed to
improvise or do anything not indicated in the score could represent an example of such
higher-level complexity made up of lower-level, standardized elements that have given
up some of their autonomy to share the gains from a higher-level synergy. Also, in the
development of musical notation, we saw the early neumes, where a single symbol (a
ligature) depicted a cluster of notes together with their intervallic movements as well as
an egocentrically grounded vertical placement on the paper, give way to staff notation
where each pitch had a separate symbol (note-head) and where all intervals were allo-
centrically defined by exact vertical positions in a grid (the five-lined staff). Signs and
symbols are not alive and do not have to literally give up autonomy in the same sense as
living individuals. But there might nevertheless be some interesting similarities between
the processes leading to standardization and increased combinatorial properties of
elements in a semiotic system and musicians in an orchestra.
It is the external and relatively static quality—the quality of being dissociated from the
continuous stream of material and energetic consequences of the dynamic self (its egocen-
tricity)—that provides descriptions and information with a certain distance of virtuality
which in turn can become a support for creative imagination. Sounds are inher-
ently dynamic and ephemeral. Descriptions of sounds (or aspects of or patterns in
sound) on paper are more static and have, in that sense, a distance to the immediacy of
the present. For the externalization process to become more than a halt or temporary
postponing of the dynamics of the self (a kind of pause-button), it must also be possible
to copy and manipulate the externalized information (the patterns that contain possible
information) without presupposing or involving interpretation of these patterns. It must
be possible to manipulate the externalized patterns in their preinterpreted state. With a
metaphor from computer science we could express this idea as follows: It must be possible
to rewrite a program while it is not running. If it is possible to come back to or revisit the
static, “frozen,” externalized version of the sound patterns multiple times, to make copies,
change one copy and compare the variant with the original, then the conditions for
evolutionary processes are in place (heredity, mutation, variation, and selection). Such
evolutionary-like processes supported by notation will often be an important part of
cognition and imagination (Szathmáry and Fernando 2011).
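The conditions just listed (copying, changing one copy, and comparing the variant with the original, all without interpreting the patterns) can be demonstrated with nothing more than strings. The following is my own illustrative sketch, not the chapter's formalism; the syllable string and the mutation rule are arbitrary.

```python
# A minimal sketch of the "pause-button plus editing" idea: the externalized pattern is
# treated as inert data. Copying, mutating, and comparing require no interpretation of
# what the symbols mean musically; only a later "performance" would interpret them.
import random

original = "d r m f s f m r d"              # an arbitrary externalized pattern (hypothetical syllables)

tokens = original.split()                    # heredity: an exact, uninterpreted copy
i = random.randrange(len(tokens))
alternatives = [s for s in "drmfsl" if s != tokens[i]]
tokens[i] = random.choice(alternatives)      # mutation: change one symbol in the copy only
variant = " ".join(tokens)

# variation: original and variant now coexist and can be compared symbol by symbol
differences = [(a, b) for a, b in zip(original.split(), variant.split()) if a != b]
print(original)
print(variant)
print("changed:", differences)               # selection can then act on whichever version is preferred
```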
Summing Up
Throughout human history, new tools of description (new semiotic systems) have
catalyzed the emergence of larger-scale social and cultural institutions. The emergence
of language paved the way for the first human tribal groups based on culturally accumu-
lated skills and tools. The emergence of the first cities and empires (Mesopotamia) was
based on taxation, which presupposed records that in turn depended on the invention of writing
(including numerals) (Scott 2017). On a smaller scale, but in the same direction, musical
notation contributed to the emergence of large-scale symphony orchestras playing
music with a harmonic complexity unthinkable within a purely oral tradition. It was
notation’s quality of comprising externalized patterns of sound that enabled the syner-
gies of multiple musicians playing different parts reading from multiple, identical copies
of these patterns. Externalization also enabled new synergies between the two senses of
vision and hearing as a support for the composer’s combined aural and visual imagi-
nation of the complexities of multipart polyphony. These several changes promoted each
other in a dialectic or coevolutionary manner. The early tools for the description of
sounds were tailor-made for the diatonic tone system with its relatively limited number
of pitches available for singing, playing, and composing (imagining). But the notation’s
explicit (externalized) character also made it easier to explore this limited pitch space
to its edges from where it was possible to look further, toward new, “forbidden” or
“unimaginable” notes and interval patterns. In the Middle Ages, music that included
notes outside those accepted in the early notation systems was called musica ficta or musica
falsa as opposed to the music within the system which was called musica recta or
musica vera (“true” music) (Bent and Silbiger 2017). This lasted until the authoritative
theoreticians (guardians of the notation norms) had accepted an unlimited use of the
supplementary signs (accidentals, key signatures, and so forth). These amendments to
the notation system implied that the staff, originally tailor-made to describe the seven-step
diatonic tone system, could now describe a twelve-step, chromatic tone system. As the
enhanced notation system now supported the imagination of more audacious harmonic
modulations (in particular on instruments with fixed pitches like organs and harpsi-
chords), an old discrepancy between notation and sound became more acute. Not all
intervals that were visually identical on the staff (that spanned the same number of lines)
were acoustically identical as sounds. The result was that certain intervals that, on
paper, should be consonant sounded dissonant. This discrepancy was eliminated by a
fully homogenized tuning system. With the tuning system called twelve-tone equal
temperament, one obtained for the first time a full symmetry between notated intervals
and acoustic intervals. The synergy between the visual and the aural intervals in this way
became complete, and the number of combinatorial possibilities increased significantly.
Bach celebrated the path toward equal temperament with his famous collection Das
Wohltemperierte Klavier, which, for the first time, exploited all the possible keys and
modes of his time.
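The arithmetic behind this homogenization can be sketched briefly. The following is my own simplified illustration under standard acoustics, not a claim drawn from the chapter's sources: in twelve-tone equal temperament every semitone has the ratio 2^(1/12), so any span of seven semitones sounds the same wherever it occurs, whereas a chain of pure 3:2 fifths folded into one octave leaves one span of seven steps (the so-called wolf) that is audibly narrower than the rest.

```python
# A small, simplified sketch of why equal temperament aligns notated and sounding intervals.
# In 12-tone equal temperament every semitone is the ratio 2**(1/12), so any seven-semitone
# span sounds identical wherever it occurs. A chain of pure 3:2 fifths folded into one
# octave instead leaves one "wolf": the same seven-step span, but audibly narrower.

EQUAL_SEMITONE = 2 ** (1 / 12)

def equal_ratio(semitone_steps: int) -> float:
    return EQUAL_SEMITONE ** semitone_steps

# Twelve chromatic degrees built by stacking pure fifths and folding them into one octave.
pyth = sorted((3 / 2) ** k / 2 ** ((k * 7) // 12) for k in range(12))

def pyth_fifth(degree: int) -> float:
    """Ratio of the seven-step span starting on the given chromatic degree."""
    upper = pyth[(degree + 7) % 12] * (2 if degree + 7 >= 12 else 1)
    return upper / pyth[degree]

print(round(equal_ratio(7), 4))                       # 1.4983 for every equal-tempered fifth
print([round(pyth_fifth(d), 4) for d in range(12)])   # eleven spans of 1.5, one "wolf" near 1.4798
```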
neumes, as this notation contained information about the previous oral chant tradition,
information that was lost in later and more standardized editions of the chant books.
But the revivers’ quest for the oldest, most “authentic” manuscripts has made them
more obsessed with notation than ever before. The old neumes were treated as
“directly externalized authenticity,” which becomes somewhat of a paradox if the aim
is to revive qualities of a prenotational past (Bergeron 1998). Obviously, there exist no
recordings of prenotational Gregorian chant. In that respect, certain folk music tra-
ditions are in a better position. Recorded music from oral traditions does exist and repre-
sents a new kind of externalization which captures many details that escaped the
limited descriptive power of traditional notation. But a meticulous copying of a
recorded tune can never be the same as learning music in a traditional, small-scale,
oral setting. The increased power of recordings as externalizations might, in some
senses, even increase the distance to an oral tradition because the externalized template
for what is “authentic” becomes more totalizing with less room for a personal interpre-
tation than one based on a crude transcription with standard notation. It seems that
some aspects of a lower level of externalization simply cannot “survive” the descriptive
power of higher-level externalizations.
Today, thousands of people worldwide are at any one moment engaged in imagining
and describing innumerable physical, social, and cultural processes by means of
computer-based tools of description (not least for creating artificial intelligence and
virtual reality, including elaborate soundscapes). These tools (programming languages)
have a descriptive power far beyond anything the medieval and renaissance creators of
musical notation could have imagined. With the modern sound and music applications
of the digital age (Knakkergaard, this volume, chapter 6), the distinction between nota-
tion (descriptive tools) and music has to some extent been abolished. Whatever can be
formally described can automatically be played, and whatever can be played can auto-
matically be described. Not all aspects of life (or music) can be formally described, but
the proportion that can increases steadily. Humanity is undergoing multiple processes
of externalization contributing to a major transition in cultural evolution with conse-
quences comparable to those following the invention of writing or maybe even the
emergence of language. My contention is that we may get a better understanding of what
may be gained and lost in this transition by looking closely at what happened to music as
a result of what in hindsight looks like a comparatively innocent medieval improvement
of the Greek way of describing, imagining, and controlling musical sounds. The aim of
the concept and theory of externalization is not to make normative judgments on what
is “progress” or what is better or worse music. It is to show how externalization processes
are deeply transformative and that increased complexity at a large scale may be insepa-
rable from reduced complexity at lower levels. My ultimate goal with the concept of
externalization applied to the history of describing and imagining musical sounds is to
create a distance of reflection to both the historic processes and our current global
dynamics so that we are better able to imagine what might follow. The ultimate ambition
of the concept of externalization is thus to function as a good example of itself.
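As a final, deliberately trivial illustration of the observation above that whatever can be formally described can automatically be played, the following sketch (my own example; the note list and file name are arbitrary, and no particular application is implied) renders a symbolic description of notes directly into audio samples using only Python's standard library.

```python
# A deliberately minimal sketch: a symbolic description (note name, duration in seconds)
# is rendered straight to a playable WAV file. The score and file name are arbitrary examples.
import math, struct, wave

RATE = 44100
NOTE_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def freq(name: str, octave: int = 4) -> float:
    """Equal-tempered frequency with A4 = 440 Hz."""
    n = NOTE_SEMITONES[name] + 12 * (octave + 1)       # MIDI-style note number
    return 440.0 * 2 ** ((n - 69) / 12)

def render(score, path="sketch.wav"):
    frames = bytearray()
    for name, seconds in score:                        # interpret the description as sound
        f = freq(name)
        for i in range(int(RATE * seconds)):
            sample = int(12000 * math.sin(2 * math.pi * f * i / RATE))
            frames += struct.pack("<h", sample)        # 16-bit mono samples
    with wave.open(path, "w") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(RATE)
        out.writeframes(bytes(frames))

render([("C", 0.3), ("E", 0.3), ("G", 0.3), ("C", 0.6)])   # a described pattern, now audible
```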
Acknowledgments
The research leading to this chapter has received funding from the European Research Council
under EU’s Seventh Framework Programme (ERC Grant Agreement no. 295843), the
Research Council of Norway (SAMKUL project no. 246893/F10), and Department of Social
Anthropology, University of Oslo. Thanks to Hans M. Borchgrevink, Henning Kraggerud, Rob
Waring, Tellef Kvifte, Tim Ingold, Chris Hann, Viggo Vestel, Tord Larsen, Alix Hui, Ola Graff,
Maria Kartveit, Thomas Hylland Eriksen, Lars Henrik Johansen, Mark Grimshaw-Aagaard,
and Martin Knakkergaard for important feedback to the manuscript and contributions to the
writing process.
Notes
1. Performer Per Hætta, 1960, Track 24 on CD (1995) Norsk folkemusikk: 10: Folkemusikk frå
Nord-Noreg og Sameland. (Norwegian folk music: vol. 10: Folk music from Northern Norway
and Sameland). Grappa musikkforlag AS, Oslo. GRCD 4070.
2. Thanks to Hans M. Borchgrevink for this story.
3. An in-depth genealogy of the concept externalization falls outside the scope of this chapter,
but Leroi-Gourhan’s term “exteriorization” from 1964 is a precursor (see Ingold 1999) and
Hegel’s term “Entäußerung” from 1809, translated as “externalisation” in Rae (2012), may
be the first use with a related meaning to the one used in this chapter.
4. The two dimensions of pitch have been depicted in a combined way as a three-dimensional
spiral or helix ascending one octave per cycle (Deutsch 2013). But the ladder, circle, and
helix are simplifications of the complex and entangled relationship that exists between
pitch heights and pitch classes, involving physical, physiological, and cultural factors.
5. The terms “octave,” “fifth,” “fourth,” and so on, refer to the intervals one covers with
scale steps in the diatonic tone system counting the starting pitch of the scale as the first
scale degree.
6. Several other and more personal and cultural factors than vibrational interference patterns
may influence the judgment of consonance versus dissonance. Nonetheless: “Preference
for consonance over dissonance is observed in infants with little postnatal exposure to
culturally specific music . . . . Consonance and dissonance play crucial roles in music across
cultures: whereas dissonance is commonly associated with musical tension, consonance is
typically associated with relaxation and stability” (Thompson 2013, 108–109).
7. The anonymous author was for centuries thought to be Odo of Cluny (this is now proven
to be incorrect, and instead the author is often referred to as Pseudo-Odo) (Atkinson 2009).
8. A dynamic visualization of Bach’s notation-based polyphony can be watched on the online
video “Music+Math; Symmetry” provided by the Santa Fe Institute: http://tuvalu.santafe.
edu/projects/musicplusmath/index.php?id=35. Accessed November 10, 2017.
9. Small-scale music cultures are particularly affected, although ethnopolitical movements
may to some extent counteract these global trends. An example: In Norwegian folk song,
nonequalized scales that previously were nearly extinct are now regarded as a valuable cul-
tural trait and increasingly used among professional folk singers. However, these singers do not
represent a widespread music culture, and their mastery of the traditional scales mostly
resembles that of a second language, while their first tonal language remains in equal
temperament.
References
Atkinson, C. M. 2009. The Critical Nexus: Tone-System, Mode, and Notation in Early Medieval
Music. Oxford: Oxford University Press.
Bent, M., and A. Silbiger. 2017. Musica Ficta. Grove Music Online. Oxford Music Online. http://
www.oxfordmusiconline.com/subscriber/article/grove/music/19406. Accessed November
15, 2017.
Berger, A. M. B. 2005. Medieval Music and the Art of Memory. Berkeley: University of
California Press.
Bergeron, K. 1998. Decadent Enchantments: The Revival of Gregorian Chant at Solesmes.
Berkeley: University of California Press.
Boethius, A. M. S. 1989. Fundamentals of Music. Translated, with introduction and notes by
C. M. Bower. Edited by C. V. Palisca. New Haven, CT: Yale University Press.
Buzsáki, G., and E. I. Moser. 2013. Memory, Navigation and Theta Rhythm in the Hippocampal-
Entorhinal System. Nature Neuroscience 16 (2): 130–138.
Corning, P. A., and E. Szathmáry. 2015. “Synergistic Selection”: A Darwinian Frame for the
Evolution of Complexity. Journal of Theoretical Biology 371: 45–58.
Deacon, T. 2006. The Aesthetic Faculty. In The Artful Mind: Cognitive Science and the Riddle
of Human Creativity, edited by M. Turner, 21–53. Oxford: Oxford University Press.
Deacon, T. W. 2012a. Beyond the Symbolic Species. In The Symbolic Species Evolved, edited by
T. Schilhab, F. Stjernfelt, and T. Deacon, 9–38. Dordrecht, Netherlands: Springer.
Deacon, T. W. 2012b. Incomplete Nature: How Mind Emerged from Matter. New York:
W.W. Norton.
Deacon, T. W. 2017. Information and Reference. In Representation and Reality in Humans,
Other Living Organisms and Intelligent Machines, edited by G. Dodig-Crnkovic and
R. Giovagnoli, 3–15. Cham, Switzerland: Springer.
Deutsch, D. 2013. The Processing of Pitch Combinations. In The Psychology of Music, 3rd ed.,
edited by D. Deutsch, 249–325. San Diego: Elsevier.
Erickson, R., and C. V. Palisca. 1995. Musica enchiriadis and Scolica enchiriadis. New Haven,
CT: Yale University Press.
Freedman, P. 2011. The Early Middle Ages, 284–1000 (HIST 210). Open Yale Course Online
Lecture. https://oyc.yale.edu/history/hist-210. Accessed May 16, 2016.
Goodall, H. 2013. The Story of Music. London: Chatto & Windus.
Goody, J., and I. Watt. 1963. The Consequences of Literacy. Comparative Studies in Society and
History 5 (3): 304–345. doi:10.1017/S0010417500001730.
Graff, O. 2007. Om å forstå joikemelodier: Refleksjoner over et pitesamisk materiale. Svensk
Tidskrift för Musikforskning 89: 50–69.
Hansen, F. E. 2003. Tonesystem. In Gads musikleksikon: Sagdel, Vol. 2, edited by F. Gravesen
and M. Knakkergaard, 1641–1646. Copenhagen. Denmark: Gad.
Hui, J., and T. Deacon. 2010. The Evolution of Altruism via Social Addiction. In Social Brain,
Distributed Mind, edited by R. I. M. Dunbar, C. Gamble, and J. Gowlett, 177–198. Oxford
and New York: Oxford University Press.
Huron, D. 2008. Lost in Music. Nature 453 (7194): 456.
Ingold, T. 1999. “Tools for the Hand, Language for the Face”: An Appreciation of Leroi-Gourhan’s
Gesture and Speech. Studies in History and Philosophy of Biological and Biomedical Sciences
30 (4): 411–453.
Levy, K. 1998. Gregorian Chant and the Carolingians. Princeton, NJ: Princeton University Press.
Maynard Smith, J., and E. Szathmáry. 1995. The Major Transitions in Evolution. Oxford:
Freeman Spektrum.
Odling-Smee, J. 2010. Niche Inheritance. In Evolution: The Extended Synthesis, edited by
M. Pigliucci and G. B. Müller, 175–207. Cambridge, MA: MIT Press.
Ong, W. J., and J. Hartley. 2012. Orality and Literacy: The Technologizing of the Word. 3rd ed.
London and New York: Routledge.
Rae, G. 2012. Hegel, Alienation, and the Phenomenological Development of Consciousness.
International Journal of Philosophical Studies 20 (1): 23–42.
Saulnier, D. 2009. Gregorian Chant: A Guide to the History and Liturgy. Orleans, MA:
Paraclete Press.
Scott, J. C. 2017. Against the Grain: A Deep History of the Earliest States. New Haven, CT: Yale
University Press.
Sinding-Larsen, H. 1983. Fra fest til forestilling: Et antropologisk perspektiv på norsk folkemusikk
og dans gjennom skiftende materielle, sosiale og ideologiske betingelser fra nasjonalromantik-
ken og fram til i dag. Magister Artium dissertation. University of Oslo.
Sinding-Larsen, H. 1987. Information Technology and the Management of Knowledge. AI &
Society: The Journal of Human-Centred Systems and Machine Intelligence 1 (2): 93–101.
Sinding-Larsen, H. 1991. Computers, Musical Notation and the Externalization of Knowledge:
Towards a Comparative Study in the History of Information Technology. In Understanding
the Artificial: On the Future Shape of Artificial Intelligence, edited by M. Negrotti, 101–125.
London: Springer.
Sinding-Larsen, H. 2008. Externality and Materiality as Themes in the History of the Human
Sciences. Fractal: Revista de Psicologia 20 (1): 9–17.
Szathmáry, E., and C. Fernando. 2011. Concluding Remarks. In The Major Transitions in
Evolution Revisited, edited by B. Calcott and K. Sterelny, 301–310. Cambridge, MA: MIT Press.
Thompson, W. F. 2013. Intervals and Scales. In The Psychology of Music, 3rd ed., edited by
D. Deutsch, 107–140. San Diego: Elsevier.
Wishart, T., and S. Emmerson. 1996. On Sonic Art. Contemporary Music Studies,
12. Amsterdam: Harwood.
Chapter 11
“. . . they call us by our name . . .”
Technology, Memory, and Metempsychosis
Bennett Hogg
Introduction
In the following chapter I shall be proposing that living, as we do, in a world where
sound recordings are a major element in our sonic ecosystems, we cannot think about
sound without considering the ways in which recording technologies affect and inform
our experiences of listening more widely. That perception and memory have been
invoked to account for sound recording is widely noted (perception and memory forming
a structurally congruent pair to recording and playback). However, in line with phe-
nomenological positions developed since Husserl, we cannot discount imagination
when we talk about perception and memory. This problematizes the assumed congruency
between sound recording and memory, there being no equivalence of imagination
immanent to the medium of sound recording itself (as opposed to imaginative ways artists
might use sound recording). After examining several problematic mappings of sound
recording and memory I shall be proposing the animistic doctrine of metempsychosis,
or the transmigration of souls, as a more suitable model of sound recording than the
more obvious and culturally embedded one of memory.
Recordings of all kinds, from the written word to the digital photograph, have, for
millennia, held associations with memory. Since at least Plato’s concerns that writing, as
one form of recording, undermines human memory, to the relatively recent use of the
term “memory” to refer to storage of information on computer hard drives, recording
and memory have gone hand in hand. The etymology of the word “recording” refers, of
course, directly to remembering—from the Latin recordare—though it is worth pointing
out at the outset that to record and to remember are not always the same thing, though
they are often part of the same, larger process. Freud, in 1930, proposed the gramophone
as a prosthesis of memory, along with the camera as one of the “materializations” of the
“innate faculty of recall” ([1930] 2004, 35). Sound recording has been widely—and on
the whole unproblematically—figured as a metaphor of memory, as has the photograph.
In parallel to this, memory has been conceived in terms of sound recording—or other
forms of inscription (Freud [1924] 1961; Draaisma 2000; Terdiman 1993)—such that it is
not always possible to determine exactly which is the metaphor and which the original
object; indeed, as with so many phenomena, it is difficult to say which aspect is originary
and which consequent: the conundrum of the chicken and the egg.
Metaphors do have a tendency to acquire power over their referents, though (see also
Walther-Hansen, volume 1, chapter 23, for more on metaphor and recording technology).
Even as they illuminate those aspects of a phenomenon which they resonate with, their
apparent efficacy can dazzle us and cast into shade those aspects of a phenomenon that
are not accounted for in the work the metaphor does. We should not, therefore, take the
prosthesis of memory to be the same thing as memory—a prosthesis may extend human
capacities to remember, or compensate for the failures of memory (Armstrong 1998, 78),
but its prosthetic activity operates only as one among many different elements that
go together to make up memory tout court. Memory, even as a metaphor, is less like a
recording than we might at first think, complicated as it is by being a malleable element
within a greater ecosystem of embodied consciousness. Recordings behave in ways very
unlike memory, for similar reasons, being enmeshed in their own cultural, material, and
creative ecosystems which, though having significant overlaps with ideas of memory
because of their mediating role in human culture, are also in some respects quite radi-
cally separate from, and not well accounted for by recourse to, models drawn directly
from human memory. In particular, recent thinking has credibly challenged formerly
held ideas about the ways in which perception, memory, and imagination work together.
A traditional linear model, in which perception sends images into memory which serves
as a resource for imagination to draw on, is perhaps too strictly causal to account for the
complex procedures involved in many acts of imagination; doodling absentmindedly
on a piece of paper, for example, and then realizing that one has drawn a monster.
Though distinct and dissimilar, memory, perception, and imagination
cannot, in terms of the ways in which they interact and mutually inform one another, be
separated, as Bergson asserted early in the twentieth century. In this
chapter, I have used the constellation of memory and imagination as mutually critical
tools to destabilize a received wisdom that understands recording and memory as being
adequately similar to one another. The key problem with the admittedly persuasive idea
that memory and recording can stand as productive metaphors of one another is, in a
nutshell, that recordings can stand on their own, and the signals they contain can remain
more or less unchanged,1 can persist as entities, whereas human memories, through
their positioning inside of a psychic ecosystem of consciousness, action, and agency in
which forgetting, imagination, and supposition render them much less fixed, are much
less discretely organized with respect to one another. Imagination, then, is my principal
critical tool for prizing apart the connection of memory and recording, and at the same
time a phenomenologically informed understanding of imagination affords a process
Once we start looking into memory and imagination it soon becomes clear that we are
dealing with multiplicities rather than directly definable, unitary phenomena, each of
which have convoluted histories, and which, depending on the philosophical approach
grounding them, continue to carry aspects of these histories into the present. Sometimes
such aspects seem to cross over to, or to repopulate, sociocultural phenomena in the
present day. Casey, for example, notes how memory has been “frequently confined to
a passively reproductive function of low epistemic status” (1977, 187) in Western phi-
losophy since at least Aristotle. The notion that memory is passive and merely reproductive
places it in a subordinate position within consciousness in relation to sense perception,
which, in Hume, Kant, and to an extent Merleau-Ponty, is seen as primary. Insofar as
sound recording has been associated with memory, this subordination of it with respect
to perception finds echoes in Adorno’s dismissive words on sound recording as “not
good for much more than reproducing and storing a music deprived of its best dimen-
sion, a music, namely, that was already in existence before the phonograph record and is
not significantly altered by it” ([1934] 2002, 278). Even if, as Thomas Levin notes, a dis-
trust of the mimetic in Adorno may be “to some degree a function of the Jewish taboo on
representation so central to Adorno’s aesthetic” (Levin 1990, 25), it nevertheless fits in
with a more widely distributed distrust of “mere” copies or images, also manifest in vari-
ous iconoclastic moments such as Puritanism’s destruction of religious paintings and
statuary in England in the sixteenth and seventeenth centuries, and antagonized by the
likes of Warhol and Lichtenstein bringing techniques and ideologies of the mechanical
copy into the mainstream art world in the 1960s.
But such a conflation of memory with the reproduction of copies misrepresents how
memory operates—a misrepresentation that predates the inception of sound recording
and which therefore demonstrates how already existent cultural and philosophical
values colonize, as models of thinking, emergent technologies. Clearly, the phonograph
was conceived as an extension of memory almost at the moment of its inception.
Johnson reports that “at any future point in history” the recorded voice can be recalled
(Johnson 1877), but the relatively “low epistemic status” that has historically accrued to
memory is later compounded in the case of the phonograph through the central role
sound recordings come to play (not really envisaged by its inventor in the early days) in
mass culture—Adorno and Horkheimer’s culture industry—whose financial successes
come at the price of a failure (if it is indeed a “failure”) to attain high cultural value as
“Art.” Financial success and artistic failure are both, of course, factors in the reproducibility
Toward the end of his life, Blake would write, “Imagination has nothing
to do with memory,” in the margins to a collection of Wordsworth’s poems, identifying,
in terms current among the nascent Romantic movement, imagination as “the Divine
Vision” that is only vouchsafed to the “Spiritual” rather than “the Natural Man” (Blake
[1927] 1975, 822). In seeking to promote the notion of a creative and spiritually inspired
imagination, Blake repeats the compensatory denigration of another psychic element—
memory. As noted, there has been a tendency to see memory in terms of the storage and
retrieval of “images”—of copies. Casey notes that Western philosophy’s conceptions of
memory—and, indeed, thought more generally—have, for centuries, been colored
and informed by the model of representation (Casey 1977, 187; 1993, 166–167). The
representational model for understanding memory has, for Casey, “been given a privi-
leged place in thinking about memory überhaupt.” A representational model leads to
an understanding of memory as “reproductive,” and as a result “we witness a working
presumption that all significant human remembering . . . is at once representational
and founded on isomorphic relations between the representing content of what we
remember and the represented thing or event we are recalling” (Casey 1993, 166).
Dreyfuss notes how a similar ideological frame determines how human actions have,
until recently, been understood, but proposes, in contrast to this, a phenomenological
approach that resists the idea that representation is a prerequisite for action. “When every-
day coping is going well one experiences something like what athletes call flow . . . One’s
activity is completely geared into the demands of the situation” (Dreyfuss 1996, 35).
Such “skillful coping does not require a mental representation of its goal” but rather,
quoting Merleau-Ponty, “[a] movement is learned when the body has understood
it, that is, when it has incorporated it into its ‘world,’ and to move one’s body is to
aim at things through it; it is to allow oneself to respond to their call, which is made
upon it independently of any representation” (Merleau-Ponty 1962, 139, quoted in
Dreyfuss 1996, 37, emphasis added).
I do not check out an inner image, or other representation, of my friend: his face and
body give themselves out as already (and instantly) recognizable to me, as featuring
familiarity on their very sleeve, as it were. Here what is remembered, far from being
continued in intrapsychic space, suffuses what I perceive as I perceive Burton [the
friend]; and in this natural context Bergson is right to say that “perception is full of
memories.” (1993, 167–168)
The implied dialogism (or more accurately polylogism) of recognition is differently pre-
sented by Varela and colleagues, yet the refusal of the “stored object” model of memory
and recognition remains a strong trope. Visual sensory data from the eye is met by
activity that flows out from the cortex. The encounter of these two ensembles of
neuronal activity is one moment in the emergence of a new coherent configuration,
depending on a sort of resonance or active match-mismatch between the sensory
activity and the internal setting at the primary cortex. The primary visual cortex is,
however, but one of the partners in this particular neuronal local circuit at the LGN
level. . . . Thus the behaviour of the whole system resembles a cocktail party conver-
sation much more than a chain of command. (Varela et al. 1993, 96)
Husserl underwrites the distinction between memory and imagination by claiming that
memory retains while imagination protends, but some slight self-reflection will show
this to be too baldly schematic. Casey argues, rather, that memory and imagination,
while being distinct mental phenomena in their own rights, are “indispensable”
to one another, marked by their “mutual inclusiveness and co-iterability” and “their inbuilt
co-operativeness” (Casey 1977, 194–195). Memory and imagination are not just “difficult
to disentangle” from one another, “each act is indispensable in its collaboration with the
other . . . not just essential but co-essential, essential in its very co-ordination with
the other” (Casey 1977, 196).3 For Harpur, this interconnectedness of imagination and
memory is also highly significant. Memory does not simply keep records of past events
like files but:
mixes them up with fantasies and imagined events . . . It even makes things up
altogether, like imagination, and points to the fact that Mnemosyne (Memory) is
the mother of the Greek Muses and infers from this “that memory is pregnant with
imaginative power.” (2002, 215–216)
Harpur’s resistance to thinking of memory in terms of file storage is congruent with the
view of contemporary cognitive science. Memories “are not stored intact in the brain
like computer files on a hard disk” but are built up from different elements in a process
that is also “open to the world,” in other words, a process that incorporates environmental
and social elements, adjusting and reconstructing memories in the light of the con-
temporary situation and conditions (Auyang 2000, 283). Sanders underlines this, noting
how beliefs, ideas, and memories are not only in brains but also out in the world, to the
extent that we put traces into the world, “which changes what [we] will be confronted
with the next time it comes around,” so that our memories are not only carried around
inside of us but are inscribed, as it were, in our worlds (Sanders 1996, paragraph 36).
This brings us to an ecological sense of memory—and of imagination—and though
J. J. Gibson himself was cautious about “the muddle of memory,” excluding recognition
from his understanding of perception, Auyang proposes that an understanding of memory
in terms of an ecosystem of thought is a viable project (Auyang 2000, 300–301). It should
now be clear that the ideas of Varela and colleagues, as well as Casey, Dreyfuss, and
Sanders reported earlier, are broadly compatible with understanding memory and
imagination as participating elements within a greater ecosystem of consciousness that
includes intellection, embodiment, action, agency, and sociocultural relations.
Metempsychosis
that I myself was the immediate subject of my book . . . This impression would persist
for some moments after I awoke. . . . Then it would begin to seem unintelligible, as the
thoughts of a former existence must be to a reincarnate spirit. (Proust [1913] 1985, 3)
“My book” here refers directly to the one the young Proust was reading as he fell asleep,
and though Proust does not claim that what happens when he drifts off to sleep while
reading is directly mnemic, his placing of this figure at the very beginning of a book in
which he and his memories are the immediate subject invites a double reading of what
“the immediate subject of my book” intends. In Proust’s conflation of memory and reincar-
nation, memory is positioned less like a process of recording and more like the expe-
rience of being otherwise re-embodied. Somewhat later in the first chapter Proust writes:
there is much to be said for the Celtic belief that the souls of those whom we have
lost are held captive in some inferior being, in an animal, in a plant, in some inanimate
object, and thus effectively lost to us until the day (which to many never comes)
when we happen to pass by the tree or to obtain possession of the object which
forms their prison. Then they start and tremble, and they call us by our name, and
as soon as we have recognized their voice the spell is broken. Delivered by us, they
have overcome death and return to share our life. (47)
This immediately precedes the famous incident of the madeleine, in which voluntary
memory, “the memory of the intellect” that “preserve[s] nothing of the past itself,” is
presented as inferior to the so-called mémoire involontaire that spontaneously and unex-
pectedly takes over the whole being, triggered not by a volitional intention but by an
encounter with an object charged with memory. The profound sense of joy that results
from the tea-soaked madeleine is figured in terms of an invasion by “something isolated,
detached, with no suggestion of its origin.” If the book is all memory, and returning from
the book (as it were) is to be reincarnated, it follows that Proust intuits a strong connec-
tion between memory and the transmigration of souls in his novel.
The persistence of a soul, or a disembodied personality or intelligence, after bodily
death is congruent with the idea of metempsychosis—the transmigration of a soul from
one embodiment to another. Sound and music are full of such references: the per-
sistence of the voice of Echo, or the autotransformation of the nymph Syrinx into a Pan-
pipe in Ovid’s Metamorphoses; the voice of the murdered younger sister in the Scottish
traditional ballad The Twa Sisters or, as it is sometimes known, Binnorie, or its Germanic
equivalent set down in Grimm’s Household Tales and set to music by Mahler as Das
klagende Lied, which emerges from the playing of an instrument made from her mortal
remains—a harp or fiddle strung with her hair or a flute made from one of her bones;
spirit mediums speaking with the voices of the dead they claim to be channeling, spirits
of the departed temporarily taking over the physical bodies of the medium to speak
through them; there is even a hint of the voice qua soul or personality as a vehicle of
transmigration in the behavior of the Cheshire Cat in Disney’s 1951 version of Alice in
Wonderland, who appears as a disembodied voice, and persists momentarily as a voice
and a fading smile (the detached outlet of a voice) after its body has dissolved. After the
telematic technologies of phonography and telephony made their appearance around
1876–1878, one immediate set of cultural constellations in which they were quickly
bound up involved the survival of death, the supernatural, and a means to communicate
with the dead. The conflation of voices disembodied by recording, telephony, or radio, with
the voices of the dead has been a clearly identifiable trope in fantasy literature and
popular culture almost since the moment of these technologies’ inventions (Connor
2000; Kittler 1999; Sconce 2000; Weiss 2002; Hogg 2008), “a historically mediated
imaginary . . . in which death is part of a cluster of ideas that gather around the image of
technology” (Danius 2002, 181; see also Peters 1999, 137–176). This is also another clear
instance of how emergent technologies are often colonized by ideas that predate their
invention and that was mentioned earlier in this chapter.
Memory, or more properly remembrance, is intimately linked with cultural practices
around the death of someone, and so it is perhaps not surprising that technologies that
can record moments in time—such as the movie camera, photography, sound recording—
should step in as extensions of the mnemic capacity. But in the case of sound recording,
this mnemic capacity is articulated through an apparent reappearance of the living
presence of the departed. In gruesomely vivid terms, the narrator of Renard’s Death and
the Shell (1907) evokes the image of departed friends seemingly brought back to life by
listening to recordings of their voices on a phonograph:
[O]n Wednesday the dead spoke to us. . . . How terrible it is to hear this copper throat
and its sounds from beyond the grave ! . . . it is the voice itself, the living voice, still
alive among carrion, skeletons, nothingness.
(quoted in Kittler 1999, 53, emphasis added)
The voice then, as cipher of the soul, passes between the worlds of the dead and of the
living through the mediumship of sound recording. Though memorial in its tone, this
seems more like a visitation or a ghostly encounter than a memory.
On the subject of ghosts, telegraphy served as a model for spirit communication from
the time of the so-called Rochester Rappings of 1848. Here, two young sisters claimed to
be able to communicate with spirits who knocked once for yes and twice for no in answer
to questions put to them verbally. From this grew a spiritualist movement sometimes
characterized as “the spirit telegraph” (Sconce 2000, 21–28), “such fantastic visions of
electronic telecommunications demonstrat[ing] that the cultural conception of a tech-
nology is often as important and influential as the technology itself ” (Sconce 2000, 27).
From this it is interesting to speculate on what we might think of as phonography’s
ghost, the technology that phonography left behind, as it were, the technology that
Edison was intentionally working on when he chanced upon phonography: the auto-
matic telegraph repeater.
The distances over which Morse telegraphy was possible were originally limited by
the resistance of the telegraph wires, which led to a degradation of signal to the point
that, in order to transmit across the massive distances of the United States, repeater
stations were needed in which a clerk would transcribe an incoming signal and then
retransmit it manually to the next repeater station, and so on across the whole country.
Not only did this take time and manpower, but it also meant that errors of reception and
transmission, both technological and human, could creep into the system. Edison was
working on a system whereby an incoming Morse signal would cut dots and dashes into
a moving paper strip which would then pass through a mechanical reader. This reader
would register, by means of a moving needle, the sequence of short and long signals,
transmitting them onward almost instantaneously and, in theory, as absolutely exact
copies. It was in experimenting with the increase in speed at which accurate relays were
possible that Edison is said to have chanced upon the idea of recording sound—more
particularly the human voice (Kittler 1999, 27–28; Wile 1977, 10–13).
The original situation in which telegraph messages could be sent over long distances
was that a human being would receive and transcribe the incoming message, which they
would then retransmit in a second act of “writing.” The technology Edison was working
on, though, moved from an imaginary of transcribing human bodies toward an imagi-
nary in which information passes smoothly and automatically across great distances
without passing through the body of other humans. Rather than a series of dictations
and reinscriptions—each of them conceived, according to the traditional imaginaries of
writing, as records—a disembodied energy passes from its origin to its destination with-
out seeming to be recorded at all (though in Edison’s repeater it was, in fact, recorded,
but not in writing by a human hand). To then seize on these transitional moments (the
telegraph repeaters) in the relay of information and isolate them as a recording technology
is, in some senses, to turn a means of transmission into a means of recording; records
that were originally only made for the purposes of resending onward become things in
themselves. Recording, then, according to this alternative genealogy, is the capture of an
energy during a moment of its transmigration. If the story ended there we would have
little more than an amusing observation, but much of Edison’s work was conducted in a
milieu of spiritualist research and a grasping at electromagnetic explanations of avow-
edly psychic and supernatural phenomena, not with the aim of debunking myths and
superstitions, but of arriving at scientific justification for such beliefs (Kahn 1994, 76–78;
also Connor 2000, 362–393). As Connor puts it, “The commerce between the disembodied
and the re-embodied, the phantasmal and the mechanical, is a feature in particular of
the scientific understanding of the voice, but it [is] apparent too in the languages and
experiences of the Victorian supernatural, which coil so closely together with that work
of scientific imagining and understanding” (Connor 2000, 363).
And had the phonograph not made speech, “as it were, immortal” (Johnson 1877)?
Like the soul?
When we listen to an audio recording, is it really like remembering, though? Although
the type of recording affects how we experience it—the voice of a departed loved one, a
string quartet by Bartók, and the undefined distant rumble of traffic produce very dif-
ferent experiences—one general distinction between listening to a recording and
remembering is that it is not necessary to re-experience something in the time it would
take to happen in order to remember it. I can remember my wedding, for example, in an
instant, whereas it occupied the best part of a whole day. I remember hearing Mahler’s
Seventh Symphony in Newcastle City Hall similarly, as though it were, in mental time, a
moment. Memory seems to compress and codify experiences, at some level. And though
the etymology of phonography is concerned with the writing of sound, playing back a
recording is nothing like reading, even if “reading” is a viable metaphor for what the
machine is doing. Listening to a sound recording can seem like something altogether
more intersubjective, and though it can evoke memories it feels more like an encounter
with a sounding presence than recall per se. We see this in the “terrible” experience that
Renard’s narrator has with “the voice itself, the living voice, still alive among carrion,
skeletons, nothingness” (Kittler 1999, 53).
Rather as Casey conceives of imagination and memory as “not just essential but
co-essential, essential in [their] very co-ordination with [each] other,” Harpur finds Proust’s
mémoire involontaire—the memory that seems to surge into consciousness unbidden,
triggered by a chance encounter such as the madeleine dipped in tea—“analogous to
imagination . . . the relationship between recollection and imagination is so richly inter-
fused that it is as difficult to separate them as it would be to separate, in Proust’s novel,
autobiography and art” (Harpur 2002, 212). Given that sound recording has been tra-
ditionally associated with recollection yet, as we have seen, the match is very far from per-
fect, it is useful to look briefly at two other technologies that occupy important positions
in Proust’s elaboration of his remembering of times past. That the something that surges
up as involuntary memory is “detached” and “isolated” finds a resonance in Proust’s
experiences with his beloved grandmother, the hearing of her voice on the telephone
isolated from seeing her face, and his view of her some short time later before she sees
him and is able to return his gaze. In the latter instance, it is Freud’s other prosthesis of
memory, the camera, that is invoked. The human eye, “marked by affection
and tenderness . . . necessarily refracted by preconceptions,” “prevents the beholder
from seeing the traces of time in the face of a loved one. . . . Memory thus prevents truth
from coming forward” (Danius 2002, 15). The camera eye, though, invoked by Proust to
account for the shock of seeing the “red-faced, heavy and vulgar, sick, vacant . . . dejected
old woman whom I did not know” (Proust quoted Danius 2002, 15), “carries no thoughts
and no memories, nor is it burdened by a history of assumptions. For this reason, the
camera eye is a relentless purveyor of truth” (Danius 2002, 15). Though the horror dis-
sipates as soon as eye contact is made, the moment experienced “hint[s] at her impending
death.” Here, though, it is not the photographic record that is deathly (as in Barthes’s
Camera Lucida, with its “anterior future” case of “he is dead, and he is going to die”
[Barthes 2000, 96]), but the technological gaze of the camera. Memory, rather than a
dead record, in fact moderates vision and hearing, humanizes and warms it. Beckett
takes the same episodes and, like Danius, brings out the ways in which memory mediates
rather than models recording. He writes, “the laws of memory are subject to the more
general laws of habit” (Beckett [1931] 1999, 18–19), and it is habit, when he visits his
grandmother after the experience of their telephone conversation has so unsettled him,
which is “in abeyance, the habit of his tenderness for his grandmother . . . the notion of
what he should see has not had time to interfere its prism between the eye and its object.
His eye functions with the cruel precision of a camera; it photographs the reality of his
grandmother” (27–28).
The telephone, though, is a more productive technology to examine in terms of
memory, imagination, and the transmigration of souls. As already noted, the telegraph
had required the intermediation of human bodies to transmit over large distances
whereas, as Connor puts it, the telephone “allowed for intimate communication between
two interlocutors alone” (Connor 2000, 362). In its near immediacy of transmission,
and its avoidance of writing and reinscription/retransmission, the telephone fails as a
technological prosthesis of memory, or as a metaphor for memory. Connor, though,
notes how the “striking co-incidence in time” of the discovery of the two inventions
(within a year of one another) “allows us to see the two inventions as different forms of,
or relays in, some single, but polymorphous prosthetic apparatus” (362). When we
consider Connor’s suggestion in the light of the telegraph as not only a technological
forebear of phonography, and the telematic communication system preceding
telephony, but also as a metaphor for communication with the souls of the dead (the
Spirit Telegraph), this “polymorphous prosthetic apparatus” allows for an imaginary in
which memory and metempsychosis fold over one another, but where at least from the
technological perspective metempsychosis, as the motion of an energy through differ-
ent embodiments in media, seems a more plausible model of what is happening with
sound recording technologies. I do not believe that, in phenomenological terms, our
experience of sound recordings is like the retrieval of files from a storage medium, and
neither is our experience of memory.
It is worth noting that memory is nothing like a unitary phenomenon but is, instead, a
whole range of sometimes independent, sometimes interconnected cognitive and
embodied processes (Casey 1993, 165–169). This is one reason why it does not really work
as a metaphor of recording, though recording does seem—on the surface of things—to
work as a metaphor of remembering. Recording sounds and playing them back seems
like a mnemic process, but our experience of listening to such recordings is very different,
as already noted. As Peters has described it, there is a cultural, viable dimension to the
recorded voice that comes over as in a sense “oracular,” a direct transmission from a
being whose consciousness is elsewhere and other to our own, and with whom there is
no possible dialogue. Though to be in the presence of a recorded voice is in many
respects to experience another subjectivity, it is not in any real sense an intersubjectivity
of participants with equivalent status. We are told things by the recorded voice, but there
is no sense of any interlocution. This gives rise, in Peters’s account, to voices whose con-
tents are available for hermeneutic readings, but not for dialogic interrogation. As such,
the voices of the dead—which we encounter most conventionally of course through
sound recordings—are “the paradigm case of hermeneutics: the art of interpretation
where no return message can be received” (Peters 1999, 149).
Recordings
Sounds, though, are recorded. If we experience them as oracles, voices of the dead,
idealized memories, writings, they nevertheless remain as records (rather than the more
complicated memories). The content of a memory does not remain constant in itself,
but is shaped, molded, even materially changed as it combines with other associations,
joins forces with other bits of information in a story that may well conflate different
events without our even realizing that this has happened. In Freud’s theory of screen memo-
ries, for example, modifications to memory are repressed as modifications and “remem-
bered” as having actually occurred; such transitive and unreliable qualities of human
memory have already been extensively noted. Memory-content, then, has no independent
existence, and no permanence; it is as much an effect of the ecosystem of which it is a
part as it is an agent that has an effect on that ecosystem. Sound recordings, though,
seem self-sufficient and, as Edison himself put it, are “as it were, immortal” (Johnson 1877).
We know this, though, not to be true.
In the first case, recordings physically deteriorate; they are not immortal in any reliably
ontological sense. In this respect, they seem once more to resemble human memory.
Many of us will have known friends or relatives with Alzheimer’s disease, or similar
degenerative neurological conditions, and the loss of memory for these patients is often
likened to a kind of (premature) death; someone is said to have “left” long before they
actually died, for example. Listening to a very decayed phonograph recording—such as
the one supposed to be of Brahms playing the piano—seems to suggest the fogged and
abrasive feeling of not remembering. In the final moments of Ibsen’s play Ghosts, for
example, Osvald is paralyzed in the final stages of syphilis, and rapidly collapsing
into dementia. His mother asks him if there is anything he wants, and he asks for “the
Sun,” which has just risen in a bright dawn after days and days of rain and darkness.
When his mother questions this, he repeats, “the Sun,” and then, like a cracked record,
repeats again “the Sun . . . the Sun” without any change of expression. Ibsen directs that
Osvald “repeats dully and tonelessly,” and then “tonelessly as before.” In contrast, his
mother—still human, not reduced to a broken machine—is given extensive and detailed
indications of the gamut of emotion that should be expressed: she “trembles with fear”;
“throws herself on her knees”; “tears her hair with both hands”; “whispers as though
numbed”; “shrinks a few steps backwards and screams”; “stares at him in horror” (Ibsen
[1881] 1973, 97–98). Juxtaposed against one another, then, are Osvald’s deathly, machine-
like voice uttering a single sound like a broken record, and the terrible range of violently
shifting, painfully human emotions specified by Ibsen and performed for the audience
by Osvald’s mother.
Ghosts was written in 1881, only four years after the invention of the phonograph, and
there is no evidence that Ibsen conceived of Osvald’s expressionless repetitions because
he had ever heard a broken phonograph cylinder playing. However, the fact that Osvald’s
loss of his personality is performed through a mechanical and expressionless cycle of
repetitions does show a strong congruency in the cultural imagination between a dehu-
manized body and the sound of a broken machine, particularly one designed to sustain
beyond the immediate present the human voice, that atavistic marker of presence and
soul. The “talking machine” was a sensation in the late nineteenth century because it
managed to do what no machine prior to it had been able to do4—it spoke. The distorted,
identical repetitions of the cracked record, though, present nothing but a machine to the
listener, and the suspension of disbelief that sustains the persistence of a human pres-
ence is broken.
Understandably, perhaps, there is a sense of pathos around a recording that has
degenerated. There is a constellation of anthropomorphization that encompasses loss of
memory, a sense of dehumanization of the broken voice, and the transience of all things
human. At the same time, the possibility opens up for a further anthropomorphization of the fading signal because the recorded speech, like memory, is not immortal,
whatever Edison said about it in 1877. It is worth noting, though, that loss of conscious
memory is only “tragic” when it is framed as such; forgetting is, in many respects, an
essential condition of being able to function in the world without collapsing beneath the
onslaught of multitudinous and mostly distracting or irrelevant connections and asso-
ciations between disparate pieces of information.
But human memory does not only fade through a process of physical damage, or
through the natural entropy of biological matter. Imagination fills in gaps, creates mem-
ories that never happened, forms connections that never pertained in “the real world.”
As I pointed out at the very start of this chapter, imagination is a phenomenon that
shines a critical light on the too easy association of recording and memory and brings a
subtle pressure to bear to explore other kinds of relations. For instance, we tend to think
of analog recordings as more or less inert material onto which a signal is recorded. We
experience the playback of a recording as a separation of the signal from the medium, in
part because we recognize the sound as a voice or an instrument or a steam train which
exists, or existed, outside of the recording. Additionally, we experience the sounds as
being physically separate from the medium as they emerge from loudspeakers, not the
disc or tape itself.5 All of this serves to reinforce the notion that there is the material
medium on the one hand, and the signal on the other, the medium serving only as a
partial and temporary means for the disembodied energy of the sound “itself ” to be
transmitted through. Again, the transmigration of souls seems a more apposite model
for this experience than does human memory.
But is this the only way to conceive of this? In the case of a vinyl recording, for
instance, the “curves of the needle,” as Adorno described them ([1927] 2002), are a phys-
ically integral part of the disc, and their playing back is, arguably, simply the sound the
disc itself makes under the particular conditions of its playback via a turntable and
cartridge. The voice or music we hear is a sonic property of a solid piece of matter; there
is no real separation, any more than the sound of a cymbal, or a piano, could be con-
sidered as a separate property of the metal or the piano string. The sound of a cymbal or a
piano is the articulation of the sonic properties of its constitutive materials as they are
held in a particular state, and under particular conditions of excitation. Can we think of
the sound of a recording in terms of being simply the sonic properties of the disc, or the
tape, itself, once it has been excited in a suitable way? To do so allows us to reconfigure,
in a less anthropomorphizing way, our understanding and our imagination of the
fading signal.
For the surrealist poet and thinker André Breton, “the marvelous” is that which
stands outside of the natural order, which confounds rational expectations we may have
of the world, and which encompasses automata,6 the so-called fixed-explosion, and
objective chance (Foster 1997). He proposes the ruin as an instance of the marvelous in
which nature retakes culture in a reversal of the humanistic notion that culture has the
domination over nature. In this, he mirrors the very aim and objective of surrealism as
“the future resolution of these two states, dream and reality, which are seemingly so contra-
dictory” as it is stated in the first of the manifestoes of surrealism (Breton [1924] 2004,
14). This is a “resolution” that tends more toward a shift of balance away from the cultur-
ally and traditionally dominant side of the binarism toward placing more significance
on the generally subordinate term, rather than a genuine leveling out. The ruin, as a formu-
lation modeled on this pattern of thinking, shows a cultural construction succumbing
to nature, just as automatic writing models the dictation of the unconscious (figured
naively in much surrealism as more “natural” than conscious thought) against rationally
constructed narratives. Automatic writing, or other forms of articulating “pure psychic
automatism,” is “[d]ictated by thought, in the absence of any control exercised by reason,
[and is] exempt from any aesthetic or moral concern” (Breton [1924] 2004, 26).
Taking this as a model, can we take another view of the apparently fading signal, with
its pathetic anthropomorphic associations, and see not the fading of human memory,
but simply the medium reasserting itself as nature reasserts itself in the ruin? Evading
the separation of signal and medium in this way makes it possible to imagine the record
as an object where signal and medium are completely integrated. Instead of a human
memory, the sound of the record evidences something more like the soul of one
departed that has migrated into an apparently inanimate, “inferior” object, as Proust
states “the Celtic belief ” to be. It is important, I think, to ensure that one thinks in terms of
the “reality,” as it were, of the nonseparation of signal and medium, and the “imaginary”
of the metempsychosis of sound, just as memory is also an “imaginary” of recording.
Notes
1. This is of course a more complex situation; the artifacts arising from damage to, or degra-
dation of certain media (scratches, glitches, drop-outs, etc.) have a perceptible effect on the
status of the recorded signal, and the process by which a signal is reanimated and received
has much to do with when and by whom it is heard. Arguably, though, that which is essential
in the original recorded signal is retained to a high degree.
2. At the very beginning of chapter 4 of Beyond the Pleasure Principle, Freud writes, “What
follows now is speculation, speculation often far-fetched, which each will according to his
particular attitude acknowledge or neglect. One may call it the exploitation of an idea out
of curiosity to see whither it will lead” (Freud [1920] 1961, 24).
3. Casey notes three other instances where this is the case—screen memories, dreams (Freud),
and time-consciousness (Husserl) (Casey 1977, 196).
4. In du Moncel’s early account of the phonograph (du Moncel 1879), he writes of it in relation
to the telephone and the microphone but also, as an afterthought, almost, to Faber’s
Speaking Machine, a mechanical organ-like device that seems to have attempted to physically
recreate, through bellows and different shaped pipes and resonators, the phonemes of
human speech.
5. I refer specifically to analog recording systems because the issues of signal and medium are
perhaps more explicitly foregrounded than with digital systems, though the differences
between digital and analog recordings with respect to the current discussion are not as large
or significant as might be imagined. In addition, the culturally significant resurgence of
cassette tape and vinyl discs over the past ten years or so, especially in DIY culture and
certain areas of experimental music and sound art, means that these forms of recording are
still very much a significant element in the audio culture of today.
6. As Foster (1997) shows, eighteenth- and nineteenth-century automata such as The Little
Writer, The Chess Playing Turk, and the Harpsichord Player exercised a complex fascina-
tion over many of the surrealists, and André Breton in particular.
References
Adorno, T. W. (1927) 2002. The Curves of the Needle. In Essays on Music, edited by R. Leppert,
271–276. Berkeley: University of California Press.
Adorno, T. W. (1934) 2002. The Form of the Phonograph Record. In Essays on Music, edited by
R. Leppert, 277–282. Berkeley: University of California Press.
Armstrong, T. 1998. Modernism, Technology and the Body. Cambridge: Cambridge
University Press.
Auyang, S. Y. 2000. Mind in Everyday Life and Cognitive Science. Cambridge, MA, and London:
MIT Press.
Barthes, Roland. 2000. Camera Lucida. Translated by R. Howard. London: Vintage.
Beckett, S. (1931) 1999. Proust and Three Dialogues. London: John Calder.
Blake, W. (1927) 1975. Poetry and Prose of William Blake. Edited by G. Keynes. London: The
Nonesuch Library.
Breton, A. (1924) 2004. Manifesto of Surrealism. In Manifestoes of Surrealism, edited by
A. Breton, translated by R. Seaver and H. R. Lane, 3–47. Ann Arbor: University of Michigan
Press, Ann Arbor Paperbacks.
Casey, E. S. 1977. Imagining and Remembering. Review of Metaphysics 31 (2): 187–209.
Casey, E. S. 1993. On the Neglected Case of Place Memory. In Natural and Artificial Minds,
edited by R. G. Burton, 165–185. Albany, NY: State University of New York Press.
Connor, S. 2000. Dumbstruck: A Cultural History of Ventriloquism. Oxford and New York:
Oxford University Press.
Danius, S. 2002. The Senses of Modernism: Technology, Perception, and Aesthetics. Ithaca,
NY: Cornell University Press.
Draaisma, D. 2000. Metaphors of Memory: A History of Ideas about the Mind. Translated by
P. Vincent. Cambridge: Cambridge University Press.
Dreyfus, H. L. 1996. The Current Relevance of Merleau-Ponty’s Phenomenology of
Embodiment. Electronic Journal of Analytic Philosophy 4. http://ejap.louisiana.edu/EJAP/
1996.spring/dreyfus.1996.spring.html. Accessed May 15, 2017.
du Moncel, T. A. L. vicomte. 1879. The Telephone, the Microphone and the Phonograph. London:
C. Keegan Paul.
Foster, H. 1997. Compulsive Beauty. Cambridge, MA, and London: MIT Press.
Freud, S. (1920) 1961. Beyond the Pleasure Principle. In The Standard Edition of the Complete
Psychological Works of Sigmund Freud XVIII, edited and translated by J. Strachey, 7–64.
London: Hogarth.
Freud, S. (1924) 1961. A Note on the Mystic Writing-Pad. In The Standard Edition of the
Complete Psychological Works of Sigmund Freud XIX: The Ego and the Id and Other Works,
edited and translated by J. Strachey, 226–232. London: Hogarth.
Freud, S. (1930) 2004. Civilization and Its Discontents. Translated by D. McLintock.
Harmondsworth, UK: Penguin.
Harpur, P. 2002. The Philosophers’ Secret Fire: A History of the Imagination. London: Penguin.
Hogg, B. 2008. The Cultural Imagination of the Phonographic Voice 1877–1940. PhD thesis,
University of Newcastle upon Tyne.
Ibsen, H. (1881) 1973. Ghosts. Translated by M. Mayer. London: Eyre Methuen.
Johnson, E. H. 1877. A Wonderful Invention—Speech Capable of Indefinite Repetition from
Automatic Records. Scientific American 37 (20): 304.
Kahn, D. 1994. Death in Light of the Phonograph. In Wireless Imagination: Sound, Radio,
and the Avant-Garde, edited by D. Kahn and G. Whitehead, 69–103. Cambridge, MA:
MIT Press.
Kittler, F. 1999. Gramophone Film Typewriter. Translated by G. Winthrop-Young and M. Wutz.
Stanford, CA: Stanford University Press.
Kreilkamp, I. 1997. A Voice without a Body: The Phonographic Logic of Heart of Darkness.
Victorian Studies: An Interdisciplinary Journal of Social, Political, and Cultural Studies 40 (2):
211–244.
Levin, T. Y. 1990. For the Record: Adorno on Music in the Age of Its Technological
Reproducibility. October 55: 23–47.
Merleau-Ponty, M. 1962. The Phenomenology of Perception. Translated by C. Smith. London:
Routledge and Keegan Paul.
Peters, J. D. 1999. Speaking into the Air: A History of the Idea of Communication. Chicago and
London: University of Chicago Press.
Proust, M. (1913) 1985. Remembrance of Things Past, Vol. 1: Swann’s Way and A Budding Grove.
Translated by C. K. S. Moncrieff and T. Kilmartin. Harmondsworth: Penguin.
Sanders, J. T. 1996. An Ecological Approach to Cognitive Science. Electronic Journal of Analytic
Philosophy 4. http://ejap.louisiana.edu/EJAP/1996.spring/sanders.1996.spring.html. Accessed
May 15, 2017.
Sconce, J. 2000. Haunted Media: Electronic Presence from Telegraphy to Television. Durham,
NC, and London: Duke University Press.
Terdiman, R. 1993. Present Past: Modernity and the Memory Crisis. Ithaca, NY: Cornell
University Press.
Varela, F. J., E. Thompson, and E. Rosch. 1993. The Embodied Mind: Cognitive Science and
Human Experience. Cambridge, MA, and London: MIT Press.
Warnock, M. 1976. Imagination. London: Faber & Faber.
Weiss, A. S. 2002. Breathless: Sound Recording, Disembodiment, and the Transformation of
Lyrical Nostalgia. Middletown, CT: Wesleyan University Press.
Wile, R. R. 1977. The Wonder of the Age: The Edison Invention of the Phonograph. In
Phonographs and Gramophones, 9–48. Edinburgh: Royal Scottish Museum.
chapter 12
Musical Shape Cognition
Rolf Inge Godøy
Introduction
This chapter shall explore notions of shape in our experiences of music, “shape” denoting
various geometric figures or images that we may associate with the production and/or
perception of music. For instance:
The basic tenet of this chapter is that what may be called shape cognition is not only
deeply rooted in our experiences of music and in musical imagery but also has the
potential to enhance our understanding of music as a phenomenon, to contribute to
Figure 12.1 Sound-tracings by nine listeners of the sound fragment built up of an initial triangle
attack, a downward glide in the strings, and a final drum roll (spectrogram at the bottom).
(Sound fragment from cd3, track 13, 20”–29”, in Schaeffer [1967] 1998.)
of music as a phenomenon, shape cognition may also be useful in sonic design, musical
composition, performance, and multimedia arts, and a number of other domains by pro-
viding conceptual and practical tools for handling most musical features as shape images.
Notions of Shape
Music is ephemeral: musical sound and music-related body motion unfold in time and
then vanishes, yet we are (fortunately) left with memory traces of what we just heard
and/or saw. The ephemeral nature of music is (and has been) a major challenge for
research; however, given available technologies for recording, processing, and repre-
senting sound and music-related body motion, we now have the means to “freeze” or
“make solid” the ephemeral, enabling close scrutiny of details previously not possible.
Yet, given these means, the next major challenge has become how to make sense out of
the vast amount of data typically generated by digitalization.
On the other hand, traditional means of representation by Western music notation,
although useful in conserving some aspects of music, is evidently incapable of repre-
senting many aesthetically and affectively highly significant features of musical expression.
This concerns what we may call subsymbolic features of music, meaning the various fea-
tures of sound, such as its so-called timbre (sometimes referred to as tone color), a number
of nuances in pitch (intonation) and loudness (dynamics), as well as what we may call
the suprasymbolic features, meaning the expressive elements of musical phrases such as
in timing and articulation, in so-called grooves, and in various affective and motion-
related labels, for example, tense, relaxed, light, heavy, agitated, calm, and so on.
We thus have the dual challenge of, on the one hand, representing salient features of
music using digital technology and, on the other hand, going beyond the limitations of
traditional Western notation. My answer to this dual challenge, then, is that of musical
shape cognition, meaning that all features of music—that is, those at the subsymbolic,
the symbolic, and the suprasymbolic levels—can be represented as shapes; shapes that
enable us to systematically explore the many until now mostly inaccessible, yet highly
significant, elements of musical experience.
Musical shape cognition is thus a unifying conceptual and practical paradigm for
studying and actively manipulating salient features of music at different timescales,
ranging from the micro-level, subnote timescale features, to phrase and section-level
features of musical expression. In sum, we have the main challenge of bridging gaps
between the quantitative (of digital representations of sound and body motion) and the
qualitative (of holistic and subjective musical experience), and I believe musical shape
cognition will be the best answer to this challenge.
It would be no exaggeration to say that expressions of shape are ubiquitous in musical
discourse: there are innumerable occurrences of shape-related terms in music theory,
music analysis, music aesthetics, music history, and other music-related disciplines. We
typically encounter shape expressions for designating melodic, harmonic, rhythmic,
textural, dynamic, and expressive features, as well as large-scale formal designs. Also,
our Western music notation system, with its spatial distribution of notes on the pages
of the score, could actually be seen as having some element of shape cognition and,
secondarily, also as scripts for sound-production that in turn will result in body motion
shapes. And, needless to say, various graphical scores and sketches found in musical
composition and analysis contexts are instances of shape cognition. However, the more
systematic approach to musical shape cognition should be seen in relation to some
specific previous research endeavors:
• Seminal ideas on shape cognition in music extend back to classical Gestalt theory,
with early proponents towards the end of the nineteenth century such as Ehrenfels
and Stumpf and, a bit later, Koffka, Köhler, and Wertheimer, who were all con-
cerned with musical features as shapes (Smith 1988; Leman 1997; Godøy 1997b).
A number of Gestalt ideas have been extended into more recent music theory
(Tenney and Polansky 1980), into auditory research (Bregman 1990), and into music
perception research on melodies (Dowling 1994).
• The single most important historical background for my present thoughts on
musical shape cognition is the phenomenological approach to musical research
advocated by Pierre Schaeffer and his colleagues (Schaeffer 1966, [1967] 1998).
With the triple challenges of new music, music from other cultures, and new music
technology in the post-World War II era, the need to develop a more universally
applicable music theory became evident to Schaeffer. To go beyond the confines of
traditional mainstream Western music theory, Schaeffer and colleagues turned
their attention to the subjective perception of sound, with the ambition of estab-
lishing a systematic classification of fragments of sound, of so-called sonic objects,
of any type, origin, or signification, for the most part by a systematic ordering of
sound features as shapes.
• Shape cognition plays an important role in acoustic and psychoacoustic research
(De Poli et al. 1991), and it has been used in signal-based visualizations of musical
sound (Cogan 1984) and in readily available software (e.g., SonicVisualiser, Praat,
and AudioSculpt as well as MIRToolbox, Timbre Toolbox, and other MatLab-
based software). Within these software development projects, there is ongoing
work to try to extract more perceptually salient information from signals and to
represent these features as “solid” shapes, that is, representations that can be
exploited in the context of our work on musical shape cognition.
• In work with new interfaces for musical expression (NIME), there is the challenge
of capturing and mapping body motion shapes to sound with the aim of enabling
more human-friendly control of the many parameters that go into digital synthesis
and processing of musical sound. As for motion data input, different technologies
for motion capture are available (various sensors, infrared and video camera
recordings). Associated processing tools (e.g., the MoCapToolbox, the EyesWeb
software, and the AudioVideoAnalysis software [Jensenius 2013]) have been
important in developing shape cognition, making the study of motion as “solid”
shape images possible. Notably, this also makes possible the study of expressive
and affective features of motion as shapes derived from motion data, for example,
of amplitude, velocity, acceleration, jerk, and so forth.
• We have learned much from more general approaches to shape cognition in
so-called morphodynamical theory (Thom 1983; Petitot 1985), an extensive theory
of geometric cognition as a basis for capturing and handling complex and distributed
phenomena in general. Also, in so-called cognitive linguistics, studies of image
schemata (i.e., more generic shape images) and of metaphor theory suggest that
shapes and spatial relations are crucial for all cognition (Godøy 1997a). Additionally,
there has been some very interesting work on the display of quantitative informa-
tion as shapes (Tufte 1983), with modes of representation that seem to have great
potential for shape cognition in general.
• Lastly, we have seen shape cognition become a topic in so-called embodied music
cognition, where the shapes of both sound-producing and sound-accompanying
body motion are understood as integral to musical experience (Godøy 2001, 2003a;
Leman 2008; Godøy and Leman 2010). In Figure 12.2 is an example of such sound-
producing motion shapes of a pianist playing an excerpt from a Beethoven sonata,
together with the notation and spectrogram of the resultant sound, demonstrating
a case of the ubiquitous sound-motion shape relationships in music.
[Figure 12.2 panels: notation, spectrogram (frequency in Hz), motion shapes, and right- and left-side velocity curves (mm/s) for shoulder, elbow, and wrist over a ten-second excerpt.]
Figure 12.2 A synoptic representation of notation (top), spectrogram of resultant sound (next
to top), motion shapes, and velocity shapes of the shoulders, elbows, and wrists of a pianist
playing the opening of the last movement of L. v. Beethoven’s Piano Sonata No. 17 Op. 31 No. 2 in
d-minor, The Tempest, demonstrating shape correspondences between score, sound-producing
motion (including velocity shapes), and resultant sound. Reproduced with permission from the publisher, S. Hirzel Verlag, from Godøy, Jensenius, and Nymoen (2010).
As for embodied music cognition, my colleagues and I have for more than a decade
tried to advance our knowledge of musical shape cognition through the following topics:
Motor Cognition
A core idea of the present chapter is that shape cognition is embodied, and that it extends
to several sense modalities; that is, it is manifest in sound and motion, with motion in
turn including vision, proprioception, haptics, and sense of effort. In particular, the
motor theory of perception has claimed that images of sound-producing body motion are
integral to our perception of sound. Initially presented in linguistics with the suggestion
that language acquisition is not only a matter of becoming familiar with a set of sounds
but also just as much a matter of learning the corresponding sound-producing motion
of the vocal apparatus (Liberman and Mattingly 1985), it has been extended to other
domains of human perception and cognition (Galantucci et al. 2006), including the vis-
ual domain (Berthoz 1997). Furthermore, there is now brain observation evidence of the
spontaneous linking of sound and motion in perception (Haueisen and Knösche 2001;
Bangert and Altenmüller 2003), including evidence of a neurophysiological predisposition
for this linking (Kohler et al. 2002). The mental simulation of assumed sound-producing
motion will, in most cases, be covert, but we may also sometimes have observable behavior
in the form of imitation. This imitation may be variable in its accuracy, ranging from
very detailed to rather approximate and vague, as can be observed in cases of the
aforementioned air instrument performance and as may be observed in various kinds of
vocal imitation such as in scat singing and beatboxing.
The motor theory perspective implies that shapes of observed or imagined sound-
producing body motion are projected onto whatever it is that we are hearing; for instance,
we might project images of energetic hand motion onto ferocious drum sound, or slow
bowing motion onto protracted, soft string sound. The idea is that the shapes of sound-
producing body motion contribute to the mental schemas for perceiving musical sound.
However, there is an important duality to shape cognition here: shapes may be
considered “instantaneous” images—that is, something occurring “in the blink of an
eye”—yet shapes may also be considered something that unfolds in time—something
that has to be “set into motion” and more like a script that needs to be run through in a
performance. This duality will be a recurrent topic in musical shape cognition, with a
tentative understanding that perception and action may shift between “instantaneous”
and “unfolding” shapes. This can be related to musical features, as suggested in
Figure 12.3, meaning I can hypothesize that there is a core of more amodal shape cog-
nition surrounded first by a circle of body instantiation, both as stationary posture shapes
and as motion trajectory shapes that are in turn surrounded by a circle of musical
features manifest variably as postures and motion shapes.
With my main tenet that active tracing of sound features as shapes is integral to the
perception and cognition of music, and the associated idea that the spontaneous tracing
of sound features as shapes can be exploited in various music-related contexts, we can
work along the following lines:
Musical Timescales
• Micro, that is, the less than approximately .5 seconds duration range of continuous
sound and body motion, with features such as pitch, loudness, stationary timbre
(or tone color), and various microtextural fluctuations related to shape metaphors
such as smooth, grainy, rough, and so forth.
• Meso, typically in the very approximately .5–5 seconds duration range, and usually
encompassing salient information on rhythmic, textural, timbral, harmonic,
melodic, and overall stylistic and affective features, and very often related to salient
body motion shapes of sound-production, such as of hands moving along the
keyboard (see Figure 12.2).
• Macro, typically containing several meso timescale chunks, forming sections,
whole songs, and more extended works of music.
Clearly, the micro and meso timescales are the most important with respect to
perceiving salient musical features such as timbre, dynamics, rhythmical-textural, melodic,
harmonic, and motion shapes; a couple of seconds of music would be enough to tell us,
for example, that it is a slow waltz, late romantic style, played by a small café ensemble,
and so on (see Gjerdingen and Perrott 2008, for examples of duration thresholds for
various features). But also, the macro timescale could be important for musical shape
cognition; however, this would be more on a narrative or dramaturgical level, for
instance, as found in various cases of program music.
On the micro and meso levels, we have attention and memory constraints that make
these timescales special (see Godøy 2013 for a summary), but we also find sound-
producing constraints that contribute to chunking at the meso timescale. This includes
some crucial biomechanical constraints (e.g., limits to maximum speed of body motion,
need for rest and change of posture to avoid strain injury, need to anticipate positioning
of effectors [fingers, hands, arms, etc.] before tone onsets, etc.) resulting in so-called
coarticulation, meaning a contextual fusion of events into meso timescale chunks
(Godøy 2014). Also, there are some motor control constraints that contribute to the
formation of meso timescale chunks. For one thing, human motor control seems to
be hierarchical and goal-oriented (Grafton and Hamilton 2007), organized in the form
of action Gestalts (Klapp and Jagacinski 2011), and furthermore, there have been well-
founded suggestions that human motor control is intermittent (Loram et al. 2014), and
also that it may proceed by postures (Rosenbaum et al. 2007), something that I have
called key-postures in sound-producing body motion (Godøy 2013, 2014). Each such
key-posture is surrounded by what I call a prefix and a suffix so that there is a continuous
trajectory to and from the key-postures, something which is closely linked with the
aforementioned coarticulation (Godøy 2014).
The intermittency in human perception, cognition, and action is especially relevant
to our ideas on musical shape cognition, because a shape is, by definition, something
that works holistically, as something overviewed “instantly,” in a “now-point,” to use
Husserl’s expression (Husserl 1991; Godøy 2010c).
Sound Features
Of these two schemes, the typology was considered a coarse, first sorting of sonic objects
into three basic dynamic envelope shape categories:
It should be noted that there are so-called phase-transitions between these typological
categories, dependent on the density, rate, duration, and so forth of the events and
clearly related to constraints of sound-production (and probably also of perception
and cognition): shortening the duration of a sustained sound will lead to an impulsive
sound and, conversely, lengthening the duration of an impulsive sound will lead to a
sustained sound; slowing down the rate of onsets in an iterative sound will lead to a
series of impulsive sounds, accelerating the rates of onset of impulsive sounds will lead
to an iterative sound; and so on. Also, in addition to these dynamic typological categories,
there were three pitch-related shape categories in the typology:
These two main categories (dynamic and pitch-related) were then combined in a 3 × 3
matrix of basic typological classification. The general procedure of both the typology
and the ensuing morphology is that of a top-down feature differentiation, starting out
with the overall envelopes and proceeding to subfeatures, sub-subfeatures, and so on,
as far down as is deemed useful for characterizing perceptually salient features. The
morphology is quite extensive in detail, so here are just two of the categories:
• Grain, meaning a rapid fluctuation within the sound, be that of loudness, pitch, or
spectral content; for example, the grainy sound of a deep double bass tone.
• Gait, meaning a slower fluctuation within the sound, such as in the undulating
motion in a dance tune accompaniment.
• Timbral patterns
• Articulatory and expressive patterns
• Timing patterns
In particular, the last three feature categories are well suited for technological
representations that combine Western notation with more detailed information on onset
timing, duration, dynamics, and spectral features, thus making accessible performance-
related information that could not previously be represented. Such access to
features of musical sound is presently made possible within the field of so-called music
information retrieval; that is, the searching through large collections of musical sound
by way of various sound perception criteria (Müller 2015).
Motion Features
The basic idea of the aforementioned motor theory is that any sound event is also
perceived as embedded in a motion event, hence that it could be useful to make a more
systematic overview of various types of music-related body motion. Following the
scheme suggested in Godøy and Leman (2010), there are the following main categories:
These are just main categories, and notably so; music-related motion may also in
many cases be multifunctional, that is, it can be both directly sound-producing and
more theatrical so as to enhance the total, multimodal experience of attending a
concert. Also, there might, in many cases, be similarities in the energy envelopes of
sound-producing and sound-accompanying motion, typically in dance and/or other
kinds of body motion, such as in the classic example of Charlie Chaplin’s shaving
motions mirroring the sound-producing motions in the famous barber scene from
The Great Dictator (see Godøy 2010b, for a discussion of this).
Furthermore, we have different timescales at work here as well, ranging from the
global to the local. Typically, we may have overall, global motion features such as:
• Quantity of motion, which may be calculated directly from the video data (frame-
by-frame pixel difference) or motion capture data or other time-varying sensor
data (total amount of distance traveled within a timeframe), and which may be a
coarse indicator of overall activity level (see the computational sketch following this list).
• Various derivatives of the motion data, such as velocity and jerk, indicative of the
mode of motion, a high value for jerk meaning much abrupt motion, a low or zero
value for jerk meaning rather calm motion, and so on.
• Local trajectories, such as the shape of beat, that are indicative of mode of
articulation.
• Trajectories for different sonic features, such as ornaments and figures, indicating
anticipatory motion, phase-transition, and coarticulation.
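To make these feature descriptions concrete, the following is a minimal computational sketch in Python with NumPy; the function names, the assumption of greyscale video frames, and the single-marker motion-capture array are illustrative assumptions of my own and do not reproduce the MoCapToolbox or any of the other tools mentioned earlier.

```python
import numpy as np

def quantity_of_motion(frames):
    """Coarse quantity-of-motion curve from greyscale video frames.

    frames: array of shape (T, H, W). Returns one value per frame
    transition: the mean absolute pixel difference between successive
    frames, a rough indicator of overall activity level.
    """
    frames = frames.astype(float)
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))

def motion_derivatives(positions, fps):
    """Velocity, acceleration, and jerk magnitude curves for one marker.

    positions: motion-capture array of shape (T, 3), in mm.
    fps: capture rate in frames per second.
    """
    dt = 1.0 / fps
    velocity = np.gradient(positions, dt, axis=0)       # mm/s
    acceleration = np.gradient(velocity, dt, axis=0)    # mm/s^2
    jerk = np.gradient(acceleration, dt, axis=0)        # mm/s^3
    # Scalar "shape" curves: vector magnitude at each frame.
    return (np.linalg.norm(velocity, axis=1),
            np.linalg.norm(acceleration, axis=1),
            np.linalg.norm(jerk, axis=1))
```

On this reading, a chunk with a high mean jerk magnitude would correspond to the abrupt, agitated mode of motion described in the list above, while values near zero would indicate calmer motion.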
Common to all these motion features is that they may be experienced, conceived,
and represented as shapes and, returning to the basic dynamic sound envelopes of
Schaeffer’s typology presented earlier, we see that they correlate well with motion
features (Godøy 2006):
As was the case for the sound features, these motion categories and features can in
turn be combined into more complex textures, for example, into often-found fore-
ground-background or melody-accompaniment textures of Western music, or into var-
ious heterophonic textures with composite sonic objects.
Musical shape cognition combines sound and motion, and hence also motion-linked
sensations such as vision, touch, proprioception, sense of effort, and possibly other
sensations as well, all together calling the “purity” of music into question, and rather
suggesting that we recognize and tackle sound-motion links as an inherent multi-
modal feature of music. Furthermore, “multimodal” here means that sound is combined
with shapes in the involved modalities, that is, shapes of motion, vision, proprioception,
touch, and so on.
Concretely, we can see how different, and variably multimodal, musical features relate
to shape in Figure 12.3. Centered on a core of what may be understood as a very general
and amodal musical shape cognition, this general faculty for shape cognition may be
differentiated into the two main categories of posture-related shapes and motion tra-
jectory shapes, but certainly this is more of a gravitation matter than a sharp divide. These
two main categories can in turn be differentiated into a number of shape categories that
are variously posture and/or motion related.
In more detail, the posture shapes, hence the quasistationary shapes, include the
following sound-motion shapes (going clockwise around the outmost circle, starting
at the top):
And the motion trajectory shapes include the following sound-motion shapes
(continuing clockwise around the outmost circle), often displaying high levels of meso
timescale, within chunk fusion by coarticulation:
Musical Instants
A shape is intrinsically (by definition) instantaneous, whether in the case where we see it all at
once (a figure on a page, a sculpture, an object), when we anticipate a sequence of
motion, a sequence of events, of sound, or when we need to scan a figure in time or listen
to a sequence in time and form the shape image retrospectively and by keeping the
temporally unfolded shape in some kind of resonant buffer. We thus have a seemingly
enigmatic relationship between continuity (stream of sensations) and discontinuity
(instantaneous shape images based on overviews of continuous segments) in our reflec-
tions on musical shape cognition.
However, this relationship between continuity and discontinuity may perhaps
(at least partially) be understood as integral to motion planning and motor control, cf.
the model of key-posture oriented motion where key-postures are discontinuous but
where their respective prefixes and suffixes are continuous, resulting in a continuous,
undulating motion, only intermittently “punctuated” by key-postures (Godøy 2013).
The idea of motor theory is that the production schema is projected onto whatever it is
that we are perceiving, hence suggesting that we also perceive the key-posture orientation
From a more conceptual point of view, we may in any case claim that shape is by
definition holistic or nonpunctual, hence, always temporally extended, and often also
spatially extended in the sense of the effectors (fingers, hands, arms) and, more indi-
rectly, in the time-domain and frequency-domain representations as extended shapes.
How the transition between the continuous stream of sound-motion and the discon-
tinuous images of shape (physical and/or mental images) actually works in our perception
and cognition still seems to be quite enigmatic, yet it so very obviously seems to work,
both in musical contexts and in general.
Shape Cognition in
Musical Imagery
Evidently, musical sound creates memory traces in our minds, and it seems that we may
mentally replay the music in the original tempo, or in slow motion, or in fast motion,
even defying temporal unfolding as the sounds may be more in the guise of instantaneous overview images (cf. the previous section). What such an ability to recollect and
reenact musical sound in our minds points to, is the capability of musical imagery,
meaning to make music present in our minds beyond the immediate or “original” listening
experience. “Musical imagery” may be defined as “our mental capacity for imagining
musical sound in the absence of a directly audible sound source, meaning that we can
recall and re-experience or even invent new musical sound through our ‘inner ear’ ”
(Godøy and Jørgensen 2001, ix). However, the expression “musical imagery” is sometimes
also taken to denote mental images that accompany music, images of colors, textures,
landscapes, and so forth, that listeners may have when listening to various kinds of music
(Aksnes and Ruud 2008). Such imagery with music may of course reflect shape-related
features of the music; however, I shall here limit my reflections to the imagery for sound
and its associated sound-producing and/or sound-accompanying motion images.
Knowledge about musical imagery has in the past couple of decades been enhanced
by both behavioral and brain imaging research (see, e.g., Zatorre and Halpern 2005, for
an overview). But musical imagery may also be seen in the broader context of mental
imagery, and is closely linked with our general capacity for reenacting in our minds
whatever it is that we may have experienced (Kosslyn et al. 2001), as well as having a
capacity for simulating expected future events and actions in order to make us better
Musical shape cognition is becoming increasingly feasible with new technology; thanks to new conceptual tools and attitudes, we might soon achieve an enhanced understanding of how we perceive sensory impressions holistically as shapes. As argued earlier,
• Finding and isolating experientially salient features, both in sound data and in
motion data.
• Systematically exploring sound-motion shape relationships by analysis-by-synthesis
and match-mismatch experiments.
• And, not to forget, exploring practical applications of musical shape cognition in
performance, composition, improvisation, sonic design, and various multimedia
arts, by systematic mappings between different representations.
Yet, in spite of these outstanding challenges, it seems fair to conclude that most (if not
all) perceptually salient musical features may be conceptualized as shapes. Our capacity
for musical shape cognition should be considered one of the most powerful tools of both
knowledge and skill in musical creation and we are only at the beginning of tapping its
potential.
References
Aksnes, H., and E. Ruud. 2008. Body-Based Schemata in Receptive Music Therapy. Musicae
Scientiae 12 (1): 49–74.
Bangert, M., and E. O. Altenmüller. 2003. Mapping Perception to Action in Piano Practice:
A Longitudinal DC-EEG Study. BMC Neuroscience 4: 26.
Berthoz, A. 1997. Le sens du mouvement. Paris: Odile Jacob.
Bever, T. G., and D. Poeppel. 2010. Analysis by Synthesis: A (Re-)Emerging Program of
Research for Language and Vision. Biolinguistics 4: 174–200.
Bizley, J. K., and Y. E. Cohen. 2013. The What, Where and How of Auditory-Object Perception.
Nature Reviews Neuroscience 14: 693–707.
Bregman, A. S. 1990. Auditory Scene Analysis. Cambridge, MA, and London: MIT Press.
Cogan, R. 1984. New Images of Musical Sound. Cambridge, MA, and London: Harvard
University Press.
De Poli, G., A. Piccialli, and C. Roads. 1991. Representations of Musical Signals. Cambridge,
MA, and London: MIT Press.
Dowling, W. J. 1994. Melodic Contour in Hearing and Remembering Melodies. In Musical
Perceptions, edited by R. Aiello and J. A. Sloboda, 173–190. New York: Oxford University
Press.
Galantucci, B., C. A. Fowler, and M. T. Turvey. 2006. The Motor Theory of Speech Perception
Reviewed. Psychonomic Bulletin and Review 13 (3): 361–377.
Gjerdingen, R. O., and D. Perrott. 2008. Scanning the Dial: The Rapid Recognition of Music
Genres. Journal of New Music Research 37 (2): 93–100.
Godøy, R. I. 1997a. Formalization and Epistemology. Oslo: Scandinavian University Press.
Godøy, R. I. 1997b. Knowledge in Music Theory by Shapes of Musical Objects and Sound-
Producing Actions. In Music, Gestalt, and Computing, edited by M. Leman, 89–102. Berlin:
Springer-Verlag.
Godøy, R. I. 2001. Imagined Action, Excitation, and Resonance. In Musical Imagery, edited by
R. I. Godøy and H. Jørgensen, 239–252. Lisse, Netherlands: Swets and Zeitlinger.
Godøy, R. I. 2003a. Motor-Mimetic Music Cognition. Leonardo 36 (4): 317–319.
Godøy, R. I. 2003b. Gestural Imagery in the Service of Musical Imagery. In Gesture-Based
Communication in Human-Computer Interaction, LNAI 2915, edited by A. Camurri and
G. Volpe, 55–62. Berlin and Heidelberg: Springer-Verlag.
Godøy, R. I. 2006. Gestural-Sonorous Objects: Embodied Extensions of Schaeffer’s Conceptual
Apparatus. Organised Sound 11 (2): 149–157.
Godøy, R. I. 2008. Reflections on Chunking in Music. In Systematic and Comparative
Musicology: Concepts, Methods, Findings, edited by A. Schneider, 117–132. Frankfurt:
Peter Lang.
Godøy, R. I. 2010a. Images of Sonic Objects. Organised Sound 15 (1): 54–62.
Godøy, R. I. 2010b. Gestural Affordances of Musical Sound. In Musical Gestures: Sound,
Movement, and Meaning, edited by R. I. Godøy and M. Leman, 103–125. New York: Routledge.
Godøy, R. I. 2010c. Thinking Now-Points in Music-Related Movement. In Concepts,
Experiments, and Fieldwork: Studies in Systematic Musicology and Ethnomusicology, edited
by R. Bader, C. Neuhaus, and U. Morgenstern, 245–260. Frankfurt am Main: Peter Lang.
Godøy, R. I. 2011. Sound-Action Awareness in Music. In Music and Consciousness, edited by
D. Clarke and E. Clarke, 231–243. Oxford: Oxford University Press.
Godøy, R. I. 2013. Quantal Elements in Musical Experience. In Sound, Perception, Performance:
Current Research in Systematic Musicology, Vol. 1, edited by R. Bader, 113–128. Berlin: Springer.
Godøy, R. I. 2014. Understanding Coarticulation in Musical Experience. In Sound, Music, and
Motion, LNCS 8905, edited by M. Aramaki, O. Derrien, R. Kronland-Martinet, and S. Ystad,
535–547. Berlin: Springer.
Godøy, R. I. 2017. Key-Postures, Trajectories and Sonic Shapes. In Music and Shape, edited by
D. Leech-Wilkinson and H. Prior. Oxford: Oxford University Press.
Godøy, R. I., E. Haga, and A. R. Jensenius. 2006. Playing “Air Instruments”: Mimicry of
Sound-Producing Gestures by Novices and Experts. In GW2005, LNAI3881, edited by
S. Gibet, N. Courty, and J.-F. Kamp, 256–267. Berlin: Springer.
Godøy, R. I., A. R. Jensenius, and K. Nymoen. 2010. Chunking in Music by Coarticulation.
Acta Acustica united with Acustica 96 (4): 690–700.
Godøy, R. I., and H. Jørgensen. 2001. Musical Imagery. Lisse, Netherlands: Swets and Zeitlinger.
Godøy, R. I., and M. Leman. 2010. Musical Gestures: Sound, Movement, and Meaning.
New York: Routledge.
Godøy, R. I., M. Song, K. Nymoen, M. R. Haugen, and A. R. Jensenius. 2016. Exploring Sound-
Motion Similarity in Musical Experience. Journal of New Music Research 45 (3): 210–222.
Grafton, S. T., and A. F. de C. Hamilton. 2007. Evidence for a Distributed Hierarchy of Action
Representation in the Brain. Human Movement Science 26:590–616.
Griffiths, T. D., and J. D. Warren. 2004. What Is an Auditory Object? Nature Reviews
Neuroscience 5 (11): 887–892.
Grossberg, S., and C. Myers. 2000. The Resonant Dynamics of Speech Perception: Interword
Integration and Duration-Dependent Backward Effects. Psychological Review 107 (4):
735–767.
Haueisen, J., and T. R. Knösche. 2001. Involuntary Motor Activity in Pianists Evoked by Music
Perception. Journal of Cognitive Neuroscience 13 (6): 786–792.
Hindemith, P. 2000. A Composer’s World: Horizons and Limitations. Mainz: Schott.
Husserl, E. 1991. On the Phenomenology of the Consciousness of Internal Time, 1893–1917.
Translated by J. B. Brough. Dordrecht: Kluwer Academic.
Jensenius, A. R. 2008. Action, Sound: Developing Methods and Tools to Study Music-Related
Body Movement. PhD thesis, University of Oslo. Oslo: Acta Humaniora.
Jensenius, A. R. 2013. Some Video Abstraction Techniques for Displaying Body Movement in
Analysis and Performance. Leonardo: Journal of the International Society for the Arts,
Sciences and Technology 46 (1): 53–60.
Jensenius, A. R., and R. I. Godøy. 2013. Sonifying the Shape of Human Body Motion using
Motiongrams. Empirical Musicology Review 8: 73–83.
Klapp, S. T., and R. J. Jagacinski. 2011. Gestalt Principles in the Control of Motor Action.
Psychological Bulletin 137 (3): 443–462.
Kohler, E., C. Keysers, M. A. Umiltà, L. Fogassi, V. Gallese, and G. Rizzolatti. 2002. Hearing
Sounds, Understanding Actions: Action Representation in Mirror Neurons. Science
297: 846–848.
Kosslyn, S. M., G. Ganis, and W. L. Thompson. 2001. Neural Foundations of Imagery. Nature
Reviews Neuroscience 2: 635–642.
Leman, M. 1997. Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology.
Berlin: Springer.
Leman, M. 2008. Embodied Music Cognition and Mediation Technology. Cambridge, MA:
MIT Press.
Liberman, A. M., and I. G. Mattingly. 1985. The Motor Theory of Speech Perception Revised.
Cognition 21: 1–36.
Loram, I. D., C. van de Kamp, M. Lakie, H. Gollee, and P. J. Gawthrop. 2014. Does the Motor
System Need Intermittent Control? Exercise and Sport Science Review 42 (3): 117–125.
Michon, J. 1978. The Making of the Present: A Tutorial Review. In Attention and Performance
VII, edited by J. Requin, 89–111. Hillsdale, NJ: Erlbaum.
Müller, M. 2015. Fundamentals of Music Processing. Heidelberg, NY: Springer.
Nymoen, K., R. I. Godøy, A. R. Jensenius, and J. Torresen. 2013. Analyzing Correspondence
between Sound Objects and Body Motion. ACM Transactions on Applied Perception 10 (2).
9:1–9:22.
Petitot, J. 1985. Morphogenèse du sens I. Paris: Presses Universitaires de France.
Pöppel, E. 1997. A Hierarchical Model of Time Perception. Trends in Cognitive Science 1 (2):
56–61.
Rosenbaum, D. A., R. G. Cohen, S. A. Jax, D. J. Weiss, and R. van der Wel. 2007. The Problem
of Serial Order in Behavior: Lashley’s Legacy. Human Movement Science 26 (4): 525–554.
Schaeffer, P. 1966. Traité des objets musicaux. Paris: Éditions du Seuil.
Schaeffer, P. (1967) 1998. Solfège de l’objet sonore. With sound examples by Guy Reibel and Beatriz Ferreyra. Paris: INA/GRM.
Smith, B. 1988. Foundations of Gestalt Theory. Munich and Vienna: Philosophia Verlag.
Stern, D. 2004. The Present Moment in Psychotherapy and Everyday Life. New York:
W. W. Norton.
Tenney, J., and L. Polansky. 1980. Temporal Gestalt Perception in Music. Journal of Music
Theory 24 (2): 205–241.
Thom, R. 1983. Paraboles et catastrophes. Paris: Flammarion.
Tufte, E. R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Williams, T. I. 2015. The Classification of Involuntary Musical Imagery: The Case for Earworms.
Psychomusicology 25 (1): 5–13.
Xenakis, I. 1992. Formalized Music. Rev. ed. Stuyvesant: Pendragon.
Zatorre, R. J., and A. Halpern. 2005. Mental Concerts: Musical Imagery and Auditory Cortex.
Neuron 47: 9–12.
chapter 13
Playing the Inner Ear
Performing the Imagination
Simon Emmerson
Introduction
Musicians and sound artists imagine sound; the imagination is the most powerful and
flexible audio workstation we have. It can do fabulous transformations—some beyond
current “real-world” capabilities. In this manner, it works on sound memory and
experience. But the imagination is also a synthesizer. Here we are in uncharted waters.
Real synthesis is experimental: we build circuits, set dials, or build algorithms and set
parameters—whatever the means, we can sit back and listen to the possibly unexpected
results. However, the imagination works differently in this mode—perhaps its inputs are
not at all conscious. I might hear a sound in my imagination apparently from nowhere—
I can perceive no immediate cause. How might we externalize and use this enormous
power? And, furthermore, might imagining music become a form of performance?1
There is on one hand a discourse of language2: first, as a way to better describe what these internal sounds and processes sound like but, second, as a means to encourage their fuller
development. On the other hand, there is a subtle non- (pre-?) linguistic game. We do
not need words for this—they might even get in the way and limit the options. This
is about creativity and play—a continuous sequence of imagine, play, listen, modify,
imagine. . . . There may be other nonverbal ways to externalize, for example—what is the
role of visualization? Sound-to-visual synesthesia is a specific form of a more general
phenomenon (Van Campen 2008). This, too, may be a descriptive response but poten-
tially a powerful synthesizer as well. This is a form of reverse engineering—we describe
an effect and work backward to reconstruct a possible material cause.
This chapter will not deal directly with interface design, brain, and neuroscience. What
I want to discuss and encourage is greater engagement of this infinite world in the creative
process of sound- and music making. It is entirely speculative although based on ideas
and tools that seem to have had seeds in the last quarter of the twentieth century—indeed
it is clearly a wondrous dream and we all (almost immediately) are imagining what
sounds might make him “cry to dream again.” Interestingly, the word “noises” seems not
to have a negative connotation in this passage. If for the moment these imaginings are
private, how might we in future harness them to enhance our abilities and possibilities
in the shared perceptible world of sound?
In the 1980s, the phrase that was often used to describe emerging computer and digital
applications was “music information technology.” This was a dry and practical reduc-
tion of music to information—eliminating or at least discouraging the descriptions
of aesthetic dimensions. We now have an emerging “music imagination technology”
(Emmerson 2011). Imagination is defined as “the faculty or action of forming new ideas,
or images or concepts of external objects not present to the senses.”4
use image in its everyday sense as having a visual component, while imagination can
have a much broader range of sensory, space, and time elements (audio, visual, tactile,
and so forth)—thus an image may, of course, be real or take effect within the imagination.5
So as musicians we may imagine a scenario, an instrument, a performance, a sense of
space, place and movement, a form, an atmosphere. These may not be sounding but give
us a context for sound. For example, we may imagine a complex relationship expressed
through mathematics that somehow drives the sound synthesis. John Chowning talks
of something similar in his realization that the principles of frequency modulation
(FM, well established in radio) might be applied in the audio domain.
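Chowning’s insight can be stated quite compactly: a carrier sinusoid whose phase is modulated by a second, audio-rate sinusoid produces sidebands whose number and strength are governed by the modulation index. The following is a minimal sketch of such two-oscillator FM in Python with NumPy; the parameter values are arbitrary illustrations rather than anything drawn from Chowning’s own implementations.

```python
import numpy as np

def fm_tone(duration=2.0, sr=44100, fc=440.0, fm=110.0, index=5.0):
    """Simple two-oscillator FM: a carrier at fc (Hz) whose phase is
    modulated by a sinusoid at fm (Hz); the modulation index controls
    how many audible sidebands appear, and hence the timbre."""
    t = np.arange(int(duration * sr)) / sr
    return np.sin(2 * np.pi * fc * t + index * np.sin(2 * np.pi * fm * t))
```

Varying the index over the course of a tone is what turns this single line of mathematics into a large family of evolving timbres, which is precisely the kind of complex relationship, first entertained in the imagination, that can then drive the synthesis.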
Then, most importantly, we have what I would call “inner listening modes” that
only partly correspond to those of the physical world. First, we can imagine acousmatic
sound—that is the sound itself directly (with no sense of place and origin). If we are
aware of our imagination at work we may not (after all) search for any source and cause.
But then again, as a second such mode, imagining source and cause is also possible—
we might construct imaginary instruments, environments, machines, and so forth
(in scenarios as surmised earlier), and “hear” what we believe is their sounding.
Goethe said that “architecture was music become stone.” From the composer’s point
of view the proposition could be reversed by saying that “music is architecture in
movement.” (Le Corbusier 1968, 326)
So, when sound becomes music there seem to be two approaches to its evolution: form
in space, which seems to have an outside time existence—a kind of architecture; and
form in time, “forming”—an emergent property that is built over time into memory. Let
us examine each in turn, starting with form in space. Composers have often described
imagining musical form “outside time”—some have claimed to “see” forms of musical
works in an instant.
Form in space clearly has a relationship to the idea of the (Western musical) score
where time is mapped onto space. But, perceptually, we have form in time—a kind of
accumulation in which only at the conclusion of listening does memory assemble the
whole.6 Of course, if it is a piece that already exists, and that I already know, then this works
somewhat differently, as I may be comparing the present with a memory of the past—but
for our imaginative synthesizer we are working much more in the moment on entirely new
sound and music. I have a compromise (perhaps alternative) approach to bring these two
Much study in the psychology of music tries to understand the reaction of the human to
musical stimulus, increasingly embodied as well as socially and ecologically situated
(Clarke 2005; Bharucha et al. 2006; Leman 2016). I suggest that we have here the elements
of a tool set for a reversal of the process. One of the aims of the study and understanding
of human reaction to sound might be to allow its generation from our ideas. From both
film music libraries and music information retrieval (MIR)-based sound spotting—that
is, finding sound with a given characteristic—we see the emergence of such toolkits.
It is this mirroring that will form the basis of my discussion. This is what I described
above as a kind of reverse engineering, starting with the result and working back to the
possible cause.
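To make the idea of MIR-based sound spotting concrete, a minimal sketch might rank a library of recordings by similarity to a query sound using MFCC timbre features; the file paths here are hypothetical placeholders, and MFCCs merely stand in for whatever "given characteristic" a real toolkit would use.

```python
# A sketch of MIR-style "sound spotting": rank a library of sounds by timbral
# similarity to a query sound. MFCCs stand in for the "given characteristic";
# the file paths are hypothetical placeholders.
import glob
import numpy as np
import librosa

def timbre_vector(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                      # average MFCCs over time

query = timbre_vector("query_sound.wav")
library = {p: timbre_vector(p) for p in glob.glob("library/*.wav")}

# Smaller Euclidean distance in MFCC space = broadly more similar timbre.
ranked = sorted(library, key=lambda p: np.linalg.norm(library[p] - query))
print(ranked[:5])
```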
original audio. The results of these behavioral experiments will guide future research
toward accurate stimulus reconstruction from brain activity” (Thompson et al. 2013, 6).
This is based on one of the oldest of scientific methods, and it needs careful handling. If music
(or any signal) X tends to stimulate neural pattern Y sufficiently consistently across a
wide population then the observation of Y implies the imaginary or real presence of X.9
We know philosophically that a correlation is not necessarily a cause—but we tend pro-
gressively to adopt this belief the greater the supporting evidence (and in the absence of
contrary evidence). In short, we are using an inverse procedure to logical deduction—
namely induction—as we do not yet possess a direct causal mechanism between signal
and response. A sophisticated application of this to speech resynthesis is described
in Pasley and colleagues (2012). While great progress at the level of general features
has been made, they still report that "Single trial reconstructions are generally not
intelligible. However, coarse features such as syllable structure may be discerned” (13).
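The inductive inference described here—observe pattern Y, infer stimulus X—can be caricatured as a decoding problem. The toy sketch below uses synthetic stand-in data and an off-the-shelf classifier; it illustrates the logic only and bears no relation to the actual methods of Thompson et al. (2013) or Pasley and colleagues (2012).

```python
# A toy caricature of the inverse inference: learn the mapping from neural
# response pattern Y back to stimulus class X, then observe Y and infer X.
# The data are synthetic stand-ins, not real neural recordings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_channels = 200, 32
stimulus = rng.integers(0, 2, n_trials)           # X: two stimulus classes
templates = rng.normal(size=(2, n_channels))      # each class evokes a typical pattern
responses = templates[stimulus] + rng.normal(size=(n_trials, n_channels))

decoder = LogisticRegression(max_iter=1000).fit(responses[:150], stimulus[:150])
print("held-out decoding accuracy:", decoder.score(responses[150:], stimulus[150:]))
```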
Early research focused on measuring the neural response to the presence of real audio
or visual material. But more recently there have emerged the first stages of detecting the
neural activity patterns relating to phenomena such as memory and even imagination in the absence
of any physical signal. Grimshaw and Garner (2015) include a meticulous review of this
relationship. Their position is the most radical reappraisal of what (and where) sound is:
Sound is an emergent perception arising primarily in the auditory cortex and that is
formed through spatio-temporal processes in an embodied system.
(1, this idea is developed throughout the book)
To support this thesis they develop the idea of the sonic aggregate, which
comprises two sets of components: the exosonus, a set of material and sensuous
components; and the endosonus, a set of immaterial and nonsensuous components.
The endosonus is a requirement for the perception of sound to emerge; the exoso-
nus is not. (4)
In their final chapter, Grimshaw and Garner (2015) state one possible ultimate aim
as “simply thinking a sound when one wishes to design audio files” (196). I am, in this
chapter, taking off from this point—namely the observation and use of the neural patterns
resulting from imagined sound as the basis for a synthesis engine.12
Rendering Memory
Writers as well as research scientists have imagined the awesome power of tapping directly
into another person’s memory and somehow “reading” it. In Dennis Potter’s final play
created for British television, Cold Lazarus (1994/1996), the memories of a writer13 whose
head has been cryogenically frozen for nearly four hundred years are extracted and
projected in 3D into a relatively large space—a hi-tech laboratory whose funding might
depend on selling the results to a worldwide TV network.
As the experiment proceeds with increasing success, we see memories of landscapes
and people and hear sounds and conversations that the small group of scientists tries to
make sense of. The past is thus preserved and then projected into the present—or is it? It
turns out it is not that simple—the observers slowly become aware that “it” is interacting
with them; the head retains a degree of consciousness. As the head can observe the pro-
jected memories, memory, the present, and imagination become confused.
Rendering Imagination
Let us take Potter’s vision and give it a more optimistic, forward-looking projection.
What of the future—the act of imagination of what might be—could this also not be pro-
jected in like manner to be rendered and synthesized at our behest? This is not synthe-
sizing the future strictly but “the imaginative present”—we might project what we hear
(and see) in our imagination right now.
Can we really imagine sound without access to memory? This may be impossible to
answer, as we clearly access memory without conscious intention. As Murray Schafer
(1977) long ago pointed out, certain sounds seem to have a universal resonance, possibly
through their role in both our long-term evolution and in our experience of prebirth
sound in an amniotic state. Of course, we can consciously recall sound to our own present
internal perception but to communicate this to others in any detail14 we need (at present)
language and other descriptive symbol sets. Perhaps naming itself is an act of memory,
being shorthand for this description—but for that we seem to need a degree of stability
and repeatability.
Throughout my life I have heard sounds while driving that I have wanted to capture—
I am aware that this sensation lies uneasily between physical perception and imagination.
Sometimes, I cannot tell the provenance of the sound at all—my considered view is that
some aspect of the real sound around me provokes an additional imaginative layer and
the two strongly interact.
Thus, this is more than simply externalizing imagination and effecting its synthesis
into sound—it may one day be possible to unravel these two components and understand
their interaction. The sounds are (unsurprisingly) typically drones but with great (and
sometimes changing) internal detail and occasional sharper events as a kind of punctu-
ation. Their mystery is compounded by ambiguous spatiality—both very close and very
distant at the same time. I have repeatedly referenced the line from the text of Stockhausen’s
Momente (from a personal letter from Mary Bauermeister): “Everything surrounding
me is near and far at once”—as a personally relevant resonance. I believe it is a key
modernist trope concerning spatiality in a mediated age.
Embodied Response
Perhaps other aspects of embodied response may be reverse engineered to join our
battery of drivers of the new imaginative synthesizer. What do sound and music stimulate
within us? One aspect most extensively researched recently is that of mirror neurons firing "in
sympathy.” Indeed, in their discussion of the relationship of music to the mirror neuron
system, Molnar-Szakacs and Overy (2006) go so far as to write that
Watch dance and we mentally dance too. It is thus not surprising that actual muscular
movement creeps back in, such as rhythmic foot tapping or bodily reduced (more or
less) dance gestures. Then there are families of “air” activities—air guitar and air con-
ducting are common—that indicate the embodied attunement and entrainment of the
listener. These have already been actively harnessed in many computer game controllers
and interfaces. Still strictly embodied and physical, we might add our air imagination—
which will have rather different characteristics that we will explore in what follows.16
Gesture is concerned with action directed away from a previous goal or towards a
new goal . . . Texture . . . is concerned with internal behaviour patterning, energy
directed inwards or reinjected, self-propagating. (82)
Thus, gesture tends to imply the performative—involving clear cause, effect, and agent—
while texture tends to imply elemental continuity where the cause, effect, and agent
chains are more complex. We can either describe these as being at micro-scales we cannot
individually perceive, or as much larger immanent structures with some kind of con-
tinuous (and often vague) agency and causality. To harness the potential of “air imagination”
we will need to capture the complete range of these time scales, from the immediately
embodied rhythmic through to longer time scales of day, week, month and season, year,
growth, and decline.
to source and cause. Electroacoustic music, of course, often uses this as the basis of its
aesthetics, deliberately bracketing out the search for origins and thus stimulating the
imagination through sound alone.17
I personally do have imaginary visualization when listening to electroacoustic music:
I see shapes, textures, colors, often “set” in a quasi-real-world vista and spaces. This
might be abstract geometric or more of a “landscape environment.” I listen with eyes
open to enhance this perception that seems to be in real space around me, superimposed
on (and strangely integrated with) the actual visual information from wall materials,
loudspeakers, the disposition of the rest of the audience, and the set-up.
The synesthesic synthesizer may be played to produce new sounds—but what about new
music? Now is not the time to discuss any distinction between the two ideas—sound
may be music, music must effectively involve sound, but the boundary may be both
porous and flexible. That said, I do have a bias toward retaining somewhere in the
relationship the idea of performance. Some of the actions that led to the imaginary
synthesis described previously were essentially performative. But I hesitate to say they
were performance.
I want to make a fuzzy distinction between playing and performing. Playing might be
seen as a search for suitable materials, performing as presenting some kind of structure—
maybe perceived as “expression” or “argument”—beyond the individual components,
although the two clearly overlap. Musicians are generally not dancers. Their movements
have been accurately directed by mechanical technology toward the physical excitation
of an object. The use of media has freed this up, allowing movement alone to control
sound, bringing dance and music performance a step closer. But there often remains a
strong residual desire for some sort of resistance. The study of such haptics informs inter-
faces and interactions where this enhances muscular control. The ultimate “air play”
may combine free and resistive components.25
perform the spatialization of the prerecorded sound around the audience.27 We can now,
of course, adapt this for a real-time sound sculpture where the movement controls
sound quality. Thus, the next stage of our imaginary sonification performance might be
to manipulate sound in space as a malleable (even fluid) substance—to place, move, and
“smear” sounds within that space, as a painter might sketch or, more appropriately, as a
dancer might move or a sculptor might manipulate clay. This shifts the metaphor for
externalizing imagination from 2D painting or movie to a 3D activity—dancing, bricolage,
or sculpting. Yet again, many of the inventions and developments of previous decades
may give us helpful clues as to how the new haptics (with resistance) and 3D representation
can be adapted to the musician’s touch and feel. Our imagination of being a dancer/
sculptor can be as a creator immersed in the sound—something our real sculptor can
but dream of.28
From the STEIM studio in Amsterdam came some of the most inventive devices to
harness the elemental (human) agency of movement. From 1984 on, Michel Waisvisz
developed a series of controllers, known as “The Hands,” detecting hand and some body
movement (Waisvisz 1985; later versions may be seen on the STEIM website29 especially
in videos of Waisvisz’s performances over the years). Using a later technology, Laetitia
Sonami developed her “Lady’s Glove”30 and, in a more popular idiom, Imogen Heap has
used similar controls (her Mi.Mu gloves31). Such gesture transducers might be a very
suitable interface to capture the “air” controller gestures as we explore performance as a
creative part of the imagination synthesizer. We must also remember another powerful
tool that might be controlled by such dance or sculptural gestures.
An Imaginative Plug-In—Imaginary
Sound Transformation
As we remarked in the introduction, the mind—the imagination—is also a fabulous
sound-transformation device. While the sensational world is bound by some sense of an
externally applied space and time, the imagination knows not these as boundaries. Just
as we remarked on documented cases of composers glimpsing an entire piece in an
instant, so too we might be able to grasp a complex transformation in a flash, then per-
haps “play” it at something nearer the time of real performance. We might be able to
compare alternative strategies for the sound to develop. Time compression and expan-
sion can be a useful tool for creation but may not map simply to external world time.
And so too with space: Gaston Bachelard’s intimate immensity (1964) speaks to an
oneiric experience that I have often had in the twilight between wakefulness and sleep
and also when entering the Olympic Stadium in London in 2012. Many composers (and
movie sound designers) have tried to capture such an experience in real sound. If our
imaginative experience of space can be surreal (even completely unreal) then it can act
as a stimulus—an impossible goal we know we cannot reach but there might be fruitful
experience in trying.
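As one concrete (and very modest) example of such a transformation, time expansion and compression of a recorded sound can be sketched as follows; the input file name is a hypothetical placeholder, and the phase-vocoder stretch used here is only a stand-in for the far freer transformations the imagination performs.

```python
# Time expansion/compression of a recorded sound using a phase-vocoder stretch.
# "imagined_seed.wav" is a hypothetical placeholder file; rates below 1.0
# lengthen the sound, rates above 1.0 shorten it.
import librosa

y, sr = librosa.load("imagined_seed.wav", sr=None, mono=True)
stretched = librosa.effects.time_stretch(y, rate=0.5)    # roughly twice as long
compressed = librosa.effects.time_stretch(y, rate=2.0)   # roughly half as long

print(len(y) / sr, len(stretched) / sr, len(compressed) / sr)
```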
Let us assume we do indeed have a suitable synthesis engine that can (in some way yet to
be determined) respond to our imaginative synthesis wishes. But the system starts with
a blank. Where do we begin? What do we start with? With the concrète traditions of
music-making we start with a sound and play with it—an empirical and experimental
approach. But, with our imaginary synthesizer, we have many possible sounds that do
not yet exist. Or perhaps they do exist but are hidden from our consciousness until called
forth. Let us divide the possibilities into seeds and provocations. I will not come to defini-
tive conclusions here but make some key suggestions and questions that we can crea-
tively address.
First, let us describe a seed stimulus: this might be external or internal—empirical or
idealized. External, here, means the origin was a real sound and remembered. Then
again, the seed could be entirely imagined—or perhaps a combination of sounds impos-
sible in the world around. The system will not be perfect—we could even say it “guessti-
mates” what we are imagining. A (real) sound is made and play begins. The user can
treat this first attempt as a source, and maintain the original thought as the target. This is
a kind of “imagination control loop”—change slightly till matched, or till sufficiently
close. Or, of course, we could treat any outcome as something entirely new with a future
path of its own and forget the original stimulus. It might be the case that some imagined
sounds are in fact physically impossible.32 Furthermore, “holding” a sound in the imagi-
nation unaltered while being compared with other sounds might be a difficult (perhaps
impossible) task! Hence the need for the evocative transcription discussed earlier to
help us fix it.
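A minimal sketch of such an "imagination control loop" might look like the following, assuming (hypothetically) a synthesize() routine, a feature extractor, and an evocative transcription of the imagined target that all live in comparable numerical spaces; none of these components yet exists in the form imagined in this chapter.

```python
# A sketch of the "imagination control loop": nudge synthesis parameters until
# the result is judged sufficiently close to the imagined target. synthesize()
# and features() are hypothetical placeholders; for simplicity the sketch
# assumes parameters and features live in the same numerical space.
import numpy as np

def control_loop(params, target, synthesize, features,
                 step=0.1, tol=0.05, max_iters=100):
    for _ in range(max_iters):
        candidate = synthesize(params)
        error = features(candidate) - target
        if np.linalg.norm(error) < tol:           # "sufficiently close"
            return params, candidate
        params = params - step * error            # nudge toward the target
    return params, synthesize(params)             # best attempt so far

# Toy demonstration: an identity "synthesizer" whose parameters are its features.
final_params, _ = control_loop(np.zeros(2), np.array([0.2, 0.7]),
                               lambda p: p, lambda c: c)
print(final_params)
```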
Provocations work somewhat differently. One line of thought throughout both mod-
ernist and postmodern musical discourse has been the creation of the unexpected as a
major device—not unexpected simply from the listener’s perspective but from even the
composer's and performer's perspectives. These involve generative procedures—usually
some kind of automaton that (more or less) decouples the outcome from the immediate
taste and will of the composer and performer.
Some like external "models" for this reason: to generate what might not otherwise
have been conceived. Needless to say, others reject them completely! Common in recent
decades have been systems that are both mathematically beautiful and relate to a degree
to a real-world phenomenon. Examples include fractal and chaotic systems, swarm
algorithms, and so on. Within our model here, these could be provoking our imagination,
kick starting the synthesizer. It will remain a matter of choice as to whether and to
what degree we intervene and guide the system toward any goal. If we fixed a goal
(as earlier) using a form of evocative transcription then we remain free to modify and
moderate this ideal—perhaps our provocative system comes up with something we
prefer to our original target.
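By way of illustration, one of the simplest such provocations is the logistic map, a chaotic system that can generate a parameter stream to kick-start the synthesizer; the mapping onto a frequency range below is an arbitrary choice made for this sketch.

```python
# The logistic map as a "provocation": a simple chaotic system generating a
# parameter stream that could kick-start the synthesizer. The mapping of map
# values onto a frequency range is an arbitrary illustrative choice.
def logistic_provocation(r=3.9, x=0.5, n=32, low=110.0, high=880.0):
    frequencies = []
    for _ in range(n):
        x = r * x * (1.0 - x)                       # chaotic iteration in (0, 1)
        frequencies.append(low + x * (high - low))  # map to frequencies in Hz
    return frequencies

print(logistic_provocation()[:8])
```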
That leads us to a final source of potential input to our mind machine. Earlier generations
thought of the term “synthesizer” as pertaining to electronic generation usually of sound
types clearly not derived directly from real, sounding objects, although often based on
instrumental models (wind, brass, string). But largely through the analysis/resynthesis
developments in the last quarter of the twentieth century, the distinctions between sam-
plers and synthesizers steadily blurred until none now remain. There has also been a
cultural shift to hearing technological sound as part of an extended environment—the
nature-culture divide has effectively disappeared. Birdsong sits alongside traffic sound
in the urban soundscape.
At a sufficient distance over the woods this sound [bells] acquires a certain vibratory
hum, as if the pine needles in the horizon were the strings of a harp which it swept.
All sound heard at the greatest possible distance produces one and the same effect,
a vibration of the universal lyre. (Thoreau 1986, “Sounds” [from Walden 1854])
but radically reformed by the futurists to include urban and industrial sound:
We will sing of the vibrant nightly fervour of arsenals and shipyards blazing with
violent electric moons . . . deep-chested locomotives whose wheels paw the tracks
like the hooves of enormous steel horses bridled by tubing; and the sleek flight of
planes whose propellers chatter in the wind like banners and seem to cheer like an
enthusiastic crowd. (Marinetti [1909] 1973)
To convince ourselves of the amazing variety of noises, it is enough to think of the
rumble of thunder, the whistle of the wind, the roar of the waterfall . . ., and of
the generous, solemn white breathing of a nocturnal city. (Russolo [1913] 1973)
But, in the early twentieth century, there rapidly emerged from this a dramatic new
option—I can play it!—returning to the human listener the possibility of becoming per-
former. In 1922 (in Baku, Azerbaijan), Arseny Avraamov created a Symphony of sirens:
Avraamov worked with choirs thousands strong, foghorns from the entire Caspian
flotilla, two artillery batteries, several full infantry regiments, hydro-airplanes, twenty
five steam locomotives and whistles and all the factory sirens in the city. He also
invented a number of portable devices, which he called “Steam Whistle Machines”
for this event, consisting of an ensemble of 20 to 25 sirens tuned to the notes of the
Internationale . . . Avraamov did not want spectators, but intended the active partic-
ipation of everybody. (Molina 2008, 19)33
However, the technology of recording allows a simulacrum of such a vast and ungainly
process. Recording sounds (more recently, of course, sound and image) allows the cre-
ation of a substitute for the real environment—and these are a lot simpler to play than the
original! From the very earliest days of Pierre Schaeffer’s experiments in the studio at
French radio, we have his invention of the sampler—in his imagination:
Once my initial joy is past, I ponder. I’ve already got quite a lot of problems with my
turntables because there is only one note per turntable. With a cinematographic
flash-forward, Hollywood style, I see myself surrounded by twelve dozen turntables,
each with one note. Yet it would be, as mathematicians would say, the most general
musical instrument possible. Is it another blind alley, or am I in possession of a solu-
tion whose importance I can only guess at?
(Pierre Schaeffer’s diary: April 22, 1948 [Schaeffer 2012, 7])
It is clear from the contextual discussion that Schaeffer does not mean “note” as the tra-
ditional pitched event but in a more general sense of what was to become the “sound
object.” Thus, with the advent of the internet nearly fifty years later, the ability to “sample”
sounds worldwide becomes a real possibility—even off-earth through, for example, the
NASA website—thus giving us the power to reach out to, play with, and ultimately perform
the environment in its mediated forms.34 Technology allows the creative reorganization of
these spaces; their transformation (often through the simplest means of amplification
and spatialization)—a “small” event can become a landscape.35 Our imagination allows
us to become Alice in Wonderland and change scale—the human scale can be made
gargantuan and the largest can be brought within human scale. John Cage famously did
this in many of his installations, projecting one space into another. His realizations of
Variations IV (1963) and Roaratorio (1979) are good examples. Thus, amplified small
sounds can fill a listening space alongside the reinjection of an entire city soundscape.
Conclusion—and a Footnote on
Ethics and the Transparency
of the “Fourth Wall”
In the five or so years between first thoughts about ideas behind this chapter and the
time of writing, speculation has rapidly become reality. On the one hand, the develop-
ment of increasingly accurate and extensive brain scanning techniques, on the other, the
advent of commercially available EEG brain interfaces (with a major drive to
produce thought-directed game controllers36) suggest that sooner rather than later we
could have imagination-driven sound synthesis.
I have suggested that the way into this may not be so simple—both acousticians and
musicians have found it extraordinarily difficult to describe or define timbre or sound
quality. Just as the ideas themselves are multidimensional, so we shall need to harness all
the tools we have used to date in our new synthesis engine, from the most quantitative
measures to the most playful and creative actions across many modes—graphic, sculp-
tural, movement, or haptic. Musicians are well used to creative play and improvisation,
and I have argued that such embodied performance will be an integrated part of this
new experimental world—indeed vital to its fulfillment.
While an exciting prospect, potentially of enormous power, we shall need to tread
with mindful awareness. The example from the work of Dennis Potter I cited earlier
contained a basic ethical dilemma—the retrieval of the memories from the unfrozen
cortex was literally torture for the conscious head, unable to express itself—until, in the
final episode, it manages to construct a "message" on a piece of paper in its imagination,
begging for release, which is granted a short while later in a terrorist attack on the laboratory.
A final example will amplify this need for care in ethical matters, should some of the tools
I am sketching indeed come into existence in coming decades.
The creative model I have discussed here is based on an optimistic “projection out-
wards” under our aware control (with all its limitations) and with our consent. But there is
the dystopic mirror view that might become an “invasion inward” without our (apparent)
knowledge or permission. Mind reading is just the start of it in Andrei Tarkovsky’s
sci-fi film Solaris (1972). The space station is overrun by the invisible intelligence of the
planet’s ocean that can create apparently real people, things, and places from the memories
of the cosmonauts. In this case, they never had control over this immense power. They were
not responsible for its behavior and did not begin to understand its workings. The scientists
on board (and back on Earth) know only of certain “rhythms” and changes in the ocean’s
behavior—and that the “creations” do not possess the same atomic structures as their
earthly equivalents. The humans on board not surprisingly become increasingly deranged.
If we wish to conclude with the more optimistic view that we can avoid our dreams
becoming nightmares, then we will need to share openly the necessary knowledge and
understanding of the workings of our imagination synthesizer. We might need to take
steps to ensure that this is the case and to gain aware consent for its use. Today we may
cursorily accept a “cookie” regime on our computer; but let us imagine an equivalent
(or more advanced) observer of our behavior while we wear an EEG interface, and the
possible consequences of that data collection if it takes place without our awareness or control.
Although this moves to matters outside the remit of this chapter, we shall need to be aware of the
issues and participate in deciding our preferred safeguards.
Notes
1. This chapter is based on my keynote presentation to Audio Mostly 2014—“Imagining Sound
and Music,” run by the Music and Sound Knowledge Group, Aalborg University, October
2014. Some ideas first appear undeveloped in my keynote addresses to ACMC2011 (Auckland)
and ICMC2011 (Huddersfield).
19. That is assumed to mean we map the time of the music onto the space of the page (or screen
equivalent) in some way—discussed further in what follows.
20. This phrase has been around for several decades with no obvious origin. For a good intro-
duction to its meaning and function, see Hugill (2012, 237).
21. See http://www.inagrm.com/accueil/outils/acousmographe. Accessed May 15, 2017.
22. See http://logiciels.pierrecouprie.fr/?page_id=402. Accessed May 15, 2017.
23. While this has not been the subject of formal research, evidence and discussion may be
found in Wolf (2013), Holland (2016), and ideas behind the EARS2 resource site (ears2.
dmu.ac.uk) and the associated software Compose with Sounds (cws.dmu.ac.uk).
24. MIR is a fast-developing discipline that has harnessed machine assistance to seek, sort,
and represent (visualize and display) information of some use and comprehension to the
user from “big data” sources (see Casey et al. 2008).
25. There is an interesting case with a conductor. While theoretically a “no resistance” system,
I wonder to what extent the response of the orchestra/ensemble “feels” like a resistive
weight.
26. This did not stop the Yamaha DX7 (1983) becoming the most successful synthesizer in
history at that time.
27. For an image, see the booklet with the CD box archives GRM (INA/GRM 2004, p.18).
28. This is something well developed for musicians of limited physical movement—
see the Sound=Space environment developed by Rolf Gehlhaar as an example
(http://www.gehlhaar.org/x/pages/soundspace.htm. Accessed May 15, 2017).
29. See http://steim.org/. Accessed May 15, 2017.
30. See http://sonami.net/ladys-glove/. Accessed May 22, 2017.
31. See, https://mimugloves.com. Accessed December 14, 2018.
32. This is not the place to discuss the interesting relationship between “impossible” and
“impossible to produce”—or the possibility that anything imaginable might exist
somehow.
33. Molina’s words, but followed by the complete “instructions” for the performance written
by Avraamov himself (20–21), as published in the local press. These comprise, in fact, an
hour-by-hour scenario of the entire event. Molina reports that there were two predecessors
(1919, 1921) and a subsequent full version in Moscow (1923).
34. An excellent and extreme version is seen in works such as "The Earth's Original 4.5 Billion Year-
Old Electronic Music Composition,” an installation by Robin McGinley (2002) that proj-
ects “sferics” into the installation triggered by the visitors (see https://vimeo.com/66475800.
Accessed May 15, 2017).
35. I have written elsewhere on “space frames” and their transformation and reconfiguration
through technology (Emmerson 2007, 2015).
36. See, for example, the interfaces from Emotiv (emotiv.com) and Neurosky (neurosky.
com)—with others announced.
References
Bachelard, G. 1964. The Poetics of Space. Boston: Beacon.
Bayle, F. 1993. Musique acousmatique—propositions . . . positions. Paris: INA and Buchet/
Chastel.
Bharucha, J. J., M. Curtis, and K. Paroo. 2006. Varieties of Musical Experience. Cognition 100:
131–72.
Casey, M., R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney. 2008. Content-Based
Music Information Retrieval: Current Directions and Future Challenges. Proceedings of the
IEEE 96 (4): 668–696.
Chion, M. 1982. L’envers d’une oeuvre (Parmegiani: De Natura Sonorum). Paris: Buchet/Chastel.
Chion, M. 1983. Guide des objets sonores—Pierre Schaeffer et la recherche musicale. Paris:
Buchet/Chastel.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. Oxford: Oxford University Press.
Couprie, P. 2004. Graphical Representation: An Analytical and Publication Tool for
Electroacoustic Music. Organised Sound 9 (1): 109–113.
Emmerson, S. 2001. From Dance! to “Dance”: Distance and Digits. Computer Music Journal
25 (1): 13–20.
Emmerson, S. 2007. Living Electronic Music. Aldershot, UK: Ashgate.
Emmerson, S. 2011. Music Imagination Technology. Keynote address. In Proceedings of the
International Computer Music Conference, Huddersfield, 365–372. San Francisco: ICMA.
Emmerson, S. 2015. Local/Field and Beyond: The Scale of Spaces. In Kompositionen für
hörbaren Raum (Compositions for Audible Space), edited by M. Brech and R. Paland, 13–26.
Bielefeld: transcript Verlag.
Gray, D. 2013. The Visualization and Representation of Electroacoustic Music. PhD thesis,
Leicester: De Montfort University.
Grimshaw, M., and T. A. Garner. 2015. Sonic Virtuality: Sound as Emergent Perception. Oxford:
Oxford University Press.
Hickok, G. 2009. Eight Problems for the Mirror Neuron Theory of Action Understanding in
Monkeys and Humans. Journal of Cognitive Neuroscience 21 (7): 1229–1243.
Holland, D. 2016. Developing Heightened Listening: A Creative Tool for Introducing Primary
School Children to Sound-Based Music. PhD thesis, Leicester: De Montfort University.
Hugill, A. 2012. The Digital Musician. 2nd ed. New York and London: Routledge.
Le Corbusier. 1968. Modulor 2. Cambridge, MA: MIT Press.
Leman, M. 2016. The Expressive Moment: How Interaction (with Music) Shapes Human
Empowerment. Cambridge, MA: MIT Press.
Levinson, J. 1998. Music in the Moment. Ithaca and London: Cornell University Press.
Malloch, S., and C. Trevarthen. 2008. Communicative Musicality: Exploring the Basis of
Human Companionship. Oxford: Oxford University Press.
Marinetti, F. T. (1909) 1973. The Founding and Manifesto of Futurism. In Futurist Manifestos,
edited by U. Apollonio, 19–24. London: Thames and Hudson.
Molina Alarcón, M. 2008. Baku: Symphony of Sirens: Sound Experiments in the Russian Avant
Garde. London: ReR Megacorp.
Molnar-Szakacs, I., and K. Overy. 2006. Music and Mirror Neurons: From Motion to “E”motion.
Social Cognitive and Affective Neuroscience 1 (3): 235–241.
Nattiez, J.-J. 1990. Music and Discourse: Toward a Semiology of Music. Princeton, NJ: Princeton
University Press.
Nishimoto, S., A. T. Vu, T. Naselaris, Y. Benjamini, B. Yu, and J. L. Gallant. 2011. Reconstructing
Visual Experiences from Brain Activity Evoked by Natural Movies. Current Biology 21:
1641–1646.
Park, T. H. 2016. Exploiting Computational Paradigms for Electroacoustic Music Analysis. In
Expanding the Horizon of Electroacoustic Music Analysis, edited by S. Emmerson and
L. Landy, 123–147. Cambridge: Cambridge University Press.
PSYCHOLOGY
chapter 14
W. Luke Windsor
Introduction
When we are fearful, it can be because we are threatened or perceive a threat: often that
threat is current, sometimes it is remembered, and sometimes it is imagined. Music (and
sound) can play a role in the generation of fear, and in this chapter I will argue that music
is used in detention and interrogation not only to influence our emotional state directly
but also to create an ambiguity and uncertainty that leaves the detainee subject to the
free play of imagination, perverting the benign imagination of aesthetic contemplation
into something malign and horrific. In order to do this, the boundary between the real
and the imagined will be explored: the chapter aims to identify the location of this bound-
ary, how it is set for individuals, and the circumstances in which it becomes crossed.
A subsidiary aim is to address the broader context for the use of music in detention and
interrogation in order that, in the more academic quest for an understanding of a set of
musical behaviors and their consequences, the history and psychology of music’s use as
a military tool is not overlooked.
Music is often seen as a force for good: for example, the co-optation of Mozart’s
music as a panacea has become a paradigmatic example of folk psychology, despite many
unintended consequences. Yet, throughout human history, music (and sound) has been
associated with and used by commercial, political, and military forces in attempts to
control behavior, and music itself seems to have intrinsic power to do harm. In Thomas
Kenneally’s book Schindler’s Ark (1982), an inmate-musician and a German officer in
a work camp in occupied Poland conspire to use the repetition of a musical work (the
infamous Gloomy Sunday) to fatal effect: as a result the officer commits suicide after
requesting that the song be repeated with increasing passion. Keneally's parable gives
musical power to the detained Jewish prisoner, and the officer willingly submits. This, of
course, is the opposite of the normal state of affairs: music in detention is most often
controlled by the captor and, as will be discussed later, used to attempt to influence the
thinking and behavior of the captive. The parable is also an exaggeration for effect: through
hyperbole, Keneally actually highlights the powerlessness of the captive, who only has
the passion for music left as a weapon.
As Grant (2013a, 2014) points out, it is not just in the recent Iraq War, where the use of
loud recorded music in detention gained notable media coverage (see, e.g., Chrytoschek
2011), that music has been used to coerce or humiliate detainees. Moreover, forced sing-
ing and playing, as well as forced listening, form part of this history of music in detention
and link it to a broader and longer context of music in military settings (see, e.g., Pieslak
2009; Grant 2013b). Furthermore, as Pieslak (2009) discovered, there is a considerable
overlap between the choice of music by soldiers for personal use and their selection of music in
overt and covert attempts to influence others.
This chapter will not attempt to provide an overview of all the ways in which music
might or might not be used to influence others, for ill or for good. It will instead focus on
an acute and special case of music psychology: that of forced listening to music in deten-
tion, whether or not such forced listening is intended to elicit information. The
aims here are to show how such uses of music can be reframed within a broader context
of musical persuasion and to provide a deeper engagement with the ethics of music than
could ever be achieved without considering the extreme case of music in detention. For
a more general discussion of the darker side of musical experience, Johnson and Cloonan
(2009) provide a thought-provoking survey of the many ways in which popular music is
deployed as an accompaniment to, or tool of, violence.
It is with considerable care that any music researcher should engage with the study of
music in detention and/or interrogation. This is partly due to the commonly held view
that music is, or should be, a benevolent art with positive impacts on individuals and
societies, a view that may cause us to turn away from a more malevolent instrumentalism.
Within such a context, research pointing out the harm that music can do seems
counterproductive. Indeed, if one is focused on maximizing the potential social benefit
of research (see, e.g., Sloboda 2005, 395–419), working on the harm that music can do
can only really be justified if it presents data or analysis that can be used in advocacy
against the harmful use of music or if it suggests tools to combat such uses.
Added to this particular disincentive, the broader study of internment, detention,
interrogation, and torture requires a sensitivity to and breadth of knowledge about
international and military law and custom, the ethical and moral background to cruel
and inhuman practices, and so on; few musicologists or music psychologists come
ready-prepared to engage with this body of work. In addition, the researcher may be
persuaded that, by studying music in interrogation, they might inadvertently promote
practices they disagree with, or indeed add to the body of knowledge that interrogators
employ in the field. The position taken by the Society for Ethnomusicology in 2007
(SEM 2007) suggests there is something particular about the use of music in coercive
interrogation that should be called out by musicologists, and also that musicologists
should call attention to such (ab)uses of music. Some musicologists are content to
disavow all coercive interrogation and question whether we need to make a special case
of music when it is considered against the range of coercive
methods used in detention and interrogation:
The issue is really torture, which to me is always wrong, period. I can’t see that music
as torture is more or less wrong than anything else as torture, and I confess that deep
down this feels like special pleading—e.g., water resource managers complaining
about the use of water for torture, or (more ridiculously) Hello Kitty aficionados
complaining that Hello Kitty armbands were to be used by a Thai police department
as badges of malfeasance and indiscipline. (Bellman 2007)
This chapter takes an initial step back from these problems, and it does not initially
consider whether they are of particular importance given the wider debate about the
legality or morality of obtaining information through psychological or physical manipu-
lation or pressure. Instead, it will engage with the perceptual and social-psychological
consequences of playing music in situations of detention, and it will engage also with
the context for these practices, as this may help us better understand how they have
come about and how they can be seen to be situated within a broader context of music
as a source of behavioral control. Hence, although much reference will be made to
existing ethnographic and historical work in this domain (especially that of Cusick
2006, 2008a, 2008b; Pieslak 2009; Grant 2013a, 2014) the broader contexts that will be
applied are derived from psychological research that is related both to interrogation and
also to other forms of coercion, and the understanding of the relationship between
imagination, sound (and music), and direct perception.
The role of imagination in the creation of a fearful, vulnerable, and malleable state
has an explicit and implicit relationship with the ability or inability of a person to
directly and effectively act on and perceive their surroundings. It is for this reason that,
rather than analyzing the role of sound in coercive interrogation in a theoretical vacuum,
some positioning is required. This chapter will introduce and apply the work of Gibson
(e.g., 1966, 1979) on direct perception and ecological psychology and will attempt to
show how his theory of perception helps explain the ways in which sound and music,
normally helpful or benign, become sources of fear and confusion. The work of Gibson
will be returned to in the conclusion of this chapter in a more political vein, as it will
become clear that his approach to psychology provides a neat riposte to the co-optation
of (music) psychology by military and commercial interests for purposes of persuasion.
The contrast between Bernays's (1942) and Gibson's (1939) reactions to Nazi propaganda
efforts rests, as I will argue, not on an ethical distinction alone but also on a theoretical one:
their views of human psychology lead them to very different conclusions about how
we as individuals should respond to attempts by others to influence us. Before this,
however, it is necessary to review some of the existing work on music and interrogation/
torture and its intellectual and practical antecedents.
Music in Detention/Interrogation
malls, and restaurants) to influence not just our internal state, in an attempt to imbue
spaces with a particular ambiance, but also our level of activity. One of the most highly
cited publications in the field of consumer control is the description of a study in which
the volume of music was varied in a supermarket (Smith and Curnow 1966): louder
music was associated with less time in the store but no lesser volume of purchasing. The
authors of this study explain this through an arousal hypothesis, whereby the louder
music leads to greater arousal in the customers and faster shopping, rather than driving
the customers from the store. The correspondence of music to customers' expectations
and their degree of liking for it are, however, important factors that can be manipulated to
influence their behavior. A study by North, Hargreaves, and McKendrick (1999) demon-
strated that we will stay on hold to a help line longer when the music is both liked and
congruent with the task. More subtle dimensions of musical structure and associated
or evoked emotions can also influence what we purchase, how long we linger, and even
how much we are prepared to pay for products. For example, the style of music and its
associations with more or less expensive items might be a powerful predictor of pur-
chasing (Wilson 2003). Music has also been considered as a factor in delineating zones
within shopping malls and department stores, with different styles of music helping to iden-
tify soft boundaries between different product areas (e.g., Yalch and Spangenberg 1993).
Music is also used without the intention of influencing purchasing in public spaces.
Just as we might employ it within our own spaces or through earphones to manage our
mood, our spaces’ musical ambiences are curated for us in attempts to speed or slow our
movements, make us more comfortable, or provide public information. Although these
uses are potentially more benign and may be alternatives to more expensive or harmful
attempts to influence us, the central aim is to coerce the listener into a more or less pas-
sive state. Dentists, for example, claim to use music to calm patients with some success,
aiming to make their work easier through a more relaxed patient without needing
recourse to medication. However, Aitken and colleagues (2002) found no effect of music in
such contexts above and beyond the patient’s enjoyment of it in a controlled setting, and
even in studies where it is shown to have an effect it may only be for less anxious
patients (e.g., Lahmann et al. 2008). Moreover, regardless of whether it is effective,
music may simply become another remembered feature of a hostile environment for an
“uncooperative” patient (see, e.g., Welly et al. 2012), and associations of music with expe-
rience can obviously flow both ways. Nonetheless, Standley’s meta-analysis of music in
dental and medical settings (1986) does suggest an effect. Similarly, in waiting rooms,
medical or otherwise, rather than speeding up service, one may choose to play music to
increase tolerance of waiting time (see, e.g., North et al. 1999) or reduce stress (see, e.g.,
Tansik and Routhieaux 1999).
Note that, in all these situations, music's primary value in self-managing our psycho-
logical state is supplanted by external control of this environmental information. Of
course, music is but one of many kinds of stimulus information that we and others use to
orient and be oriented in the environment, but the semi-unavoidable nature of acoustic
stimulation is significantly different from some other forms of influence: averting or
closing one’s eyes is much easier than ignoring unwanted sound. Of course, one can
wear ear defenders, plugs, or headphones to block out or supplant this information with
silence or our own choice of music, a technological adaptation that serves to both regain
and enhance control of the auditory environment in a way that is thoroughly con-
temporary. As a corollary, the encouragement of employees to curate their own workplace
musical environment in order to increase productivity and staff well-being (and to avoid
the distractions of workplace noise) is becoming more widespread, and there is some
empirical evidence to support the effectiveness of such practices (see, e.g., Lesiuk 2005).
Of course, music’s ubiquity in this space of influence has led many to complain about,
campaign against, or avoid such settings and uses of music. The attempts by early
adopters of musical broadcast technology to impose music in settings such as public
transport often backfired (see, e.g., Hui 2016), and there is a general social consensus
that even the minor public acoustic spillage from headphones is an intrusion that can
attract considerable opprobrium.
Before concluding this section, and in order to form a link with the later discussion of
the relationship between more general uses of music as propaganda and in psycho-
logical warfare, a final way in which music is used in explicitly political settings is worthy
of mention. In an unusual and original study, Shevy (2008) used different genres of music
to influence participants’ perceptions of trustworthiness, friendliness, and political
ideology, exploiting the stereotypical associations of hip hop and country music: a perti-
nent feature of his findings, which will become relevant when discussing psychological
warfare and interrogation, is that the extent and nature of such influence should vary
with the ideology and musical preference of the listener: a liberal African American lis-
tener would be primed very differently by music than a white or Hispanic listener or a
conservative African American, and such influences would vary with preference for
musical genre. Music is a tool for subtle persuasion in the context of ideology, not just
for commercial ends.
bands, and the coordination of movement to music has both utilitarian and psychological
dimensions. Even in situations where instruments are not used, soldiers will often march
to songs: the clearest example of this in the Western military is the singing of the French
Foreign Legion, which cuts across marching and more reflective settings; the Boudin, for
example, is sung standing to attention as well as in celebratory or functional marching
situations. The tempo of the Boudin, whether sung in motion or not, is surprisingly slow,
infamously necessitating the arrival of French Foreign Legion units at celebratory events
after other French units, and, indeed, it both denotes the separate identity of the Legion,
and connotes its rather dour character. This tempo, and a curiously uniform style with
truncated phrase ends, extend to a wide repertoire of traditional and popular songs,
mostly in French, a language many of the recruits barely speak; many songs are also
in German, reflecting the large number of German recruits the Legion has attracted at
times (see, e.g., French Foreign Legion 2016). A related tradition, from the United States,
is that of the cadences and jodies sung by soldiers as they train (see Pieslak 2009): again,
synchronization of movement is paramount, but in both cases the content is also sig-
nificant, and rather different, as will be discussed below. Importantly, the music of march-
ing sits in an interesting zone in between self-chosen musical behavior and imposed
discipline: the choice to march or sing is not free, it is taken under military discipline,
and to refuse is a matter for the military courts.
Regardless of any other subtler parameter, the sheer volume of military music is
important. Whether participatory or not, military bands and even unaccompanied
singing produce loud sounds which travel far. In combat, and extensively documented
in Pieslak’s study of music in the Iraq War (also see Gittoes 2005, documentary), soldiers
not only take the trouble to select their own music to accompany combat within armored
vehicles, they create DIY sound systems within them to broadcast the music over their
intercom systems or through internally mounted loudspeakers. There is a sense in which,
just as a commuter masks the sounds of others with music over headphones, this cre-
ates a private environment within the vehicle, the sheer volume of sound masking the
influence of the threats from outside. The volume of broadcast or headset music here is
self-chosen, although in Gittoes’s extraordinary unsanctioned film about music in the
Iraq War (2005) some of the interviewees have clearly developed less coherent musical
selections, creating a conflicted musical environment within the confines of the armored
vehicle: being unable to escape from loud unwanted music is clearly a potential problem
in combat, just as it might be in other work-settings.
The semiotics of military music interacts with these other two parameters: at one end
of the spectrum lies the trite example of the bugle call; at the other, the singing (and
broadcast over loudspeakers) of "Je ne regrette rien" in association with the withdrawal
of the final French Foreign Legion units from Algeria. The lyrics, tempo, and musical
structures of military music implicitly and explicitly influence soldiers before, during,
and after combat; they serve to identify particular units and they communicate ideas,
national identities, and ideologies. This semiotics was particularly important to impe-
rial military powers; for example, in Africa in the late nineteenth and twentieth centuries
(see, e.g., Clayton 1978). In East Africa, both British and traditional African
music were adopted to build a corporate identity, often exploiting the usage of Swahili as
a cross-tribal language. To sing “Men of Harlech” in Welsh, or indeed English, is one
thing, but to sing it in Swahili, quite something else. In this case, the fantasy of Welsh
(actually mostly English) soldiers singing this song at Rorke’s Drift (in the film Zulu) has
a real counterpart in the musical practices of later colonial troops. Or consider the lyrics
of this traditional World War II song from Kenya (also sung in Uganda):
Mussolini Mussolini,
Mussolini amekimbia!
Nakumbuku njaro Nairobi!
Nakumbuku njaro Faifa keya!
Tutarudi!
Tutarudi!
Mussolini, Mussolini
Mussolini has run away!
We remember the light of Nairobi
We remember the brightness of 5 KAR.
(Clayton 1978, 38)
Psychological Warfare
Allied with the presentation of propaganda in spoken form via radio or loudspeaker,
music has long played a role, along with sound effects, in efforts to influence the behav-
ior of opposing forces. Indeed, for Volcler (2013; also see Goodman 2012 for a more theo-
retically driven treatment), the modern usage of music in this context (often associated
with the Korean War and later conflicts) is one of two main precursors of the use of
music in detention, the other being post-1945 CIA-sponsored research on the psy-
chology of coercive interrogation. Volcler also draws parallels between the nonlethal
usage of sound as a persuasive tool and as a weapon to disable or kill, which will be
addressed briefly in what follows. This historical link between music as an at-a-distance
tool of warfare and music in detention is also made by Pieslak (2009), who distances his
historical narrative from that of Cusick (2006, 2008a, 2008b), for whom, like Volcler
(2013), the sources of music in detention derive both from propaganda practice and
from covert psychological research programs. It is probable that the history of music’s
use in detention draws on many precursor practices (see Grant 2014, for an excellent
overview of the many ways in which music comes to be used as and in torture), and it is
likely that the use of music to explicitly influence behavior draws variously on all of these
precursors, depending on circumstance. This will be returned to later in relation to the
tension between improvised and more institutionally circumscribed practices described
in manuals and by practitioners and detainees.
Even in mainstream psychological warfare, the use of music often oscillates between
more improvised and administered extremes and between motivational soundtrack and
nonlethal weapon, as exemplified by the use of music during the siege of the Vatican
While some accounts claim that the music was played to boost the morale of American
troops (a claim that even here demonstrates the overlap between psychological tactics
and inspiration for possible combat), it had, regardless of original intent, a powerful
side effect. When Noriega commented that the music was irritating him, the Marines
increased the volume, playing the music continuously. (Pieslak 2009, 82)
Rather than review the range of ways music is used in persuasion in the field, the reader
is directed toward Pieslak’s coverage of the use of music by opposing forces in the Iraq
War (2009): here both sides broadcast sound at high volumes via loudspeaker: nasheeds
on the Iraqi side, and rock and rap music on the US side: in both cases he argues that
such a sonic environment inspires friendly forces while also being intended to destabi-
lize the enemy.
Sound Weapons
Volcler (2013), in her provocative book Extremely Loud: Sound as a Weapon, argues that
the use of music in detention and interrogation takes place in a broader context of sonic
weaponization. Indeed, although spending much time on the claims made for physio-
logical applications of sound, she concludes that it is the psychological impact of sound
(and music), whether tacit or conscious, that is the most effective weapon. Although
sound at high intensity can damage the ear (or even other organs), and contemporary
technologies such as the Long Range Acoustic Device (LRAD) can both deliver verbal
instructions, tones, noise, or music at long ranges and high enough intensities to cause
distress or damage, she notes that the fear of such weapons is probably just as impactful
as their application. Importantly, like Cusick, she notes that the attraction of nonlethal
weaponry is often somewhat disingenuous: just as the LRAD is marketed as a long-range
communication device but potentially applied as a weapon at shorter ranges, sound and
music are portrayed as relatively harmless (no-touch) interrogation techniques rather
than as psychologically harmful torture methods:
“No-touch torture” shares with non-lethal weapons the advantage that it leaves no
marks directly caused by interrogators on the visible, fleshy surfaces of the body.
Thus hard to prove, and hard to jibe with images of torture familiar from visual
and literary culture, “no-touch torture’s” premise is nonetheless consistent with the
premise behind non-lethal weapons, including those that use sound; and it is con-
sistent with the premise by which PsyOps units use sound or music to prepare the
battlefield. The common premise is that sound can damage human beings, usually
without killing us, in a wide variety of ways. What differentiates the uses of sound or
music on the battlefield and the uses of sound or music in the interrogation room is
the claimed site of the damage. Theorists of battlefield use emphasize sound’s bodily
effects, while theorists of the interrogation room focus on the capacity of sound and
music to destroy subjectivity. (Cusick 2006)
Volcler’s most interesting conclusion is that, in many cases, the use of sound as weapon
is more effective as a purely psychological technique, a placebo weapon of the imagination:
As I will argue later, it is the appeal to imagination as opposed to direct perception that
is at the heart of music’s use in detention and interrogation but, before turning to this
ecobehavioral interpretation, the next section provides a brief review of recent practices,
impacts, and narratives of music in detention and interrogation, focusing on recent
conflicts in Iraq, Afghanistan, and the wider “war on terror.”
The accounts reviewed here share a number of common features:
1. music was played at very high volumes both in detention more generally and dur-
ing interrogations, certainly much louder than would be advised if permanent
damage were to be avoided
2. music was played for long durations (exacerbating the damaging effects of
loudness)
3. the music chosen reflected the individual tastes and cultural backgrounds of the
interrogators
4. music was interchangeable in some instances with everyday sounds
5. the interrogators used music for a number of intended purposes related to:
a. masking background sounds to isolate the detainee
b. interrupting cognition through distraction
c. creating cultural dissonance
d. establishing the dominance of the interrogator.
The issue of dominance (5d) seems particularly pertinent to the explicit training
interrogators received; the relationship of dependency between interrogator and
detainee is established not only through the playing of loud music (or indeed disturbing
everyday sounds) but by its cessation at the will of the interrogator. Items 5a–c correspond to
what Pieslak’s second informant refers to as a “change of scene”: music is intended to
block out and change the environment of the detainee in such a way as to maximize iso-
lation and minimize any sense of familiar surroundings. Such masking and distortion of
the environment is cultural as well as natural, as evidenced by the contrast Cusick iden-
tifies between the experiences of Begg and Vance (both familiar with Western popular
music) and that of al-Qatani, who was less familiar with it and, moreover, considered music to be
haram (forbidden). Begg and Vance found the constant loud music irritating, disorienting,
and painful, but were not sensitive to its cultural dissonance: Begg even notes that he
believed that his interrogators were sensitive to this and did not use music with him in
the interrogation cell (although it was played elsewhere) (Cusick 2008a, 6–7). Vance, like Begg, nevertheless found that the loud music, regardless of its cultural familiarity, played a huge role in the psychological regime in which they were immersed. Regarding
al-Qatani, Cusick claims that Western music was used knowingly to undermine his religious convictions because of its cultural specificity:
Given that the Taliban had forbidden music in Afghanistan for religious reasons, it
seems possible that al-Qatani genuinely believed that listening to music was haram,
forbidden, and therefore sinful. Yet his inability to talk knowledgeably about Islam’s
theological traditions on music allowed “the music theme” to merge with the themes
known as “the bad Muslim,” “al Qaeda betrays Islam,” “God intends to defeat al Qaeda,”
“arrogant Saudi,” and “I control all” to produce the overall “approach” called “Pride/
Ego Down.” That is, al-Qatani was humiliated, and his Muslim identity attacked, by
his obvious ignorance of his own tradition. Meanwhile, the “loud music” he may
have experienced as sinful continued to keep him awake, to end his interrogation
just before he was allowed to sleep, to awaken him, to prevent him from speaking in
answer to interrogators’ questions, and to fill up longer and longer parts of inter-
rogation days that were also filled with the argument over music’s alleged sinfulness,
which constituted “the music theme.” (2008a, 13)
This passage illustrates that the use of music by US interrogators is, if one accepts this
account at face value, much more sophisticated than anything in the training manuals
declassified by the US Government. Music is not just another convenient loud sound to
disorient through controlling the environment; it is a cultural weapon of persuasion,
related to its use in propaganda and psychological warfare. This point is even more
forcefully made by Grant (2014), who locates music in detention within a broader
context of music as a method for sanitizing the act of applying militaristic power. Grant
suggests that there exists a continuum between the natural and the cultural, and between the linguistic and the musical, in many interrogation or detention settings, and even implies
there is a perverse creativity in the use of music by interrogators to avoid more obvious
evidence of force.
Having described and contextualized the ways in which music has recently been used in
detention and interrogation, it is time to return to the ethical dimension of music and
its co-optation by interrogators. In their different ways, Cusick (2006, 2008a, 2008b),
Pieslak (2009), and Grant (2014) attempt to make sense of the way in which music seems
corrupted by its association with detention and interrogation, even though they may
argue about whether it can be considered torture in itself.
Rather than approach this question directly, this final section will recast the use of
music in detention and interrogation within the ecobehavioral approach to psychology
characterized by Gibson (e.g., 1966, 1979; also see Heft 2001). The intention here is to
demonstrate that this co-optation of music is partly a result of choosing to apply psycho-
logical research not only to the understanding of human behavior but also to its control.
To this end I will contrast the positions of Gibson (1939) and Bernays (1942) on Nazi
propaganda, but first it is necessary to describe Gibson’s mature position on the relation-
ship between direct perception and mediate perception, and how it helps explain both
benign and malign applications of music.
important to our interpretation of both more or less conventional (see, e.g., Clarke 2005)
and unconventional (see, e.g., Windsor 2000) musics.
In many everyday situations, we are able to identify or at least classify the sources for
the music that we hear, whether played on the radio or self-chosen, whether experienced
live, with the additional benefit of visual information, or acousmatically over loudspeak-
ers or headphones. We know something about where it comes from, who made it, and
what they might have wanted us to think or feel; we can infer meaning from lyrics or
instrumentation, or the subtleties of harmonic or melodic semiosis. Our sensitivity to
these dimensions of sound, however, is acquired through familiarity over the course of development. It
might seem paradoxical in a book about imagination to turn to work on direct per-
ception: Gibson’s eschewal of representation and information processing in his account
of perception is controversial within psychology (see, e.g., Fodor and Pylyshyn 1981)
and may seem unproductive when other approaches (such as that of Neisser, e.g., 1978,
89–105) explicitly try to understand the relationship between imagination, memory,
and perception. However, as I will argue later, the way that imagination functions in
Gibson’s work highlights a boundary between real and virtual which is both useful and
thought-provoking in this context.
What distresses al-Qatani most is the Arabic music that is played: the other music is distressing for the reasons cited previously, acting as noise rather than in any more subtle manner, but the
Arabic music opens up what turns out to be a distressing dialogue about Islamic culture’s
attitudes to music, one which, according to Cusick, he loses.
even language) provides a much richer and less ambiguous source of information
about the world than proposed by cognitive and social psychologists. For Gibson, the
deception of German people (particularly into anti-Semitism) through propaganda in
the 1930s was a reason to suspect all propaganda, because it is inherently deceptive. The
answer, for Gibson, was to direct psychology away from the study of preconceptions, and
the social influence that seeks to reinforce them, toward the study of direct perception:
Our world, more especially our world of social objects, is understood in terms of
preconceptions, preexisting attitudes, habitual norms, standards, and frames of
reference. When the preconception is sufficiently rigid, an object will be perceived
not at all in accordance with the actual sensory stimulation but in congruence with
the preconception. No psychological law has been more exhaustively demonstrated
than this one. Preconceptions of this sort, moreover, are wrought out socially and
modify individual judgments. This fact also has been amply demonstrated both inside
and outside of a laboratory. A preconception that is socially reinforced becomes a
norm or standard for everybody. It becomes verbally symbolized in the process and
thereby is stereotyped and strengthened. Each individual adopts and internalizes
it, forgetting its imitative origin, and incorporates it in his repertory of values and
opinions. (1939, 165–166)
When this analysis is applied to music and its use to influence behavior, to coerce, it becomes clear that, regardless of the setting, such a co-optation of an emotionally power-
ful and unavoidable stimulus is another way in which the powerful seek to control the
weak: it is a method for preventing an individual from hearing their environment and is
one part of the creation of a setting whereby direct experience is limited to a space con-
trolled by others. The problem with music in torture, then, is not that music is corrupted
by the interrogator but that it is used to curtail experience, rather than being a stimulus
for further interpretation and exploration. In detention, the aesthetic of music is that of
fear: imagination of the worst replaces any other possible interpretation. Music becomes
a stimulus for social control, and the ideal subject is the detainee who cannot escape,
who cannot explore and discover the world through direct experience. The coercive use
of music is just one way in which the experience of an individual in detention is eroded,
reifying a view of human beings that Gibson derided:
The greatest myth of the twentieth century is that people are sheep. Our intellectual
culture has been built on the idea that ordinary people tend to see things as others
want them to, with little independence of mind. This is a pernicious assumption. . . . It
was Gibson’s purpose to undermine such thinking. (Reed 1996, 162)
Whereas propagandists and marketeers might tacitly assume that human beings are as
passive and directable as “sheep,” using music to influence through the control of audi-
tory information, the interrogator forces the prisoner into a passive mode of engaging
with sound, one in which propaganda has clearly failed in its mission to persuade. Music
used in this way is a tacit admission that persuasion through musical influence has been
abandoned to brute force: like Bellman (2007), I can only conclude that it is torture that
is ethically repugnant, with music playing a minor, and paradoxically unmusical, role.
This role is in direct contrast to the utopian (and possibly naïve) view of music as a propa-
ganda tool advanced by Young (1954) or Szafranski (1995), one in which enemies of the
United States could be influenced by exposure to US cultural products such as music:
Saudi Arabia recently joined China as the most recent nation to outlaw satellite
television receivers. One can easily appreciate the effects that Music Television (MTV)
might have on such cultures. (Szafranski 1995)
It is as if, faced with an enemy that was often not susceptible to such influence due to its
rejection of music as an acceptable mode of expression, the US military and intelligence
communities reacted by attempting to maintain this belief in the power of music, while
simultaneously undermining its aesthetic potential. The coercive, and hence restrictive,
nature of using music in detention and interrogation turns music, at its most degraded,
into noise, and at its most sophisticated, into a stimulus for a fearful imagination.
References
Aitken, J. C., S. Wilson, D. Coury, and A. M. Moursi. 2002. The Effect of Music Distraction
on Pain, Anxiety and Behavior in Pediatric Dental Patients. Pediatric Dentistry 24:
114–118.
Bellman, J. 2007. Music as Torture: A Dissonant Counterpoint. https://dialmformusicology.
com/2007/08/21/music-as-tortur/. Accessed January 19, 2017.
Berlyne, D. E. 1971. Aesthetics and Psychobiology. New York: Appleton-Century-Crofts.
Bernays, E. L. (1928) 2004. Propaganda. New York: IG.
Bernays, E. L. 1942. The Marketing of National Policies: A Study of War Propaganda. Journal
of Marketing 6 (3): 236–244.
Blass, T. 2007. Unsupported Allegations about a Link between Milgram and the CIA. Journal
of the History of the Behavioural Sciences 43 (2): 199–203.
Brown, R. E. 2007. Alfred McCoy, Hebb, the CIA and Torture. Journal of the History of the
Behavioural Sciences 43 (2): 205–213.
Chrytoschek, K., dir. 2011. Songs of War: Music as a Weapon. A & O Buero.
CIA. 1963. KUBARK Counterintelligence Interrogation. National Security Archive Electronic
Briefing Book No. 122. Washington, DC: George Washington University.
CIA. 1983. Human Resource Exploitation Training Manual. National Security Archive
Electronic Briefing Book No. 122. Washington, DC: George Washington University.
CIA. 2004. OMS Guidelines on Medical and Psychological Support to Detainee Rendition,
Interrogation, and Detention. Langley: CIA.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. Oxford: Oxford University Press.
Clayton, A. 1978. Communication for New Loyalties: African Soldiers’ Songs. Papers in
International Studies, Africa Series 34. Center for International Studies, Ohio University.
Cusick, S. G. 2006. Musicology, Torture, Repair. Radical Musicology 3. http://www.radical-
musicology.org.uk/2008/Cusick.htm. Accessed January 19, 2017.
Cusick, S. G. 2008a. “You Are in a Place That Is Out of the World . . . ”: Music in the Detention
Camps of the “Global War on Terror.” Journal of the Society for American Music 2 (1): 1–26.
Cusick, S. G. 2008b. Music as Torture/Music as Weapon. TRANS 8. http://www.sibetrans.
com/trans/articulo/152/music-as-torture-music-as-weapon. Accessed January 19, 2017.
Department of the Army. 1992. FM 34–52 Intelligence Interrogation. Washington, DC: Department
of the Army.
Department of the Army. 2006. FM 2–22.3 (FM 34-52) Human Intelligence Collector Operations.
Washington, DC: Department of the Army.
Fodor, J. A., and Z. W. Pylyshyn. 1981. How Direct Is Visual Perception? Some Reflections on
Gibson’s “Ecological Approach.” Cognition 9: 139–196.
French Foreign Legion. 2016. French Foreign Legion Songs and Marches. http://foreignlegion.
info/songs/. Accessed January 19, 2017.
Gaver, W. W. 1993a. What in the World Do We Hear? An Ecological Approach to Auditory
Event Perception. Ecological Psychology 5 (1): 1–29.
Gaver, W. W. 1993b. How Do We Hear in the World? Explorations in Ecological Acoustics.
Ecological Psychology 5 (4): 285–313.
Gibson, J. J. 1939. The Aryan Myth. The Journal of Educational Sociology 13 (3): 164–171.
Gibson, J. J. 1966. The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
Gibson, J. J. 1979. The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gittoes, G., dir. 2005. Soundtrack to War. Australia: ABC Video.
Goodman, S. 2012. Sonic Warfare: Sound, Affect and the Ecology of Fear. Cambridge, MA:
MIT Press.
Grant, M. J. 2013a. The Illogical Logic of Music Torture. Torture 23 (2): 4–13.
Grant, M. J. 2013b. Music and Punishment in the British Army in the Eighteenth and
Nineteenth Centuries. The World of Music 2 (1): 9–30.
Grant, M. J. 2014. Pathways to Music Torture. Musique et Conflits Armés après 1945 4: 2–19.
Heft, H. 2001. Ecological Psychology in Context: James Gibson, Roger Barker and the Legacy of
William James’s Radical Empiricism. Mahwah, NJ: Erlbaum.
Hui, A. 2016. Aural Rights and Early Environmental Ethics: Negotiating the Post-War
Soundscape. In Current Directions in Ecomusicology, edited by A. S. Allen and K. Dawe,
176–187. New York: Routledge.
Johnson, B., and M. Cloonan. 2009. Dark Side of the Tune: Popular Music and Violence. Farnham,
UK: Ashgate.
Juslin, P., and J. A. Sloboda. 2011. Handbook of Music and Emotion: Theory, Research,
Application. Oxford: Oxford University Press.
Kenneally, T. 1982. Schindler’s Ark. London: Hodder and Stoughton.
Lagouranis, T., and A. Mikaelian. 2008. Fear Up Harsh: An Army Interrogator’s Dark Journey
through Iraq. New York: New American Library.
Lahmann, C., R. Schoen, P. Henningsen, J. Ronel, M. Muehlbacher, T. Loew, et al. 2008. Brief
Relaxation versus Music Distraction in the Treatment of Dental Anxiety: A Randomized
Controlled Clinical Trial. Journal of the American Dental Association 139 (3): 317–324.
Lesiuk, T. 2005. The Effect of Music Listening on Work Performance. Psychology of Music
33 (2): 173–191.
McCoy, A. W. 2006. A Question of Torture: CIA Interrogation from the Cold War to the War on
Terror. New York: Metropolitan.
Neisser, U. 1978. Perceiving, Anticipating and Imagining. In Minnesota Studies in the Philosophy
of Science IX, edited by C. W. Savage. Minneapolis: University of Minnesota Press.
North, A. C., D. J. Hargreaves, and J. Mckendrick. 1999. Music and On-Hold Waiting Time.
British Journal of Psychology 90 (1): 161–164.
Pieslak, J. 2009. Sound Targets: American Soldiers and Music in the Iraq War. Bloomington:
Indiana University Press.
Reed, E. S. 1996. The Necessity of Experience. New Haven, CT, and London: Yale University Press.
SEM. 2007. Position Statement on Torture. http://www.ethnomusicology.org/?PS_Torture.
Accessed December 13, 2018.
Shevy, M. 2008. Music Genre as Cognitive Schema: Extramusical Associations with Country
and Hip-Hop Music. Psychology of Music 36 (4): 477–498.
Sloboda, J. A. 2005. Assessing Music Psychology Research: Values, Priorities and Outcomes.
In Exploring the Musical Mind, edited by J. A. Sloboda. Oxford: Oxford University Press.
Smith, P. C., and R. Curnow. 1966. “Arousal Hypothesis” and the Effects of Music on Purchasing
Behaviour. Journal of Applied Psychology 50: 255–256.
Standley, J. M. 1986. Music Research in Medical/Dental Treatment: Meta-Analysis and Clinical
Applications. Journal of Music Therapy 23 (2): 56–122.
Szafranski, R. 1995. A Theory of Information Warfare: Preparing for 2020. Air Power Journal
9 (1): 56–65. https://www.airuniversity.af.edu/Portals/10/ASPJ/journals/Volume-09_Issue-
1-Se/1995_Vol9_No1.pdf. Accessed December 13, 2018.
Tansik, D. A., and R. Routhieaux. 1999. Customer Stress-Relaxation: The Impact of Music in a
Hospital Waiting Room. International Journal of Service Industry Management 10: 68–81.
Volcler, J. 2013. Extremely Loud: Sound as a Weapon. New York: New Press.
Welly, A., H. Lang, D. Welly, and P. Kropp. 2012. Impact of Dental Atmosphere and Behaviour
of the Dentist on Children’s Cooperation. Applied Psychophysiology and Biofeedback 37 (3):
195–204.
Wilson, S. 2003. The Effect of Music on Perceived Atmosphere and Purchase Intentions in a
Restaurant. Psychology of Music 31 (1): 93–112.
Windsor, W. L. 2000. Through and around the Acousmatic: The Interpretation of
Electroacoustic Sounds. In Music, Electronic Media and Culture, edited by S. Emmerson,
7–35. London: Ashgate.
Windsor, W. L., and C. de Bézenac. 2012. Music and Affordances. Musicae Scientiae 16:
102–120.
Yalch, R. F., and E. Spangenberg. 1993. Using Store Music for Retail Zoning: A Field Experiment.
Advances in Consumer Research 20: 632–636.
Young, J. S. 1954. Communist Vulnerabilities to the Use of Music in Psychological Warfare.
Washington, DC: George Washington University.
chapter 15
Synesthetic Artworks and Audiovisual Hallucinations
Jonathan Weinel
Introduction
As films and video games have sought to provide audiences with ever more exotic experiences
and storylines, representations of hallucination have also been incorporated.
The focus of this chapter is on the material design of these representations of
hallucination within audiovisual media and the role of sound within these. First, I discuss
the form of visual hallucinations, auditory hallucinations, and synesthesia. Following
this, I consider how these may provide a basis for the design of audiovisual artworks.
Many of these artworks can be categorized as either diegetic or synesthetic in their
essential operation, as I reveal through an examination of examples from avant-garde
films, feature films, light shows, visualizations, VJ performances, music videos, and video
games. Following this exploration, I propose a conceptual model with three continua
that describes a range of possible approaches for the representation of ASCs using
audiovisual media. One of the theoretical configurations implied by this model is what
I refer to as “augmented unreality”: the convergent layering of synthetic sensory
information on real-world environments in order to simulate hallucinatory experiences
of unreality through digital media.2 Augmented unreality benefits from technological
advances such as high-resolution computer graphics, projection mapping, and multi-
channel surround sound systems. These allow not only for greater levels of accuracy in
the representation of hallucinations but also for these to be embedded in arenas where
audiences may not be expecting such an encounter. In these spaces, the boundaries
between the real physical environment and the synthetic unreal can be subverted and
dissolved; and it is this illusory capability that presents an important new paradigm
shift for digital cultures. Early examples of augmented unreality can be seen in elec-
tronic dance music culture, in which projection mapping techniques, decor, and sonic
manipulations are combined to simulate the experience of hallucinations at outdoor
psychedelic music festivals. In this chapter, through consideration of these various
examples in relation to the conceptual model, I will demonstrate how sound is used in the
context of audiovisual representations of hallucinations, and the role it may provide in
the emerging paradigm of augmented unreality.
The term “altered states of consciousness” rose to prominence in the 1960s, to describe
the variety of conscious states that lie beyond the typical experience of normal waking
consciousness (Ludwig 1969). The varieties of ASCs include: psychosis, such as may be
experienced by schizophrenics; psychedelic experiences as produced by hallucinogenic
drugs such as LSD; the hallucinations caused by sensory deprivation; states of hypnosis;
trances, as experienced in spirit possession rituals; and states of meditation that are used
in Buddhism and other religions. Dreaming is also sometimes considered as a form of
ASC (e.g., Hobson 2003), as are the unusual states that occur on the boundaries of sleep and wakefulness.
Visual Hallucinations
Considering the visual effects of hallucinogens in more detail, Heinrich Klüver (1971)
carried out studies exploring the effects of mescaline on the visual system. These and
related studies (Ostler 1970; Siegel 1977) explored the commonality of visual patterns
of hallucinations between subjects. Klüver proposed a set of “form constants”: lattices,
cobwebs, funnels, and spirals that constitute the basic forms from which the visual impres-
sions perceived during mescaline hallucinations are derived. According to Klüver, in the
early stages of hallucination these form constants provide the basis for the visual pat-
terns of hallucination commonly described, while in later stages of hallucination, other
forms such as tunnels may be abstracted from these basic forms. In their study, Bressloff
and colleagues (2001) suggested that these form constants arise in the visual cortex, and
they are believed to be a cross-cultural feature of visual perception during hallucinations.
Figure 15.1 presents a spiral image based on the form constants, as used in Psych Dome:
an interactive audiovisual installation based on hallucinations (Weinel et al. 2015).
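Patterns of this kind lend themselves to straightforward procedural generation. The short Python sketch below is purely illustrative and is not the method used in Psych Dome or in the studies cited; the image size, number of spiral arms, and twist factor are arbitrary assumptions chosen to produce a spiral/funnel in the spirit of Klüver’s form constants.

```python
import numpy as np

def spiral_form_constant(size=512, arms=6, twist=8.0):
    """Render a grayscale spiral pattern loosely modeled on Klüver's
    spiral/funnel form constants, as a 2D intensity array in [0, 1]."""
    # Coordinate grid centred on the image midpoint.
    y, x = np.mgrid[-1:1:complex(0, size), -1:1:complex(0, size)]
    radius = np.sqrt(x**2 + y**2) + 1e-6        # avoid log(0) at the centre
    angle = np.arctan2(y, x)
    # A logarithmic spiral: intensity varies with the angle plus a term
    # proportional to log(radius), giving the characteristic funnel shape.
    phase = arms * angle + twist * np.log(radius)
    pattern = 0.5 + 0.5 * np.cos(phase)
    # Fade toward the edges so the pattern reads as a tunnel or funnel.
    return pattern * np.clip(1.2 - radius, 0.0, 1.0)

if __name__ == "__main__":
    img = spiral_form_constant()
    print(img.shape, float(img.min()), float(img.max()))
```

Varying the arms and twist parameters moves the output between lattice-like and funnel-like impressions, which is broadly how such imagery is parameterized in audiovisual work of this kind.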
Auditory Hallucinations
Though visual hallucinations seem to be more prevalent, auditory hallucinations are
also commonly reported. Studies of auditory hallucinations have mainly focused on
schizophrenics, who experience “auditory-verbal hallucinations” (AVHs), in which
voices are heard as if from the external environment or inside the head (Wayne 2012, 87).
Though AVHs are the most common type experienced by people with schizophrenia,
Figure 15.1 Artistic impression of visual patterns of hallucination from Psych Dome (Weinel
et al. 2015). The Psych Dome installation uses a consumer-grade electroencephalograph (EEG)
headset to control parameters of an audiovisual artwork based on hallucinations.
“non-verbal auditory hallucinations” (NVAHs) are also known to occur and may
consist of hallucinated music (Kumar et al. 2014), bangs, or noises (Jones et al. 2012).
Neuroimaging studies have suggested that auditory hallucinations activate the parts of
the brain involved in inner speech and Heschl’s gyrus (the auditory cortex), supporting
the view that auditory hallucinations are perceived with a sense of reality comparable to
that of sounds that have origins in the external environment (Dierks et al. 1999). In the
hallucinations caused by drug experiences, perception of sound is also altered, ranging
on a continuum from enhanced enjoyment (or otherwise) to distortions in sound qual-
ity and total hallucination of sounds with no external acoustic origin (Weinel et al. 2014).
The latter may consist of either AVHs or NVAHs.
Figure 15.2 illustrates a continuum of aural experience from normal waking con-
sciousness to total hallucination (as discussed in Weinel et al. 2014). In normal waking
consciousness, auditory input comes predominantly from external sensory input, which
provides a basis for aural perception. As hallucinatory effects are intensified, the per-
ceptual experience of sounds becomes enhanced; sounds are perceived as more or less
enjoyable than usual or as profoundly significant. Further along the scale, the subjective
experience of sound becomes distorted, as if properties such as volume, spatial location,
or audio quality have been altered or manipulated with digital signal processes. As these
effects intensify, the balance shifts from external to internal sensory inputs that arise
within the brain. In the most extreme cases, total experiences of hallucination occur
that consist of hallucinated noises, voices, or music that have no acoustic origin in the
external environment.
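The “distorted” region of this continuum can be imitated with elementary digital signal processing. The following sketch is an illustration only, not a description of any technique used in the studies cited: the gain, panning, and filter-cutoff parameters are my own assumptions, chosen to degrade a mono signal’s apparent volume, spatial location, and audio quality in the way described above.

```python
import numpy as np

def distort_aural_experience(mono, sr=44100, gain=2.5, pan=0.8, cutoff=1200.0):
    """Crudely imitate hallucinatory distortions of volume, spatial
    location, and audio quality on a mono signal with values in [-1, 1]."""
    # Volume distortion: exaggerated gain with soft clipping.
    boosted = np.tanh(gain * mono)
    # Audio-quality distortion: a one-pole low-pass filter dulls the sound.
    alpha = np.exp(-2.0 * np.pi * cutoff / sr)
    filtered = np.empty_like(boosted)
    state = 0.0
    for i, sample in enumerate(boosted):
        state = alpha * state + (1.0 - alpha) * sample
        filtered[i] = state
    # Spatial distortion: constant-power panning pushes the source sideways.
    theta = (pan + 1.0) * np.pi / 4.0            # pan in [-1, 1] -> angle
    left, right = np.cos(theta) * filtered, np.sin(theta) * filtered
    return np.stack([left, right], axis=1)       # stereo output

# Example: one second of a 440 Hz tone, dulled and pushed toward the right.
t = np.linspace(0, 1, 44100, endpoint=False)
stereo = distort_aural_experience(0.3 * np.sin(2 * np.pi * 440 * t))
```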
Synesthesia
The term “synesthesia” comes from the Greek syn (union) and aisthesis (sensation), and
describes the dissolution of boundaries between the senses (Cytowic 1989). In such
experiences, sounds may have tastes or colors may have smells. These are not merely
imagined correspondences but actual experiences across the senses that are caused by a
given stimulus. The phenomenon is reported as a general trait for some individuals in
typical states of waking consciousness. However, psychedelic drugs such as mescaline,
psilocybin, or LSD are also known to promote experiences of synesthesia. Although
synesthesia can involve the blurring of any of the sensory modalities, in these psychedelic
experiences sounds often trigger corresponding visual images (e.g., Bliss and Clark
1962, 97), suggesting the directional flow of information that is illustrated in Figure 15.3.
Toward Representation
The visual and sonic components of hallucinations can be used to inform the design of
corresponding visual images and sounds. Indeed, such practices may be very old; it has
been proposed that examples of early shamanic rock art might have been based on the
visual images seen during hallucinations (Lewis-Williams 2004). In more recent examples
of psychedelic art and films, the internal experience of hallucinations can be represented
through appropriate design of audiovisual content. The design of this content has been
assisted by developments in sound and visual technology, such as computer graphics
and audio techniques that have allowed almost any sound or visual image imaginable to
be created. These technologies have allowed the subjective visual and aural experience
of hallucinations to be represented in digital video, by creating materials that correspond
with the visual or aural experiences observed during ASCs. Audiovisual artworks
have also enabled sound-to-image processes, similar to those found in synesthesia, to
be realized through the design of moving images that correspond with music.3 In recent
years, these representations of hallucination have also become interactive, as video
game technologies present simulations of hallucination or synesthesia. As we shall see,
representations of hallucination do not have to follow one fixed approach but may use a
variety of possible approaches, ranging from those that seek to replicate visual or aural
experience as accurately as possible to those that use more stylized approaches such as
impressionism, metaphorical imagery, or symbolism.
Audiovisual Representations
of Hallucinations
These categorizations are by no means definitive, but provide a useful means through
which we can initially begin to distinguish some key differences between works that use
representations of hallucination. “Diegetic representations of hallucinations” is the
phrase that describes representations of ASCs occurring within narrative contexts and
applies to examples in various films and 3D video games. These examples use the illusory
properties of audiovisual media in order to construct narratives involving characters in
various environments. Within these narratives, scenes of hallucination are portrayed
through the use of various audiovisual techniques that enable changes to the conscious
state of the character to be communicated to an audience. In contrast, “synesthetic
artworks” provide audiences with sensory experiences of sound and light similar to
those that may be experienced during hallucination. Artworks in this category do not
typically present these representations of synesthesia within a narrative framework;
examples of synesthetic artworks can be found in avant-garde visual music films, visual-
izations, VJ performances, music videos, and interactive music visualizations. These
two categories can also be distinguished by whether they use audiovisual representations of hallucination to enrich the sensory experience of a present location (synesthetic
artworks), or immerse the audience in a narrative depiction of another time and place
(diegetic representations of hallucination). In the following subsections, each of these
categories is illustrated through a selection of examples.
Figure 15.2: from sounds that have an acoustic basis within the diegetic environment, to
distorted versions of these, and sounds that are entirely internal products of hallucination
with no acoustic basis in the diegetic environment.
Later examples, such as Enter the Void (Noé 2009), push further still toward accurate
representations of hallucination with the aid of CGI and digital audio techniques. Enter
the Void uses a sustained first-person perspective: the camera presents the subjective
eye-view of the protagonist, allowing the audience to see what he sees (including his
blinking eyelids); while sound presents his aural experience so that the audience hears
what he hears. Sound is not only used to relate his conversations, but also to reveal the
inner speech of his thoughts that are delineated from vocal speech by processing the
dialogue with an echo effect. Early in the film, the character smokes a glass pipe con-
taining DMT (dimethyltryptamine), a powerful hallucinogen with a rapid onset and short duration. As he inhales the drug and its effects take hold, his vision becomes blurred
and spots of light flash across his visual field. He closes his eyes, and we see a network of
organic fibers and fractal patterns (created using CGI), suggestive of abstractions from
Klüver’s (1971) form constants. Throughout this sequence we hear an abstract sound
collage, in which the sounds from the Tokyo streets below are processed with flangers
and other effects in order to suggest perceptual distortions and auditory hallucination.
Through these various techniques, Enter the Void demonstrates how both sound and
visual images can be used to render the subjective experience of visual and auditory
hallucination with improved levels of accuracy, so that the media presented bears
a stronger resemblance to the visual and aural experiences that people actually describe
during hallucinations.
In recent years, computer graphics and sound have also been used to describe visual and auditory hallucinations in interactive media, such as first-person shooter (FPS) video games. For example, Weinel’s (2011) Quake Delirium demo project and Far Cry 3 (Ubisoft 2012) are video game projects that animate visual properties in order to simulate distortions to visual perception, while also using digital
effects and sounds to simulate auditory hallucinations. In the latter game, the simulation
of hallucination provides a means through which to enrich the narrative, but also
demonstrates an emerging paradigm shift in which games allow the player to explore
new potentialities through the simulation of altered states of consciousness in the con-
text of virtual worlds.
Synesthetic Artworks
Synesthetic artworks present audiences with experiences of light and sound that are
comparable to those that may occur during a hallucination, without the use of a clearly
defined narrative context. “Visual music” is a form of avant-garde film that is specifically
orientated toward synesthetic forms (Brougher and Mattis 2005). In the films of artists
such as Len Lye, Norman McLaren, Oskar Fischinger, and John Whitney, animated
arrangements of color and shape are used to form dynamic relationships similar to those
found in musical composition. While much of the work in this idiom has been
characterized by the quest for a harmonic visual language that Whitney (1980) articulated
in his writings on visual music, some works were also conceptualized as representations
of the internal experiences of the “inner eye” (Wees 1992). Harry Smith’s Early Abstractions
(1946–1957) series7 and Jordan Belson’s visual music films, such as Allures (1961) and the
unfinished LSD (1962), are notable as examples that seek to present internal sensory
experiences through film. Both artists used music as a complement to their visuals,
creating synesthetic audiovisual experiences for their audiences. Although both drew
inspiration from their own experiences of ASCs, their work can be more appropriately
seen not as attempts to convey their own first-person experience but as constructing
new sensory experiences for their audiences that provoke a form of synesthesia through
the use of audiovisual media. This approach was also explored through the use of psyche-
delic light shows such as Jordan Belson and Henry Jacobs’s Vortex Concerts; works by the
USCO collective (Davis 1975, 67; Oren 2010); and Andy Warhol’s Exploding Plastic
Inevitable shows with live music by the Velvet Underground (Youngblood 1970, 102–105;
Joseph 2002). For audiences on psychedelic drugs, these light shows may provide a com-
plementary experience; however, they also construct a multimodal experience of sound
and light for those individuals who are not operating under a chemically altered mind-
set, and this imitates the processes of synesthesia, constructing a similar experience
synthetically through sound and projections.
New technologies such as light synthesizers and computer software acted as a catalyst
for the furthering of these synesthetic audiovisual experiences from the late 1970s onward.
Early sound-to-light devices such as the Atari Video Music (1976) can be seen as simu-
lating sound-to-image synesthesia (as in Figure 15.3). Subsequently, programs such as
Jeff Minter’s Psychedelia (Llamasoft 1984), Trip-a-Tron (Llamasoft 1988), Virtual Light
Machine (VLM) (Llamasoft 1990), and later Neon (Llamasoft 2004), are successive
iterations of synesthetic equipment that incorporate progressive levels of computational
integration between sound and image (Minter 2005). Along with the availability of
computer graphics software on home computers, programs such as these, and hardware
such as the NewTek Video Toaster, would be among those that supported the nascent VJ
(“video jockey”) performances that flourished in tandem with the electronic dance
music culture8 of the 1990s, as demonstrated on the Studio !K7 X-Mix (1993–1998)
series. The mode of these is essentially one of sensory stimulation, and incorporates
replications of visual hallucinations and synesthesia: looping 3D graphics, fractals, and
cycling textures are combined in correspondence with music to produce impressions of
psychedelic hallucinations and rave culture iconography. This VJ culture became a com-
mon element of larger dance music clubs and outdoor raves and has also grown to
encompass the use of projection mapping technology that allows multiple surfaces to be
used as video screens. Modern VJ software allows the use of real-time audio parameters
as a means to manipulate graphical filters that are applied to predesigned video clips, or
as parameters that drive animations. Recent examples of this type of work include the
videos of VJ Chaotic (Ken Scott), such as Forever Imaginary (2014a), and planetarium
(“fulldome”9) works such as Crystallize (2014b).
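The audio-reactive principle behind such tools can be sketched in a few lines: an amplitude measure extracted from the incoming audio drives a visual parameter on each frame. The following Python fragment is a generic illustration rather than a description of any particular package (VLM, Neon, or modern VJ software); the frame rate, smoothing constant, and the mapping to brightness and hue rotation are assumptions.

```python
import numpy as np

def audio_reactive_params(audio, sr=44100, fps=30, smooth=0.8):
    """Map per-frame RMS loudness of an audio signal to two visual
    parameters: brightness (0-1) and hue rotation in degrees."""
    hop = sr // fps                              # samples per video frame
    frames = len(audio) // hop
    params, level = [], 0.0
    for i in range(frames):
        window = audio[i * hop:(i + 1) * hop]
        rms = float(np.sqrt(np.mean(window ** 2)))
        # Exponential smoothing stops the visuals from flickering.
        level = smooth * level + (1.0 - smooth) * rms
        brightness = min(1.0, level * 4.0)       # scaling is an assumption
        hue_shift = (i * 1.5 + level * 360.0) % 360.0
        params.append((brightness, hue_shift))
    return params

# Example: noise with a pulsing 2 Hz envelope yields pulsing visual values.
t = np.linspace(0, 4, 4 * 44100, endpoint=False)
signal = np.random.uniform(-1, 1, t.size) * (0.5 + 0.5 * np.sin(2 * np.pi * 2 * t))
frame_params = audio_reactive_params(signal)
```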
The discussion so far has outlined two main types of representation of hallucinatory
ASCs: diegetic representations that present hallucinations within the context of a narra-
tive progression, and synesthetic artworks that enrich the sensory experience through
the presentation of hallucinatory audiovisual experiences.10 In order to further consider
the differences implicated by examples within these groups, Figure 15.4 presents a
conceptual model describing possible approaches for the representation of ASCs using
three continua: “input,” “mode of representation,” and “arena space.”
Input
The x axis of the model describes input, and corresponds with Hobson’s (2003, 44–46)
discussion of sensory Input that can be modulated between internal and external
sources. Visual or sonic materials can be used to represent external sensory experience (e.g., impressions of actual environmental surroundings), or internal sen-
sory experience (e.g., hallucinated visions or sounds). For instance, a narrative represen-
tation of hallucination may include visual and auditory elements that describe either
an actual environment or a hallucination. Modulation between both external and
internal elements is also possible, such as if an audiovisual representation of an actual
environment is presented with gradually increasing distortions and the introduction of
hallucinated elements.
Mode of Representation
The y axis of the model describes mode of representation, which may range from “accurate”
to “stylized.” “Accurate” representations are those that attempt to render the visual or
auditory elements of hallucination as authentically as possible for the audience; hence,
visual effects may be used to present the visual experience of hallucination in a way that
closely approximates the first-person experience, while sound may be used to render
auditory distortions and auditory hallucinations.11 At the opposite end of this con-
tinuum, “stylized” describes a wide range of artistic possibilities for rendering hallucinations,
such as through the use of art styles such as impressionism, cartooning, symbolism, or
metaphorical techniques.12 Modulation between accurate and stylized approaches is
possible, such as if an accurate representation diverges into the use of metaphorical
materials during certain sequences in order to describe hallucinations. Such modulation
is not uncommon, as movie directors often show the onset of hallucinations using visual
effects or geometric patterns, before transitioning into the use of symbolic or metaphorical
cinematic materials to describe the more intense phases of hallucination.
Arena Space
The z axis of the model describes arena space: the entire performance space in which
musical and visual elements are presented.13 At one end of this continuum, “transported”
approaches are those that seek to remove the audience from the awareness of their
real-world context through immersion into the illusory audiovisual medium. This is the
position typically used by diegetic works that seek to absorb the audience into a fictional
world and narrative. At the other end of this continuum, “situational” approaches are
those that work in conjunction with the real-world environment, presenting sound and
visual images that enhance the experience of the “here and now” (as opposed to the
“then and there”). Synesthetic artworks such as psychedelic light shows at rock concerts
often use this approach, since they aim to stimulate the senses of the audience within the
present. Modulation between transported and situational approaches is also possible,
since an audiovisual work may operate in conjunction with the arena space or seek to
transport the listener from it at various points during a performance.
In Practice
As demonstrated in Figure 15.5, the conceptual model can be used to describe the represen-
tational approach used by various examples, such as those discussed previously.
Enter the Void (Noé 2009) uses representations of both internal and external sensory
experience and modulates between the two as the protagonist shifts between normal
and hallucinatory states of consciousness. Due to these modulations, the actual point on
the conceptual model changes through the course of the film; hence, the ellipse indicates
not one point but the approximate range that is traversed over time. The mode of represen-
tation in Enter the Void leans toward accurate representations of ASC, and as a fictional
narrative, it seeks to transport the audience from awareness of the movie theater into the
diegesis of the story.
Fear and Loathing in Las Vegas (Gilliam 1998) also uses both internal and external
inputs; the hotel lobby scene described earlier includes real-world sounds of the environ-
ment and modified versions of these that suggest movement along the continuum
toward internal sensory perception and hallucination. However, while aspects of the
visual and auditory approach used in Fear and Loathing in Las Vegas correspond with
the actual form of ASCs, the mode of representation is relatively more stylized than that of
Enter the Void. As the work is diegetic, use of the arena space is similarly “transported”
for this work, and indeed this is the arena space position for most works in the “diegetic
representations of hallucination” group.
Psychedelic visual music films such as those by Jordan Belson do not generally
include representations of external elements; visual elements are descriptive of visual
impressions of inner experience and therefore occupy the internal part of the axis, as
indicated for Allures (1961) and the unfinished work LSD (1962) in Figure 15.5.
Considering the mode of representation, these films each fall somewhere between
accurate and stylized positions. For instance, LSD leans toward accuracy through the
depiction of forms similar to Klüver’s form constants; it resembles the type of imagery
people actually describe during closed-eye visual hallucinations on LSD trips.
In contrast, Allures is a more metaphorical work. Both works could be considered as
“situational,” since they aim to actually induce synesthetic experience rather than transport
the listener into a fictional narrative. The situational approach is also the typical position
for many other works discussed in the “synesthetic artworks” category, since psyche-
delic light shows and VJ performances typically seek to bombard the senses with light
and sound and enhance the sensory experience of a space, rather than extract the
individual from his or her awareness of it.
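Read computationally, the model amounts to placing each work, or each moment of a work, as a point or a traversed region in a three-dimensional space. The sketch below encodes the placements just discussed as coordinates in [0, 1] on each axis; the specific numeric values are my own rough readings of Figure 15.5, not values supplied by the model itself.

```python
from dataclasses import dataclass

@dataclass
class ASCRepresentation:
    """A work's position on the three continua of the conceptual model.
    Each value lies in [0, 1]:
      input_source: 0 = external sensory input, 1 = internal (hallucinated)
      mode:         0 = accurate rendering,     1 = stylized rendering
      arena_space:  0 = situational (here/now), 1 = transported (there/then)
    """
    title: str
    input_source: tuple  # (min, max): a range, since works modulate over time
    mode: float
    arena_space: float

# Rough, illustrative placements of the works discussed above.
works = [
    ASCRepresentation("Enter the Void", input_source=(0.2, 0.9),
                      mode=0.2, arena_space=0.9),
    ASCRepresentation("Fear and Loathing in Las Vegas", input_source=(0.2, 0.8),
                      mode=0.6, arena_space=0.9),
    ASCRepresentation("Allures", input_source=(0.8, 1.0),
                      mode=0.7, arena_space=0.1),
    ASCRepresentation("LSD", input_source=(0.8, 1.0),
                      mode=0.4, arena_space=0.1),
]

for w in works:
    print(f"{w.title}: input {w.input_source}, mode {w.mode}, arena {w.arena_space}")
```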
Augmented Unreality
Figure 15.6 The conceptual model, showing the convergence of the real-world environment (external input) with synthetic unreality (internal input).
In augmented unreality, synthetic representations of unreality are designed to converge with the external environment through the use of techniques such as the
imitation and processing of visual or aural information derived from the external environ-
ment; the external environment then becomes an input source that can be subjected to
graphical or sonic transformations. We find visual examples of this in the spectacles of
projection-mapped buildings, where artists use the actual form of the building and its
texture as a basis for the design of transformed materials. In sound, we find a similar
principle in electroacoustic compositions such as Rajmil Fischman’s No Me Quedo . . .
(2000; discussed in Fischman 2008), which uses recorded sound and digital transforma-
tions to provide convergence between instrumental sounds and synthetic electroacoustic
sounds. The delivery of these illusory forms of media is supported by the availability of
increasingly powerful technologies, such as multichannel speaker systems and multi-
projection mapping systems. These allow the media to be delivered convincingly, and
their (semi)portable nature also enables the illusions to be “thrown” and sited outside of
the usual arenas of cinemas or computer screens where we might otherwise expect to
see them. This, in turn, allows the potential for illusory encounters that are unexpected
and, in some cases, may be indistinguishable from the real, physical environment. It is
the combination of convincing illusory media, coupled with the ability to site or throw
these anywhere, that exposes an important paradigm shift for digital culture, since almost
any public space is then a potential location where perceived reality can be corrupted
through the augmented unrealities of digital media. In ideal cases, the high-quality
sound and graphics will allow the surface of the media to qualitatively approach the
point where its synthetic nature cannot be detected with certainty, while the portability
of these illusions will help to catch audiences off-guard.
Early examples of augmented unreality can be observed in electronic dance music
culture. For example, psychedelic trance culture15 prioritizes the aesthetics of the
psychedelic experience in music, and at outdoor festivals VJ collectives such as Trip Hackers and Artescape design ultraviolet decor and synesthetic visual elements that are
intended to mesh with outdoor (real-world) festival environments (e.g., Dickson 2015).
Projection mapping is used in conjunction with sculptural elements that provide
custom surfaces for projection and temporary architectural spaces that imitate the form
of visual hallucinations and mandalas. These sculptural elements allow animated
fractals and tunnel elements suggestive of visual hallucinations to be integrated into
real-world environments such as forests, subverting the physical reality of these situations.
These visual elements are typically used in conjunction with music that, as heard on Durango’s Tumult (2005), includes a combination of rhythmic and melodic elements
(intended to produce maximum energetic dance effects), coupled with sounds such
as noises and voices that are suggestive of auditory hallucinations. These sounds are
manipulated using high-quality digital spatialization and transformations, enabling the
enhancements and distortions of auditory hallucinations (Figure 15.2) to be represented
through sound. Both sounds and visual materials then explicitly simulate the sensory
experience of visual and auditory hallucinations. Since the light show is linked to the
audio, the form of synesthesia is also imitated, so that the colors and movement of visual
images fluctuate and jump in response to the sounds. The overall effect is “situational,”
since it works in conjunction with the real-world, outdoor setting of the festival,
integrating real environmental features such as trees, birds, and the skyline into the
equation. Digital media are used to elicit a synthetic experience of unreality in a manner
that blends with the real, physical environment, and thus augmented unreality (Figure 15.6)
is accomplished. In these situations, it is entirely possible that the audience may begin to
experience dissolution of the boundaries between the real environment and the synthetic
presentations of unreality. This may be especially true for audiences using chemical
substances to alter their mind-sets; however, drugs may not be a prerequisite, since the
illusory properties of digital media alone could be sufficient to provide such experiences.
As the audiovisual technologies discussed thus far become pervasive, the capability
to convincingly invoke augmented unreality should increase. Although I have character-
ized augmented unreality here in terms of projections and loudspeakers, it is possible that
other emerging audiovisual technologies could also be used to achieve similar effects.
For example: wearable video equipment such as the “smart contact lenses,” which play
and record video (currently in development); augmented/mixed reality glasses such as
Microsoft’s HoloLens; or headphone systems such as Doppler Labs’ Here (Doppler
Labs 2015), which modifies and filters sounds from the external environment, are among
those that could theoretically be used to simulate hallucinations and achieve augmented
unreality.16 The long-term implications of this type of media could be dramatic, as the
glow of synthetic virtual environments and their accompanying sonic vibrations extend
over the everyday, allowing the potential to simulate ASC experiences without the use of
intoxicating substances.
Concluding Remarks
This chapter has provided an outline of the main effects of hallucinations (a form of
ASC) with regard to the visual and aural components of the experience, including
sound-to-image synesthesia. As we have seen, the typical form of psychedelic hallucinations follows some structural norms that produce commonality in the experiences
between participants. These norms have allowed the representation of hallucinations
in a variety of audiovisual media such as films, visualizations, and computer games.17
These can be broadly classified in terms of diegetic representations of hallucination
and synesthetic artworks and may use a range of possible approaches. These possible
approaches can be considered in terms of the conceptual model presented, which
allows the use of input, mode of representation, and arena space to be considered for a
given work. The conceptual model also allows us to reflect on the recent move toward
improved accuracy in representations of hallucination, as afforded by digital
technologies for sound and computer graphics. I have argued that this drive toward
realism, coupled with new technologies for siting work in ad hoc locations, has opened
up a new paradigm of “augmented unreality,” in which real external environments
and synthetic representations of unreality converge. Augmented unreality is currently
exemplified by the synesthetic environments of psychedelic trance festivals, but over the
next few decades we can expect the trend to grow as illusory audiovisual technologies
become increasingly pervasive. As these technologies provide improved resolutions and
capabilities for modifying audience experience, the boundaries between external reality
and synthetic unreality may dissolve to the point where the two can no longer be distin-
guished; in effect, producing synthetic digital forms of ASCs.
Notes
1. In drawing on Hobson’s distinction of “external” and “internal” sensory inputs, we should
note that he does not propose these as binary categories, but rather a continuum of possible
states. It is acknowledged that internal processes can significantly shape normal waking
consciousness, and indeed, conversely, in some cases the contents of dreams can also be
influenced by external sensory inputs. What is important here is the main origin of sensory
material, which in normal waking consciousness is predominantly “external,” unlike
dreams and hallucinations that are primarily “internal.” As I explore in this chapter, both
the real (external) and the unreal (internal) can provide a basis for corresponding art,
sound, and music.
2. While the emphasis here is on digital practices, many of the essential approaches I explore
in this chapter were first proven with analog technologies such as film and magnetic tape,
and before that, techniques such as painting and the use of acoustic instruments.
3. It should be acknowledged here that experiences of synesthesia are highly individualized;
nonetheless, in drug experiences we find that a common mechanism of sound-to-image
synesthesia occurs, along with typical visual effects such as the “form constants”
(Klüver 1971). In this regard, there are generalizable processes that audiovisual media can
begin to reproduce, even if the specific manifestations of synesthesia that are experienced
by individuals may remain somewhat elusive.
4. As discussed by Sitney (1979, 21), the concept of the “trance-film” (similar to the “psycho-
drama”) describes films on such themes as dream, somnambulism, ritual, or possession.
5. “Thelemic” refers to the use of iconography derived from Aleister Crowley’s Thelema
religion, which Anger was a member of. These icons are presented in Inauguration of the
Pleasure Dome (Anger 1954) as if they were visual hallucinations, suggesting that the ritual
invokes visionary experiences related to the Thelemic principles.
6. For further discussion of the metaphorical use of reverb to suggest internal psychological
processes in films and popular music, see Doyle (2005).
7. During this period Harry Smith created a series of untitled films, of which several were
subsequently lost or destroyed (Sitney 1979, 232–233). Early Abstractions (1946–1957)
collects the remaining films from this series.
8. For a further discussion of electronic dance music culture, see St. John (2009).
9. “Fulldome” environments project video on to the hemispherical ceiling of a dome structure,
in order to provide an immersive 360° experience. These environments are used for
planetarium shows, but have also been used to provide various forms of expanded cinema.
Notable fulldome events showcasing new work in the United Kingdom have included
Mario DiMaggio’s Dome Club series and FullDomeUK.
10. The description of “hallucinatory audiovisual experiences” here does not presume that
audiences experience a hallucination in exactly the same way as would be precipitated by
other means (e.g., psychedelic drugs); rather, the experience of sound and images may
elicit distinct illusory experiences that imitate the form of hallucinations.
11. Instead of the term “accurate” we might otherwise have used the term “realistic” here, to
describe the stylistic approach taken, in correspondence with “realist” approaches in the
visual arts (e.g., photorealism). As Kennedy (2008, 449–450) remarks, realist approaches
can be used for depicting actual scenes, but they can also be used when rendering the
imaginary (or in this case, the hallucinatory). However, for our purposes here the
terms “realist” or “realistic” are unhelpful, since by definition the hallucinatory is unreal;
hence the term “accurate” is preferable, to avoid having to describe unreal materials as
also “realistic.”
12. For a further discussion of metaphors in art, see also Kennedy (2008).
13. The term “arena space” is borrowed from Smalley (2007), and describes “the whole public
space inhabited by both performers and listeners” (42). Here, the term is adapted to include
audiovisual elements.
14. The convergence of synthetic and real-world materials here is a development and adap-
tation of Fischman’s (2008) discussion of convergence of instrumental and electronic
materials in electroacoustic music, especially his own composition No Me Quedo . . . (2000).
15. For more information on psychedelic trance culture, see St. John’s definitive account
Global Tribe: Technology, Spirituality and Psytrance (2012).
16. In a series of public lectures, Carl Smith (see also 2014, 2016) has described these and
other technologies as enabling a new paradigm that he refers to as “context engineering”:
computer systems that allow the user to modify his or her contextual awareness, using
“reality as a medium.” In these terms, “augmented unreality” could be considered as a
specific branch of context engineering.
17. For an expanded discussion of how ASCs may be represented or induced across a wide
range of electronic music and audiovisual media, see also Weinel (2018).
References
Anger, K., dir. 1954. Inauguration of the Pleasure Dome.
Arnott, R. 2014. Soundself. Video game.
Belson, J., dir. 1961. Allures. USA.
Belson, J., dir. 1962. LSD. USA.
Bliss, E. L., and L. D. Clark. 1962. Visual Hallucinations. In Hallucinations, edited by L. J. West,
92–107. New York: Grune & Stratton.
Bressloff, P. C., J. D. Cowan, M. Golubitsky, P. J. Thomas, and M. C. Wiener. 2001. Geometric
Visual Hallucinations, Euclidean Symmetry and the Functional Architecture of Striate
Cortex. Philosophical Transactions: Biological Sciences 356:299–330.
Brougher, K., and O. Mattis. 2005. Visual Music: Synaesthesia in Art and Music since 1900.
London: Thames & Hudson.
Buñuel, L., dir. 1929. Un chien andalou. France.
Corman, R., dir. 1967. The Trip. American International Pictures.
Cytowic, R. E. 1989. Synesthesia: A Union of the Senses. New York: Springer-Verlag.
Davis, D. 1975. Art and the Future. New York: Praeger.
Deren, M., and A. Hammid, dirs. 1943. Meshes of the Afternoon. USA.
Dickson, C. 2015. Earthdance Cape Town 2015: Main Stage Installation and Video Mapping by
Afterlife. Vimeo. https://vimeo.com/139905544. Accessed October 25, 2015.
Smalley, D. 2007. Space-Form and the Acousmatic Image. Organised Sound 12 (1): 38–58.
Smith, C. H. 2014. Context Engineering Hybrid Spaces for Perceptual Augmentation. In
Electronic Visualisation and the Arts (EVA 2014), 244–245. London: British Computer
Society. http://www.bcs.org/upload/pdf/ewic_ev14_s18paper3.pdf. Accessed September 29,
2016.
Smith, C. H. 2016. Context Engineering Experience Framework. In Electronic Visualisation
and the Arts (EVA 2016), 191–192. London: British Computer Society. http://dx.doi.org/
10.14236/ewic/EVA2016.37. Accessed September 29, 2016.
Smith, H. E., dir. 1946–1957. Early Abstractions. USA.
St. John, G. 2009. Technomad: Global Raving Countercultures. London: Equinox.
St. John, G. 2012. Global Tribe: Technology, Spirituality and Psytrance. London: Equinox.
Studio !K7. 1993–1998. X-Mix. Video Series.
Thompson, H. S. (1971) 2005. Fear and Loathing in Las Vegas. Reprint. London: HarperCollins.
Ubisoft. 2012. Far Cry 3. Sony PlayStation 3.
Wu, W. 2012. Explaining Schizophrenia: Auditory Verbal Hallucination and Self-
Monitoring. Mind and Language 27 (1): 86–107.
Wees, W. C. 1992. Making Films for the Inner Eye: Jordan Belson, James Whitney, Paul Sharits.
In Light Moving in Time: Studies in the Visual Aesthetics of Avant-Garde Film, edited by
W. C. Wees, 123–152. Berkeley: University of California Press. http://publishing.cdlib.org/
ucpressebooks/view?docId=ft438nb2fr;brand=ucpress. Accessed October 25, 2015.
Weinel, J. 2011. Quake Delirium: Remixing Psychedelic Video Games. Sonic Ideas (Ideas
Sonicas) 3 (2): 22–29.
Weinel, J. 2018. Inner Sound: Altered States of Consciousness in Electronic Music and Audio-
Visual Media. New York: Oxford University Press.
Weinel, J., S. Cunningham, and D. Griffiths. 2014. Sound through the Rabbit Hole: Sound
Design Based on Reports of Auditory Hallucination. In ACM Proceedings of Audio Mostly
2014. Denmark: Aalborg University. doi: 10.1145/2636879.2636883
Weinel, J., S. Cunningham, N. Roberts, S. Roberts, and D. Griffiths. 2015. EEG as a Controller
for Psychedelic Visual Music in an Immersive Dome Environment. Sonic Ideas (Ideas
Sonicas) 7 (14): 85–91.
Whitney, J. H. 1980. Digital Harmony: On the Complementarity of Music and Visual Art.
Peterborough: Byte Books/McGraw-Hill.
Youngblood, G. 1970. Expanded Cinema. New York: E. P. Dutton.
chapter 16
Consumer Sound
Søren Bech and Jon Francombe
Introduction
This chapter deals with one of many methods (namely descriptive sensory analysis) used
for the objectification and quantification of the consumer’s imagination with respect to
the audio signal; it provides a justification of the method as used in the audio industry
for the design of audio playback technology that maximizes the potential for controlling
or improving the consumer’s auditory imagination. In the first section, the basic
assumptions and procedures behind sensory analysis are introduced. These are exemplified
by the quantitative descriptive analysis (QDA) method. QDA is one of the basic
methods in sensory analysis of food or sound quality; it addresses and controls the
complex influence of an individual listener’s expectations, mood, previous experiences,
and so on in an experimental context. In the following section, an example of a complete
sensory analysis of a complex sound field is provided, followed by details of the subsequent
development of a perceptual model for prediction of the attribute distraction in a
particular type of sound field. In the final section, upcoming and future developments in
this area are discussed.
The traditional role of the audio industry1 has been to provide means for a listener to
perceive and experience audio content (as made by some content creator, e.g., a music
artist or sound designer) at any time and anywhere after the production of the content.
This includes products or services that are used to record and store the sound (microphones,
tape, records, CD players, and so on); processes and products for transmission
of the sound to the end consumer; and finally, products for reproducing the sound in the
consumer’s home, car, or other listening venue. Theile (1991) states that a reproduction
system should “satisfy aesthetically and it should match the tonal and spatial properties
of the original sound at the same time.” A primary goal of the industry has therefore
always been “transparency”—that is, to create an impression or auditory experience2 for
the listener so that, for example, during a news broadcast it is possible for the listener to
form an auditory image of the announcer being “in the listening room” (as opposed to
being in a remote studio). Another example is to enable any listener to imagine that he
or she is in the concert hall where, say, a classical music performance took place. The
main goals for researchers in academia and industry have therefore been to understand
the processes involved in the entire transmission chain (from recording to repro
duction) and to develop products that allow the listener to perceive auditory images that
(1) correspond to actual “participation” in the original performance; and (2) accurately
reflect the original and unmodified intentions of the artist and the producer.
This goal has driven a range of research areas under the general term “acoustics” that
is defined by ANSI/ASA (2013) as: “(a) Science of sound, including its production, transmission,
and effects, including biological and psychological effects; (b) Those qualities
of a room that, together, determine its character with respect to auditory effects.” Specific
areas in the present context include “communication acoustics” (Blauert 2005; Pulkki
and Karjalainen 2015) and signal processing in acoustics (Havelock et al. 2008). The
audio industry has continuously improved or developed new techniques and a range of
products with the overall purpose of improving the ability of the rendering/reproduction
process to allow the listener to experience a perceptual image equivalent to that which
would accompany the original acoustic event. For example, over a number of decades
the optimal reproduction system has developed from a single channel (monophonic)
reproduction system through two-channel stereophony, 5.1 “surround sound,” and more
recently to advanced surround sound systems including 22.2 reproduction (Hamasaki 2011,
and references therein). Such systems and their evaluation are discussed further at the
end of this chapter. The increase in complexity of the recording/reproduction systems
was in part made possible by the introduction of digital signal processing; however, it
was not until the introduction of advanced encoding and decoding of audio and video
signals in products that such multichannel systems and other signal “manipulation”
techniques became widely available to consumers.
The introduction of digital signal processing in mass-market audio and video products
such as mp3 audio players produced a new range of possibilities for further improving
the quality of audio or video signals, and therefore the quality of “imagination” based
on the consumer’s auditory experience. In addition to benefits such as general higher
quality, increased number and availability of programs, and more advanced features, a
number of signal artifacts were unfortunately also introduced. These included “ringing”3
in audio and “squared clouds”4 in video. These artifacts were very noticeable even for the
average consumer. In order to remove or technically compensate for these imperfections,
the industry had need of “measuring” methods that could connect the physical
properties of the signals with the perceived auditory impression of the consumers. This
was not a new problem or topic area; researchers in psychophysics had been investigating
such relationships for years (see, e.g., Gescheider 2015, for an introduction), focusing
on “simple” auditory experiences such as the perceived strength of a sound (loudness).
However, the new problem was how to quantify complex multidimensional experiences
that, in addition to simple attributes such as loudness and timbre, also included a
number of completely unnatural artifacts (such as “squared clouds” in video). The
first task in this process was therefore to devise an experimental paradigm that could
the spatial properties of the rendering to a much larger degree than ever before,
thereby further increasing the degrees of freedom for controlling or improving the
listener’s auditory images. This means that sensory analysis of spatial properties of
reproduced sound is currently a hot research topic in many large projects—see, for
example, the work of the “S3A: Future Spatial Audio for an Immersive Listener
Experience at Home” project.5
The introduction of new signal processing techniques (as discussed earlier) meant that
perceptual audio scientists needed to develop ways of quantifying the auditory experiences
of consumers presented with complex auditory stimuli. Various methods, often adapted
from other sensory sciences (for example, food science), have been used to achieve this
aim. This section includes a description of the basic assumptions of descriptive analysis and
the main principles of the QDA method. The content will be a summary of information
presented by Bech (1999), Bech and Zacharov (2006, chap. 4), and Martin and Bech
(2005). Readers are referred to these publications for additional details of QDA, other
methods, and general references.
The use of assessors to evaluate and report on the auditory experiences produced by a
certain set of stimuli in a scientifically valid manner requires (at least) two basic issues to
be clearly defined: first, the question the assessor is required to answer; and second,
a specification of how the assessor should report the answer.
The definition of the question to the assessor depends on the purpose of the experiment
and the stimuli he/she is subjected to in the experiment. In a laboratory setting,
specific stimuli can be engineered to answer specific questions; conversely, in field settings
the stimuli will be naturally occurring and this will have an impact on the type of
questions that can be posed to the assessors. In the Adonis project (introduced previously),
the key questions were related to general image quality—or lack thereof—due to
artifacts in the processing of the natural images; therefore, the stimuli had to be complex
natural images. However, in order to establish a scientifically valid relationship between
the physical phenomena and signal processing introduced to the original image and the
overall quality changes, it was necessary to focus first on specific aspects or attributes of
the image, and then to determine how these contributed to the overall image quality.
The simplified conceptual model of human perception shown in Figure 16.1 was therefore
established, inspired by previously developed models by Plomp (1976), Nijenhuis
(1993), Yendrikhovskij (1998), and Stone and Sidel (2004).
The process starts with a physical stimulus—in this case a sound field—that impinges
on the auditory system of an assessor. The sound field can be described by a number of
physical variables Φk each with a physical strength or intensity (e.g., sound pressure level
Figure 16.1 Simplified conceptual model of human perception: the sound field is processed by the auditory system and, shaped by factors such as learning and context, gives rise to individual impressions I1 … In, which combination rules merge into the total auditory impression Itot.
and frequency). The auditory system transforms the mechanical activity of the eardrum
into nerve impulses that are assumed to be combined in the brain of the assessor, resulting
in a number of specific auditory attributes Ψl (e.g., pitch, loudness), each with a sensorial
strength Sm. The sensorial strength of each attribute depends on the physical
strength of the variables Φk in combination with the properties of the auditory system
and experimental factors such as learning effects. For the present purpose it is sufficient
to characterize these properties by the auditory sensitivity (e.g., can you hear the sound
or not) and selectivity (for example, the “just noticeable difference threshold”: can a
certain physical change of an audible sound be noticed or not).6
The next step in the process is assumed to be the result of a combination of the
individual attributes Ψl, with sensorial strength Sm, into specific impressions In. Finally,
these individual impressions are combined into an overall auditory impression Itot.
It is assumed that the combination of the specific attributes Ψl (each with a sensorial
strength Sm) into individual impressions In, as well as the combination of individual
impressions into an overall impression, depends on context, expectations, the mood of
the assessor, and so on.
The assumed relationship between the physical domain and the attribute domain is
shown in Equation 1. Equation 2 shows the relationship between the attribute domain
and the total auditory impression.
where kk represents a weighting factor reflecting the importance of the physical variable,
mn represents the weight of each individual impression in forming the overall impression,
and ε represents the noise or unexplained variance in the dataset. The assumption
made when using QDA or similar methods is that the assessor rating7 of each stimulus
for each attribute corresponds to its sensorial strength (Sm).
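A plausible reconstruction of Equations 1 and 2 in LaTeX notation, assuming the simple weighted-sum forms implied by the definitions above (the exact expressions used by the authors may differ), is:

S_m = \sum_k k_k \, \Phi_k + \varepsilon \qquad \text{(Eq. 1, reconstructed)}

I_{tot} = \sum_n m_n \, I_n + \varepsilon \qquad \text{(Eq. 2, reconstructed)}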
These very simplistic engineering relations have been shown to be able to describe the
experimental results and predict the outcome of new experiments for a large number
of situations in, for example, audio, video, or food quality experiments. It is also noted
that if these relationships can be established then it is possible to relate changes in the
physical variables directly to changes in the general impression of the assessors. This
represents key information for understanding human behavior and for the development
of new food, fragrance, audio, or video products, and explains why the descriptive
methods are used commonly by manufacturers in these areas. It is important to note
that Equations 1 and 2 do not represent a complete model of the human decision process
and they should not be expected to describe more than a maximum of 80–90 percent of
the variance in a dataset. However, this is quite often enough to make some very useful
estimations of future assessor behavior.
The model shown in Figure 16.1 was developed into the filter model (shown in
Figure 16.2) by Pedersen and Fog (1998).
Figure 16.2 The “filter model” developed by Pedersen and Fog (1998), inspired by Bech et al. (1996), to describe the process of human sound perception: a physical stimulus is filtered into a perceived stimulus, which in turn gives rise to likes/dislikes.
The model now operates with three domains—the physical, the perceptual, and the
affective domains—which are characterized by the measurement principle that is normally
applied. In the physical domain, standard physical measures are used to characterize
the stimuli; in the perceptual domain, the stimuli are characterized by the assessor’s
judgment of the sensorial strength of the relevant individual attributes; and finally, in
the affective domain, the assessor’s rating of, for example, the overall auditory impression
(see Eq. 2) is used for characterization of the stimuli.
The general idea or principle of descriptive analysis is therefore to identify the individual
attributes for the stimuli of interest and have assessors judge the sensorial strengths
of each of these. This is often done under highly controlled laboratory conditions
using a limited number of assessors (e.g., 15–20) as described for the QDA method. The
affective assessments are then established in the field using a large number of consumers
(e.g., 100–200). The two resulting perceptual datasets can be combined with the physical
variables; this process can provide the requisite information to be able to advise
the engineering department on how to achieve a certain strength of individual attributes
or overall impression.
The QDA method was developed by Stone and Sidel (2004) and is one of the basic
methods that specifies in detail the entire experimental process, including identification/
elicitation of attributes, training of assessors, planning and conducting experiments,
analyzing the results, and presentation of the results and conclusions. The method is
described in detail by Stone and Sidel (2004).
The QDA method exhibits the following properties (only those relevant to audio evaluations
are listed here). The QDA method:
The QDA method employs the so-called direct elicitation principle, in contrast to
other indirect elicitation methods. The direct elicitation principle assumes that there is a
close relationship between the individual attributes (Ψl) and verbal descriptors (single
words) elicited as a part of, for example, the QDA process. This is contrary to the indirect
elicitation principle, where it is not assumed that this relationship exists, and other
methods are used—for example, multidimensional scaling (see Schiffman et al. 1981, for
an introduction), in which the assessors rate only the perceived (dis)similarity between
the stimuli. The statistical analysis then allows for an identification of the individual
sensory dimensions that are assumed to be related to individual or groups of related
attributes. There are advantages and disadvantages to direct and indirect elicitation
techniques; both have been used—sometimes in conjunction—in audio attribute
elicitation studies. A full discussion is beyond the scope of this chapter, but Mason and
colleagues (2001) present a detailed review of the challenges of capturing a listener’s
imagination or impression of an auditory scene using verbal descriptors.
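To make the indirect route concrete, the following Python sketch applies metric multidimensional scaling to a small set of pairwise dissimilarity ratings; the dissimilarity matrix and stimulus count are hypothetical, and scikit-learn's MDS is used only as one illustrative implementation.

import numpy as np
from sklearn.manifold import MDS

# Hypothetical averaged dissimilarity ratings between four stimuli
# (0 = judged identical, 1 = judged maximally different).
dissimilarity = np.array([
    [0.0, 0.3, 0.8, 0.7],
    [0.3, 0.0, 0.6, 0.9],
    [0.8, 0.6, 0.0, 0.4],
    [0.7, 0.9, 0.4, 0.0],
])

# Metric MDS on the precomputed dissimilarities; the recovered dimensions are
# then interpreted as underlying sensory attributes (or groups of attributes).
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
positions = mds.fit_transform(dissimilarity)
print(positions)  # one two-dimensional coordinate per stimulus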
The direct elicitation principle assumes that it is possible to elicit a number of words
(a vocabulary) where each word corresponds to a specific attribute.8 Two main techniques
exist for elicitation of this vocabulary:
The QDA method uses the consensus vocabulary technique, and Lawless and
Heymann (1998) list the following properties, in order of importance, which each word in
the vocabulary should preferably fulfill. An attribute should:
The QDA method defines a number of basic steps when applying the method:
Meilgaard and colleagues (1991) list the following generic requirements for selection
of a panel of assessors. Assessors should have the ability to:
Phase one includes a representative set of stimuli that excites all of the sensory
differences that are relevant for the experiment or product portfolio at hand. Before the
first session, assessors are typically asked to prepare their own list of words that they can
imagine for a predefined scenario—for example, considering the differences between
the sound reproduction equipment they possess in their home. In the first session the
team is subjected to the stimuli and asked to explain/discuss the meaning of their contributions
and organize all of the contributed words into categories that represent the
same meaning/interpretation/percept. Phase two includes removing duplicate words
in each category, agreeing on a common word or attribute for each category, and eventually
adding a brief description of how the attribute should be interpreted. Phase three
includes further discussions of the categories of words and the agreed on common
attribute for each category, followed by the selection of representative stimuli that clearly
exhibit the agreed attribute. Phase four includes the first series of practical tests, where
the differences between the stimuli cover a large perceptual range. The subjects are asked
to discuss and define the endpoint markers of a rating scale that will be used to rate
or rank-order the stimuli for each of the agreed common attributes. The subjects also
familiarize themselves with the process of scaling the intensities of the selected stimuli
for each of the attributes. Typically, a graphical 15-cm horizontal line with no tick
marks except endpoint markers offset by 1.5 cm at each end is used for the rating process.
The assessor is asked to indicate, by either a movable cursor or a tick, the rating of the
stimuli in question (e.g., see “Stage Four: Attribute Ratings” later). Phase five introduces
stimuli with smaller differences, and repetitions are included in the experiment.
The results of the experiments in phase five are used to check the response system for
logical inconsistencies, and to check the abilities of the assessors and selected attributes
by answering the following questions:
• Are assessors consistent in their ratings of repeated stimuli for all attributes?
• Are assessors agreeing on the ratings (ranking) of individual stimuli and
attributes?
Phase six is the final check of the paradigm developed, and it includes experiments with
test conditions that are similar to those in real tests.
Zacharov and Lorho (2005) include an example of the development phases just
described. Vocabularies of attributes, developed using QDA or other methods, have been
published for specific sensory modalities. For example, Noble and colleagues (1987)
developed the “wine aroma wheel,” Bech and colleagues (1996) developed a list for image
quality of CRT displays, and Pedersen and Zacharov (2015) developed the sound wheel.
The statistical analysis of the preliminary training experiments in phases four and
five usually employs analysis of variance (ANOVA) models or more advanced procedures
(such as those described by Næs and colleagues 2010), and can be executed using
either commercial software or freeware such as Panelcheck10 or Consumercheck.11 Both
Panelcheck and Consumercheck were developed as part of research projects aimed at
developing statistical procedures specifically for sensory experiments and implementing
them so that nonexperts in statistics can easily use them. The ongoing assessment of
panel performance is especially important; procedures for that specific purpose are
included in Panelcheck or eGauge (Lorho et al. 2010), and the details of a specific procedure
(eGauge) are described in ITU-R (2014b).
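As an indicative sketch of the kind of analysis such tools automate, a two-way ANOVA on a long-format table of attribute ratings can be run with statsmodels; the file name and column names below are hypothetical.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format panel data: one row per assessor, stimulus, and repetition,
# with the rating of a single attribute in the "rating" column.
ratings = pd.read_csv("panel_ratings.csv")  # columns: assessor, stimulus, rating

# Two-way ANOVA with interaction. A large assessor effect suggests scaling
# differences between panel members; a large assessor-by-stimulus interaction
# suggests disagreement about the ranking of the stimuli for this attribute.
model = ols("rating ~ C(assessor) * C(stimulus)", data=ratings).fit()
print(sm.stats.anova_lm(model, typ=2))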
Once the panel and initial list of attributes have been established, the ongoing
training is used to maintain the attribute list and to check the performance of the
panel members. Thereafter, a typical application of the QDA procedure in audio
includes the following points:
1. definition of the stimuli (for example, the selection of loudspeakers and programs
to be tested);
2. initial listening sessions with all members of the panel present, focusing on the
selection of attributes from the existing vocabulary such that all perceptual differences
are covered by the selected attributes. A typical selection includes ten to
fifteen attributes;
3. conducting the listening tests where each stimulus (e.g., a loudspeaker-program
combination) is rated for all attributes selected in the initial listening sessions.
There are many options for the practical implementation of the final tests (see,
e.g., Bech et al. 2005; Martin and Bech 2005; Hegarty et al. 2007; Postel et al. 2011);
however, it is important that only one attribute be rated at a time. This forces the
assessor to keep focus on the interpretation of that particular attribute and the
differences between, for example, loudspeakers for a given program; and
4. statistical analysis of the results. This includes, in addition to the standard tests of
the quality and properties of the raw data, analysis for each of the attributes where
the main variables (e.g., loudspeakers and programs) are examined. The correlation
between the examined attributes should also be analyzed—for example, using
principal component analysis—to determine the number of independent attributes.
Experience from listening and viewing tests at Bang & Olufsen suggests that
highly trained subjects can distinguish between a maximum of four to five attributes
independently from an initial group of ten to fifteen attributes. In addition
to examining the main variables, it is also important to check the performance of
the panel (as discussed earlier). Further details of the complete statistical analysis
of sensory data are presented by Næs and colleagues (2010).
This section has described the considerations that led to the development of
experimental paradigms aimed at analysis of highly complex sensory experiences.
The QDA method has been described as an example of one of the basic methods, and
references are given to other more recent paradigms. To illustrate the process of a
sensory analysis in detail, the following section includes a description of a PhD proj
ect included in a recent research project named “Perceptually Optimized Sound
Zones” (POSZ). The PhD project was aimed at developing a perceptual model for
prediction of human perception of the interaction between separate sound zones in a
domestic situation.
In this section, the POSZ project will be briefly introduced, followed by the descriptive
analysis procedure that was used to ultimately develop a predictive model of the
main aspect of the listener experience (namely, perceived distraction). The POSZ project
brought together researchers in signal processing and audio perception in order to
develop perceptually optimal algorithms for producing personal sound zones. In a personal
sound zone situation, two (or more) separate sound fields are produced in separate
zones in a room in such a way that multiple program items (one item in each zone)
can be reproduced simultaneously over the same loudspeakers; consequently, multiple
listeners distributed between zones can listen to different program material without
the need for headphones. The reproduction of personal sound over loudspeakers—as
opposed to headphones—has a number of advantages that are worth the extra signal
processing required: removing the need for headphones enables communication between
people even if they are consuming separate audio programs, and also facilitates much
greater awareness of the environment (this is particularly important in an automotive
scenario, e.g., for road awareness and safety).
The signal processing required to produce such a complex sound field introduces
considerable artifacts that are likely to degrade the target quality. At the same time, it is
difficult to achieve perfect separation between zones, meaning that a listener may experience
unwanted audio interference on their target audio program. The descriptive
analysis performed as part of the POSZ project focused on the latter perceptual problem
(i.e., imperfect separation), as there has been considerable prior work on modeling
audio quality (e.g., ITU-R 2001; Rumsey et al. 2008; Conetta et al. 2008; Dewhirst et al.
2008a, 2008b; George et al. 2008). A series of perceptual tests was performed to determine
the perceptual experience of a listener in an audio-on-audio interference situation
(i.e., a situation in which the experience of listening to some target audio is modified by
a secondary interfering audio program). However, an attempt was also made to quantify
the magnitude of the effect of these different facets (Baykaner et al. 2015).
In the previous section, the QDA paradigm—a strictly controlled and specified
method—was outlined. There have been numerous other methods, with similarities to
and differences from QDA, which are often trademarked and must be carefully controlled
if they are to be strictly followed (Lawless and Heymann 1998, 227–257; Delarue
et al. 2016). In practice, it is common for researchers to select aspects of these methods as
required for particular elicitation tasks, leading to the development of new methods or
simply to ad hoc techniques that are appropriate for particular studies. Murray et al.
(2001) term such methods “generic descriptive analysis.”
Figure 16.3 User interface for the free elicitation task. Stimuli were replayed by clicking the
circular buttons. Participant responses were typed into the text box at the bottom of the screen.
assigned to a set of buttons, which were positioned above a text box into which responses
could be typed. The multiple stimulus presentation meant that participants could also
compare between stimuli, widening the pool of potential descriptors. Five trained listeners
and four untrained listeners performed the first stage of the test (see below for
a discussion on participant experience). A total of 572 unique words and phrases were
produced in this first stage.
The second stage featured a set of team discussions that were intended to reduce
the large set of individually elicited words and phrases into a manageable set of carefully
defined attributes. The underlying assumption was that many of the responses from
stage one, although ostensibly unique, were describing essentially the same experience.
The task for the participants was to find the optimal terminology for labeling and
describing the underlying percept. The trained and untrained participants performed
the team discussion separately. Each phrase was presented back to the team (using physical
printouts on small cards), and the participants were asked to categorize together any
of the responses that described the same percept. It was necessary for the participants to
reach a consensus when performing the categorization. When all of the responses had
been categorized, participants were asked to produce an attribute definition (a label for
the category), endpoint definition (terms that could be used as the positive and negative
endpoints of a scale of the attribute), and an attribute description (a short description of
the percept that could be understood by someone who had not participated in the
experiment). The experiment was facilitated by an experimenter who played no active
part in the discussions, serving only to administer the task (e.g., by presenting the phrases
and documenting the results). The experimenter was well versed in the background of
the tests but was careful to avoid taking an active part in the discussions so as to avoid
biasing the results.
Using this procedure, the trained listeners categorized 259 responses into 9 attributes,
and the untrained listeners categorized 313 responses into 8 attributes. A further team
discussion was performed with both sets of participants in order to unify the attribute
sets. A number of minor changes were made to definitions, descriptions, and endpoints.
Where there were duplicate attributes in the two sets, the participants generally agreed
that the trained listener labels and descriptions should be retained. The final attribute set
included twelve attributes (see Francombe et al. 2014a, for details): masking; calming;
distraction; separation; confusion; annoyance; environment; chaotic; balance and blend;
imagery; response to stimuli over time; and short-term response to stimuli.
Figure 16.4 User interface for the attribute reduction stage. Stimulus playback was controlled using the buttons at the bottom of the screen. The attribute labels and definitions were positioned at random on the grid of buttons.
Table 16.1 Attribute Labels, Descriptions, and Endpoints for the Four Attributes That Were Used at Significantly Greater Than Chance Frequency in the Attribute Reduction Stage
Annoyance. Description: To what extent the alternate audio causes irritation when trying to listen to the target audio. Endpoints: Very annoying to Not at all annoying.
Distraction. Description: How much the alternate audio pulls your attention or distracts you from the target audio. Endpoints: Not at all distracting to Overpowered.
Balance and blend. Description: How you judge the blend of sources to be. Endpoints: Complementary to Conflicting.
Confusion. Description: How confusing the merge of the two audio programs is (rhythmically, melodically, or harmonically); how they blend together. Confusion because the sources interact with each other. Endpoints: Extremely confusing to Not at all confusing.
Figure 16.5 User interface for the attribute rating stage. Stimuli were replayed by clicking the
labeled circular buttons, and ratings were given using the vertical sliders.
ratings were made on the four attributes carried forward from the attribute reduction
stage. A multiple stimulus paradigm, modified from the standardized BS.1534-3
“MUSHRA” test (ITU-R 2015), was used: participants gave ratings on 15-cm vertical sliders
with endpoint label positions 1.5 cm from the scale ends. The user interface is shown in
Figure 16.5. A reference stimulus (just the target audio with no interference) could be
played by clicking a button labeled “R” that was positioned in line with the 0 point of the
scale (i.e., not at all distracting). The stimuli could be played by clicking the labeled buttons
at the top of the page; the distraction score was given by setting the associated slider
to the desired position. The experiment was performed by the listeners who had participated
in the attribute elicitation as well as a small team of new participants in order to
ensure that the attributes could be used and understood outside of the original panel.
A principal component analysis (PCA) (Næs et al. 2010, 209–226) was performed to
assess the relationships between the four attributes. In PCA, orthogonal vectors that
explain the maximum variance are consecutively extracted from the attribute rating
data. The attributes (and ratings) can then be plotted in the new lower-dimensional
space to easily allow interpretation of the relationship between the attributes as well as
the relationship between attributes and ratings. The PCA solution is plotted in Figure 16.6.
The vectors show the correlation between each attribute and the first two principal components;
the angle and length of each vector indicates the degree to which the associated
attribute is correlated with the two visualized components. The number of dimensions
on which the original data is represented can be chosen by considering metrics such as
“variance explained” or by visual evaluation of a scree plot. In this analysis, almost all of
the variance in the data could be explained by two components, indicating that there
Figure 16.6 Principal component representation of four attributes. The vectors show the correlation between each attribute and the two principal components represented in the plot (and can therefore be different lengths depending on the strength of the relationship); the horizontal axis is component 1 (88.5% variance explained).
was considerable redundancy in the four attributes. The first component accounted
for 88.5 percent of the variance and was related to both annoyance and distraction. The
second, explaining a further 10 percent of the variance, was related to balance and blend.
The attribute confusion was equally loaded onto both dimensions. There were no apparent
differences in the PCA solution between the participants who had taken part in the
whole experiment and those who only performed the rating task.
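A minimal Python sketch of this kind of analysis is given below, using scikit-learn; the ratings matrix is randomly generated stand-in data, and only the attribute names are taken from the study.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

attributes = ["annoyance", "distraction", "balance and blend", "confusion"]

# Stand-in ratings matrix: rows are stimuli, columns are the four attributes.
ratings = np.random.default_rng(0).uniform(0, 100, size=(54, 4))

# Standardize each attribute, then extract components ordered by explained variance.
standardized = StandardScaler().fit_transform(ratings)
pca = PCA(n_components=2)
scores = pca.fit_transform(standardized)

print(pca.explained_variance_ratio_)  # proportion of variance per component

# Loadings: correlation-like weights of each attribute on each component,
# analogous to the attribute vectors in a biplot such as Figure 16.6.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
for name, weights in zip(attributes, loadings):
    print(name, weights)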
Further analysis of participant agreement suggested that confusion was the least well
understood of the four attributes (i.e., the ratings exhibited least agreement between
participants), while distraction was most well understood (or at least, participants used
the scale in the same way). Consequently, distraction was selected as the attribute to
model; it was strongly related to the component that explained the vast majority of the
variance in the data, and it was well understood by the participants.
Attribute Modeling
As discussed previously, it is hugely beneficial to be able to predict the human response
in a sensory evaluation task in a quick and repeatable manner. It is therefore desirable to
develop predictive models that use measured, physical features of the sound field to
derive predictions of the human response. As described earlier, the first stage in this procedure
is determining the correct perceptual attribute to model: in this case, distraction
due to the presence of some interfering audio program was found to be most appropriate.
It is then necessary to collect a large amount of human data constituting ratings of the
attribute for different stimuli—preferably over the entire stimulus space that the model
might encounter in its target usage domain. As well as collecting subjective ratings, it is
necessary to determine the physical parameters of the stimuli that contribute to the
ratings, in order that the mathematical relationship between physical parameters and
ratings can be modeled.
In order to collect a set of ratings, a pool of one hundred audio-on-audio interference
situations was created. It was considered desirable to ensure that the training stimuli
covered a wide range of potential broadcast audio content, but also that the model training
was not biased by closely controlling a set of physical parameters prior to the feature
extraction stage. Consequently, the stimulus set was established using a random sampling
method, in which program items were taken from online radio stations at randomly
generated times (Francombe et al. 2014b). The items were loudness matched using a
perceptual model prior to being used for the construction of the test stimuli by varying
a set of parameters. The test parameters (target level, interferer level, and interferer
direction12) were not varied in a full factorial manner—they were determined at random
(within reasonable ranges). In this manner, a diverse and representative training set was
developed. Listener ratings of distraction were collected using the same methodology as
for the attribute ratings described above (a multiple stimulus presentation rating test).
Participants exhibited strong agreement in their ratings, which helped to validate selection
of the attribute distraction. The random sampling stimulus selection method was found
to produce a set of stimuli that evenly covered the full range of the perceptual scale.
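A simplified sketch of this stimulus-construction logic is shown below; the parameter ranges are illustrative, the perceptual loudness matching used in the study is replaced by a crude RMS-based gain, and interferer direction is left to the reproduction system rather than the mixing stage.

import numpy as np

rng = np.random.default_rng(1)

def rms_db(x):
    """Crude level estimate in dB, standing in for a perceptual loudness model."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def make_interference_stimulus(target, interferer):
    """Mix a target and an interfering program item with randomly drawn levels."""
    # Level-match the two items first (stand-in for perceptual loudness matching).
    interferer = interferer * 10 ** ((rms_db(target) - rms_db(interferer)) / 20)

    # Draw the test parameters at random within plausible (illustrative) ranges.
    target_gain_db = rng.uniform(-10, 0)
    interferer_gain_db = rng.uniform(-30, 0)

    n = min(len(target), len(interferer))
    mix = (target[:n] * 10 ** (target_gain_db / 20)
           + interferer[:n] * 10 ** (interferer_gain_db / 20))
    return mix, {"target_db": target_gain_db, "interferer_db": interferer_gain_db}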
The next challenge was extraction of relevant physical parameters from the stimuli.
The range of features that it is possible to extract from audio recordings is multifarious;
therefore, selecting the correct features is a crucial and difficult task in any modeling
process. To aid with this procedure, participants were asked to write down reasons that
they had for finding the audio-on-audio situations distracting; the written response data
was analyzed using a form of verbal protocol analysis (Ericsson and Simon 1993, 1–62) to
generate a set of categories, which was then used to motivate the search for features.
Audio features were extracted using a variety of freely available toolboxes to produce
a set of 399 features. The categories and extracted features are described by Francombe
and colleagues (2015b).
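The study drew on several existing toolboxes to arrive at its 399 features; purely as an indicative sketch, a handful of commonly used summary features can be extracted with the librosa library as follows (the feature choice here is illustrative, not the set used in the study).

import numpy as np
import librosa

def basic_features(path):
    """Extract a small set of summary audio features from one stimulus file."""
    y, sr = librosa.load(path, sr=None, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # timbral envelope
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # brightness
    rms = librosa.feature.rms(y=y)                            # frame-level level
    # Summarize frame-wise features as means and standard deviations.
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [centroid.mean(), centroid.std(), rms.mean(), rms.std()],
    ])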
After the creation of a large feature set, the next challenge is model fitting. There is a
variety of different methods of modeling data, but in this case, a simple linear regression
model was used. One of the main advantages of such a model is that it is easy to interpret
the relationship between the features and the response variable. This is not always the
case; in more complex model structures (for example, neural networks), this relationship
can be obscured. The feature selection process involves training a large number of
models and using some criteria to determine which is the best. As an exhaustive search
through a large feature set (e.g., 399 in this case) is prohibitively time consuming, it is
common to use a search algorithm; in this case, a stepwise feature addition and removal
procedure was followed.
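A compact sketch of the idea, simplified to forward-only selection with cross-validated error as the criterion (the study used an addition-and-removal procedure), might look as follows.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_stepwise(X, y, max_features=5, cv=5):
    """Greedily add the feature that most reduces cross-validated RMSE."""
    remaining = list(range(X.shape[1]))
    selected, best_rmse = [], np.inf
    while remaining and len(selected) < max_features:
        candidates = []
        for f in remaining:
            cols = selected + [f]
            mse = -cross_val_score(LinearRegression(), X[:, cols], y,
                                   scoring="neg_mean_squared_error", cv=cv).mean()
            candidates.append((np.sqrt(mse), f))
        rmse, f = min(candidates)
        if rmse >= best_rmse:  # stop when no candidate improves the model
            break
        selected.append(f)
        remaining.remove(f)
        best_rmse = rmse
    return selected, best_rmse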
One of the primary concerns for a predictive model is its generalizability; that is,
the model should be able to make predictions for situations outside of those on which
it was trained. As the number of features in a regression model tends toward the number
of data points, it becomes possible to mathematically account for all of the variance
in the data. However, this is not beneficial, as it is very unlikely that the model
will be able to make an accurate prediction for a new data point that falls outside of the
training data set. This problem is known as overfitting. It is far better for the model to
have some error, but for the features to accurately describe a physical phenomenon
and therefore to generalize to new situations, than for the model to very accurately
predict the training set but fail under new circumstances. It is therefore desirable to
minimize the number of features in the final model, while still including all features
that describe physical processes that determine the human response. A further consideration
when selecting features for a linear regression model is the relationship between
the predictors. The linear regression model works under the assumption that the features
do not correlate highly with each other (this is known as multicollinearity
between features).
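One common way to check for such multicollinearity is the variance inflation factor; a minimal sketch using statsmodels is given below, assuming the candidate features are held in a pandas DataFrame.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(features: pd.DataFrame) -> pd.Series:
    """Variance inflation factor per feature; values above roughly 5-10
    are often taken to indicate problematic collinearity between predictors."""
    X = sm.add_constant(features)
    vifs = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
    return pd.Series(vifs, index=features.columns)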
There are two primary metrics that describe the performance of a model. Goodness-of-fit
is primarily measured using root-mean-square error (RMSE)—this quantifies
the difference between the measured subjective response and the model prediction.
The amount of variance explained by the model is measured by the coefficient of
determination, R2. Both metrics can be altered to reduce the chance of overfitting. Cross-validation
can be used to estimate the performance of the model on data points outside
of the training set. In cross-validation, a number of data points are withheld from the
training set, but used for testing (e.g., calculating the RMSE). This process can be
repeated for multiple groups of “holdout” data. The R2 statistic can be adjusted in such a
way that models with a higher number of features are penalized.
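Both metrics, and the adjustment for the number of features, can be written out directly; in the sketch below, y_true and y_pred are held-out ratings and model predictions, and p is the number of features in the model.

import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observed ratings and model predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def adjusted_r2(y_true, y_pred, p):
    """Coefficient of determination penalized for the number of predictors p."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)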
These adjusted statistics were used to ensure that the features selected were generalizable
as well as providing an accurate fit. The final model included five features:
overall loudness; target-to-interferer ratio; interference-related perceptual score from
the “Perceptual Evaluation methods for Audio Source Separation” (PEASS) toolbox
(Emiya et al. 2011); high-frequency level range of the interferer; and percentage of
temporal windows with low target-to-interferer ratio. The model exhibited an RMSE
of approximately 10 percent on the training set and explained 88 percent of the variance
in the data.
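Two of these features can be illustrated directly. The sketch below computes an overall target-to-interferer ratio and the percentage of short temporal windows with a low ratio; the frame length and threshold are illustrative choices, not the values used in the published model.

import numpy as np

def tir_features(target, interferer, fs, frame_s=0.05, low_tir_db=0.0):
    """Overall and frame-wise target-to-interferer ratio (TIR) features."""
    frame = int(frame_s * fs)
    n = (min(len(target), len(interferer)) // frame) * frame
    t_frames = target[:n].reshape(-1, frame)
    i_frames = interferer[:n].reshape(-1, frame)
    tir_db = 10 * np.log10((np.mean(t_frames ** 2, axis=1) + 1e-12)
                           / (np.mean(i_frames ** 2, axis=1) + 1e-12))
    overall = 10 * np.log10((np.mean(target[:n] ** 2) + 1e-12)
                            / (np.mean(interferer[:n] ** 2) + 1e-12))
    return {
        "overall_tir_db": float(overall),
        "pct_low_tir_frames": float(np.mean(tir_db < low_tir_db) * 100),
    }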
Regardless of how well a model fits the training data, success or failure can only really
be assessed through validation on a new dataset, that is, on data points for which subjective
responses are available but were not used to train the model. In this manner, the
generalizability and accuracy of the model can truly be tested. Two validation data sets
were used to test the POSZ distraction model. The first used ratings from stimuli collected
using the same procedure as that used for the training set data collection (but
were not included during the model training). The second validation set used stimuli
collected for a previous experiment, which were consequently different in some regards
(the program items were longer and some exhibited different conditions such as filtering
or the presence of simulated road noise). The RMSE increased from 10 percent to
approximately 12 percent and 16 percent for the two datasets respectively; the explained
variance (indicated by R2) decreased from 88 percent to 82 percent and 78 percent
respectively. This relatively modest reduction in performance suggested that the final
model was generalizable to a range of audio-on-audio interference situations with music
program material.
Discussion
The procedure just described was designed to ensure that a robust model of a relevant
facet of listener experience for a relatively new and unknown listening situation could
be created. The model was shown to perform well for training and validation datasets; it
has since been tested in a number of situations and found to perform very successfully,
with error remaining at approximately 10 percent (Rämö et al. 2016).
It is hoped that having an accurate model will enable quick, perceptually relevant
evaluation of personal sound zones. Some efforts have also been made to use the model to
optimize a sound zone generation system by selecting optimally positioned loudspeakers
(Francombe et al. 2013).
We believe that one of the primary reasons for the success of the model was the comprehensive
attribute elicitation experiment, which ensured that the correct facet of the
listening experience was being modeled. It was consistently found that the attribute
distraction produced strong agreement between participants; this is invaluable when
collecting training data. There are numerous mathematical modeling methods, feature
selection tricks, and so on; however, it is often the quality of the subjective training data
that is most important when developing such a model.
The elicitation procedure described drew heavily on some well-established ideas
within the literature but also introduced some novel aspects. It has been widely stated
(e.g., Lawless and Heymann 1998) that descriptive attributes should be developed
by trained participants while hedonic judgments (e.g., preference) should be made
by untrained participants. For the task of investigating the experience of a listener in
a personal sound zone system, we felt that it was desirable to perform the elicitation
experiment with both trained and untrained listeners. While the trained listeners
tended to give better descriptions (this was reflected by the selection of trained
listener attributes where there was overlap between the two sets), there were also
unique and important attributes determined by the untrained participants (e.g.,
balance and blend, which was found to be one of the four most relevant attributes
and explained a small but notable proportion of variance in the ratings). Of course,
there are some sensory evaluation tasks that require a high degree of experience—for
example, where very small degradations or artifacts are present. However, in the
case of audio-on-audio interference in sound zones, the perspective of untrained
listeners—who will ultimately be the end users of any commercial system—was
definitely valuable.
The work described thus far has shown that perceptual models, developed using
advanced sensory science methodologies, are useful for pure research and for product
optimization. However, it is hard to see how perceptual scientists will ever complete the
task of quantifying the imagination of consumers with relation to complex auditory
scenes. The development of new and ever-more advanced signal processing methods is
unlikely to slow and, in fact, spatial audio is the topic of much current research. For
example, some recent or current large projects include the BILI project,13 the S3A: Future
Spatial Audio project,14 and the ORPHEUS project.15
Two-channel stereo reproduction has been prevalent in domestic and professional
audio replay for a number of decades. Five-channel surround sound has also seen
considerable uptake, if not quite to the same level of ubiquity as two-channel stereo.
However, there are a number of different surround sound reproduction methods available.
Channel-based methods have varying loudspeaker counts and positions (including
loudspeakers above and below the listener); a set of common loudspeaker layouts has
been standardized by the International Telecommunication Union (ITU-R 2014a).
Methods that require fewer channels and less set-up effort—such as headphones and
soundbars—are also becoming increasingly popular, particularly for domestic audio
reproduction. In the last year or so, the boom in virtual reality technology has yet again
pushed realistic spatial audio to the forefront of many research agendas. As technology
enables production of more complex and realistic experiences—even those that might
not relate to real-world situations—quantification of experience and imagination
remains of utmost importance.
Descriptive analysis experiments have been performed to try to uncover the perceptual
differences between reproduction methods—see Francombe and colleagues
(2015a) for a review of relevant literature. The resultant picture is complex; there are many
different attributes, but limited consensus on their exact meanings or on which are most
important. There has been a recent effort to consolidate the existing research in order to
produce a standardized set of terms (Pedersen and Zacharov 2015; Zacharov et al. 2016),
drawing parallels with the ubiquitous wine aroma wheel (Noble et al. 1987).
Another current research topic is the development of faster and more efficient experimental
methods; the so-called FastTrack or RaPID methods (see Delarue et al. 2016;
Moulin et al. 2016). The purpose is to increase the efficiency of the experimental effort
while maintaining the statistical quality of the data. This is especially important for
industrial applications, but also in academia for pilot experiments.
The research area described and exemplified in this chapter represents another step
in the development of sound reproduction techniques that will allow the listener to
imagine the sound event as intended by the creator/artist anywhere and anytime.
However, it is an ongoing challenge for evaluation methods and models to keep up
with the development of new sound reproduction and processing technologies.
We feel that the benefits of in-depth perceptual understanding and optimization make
this a worthwhile effort.
Notes
1. The audio industry is here defined to include researchers working at universities in areas
such as signal processing, electroacoustics, psychoacoustics, and psychology. The area also
includes developers working in companies producing products for recording, storing,
transmitting, and rendering sound. The “products” include new principles and algorithms,
as well as systems for recording, encoding, transmitting, decoding, and rendering sound
in the consumer’s home.
2. The following terminology, based on Dorsch (2016), is used in this chapter. A listener is
exposed to a sound field in an environment and a perception or percept is created after the
transformation of physical energy to neural information by the auditory system. The percept
results in an auditory impression or auditory experience. Based on the auditory impression,
one or more auditory images are created. The reader is referred to Dorsch (2016) and other
chapters in the handbook for a further discussion of imagination.
3. “Ringing” refers to added oscillations of an electrical or acoustic signal that were not
present in the original signal. The audible consequences can be that the signal continues
when it should have stopped; this is most noticeable on transient signals such as drums.
4. “Squared clouds” refers to a visible artifact in images where clouds have squared edges
instead of smooth edges as in nature. This is typically caused by a limited resolution in the
bit stream or loss of information during transmission of the signal.
5. http://www.s3a-spatialaudio.org/. Accessed October 5, 2017.
6. These auditory properties can be measured accurately using a range of psychophysical
procedures; however, it is outside the scope of this chapter to discuss them in further
detail. The reader is referred to Gescheider (2015).
7. The reader should note that the rating will reflect the assessor’s sensitivity to the attribute in
question plus a general component reflecting the so-called bias, which is a measure of an assessor’s
tendency to respond that a stimulus is present compared to not present. These two components
can be separated using signal detection theory; see, for example, Gescheider (2015).
8. It is noted that several of the elicited words and the corresponding ratings could be representative
of the same attribute; however, such multicollinearity is identified and resolved
during the statistical analysis.
9. An example could be if the term “quality” is a part of the definition of the word, as “quality”
is often ambiguous to assessors.
10. http://www.panelcheck.com/.
11. https://consumercheck.co/.
12. These terms refer to a sound zone setup with two or more zones. Target level represents
the level of the primary sound in zone A (in which the assessor is situated). Interferer level
represents the level of the sound in zone A caused by the interference of sound from the
other zones. Interferer direction represents the spatial direction of the interfering sound
from other zones.
13. http://www.bili-project.org.
14. http://www.s3a-spatialaudio.org/.
15. https://orpheus-audio.eu/.
References
ANSI/ASA. 2013. Acoustical Terminology. S1.1–2013. American National Standards Institute/
Acoustical Society of America.
Baykaner, K., P. Coleman, R. Mason, P. J. B. Jackson, J. Francombe, M. Olik, et al. 2015. The
Relationship between Target Quality and Interference in Sound Zones. Journal of the Audio
Engineering Society 63 (1–2): 78–89.
Bech, S. 1994. Perception of Timbre in Small Rooms: Influence of Room and Loudspeaker
Position. Journal of the Audio Engineering Society 42 (12): 999–1007.
Bech, S. 1999. Methods for Subjective Evaluation of Spatial Characteristics of Sound. In
Proceedings of the Audio Engineering Society 16th International Conference: Spatial Sound
Reproduction, 487–504. New York, NY: Audio Engineering Society.
Bech, S., M.-A. Gulbol, G. Martin, J. Ghani, and W. Ellermeier. 2005. A Listening Test System
for Automotive Audio, Part 2: Initial Verification. In Proceedings of the Audio Engineering
Society 118th Convention, 487–504. Barcelona, Spain. Convention paper 6359. New York,
NY: Audio Engineering Society.
Bech, S., R. Hamberg, M. Nijenhuis, C. Teunissen, H. Looren de Jong, P. Houben, et al. 1996.
Rapid Perceptual Image Description (RaPID) Method. In Proceedings of SPIE 2657, 17–28.
Bellingham, Washington, USA.
Bech, S., and N. Zacharov. 2006. Perceptual Audio Evaluation: Theory, Method and Application.
Chichester, UK: Wiley.
Beranek, L. L. 1962. Music, Acoustics and Architecture. New York: Wiley.
Blauert, J. 2005. Communication Acoustics. Berlin: Springer.
Conetta, R., F. Rumsey, S. Zielinski, P. J. B. Jackson, M. Dewhirst, S. Bech, et al. 2008. QESTRAL
(Part 2): Calibrating the QESTRAL Model using Listening Test Data. In Audio Engineering
Society 125th Convention. San Francisco. Convention paper 7596. New York, NY: Audio
Engineering Society.
Delarue, J., D. B. Lawlor, and D. M. Rogeaux. 2016. Rapid Sensory Profiling Techniques and
Related Methods: Applications in New Product Development and Consumer Research.
Cambridge: Woodhead.
Dewhirst, M., R. Conetta, F. Rumsey, P. J. B. Jackson, S. Zielinski, S. George, et al. 2008a.
QESTRAL (Part 4): Test Signals, Combining Metrics, and the Prediction of Overall Spatial
Quality. In Audio Engineering Society 125th Convention. San Francisco. Convention paper
7598. New York, NY: Audio Engineering Society.
Dewhirst, M., P. J. B. Jackson, R. Conetta, S. Zielinski, F. Rumsey, D. Meares, et al. 2008b.
QESTRAL (Part 3): System and Metrics for Spatial Quality Prediction. In Audio Engineering
Society 125th Convention. San Francisco. Convention paper 7597. New York, NY: Audio
Engineering Society.
Dorsch, F. 2016. Hume. In The Routledge Handbook of Philosophy of Imagination, edited by
A. Kind, 40–54. London: Routledge.
Emiya, V., E. Vincent, N. Harlander, and V. Hohmann. 2011. Subjective and Objective Quality
Assessment of Audio Source Separation. IEEE Transactions on Audio, Speech, and Language
Processing 19 (7): 2046–57.
Ericsson, K. A., and H. A. Simon. 1993. Protocol Analysis: Verbal Reports as Data. London:
MIT Press.
Francombe, J. 2014. Perceptual Evaluation of Audio-on-Audio Interference in a Personal
Sound Zone System. PhD thesis, Guildford, UK: University of Surrey.
ITU-R. 1997. Methods for the Subjective Assessment of Small Impairments in Audio
Systems Including Multichannel Sound Systems. Recommendation BS.1116–1. International
Telecommunication Union.
ITU-R. 2001. Method for Objective Measurements of Perceived Audio Quality. International
Telecommunication Union.
ITU-R. 2014a. Advanced Sound System for Programme Production. Recommendation
BS.2051–0. International Telecommunication Union.
ITU-R. 2014b. Methods for Assessor Screening. Recommendation BS.2300–0. International
Telecommunication Union.
ITU-R. 2015. Method for the Subjective Assessment of Intermediate Quality Levels of Coding
Systems. Recommendation BS.1534–3. International Telecommunication Union.
Kaplanis, N., S. Bech, S. Tervo, J. Pätynen, T. Lokki, T. Waterschoot, et al. 2017a. A Rapid
Sensory Analysis Method for Perceptual Assessment of Automotive Audio. Journal of the
Audio Engineering Society 65 (1–2): 1–17.
Kaplanis, N., S. Bech, S. Tervo, J. Pätynen, T. Lokki, T. Waterschoot, et al. 2017b. Perceptual
Evaluation of Car Cabin Acoustics. Journal of the Acoustical Society of America 141 (2):
1459–146.
Kjörling, K., J. Rödén, M. Wolters, J. Riedmiller, A. Biswas, P. Ekstrand, et al. 2016. AC-4: The
Next Generation Audio Codec. In Audio Engineering Society 140th Convention. Paris.
Convention paper 9491. New York, NY: Audio Engineering Society.
Lawless, H. T., and H. Heymann. 1998. Sensory Evaluation of Food: Principles and Practices.
New York: Springer.
Lorho, G., G. Le Ray, and N. Zacharov. 2010. eGauge: A Measure of Assessor Expertise in
Audio Quality Evaluations. In Audio Engineering Society 38th International Conference:
Sound Quality Evaluation, 1–10. Piteå, Sweden. New York, NY: Audio Engineering
Society.
Martin, G., and S. Bech. 2005. Attribute Identification and Quantification in Automotive
Audio, Part 1: Introduction to the Descriptive Analysis Technique. In Audio Engineering
Society 118th Convention. Barcelona. Convention paper 6360. New York, NY: Audio
Engineering Society.
Mason, R., N. Ford, F. Rumsey, and B. De Bruyn. 2001. Verbal and Nonverbal Elicitation
Techniques in the Subjective Assessment of Spatial Sound Reproduction. Journal of the
Audio Engineering Society 49 (5): 366–84.
Meilgaard, M., G. V. Civille, and B. T. Carr. 1991. Sensory Evaluation Techniques. Florida: CRC
Press.
Moulin, S., S. Bech, and T. Stegenborg-Andersen. 2016. Sensory Profiling of High-End
Loudspeakers using Rapid Methods, Part 1: Baseline Experiment using Headphone
Reproduction. In 2016 Audio Engineering Society Conference on Headphone Technology.
Aalborg, Denmark. New York, NY: Audio Engineering Society.
Murray, J. M., C. M. Delahunty, and I. A. Baxter. 2001. Descriptive Sensory Analysis: Past,
Present and Future. Food Research International 34 (6): 461–71.
Næs, T., P. Brockhoff, and O. Tomić. 2010. Statistics for Sensory and Consumer Science.
Hoboken, NJ: Wiley.
Nijenhuis, M. 1993. Sampling and Interpolation of Static Images: A Perceptual View. PhD
thesis, Institute of Perception Research, Eindhoven University of Technology, The
Netherlands.
Zacharov, N., and K. Koivuniemi. 2001. Unravelling the Perception of Spatial Sound
Reproduction: Analysis and External Preference Mapping. In Audio Engineering Society 111th
Convention. New York. Convention paper 5423. New York, NY: Audio Engineering Society.
Zacharov, N., and G. Lorho. 2005. Sensory Analysis of Sound (in Telecommunications). In
European Sensory Network Conference. Madrid, Spain: European Sensory Network.
Zacharov, N., T. Pedersen, and C. Pike. 2016. A Common Lexicon for Spatial Sound Quality
Assessment: Latest Developments. In 2016 Eighth International Conference on Quality of
Multimedia Experience (QoMEX), 1–6. Lisbon, Portugal: QoMEX.
Chapter 17
Creating a Brand Image through Music
Understanding the Psychological Mechanisms
behind Audio Branding
Hauke Egermann
Introduction
This quote was taken from a study report of a Scandinavian Music and Audio Branding
consulting agency. On the one hand, it describes the requirement to create meaningful
brands for successful marketing and, on the other, it emphasizes the multiple roles that
music is thought to play in this context: (1) music is said to create brand attention;
(2) music is said to create a positive-affective response in consumers; and (3) music can
presumably structure and influence the cognitive meaning dimensions of a brand image.
Accordingly, Jackson (2003) defines the professional practice of audio branding as the
creation of brand expressions in sound that depend on the consistent and strategic use of
these expressions in marketing communication (see also Gustafsson, volume 1, chapter 18).
These brand expressions can take various compositional forms: audio logos, which are often quite short sequences of acoustic elements; longer jingles and brand songs; background soundtracks and soundscapes; interaction sounds; and typical brand voices (Krugmann 2007).
Potential touchpoints where a consumer experiences these elements could be advertise-
ments in media such as TV, radio, websites, or cinema but also corporate films, brand
events, or customer telephone lines.
As many audio branding elements are musical in nature, they are said to shape a long-term image of a brand. But how does this shaping work? How does
music function when it influences how a consumer imagines characteristics of a brand?
This chapter will present several theoretical and empirical accounts in order to under-
stand the psychological mechanisms at work when the imagination of a brand is influ-
enced by music. It will provide insights into the underlying functionality and effectiveness
of these practices that will ultimately be summarized in an integrative brand-music
communication model.
In the consumer-based brand equity model pyramid by Keller (2009), the first step in
developing a brand is to create brand salience. The use of branding helps to create aware-
ness and attention for a product and makes it possible to differentiate one product from
another similar product. When a brand has salience, an associated visual logo gains sign
qualities that refer to its product. Keller furthermore distinguishes brand performance from brand imagery, both of which result in judgments and feelings in consumers. While brand
performance is related to more functional aspects of the products (like quality, price,
service, or reliability), brand imagery is instead based on associative qualities like the
brand identity. If a brand has performance and imagery characteristics that are also
evaluated and responded to positively, the top of Keller’s pyramid is reached, which he
terms brand resonance: customers show loyalty to the brand and its product(s), and
this is accompanied by attachment, a sense of community, and engagement. Thus, estab-
lishing a brand image is thought to create a benefit to those who aim to market commercial
products and services: “According to this view, brand knowledge is not the facts
about the brand—it is all the thoughts, feelings, perceptions, images, experiences and
so on that become linked to the brand in the minds of consumers (individuals and
organizations)” (Keller 2009, 143).
But how is such a brand image created? Many authors relate it to the constant and
strategic planning and implementation of a brand identity. Accordingly, a brand image
is received and constructed by a consumer and can be seen to result from a brand identity
that was created by a sender (Kapferer 2012).
Brand identities share several similarities with the identities of human individuals
and social groups (Azoulay and Kapferer 2003). In this view, brand identities are con-
structed through human expressions, which has led some authors to the conclusion that
consumers choose brands like they choose friends. Azoulay and Kapferer note, “human
individuals are perceived through their behaviour, and, in exactly the same way, con-
sumers can attribute a personality to a brand according to its perceived communication
and ‘behaviours’ ” (2003, 149). Furthermore, Aaker and colleagues (1995) report that consumers might even view brands as their partners. Therefore, in general, brands could have as many
characteristics as humans have. However, in consumer research and marketing practice,
several attributes have received more attention than others and hence seem to be the
most important: brand personality, brand values, and brand demographical-regional origin
(see also Burmann et al. 2003).
Brand personality and values have been described through several theoretical models.
In psychology, personality is often generally described as a construct that allows us to
explain individual differences in behavior, thought, and feelings that are stable and
coherent in humans (Mischel et al. 2004). It is often broken down into five different
facets consisting of: (1) openness to experience, (2) conscientiousness, (3) extraversion,
(4) agreeableness, and (5) neuroticism (also called the Five-Factor model, see Digman
1990). One widely used conceptualization of brand personality is that of Aaker (1997),
who describes it as a set of all human characteristics that can be associated with a brand.
These consist of the following five dimensions: sincerity, excitement, competence,
sophistication, and ruggedness (see Table 17.1). While it can be discussed whether all
these attributes can be considered personality features in a narrow sense (and they show only partial similarity to the aforementioned Five-Factor model from psychology), it is obvious that the same words could be used to describe humans. Furthermore, this model is used in various marketing contexts, and it has been empirically shown that communicating brand personality characteristics creates unique, congruent, and stronger
brand associations in consumers (Freling and Forbes 2005).
According to Schwartz (1992), there is a limited and fixed set of general, universal
human value types. These are based on universal human needs that manifest themselves
in behavioral orientations. Accordingly, “Values (1) are concepts or beliefs, (2) pertain to
Table 17.1 Aaker’s Brand Personality Dimensions and Attributes (Aaker 1997): Sincerity, Excitement, Competence, Sophistication, Ruggedness.
desirable end states or behaviors, (3) transcend specific situations, (4) guide selection
or evaluation of behavior and events, and (5) are ordered by relative importance”
(Schwartz 1992, 4). Schwartz presented ten motivational types that can be used to group
values: universalism, benevolence, tradition, conformity, security, power, achievement,
hedonism, stimulation, and self-direction. Furthermore, he showed that this structure
of universal value types was found across different cultures. This list of value types has
subsequently been adapted to branding contexts, where some value types were found
not to be applicable (e.g., universalism, conformity, security) and others were added
(like aesthetics, ecology, or health; see Gaus et al. 2010). Allen (2002) showed that brands
that endorse human values that match those of consumers are preferred because of the
perceived product similarity to the consumers’ self-concepts. Accordingly, brands can
be used by consumers to express their self-identities.
Demographic-regional origin generally refers to the regional localization and
demographic context of a brand (Thakor and Kohli 1996). Different products and
product types are associated with different countries (e.g., alcoholic drinks like vodka
with Russia or whisky with Scotland) that evoke certain associative meaning patterns.
Furthermore, brand identities can also refer to certain demographic characteristics like
age, gender, or social status (Batra et al. 1993).
Brand Salience
The ability to identify and localize objects is an important function of our auditory
perception system. Changes in auditory streams have been shown to lead to an increase
in attention allocation that is accompanied by a short activation of the peripheral nervous
system (the so-called orienting response; see Chuen et al. 2016). These findings imply
that dynamic music and sounds employed in branding lead to an increased awareness of
a brand. For instance, playing music at a point of sale might direct customers’ attentional
foci to the location of the sound source. The concept of musical fit might also play an
important role in directing attention. According to the congruence-associations frame-
work presented by Cohen (2001), music that is presented together with a visual narrative
will influence how the narrative is perceived. While this theory was originally developed
to explain the effects of music in film, it could also be applied to advertising. It was
shown that those aspects of a visual narrative that are congruent to the music will
likely be in the focus of a perceiver’s attention (Marshall and Cohen 1988). Furthermore,
the associative-emotional meaning of the music will then be attributed to this focus of
visual attention. Thus, presenting music in an audiovisual commercial that structurally
or semantically fits a visually presented brand identity will lead to an increased attention
for the brand.
Like visual logos, audio logos help consumers memorize and identify a brand. The constant
presentation of musical elements together with a product can lead to a long-term memory
representation that enables brand recognition and recall. According to Keller (2009),
brand recognition refers to a situation in which a consumer is able to confirm prior
exposure to a brand when presented with a related brand cue (e.g., a visual logo). Brand
recall describes a situation where a consumer recalls a brand when only a product cate-
gory is primed. In audio branding, a consumer learns to associate musical/acoustical
elements (audio logo) with a brand, and subsequent exposure to the logo will activate
the mental representation for the brand and product. In a telephone survey that fol-
lowed the presentation of a nine-month automobile advertising campaign, Stewart and
colleagues (1990) observed that 83 percent of respondents recalled seeing the advertisement
when presented with a short musical excerpt that was used in the advert, whereas only
62 percent remembered seeing the advert when presented with the product name. Thus,
the musical cue was more sensitive than the verbal cue and resulted in stronger activation of the mental network that represents the brand and its advert (brand recall). Audio logos
are almost always quite short, making them easy to memorize, and often use the melodic
elements of pitch and rhythm. Employing these musical features, audio logos can be
presented with varying timbres while preserving their original identity, which then
enables brand recognition. Related to this, a study by Bonde and Hansen (2013) implies
that pitch information is more perceptually relevant than rhythm information in audio
logo recognition. In a statistical analysis of musical features of radio station jingles and audio logos, we found that they were on average four notes long (range 3–9 notes), which is likely to fall within the capacity of short-term memory (Muellensiefen et al. 2015).
Taken together, previous research indicates that musical elements and sounds
presented together with brands are able to create brand awareness and brand memora-
bility. In this way, they contribute to brand salience, especially when the music used fits
(visual) brand qualities.
partial similarities between musical expressions and walking sounds (Giordano et al.
2014). Hearing action sounds can lead to an understanding of associated actions through
activations of mirror neurons (Kohler et al. 2002). This coupling is thought to be based
on Hebbian learning that may have the capacity to bind perceptions, actions, and emo-
tional expressions together (Keysers and Gazzola 2009). This perspective on emotion
emphasizes the importance of the behavioral response component of emotion as described
by Scherer (2005). Accordingly, the main function of emotional responding can be
described as coordinating approach and avoidance behavior. Thus, motion and emotion
are strongly linked. Expressing and recognizing emotion through movement sounds
seems to be a general human capacity that could also apply to emotion expression and
recognition in music. This leads to the hypothesis that music might sound emotional
to us because it sounds like someone is moving in an emotionally expressive way.
Expressive movement characteristics of music that are presented together with a brand
might influence how a listener will perceive the identity of a brand. Yet, how does
expressing and recognizing emotion in music lead to the induction of an emotional
response in a listener? The following section presents several theoretical and empirical
accounts that try to explain why music creates emotions that we attribute to ourselves.
Västfjäll 2008). These experiences are thought to be based on the nonverbal mapping
between features of the musical structures and image schemata (Lakoff 1987; Johnson
1987; for a use of visual imagery in music therapy, see also Bonde, this volume, chapter 21).
This mechanism has yet to be studied experimentally to show that such mental images are the cause of an emotional response to music. The only study I have found that explicitly states that it investigates this mechanism was published by Vuoskoski and Eerola (2013). The authors reported that a sad narrative, read before listening to a piece of music, intensified the sadness experienced by participants.
It was then concluded that, during listening, visual images of that narrative were experi-
enced by participants. However, in contrast to what was originally stated by Juslin
and Västfjäll (2008), in this case, it was not the music that brought up emotional images,
but the narrative.
In the process of socialization, music listeners use music as a tool for social identity
formation. During social bonding processes, music preferences are often topics of
conversations (Rentfrow and Gosling 2006). The more similar music preference profiles
for two people are, the more likely these two people will bond (Boer et al. 2011). Here,
musical genres are especially associated with certain human characteristics. According
to North and Hargreaves (1999), adolescents use music as a “badge” for their social iden-
tity that communicates something about their self-concepts (see also Lamont, volume 1,
chapter 12). For example, listening to indie, classical, or pop music is associated with
several typical personal qualities and attributes. The study of North and Hargreaves has
stimulated several other investigations into the stereotypical knowledge structures that
are associated with fans and performers of different music genres (Table 17.2). Here, it
was shown that these people were usually linked to certain demographics (e.g., age,
education, sex), values, personality traits, ethnicities, clothing styles, and various other
personal qualities (e.g., attractiveness, trustworthiness, or friendliness).
While music genres seem to be socially constructed phenomena, they can also be
described as cognitive musical schemata (Huron 2006). Genres consist of typical melodic,
rhythmic, and harmonic features and instrumental arrangements. Therefore, employing
these particular musical features in branding contexts will elicit particular genre-relevant
associations in listeners. Fischer (2009) showed, for example, that the same melodic
fragment presented on different instruments led to different typical value associations.
Tradition was positively related to the melody being performed on an accordion, an
oboe, and a violin, and negatively to a synthesizer and guitar. On the other hand, hedonism
was associated with a guitar and a synthesizer but not an oboe, violin, or accordion.
Furthermore, trumpets and violins were highly associated with power.
Table 17.2 Stereotypical associations with fans and performers of different music genres, covering demographics, values, personality, ethnicity, clothing, intellect/expertise, and other personal qualities.
In Egermann and Stiegler (2014) we showed that traditional instrumental pieces from
different European countries are more or less correctly associated with their country of
origin in an online listening test. While participants were not able to correctly identify
music from northern Italy or Sweden, Spanish flamenco music was correctly identified
by nearly all participants in a recognition paradigm (where participants were given the
names of different European countries to choose from). In a free-recall version of the study, where participants were asked to list all music-evoked words, again around 85 percent of participants reported an association with Spain. In a second part
of this study, we showed that some music excerpts that were chosen to represent music
styles that were popular in different decades of the twentieth century were able to induce
correct time/decade associations in the listeners. Here, we observed that especially those
styles that were popular during the participant’s adolescent years were most effective.
Taken together, these results indicate that music can activate shared meaning structures
that could be used for communication purposes (see also Shevy 2008). However, the
success of these measures depends on the similarity of interindividual, extra-musical association networks and on how strongly the associations between music and other features have been learned (as exemplified by the lower recognition rates for some countries and decades).
Thus, when creating or selecting music to communicate specific, extra-musical meaning,
as done in audio branding practice, a detailed knowledge about listeners seems to be just
as crucial as the design of the stimuli themselves.
An Integrated Brand-Music
Communication Model
The theoretical and empirical accounts reported above can be summarized in the
following hypothetical model (see Figure 17.1). It presents a simplified communication
process, where a company aims to create a brand image in its customers by expressing
its brand identity using music. Here, three different functions of music are identified.
Music is thought to create salience by attracting attention and establishing an additional memory representation for the brand (1. Brand Salience). Furthermore, through shared knowledge about cognitive human attributes related to certain musical characteristics, music communicates brand values, brand personality, and many other concepts (2. Cognitive Meaning). The characteristics associated with the social group behind a given music genre (its performers and listeners) are used here as a tool to elicit relevant social associations when music is chosen or produced. When brand identities
try to form brand images through music, consumers will process its social-referential
meaning with the same mental social capacity as the one they usually employ for person
perception. Furthermore, in addition to being able to express emotions that are recog-
nized by a listener (again, probably due to its similarities with typically human expressive
sounds), music is also able to evoke and induce emotion (3. Emotional Meaning). Through
Figure 17.1 Communication process: salience, emotional meaning, and cognitive meaning.
conditioning, music might act as an unconditioned stimulus that projects its emotional and cognitive meaning onto a brand that initially carries no meaning of its own. All three functions
(providing salience and cognitive and emotional meaning) are improved when the
human attributes evoked by brands and music are semantically similar and “fit” (North
and Hargreaves 2008).
While many of the reported relationships have been studied separately, there are still
no studies that test the entire communication process from the conception of a brand
identity to the achievement of a brand image in a consumer through the use of music.
In many studies, music was chosen that had certain qualities that are relevant in this
context (being salient, emotional, or associated with cognitive concepts). Nevertheless,
few studies have focused on studying the emergence of these qualities in a branding con-
text. Therefore, this model remains speculative in that its components have not been
tested in their independent functionality. However, the anecdotal evidence reported by
audio branding practitioners (Lusensky 2008), who in their daily work influence how
consumers imagine brands, is quite striking.
References
Aaker, J. L. 1997. Dimensions of Brand Personality. Journal of Marketing Research
34 (3): 347–356.
Aaker, J. L., S. Fournier, D. E. Allen, and J. Olson. 1995. A Brand as a Character, a Partner and
a Person: Three Perspectives on the Question of Brand Personality. Advances in Consumer
Research 22: 391–395.
Allen, M. 2002. Human Values and Product Symbolism: Do Consumers Form Product
Preference by Comparing the Human Values Symbolized by a Product to the Human
Values That They Endorse?. Journal of Applied Social Psychology 32 (12): 2475–2501.
Fritz, T., S. Jentschke, N. Gosselin, D. Sammler, I. Peretz, R. Turner, A. D. Friederici, et al. 2009. Universal
Recognition of Three Basic Emotions in Music. Current Biology 19 (7): 573–576. http://doi.
org/10.1016/j.cub.2009.02.058.
Gabrielsson, A. 2002. Emotion Perceived and Emotion Felt: Same or Different? Musicae
Scientiae (Special Issue 2001–2002): 123–145.
Gaus, H., S. Jahn, T. Kiessling, and J. Drengner. 2010. How to Measure Brand Values? Advances
in Consumer Research 37: 1–2.
Giordano, B. L., H. Egermann, and R. Bresin. 2014. The Production and Perception of
Emotionally Expressive Walking Sounds: Similarities between Musical Performance and
Everyday Motor Activity. PLoS One 9 (12): e115587. doi:10.1371/journal.pone.0115587.
Gorn, G. J. 1982. The Effects of Music in Advertising on Choice Behavior: A Classical
Conditioning Approach. Journal of Marketing 46 (1): 94–101.
Hirschman, E. C., and M. B. Holbrook. 1982. Hedonic Consumption: Emerging Concepts,
Methods and Propositions. Journal of Marketing 46 (3): 92–101.
Hunter, P. G., G. Schellenberg, and U. Schimmack. 2010. Feelings and Perceptions of
Happiness and Sadness Induced by Music: Similarities, Differences, and Mixed Emotions.
Psychology of Aesthetics, Creativity, and the Arts 4 (1): 47–56. http://doi.org/10.1037/
a0016873.
Huron, D. 2006. Sweet Anticipation. Cambridge, MA: MIT Press.
Jackson, D. 2003. Sonic Branding: An Essential Guide to the Art and Science of Sonic Branding.
New York: Palgrave Macmillan.
Janata, P., S. T. Tomic, and S. K. Rakowski. 2007. Characterisation of Music-Evoked Autobiographical Memories. Memory 15 (8): 845–860. http://doi.org/10.1080/09658210701734593.
Johnson, M. 1987. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason.
Chicago: University of Chicago.
Juslin, P. N., G. Barradas, and T. Eerola. 2015. From Sound to Significance: Exploring the
Mechanisms Underlying Emotional Reactions to Music. American Journal of Psychology
128 (3): 281–304.
Juslin, P. N., J. Karlsson, E. Lindström, A. Friberg, and E. Schoonderwaldt. 2006. Play It Again
with Feeling: Computer Feedback in Musical Communication of Emotions. Journal of
Experimental Psychology: Applied 12: 79–95. doi:10.1037/1076-898X.12.2.79.
Juslin, P. N., and P. Laukka. 2003. Communication of Emotions in Vocal Expression and
Music Performance: Different Channels, Same Code? Psychological Bulletin 129: 770–814.
Juslin, P. N., S. Liljeström, D. Västfjäll, and L.-O. Lundqvist. 2010. How Does Music Evoke
Emotions? Exploring the Underlying Mechanisms. In Handbook of Music and Emotion:
Theory, Research, Applications, edited by P. N. Juslin and J. A. Sloboda, 605–643. Oxford:
Oxford University Press. http://doi.org/10.1093/acprof:oso/9780199230143.003.0022.
Juslin, P. N., and D. Västfjäll. 2008. Emotional Responses to Music: The Need to Consider
Underlying Mechanisms. Behavioral and Brain Sciences 31 (5): 559–575; discussion 575–621.
http://doi.org/10.1017/S0140525X08005293.
Kallinen, K., and N. Ravaja. 2006. Emotion Perceived and Emotion Felt: Same and Different.
Musicae Scientiae 10 (2): 191–213.
Kapferer, J. 2012. The New Strategic Brand Management: Advanced Insights and Strategic
Thinking (5th ed.). London: Kogan Page.
Keller, K. L. 2009. Building Strong Brands in a Modern Marketing Communications
Environment. Journal of Marketing Communications 15 (2–3): 139–155. http://doi.org/
10.1080/13527260902757530.
Keysers, C., and V. Gazzola. 2009. Expanding the Mirror: Vicarious Activity for Actions,
Emotions, and Sensations. Current Opinion in Neurobiology 19: 666–671. doi:10.1016/j.conb.
2009.10.006.
Khalfa, S., M. Roy, P. Rainville, S. Dalla Bella, and I. Peretz. 2008. Role of Tempo Entrainment
in Psychophysiological Differentiation of Happy and Sad Music? International Journal of
Psychophysiology 68 (1): 17–26. http://doi.org/10.1016/j.ijpsycho.2007.12.001.
Koelsch, S. 2011. Towards a Neural Basis of Processing Musical Semantics. Physics of Life
Reviews 8 (2): 89–105. http://doi.org/10.1016/j.plrev.2011.04.004.
Kohler, E., C. Keysers, M. A. Umiltà, L. Fogassi, V. Gallese, and G. Rizzolatti. 2002. Hearing
Sounds, Understanding Actions: Action Representation in Mirror Neurons. Science 297
(5582): 846–848. http://doi.org/10.1126/science.1070311.
Kristen, S., and M. Shevy. 2013. A Comparison of German and American Listeners’ Extra
Musical Associations with Popular Music Genres. Psychology of Music 41 (6): 764–778.
http://doi.org/10.1177/0305735612451785.
Krugmann, D. 2007. Integration akustischer Reize in die identitätsbasierte Markenführung.
LiM-Arbeitspapiere No. 27. Bremen, Germany.
Labbe, C., and D. Grandjean. 2014. Musical Emotions Predicted by Feelings of Entrainment.
Music Perception 32 (2): 170–185. http://doi.org/10.1525/mp.2014.32.2.170.
Lakoff, G. 1987. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind.
Chicago: University of Chicago Press.
Lamont, A., and A. Greasley. 2012. Musical Preferences. Oxford Handbooks Online. http://
www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199298457.001.0001/oxfordhb-
9780199298457-e-015. Accessed April 7, 2017.
Lantos, G. P., and L. G. Craton. 2012. A Model of Consumer Response to Advertising Music.
Journal of Consumer Marketing 29 (1): 22–42. http://doi.org/10.1108/07363761211193028.
Lusensky, J. 2008. Sounds Like Branding. Heartbeats International. http://www.soundslikebranding.com/pdf/slb_digital.pdf. Accessed May 7, 2016.
MacInnis, D. J., and C. W. Park. 1991. The Differential Role of Characteristics of Music on
High- and Low-involvement Consumers’ Processing of Ads. Journal of Consumer Research
18: 161–173.
Marshall, S. K., and A. J. Cohen. 1988. Effects of Musical Soundtracks on Attitudes toward
Animated Geometric Figures. Music Perception 6 (1): 95–112.
Mischel, W., Y. Shoda, and O. Ayduk. 2004. Introduction to Personality: Toward an Integration.
New York: John Wiley & Sons.
Muellensiefen, D., H. Egermann, and S. Burrows. 2015. Radio Station Jingles: How Statistical
Learning Applies to a Special Genre of Audio Logos. In Audio Branding Yearbook 2014–2015,
edited by K. Bronner, R. Hirt, and C. Ringe, 53–72. Baden-Baden, Germany: Nomos.
Nagel, F., R. Kopiez, and O. Grewe. 2008. Psychoacoustical Correlates of Musically Induced
Chills. Musicae Scientiae 12 (1): 101–113.
North, A., and D. J. Hargreaves. 1995. Subjective Complexity, Familiarity, and Liking for
Popular Music. Psychomusicology 14: 77–93.
North, A., and D. Hargreaves. 1999. Music and Adolescent Identity. Music Education Research
1 (1): 75–92. http://doi.org/10.1080/1461380990010107.
North, A. C., and D. J. Hargreaves. 2008. The Social and Applied Psychology of Music. Oxford:
Oxford University Press.
Pearce, M. T., and G. A. Wiggins. 2006. Expectation in Melody: The Influence of Context and
Learning. Music Perception 23 (5): 377–405. http://doi.org/10.1525/mp.2006.23.5.377.
Peirce, C. S. 1994. Elements of Logic. In The Collected Papers of Charles Sanders Peirce. Electronic
Edition, Vol. 2, edited by C. Hartshorne and P. Weiss. Charlottesville, VA: InteLex Corp.
Petty, R. E., J. T. Cacioppo, and D. T. Schumann. 1983. Central and Peripheral Routes to
Advertising Effectiveness: The Moderating Effect of Involvement, Journal of Consumer
Research 10: 135–146.
Rentfrow, P. J., and S. D. Gosling. 2006. Message in a Ballad: The Role of Music Preferences in
Interpersonal Perception. Psychological Science 17 (3): 236–242.
Rentfrow, P. J., and S. D. Gosling. 2007. The Content and Validity of Music-Genre Stereotypes
among College Students. Psychology of Music 35 (2): 306–326.
Rentfrow, P. J., J. A. Mcdonald, and J. A. Oldmeadow. 2009. You Are What You Listen To:
Young People’s Stereotypes about Music Fans. Group Processes and Intergroup Relations
12 (3): 329–344. http://doi.org/10.1177/1368430209102845.
Scherer, K. 1999. Appraisal Theory. In Handbook of Cognition and Emotion, edited by
T. Dalgleish and M. Power, 637–663. Chichester, UK: Wiley.
Scherer, K. R. 2005. What Are Emotions? And How Can They Be Measured? Social Science
Information 44 (4): 695–729.
Scherer, K. R., and M. R. Zentner. 2001. Emotional Effects of Music: Production Rules. In
Music and Emotion: Theory and Research, edited by P. N. Juslin and J. A. Sloboda, 361–392.
Oxford: Oxford University Press.
Scherer, K. R., and E. Coutinho. 2013. How Music Creates Emotion: A Multifactorial Process
Approach. In The Emotional Power of Music Multidisciplinary Perspectives on Musical
Arousal, Expression, and Social Control, edited by T. Cochrane, B. Fantini, and K. R. Scherer.
Oxford: Oxford University Press.
Schwartz, S. H. 1992. Universals in the Content and Structure of Values: Theoretical Advances
and Empirical Tests in 20 Countries. In Advances in Experimental Social Psychology, Vol. 25,
edited by M. Zanna, 1–65. Orlando, FL: Academic Press.
Shevy, M. 2008. Music Genre as Cognitive Schema: Extramusical Associations with
Country and Hip-Hop Music. Psychology of Music 36 (4): 477–498. http://doi.org/10.1177/
0305735608089384.
Steinbeis, N., S. Koelsch, and J. A. Sloboda. 2006. The Role of Harmonic Expectancy Violations
in Musical Emotions: Evidence from Subjective, Physiological, and Neural Responses.
Journal of Cognitive Neuroscience 18 (8): 1380–1393.
Stewart, D. W., K. M. Farmer, and C. I. Stannard. 1990. Music as a Recognition Cue in
Advertising-Tracking Studies. Journal of Advertising Research 30 (4): 39–48.
Thakor, M. V., and C. S. Kohli. 1996. Brand Origin: Conceptualization and Review. Journal of
Consumer Marketing 13 (3): 27–42.
Vermeulen, I., T. Hartmann, and A.-M. Welling. 2011. The Chill Factor: Improving Ad
Responses by Employing Chill-Inducing Background Music. Proceedings of the 61st Annual
Conference of the International Communication Association (ICA), May 26–30, Boston, MA.
Vuoskoski, J. K., and T. Eerola. 2013. Extramusical Information Contributes to Emotions
Induced by Music. Psychology of Music 43 (2): 262–274. http://doi.org/10.1177/0305735613502373.
Watt, R. J., and R. L. Ash. 1998. A Psychological Investigation of Meaning in Music. Musicae
Scientiae 2 (1): 33–53. http://doi.org/10.1177/102986499800200103.
Zander, M. F. 2006. Musical Influences in Advertising: How Music Modifies First Impressions
of Product Endorsers and Brands. Psychology of Music 34 (4): 465–480. http://doi.org/
10.1177/0305735606067158.
Chapter 18
Sound and Emotion
Erkin Asutay and Daniel Västfjäll
Introduction
Auditory stimuli have a great potential to evoke emotions in people (Armony and
LeDoux 2010; Tajadura-Jiménez 2008). The auditory system scans our surrounding
environment, detects and identifies significant objects and events, and signals for attention
shifts when necessary (Juslin and Västfjäll 2008). It can also orient the visual system to
a particular region of interest (Arnott and Alain 2011). Critically, it has been shown
that the auditory system takes the behavioral state of the organism (i.e., emotional, moti-
vational, and attentional) into account while processing auditory stimuli (Weinberger
2010). On the other hand, emotions work in concert with perceptual processes. They
can guide us to establish our motivation and preferences about objects, events, and
places (Lang and Bradley 2010) and can call for rapid mobilization for action when
necessary (Frijda 2008). Here, we present evidence documenting the interplay between
auditory and emotional processes.
In this chapter, moreover, imagination is broadly taken to mean the mental representations that are induced by sounds, and we focus on the impact of these mental representations on the affective experience during sound perception. These imagined representations can be very different depending on the context, the listener’s condition, and the sound itself.
We make an overall classification of these mental representations from the perspective
of the distinction between musical listening and everyday listening (Gaver 1993). The
distinction comes from the application of an ecological approach to sound perception
(Clarke 2005; Neuhoff 2004). Imagine that you are walking by the pier and you hear a
sound. If you pay attention to the sound, you may focus on its perceptual features like
loudness, pitch, and timbre and how these features evolve in time. On the other hand,
you might just notice that you hear the sound of a passing boat, and your attention will
be on the source of the sound. The former is an example of musical listening, while the
latter exemplifies everyday listening. Note also that the distinction between everyday
and musical listening does not suggest that all musical sounds are received in musical
listening mode and vice versa.
In the following, we first start with basic properties of the auditory system, and
present a view of the auditory system as an adaptive and cognitive network that special-
izes in processing acoustic stimulus features while integrating the behavioral state of the organism into its processing. This forms the biological and behavioral basis for our main
argument that affective experience is one of the main parts of sound perception. Next, in
order to show the tight connections between auditory and affective processes, we focus
on affective responses to auditory stimuli, reviewing empirical evidence from behavioral
and neuroimaging studies. We present the subject in three different sections: responses
to learned emotional meaning of sounds, responses to vocal signals, and responses to
music. In doing so, we also attempt to make clear how these different sources of stimuli
induce affective reactions in us and how we respond to them. We also discuss how
the mental representations evoked by sounds influence affective reactions to auditory
stimuli. Then, we present evidence on how the affective significance of sounds can
influence perception and attention. Finally, we will bring all this together and underline
the main argument of this chapter that the affective experience is an integral part of
sound perception.
Sound perception is a fundamental part of our interactions with and experience of the
external environment. We receive a continuous flow of auditory stimulation from our
surroundings, and the auditory system makes sense of this input. It has been suggested
that the auditory system has evolved as an alarm system that scans our surroundings,
detects salient events in it, and signals for attention shifts to prioritized targets (Juslin
and Västfjäll 2008).
The reception of sound starts at the ear, which is an organ specialized in sensing local pressure fluctuations. Sound waves travel through the ear canal and set the eardrum in
motion, which in turn sets the three bones in the middle ear vibrating. Their function is
to amplify the mechanical oscillations and transmit them to the inner ear. These oscil-
lations travel through the fluid in the cochlear canals and set the basilar membrane in
motion. The hair cells in the cochlea generate action potentials depending on the basilar
membrane motion. Hence, in this manner, acoustic signals are converted to neural sig-
nals that travel from the auditory nerve to the central nervous system. On this auditory
pathway, substantial information processing takes place in the brain stem and several
midbrain stations. The information generated in these structures is sent to the thalamus,
which is a relay station that collects signals from the periphery and passes them to the sen-
sory cortices. The primary auditory cortex (A1) is located in the superior part of the
temporal lobe of the brain, and the adjoining areas are referred to as the auditory belt
areas (Woods et al. 2009). Neurons in the A1 have higher sensitivity to acoustic stimulus
features compared to the belt areas, whereas the belt areas show a greater attentional
modulation than A1 neurons do (Woods et al. 2010). Neurons in the auditory pathway
have preferred frequency regions that they respond to; and in most of the auditory areas
there is tonotopic organization: an orderly correspondence between the location of the
neurons and their specific frequency tuning (for detailed information on the auditory
system, see Moore 2012; Rees and Palmer 2010).
with the frequency spectrum of acoustic signals. Pitch perception arises from tonality,
periodicity, and harmonicity. Hence, both the temporal and spectral aspects contribute
to pitch perception (for more detailed accounts of pitch and loudness, see Fastl and
Zwicker 2007; Moore 2012; Wang and Bendor 2010; Young 2010). Two sounds can have
both the same loudness and pitch, yet could sound completely different from one
another. To exemplify, consider two different instruments playing exactly the same tone
at the same loudness. Timbre is the perceptual quality that accounts for the differences
between the two instruments. It is a multidimensional feature; that is, it arises from vari-
ous aspects of acoustic signals (e.g., transients, relative strength of harmonics).
Auditory stimuli also provide spatial information. Localizing sound sources in space
is a computationally challenging task, since the auditory system, unlike the visual system,
seems to lack a topographical space representation. Spatial cues have to be computed from
the signals that reach the respective ears. Intensity and arrival time differences between
the respective ears provide cues for sound localization (Blauert 1997). Interaural time
difference (ITD) is the main cue for the perceived azimuth of low-frequency sounds
(below approximately 1.5 kHz; Hartmann et al. 2013), while interaural level difference
(ILD) seems to be more useful for high-frequency signals (above 2 kHz). Apart from
these binaural cues, humans also employ monaural cues to extract auditory spatial
information. Here, the auditory system makes use of the spectral modulations of the
incoming sound that are caused by the shape of the outer ear and the incoming angle of
the sound waves (Blauert 1997). Although monaural cues are highly frequency depend-
ent, they can be useful for localizing sounds in the median plane (e.g., front vs. back).
It seems that the neural processing of the auditory spatial information (ILDs and ITDs)
already starts at the brainstem level (see Ahveninen et al. 2014; Yin and Kuwada 2010).
While the role of the auditory cortex in spatial processing is not clear, recent research has led to a two-channel model (the hemifield code), in which two neuronal populations are broadly tuned to the left or the right side of auditory space (Stecker et al. 2005). According to the hemifield code, the joint activity of these two populations gives rise to azimuth perception.
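To give a rough sense of the magnitude of the ITD cue discussed above, a common spherical-head approximation (often attributed to Woodworth) can be used; this is a textbook simplification rather than a model treated in this chapter, and the head radius below is an assumed typical value. For a source at azimuth θ,

$$ \mathrm{ITD}(\theta) \approx \frac{a}{c}\,(\theta + \sin\theta), $$

where a ≈ 0.09 m is the head radius and c ≈ 343 m/s is the speed of sound. For a source directly to one side (θ = π/2), this gives roughly (0.09/343)(1.57 + 1) ≈ 0.67 ms, which is about the largest interaural delay the binaural system has to resolve.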
Research on auditory attention has indicated that the attentional modulation of the
auditory cortex could facilitate the processing of behaviorally relevant sounds (Petkov
et al. 2004). The auditory cortex shows both learning-induced (Ohl and Scheich 2005) and
attention-driven plasticity (i.e., changes in the neural responses due to factors like moti-
vation, learning, stimulus-statistics, etc.; see Ahveninen et al. 2011). It can also acquire
specific memory traces (Weinberger 2004) and adapt to the changing nature of auditory
environments (Dahmen et al. 2010). Spatial sensitivity of the auditory cortex is enhanced
by engaging auditory (Lee and Middlebrooks 2011) or visual spatial tasks (Salminen
et al. 2013). Furthermore, auditory brain stem responses can be modulated by working
memory load (Sörqvist et al. 2012) and selective attention (Lehmann and Schönwiesner 2014). Taken together, these findings indicate that the processing of auditory stimuli is
dynamic, adapts to changing environments, and is optimized to process behaviorally
significant stimuli. The adaptive capacity of the auditory system suggests that the audi-
tory cortex is not a mere acoustic analysis center. It has been argued that the auditory
cortex can integrate higher-order, nonauditory input (e.g., motivation, attention, motor
function) into its processing (Weinberger 2010). Apart from the cortex, studies on the
inferior colliculus (IC—a hub for the construction of a higher-order auditory percept) in the auditory midbrain show that neural activity in the IC is sensitive to factors such as eye movements, learning-induced plasticity, motivation, emotion, and task engagement (Bajo et al. 2010; Gruters and Groh 2012; Malmierca 2005; Marsh et al. 2002).
Furthermore, connectional analyses (mainly of the cat and the primate brain) indicate
that the auditory network shows a unique architecture with its corticocortical, thalamo-
cortical, and corticocollicular connections (Read et al. 2002; Winer and Lee 2007).
Taken together, the behavioral and functional evidence presented in this section sug-
gests that the auditory network is specialized in processing acoustic stimulus features as
its main input, and that it also makes use of information about the behavioral state of the organism during auditory processing.
Emotional Responses
to Auditory Stimuli
How does sound induce emotions? In this section, we discuss the affective experience
induced by various auditory stimuli such as environmental sounds, vocalizations,
and music. The main aim is to present the close relationship between auditory and
affective processes.
In her work on auditory-induced emotions during everyday listening, Tajadura-
Jiménez (2008; Tajadura-Jiménez and Västfjäll 2008) suggested four general contributing factors to the affective experience induced by auditory stimuli: physical, spatial,
cross-modal, and psychological. The physical factors are related to acoustical features of
sounds (such as loudness, pitch, duration, transients, etc.) causing affective reactions in
people. In basic psychoacoustic research, the effects of physical features on sound
perception are generally studied using tone and noise complexes that do not possess
semantic content or a particular sound source (Fastl and Zwicker 2007). The perceived
loudness and sharpness (i.e., high/low frequency balance) of such tone and noise com-
plexes can be related to the affective reactions they induce (Västfjäll 2012). In music, for
instance, sounds that feature dissonant, loud, sudden, or fast temporal components can
induce physiological arousal and negative affect in listeners (Juslin and Västfjäll 2008).
Auditory stimuli also provide spatial information regarding both the spaces we occupy
and objects in our surroundings (i.e., their location and motion with respect to our
bodies), and this spatial information can also possess affective quality (Asutay and
Västfjäll 2015a). Previous research has found behavioral, neural, and emotional biases
in favor of approaching sound sources compared to receding sound sources (Hsee et al.
2014; Maier and Ghazanfar 2007; Seifritz et al. 2002; Tajadura-Jiménez, Väljamäe, et al.
2010). In particular, approaching sounds are found to be more emotional and behaviorally
salient than receding sounds. The impact of sound source distance and location together
with room size on affective responses has also been studied in the context of everyday
listening (Tajadura-Jiménez, Larsson, et al. 2010). Cross-modal factors in auditory-
induced emotion are related to the role of the information that we gather from other
modalities. This happens when affective information we receive from one modality
influences processing in another (Gerdes et al. 2014). Finally, the psychological factors
that influence emotional reactions to auditory stimuli are related to specific meaning
and interpretation of a sound, and associations evoked by a sound and/or its source. In
everyday listening, these factors are related to sound source identification and semantic
content (Tajadura-Jiménez 2008).
environmental sounds (mainly related to the sound source; Asutay et al. 2012). We used
a Fourier-time-transform algorithm that performs spectral broadening to reduce the
identifiability of sounds, while preserving temporal and spectral variation. The results
indicated that emotional reactions to environmental sounds were mostly defined by the
meaning attributed to the sound source by the listener. In other words, when participants could not identify the source of a particular sound, strong affective reactions
induced by the same sound were mostly eliminated.
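To make the kind of manipulation described above more concrete, the following is a minimal Python sketch of an STFT-based spectral smearing. It is offered only as a generic illustration of how broadening the short-time spectrum (and discarding fine phase structure) can degrade source identifiability while retaining the coarse temporal envelope; it is not the Fourier-time-transform procedure used in Asutay et al. (2012), and the function name spectral_smear and all parameter values (frame length, hop size, smearing bandwidth) are arbitrary choices made for this sketch.

import numpy as np

def spectral_smear(x, frame=1024, hop=256, smear_bins=16, seed=0):
    # Broaden the short-time magnitude spectrum and randomize fine phase structure.
    rng = np.random.default_rng(seed)
    win = np.hanning(frame)
    out = np.zeros(len(x) + frame)
    norm = np.zeros(len(x) + frame)
    kernel = np.ones(smear_bins) / smear_bins              # moving average across frequency bins
    for start in range(0, len(x) - frame, hop):
        seg = x[start:start + frame] * win
        spec = np.fft.rfft(seg)
        mag = np.convolve(np.abs(spec), kernel, mode="same")  # spectral broadening
        phase = rng.uniform(-np.pi, np.pi, size=spec.shape)   # randomized phase
        seg_out = np.fft.irfft(mag * np.exp(1j * phase), n=frame) * win
        out[start:start + frame] += seg_out                   # overlap-add resynthesis
        norm[start:start + frame] += win ** 2
    return out[:len(x)] / np.maximum(norm[:len(x)], 1e-8)

# Usage: smear one second of synthetic amplitude-modulated noise at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.random.default_rng(1).standard_normal(sr) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
y = spectral_smear(x)  # same length as x; envelope preserved, source cues blurred

Whether such a generic manipulation preserves temporal and spectral variation to the degree required in a listening experiment would, of course, depend on the chosen parameters.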
Mental representations evoked by music can be qualitatively very different in com-
parison with environmental sounds. For instance, it has been suggested that music can
trigger visual imagery, that is, when the listener conjures up visual images (Juslin and
Västfjäll 2008). Visual imagery is defined as a quasiperceptual experience that resembles
an actual perceptual experience but occurs in the absence of visual stimuli. The exact
nature of how music evokes mental images remains to be determined. It seems that
listeners conceptualize the musical structure using a metaphorical nonverbal mapping
between the music and “image-schemata” that are grounded in bodily experience
(Lakoff and Johnson 1980). Visual imagery evoked by musical stimuli can be a part of
the affective experience induced by music (Juslin and Västfjäll 2008). Moreover, mental
images evoked by music can also occur in connection with memory, where certain
musical stimuli trigger a specific memory of a particular event; this process also influ-
ences affective reactions to music. Another line of somewhat relevant research comes
from auditory imagery studies where researchers have studied the nature of auditory
imagery in the absence of auditory stimulation (for detailed accounts, see Hubbard 2010;
Zatorre and Halpern 2005). Although this research is far from definitive, it has been
found that auditory imagery preserves many structural and temporal properties of
sounds and that it involves many of the same brain areas as auditory perception.
Vocal Affect
Humans and most animals use vocalizations to communicate with their conspecifics.
Vocal acoustics is valuable for communication between individuals regarding important
events that may arise in their environment, for example, presence of a predator or a food
supply. Vocalizations, together with facial expressions, are also important for inferring
the emotional state of the speaker. The ability of an individual to successfully interpret the
emotional state of the speaker can be crucial for survival in certain situations, and it is
critical for social interactions. Unlike other animals, humans also have language to rely
on in their communications. Speech signals can carry emotional information not only
through semantic content but also through intonation—that is, prosody.
Even though there is some conflicting evidence, recent brain imaging studies have
found increased amygdala activity for emotional compared to neutral vocalizations
(Fecteau et al. 2007; Sander and Scheich 2005; Sander et al. 2003; Wiethoff et al. 2009).
Other brain areas involved in emotional processing of vocal information are the temporal
(superior temporal sulcus [STS] and superior temporal gyrus [STG]) and frontal regions
(orbitofrontal cortex [OFC] and inferior frontal gyrus [IFG]). The STS has been
shown to respond to the human voice regardless of linguistic content (Belin et al. 2000).
The auditory areas along the middle and superior temporal cortex (e.g., STS and STG)
are sensitive to the emotional content in vocal signals, and their activation does not
seem to depend on attentional focus or task demands (Brück et al. 2013; Grandjean and
Frühholz 2013). On the other hand, frontal regions (e.g., OFC and IFG) seem to be
involved in emotional processing of vocal signals in a context-dependent (i.e., attentional
and task demands) fashion. Hence, models of emotional processing of vocalizations and
prosody suggest that affective processing of vocal signals takes place in regions within
the STS and STG (some models have proposed that facial expressions are also integrated
into this processing, e.g., Brück et al. 2013). The outcomes of this processing are made
accessible for higher-order cognitive processes that take place in frontal regions (Brück
et al. 2013; Grandjean and Frühholz 2013; Kotz et al. 2013; Schirmer and Kotz 2006).
Most research concerning emotional vocalizations approaches the subject from the
perspective of successful decoding of the affective state of the speaker. This is mainly due
to the understanding that the main function of vocalizations is to inform the receiver
about the speaker’s emotions. However, other researchers argue that the primary function
of vocalizations is to induce emotions in the receiver (Bachorowski and Owren 2008;
Owren and Rendall 2001; Russell et al. 2003). According to this framework (known as
the affect-induction account of vocalizations, see Bachorowski and Owren 2008), the
primary function of vocal signaling is not to inform the receiver about the speaker’s
affect, even though vocalizations usually arise from the speaker’s emotions. Listeners
can clearly make inferences regarding the affective state of speakers, but this is a secondary
outcome. However, the primary outcome is that affective vocal signals induce emotional
reactions in listeners in order to modulate their behavior, depending on the context in
which the vocalizations occur and the listener’s prior experience with such signals.
Hence, vocal signals are not merely displays of the speaker’s emotions. Instead, they are
tools of social influence. The affect-induction account began with research on the func-
tions of primate calling (Owren and Rendall 2001), later applied to specific human emo-
tional vocalizations such as laughter (Owren et al. 2013). In connection to this account,
it has been argued that infant crying has a function of increasing caregiver arousal
(Zeskind 2013). Furthermore, research conducted on tamarin monkeys suggests that this species may use emotional features in its vocalizations in order to induce arousing and calming states in receivers (Snowdon and Teie 2013).
Music
A chapter concerning emotional reactions to sound would be incomplete without
music, given its high emotional significance for humans. Music is
an indispensable part of humanity. Musical instruments are among the oldest cultural
artifacts that have been discovered. The bone flutes discovered in southern Germany
date back about 35,000 years (Conrad et al. 2009). Despite music being an ancient part of
human life, its evolutionary origins are still under debate, and this is, of course,
a very difficult question to answer with certainty. Some researchers argue that music is a
human invention that has no direct adaptational biological function. For instance, Patel
(2010) proposed that music relies on brain functions that developed for other purposes
and that music itself is not an adaptation that has shaped our species through
natural selection. According to Patel, humans employed previously acquired abilities to
invent music. On the other hand, there are adaptationist views postulating that music is
in fact an evolutionary adaptation with survival value. Among those, Charles Darwin
([1879] 2004) proposed that music evokes strong emotions and could be an antecedent
to our language capacity. Further, it has been suggested that music, with its capacity to
be an important channel for communication of emotions, could promote successful
reproduction and improve social cohesion (for a detailed discussion, see Altenmüller
et al. 2013; Patel 2010). These social functions of music (i.e., social cohesion, communication,
and cooperation) might have been critical for the survival of human beings.
Moreover, music can influence the autonomic nervous system and immune system
activity (Koelsch 2011), and musical emotion processing can activate serotonergic
(increased serotonin is associated with satisfactory feelings from expected outcomes)
and dopaminergic (dopamine is associated with the reward system and reward related
feelings) neuromodulatory systems in the brain (Altenmüller et al. 2013; Koelsch 2013).
Musical Emotions
The emotion-inducing power of music is usually central in adaptationist views.
Nevertheless, there is a conception that musical emotions are merely aesthetic experi-
ences. Some researchers have claimed that music cannot induce everyday emotions
such as sadness, happiness, and anger (e.g., Scherer 2003); and others argue that music
cannot induce emotions at all (Konecni 2003). One of the main arguments here is that
music cannot induce everyday emotions related to survival functions, as it does not
seem to possess any capacity related to an individual’s goals and well-being. Hence, it
can only induce subtler feelings and aesthetic experiences that are not considered
“real emotions.” Here, we reject this conception and claim that music can in fact induce
both basic and complex emotions in listeners through various psychological mecha-
nisms, some of which are not specific to musical stimuli and are common with other
emotion-inducing stimuli. Although there are a number of emotion theories whose
proponents do not agree on a precise definition of what an emotion is, they largely agree
on several components of an emotional episode (for detailed accounts of several
emotion theories, see Barrett 2006; Moors 2010; Russell 2009; Scherer 2009). Emotions
are generally brief, affective reactions to salient events, and they involve several
components such as physiological arousal (i.e., autonomic activity such as changes in
heart rate), motor expression (e.g., smiling), subjective feeling (e.g., feeling happy upon
hearing a loved song), action tendency (e.g., dancing), and regulation. Previous research
has shown that music can evoke changes in all of the components that an emotional epi-
sode would have (Juslin and Västfjäll 2008; Koelsch et al. 2010). Furthermore, music can
induce activity in core neural structures of emotion processing (Koelsch 2013), which is
another indicator that music can in fact induce emotions.
tension, or suspense (Meyer 1956). Musical expectations are related to the anticipation
of future sounds, which involves memory and statistical learning of musical structures.
In addition, expectation and anticipation are linked to the reward processing and the
dopaminergic system in the brain (Huron and Margulis 2010). Finally, aesthetic judgment
refers to emotional reactions induced through a subjective evaluation of the aesthetic
value of music. Taken together, one may argue that emotional reactions to music could
occur through several psychological mechanisms, some of which are not specific to
music but, instead, are common to other emotion-inducing stimuli. This also suggests
that musical emotions are in fact emotions and they share commonalities with emotions
induced by other stimuli.
emotional, reward, and memory processes, as well as the structures related to autonomic
and endocrine system activity. Therefore, it is not difficult to understand why music is
such a special construct for human societies.
A growing body of empirical evidence suggests that the affective salience of external
stimuli provides invaluable cues for allocation of attentional resources and enhances
perception possibly via fast neural routes to sensory processing areas in the brain. One
of the main arguments is that emotional stimuli form a special group of high-salient
stimuli that are prioritized in sensory processing often at the expense of emotionally
neutral stimuli. In other words, people readily pay more attention to emotional signals
in comparison to neutral signals. Most of the studies concerning the impact of emotional
processes on attention and perception come from the visual modality (e.g., Vuilleumier
2005; Vuilleumier and Driver 2007; Yiend 2010). Although comparable evidence in the
auditory modality is scarce, it seems to be accumulating. Here, we review evidence from
human behavioral and brain imaging studies on how affective sounds can modulate
perceptual and attentional processes.
In a change detection experiment, we found that the affective significance of individual
sounds in a complex auditory scene guides auditory attention (Asutay and Västfjäll
2014). Participants listened to two complex auditory scenes (each consisting of six
simultaneous environmental sounds), and indicated whether the two scenes were
identical or there was a change. Changes took the form of sound replacement (i.e., one
sound was replaced by another). Detection accuracy was higher when the changed
stimuli were emotionally negative and arousing compared to neutral. In addition, there
was an overall increase in perceptual sensitivity for trials in which the unchanged events
were negative. These findings suggest that the emotional salience of sounds guides
attentional resources in a complex environment and that the presence of an emotionally
negative and arousing environment can lead to an overall decrease in auditory atten-
tional thresholds.
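To make the logic of this change-detection design concrete, the following minimal sketch (purely illustrative Python, not the analysis code used in the study; the trial fields and function names are hypothetical) scores detection accuracy separately for emotionally negative and neutral change events, with a false-alarm rate available for correcting response bias:

```python
# Minimal illustrative sketch of scoring change detection by the emotional
# category of the changed sound (not the original study's code).
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trial:
    changed: bool          # was one of the six scene sounds replaced?
    change_category: str   # "negative" or "neutral" (meaningful only if changed)
    response_change: bool  # did the listener report a change?

def hit_rate(trials, category):
    """Proportion of change trials of a given emotional category that were detected."""
    relevant = [t for t in trials if t.changed and t.change_category == category]
    return mean(t.response_change for t in relevant) if relevant else float("nan")

def false_alarm_rate(trials):
    """Proportion of no-change trials incorrectly reported as changed."""
    catch = [t for t in trials if not t.changed]
    return mean(t.response_change for t in catch) if catch else float("nan")

# The reported pattern corresponds to
# hit_rate(trials, "negative") > hit_rate(trials, "neutral"),
# with false_alarm_rate(trials) available to correct for response bias.
```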
Furthermore, using an aversive conditioning paradigm, we found that affective learning
not only modulates the affective significance of the conditioned stimulus (CS) but also can alter loudness
perception (Asutay and Västfjäll 2012). In this experiment, participants went through a
conditioning session, in which a CS (CS+; bandpass noise) was consistently paired with
an unconditioned stimulus (US; a vibratory shock delivered to the chair participants sat on). They were also exposed
to a control stimulus (CS−) that was not associated with the US. Sounds were bandpass
noise at different frequencies, and CS+ and CS− assignments were counterbalanced
among participants. After conditioning, the CS+ was rated as more fear-inducing and
negative and perceived as louder compared to CS−. Another recent study also found that
negative emotion can influence loudness perception (Siegel and Stefanucci 2011). They
used a mood-induction technique to induce negative affect in half of the participants
and neutral affect in the rest. Participants then listened to auditory stimuli and performed
loudness judgments. People in the negative affect group perceived the auditory stimuli as
being louder compared to those in the neutral affect group.
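A rough sketch of the counterbalanced design and the post-conditioning comparison in the aversive conditioning experiment described above might look as follows (purely illustrative; the stimulus labels, rating format, and function names are hypothetical and not taken from the published study):

```python
# Illustrative sketch of counterbalancing the CS+/CS- assignment across
# participants and summarizing the post-conditioning loudness comparison.
from statistics import mean

NOISE_BANDS = ("bandpass_noise_A", "bandpass_noise_B")  # placeholder labels

def assign_cs(participant_id):
    """Alternate which noise band is paired with the vibratory shock (US)."""
    if participant_id % 2 == 0:
        return {"CS+": NOISE_BANDS[0], "CS-": NOISE_BANDS[1]}
    return {"CS+": NOISE_BANDS[1], "CS-": NOISE_BANDS[0]}

def mean_loudness_difference(ratings):
    """Mean within-participant loudness difference (CS+ minus CS-).

    `ratings` is a list of dicts such as {"CS+": 7.2, "CS-": 6.1}; a positive
    value mirrors the reported result that CS+ was judged louder than CS-.
    """
    return mean(r["CS+"] - r["CS-"] for r in ratings)

# Example: assign_cs(3) -> {"CS+": "bandpass_noise_B", "CS-": "bandpass_noise_A"}
```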
In our laboratory, we have also investigated the effect of emotional salience of sounds
on auditory spatial attention (Asutay and Västfjäll 2015a, 2015b). Using a covert spatial
orienting paradigm, we found that negative sounds provide exogenous cues to orient
auditory spatial attention to a particular region of space where they originate (Asutay
and Västfjäll 2015b). The auditory stimuli in the experiment were environmental sounds
with inherent meaning.
Neural models explaining the influence of emotion on other processes place the
amygdala in a central position (e.g., LeDoux 2012; Phelps 2006; Pourtois et al. 2013). The
amygdala seems to receive information regarding the affective salience of external stim-
uli early in the processing and, through its fast neural routes to sensory cortical regions,
it can modulate perceptual and attentional processing; that is, it can induce transient
changes in attentional thresholds in the presence of emotional stimuli (Phelps 2006;
Phelps and LeDoux 2005). Emotional information can also modulate neural activity in
regions associated with attentional control, which in turn shape the impact of selective
attention on sensory processing (Domínguez-Borràs and Vuilleumier 2013). Apart from
this, the amygdala has direct projections to neuromodulatory systems (e.g., cholinergic,
adrenergic, dopaminergic) that are capable of modulating perceptual and attentional
processes. Cholinergic nuclei located in the basal forebrain receive input from the amyg-
dala, and they can release acetylcholine to widespread cortical areas. Activation of the
cholinergic system can facilitate neural excitability in sensory areas and is argued to be
central in learning-induced changes in the auditory cortex (Weinberger 2010). The cen-
tral amygdala also projects to the locus coeruleus (LC) in the brain stem, which is a part
of the noradrenergic system. The LC sends noradrenaline inputs to widespread cortical
areas to regulate arousal and autonomic functions. Activation of the noradrenergic sys-
tem can facilitate sensory processing, enhance cognitive flexibility, and promote vigilant
attentional shifting in the presence of significant sensory stimuli (Corbetta et al. 2008;
Sara and Bouret 2012). In general, the presence of emotionally significant stimuli can
activate the neuromodulatory systems that, in turn, can regulate the activity in the brain
regions that are involved in active information processing. Although most evidence on
the effect of these neuromodulatory systems relies on animal models, a few human studies
exist (Hermans et al. 2011; Thiel et al. 2002; Weis et al. 2012). In conclusion, it seems that
processing of emotionally significant stimuli is enhanced via several gain control mecha-
nisms (direct influence on sensory processing and attentional thresholds, and indirect
influence of modulatory systems) that are mediated by a large brain network centered
around the amygdala.
Concluding Remarks
In this chapter, we have focused on the relationship between sound and emotion: how
acoustic stimuli induce affective reactions in listeners and how the affective significance
of sounds influences the way we perceive and attend to them. We reviewed human
behavioral and neuroimaging studies concerning learning-induced emotional reactions,
vocal emotional signals, and music. Our main aims here were to illustrate the close relation-
ship between affective and auditory processes and to state that affective experience is an
integral part of auditory perception.
First, viewing the auditory system as an adaptive network specialized in processing
acoustic stimuli indicates that the affective and motivational significance of auditory
stimuli influences both the way they are processed and our reactions to them. It also
makes intuitive sense when we consider the function of the auditory system, which scans
our surroundings, detects potentially relevant targets, and signals for attention shifts to
salient objects when necessary. In that respect, it functions as an adaptive warning
system. Hence, the emotional, motivational, and attentional states of the organism are taken
into account while complex auditory input is processed and analyzed.
Next, conditioning studies have shown that as the emotional significance of an
auditory stimulus changes through learning, the representation of that particular sound
in the auditory system undergoes specific plastic changes. Thus, the emotional significance
of sounds can lead to biases in auditory processing and adapt the system to be more
attentive and tuned to significant events. This conclusion is also very much in line with
the adaptive capacity of the auditory system. In addition, empirical evidence and neural
models show how emotional significance of auditory stimuli can effectively rewire the
neural structure so that affective stimuli receive priority during sensory processing.
Taken together, the findings reviewed here point to the close relationship between affec-
tive and auditory processes.
Furthermore, music can influence both the autonomic nervous system and immune
system activity and activate serotonergic and dopaminergic modulatory systems. Music
can also induce and regulate emotions through various psychological mechanisms,
most of which are common to other emotion-inducing stimuli. Empirical evidence
suggests that musical stimuli can consistently activate the main neural structures of
affective processing. Emotional signals in music activate brain structures within the
networks related to emotional, reward, and memory processes, as well as the structures
related to autonomic and endocrine system activity.
In addition, mental representations that are evoked by auditory stimuli might also
influence emotional reactions elicited during sound perception. We argue that these
mental representations depend on the situational context, the listener, and the stimulus
itself. Evoked mental representations that are related to the sound source and its mean-
ing induce emotional reactions when we listen to environmental sounds. On the other
hand, visual imagery evoked by the acoustic features of sound might have a completely
different nature. For instance, musical stimuli can induce visual imagery and episodic
memory, both of which have an impact on emotional experience while listening to
music. The former seems to be influenced by the structure of music, while the latter is
the retrieval, cued by the music, of a memory with emotional significance.
In conclusion, we argue that auditory perception is central to most interactions we
have with our surroundings. Sounds such as vocal signals and music have great potential
to communicate biologically significant emotional information, which modulates
both sensory processing in the brain and behavioral outcomes. Finally, considering
the high adaptational capacity of the auditory system, we claim that emotional experience
is integral to sound perception.
References
Ahveninen, J., M. Hämäläinen, I. P. Jääskeläinen, S. P. Ahlfors, S. Huang, F. H. Lin, et al. 2011.
Attention-Driven Auditory Cortex Short-Term Plasticity Helps Segregate Relevant Sounds
from Noise. Proceedings of the National Academy of Sciences 108: 4182–4187.
Ahveninen, J., N. Kopco, and I. P. Jääskeläinen. 2014. Psychophysics and Neuronal Basis of
Sound Localization in Humans. Hearing Research 307: 86–97.
Altenmüller, E., R. Kopiez, and O. Grewe. 2013. A Contribution to the Evolutionary Basis of
Music: Lessons from the Chill Response. In Evolution of Emotional Communication, edited by
E. Altenmüller, S. Schmidt, and E. Zimmermann, 313–335. Oxford: Oxford University Press.
Altenmüller, E., S. Schmidt, and E. Zimmermann. 2013. Evolution of Emotional Communication.
Oxford: Oxford University Press.
Armony, J. L., and J. LeDoux. 2010. Emotional Responses to Auditory Stimuli. In The Oxford
Handbook of Auditory Science: The Auditory Brain, Vol. 2, edited by A. Rees and A. R. Palmer,
479–505. New York: Oxford University Press.
Arnott, S. R., and C. Alain. 2011. The Auditory Dorsal Pathway: Orienting Vision. Neuroscience
and Biobehavioral Reviews 35: 2162–2173.
Asutay, E. 2014. Emotional Influences on Auditory Perception and Attention. Doctoral disser-
tation. Chalmers University of Technology, Sweden.
Asutay, E., and D. Västfjäll. 2012. Perception of Loudness is Influenced by Emotion. PLoS One
7: e38660.
Asutay, E., and D. Västfjäll. 2014. Emotional Bias in Change-Deafness in Multisource Auditory
Environments. Journal of Experimental Psychology: General 143: 27–32.
Asutay, E., and D. Västfjäll. 2015a. Attentional and Emotional Prioritization of Sounds
Occurring Outside the Visual Field. Emotion 15: 281–286.
Asutay, E., and D. Västfjäll. 2015b. Negative Emotion Provides Cues for Orienting Auditory
Spatial Attention. Frontiers in Psychology 6: 618.
Asutay, E., D. Västfjäll., A. Tajadura-Jiménez, A. Genell, P. Bergman, and M. Kleiner. 2012.
Emoacoustics: A Study of the Psychoacoustical and Psychological Dimensions of Emotional
Sound Design. Journal of the Audio Engineering Society 60: 21–28.
Bachorowski, J. A., and M. J. Owren. 2008. Vocal Expressions of Emotion. In Handbook of
Emotions, 3rd ed., edited by M. Lewis, J. M. Havilland-Jones, and L. F. Barrett, 211–234.
New York: Guilford Press.
Bajo, V. M., F. R. Nodal, D. R. Moore, and A. J. King. 2010. The Descending Corticocollicular
Pathway Mediates Learning-Induced Auditory Plasticity. Nature Neuroscience 13: 253–260.
Ball, T., B. Rahm, S. Eickhoff, A. Schulze-Bonhage, O. Speck, and I. Mutschler. 2007. Response
Properties of Human Amygdala Subregions: Evidence Based on Functional MRI Combined
with Probabilistic Anatomical Maps. PLoS One 3: 307.
Barrett, L. F. 2006. Solving the Emotion Paradox: Categorization and the Experience of
Emotion. Personality and Social Psychology Review 10: 20–46.
Belin, P., R. J. Zatorre, P. Lafaille, P. Ahad, and B. Pike. 2000. Voice-Selective Areas in Human
Auditory Cortex. Nature 403: 309–312.
Blauert, J. 1997. Spatial Hearing. Rev. ed. Cambridge, MA: MIT Press.
Blood, A. J., and R. Zatorre. 2001. Intensely Pleasurable Responses to Music Correlate with
Activity in Brain Regions Implicated in Reward and Emotion. Proceedings of the National
Academy of Sciences 98: 11818–11823.
Bradley, M. M., and P. J. Lang. 2000. Affective Reactions to Acoustic Stimuli. Psychophysiology
49: 204–215.
Bregman, A. 1999. Auditory Scene Analysis: The Perceptual Organization of Sound. 2nd ed.
London: MIT Press.
Brück, C., B. Kreifelts, T. Ethofer, and D. Wildgruber. 2013. Emotional Voices. In The Cambridge
Handbook of Human Affective Neuroscience, edited by J. Armony and P. Vuilleumier, 265–285.
New York: Cambridge University Press.
Clarke, E. F. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. New York: Oxford University Press.
Conrad, N. J., M. Malina, and S. C. Münzel. 2009. New Flutes Document the Earliest Musical
Tradition in Southwestern Germany. Nature 460: 737–740.
Corbetta, M., G. Patel, and G. L. Shulman. 2008. The Reorienting System of the Human
Brain: From Environment to Theory of Mind. Neuron 58: 306–324.
Dahmen, J. C., P. Keating, F. R. Nodal, A. L. Schulz, and A. J. King. 2010. Adaptation to
Stimulus Statistics in the Perception and Neural Representation of Auditory Space. Neuron
66: 937–948.
Darwin, C. (1879) 2004. The Descent of Man. London: Penguin.
De Houwer, J., S. Thomas, and F. Baeyens. 2001. Associative Learning of Likes and Dislikes:
A Review of 25 Years of Research on Human Evaluative Conditioning. Psychological Bulletin
127: 853–869.
Delgado, M. R., A. Olsson, and E. A. Phelps. 2006. Extending Animal Models of Fear
Conditioning to Humans. Biological Psychology 73: 39–48.
Domínguez-Borràs, J., and P. Vuilleumier. 2013. Affective Biases in Attention and Perception.
In The Cambridge Handbook of Human Affective Neuroscience, edited by J. Armony and
P. Vuilleumier, 331–356. New York: Cambridge University Press.
Domjan, M. 2005. Pavlovian Conditioning: A Functional Perspective. Annual Review of
Psychology 56: 179–206.
Eldar, E., O. Ganor, R. Admon, A. Bleich, and T. Hendler. 2007. Feeling the World: Limbic
Response to Music Depends on Related Content. Cerebral Cortex 17: 2828–2840.
Fastl, H., and E. Zwicker. 2007. Psychoacoustics: Facts and Models. Berlin: Springer.
Fecteau, S., P. Belin, Y. Joanette, and J. L. Armony. 2007. Amygdala Responses to Nonlinguistic
Emotional Vocalizations. Neuroimage 36: 480–487.
Frijda, N. 2008. The Psychologists’ Point of View. In Handbook of Emotions, 3rd ed., edited by
M. Lewis, J. M. Havilland-Jones, and L. F. Barrett, 68–87. New York: Guilford Press.
Fritz, J. B., M. Elhilali, S. V. David, and S. A. Shamma. 2007. Auditory Attention: Focusing the
Searchlight on Sound. Current Opinion in Neurobiology 17: 1–19.
Fritz, T., and S. Koelsch. 2005. Initial Response to Pleasant and Unpleasant Music: An fMRI
Study (Poster). NeuroImage 26 (Suppl.), 271.
Kotz, S. A., A. S. Hasting, and S. Paulmann. 2013. On the Orbito-Striatal Interface in Acoustic
Emotional Processing. In Evolution of Emotional Communication, edited by E. Altenmüller,
S. Schmidt, and E. Zimmermann, 229–240. Oxford: Oxford University Press.
LaBelle, B. 2007. Background Noise: Perspectives on Sound Art. New York: Continuum.
Lakoff, G., and M. Johnson. 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Lang, P. J., and M. M. Bradley. 2010. Emotion and the Motivational Brain. Biological Psychology
84: 437–450.
LeDoux, J. 2012. Rethinking the Emotional Brain. Neuron 73: 653–676.
Lee, C. C., and J. C. Middlebrooks. 2011. Auditory Cortex Spatial Sensitivity Sharpens during
Task Performance. Nature Neuroscience 14: 108–114.
Lehmann, A., and M. Schönweisner. 2014. Selective Attention Modulates Human Auditory
Brainstem Responses: Relative Contributions of Frequency and Spatial Cues. PLoS
One 9: e85442.
Maier, J. X., and A. A. Ghazanfar. 2007. Looming Biases in Monkey Auditory Cortex. Journal
of Neuroscience 27: 4093–4100.
Malmierca, M. S. 2005. The Inferior Colliculus: A Center for Convergence of Ascending and
Descending Auditory Information. Neuroembryology and Ageing 3: 215–229.
Marsh, R. A., Z. M. Fuzessery, C. D. Grose, and J. J. Wenstrup. 2002. Projection to the
Inferior Colliculus from the Basal Nucleus of the Amygdala. Journal of Neuroscience
22: 10449–10460.
Menon, V., and D. J. Levitin. 2005. The Rewards of Music Listening: Response and Physiological
Connectivity of the Mesolimbic System. NeuroImage 28: 175–184.
Meyer, L. B. 1956. Emotion and Meaning in Music. Chicago: University of Chicago Press.
Moore, B. C. J. 2012. An Introduction to the Psychology of Hearing. 6th ed. London:
Academic Press.
Moore, B. C. J., and H. E. Gockel. 2012. Properties of Auditory Stream Formation. Philosophical
Transactions of the Royal Society B 367: 919–931.
Moors, A. 2010. Theories of Emotion Causation: A Review. In Cognition and Emotion: Review
of Current Research and Theories, edited by J. de Houwer and D. Hermans, 1–37. New York:
Psychology Press.
Neuhoff, J. G. 2004. Ecological Psychoacoustics. Boston, MA: Elsevier Academic Press.
Ohl, F. W., and H. Scheich. 2005. Learning-Induced Plasticity in Animal and Human Auditory
Cortex. Current Opinion in Neurobiology 15: 470–477.
Olsson, A., and E. A. Phelps. 2004. Learned Fear of “Unseen” Faces after Pavlovian,
Observational, and Instructed Fear. Psychological Science 15: 822–828.
Owren, M. J., and D. Rendall. 2001. Sound on the Rebound: Bringing Form and Function
Back to the Forefront in Understanding Nonhuman Primate Vocal Signaling. Evolutionary
Anthropology 10: 58–71.
Owren, M. J., M. Phillip, E. Vanman, N. Trivedi, A. Schulman, and J. Bachorowski. 2013.
Understanding Spontaneous Human Laughter: The Role of Voicing in Inducing Positive
Emotion. In Evolution of Emotional Communication, edited by E. Altenmüller, S. Schmidt,
and E. Zimmermann, 175–190. Oxford: Oxford University Press.
Patel, A. 2010. Music, Biological Evolution, and The Brain. In Emerging Disciplines, edited by
M. Bailar, 91–144. Houston, TX: Houston University Press.
Petkov, C. I., X. Kang, K. Alho, O. Bertrand, E. W. Yund, and D. L. Woods. 2004. Attentional
Modulation of Human Auditory Cortex. Nature Neuroscience 7: 658–663.
Phelps, E. A. 2006. Emotion and Cognition: Insights from Studies of the Human Amygdala.
Annual Review of Psychology 57: 27–53.
Phelps, E. A., and J. LeDoux. 2005. Contributions of the Amygdala to Emotion Processing:
From Animal Models to Human Behavior. Neuron 48: 175–187.
Pourtois, G., A. Schettino, and P. Vuilleumier. 2013. Brain Mechanisms for Emotional
Influences on Perception and Attention: What Is Magic and What Is Not. Biological
Psychology 92: 492–512.
Read, H. L., J. A. Winer, and C. E. Schreiner. 2002. Functional Architecture of Auditory
Cortex. Current Opinion in Neurobiology 12: 433–440.
Rees, A., and A. R. Palmer. 2010. The Oxford Handbook of Auditory Science: The Auditory
Brain, Vol. 2. New York: Oxford University Press.
Rescorla, R. A. 1998. Pavlovian Conditioning: It’s Not What You Think It Is. American
Psychologist 43: 151–160.
Russell, J. A. 2009. Emotion, Core Affect, and Psychological Construction. Cognition and
Emotion 23: 1259–1283.
Russell, J. A., J. A. Bachorowski, and J. M. Fernandez-Dols. 2003. Facial and Vocal Expressions
of Emotion. Annual Review of Psychology 54: 329–349.
Salimpoor, V., M. Benovoy, K. Larcher, A. Dagher, and R. Zatorre. 2011. Anatomically
Distinct Dopamine Release during Anticipation and Experience of Peak Emotion to Music.
Nature Neuroscience 14: 257–262.
Salminen, N. H., J. Aho, and M. Sams. 2013. Visual Task Enhances Spatial Selectivity in the
Human Auditory Cortex. Frontiers in Neuroscience 7: 44.
Sander, D., J. Grafman, and T. Zalla. 2003. The Human Amygdala: An Evolved System for
Relevance Detection. Reviews in the Neurosciences 14: 303–316.
Sander, K., A. Brechmann, and H. Scheich. 2003. Audition of Laughing and Crying Leads to
Right Amygdala Activation in a Low-Noise fMRI Setting. Brain Research Protocols 11: 81–91.
Sander, K., and H. Scheich. 2005. Left Auditory Cortex and Amygdala, but Right Insula
Dominance for Human Laughing and Crying. Journal of Cognitive Neuroscience 17: 1519–1531.
Sara, S. J., and S. Bouret. 2012. Orienting and Reorienting: The Locus Coeruleus Mediates
Cognition through Arousal. Neuron 76: 130–141.
Scherer, K. R. 2003. Why Music Does Not Produce Basic Emotions: A Plea for a New Approach
to Measuring Emotional Effects of Music. In Proceedings of the Stockholm Music Acoustics
Conference 2003, edited by R. Bresin, 25–28. Stockholm, Sweden: Royal Institute of Technology.
Scherer, K. R. 2009. Emotions and Emergent Processes: They Require a Dynamic Computational
Architecture. Philosophical Transactions of the Royal Society B 364: 3459–3474.
Schirmer, A., and S. A. Kotz. 2006. Beyond the Right Hemisphere: Brain Mechanisms
Mediating Vocal Emotional Processing. Trends in Cognitive Science 10: 24–30.
Seifritz, E., J. G. Neuhoff, D. Bilecen, K. Scheffler, H. Mustovic, H. Schächinger, et al. 2002.
Neural Processing of Auditory Looming in the Human Brain. Current Biology 12: 2147–2151.
Sescousse, G., X. Caldú, B. Segura, and J. C. Dreher. 2013. Processing Primary and Secondary
Rewards: A Quantitative Meta-Analysis and Review of Human Functional Neuroimaging
Studies. Neuroscience and Biobehavioral Reviews 37: 681–696.
Shinn-Cunningham, B. G. 2008. Object-Based Auditory and Visual Attention. Trends in
Cognitive Sciences 12: 182–186.
Siegel, E. H., and J. K. Stefanucci. 2011. A Little Bit Louder Now: Negative Affect Increases
Perceived Loudness. Emotion 11: 1006–1011.
Snowdon, C. T., and D. Teie. 2013. Emotional Communication in Monkeys: Music to Their
Ears? In Evolution of Emotional Communication, edited by E. Altenmüller, S. Schmidt, and
E. Zimmermann, 133–151. Oxford: Oxford University Press.
Sörqvist, P., S. Stenfelt, and J. Rönnberg. 2012. Working Memory Capacity and Visual-Verbal
Cognitive Load Modulate Auditory-Sensory Gating in the Brainstem: Toward a Unified
View of Attention. Journal of Cognitive Neuroscience 24: 2147–2154.
Stecker, G. C., I. A. Harrington, and J. C. Middlebrooks. 2005. Location Coding by Opponent
Neural Populations in the Auditory Cortex. PLoS Biology 3: 78.
Tajadura-Jiménez, A. 2008. Embodied Psychoacoustics: Spatial and Multisensory Determinants
of Auditory-Induced Emotion. Doctoral dissertation. Chalmers University of Technology,
Sweden.
Tajadura-Jiménez, A., P. Larsson, A. Väljamäe, D. Västfjäll, and M. Kleiner. 2010b. When
Room Size Matters: Acoustic Influences on Emotional Responses to Sounds. Emotion
10: 416–422.
Tajadura-Jiménez, A., A. Väljamäe, E. Asutay, and D. Västfjäll. 2010a. Embodied Auditory
Perception: The Emotional Impact of Approaching and Receding Sounds. Emotion
10: 216–229.
Tajadura-Jiménez, A., and D. Västfjäll. 2008. Auditory-Induced Emotion: A Neglected
Channel for Communication in Human-Computer Interaction. In Affect and Emotion in
Human-Computer Interaction: From Theory to Applications, edited by C. Peter and R. Beale,
63–74. Berlin/Heidelberg: Springer-Verlag.
Thiel, C. M., K. J. Friston, and R. J. Dolan. 2002. Cholinergic Modulation of Experience-
Dependent Plasticity in Human Auditory Cortex. Neuron 35: 567–574.
Västfjäll, D. 2012. Emotional Reactions to Sounds without Meaning. Psychology 3: 606–609.
Vuilleumier, P. 2005. How Brains Beware: Neural Mechanisms of Emotional Attention. Trends
in Cognitive Sciences 9: 585–594.
Vuilleumier, P., and J. Driver. 2007. Modulation of Visual Processing by Attention and
Emotion: Windows on Causal Interactions between Human Brain Regions. Philosophical
Transactions of the Royal Society B 362: 837–855.
Wang, X., and D. Bendor. 2010. Pitch. In The Oxford Handbook of Auditory Science: The
Auditory Brain, Vol. 2, edited by A. Rees and A. R. Palmer, 149–172. New York: Oxford
University Press.
Weinberger, N. M. 2004. Specific Long-Term Memory Traces in Primary Auditory Cortex.
Nature Reviews: Neuroscience 5: 279–290.
Weinberger, N. M. 2010. The Cognitive Auditory Cortex. In The Oxford Handbook of Auditory
Science: The Auditory Brain, Vol. 2, edited by A. Rees and A. R. Palmer, 441–478. New York:
Oxford University Press.
Weis, T., S. Puschmann, A. Brechmann, and C. M. Thiel. 2012. Effects of L-dopa during
Auditory Instrumental Learning in Humans. PLoS One 7: e52504.
Wiethoff, S., D. Wildgruber, W. Grodd, and T. Ethofer. 2009. Response and Habituation of the
Amygdala during Processing of Emotional Prosody. Neuroreport 20: 1356–1360.
Winer, J. A., and C. C. Lee. 2007. The Distributed Auditory Cortex. Hearing Research 229: 3–13.
Woods, D. L., T. J. Herron, A. D. Cate, E. W. Yund, G. C. Stecker, T. Rinne, et al. 2010. Functional
Properties of Human Auditory Cortical Fields. Frontiers in Systems Neuroscience 4: 155.
Woods, D. L., G. C. Stecker, T. Rinne, T. J. Herron, A. D. Cate, E. W. Yund, et al. 2009.
Functional Maps of Human Auditory Cortex: Effects of Acoustic Features and Attention.
PLoS One 4: e5183.
Xiao, Z., and N. Suga. 2005. Asymmetry in Corticofugal Modulation of Frequency-Tuning
in Moustached Bat Auditory System. Proceedings of the National Academy of Sciences
102: 19162–19167.
Voluntary Auditory Imagery and Music Pedagogy
Andrea R. Halpern and Katie Overy
Introduction
Auditory imagery is a common everyday experience. People are able to imagine the
sound of waves crashing on a beach, the voice of a famous movie actor, or the melody of
a familiar song or TV theme tune. Although people vary in the extent to which they
report these as vivid experiences, on average they rate vividness of imagined sounds at
the upper end of rating scales such as the Bucknell Auditory Imagery Scale (Halpern 2015),
averaging about 5 on a 7-point scale, where 7 means “as vivid as actually hearing the
sound.” Imagined music can also be involuntary (Beaman and Williams 2010; Hyman
et al. 2013), or even hallucinatory (Griffiths 2000; Weinel, this volume, chapter 15) but the
focus of our discussion is on the willful calling to mind of music. Our argument here is
that, in general, auditory imagery is not just something people do when mind-wandering
or passing the time; it can have definite positive consequences in mood regulation,
self-entertainment, and mental rehearsal. More particularly, musicians, composers, and
music educators understand that auditory imagery is a tool, and they regularly employ
auditory imagery in both pedagogical and professional capacities. We suggest that this
important skill could be used more widely than it is already; for example, to enable
musicians to employ more efficient memorization skills and to rehearse both physical
and expressive aspects of performance without risking excessive motor practice.
Using imagined music to accomplish something beneficial is reported among
non-musicians as well as musicians, of course. People report voluntarily bringing music
to mind to regulate their emotional state, and they judge the emotionality of familiar
imagined music similarly to judging the emotionality of heard music (Lucas et al. 2010).
Recorded music has been shown to assist athletic performance in a variety of situations,
including keeping a steady pace during swimming (Karageorghis et al. 2013), and
imagined music can have similar benefits. One compelling example was reported by the
marathon swimmer Diana Nyad during her record-setting swim across the Straits of Florida from Cuba,
covering 110 miles in 53 hours:
Diana Nyad uses singing to help pass the time and the monotony and sensory depri-
vation inevitable in marathon swimming. To help, she sings silently from a mental
playlist of about 65 songs [including] Janis Joplin’s chart-topping version of Me and
Bobby McGee. “If I sing that 2,000 times in a row, the whole song, I will get through
five hours and 15 minutes,” Nyad said . . . . “It’s kind of stupid,” she added, “but it gets
me through.”1
For musicians, imagining a musical performance can be a useful rehearsal tool and even
a powerful experience, as expressed by violinist Romel Joseph, who was trapped under
the rubble of his music conservatory for eighteen hours after the Haiti earthquake of 2010.
This quote captures both the emotional and performance aspects of imagining music in
a most deliberate way:
Psychology Research
on Auditory Imagery
If auditory imagery had an arbitrary, or even illusory, link to perceiving and performing
real music, then advocating for the increased use of imagery in musical rehearsal and
pedagogy might not make a compelling argument. However, research over the years has
suggested that both musicians and nonmusicians (i.e., those who haven’t studied musical
performance to a high level) can mentally represent a surprisingly wide range of audi-
tory characteristics of actual musical sound (see Hubbard 2010 for a comprehensive
review), in many cases using imagery very consciously and deliberately. For instance,
most individuals, including nonmusicians, can call to mind the melody of a familiar
song without any difficulty. For songs with no canonical recorded versions, people are
remarkably consistent in reproducing or choosing a similar pitch to the one they pro-
duced or chose on a prior occasion for the same song (Halpern 1989). In addition, they
are also fairly accurate in reproducing the opening pitches of well-known recordings of
music from their mental playlist (Levitin and Cooke 1996; Frieler et al. 2013) and can
usually recognize the correct pitch within two semitones (Schellenberg and Trehub 2003).
Auditory imagery also represents some of the temporal characteristics of sounded
music remarkably accurately. If listeners are asked to carry out memory tasks comparing pitches at
two nonadjacent places in a familiar melody, their reaction times increase proportionally to
the distance between the notes in beats of the actual tune (Halpern 1988a), and if asked to
mentally complete a phrase of a familiar tune after a sounded cue of the opening notes,
reaction times similarly increase proportionally for longer phrases (Halpern and
Zatorre 1999). A recent study of involuntary musical imagery asked people to tap the
tempo of involuntary auditory images that occurred over a five-day period; tempos
were recorded via an accelerometer. Results showed that 77 percent of 115 reports of
episodes involving recorded music were within 15 percent of the original recorded
tempo (Jakubowski et al. 2015).
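As a purely illustrative reading of that accuracy criterion (the beats-per-minute values below are invented), the check is a simple relative-error comparison:

```python
# Illustrative check of the tempo criterion described above: a tapped tempo
# counts as accurate if it lies within 15 percent of the original recording's tempo.
def within_tolerance(tapped_bpm, original_bpm, tolerance=0.15):
    return abs(tapped_bpm - original_bpm) / original_bpm <= tolerance

print(within_tolerance(tapped_bpm=100, original_bpm=112))  # True (about 11% slower)
print(within_tolerance(tapped_bpm=80, original_bpm=112))   # False (about 29% slower)
```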
Even a multidimensional construct such as timbre is processed similarly in hearing
and imagery. Halpern and Zatorre (2004) asked people to make similarity judgments
between pairs of sounded and imagined musical instruments while undergoing fMRI
scanning. Similarity ratings in both conditions were highly correlated. Additionally, both
types of judgments involved activation of the secondary auditory cortex (judgments on
sounded but not imagined music additionally activated the primary auditory cortex).
Thus, we have some basis to conclude that mentally simulating or rehearsing music
might involve similar neural processing and thus confer some of the same benefits as
actual hearing or even production—given that production includes not only motor but
also auditory skills.
On the other hand, apart from the case of auditory hallucinations (Griffiths 2000;
Weinel, this volume, chapter 15) most people do not actually confuse imagining with
hearing, and thus we should not be surprised if there were behavioral and neural sub-
strate differences between the two. One obvious difference between the two types of
auditory experience is that auditory imagery tasks are on average more difficult than
matched perceptual tasks. For example, Zatorre and Halpern (1993) presented patients
who had undergone surgery removing part of their temporal lobes (mean age about
thirty years old) and matched controls with the text of the first line of a familiar tune,
such as Jingle Bells. Two lyrics were highlighted, as in “Dashing through the SNOW, in a
one-horse open SLEIGH.” The task was to judge whether the second highlighted
lyric was higher or lower in pitch than the first such lyric (the reader is invited to
try that now). In one condition, participants heard a recording of someone singing the
tune; in the other condition, they had to use mental imagery only. The right-temporal
lobectomy patients had lower performance in both conditions compared to the other
two groups, implicating the role of the right temporal lobe in pitch perception and imagery,
and accuracy rates for all participants were about 12–15 percent lower in the imag-
ined than heard condition. In a subsequent study with healthy young adults in which
only the two to-be-compared lyrics were presented, similar performance drops from
heard to imagined conditions were found (Zatorre and Halpern 1996).
Imagery tasks are likely to be more difficult because they involve considerable working
memory resources; indeed, Baddeley and Andrade (2000) found that working memory
(WM) performance scores correlated with self-reported vividness of both auditory and
visual imagery. WM span is also positively correlated with measures of pitch and temporal
imagery ability (Colley et al. 2017). Brain imaging studies also point to the involvement of
executive function in auditory imagery. Herholz and colleagues (2012) asked people with
a range of musical experience to listen to or imagine familiar songs, while simultaneously
viewing the lyrics in a karaoke-type video presentation. Compared to a baseline, both the
imagined and heard conditions increased cerebral blood flow (measured via fMRI) in perceptual
areas such as the superior temporal gyrus (STG, the locus of the secondary auditory
cortex), but imagining tunes also uniquely activated several areas associated with
higher-order planning and other executive functions, such as the supplementary motor
area (SMA), intraparietal cortex (IPS), inferior frontal cortex (IFC), and right dorso-
lateral prefrontal cortex (DLPFC) (see Figure 19.1). This additional neural activity is
interpreted as reflecting extra cognitive effort and suggests that imagery tasks are more
difficult. However, such tasks may also benefit music learning in both the short and long
term precisely because of this increased level of cognitive engagement (known as a
“desirable difficulty” in the cognitive literature), potentially leading to better encoding
and later retention (Bjork et al. 2014).
As alluded to in our opening remarks, auditory imagery is not always intentional—it can
come unbidden in the form of so-called earworms, or what is sometimes called involuntary
musical imagery (INMI). Numerous researchers have now studied this phenomenon,
documenting the incidence and phenomenology of the experience, the relationship to
personality variables, and the characteristics of the triggers and the tunes themselves that
come to mind (for example, Bailes 2006; Halpern and Bartlett 2011; Hyman et al. 2013;
Müllensiefen et al. 2014; Williamson and Jilka 2013). However, in this chapter we focus on
voluntary auditory imagery precisely because it is under the control of the individual and
thus can be harnessed and modified as needed to accomplish musical goals.
Figure 19.1 Brain areas more active in listening than imagining a familiar tune (the major activity is labeled “STG”—orange on the companion website) and those more active in imagining than listening to a familiar tune (the major activity is labeled “IPS,” “SMA,” “DLPFC,” and “IFC”—blue on the companion website). (Reprinted with permission from Herholz et al. 2012).
It is perhaps unsurprising, then, that several approaches to musical training involve the
explicit training of auditory imagery skills. In fact, the skill of reading music notation
and “hearing” the appropriate auditory image “in one’s head” is so commonly used in
musical performance and training that it often is not even given a particular name; its
centrality is simply assumed (much like the skill of reading a book “in one’s head” is
commonly assumed). The use of auditory imagery in expert musical performance
preparation is sometimes called “mental rehearsal” and is often combined with motor and
visual imagery. Some of the first music psychologists also noted the importance of
auditory imagery, starting with the earliest published measures of musical ability
(Seashore 1919). Indeed, Carl Seashore regarded auditory imagery as the highest form of
musicianship: “[T]he most outstanding mark of the musical mind is a high capacity
for auditory imagery” (Seashore 1938, 161). It is thus useful at this point to consider the
ways in which voluntary auditory imagery has been employed in particular approaches
to music pedagogy. Although not all such approaches have either been documented or
investigated empirically, there is nevertheless a long tradition of such training, going
back decades and perhaps even centuries.
Two major classroom music education figures of the twentieth century, Zoltán
Kodály and Edward Gordon, made auditory imagery an explicit feature of their peda-
gogical approaches, referring to it as “inner hearing” or “audiation,” respectively. Perhaps
the most methodical, worked-out teaching method arose from the work of Kodály, a
Hungarian composer and professor at the Liszt Academy in the first half of the twentieth
century (Ittzés 2002). Observing that Austrian and Viennese music were held in higher
regard than Hungarian folk music during the period of the Austro-Hungarian Empire
and apparently unimpressed by both the general quality and the repertoire of urban
children’s singing in Hungary, Kodály developed an entirely new approach to classroom
music education. This new approach was based on what he considered to be the chil-
dren’s musical “mother-tongue,” that is, their musical vernacular, Hungarian folk songs,
which he collected and preserved in collaboration with Béla Bartók (Kodály 1960, 1974).
Essentially the idea, which was developed and put into practice by Kodály’s students
(Ádám 1944; Szőnyi 1974), is to begin classroom music lessons with songs already some-
what familiar to children, and through regular repetition and analysis of this familiar
repertoire, learn the fundamentals of musical knowledge such as scales, rhythm, musi-
cal notation, and sight-singing. The skills acquired can then give children direct access
to participating in and understanding the entire world of Western art music (and indeed
other music from around the globe). Integral to this process is the use of “inner hearing”
as a device to develop children’s musicianship and literacy skills. For example, a primary
school music activity might involve learning to miss out a few notes or words during
a song and to imagine them instead of singing them. To take Jingle Bells as an example
again, children might be asked to sing the whole song together, while leaving out
the words “bells” and “all the way” throughout the whole song (try it!). Not only does
this rehearse the skill of “inner hearing,” it can also be made into an enjoyable game,
and additionally, the musical structure appearing from the three repetitions of “Jingle”
(mi-mi, mi-mi, mi-so; see Figure 19.2) becomes prominent and can be “discovered”
by the children with the guidance of the teacher, leading to an understanding of
musical form.
Developing such skills to a more advanced level eventually allows older children to be
able to sight-sing one melody while imagining a countermelody (i.e., a simultaneous
melody), or imagine a familiar chord sequence (e.g., I–VI–IV–V–I) in various major
and minor keys, for example (see Figure 19.3).
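For readers who want to see such a progression spelled out, the short sketch below (purely illustrative and not part of the Kodály materials) builds the I–VI–IV–V–I triads from the C major and A natural minor scales; in common practice the minor-key V chord is usually made major by raising the leading tone, a detail the simplification below ignores:

```python
# Illustrative sketch: spelling out I-VI-IV-V-I as triads built on scale degrees.
# Uses the natural minor scale for simplicity (no raised leading tone).
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]
A_MINOR = ["A", "B", "C", "D", "E", "F", "G"]  # natural minor

def triad(scale, degree):
    """Stack two diatonic thirds above the given scale degree (1-based)."""
    root = degree - 1
    return [scale[(root + step) % 7] for step in (0, 2, 4)]

def progression(scale, degrees=(1, 6, 4, 5, 1)):
    return [triad(scale, d) for d in degrees]

print(progression(C_MAJOR))
# [['C', 'E', 'G'], ['A', 'C', 'E'], ['F', 'A', 'C'], ['G', 'B', 'D'], ['C', 'E', 'G']]
print(progression(A_MINOR))
# [['A', 'C', 'E'], ['F', 'A', 'C'], ['D', 'F', 'A'], ['E', 'G', 'B'], ['A', 'C', 'E']]
```

Seeing the same Roman-numeral sequence realized in two different keys makes concrete the relative, rather than absolute, orientation toward pitch that the approach cultivates.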
The focus on repeatedly singing and analyzing familiar songs is key to the Kodály
approach, and in the context of auditory imagery it is worth noting the strong emphasis
placed on regular practice and depth of understanding.

Figure 19.2 First line of the song Jingle Bells, where the word “Jingle” is sung aloud each time and the rest of the line is imagined.

Figure 19.3 Harmonic chord sequence of I, VI, IV, V, I, first shown in the key of C major and then in A minor, where I is the tonic chord and V is the dominant chord.

Kodály believed that the collecting of musical experiences is more important than studying music theoretically (Kodály 1974)
and placed special emphasis on the use of relative sol-fa (i.e., naming notes according to
their position in a musical scale, rather than by their absolute pitch) and two-part sing-
ing (Kodály 1962). Baddeley and Andrade (2000) suggest that the experience of vivid imagery
requires abundant sensory information to be available from long-term memory,
and Neisser (1976) has noted that imagery arises (at least in part) from schemata, based
on prior experience. Since a voluntary auditory image is self-generated, it reflects
considerable prior processing and should not be seen as an uninterpreted sensory copy
(Hubbard 2010)—mental models play an important part in musical imagery, much as
they do in music perception (Schaefer 2014). Imagery and memory are also considered
to be closely linked; mental imagery is an important component of working memory
rehearsal, for example (Baddeley and Logie 1992). Singing may also be of particular
value in the development of “inner hearing” skills because a vast amount of familiar
musical material can be brought to mind through songs, without requiring any instru-
mental expertise. It has even been shown that musicians subvocalize when performing a
notation-reading auditory imagery task (Brodsky et al. 2008), suggesting reliance on an
imagined sung version of a melody (although it must be noted that neuroimaging
studies to date have not revealed activation of the primary motor cortex during auditory
imagery; see Zatorre and Halpern 2005).
Another point of interest regarding the Kodály approach is that it specifically aims to
develop the ability to hear, or imagine, more than one melodic line at the same time, an
ability that has recently been shown to be particularly developed in musical conductors
(Wöllner and Halpern 2016). While the extent to which such a skill involves actual
divided attention, versus rapid switching between parts, is still debated (Alzahabi and
Becker 2013), it is nevertheless clear that this ability can be trained and developed to a
high level of skill. Indeed, at an advanced level, such as an undergraduate “harmony and
counterpoint” or “stylistic composition” exam, a music student might be asked to write a
fugue in the style of Bach and a song in the style of Schubert while sitting at a desk in an
exam hall, thus relying on auditory imagery of several melodic lines and/or harmonic
progressions, as well as expert musical knowledge, in order to complete the task.
A final aspect of the Kodály approach that is rarely discussed but important to note
is the fact that it involves group musical learning, almost always taking place in the
school or university classroom. The “inner hearing” activities thus involve what we
might describe as group auditory imagery, or “shared auditory imagery,” which can
bring a highly focused sense of shared attention when used effectively, as well as allowing
more generally for the potential benefits of group learning and social music-making
(Heyes 2013; Kirschner and Tomasello 2010; Overy and Molnar-Szakacs 2009;
Overy 2012). The idea of “shared auditory imagery” is not well documented and perhaps
warrants future research.
A second major music education figure of the twentieth century, Edward Gordon,
based in the United States, focused his own music education approach much more spe-
cifically on auditory imagery, or what he calls “audiation.” Gordon proposes that only by
understanding where a young child’s current audiation skills lie, can the child be taught
appropriately, and much of Gordon’s work focuses on the variability of this
skill in the general population and how to measure it appropriately (1987). Importantly,
Gordon extends the meaning of the word “audiation” from auditory imagery alone to
include the process of listening to music with some cognition of its structure, rather
than just sensory perception, arguing that “audiation” is part of intelligent music listen-
ing. The Gordon measures of music audiation (e.g., Gordon 1979, 1982) have become
some of the most commonly used measures of musical ability in children and are often
also used in psychology and brain imaging research (e.g., Ellis et al. 2012). Examples
of the kinds of tasks used are the melody and rhythm discrimination tests, in which
two melodies or rhythms are heard and the child or adult’s task is to determine
whether they are the same or different, a task commonly found in tests of musical ability
(e.g., Bentley 1966; Wing 1970; Overy et al. 2003, 2005). Gordon assumes that, in order
to perform this comparison task, a child must be able to hold the initial melody in mind
for a short period of time, that is, to “audiate” the short extract. This measure thus links
directly with the idea that working memory rehearsal requires mental imagery, as
outlined earlier (Baddeley and Logie 1992).
Voluntary auditory imagery is central to the Gordon concept of musical ability and is
regarded as an important aspect of musicianship and an effective learning tool in the
Kodály approach. On further analysis, there are also some interesting key elements in
common between the two approaches. For example, both approaches: (1) use physical
movement gestures in the teaching of “inner hearing” or “audiation;” (2) place strong
emphasis on what Gordon calls “notational audiation” and what Kodály calls “musical
literacy,” that is, the ability to read a musical score and hear the music in one’s head; and
(3) place a strong emphasis on the importance of teacher-training programs in these
skills. A detailed comparative analysis of the two approaches would no doubt generate
some clear focus points for future research in this area, and perhaps lead to a richer
understanding of how auditory imagery can be used, adapted, and developed in a range
of different musical and pedagogical contexts.
Conclusions
Auditory imagery is an ability that most people can access and control, with a fair
amount of precision and with fidelity to actual perceived sounds. Such imagery can be
used for entertainment and emotional self-regulation (such as imagining calming songs
if one is in a stressful situation). But we wish to emphasize another aspect of this experi-
ence: what seems at times to be an effortless ability in fact can require a fair amount of
cognitive resources, including working and long-term memory. For both musicians and
nonmusicians, the successful re-evoking of music often reflects the fact that the material
has been encountered multiple times and reflects a detailed knowledge of the piece
(particularly in Ben-Or’s approach). Thus, we could view voluntary auditory imagery in
music learning as a tool that takes some effort to use, but results in superior technical
and expressive skills, or a “desirable difficulty.” The fact that brain activation during
auditory imagery shows areas in common with auditory perception but also unique
activation of higher-order areas involved in memory and executive function, supports
this idea of imagery being used as a tool to enhance learning.
Auditory imagery does not occur in a vacuum of course. Musicians can also use
motor and kinesthetic imagery (Meister et al. 2004), as they imagine their hand and
body movements during playing, and visual imagery when imagining a score, a piano
keyboard, or a conductor’s gestures from a prior rehearsal. Much research has pointed
to the multimodality of imagery, both behaviorally and in terms of neural function
(McNorgan 2012). The translation of a visual score into an auditory experience requires
coordination across the two modalities, often via some representation of the motor sys-
tem. Some of the pedagogy techniques described here exploit this interaction and could
perhaps still be extended. For example, in the Kodály approach, preschool children are
often asked to keep a sequence of learned motor actions going throughout an “action
song” while imagining some melodic lines and singing the others. Similarly, Curwen
hand-signs (Curwen 1854) are used in the Kodály approach with young children to repre-
sent pitch for both imagined and sung musical activities, before moving on to written
notation and more advanced musical materials. This use of the motor system to repre-
sent sound while it is being imagined might be further exploited in more advanced
ways, yet to be conceived and developed.
In this chapter, we have discussed the role of auditory imagery in primary music
education as well as in professional musical situations. These methods explicitly recog-
nize that individuals with different levels of ability and training might use imagery in
different ways. For example, Ben-Or proposes using multimodal imagery or “total inner
memory” (Davidson-Kelly et al. 2015) to memorize and mentally rehearse a piece,
assuming that the piece is largely within the performer’s current technical expertise. The
Kodály approach extends from preschool to undergraduate levels of musicianship,
entailing the wide range of beginner to expert levels of repertoire and musical skill
therein. We assume that other approaches to music learning, such as imitative, oral
transmission styles found in non-Western cultures and nonnotated musical genres
such as pop and folk, may also use features of voluntary auditory imagery in a variety
of different ways.
We would also like to recognize here that adults older (even!) than undergraduates
often have an interest in beginning or furthering their musical experiences or training.
Some of the training methods referred to in this chapter could easily be adapted so that
the training was appropriate for middle-aged or senior adults, for example by using gener-
ation-appropriate songs and physical activities. For adults with more seriously limited
mobility, such adapted techniques might even be helpful for motor rehabilitation, for
example in cases of stroke survival or Parkinson’s disease, where musical imagery of a
steady beat, for example, has been proposed as potentially helpful in the rehabilitation of
motor skills (Schaefer 2014).
Of course, we should also emphasize that auditory imagery in music pedagogy is not
always focused on (eventual) proficiency in singing, playing a musical instrument or
reading music notation. We mentioned the value of social music-making earlier on, and
fully recognize that many adults who are not necessarily formally trained in music
nevertheless enjoy singing together in a group. However, many of these individuals are
not satisfied with their vocal abilities and wish they could improve. Some adults do not
sing at all, but wish they could develop the skills needed to enjoy both the artistic
and social benefits of music-making, such as choral singing (Clift and Hancox 2010).
Research in progress with colleagues at a UK music conservatory is currently investi-
gating a new way to teach adults who do not sing much, or well, to sing more confidently
and more accurately. Given the strong relationship between auditory imagery vividness
and pitch matching ability, and the importance of musical imagery skills in many peda-
gogical approaches, one aspect of the research will be to create an intervention to train
and improve auditory imagery skills. The study will include a version of the mental pitch
comparison task mentioned earlier (Zatorre and Halpern 1993), where difficulty is grad-
ually increased by probing pitches that are increasingly distant from each other within
the song. Delivered as an enjoyable app that can be accessed at home, the task will allow
the study to track (1) whether it is possible to measure improvement in auditory imagery
skills and (2) whether any such improvement correlates with improved pitch matching and vocal
quality. Such improved skills may also lead to new possibilities in the areas of improvising
and composing for these adult learners.
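To make the graded design concrete, the following is a minimal sketch of how such a pitch comparison task might be generated, assuming a toy melody encoded as (lyric, MIDI pitch) pairs; the tune, the function names, and the difficulty rule are illustrative assumptions rather than a description of the app under development.

```python
# Hypothetical sketch of a graded mental pitch comparison task, loosely
# modeled on the paradigm of Zatorre and Halpern (1993). The melody,
# names, and difficulty rule are illustrative assumptions only.

import random

# A familiar tune as (lyric syllable, MIDI pitch) pairs; the opening of
# "Twinkle, Twinkle, Little Star" is used purely as an example.
MELODY = [("Twin", 60), ("kle", 60), ("twin", 67), ("kle", 67),
          ("lit", 69), ("tle", 69), ("star", 67)]

def make_trial(distance):
    """Pick two syllables `distance` positions apart with different pitches."""
    while True:
        start = random.randrange(len(MELODY) - distance)
        a, b = MELODY[start], MELODY[start + distance]
        if a[1] != b[1]:
            return a, b

def run_block(distances=(1, 2, 3, 4, 5), trials_per_level=4):
    """Present trials of increasing lyric distance and record accuracy per level."""
    scores = {}
    for d in distances:
        correct = 0
        for _ in range(trials_per_level):
            a, b = make_trial(d)
            answer = input(f'Imagine the song. Which syllable is sung '
                           f'higher: "{a[0]}" or "{b[0]}"? ')
            target = a[0] if a[1] > b[1] else b[0]
            correct += int(answer.strip().lower() == target.lower())
        scores[d] = correct / trials_per_level
    return scores  # accuracy per distance level, e.g. {1: 1.0, 2: 0.75, ...}

if __name__ == "__main__":
    print(run_block())
```

Logging accuracy per distance level across sessions would give one simple way of charting any change in imagery skill over time.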
We close with the thought that auditory imagery tasks are both inexpensive (one only
has to imagine sounds!) and potentially fun, such as asking people to imagine and play with
famous tunes in their heads (we will leave you with an auditory image of the beautiful
song “Danny Boy” and ask you to enjoy and spend too long on the highest note).
Auditory imagery tasks can be developed for individuals with a wide range of musical
backgrounds and performance goals and can thus serve to enhance the traditional tools
of music educators.
Acknowledgments
Katie Overy thanks Eva Vendrei (in memoriam) for her inspirational teaching, Ittzés
Mihály (in memoriam) for his expert advice, and the International Kodály Society for their
2001 Sarolta Kodály scholarship to study at the Zoltán Kodály Pedagogical Institute of
Music, Hungary.
Notes
1. https://pingroof.com/diana-nyad-inspiring-more-than-one-generation/ Accessed September
20, 2017.
2. “Wife, School Lost in Quake, Violinist Vows to Rebuild,” from the NPR news program All
Things Considered (2010). http://www.npr.org/2010/01/23/122900781/wife-school-lost-in-
quake-violinist-vows-to-rebuild. Accessed September 20, 2017.
References
Adam, J. 1944. Módszeres Énektanítás a Relatív Szolmizáció Alapján (Systematic Singing
Teaching Based on the Tonic Sol-fa). Budapest: Editio Musica Budapest.
Alzahabi, R., and M. W. Becker. 2013. The Association between Media Multitasking, Task-
Switching, and Dual-Task Performance. Journal of Experimental Psychology: Human
Perception and Performance 39: 1485–1495.
Baddeley, A. D., and J. Andrade. 2000. Working Memory and the Vividness of Imagery.
Journal of Experimental Psychology: General 129: 126–145.
Baddeley, A. D., and R. H. Logie. 1992. Auditory Imagery and Working Memory. In Auditory
Imagery, edited by D. Reisberg, 179–197. Hillsdale, NJ: Erlbaum.
Bailes, F. A. 2006. The Use of Experience-Sampling Methods to Monitor Musical Imagery in
Everyday Life. Musicae Scientiae 10: 173–190.
Beaman, C. P., and T. I. Williams. 2010. Earworms (Stuck Song Syndrome): Towards a Natural
History of Intrusive Thoughts. British Journal of Psychology 101: 637–653.
Bentley, A. 1966. Measures of Musical Abilities, Manual. London: George A. Harap.
Bernardi, N. F., A. Schories, H.-C. Jabusch, B. Colombo, and E. Altenmueller. 2013. Mental
Practice in Music Memorisation: An Ecological-Empirical Study. Music Perception
30: 275–290.
Bjork, E. L., J. L. Little, and B. C. Storm. 2014. Multiple-Choice Testing as a Desirable Difficulty
in the Classroom. Journal of Applied Research in Memory and Cognition 3: 165–170.
Brodsky, W., Y. Kessler, B.-S. Rubinstein, J. Ginsborg, and A. Henik. 2008. The Mental Representation
of Music Notation: Notational Audiation. Journal of Experimental Psychology: Human
Perception and Performance 34: 427–445.
Clark, T., and A. Williamon. 2011. Evaluation of a Mental Skills Training Program for
Musicians. Journal of Applied Sport Psychology 23: 342–359.
Clift, S., and G. Hancox. 2010. The Significance of Choral Singing for Sustaining Psychological
Wellbeing: Findings from a Survey of Choristers in England, Australia and Germany. Music
Performance Research 3: 79–96.
Colley, I. D., P. E. Keller, and A. R. Halpern. 2017. Working Memory and Auditory Imagery
Predict Sensorimotor Synchronization with Expressively Timed Music. Quarterly Journal
of Experimental Psychology 71: 1781–1796. doi:10.1080/17470218.2017.1366531.
Curwen, J. 1854. An Account of the Tonic Sol-fa Method of Teaching to Sing. London: Tonic
Sol-fa Press.
Davidson-Kelly, K. 2014. Mental Imagery Rehearsal Strategies for Expert Pianists. PhD thesis,
University of Edinburgh, Scotland.
Davidson-Kelly, K., R. S. Schaefer, N. Moran, and K. Overy. 2015. “Total Inner Memory”:
Deliberate Uses of Multimodal Musical Imagery during Performance Preparation.
Psychomusicology: Music, Mind and Brain 25 (1): 83–92.
Ellis, R. J., A. C. Norton, K. Overy, E. Winner, D. C. Alsop, and G. Schlaug. 2012. Differentiating
Maturational and Training Influences on fMRI Activation during Music Processing.
NeuroImage 60 (3): 1902–1912.
Frieler, K., T. Fischinger, K. Schlemmer, K. Lothwesen, K. Jakubowski, and D. Müllensiefen.
2013. Absolute Memory for Pitch: A Comparative Replication of Levitin’s 1994 Study in
Six European Labs. Musicae Scientiae 17 (3): 334–349.
Gelding, R. W., W. F. Thompson, and B. W. Johnson. 2015. The Pitch Imagery Arrow Task:
Effects of Musical Training, Vividness, and Mental Control. PLoS One 10 (3): e0121809.
Gordon, E. E. 1979. Primary Measures of Music Audiation. Chicago: GIA Publications.
Gordon, E. E. 1982. Intermediate Measures of Music Audiation. Chicago: GIA Publications.
Gordon, E. E. 1987. The Nature, Description, Measurement and Evaluation of Musical Aptitude.
Chicago: GIA Publications.
Greenspon, E. B., P. Q. Pfordresher, and A. R. Halpern. 2017. Mental Transformations of
Melodies. Music Perception 34: 585–604.
Griffiths, T. D. 2000. Musical Hallucinosis in Acquired Deafness: Phenomenology and Brain
Substrate. Brain 123: 2065–2076.
Halpern, A. R. 1988a. Mental Scanning in Auditory Imagery for Songs. Journal of Experimental
Psychology: Learning, Memory, and Cognition 14: 434–443.
Halpern, A. R. 1989. Memory for the Absolute Pitch of Familiar Songs. Memory and Cognition
17: 572–581.
Halpern, A. R. 2015. Differences in Auditory Imagery Self Report Predict Behavioral and
Neural Outcomes. Psychomusicology: Music, Mind, and Brain 25: 37–47.
Halpern, A. R., and J. C. Bartlett. 2011. The Persistence of Musical Memories: A Descriptive
Study of Earworms. Music Perception 28: 425–431.
Halpern, A. R., and R. J. Zatorre. 1999. When That Tune Runs through Your Head: A PET
Investigation of Auditory Imagery for Familiar Melodies. Cerebral Cortex 9: 697–704.
Halpern, A. R., R. J. Zatorre, M. Bouffard, and J. A. Johnson. 2004. Behavioral and Neural
Correlates of Perceived and Imagined Musical Timbre. Neuropsychologia 42: 1281–1292.
Hamilton, K. 2008. After the Golden Age: Romantic Pianism and Modern Performance.
New York: Oxford University Press.
Herholz, S. C., A. R. Halpern, and R. J. Zatorre. 2012. Neuronal Correlates of Perception,
Imagery, and Memory for Familiar Tunes. Journal of Cognitive Neuroscience 24: 1382–1397.
Heyes, C. 2013. What Can Imitation Do for Cooperation? In Cooperation and Its Evolution,
edited by K. Sterelny, R. Joyce, B. Calcott, and B. Fraser. Cambridge, MA: MIT Press.
Highben, Z., and C. Palmer. 2004. Effects of Auditory and Motor Mental Practice in
Memorized Piano Performance. Bulletin of the Council for Research in Music Education
159: 58–65.
Hubbard, T. L. 2010. Auditory Imagery: Empirical Findings. Psychological Bulletin 136: 302–329.
Hyman, I. E., Jr., N. K. Burland, H. M. Duskin, M. C. Cook, C. M. Roy, J. C. McGrath et al.
2013. Going Gaga: Investigating, Creating, and Manipulating the Song Stuck in My Head.
Applied Cognitive Psychology 27: 204–215.
Ittzés, M. 2002. Zoltán Kodály: In Retrospect. Kecskemét, Hungary: Kodály Institute.
Jakubowski, K., N. Farrugia, A. R. Halpern, S. K. Sankarpandi, and L. Stewart. 2015. The Speed
of Our Mental Soundtracks: Tracking the Tempo of Involuntary Musical Imagery in
Everyday Life. Memory and Cognition 43: 1229–1242.
James, I., and I. B. Savage. 1984. Beneficial Effect of Nadolol on Anxiety-Induced Disturbances
of Performance in Musicians: A Comparison with Diazepam and Placebo; Proceedings of
a Symposium on the Increasing Clinical Value of Beta Blockers Focus on Nadolol. American
Heart Journal 108: 1150–1155.
Karageorghis, C. I., J. C. Hutchinson, L. Jones, H. L. Farmer, M. A. Ayhan, R. C. Wilson, et al.
2013. Psychological, Psychophysical, and Ergogenic Effects of Music in Swimming.
Psychology of Sport and Exercise 14: 560–568.
Keller, P. E. 2012. Mental Imagery in Music Performance: Underlying Mechanisms and
Potential Benefits. Annals of the New York Academy of Sciences 1252 (1): 206–213.
Kirschner, S., and M. Tomasello. 2010. Joint Music Making Promotes Prosocial Behavior in
4-Year-Old Children. Evolution and Human Behavior 31: 354–364.
Kodály, Z. 1960. Folk Music of Hungary. London: Barrie and Rockliff.
Kodály, Z. 1962. Bicinia Hungarica. London: Boosey and Hawkes.
Kodály, Z. 1974. The Selected Writings of Zoltán Kodály. London and New York: Boosey &
Hawkes.
Levitin, D. J., and P. R. Cook. 1996. Memory for Musical Tempo: Additional Evidence That
Auditory Memory Is Absolute. Perception and Psychophysics 58: 927–935.
Lima, C., N. Lavan, S. Evans, Z. Agnew, A. R. Halpern, P. Shanmugalingam, et al. 2015. Feel the
Noise: Relating Individual Differences in Auditory Imagery to the Structure and Function
of Sensorimotor Systems. Cerebral Cortex 25: 4638–4650. doi:10.1093/cercor/bhv134.
Lucas, B. J., E. Schubert, and A. R. Halpern. 2010. Perception of Emotion in Sounded and
Imagined Music. Music Perception 27: 399–412.
McEvenue, K. 2002. The Actor and the Alexander Technique. New York: Palgrave Macmillan.
McNorgan, C. 2012. A Meta-Analytic Review of Multisensory Imagery Identifies the Neural
Correlates of Modality-Specific and Modality-General Imagery. Frontiers in Human
Neuroscience 6: 285–295.
Meister, I. G., T. Krings, H. Foltys, B. Boroojerdi, M. Müller, R. Töpper, and A. Thron. 2004.
Playing Piano in the Mind—an fMRI Study on Music Imagery and Performance in Pianists.
Cognitive Brain Research 19: 219–228.
Müllensiefen, D., J. Fry, R. Jones, S. Jilka, L. Stewart, and V. Williamson. 2014. Individual
Differences Predict Patterns in Spontaneous Involuntary Musical Imagery. Music Perception
31 (4): 323–338. doi:10.1525/MP.2014.31.4.323.
Neisser, U. 1976. Cognition and Reality: Principles and Implications of Cognitive Psychology.
New York: Freeman.
Overy, K. 2012. Making Music in a Group: Synchronization and Shared Experience. Annals of
the New York Academy of Science 1252: 65–68.
Overy, K., and I. Molnar-Szakacs. 2009. Being Together in Time: Musical Experience and the
Mirror Neuron System. Music Perception 26: 489–504.
Overy, K., R. I. Nicolson, A. J. Fawcett, and E. F. Clarke. 2003. Dyslexia and Music: Measuring
Musical Timing Skills. Dyslexia 9: 18–36.
Overy, K., A. Norton, K. Cronin, E. Winner, and G. Schlaug. 2005. Examining Rhythm and
Melody Processing in Young Children using fMRI. Annals of the New York Academy of
Science 1060: 210–218.
Pfordresher, P. Q., and A. R. Halpern. 2013. Auditory Imagery and the Poor-Pitch Singer.
Psychonomic Bulletin and Review 20: 747–753.
Rosety-Rodriguez, M., F. J. Ordonez, and J. Farias. 2003. The Influence of the Active Range
of Movement of Pianists’ Wrists on Repetitive Strain Injury. European Journal of Anatomy
7: 75–77.
Schaefer, R. S. 2014. Auditory Rhythmic Cueing in Movement Rehabilitation: Findings and
Possible Mechanisms. Philosophical Transactions of the Royal Society B 369: 20130402.
Schellenberg, E. G., and S. E. Trehub. 2003. Good Pitch Memory Is Widespread. Psychological
Science 14: 262–266.
Seashore, C. E. 1919. Seashore Measures of Musical Talent. New York: Columbia Phonograph
Company.
Seashore, C. E. 1938. Psychology of Music. New York: McGraw Hill.
Szőnyi, E. 1974. Musical Reading and Writing. Vol.1. Budapest: Editio Musica Budapest.
Trusheim, W. H. 1991. Audiation and Mental Imagery: Implications for Artistic Performance.
Quarterly Journal of Music Teaching and Learning 2: 138–147.
Williamson, V. J., and S. R. Jilka. 2013. Experiencing Earworms: An Interview Study of
Involuntary Musical Imagery. Psychology of Music 42: 653–670. doi:10.1177/0305735613483848.
Wing, H. D. 1970. Standardised Tests of Musical Intelligence. Windsor: NFER-Nelson Publishing.
Wöllner, C., and A. R. Halpern. 2016. Attentional Flexibility and Memory Capacity in
Conductors and Pianists. Attention, Perception, and Psychophysics 78: 198–208. doi:10.3758/
s13414-015-0989-z.
Zatorre, R. J., and A. R. Halpern. 1993. Effect of Unilateral Temporal-Lobe Excision on
Perception and Imagery of Songs. Neuropsychologia 31: 221–232.
Zatorre, R. J., A. R. Halpern, D. W. Perry, E. Meyer, and A. C. Evans. 1996. Hearing in the
Mind’s Ear: A PET Investigation of Musical Imagery and Perception. Journal of Cognitive
Neuroscience 8: 29–46.
Zatorre, R. J., and A. R. Halpern. 2005. Mental Concerts: Musical Imagery and the Auditory
Cortex. Neuron 47: 9–12.
Zatorre, R. J., A. R. Halpern, and M. Bouffard. 2010. Mental Reversal of Imagined Melodies:
A Role for the Posterior Parietal Cortex. Journal of Cognitive Neuroscience 22: 775–789.
chapter 20
A Different Way of Imagining Sound
Probing the Inner Auditory Worlds of Some Children on the Autism Spectrum
Adam Ockelford
Introduction
Imagine that you are walking along a road at night when you hear a sound. On the
one hand, you might pay attention to its pitch and loudness and the ways they
change with time. You might attend to the sound’s timbre, whether it is rough or
smooth, bright or dull. . . . These are all examples of musical listening, in which the
perceptual dimensions and attributes of concern have to do with the sound itself,
and are those used in the creation of music. . . . On the other hand, as you stand there
in the road, it is likely that you will not listen to the sound itself at all. Instead, you
are likely to notice that the sound is made by an automobile with a large and power-
ful engine. Your attention is likely to be drawn to the fact that it is approaching
quickly from behind. And you might even attend to the environment, hearing that
the road you are on is actually a narrow alley, with echoing walls on each side. This
is an example of everyday listening, the experience of listening to events rather
than sounds.
So writes William Gaver (1993, 1), in relation to his “ecological” analysis of hearing, in
which he sets out how, for most listeners, in everyday contexts, the function of sounds
in auditory perception is privileged over their acoustic properties. However, there are
people for whom this does not appear to be the case, including a significant minority
of children who are on the autism spectrum. Children with autism typically have chal-
lenges with social interaction and communication, and tend to exhibit a narrow—even
obsessive—focus on particular activities that are often characterized by pattern and
predictability. The perceptual qualities of objects may be more important than their
function, and, in the auditory domain, parents often report a fascination with sound,
apparently for its own sake:
“My son Jack is obsessed with the beeping sound of the microwave when its cooking
cycle comes to an end. He can’t bear to leave the kitchen till it’s stopped. And just
lately, he’s become very interested in the whirr of the tumble-drier too.”
“My four-year old daughter just repeats what I say. For a long time, she didn’t speak
at all, but now, the educational psychologist tells me, she’s ‘echolalic.’ I say, ‘Hello,
Anna,’ and she says ‘Hello, Anna’ back. I ask ‘Do you want to play with your toys’
and she just replies ‘Play with your toys,’ though I don’t think she really knows what
I mean.”
“Ben wants to listen to the jingles that he downloads from the internet all the time.
And I mean, the whole time—16 hours a day if we let him. He doesn’t even play
them all the way through: sometimes just the first couple of seconds of a clip, over
and over again. He must have heard them thousands of times. But he never seems to
get bored.”
“Callum puts his hands over his ears and starts rocking and humming to himself
when my mobile goes off, but totally ignores the ringtone on my husband’s phone,
which is much louder.”
“My ten-year-old son Freddie constantly flicks any glasses, bowls, pots or pans that
are within reach. The other day, he emptied out the dresser— and even brought in
half a dozen flowerpots from the garden—and lined everything up on the floor.
Then he sat and ‘played’ his new instrument for hours. I couldn’t see a pattern in
what he’d done, but if I moved anything when he wasn’t looking, he’d notice straight
away, and move it back again.”
“Every now and then, Romy only pretends to play the notes on her keyboard—
touching the keys with her fingers but not actually pressing them down. And some-
times, she introduces everyday sounds that she hears into her improvising. For
example, she plays the complicated descending harmonic sound of the aeroplanes
coming into land at Heathrow as chords, and somehow integrates them into the
music she is playing.”
“Omur repeatedly bangs away at particular notes on his piano (mainly ‘B’ and ‘F
sharp,’ high up in the right hand), sometimes persisting until the string or the ham-
mer breaks.”
“Derek (who is blind) copies the sounds of the page turns in his own rendition of a
Chopin waltz that his piano teacher played for him by tapping his fingers on the
music rack above the keyboard.” (Ockelford 2013)
Why should this be the case? What causes some autistic children to hear sounds in this
way? And what impact, if any, does this idiosyncratic style of auditory perception have
on the way that they perceive, remember, and imagine music? These are the questions
that lie at the heart of this chapter.
An Ecological Model of
Auditory Perception
Early in life, “neurotypical” human infants learn to differentiate auditory input
according to one of three functions that it can fulfill. This results in the development of
“everyday” listening, which, as Gaver observes, is concerned with attending to events
such as a car passing by or a door slamming; “musical” listening, which focuses on
perceptual qualities such as pitch and loudness; and “linguistic” listening, which is ulti-
mately based on the perception and cognition of speech sounds. The separation of music
and language perception ties in with evidence from neuroscience, which suggests that,
while the two domains share some neurological resources, they also have dedicated pro-
cessing pathways (Patel 2012) that are distinct from those activated by environmental
sounds (Norman-Haignere, Kanwisher, and McDermott 2015).
It is not known just how these three types of auditory processing—relating to everyday
sounds, music, and speech—become defined in the brain’s architecture following the
initial development of hearing around three to four months before birth (Lecanuet 1996).
There is currently some debate as to which develops first, although there is increasing
evidence that musical hearing and ability are essential to language acquisition (Brandt,
Gebrian, and Slevc 2012). My own work (e.g., Ockelford 2017) supports this view. My
theory of what makes music “music”—“zygonic” theory (Ockelford 2005)—contends
that, for music to exist in the mind, there must be perceived imitation of one feature of a
sound by another, and the fact that, from an early age, babies do copy
vocal sounds and relish being copied long before they can use or understand words sug-
gests that music is indeed a precursor of language (Voyajolu and Ockelford 2016). In any
case, singing and speech appear to follow discrete developmental paths from around
the beginning of the second year of life (Lecanuet 1996). We can surmise that the other
category—“everyday” sounds—must perceptually be the most primitive of all, since it
appears to require less cognitive processing than either music or speech. And in phylo-
genetic terms (in our development as a species), the capacities to process music and then
language are thought to be relatively recent specialisms of the auditory system (see, e.g.,
Masataka 2007). Hence it seems reasonable to assume that, early on in “typical” human
development, the brain treats all sound in the same way and that music processing starts
to emerge first, followed by language. We can speculate that the residue that is left
remains as “everyday” sounds. Hence the ecological model of auditory perception can
be represented as follows, in which it is assumed that, as well as their shared neural
resources, music and language come to have additional, distinct neural correlates
during the first postnatal year (Figure 20.1). Clearly, since the precise nature of
the sounds that constitute speech or music, and the relationship between them, varies
somewhat from one culture to another, the model should be regarded as indicative
rather than absolute.
Figure 20.1 The emerging streams of music and language processing in auditory development.
But what of children on the autism spectrum? As the parents’ descriptions suggest, it
seems that certain sounds, especially those that are particularly salient or pleasing to an
individual, such as the whirring of the tumble drier, acquire little or no functional
significance for some children. Instead, they tend to be processed only in terms of their
sounding qualities—that is, in musical terms. It seems also that everyday sounds that
involve repetition or regularity (such as the beeping of a microwave) may be processed
in music-structural terms. This would imply that the children hear the repetition that is
actually generated mechanically or electronically as being imitative (Figure 20.2).
There is, of course, another possibility that we should acknowledge: that the autistic
children who are preoccupied with the sounding qualities of certain everyday objects
and the repetitive patterns that some of them make do not actually hear them in a musical
way—that is, as being derived from one another through imitation—but purely as regu-
larities in the environment. Furthermore, it could be that those same children do not
hear music as “music” either, but merely as patterned sequences of sounds, to which no
sense of human agency is transferred. Why should this be the case? Perhaps because
such children did not engage in the early vocal interactions with carers—“communicative
musicality” (Malloch and Trevarthen 2009)—that I have suggested may embed a sense
of imitation in sounds that are repeated (Ockelford 2017).
Figure 20.2 Some everyday sounds might be processed as music among children on the autism spectrum.
However, the accounts of
Romy reproducing the whines of jet engines of airplanes coming in to land and integrat-
ing them into her improvisation at the piano, of Derek evidently regarding the rustle of a
page turn as part of a Chopin waltz, and of Freddie appropriating everyday sound-makers
(flower pots) to be used as musical instruments, suggest that some autistic children, at
least, do perceive everyday sounds in a musical way.
It may well be that this tendency is reinforced by the prevalence of music in the lives
of young children (Lamont 2008); in the developed world, they are typically surrounded
by electronic games and gadgets, toys, mobile phones, mp3 players, computers, iPads,
TVs, radios, and so on, all of which emanate music to a greater or lesser extent. In the
wider environment too—in restaurants, cafés, shops, cinemas and waiting rooms, cars
and airplanes, and at many religious gatherings and other public ceremonies— music is
ubiquitous. So, given that children are inundated with nonfunctional (musical) sounds
designed, in one way or another, to influence emotional states and behavior, perhaps we
should not be surprised that the sounds with which they often co-occur, sounds that to
neurotypical ears are functional, should come to be processed in the same way.
The manner in which some autistic children perceive the world can have other conse-
quences too. For example, the development of language can be affected, resulting in,
among other things, “echolalia”—a distinctive form of speech widely reported among
blind and autistic children (Mills 1993; Sterponi and Shankey 2013), which was
originally defined as the meaningless repetition of words or phrases (Fay 1967, 1973).
However, it appears that echolalia actually fulfills a range of functions in verbal interaction
(Prizant 1979), including turn-taking and affirmation, and often finds a place in
noninteractive contexts too, serving as a self-reflective commentary or rehearsal strategy
(Prizant and Duchan 1981; McEvoy, Loveland, and Landry 1988). Given the hypothesis
that imitation lies at the heart of musical structure (Ockelford 2012), it could be argued
that one cause of echolalia is the organization of language (in the absence of semantics
and syntax) through the structure (repetition) that is present in all music. It is as though
words become musical objects in their own right, to be manipulated not according to
their meaning or grammatical function, but purely through their sounding qualities.
This implies a further modification to the ecological model of auditory development
(see Figure 20.3).
It is of interest to note that echolalia is not restricted to certain exceptional groups
who exist on one extreme of the multidimensional continuum that makes up human
neurodiversity; it is a feature of “typical” language acquisition in young children
(Mcglone-Dorrian and Potter 1984) when, it seems, the urge to imitate what they hear
outstrips semantic understanding. This would accord with a stage in the ecological model
of auditory development when the two strands of communication through sound—
language and music—are not cognitively distinct, and would support the notion that
musical development precedes the onset of language.
For children on the autism spectrum, it is worth noting that music itself can become
“superstructured” with additional repetition, as the account of Ben, for example, shows;
it is common for such children to play snippets of music (or videos
with music) over and over again. It is as though music’s already high proportion of
repetition, which is at least 80 percent (Ockelford 2005), is insufficient for a mind
ravenous for structure, and so it creates even more. From speaking to autistic adults who
are able to verbalize why (as children) they would repeat musical excerpts in this way,
it appears that the main reason (apart from the sheer enjoyment of hearing a particularly
fascinating series of sounds repeatedly) is that they could hear more and more in the
sequence concerned as they listened to it again and again. Bearing in mind that most music is, as
we have seen, highly complex, with many events occurring simultaneously (and given
that even single notes generally comprise many pitches in the form of harmonics), to the
child with finely tuned auditory perception, there is in fact a plethora of different things
to attend to in even a few seconds of music, and an even greater number of relationships
between sounds to fathom. So, for example, while listening to a passage for orchestra one
hundred times may be extremely tedious to the “neurotypical” ear, which can detect only
half a dozen composite events, each fused in perception, to the mind of the autistic child,
which can break down the sequence into a dozen different melodic lines, the stimulus
may be rich and riveting.
Figure 20.3 Speech might also be processed in musical terms by some children on the autism spectrum.
Moreover, there tends to be far more structure in a piece of music than would
theoretically be required for it to make sense (Ockelford 2017). Compositions are, by any
standards, overengineered, typically with levels of repetition of 80 percent or more
(Ockelford 2005). In terms of information theory, they are highly redundant. Why should
this be the case? Perhaps because, traditionally, composers have been aware that they
need to design pieces in such a way that their message will still come across in the
suboptimal circumstances that will inevitably characterize most performances. For example,
different interpretations may unexpectedly foreground some features of a work at the
expense of others. The acoustics in which a concert takes place may be less than ideal.
Listeners’ concentration may wander. For the child on the autism spectrum, though,
attending to the same short passage for the nth time, this redundancy means that
continuing to listen remains a worthwhile venture; there are still new connections between notes to
be unearthed.
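One crude way to make this redundancy tangible is to count how many notes of a tune belong to a phrase that is heard more than once; the sketch below does this for a toy note string. It is emphatically not Ockelford's zygonic measure, and the three-note phrase length is an arbitrary assumption.

```python
# Toy illustration of musical redundancy: the share of notes that belong
# to some phrase (here, of length >= 3) occurring more than once.
# This is a hypothetical stand-in, not Ockelford's zygonic analysis.

from collections import Counter

def repeated_coverage(notes, n=3):
    """Fraction of notes covered by an n-gram that appears at least twice."""
    grams = Counter(tuple(notes[i:i + n]) for i in range(len(notes) - n + 1))
    covered = set()
    for i in range(len(notes) - n + 1):
        if grams[tuple(notes[i:i + n])] > 1:
            covered.update(range(i, i + n))
    return len(covered) / len(notes)

# "Twinkle, Twinkle": the opening phrase returns at the end and the middle
# phrase is sung twice, so almost every note lies in repeated material.
twinkle = "CCGGAAG FFEEDDC GGFFEED GGFFEED CCGGAAG FFEEDDC".replace(" ", "")
print(f"{repeated_coverage(list(twinkle)):.0%} of notes lie in repeated phrases")
```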
Absolute Pitch
It seems that one of the consequences of an early preoccupation with the “musical”
qualities of sounds is the development of “absolute pitch” (AP)—the capacity to identify
or produce pitches in isolation from others. In the West’s population as a whole, this
ability is extremely rare, with an estimated prevalence of 1 in 10,000 (Takeuchi and
Hulse 1993). However, among those on the autism spectrum, the position is very dif-
ferent; recent estimates, derived from parental questionnaires, vary between 8 percent
(N = 118; Vamvakari 2013) and 21 percent (N = 305; Reese 2014). These figures are
broadly supported by DePape, Hall, Tillmann, and Trainor (2012) who, in a study of
twenty-seven high-functioning adolescents with autism spectrum condition, found
that three of them (11 percent) had AP. It is very unusual to find such high orders of
difference in the incidence of a perceptual ability between different subgroups of the
human population and, evidently, there is something distinct in the way that the parts
of the brain responsible for pitch memory wire themselves up in a significant minority
of autistic children.
While AP is a useful (though inessential) skill in “neurotypical” musicians—including
those performing at the highest level—it appears to be an indispensable factor in the
development of music performance skills in autistic children with learning difficulties—
so-called “savants” (Miller 1989). It appears to be this unusual ability that motivates and
enables some young children with a limited understanding of the world, from the age of
twenty-four months or so, to pick out tunes and harmonies on instruments that they may
encounter at home or elsewhere—typically the keyboard or piano. This may well occur
with no adult intervention (or, indeed, awareness). It seems that AP has this impact
since each pitch sounds distinct, potentially eliciting a powerful emotional response, so
being able to reproduce these at will must surely be an electrifying experience. But more
than this, AP makes learning to play by ear manageable, in a way that “relative pitch”—
the capacity to process melodic and harmonic intervals—does not. To understand why,
consider a typical playground chant that children use to taunt one another (Figure 20.4).
In “neurotypical” individuals, motifs such as this are likely to be encoded in the mind,
stored and retrieved principally as a series of differences between notes (although “fuzzy”
absolute memories will exist—a child would know if the chant were an octave too high,
for example). However, for children with AP, the position is quite different, since they have
the capacity to capture the pitch data from music directly, rather than as a series of intervals.
Hence, in seeking to remember and repeat groups of notes over significant periods of
time, they have certain processing advantages over their “neurotypical” peers, who
extract and store information at a higher level of abstraction, and thereby lose the “surface
detail.” (Note that there are disadvantages to “absolute” representations of pitch too since,
on their own, they cannot take advantage of the patterns that exist through the repetition
of intervals and they make greater demands on memory. However, as there appears to
be, to all intents and purposes, no limit on the brain’s long-term storage capacity, this is
not a serious problem; indeed, having an exceptional memory is something that is com-
mon to many children with autism.)
In my view, it is this capacity for “absolute pitch data capture” that explains why chil-
dren with AP who are on the autism spectrum and have learning difficulties are able to
develop instrumental skills at an early age with no formal tuition since, for them, repro-
ducing groups of notes that they have heard is merely a question of remembering a series
of one-to-one mappings between given pitches as they sound and (typically) the keys on
a keyboard that produce them. These relationships are invariant; once learned, they can
service a lifetime of music making, through which they are constantly reinforced. On
the other hand, were a child with “relative pitch” to try to play by ear, he or she would
have to become proficient in the far more complicated process of calculating how the
intervals that are perceived map onto the distances between keys, which, due to the
asymmetries of the keyboard, are likely to differ according to what would necessarily be
an arbitrary starting point.
For example, the interval that exists between the first two notes of the playground
chant (a minor 3rd) shown in Figure 20.5 can be produced through no fewer than twelve
distinct key combinations, comprising one of four underlying patterns. Moreover, the
complexity of the situation is compounded by the fact that virtually the same physical
leap between other keys may sound different (a major 3rd) according to its position on
the keyboard.
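A short sketch can make this asymmetry concrete: enumerating the minor third starting on each of the twelve keys and grouping the pairs by key color reproduces the twelve combinations and four underlying shapes mentioned above (the code is purely illustrative, and the names are my own).

```python
# Hypothetical illustration of why "relative" playing by ear is harder than
# an "absolute" one-to-one pitch-to-key mapping: the same interval (a minor
# third, three semitones) takes several physical shapes on the keyboard.

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
BLACK_KEYS = {"C#", "D#", "F#", "G#", "A#"}

def key_color(name):
    return "black" if name in BLACK_KEYS else "white"

def minor_third_shapes():
    """Group every minor-third key pair (one per starting key) by its color pattern."""
    shapes = {}
    for i, lower in enumerate(PITCH_CLASSES):
        upper = PITCH_CLASSES[(i + 3) % 12]          # three semitones higher
        pattern = (key_color(lower), key_color(upper))
        shapes.setdefault(pattern, []).append((lower, upper))
    return shapes

if __name__ == "__main__":
    shapes = minor_third_shapes()
    total = sum(len(pairs) for pairs in shapes.values())
    print(f"{total} key combinations in {len(shapes)} underlying patterns")
    for pattern, pairs in shapes.items():
        print(pattern, pairs)
    # Prints 12 key combinations in 4 patterns: white-white, white-black,
    # black-white, and black-black.
```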
That is not to say that children with AP who learn to play by ear do not rapidly
develop the skills to play melodies beginning on different notes too, and it is not unu-
sual for them to learn to reproduce pieces fluently in every key. This may appear contra-
dictory, in the light of the processing advantage conferred by being able to encode
pitches as perceptual identities in their own right, each of which, as we have seen,
maps uniquely onto a particular note on the keyboard. However, the reality of almost
all pieces of music is that melodic (and harmonic) motifs variously appear at differ-
ent pitches through transposition and so, to make sense of music, young children
with AP need to learn to process pitch relatively as well as absolutely (Stalinski and
Schellenberg 2010).
Figure 20.5 The different mechanisms involved in playing by ear using “absolute” and “relative” pitch abilities.
What is the day-to-day impact of AP on children with learning difficulties who are on
the autism spectrum likely to be? The answer is: as varied as the children are themselves.
Elsewhere, I have written at length about the extraordinary life of Derek Paravicini
(Ockelford 2009) who is what Treffert (2009) calls a “prodigious” musical savant. It is
simply not possible to imagine Derek without his piano playing, in which the way he
thinks, the way he feels, and the way he relates to other people are embodied. But there are
many other children on the autism spectrum with whom I have worked over many years
and who are no less exceptional in their different ways and no less enlightening as to how
musical sounds can be remembered and imagined.
Figure 20.6 Romy and Adam share a musical joke (image © 2010 Evangelos Himonides).
In this context, here are two accounts of children whom I have worked with every
week for a number of years. They are taken from blogs that were designed to raise aware-
ness of autism and musicality and to stir the debate on the relationship between so-called
disability and ability. The children and their parents visit me in a large practice room at
the University of Roehampton where I am based. There are two pianos, to avoid potential
difficulties over personal space. A number of the children rarely say a word. Some, like
Romy, are entirely nonverbal. She converses through her playing, showing what piece
she would like next, and indicating when she has had enough. On occasions, she will
tease me by apparently suggesting one thing when she means another. In this way, jokes
are shared and, sometimes, feelings of sadness too. For Romy, music truly functions as a
proxy language (Figure 20.6).
When we started working together, six years ago, mistakes and misunderstandings
occurred all too frequently since, as it turned out, there were very few pieces that
Romy would tolerate: for example, the theme from Für Elise (never the middle
section); the Habanera from Carmen; and some snippets from “Buckaroo Holiday”
(the first movement of Aaron Copland’s Rodeo). Romy’s acute neophobia meant that
even one note of a different piece would evoke shrieks of fear-cum-anger, and the
session could easily grow into an emotional conflagration.
So gradually, gradually, over weeks, then months, and then years, I introduced new
pieces—sometimes, quite literally, at the rate of one note per session. On occasion,
if things were difficult, I would even take a step back before trying to move on again
the next time. And, imperceptibly at first, Romy’s fears started to melt away. The
theme from Brahms’s Haydn Variations became something of an obsession, fol-
lowed by the slow movement of Beethoven’s Pathetique sonata. Then it was Joplin’s
The Entertainer, and Rocking All Over the World by Status Quo.
Over the six years, Romy’s jigsaw box of musical pieces—fragments ranging from just
a few seconds to a minute or so in length—has filled up at an ever-increasing rate.
Now it’s overflowing, and it’s difficult to keep up with Romy’s mercurial musical mind;
mixing and matching ideas in our improvised sessions, and even changing melodies
and harmonies so they mesh together, or to ensure that my contributions don’t!
As we play, new pictures in sound emerge and then retreat as a kaleidoscope of ideas
whirls between us. Sometimes a single melody persists for fifteen minutes, even half
an hour. For Romy, no matter how often it is repeated, a fragment of music seems to
stay fresh and vibrant. At other times, it sounds as though she is trying to play sev-
eral pieces at the same time—she just can’t get them out quickly enough, and a ver-
itable nest of earworms wriggle their way onto the piano keyboard. Vainly I attempt
to herd them into a common direction of musical travel.
So here I am, sitting at the piano in Roehampton, on a Sunday morning in mid-
November, waiting for Romy to join me (not to be there when she arrives is asking
for trouble). I’m limbering up with a rather sedate rendition of the opening of Chopin’s
Etude in C major, Op. 10, No. 1 when I hear her coming down the corridor, vocal-
izing with increasing fervor. I feel the tension rising, and as her father pushes open
the door, she breaks away from him, rushes over to the piano and, with a shriek and
an extraordinarily agile sweep of her arm, elbows my right hand out of the way at
the precise moment that I was going to hit the D an octave above middle C. She
usurps this note to her own ends, ushering in her favorite Brahms-Haydn theme.
Instantly, Romy smiles, relaxes and gives me the choice of moving out of the way or
having my lap appropriated as an unwilling cushion on the piano stool. I choose the
former, sliding to my left onto a chair that I’d placed earlier in readiness for the move
that I knew I would have to make.
I join in the Brahms, and encourage her to use her left hand to add a bass line. She
tolerates this up to the end of the first section of the theme, but in her mind she’s
already moved on, and without a break in the sound, Romy steps onto the set of
A Little Night Music, gently noodling around the introduction to Send in the Clowns.
But it’s in the wrong key—G instead of E flat—which I know from experience means
that she doesn’t really want us to go into the Sondheim classic, but instead wants me
to play the first four bars (and only the first four bars) of Schumann’s Kleine Studie
Op. 68, No. 14. Trying to perform the fifth bar would, in any case, be futile since Romy’s
already started to play . . . now, is it I am Sailing or O Freedom? The opening ascent
from D through E to G could signal either of those possibilities. Almost tentatively,
Romy presses those three notes down and then looks at me and smiles, waiting, and
knowing that whichever option I choose will be the wrong one. I just shake my head
at her and plump for O Freedom, but sure enough Rod Stewart shoves the Spiritual
out of the way before it has time to draw a second breath.
From there, Romy shifts up a gear to the Canon in D—or is it really Pachelbel’s
masterpiece? With a deft flick of her little finger up to a high A, she seems to suggest
that she wants Streets of London instead (which uses the same harmonies). I opt for
Ralph McTell, but another flick, this time aimed partly at me as well as the keys, shows
that Romy actually wants Beethoven’s Pathetique theme—but again, in the wrong
key (D). Obediently I start to play, but Romy takes us almost immediately to A flat (the
tonality that Beethoven originally intended). As soon as I’m there, though, Romy
races back up the keyboard again, returning to Pachelbel’s domain. Before I’ve had
time to catch up, though, she’s transformed the music once more; now we’re hearing
the famous theme from Dvorak’s New World Symphony.
I pause to recover my thoughts, but Romy is impatiently waiting for me to begin
the accompaniment. Two or three minutes into the session, and we’ve already touched
on twelve pieces spanning 300 years of Western music and an emotional range to
match. Yet, here is a girl who in everyday life is supposed to have no “theory of
mind”—the capacity to put yourself in other people’s shoes and think what they are
thinking. Here is someone who is supposed to lack the ability to communicate. Here
is someone who functions, apparently, at an 18-month level. But I say here is a joyous
musician who amazes all who hear her. Here is a girl in whom extreme ability and
disability coexist in the most extraordinary way. Here is someone who can reach out
through music and touch one’s emotions in a profound way. If music is important to
us all, for Romy it is truly her lifeblood.1
How did Romy, severely learning disabled, become such a talented, if idiosyncratic,
musician? In my view, it was her early inability to process language, in tandem with her
inability to grasp the portent of many everyday sounds, that enhanced her ability to
process all sounds in a musical way. The two were inextricably linked. Indeed, without
the former, we can surmise that the latter would never have developed.
Romy has AP, meaning that for her, as we have seen, her mental images of musical
sounds are distinct with regard to pitch. Hence, every note on the piano is instantly recog-
nizable. But more than this, for Romy, each pitch provides a stable point of reference in
a capricious world. And it’s not just notes on the piano that function for Romy in this
way. In her mind, each of the notes in any piece of music sounds distinct. While, for most
of us, musical sounds pass by unremarkably in perceptual terms, for Romy, different
notes, different chords, can affect her profoundly: an E flat major harmony can make her
quiver with excitement, for example, while G7 can make her cry.
In itself, though, absolute pitch is insufficient to make an exceptional musician; that
takes at least seven thousand hours of practice (Sloboda et al. 1996). How, then, did Romy
acquire her musical skills? Like many autistic children early in life, she developed an
obsession. In her case this was a small electronic keyboard, whose notes lit up in the
sequence needed to play one of a number of simple tunes. As far as Romy was con-
cerned, this musical toy was one of only a few things with which she could meaningfully
interact, and whose logic she could understand, and she spent hundreds of hours play-
ing with it. The keyboard was comfortingly predictable in comparison to any human
being—even her devoted family, whose language and behavior differed subtly from one
occasion to another, as all interactive engagement does. The keyboard, though, invaria-
bly responded to Romy in the same way. Whenever she pressed a particular key, it always
sounded the same as it did before. Here was something in the environment that Romy
could predict and control.
And so, through countless hours of self-directed exploration as a toddler, Romy discov-
ered where all the notes (whose sounds she could hear in her head) are on the keyboard.
Today, as a teenager, for Romy to play the piano merely requires her to hear a tune in her
head (available to her through the internal library of songs, stored as series of absolute
auditory images) and play along with it, pressing down the correct keys in sequence as
their pitches sound in her head. And this approach works not only for music. As we noted
earlier, she will reproduce the sounds of the jet engines of planes as they descend toward
Heathrow Airport, for example, and she unhesitatingly copies any ringtones that inter-
rupt her piano lessons.
Absolute pitch can have other consequences for children on the autism spectrum too.
The absolute representation of sounds in their heads appears to fuel musical imagination
in a way that is more vivid, more visceral even, than the relative memory of intervals alone.
And, although formal research is yet to be undertaken, the anecdotal accounts of par-
ents and teachers suggest that earworms are widespread, evidenced most obviously in
some children’s incessant vocalizing of melodic fragments. With minds full of tunes that
seem to be playing the whole time, external sounds can be at best superfluous and at
worst an irritation, as the following account of a session with Freddie, then eleven years
old, shows (Figure 20.7).
Figure 20.7 Freddie picks out a note on the piano (image © 2012 The University of Roehampton).
And then, spontaneously, he was off up the keyboard, beginning the same pentatonic
pattern on each of the twelve available keys. At my prompting, Freddie re-ran the
sequence with his left hand—his unbroken voice hoarsely whispering the low notes.
So logical. Why bother to play the notes if you know what they sound like already?
So apparently simple a task, and yet . . . such a difficult feat to accomplish: the whole
contradiction of autism crystallized in a few moments of music making.2
Conclusion
In this chapter, we have seen how some children on the autism spectrum appear to have
aural imaginations that are rooted in processing a range of everyday sounds and even
speech in a musical way. The way they perceive, remember, and imagine sounds has a
high level of intensity born of their sense of AP. This enables them to play by ear—a skill
that is often acquired entirely through their own efforts and that typically first manifests
itself in the early years. But more than this, for Freddie, for Romy, and for many other
children on the autism spectrum, music may be the key not only to aesthetic fulfillment,
but also to communication, shared attention, and emotional understanding. It can do this
because it is a language built not on symbolic meaning but on repetition; on order and
on predictability in the domain of sound. With musically empathetic adults with whom
to interact, this love of pattern—insistence, even—need not restrict the children’s auditory
imaginations but can emancipate them, through the capacity to understand musical
structure and the rules of the generative grammars through which melodies, harmonic
sequences, and rhythms are created afresh.
Notes
1. http://blog.oup.com/2012/12/music-proxy-language-autisic-children. Accessed September
15, 2017.
2. http://www.huffingtonpost.com/adam-ockelford/autism-genius_b_4118805.html.
Accessed September 15, 2017.
References
Brandt, A., M. Gebrian, and L. R. Slevc. 2012. Music and Early Language Acquisition. Frontiers
in Psychology 3. doi:10.3389/fpsyg.2012.00327.
DePape, A.-M. R., G. B. C. Hall, B. Tillmann, and L. J. Trainor. 2012. Auditory Processing in
High-Functioning Adolescents with Autism Spectrum Disorder. PLoS One 7 (9): e44084.
doi:10.1371/journal.pone.0044084.
Fay, W. H. 1967. Childhood Echolalia. Folia Phoniatrica et Logopaedica 19 (4): 297–306.
doi:10.1159/000263153.
Fay, W. H. 1973. On the Echolalia of the Blind and of the Autistic Child. Journal of Speech and
Hearing Disorder 38 (4): 478. doi:10.1044/jshd.3804.478.
Gaver, W. W. 1993. What in the World Do We Hear? An Ecological Approach to Auditory
Event Perception. Ecological Psychology 5 (1): 1–29. doi:10.1207/s15326969eco0501_1.
Lamont, A. 2008. Young Children’s Musical Worlds: Musical Engagement in 3.5-Year-Olds.
Journal of Early Childhood Research 6 (3): 247–261. doi:10.1177/1476718x08094449.
Lecanuet, J.-P. 1996. Prenatal Auditory Experience. In Musical Beginnings, 3–34. Oxford: Oxford
University Press.
Malloch, S., and C. Trevarthen, eds. 2009. Communicative Musicality: Exploring the Basis of
Human Companionship. New York, NY: Oxford University Press.
Masataka, N. 2007. Music, Evolution and Language. Developmental Science 10 (1): 35–39.
McEvoy, R. E., K. A. Loveland, and S. H. Landry. 1988. The Functions of Immediate Echolalia
in Autistic Children: A Developmental Perspective. Journal of Autism and Developmental
Disorders 18 (4): 657–668. doi:10.1007/bf02211883.
Mcglone-Dorrian, D., and R. E. Potter. 1984. The Occurrence of Echolalia in Three Year Olds’
Responses to Various Question Types. Communication Disorders Quarterly 7 (2): 38–47.
doi:10.1177/152574018400700204.
Miller, L. 1989. Musical Savants: Exceptional Skill and Mental Retardation. Hillsdale, NJ: Lawrence
Erlbaum.
Mills, A. 1993. Visual Handicap. In Language Development in Exceptional Circumstances, edited
by D. Bishop and K. Mogford, 150–164. Hove: Psychology Press.
Norman-Haignere, S., N. G. Kanwisher, and J. H. McDermott. 2015. Distinct Cortical Pathways
for Music and Speech Revealed by Hypothesis-Free Voxel Decomposition. Neuron 88 (6):
1281–1296. doi:10.1016/j.neuron.2015.11.035.
Ockelford, A. 2005. Repetition in Music: Theoretical and Metatheoretical Perspectives. Farnham:
Ashgate.
Ockelford, A. 2009. In the Key of Genius: The Extraordinary Life of Derek Paravicini. London:
Random House.
Ockelford, A. 2012. Music, Language and Autism. London: Jessica Kingsley.
Ockelford, A. 2013. Applied Musicology: Using Zygonic Theory to Inform Music Education,
Therapy, and Psychology Research. New York, NY: Oxford University Press.
Ockelford, A. 2017. Comparing Notes: How We Make Sense of Music. London: Profile Books.
Patel, A. D. 2012. Language, Music, and the Brain: A Resource-Sharing Framework. In
Language and Music as Cognitive Systems, edited by P. Rebuschat, M. Rohmeier,
J. A. Hawkins, and I. Cross, 204–223. Oxford: Oxford University Press.
Prizant, B. 1979. An Analysis of the Functions of Immediate Echolalia in Autistic Children.
Dissertation Abstracts International 39 (9-B): 4592–4593.
chapter 21
Multimodal Imagery in the Receptive Music Therapy Model Guided Imagery and Music (GIM)
Lars Ole Bonde
Introduction
Music is a “technology of the self,” as Tia DeNora (2000) concluded in her pioneering
study of how music is used in everyday life. DeNora based her study on interviews
and observations, focusing on how music was used in contexts as different as aerobic
exercise classes, karaoke evenings, and music therapy sessions. DeNora elaborated on
Gibson’s (1983) concept of affordance—in this case documenting how listening to music
can offer the listener a variety of options for use (affordances), mirrored in specific
appropriations related to the listener’s needs and the context. Since DeNora’s study, a
number of empirical studies have provided further evidence of how music
listening is appropriated, that is, used for a number of purposes (Bonde et al. 2013;
Clarke 2005; Lilliestam 2013). In my own research, I have concentrated on health
music(k)ing; that is, how music can be used as/in therapy and as a health resource in
everyday life (Bonde 2000, 2005, 2007, 2010, 2017; Bonde and Blom 2016). In an ongoing
study on music and public health (Bonde et al. 2018; Ekholm et al. 2016a, 2016b) it is
documented that two-thirds of the adult Danish population use music for relaxation
and mood regulation and that an equal number regard music as a health resource.
In this chapter, I will focus on a specific model of receptive music therapy; that is,
psychotherapy based on imagination facilitated by music listening, namely the Bonny
Method of guided imagery and music (GIM), because this model can illustrate the close
relationship between music listening and imagery.
The American musician and music therapist Helen Lindquist Bonny (1921–2010)
developed a new model of receptive music therapy in the 1970s and 1980s. It is called the
Bonny Method of GIM and today it is the internationally best-known receptive music
therapy model with training, clinical work, and research performed in four continents.
The Bonny Method is the name of an individual session format developed by Bonny,
while GIM is a generic concept encompassing many different individual or group
formats using music, imagery, and verbal dialogue in/as therapy (Bruscia 2002). The Bonny
Method is “a model of music psychotherapy centrally consisting of a client imaging
spontaneously to pre-recorded sequences of classical music” (Abrams 2002, 103).
It should be added that the spontaneous imaging to (classical) music in GIM is based on
the induction of an altered state of consciousness (ASC) through deep relaxation.
Music listening, both in and outside GIM, can evoke and support imagery in all sensory
modalities: visual, auditory, olfactory, gustatory, and sensory-kinesthetic. In GIM
theory and practice, emotions and memories are also considered imagery modalities.
Table 21.1
Columns: Episode/Bars; Code; Music; Mrs. L 6,2–3; Mrs. F 8,4–5; Mrs. A 10,2–3; Mrs. H 9,4–5; Workshop; Comments.

Episode A1 (bars 1–6), Mein Jesu. Code: E, R, S, V.
Music: Strings only. They sound soft and muted. Cello plays the melody. The bowing is continuous.
Mrs. L 6,2–3: Now I feel sadness. Allow yourself to feel that. Where do you feel the sadness? In my head.
Mrs. F 8,4–5: I want Death to be my friend.
Mrs. A 10,2–3: What happens in your neck? Something is pressing (she yawns, massages her jaws, tears).
Mrs. H 9,4–5: A cemetery. Graves and stones.
Workshop: The music evokes visual imagery and emotional reactions.
Comments: The sad and sombre mood from the previous track (Bach: Komm süsser Tod) is deepened by the soft and earnest voice of the celli.

Episode A2 (bars 1–6, repeated). Code: V, R, E.
Music: Violins take over the melody, one octave higher.
Mrs. L 6,2–3: I see an elephant. It is huge and tired, moves heavily. The battle is lost. What battle?
Mrs. F 8,4–5: Is anything preventing you from that? I don’t think so. Is Death nearby? Yes. How does it look?
Mrs. A 10,2–3: Can you feel what the press is about?
Mrs. H 9,4–5: It reminds me of death. I don’t want to go into that.
Workshop: Mood: 2 (sadness, sorrow, loneliness). Opening toward a meeting or a vast space.
Comments: The initial statement is confirmed by the violins one octave higher.

Episode B1 (bars 7–14). Code: V/E, E/A, S.
Music: Celli take over the melody again. The chromatic quavers end with the first breathing point. Second breathing point is before the last phrase.
Mrs. L 6,2–3: The battle about managing everything. That’s why it is sad.—I am the elephant. It didn’t succeed. It will be shot, I think. It has given it up. A lotus suddenly appears.
Mrs. F 8,4–5: It is light and mild—like the angel: It says: “Be not afraid!”
Mrs. A 10,2–3: Allow yourself to feel the feelings.
Mrs. H 9,4–5: The body is tense all over.
Workshop: Expansion and exploration—dialogue is possible. Minor mood changes: 1 (spiritual, dignified, serious), or 3 (longing, yearning).
Comments: The tension builds up through the harmonic underpinning of the chromatic melodic line. The breathing points allow the listener to digest and let go. The final melodic phrase is like a prayer.

Episode B2 (bars 7–14, repeated). Code: E/V, E.
Music: Violins take over. Dynamic intensity, both in crescendi and in diminuendi. Surprising subito piano in the end of the chromatic phrase.
Mrs. L 6,2–3: Just above the head of the elephant. How does that feel? Very confident. The lotus is a sign that someone holds his hand over it, even if it can’t see it . . . I can see it.
Mrs. F 8,4–5: How is it for you to hear that? Very good.
Workshop: Images of death/rebirth or saying goodbye are possible.
Comments: The celli repeat the final phrase in an introverted confirmation of the necessity of prayer. The final major chord offers comfort.

Episode B3 (Coda). Code: E, R.
Music: Last three bars are repeated, with celli playing the melody. Ends on a D major chord.
Mrs. L 6,2–3: How is it for you to be aware of that? (Coughs). It feels safe.
Mrs. F 8,4–5: It is something about accept—will I ever reach the other side?—And what is the other side? (tears). How is it for you right now? Both difficult and OK.
First column: Episodes corresponding with phenomenological description and formal analysis of the music.
Second column: Coding of image modalities (V = visual, A = auditory, S = sensory-kinesthetic, O = olfactory, G = gustatory, E = emotions, M = memories, R = reflections and
thoughts, T = transpersonal, Ot = other, e.g., body tension).
Third column: Cues referring to the phenomenological description and the Intensity profile of the music.
Fourth–seventh columns: Imagery of four participants (1,1 = First session, first music selection).
Eighth column: Results from a research workshop with music therapy researchers as participants. Mood numbers refer to Hevner (1936).
Ninth column: The author’s hermeneutic interpretation of music and image potential.
The clinical outcome of the participants’ music and imagery experiences is reported
elsewhere (Bonde 2005, 2007). In the context of this chapter, I will examine the experiences
from a neuroaffective perspective (Hart 2012; Lindvang and Beck 2017).
Neuroaffective theory describes and explains how affects and emotions are aroused and
regulated in different states of consciousness and at three basic neurological levels, as
presented, for example, in the theory of the triune brain (the autonomic nervous
system, the limbic system, and the neocortex) and its relevance for psychotherapy
(MacLean 1990; Hart 2012; Lindvang and Beck 2017). Imagery experiences in GIM are
fine illustrations of how these levels are at work in the same session. In the slightly
altered state (facilitated by the deep relaxation before the music travel), multimodal
images are evoked during music listening; these can be correlated with alpha waves in
brain activity and with responses at the autonomic level of the nervous system, focused on
sensory perception and arousal regulation. Imagery is closely connected with emotions,
processed in the limbic system, while the ongoing dialogue between client and therapist
makes a verbal-metaphorical bridge to the frontal cortex system, focused on mentalization
(Fachner et al. 2015; Hunt 2011, 2015, 2017).
In Mrs. F’s travel, there are very few words. However, the imagery is intense and
concentrated on the existential question: I may die soon, how shall I approach this fact?
Emotionally, she moves between despair and hope, but in the music travel she experiences
a transformation of anxiety. Hope is activated when Death appears as a friend, not
a foe. This transformation is both emotional (relief and joy) and bodily (deep breathing,
serenity). The music travel of Mrs. L illustrates neuroaffective theory very clearly. First,
a sensory response indicates a change in perception (level one: autonomic); then emotions
arise (level two: limbic) and images are evoked; and, finally, the transformative
experience of being the elephant links to the frontal level—the client even mentalizes
the elephant as herself. The final phase of the GIM session—the postlude dialogue—is
the stage of integrating the neuroaffective levels by examining emotions, images, and
their connection with the theme in focus. Such existential experiences can lead to new
coping strategies and increased self-awareness.
Together with the growing research in music in everyday life, studies like this suggest
that GIM and other types of deep music listening have an almost unexplored health
potential and should be used in prophylactic projects. The transformative potential of
such experiences can be illustrated by results from a study of GIM with a nonclinical
population (Blom 2014; Bonde and Blom 2016). Ten participants volunteered for a project
presented as “Self-development through music and imagery.” Six participants had
previous GIM experience and were offered three sessions. Four participants had never
experienced GIM; they were offered five sessions. Advanced GIM music programs,
identified as potential sources of transformation, were used in the sessions. All programs
included strong and challenging music, with the purpose of inspiring and facilitating
existential and spiritual processes of transformation. Participants filled in
questionnaires on existential well-being, and they were interviewed about their experiences
together with the therapist (in so-called collaborative interviews); all session transcripts
were analyzed. This analysis documented that all ten participants used GIM to
facilitate deep existential work. They all reported strong experiences of beauty and
confirmation at a deep level of being. The experience of surrender (Blom 2014—
described later) could be documented for eight of ten participants in the sessions, and,
in the interviews, they described the seminal influence of these music and imagery
experiences on their inner and outer lives.
Such “strong music experiences” have been documented in literature from music
psychology, music sociology, ethnomusicology, and music therapy. One of the pioneer
researchers, the Swedish music psychologist Alf Gabrielsson (2011), collected more than
a thousand first-person reports on such experiences. He and his colleagues analyzed
them phenomenologically and developed a descriptive categorization of characteristics
and types. Gabrielsson and other Scandinavian researchers have looked into the health
potential of such experiences (Bonde et al. 2013; Lilliestam 2013). These studies indicate
that strong music experiences not only have existential meaning for the listener but also
are health promoting—especially when they are shared, such as in individual or group
therapy (Stern 2010).
The GIM experience is complex, and many types of theories are relevant as part of the
framework of understanding how music, imagery, guiding, drawing, and verbal processing
work together. Helen Bonny, the creator of GIM, thought of GIM as a transformational
practice, enabling even transpersonal experiences through music listening.
She was influenced by transpersonal psychology and worked for some years together
with Stanislav Grof at the Maryland Psychiatric Research Center on the selection of music
for experimental LSD sessions (Bonny 1975, 2002a, 2002b, 2002c). Her so-called Cut-log diagram
(Bonny 1975) is a theoretically based map of the mind, integrating layers and states of
consciousness known from the psychological theories of Freud, Jung, Grof, and Wilber.
The diagram reflects the enormous diversity of GIM experiences in thousands of travelers
and how the GIM experience can lead the client to many different layers or states in
the same session. The diagram and the theory have been further developed by other GIM
therapists (Goldberg 2002; Clark 2014). Clark (2014) documents how the original two-
dimensional model was expanded into three-dimensional “funnel” models (Bush 1995) and
a holographic model (Goldberg 2002); Clark herself suggests a “synthesis” model of
the “invisible, interpenetrating fields” of center and periphery, consciousness, music,
guide, and traveler.
Inspired by metaphor theory (Ricoeur 1978; Lakoff and Johnson 1980, 1999; Johnson 2007),
Bonde (2000, 2004, 2005) studied GIM experiences. Based on these studies,
I suggest that (mostly nonverbal) images are reported as metaphors in the traveler-guide
dialogue, and that images are configured in narrative scenes, episodes, or complete
narratives revealing embodied core metaphors and “scripts” that can be processed
therapeutically. In this type of dialogic music listening, imagery is reported as a metaphorical
narrative of experiences in other sensory modalities (Horowitz 1983, see later).
I also studied the relationship between music and imagery—among GIM practitioners
often described with the didactic metaphor of “music as cotherapist” (Bonde 2010;
Wärja and Bonde 2014). Based on a number of event structure analyses (see Table 21.1
for an example), I formulated a series of grounded theories (Bonde 2005, 2017), addressing
steps or stages in the therapeutic process and the roles and functions of the musical
elements (melody, harmony, rhythm, form, style, etc.) in GIM. Here are a few observations
from the theory of narrative patterns (in the imagery configuration) related to musical
structure (Bonde 2005, 2017): The clearer the narrative structure of the music is, the
clearer this will be reflected in the imagery. Music introducing higher intensity and tension
is reflected in the imagery in many ways: a change of perspective is seen, manifest
action may replace hesitation or a block, emotional outlets may follow reflections, sudden
insights (“messages”) are experienced, or the imagery develops in a new direction.
Examples can be seen in Table 21.1, for instance in the development of the “Death of an
elephant” story, where the intimate relationship between musical form and narrative
form is demonstrated. A ternary form in the music may impose a ternary narrative or
dramatic structure on the imagery. Simplicity and complexity are complementary in the
development of music and imagery. Simple musical forms with many repetitions tend to
stabilize the imagery, inviting extended descriptions and a differentiation of (emotional)
qualities, while complex or developmental forms with many changes or transformations
tend to impose a dynamic process on the imagery. This theory is closely related to how
DeNora (2011) understands GIM as a “laboratory” where music “provides structures for
formulating thought and . . . knowledge of the world”:
GIM is an excellent natural laboratory, a place in which to see how agents transfer
musical properties to extra-musical properties and how they come to understand
those extra-musical matters through the sonic structure of music, and in real time,
that is, in direct correlation with the unfolding musical event. (317)
Based on Bonny’s early ideas of the “profile of affective/energy dynamics” of the music in
GIM (1978b), I developed a basic classification of “therapeutic music in GIM,” distinguishing
between the specific intensity profiles of (1) supportive music, (2) mixed supportive/
challenging music, and (3) challenging music (Bonde 2005). The classification was later
developed into a “taxonomy” (Wärja and Bonde 2014) describing in more detail how the
ebb and flow of musical tension and release can be understood in a therapeutic context.
Theories of imagery form a controversial field in clinical psychology. What is imagery,
actually, and how is it related to imagination? The psychologist and
psychotherapist Horowitz (1983) presented a theory of mental representation, with
imagery in a central role. In this theory, there is a distinction between three modes of
representation—three types of “thinking.” According to Horowitz, enactive representation
is the “thinking of the body” and, mostly, this kind of knowledge is tacit and implicit, and
the first to be developed in the child. Image representation is next in the developmental
process, a specific way of processing information with the inner senses—with at least six
modalities: visual, auditory, sensory-kinesthetic, olfactory, gustatory, and emotional.
The last stage in the developmental process is thinking in words and concepts (logic
and numbers), what Horowitz calls lexical representation. Horowitz’s theory is a relevant
framework for the understanding of GIM experiences, where all three modes of representation
are active and where metaphors bridge them. It is also close to neuroaffective
theory. Thinking in multimodal images that are expressed verbally in metaphors and
narrative episodes is much more common and important than we normally assume, and
music is probably the most image-stimulating and image-evoking medium that exists. In dreams,
daydreams, and creative imaginative states of consciousness, imagery belongs to a
specific form of human creativity. In cognitive psychology, however, a heated debate has
been going on for decades about how to understand mental imagery and its role in
cognition (Kind 2006). There are two competing views: propositional and depictive
(descriptionalism versus pictorialism; the former claiming that images are represented
roughly in the way language is represented, the latter that images are represented roughly
in the same way as pictures). Based on my GIM studies, I am in line with Kosslyn and
colleagues (2006) who support the depictive view and contend not only that mental
images depict information but also that these depictions play a functional role in human
cognition (for example, problem solving, memory, creativity).
From the perspective of interpersonal psychology, the study by Blom (2011, 2014) takes
music and imagery (and GIM research and theory) to a new level. The study of
imagery in GIM has long focused on the content of the imagery, and systems of classification
have been suggested (Grocke 1999, 2007). As an alternative, Blom suggests that
the focus should be on process, based on the premise that music in GIM is a relational
agent, with the musical elements metaphorically serving as relational ingredients with
transformational potential. The therapeutic relationship (the triangle of music–therapist–
client) is the interpersonal framework of that process, including explicit and implicit
negotiation, disruption and repair, and moments of intense affectivity. Based on the
thorough analysis of music and imagery in ten nonclinical participants’ thirty-eight
music travels to advanced GIM music programs, she developed an intersubjective
understanding of the process of “surrender” in GIM. The processes and the shared
multimodal imagery can be divided into six categories, with the first three describing
basic ways of sharing (1. shared attention, 2. shared intention, 3. shared affectivity) while
the last three are genuine interpersonal experiences (4. confirmation, 5. nonconfirmation,
6. surrender or transcendence).
Imagery is only mentioned briefly in two recent handbooks of music psychology
(Hallam et al. 2009; Juslin and Sloboda 2011). However, in experimental music psychology
both modality-independent and modality-specific imagery have been studied.
The neuroscience of music has developed considerably over the last twenty years (Christensen
2012). Cognitive neuroscience has broadened our understanding of how music is
processed in the brain, and how the complex interplay of music and emotion involves all
three “systems” of the brain, as mentioned in the section on neuroaffective theory earlier.
However, there are not many neuroscientific studies of spontaneous, music-evoked
imagery or of GIM experiences. An early study by Lem (1999) presented a promising way
of using EEG to document brain activity during listening to a piece of music from the GIM
repertoire and correlating this with the imagery reported post hoc. In a recent neurophenomenological
study (Hunt 2017), a similar method was used to investigate brain
activity during music listening. The participants listened to music and a script focusing
on only one of six specific imagery modalities: body, visual, kinesthetic, interaction,
affect, and memory (Hunt 2017). In these studies, there was no dialogue and no verbal
reporting during music listening—the imagery cannot be reported immediately because
talking and movements disturb the EEG signal. Therefore, it has not until now been
possible to study brain activity in a naturalistic GIM setting. An ongoing study (Fachner
et al. 2015) has the ambition of solving the problem, at least partially. Two GIM sessions
were recorded in a naturalistic setting, and the traveler’s brain responses were EEG-
recorded during (1) rest, (2) relaxation/induction, and (3) the music travel. The verbal
dialogue was transcribed verbatim to enable an analysis of the imagery and its meaning.
Based on this analysis, core metaphors and episodes of special interest were identified,
and some of these were selected for EEG analysis, based on the premise that there should
be long enough periods of silence before and/or after the verbal report to enable an
uncompromised EEG signal. The analysis is ongoing, and a preliminary conclusion of
this neurometric EEG-LORETA case study was that the ASC (defined as alpha waves
or slower) induced in the relaxation phase has a marked influence on the music listening
process, and that ASC-related changes indicate a connection to visual imagery processing
during music listening in GIM. In the second phase of this study, EEG signals were
recorded from both therapist and client simultaneously and in a naturalistic setting. The
analysis is ongoing.
Discussion
Music therapy is not limited to clinical practice areas. Music therapy research is
recognized as a specific tradition in its own right within musicology (Ruud 2016).
References to music therapy are increasingly found in theories and studies in music
psychology (e.g., Juslin and Västfjäll 2008; Asutay and Västfjäll, this volume, chapter 18;
Eerola and Vuoskoski 2013), and music therapy theory contributes to the understanding
of musicking from a health perspective and an embodiment perspective (Bonde and Beck,
forthcoming 2019; Small 1998; Stige 2003). As shown earlier, many different theories
have been developed to explain the complex interplay of music, imagery, and the
interpersonal relationships in GIM. There is also a substantial body of research supporting
the effectiveness of GIM as a method of psychotherapy.
However, neuroscientific evidence of GIM as effective psychotherapy is still quite
sparse. Experimental studies using advanced technology in a laboratory to study
music and imagery are quite far from both the naturalistic GIM setting and everyday music
listening, and there is still a long way to go to document whether and how pivotal or
transformative imagery is correlated with changes in brain activity. Therefore, an important
design development (as described earlier) is to record the EEG of both traveler/client and
guide/therapist simultaneously. This can give valuable information on the neurological
nature of the interpersonal relationship in particular, and the interpersonal nature of
the GIM experience, as suggested by Blom (2014).
With her interpersonal theory of processes in GIM, Blom (2011, 2014) indirectly
contributes to a demystification of spiritual and transpersonal experiences that are often
reported in GIM. Blom gives these strong experiences of “surrender” a contemporary
relational psychological framework and her study indicates the health potential of
such experiences.
Most of the existing music and imagery research in music psychology investigates
imagination of intervals, melodies, and other musical elements in order to compare
them to the listening process (Hubbard 2010; Hubbard, volume 1, chapter 8). This kind
of experimental research has a long history; however, it often lacks ecological validity in
the contexts of receptive music therapy or everyday music listening. It is interesting that
“imagery” is not listed in the index of The Oxford Handbook of Music Psychology (Hallam
et al. 2009), and that “imagining” is only mentioned in the chapter on the psychology of
composition (Impett 2009). Kinesthetic-image schemas are mentioned in the chapter
on music and meaning (Cross and Tolbert 2009), with references to the cognitive
metaphor theory by Lakoff and Johnson (mentioned earlier), listed as an example of an
experientialist approach to music and meaning. Even though the handbook has many chapters
on music and emotion, imagery is not an element in them. In the Handbook of Music
and Emotion (Juslin and Sloboda 2011), imagery is included in the index and discussed
in two chapters. Woody and McPherson (2011) describe how musicians use imagery and
metaphors to evoke emotions for performance. Gabrielsson (2011) reports from his
study of “strong experiences with music” (mentioned earlier) that imagery is
often reported by the listeners/informants. Juslin and Västfjäll (2008) include imagery
in their promising BRECVEMA model (described earlier); however, they only mention
visual imagery and, as we have seen from the empirical data, imagery is multimodal, not
only visual. As shown by McNorgan (2012), each imagery modality has both general and
specific neural correlates and therefore contributes to meaning in a unique way.
I think the relative absence of empirical, naturalistic music and imagery studies
in neuroscience reflects a dominating, more or less traditional, postpositivist approach
to research in music listening. The more actual listening reports are included in the
research, the more imagery comes to the foreground. What is suggested here is that
research in music listening should be much more focused on naturalistic settings
and that the study of multimodal imagery can be a key to broadening our understanding
not only of GIM and other receptive music therapy methods (Hunt 2015) but also
of music listening as—in DeNora’s words—“a technology of the self ” in everyday life
(DeNora 2000, 2007, 2011) and as a genuine health resource (Ekholm et al. 2016a, 2016b;
Bonde et al. 2018). Cognitive neuroscience and neurophenomenology can contribute
to this if researchers take the epistemological stance that the first-person and the
third-person perspectives are equally important (Hunt 2015).
Conclusion
Music imaging is a natural phenomenon that can be encouraged and used in many
different ways and contexts, including music education (Halpern and Overy, this volume,
chapter 19). It is used in therapy (e.g., in GIM) to stimulate the client’s creative imagination
and ability to change or transform inappropriate patterns of attachment and emotion
regulation, but it is also used in everyday life as what Tia DeNora calls “a technology
of the self.” Using the concepts of the ecological psychologist James Gibson, we can say that
music affords imaging and music imaging can be appropriated in multiple ways, for
creative-imaginative purposes as well as for the regulation of physical, psychological,
and spiritual well-being. What Even Ruud (2010) calls “listening self-care” and “musical
self-medication” are typical forms of appropriations. Music imaging is both a mode of
thinking (based on introjection of patterns afforded by the musical material) and a mode
of expression (affording the projection of personal material of all sorts on the music).
Music listening in GIM therapy is of course not “music listening” per se. Client
experiences are highly personal, even idiosyncratic, and the therapeutic focus is always
more important in the context than the aesthetic qualities of the music. However, GIM
experiences are good examples of music’s affordances and appropriations (DeNora 2000).
With the therapist’s support, the GIM client takes from the music what is needed to explore
salient physical, psychological, social, existential, or spiritual issues. The combination of
music and imagery is not just relevant in a clinical context, even if “image listening” has
been regarded as irrelevant by musicology until recently; the experience of multimodal
imagery while listening to music is inherently human and has great potential as a health
resource. Before creating the Bonny Method of GIM, Helen Bonny worked together
with the Canadian musicologist Louis Savary on a project called “Listening with a new
consciousness” (Bonny and Savary 1973). This book presents many scripts for guided
“music travels” in group formats, with target groups ranging from school children to
religious groups. The GIM therapist Carol Bush developed “GIM on your own” (1995) as a
method for self-development. The study of imagery during music listening is increasingly
being integrated into music psychology, and early evidence from neuroscience supports
the prophylactic potential of music and imagery work. In other words, GIM is a well-
documented example of “sound imagination” contributing to a new perspective or
paradigm that Tia DeNora calls MusEcological (DeNora 2011).
References
Abrams, B. 2002. Transpersonal Dimensions of the Bonny Method. In Guided Imagery and
Music: The Bonny Method and Beyond, edited by K. E. Bruscia and D. E. Grocke, 339–358.
Gilsum, NH: Barcelona Publishers.
Blom, K. M. 2011. Transpersonal—Spiritual BMGIM Experiences and the Process of Surrender.
Nordic Journal of Music Therapy 20 (2): 185–203.
Blom, K. M. 2014. Experiences of Transcendence and the Process of Surrender in Guided
Imagery and Music (GIM). PhD thesis. Aalborg University. http://vbn.aau.dk/files/
204635175/Katarina_Martenson_Blom_Thesis.pdf. Accessed December 29, 2018.
Bonde, L. O. 2000. Metaphor and Narrative in Guided Imagery and Music. Journal of the
Association for Music and Imagery 7: 59–76.
Bonde, L. O. 2005. The Bonny Method of Guided Imagery and Music (BMGIM) with Cancer
Survivors: A Psychological Study with Focus on the Influence of BMGIM on Mood and
Quality of Life. PhD thesis, Aalborg University. http://www.wfmt.info/Musictherapyworld/
modules/archive/dissertations/pdfs/Bonde2005.pdf. Accessed December 28, 2018.
Bonde, L. O. 2007. Imagery, Metaphor and Perceived Outcomes in Six Cancer Survivors’
BMGIM Therapy. In Qualitative Inquiries in Music Therapy, Vol. 3, edited by A. Meadows,
132–164. Gilsum, NH: Barcelona Publishers.
Bonde, L. O. 2010. Music as Support and Challenge. Jahrbuch Musiktherapie Bd. 6,
Imaginationen in der Musiktherapie, 89–118. Wiesbaden: Reichert Verlag.
Bonde, L. O. 2017. Embodied Music Listening. In The Routledge Companion to Embodied
Music Interaction, edited by M. Lesaffre, M. Leman, and P.-J. Maes, 269–277. London:
Routledge.
Bonde, L. O., and B. D. Beck. 2019 (forthcoming). Imagining Nature during Music Listening.
An Exploration of the Meaning, Sharing and Therapeutic Potential of Nature Imagery in
Guided Imagery and Music. In Nature in Psychotherapy and Arts-Based Therapy, edited by
E. Pfeifer and H.-H. Decker-Voigt. Giessen: Psychosozial Verlag.
Bonde, L. O., and K. M. Blom. 2016. Music Listening and the Experience of Surrender: An
Exploration of Imagery Experiences Evoked by Selected Classical Music from the Western
Tradition. In Cultural Psychology of Musical Experience, edited by H. Klempe, 207–234.
Charlotte, NC: Information Age Publishing.
Bonde, L. O., O. Ekholm, and K. Juel. 2018. Associations between Music and Health-Related
Outcomes in Adult Non-Musicians, Amateur Musicians and Professional Musicians—
Results from a Nationwide Danish Study. Nordic Journal of Music Therapy 27 (4): 262–282.
Bonde, L. O., M. S. Skånland, E. Ruud, and G. Trondalen. 2013. Musical Life Stories: Narratives
on Health Musicking. Oslo: Skriftserie fra Senter for musikk og helse.
Bonny, H. L. 1975. Music and Consciousness. Journal of Music Therapy 12: 121–135.
Bonny, H. L. 1978a. GIM Monograph #1: Facilitating GIM Sessions. Salina, KS: Bonny
Foundation.
Bonny, H. L. 1978b. GIM Monograph #2: The Role of Taped Music Programs in the GIM Process.
Salina, KS: Bonny Foundation.
Bonny, H. L. 2002a. Autobiographical Essay. In Music and Consciousness: The Evolution of
Guided Imagery and Music, edited by L. Summer, 1–18. Gilsum, NH: Barcelona Publishers.
Bonny, H. L. 2002b. The Early Development of Guided Imagery and Music (GIM). In Music
and Consciousness: The Evolution of Guided Imagery and Music, edited by L. Summer, 53–68.
Gilsum, NH: Barcelona Publishers.
Bonny, H. L. 2002c. Guided Imagery and Music (GIM): Discovery of the Method. In Music
and Consciousness: The Evolution of Guided Imagery and Music, edited by L. Summer, 43–52.
Gilsum, NH: Barcelona Publishers.
Bonny, H., and L. Savary. 1973. Music and Your Mind: Listening with a New Consciousness.
New York: Harper & Row.
Bruscia, K. E. 2002. The Boundaries of Guided Imagery and Music (GIM) and the Bonny
Method. In Guided Imagery and Music: The Bonny Method and Beyond, edited by
K. E. Bruscia and D. E. Grocke, 37–61. Gilsum, NH: Barcelona Publishers.
Bush, C. 1995. Healing Imagery and Music: Pathways to the Inner Self. Portland, OR: Rudra Press.
Christensen, E. 2012. Music Listening, Music Therapy, Phenomenology and Neuroscience.
PhD thesis. Aalborg: Aalborg University. http://vbn.aau.dk/files/68298556/MUSIC_
LISTENING_FINAL_ONLINE_Erik_christensen12.pdf. Accessed May 7, 2017.
Clark, M. 2014. A New Synthesis Model of the Bonny Method of Guided Imagery and Music.
Journal of the Association for Music and Imagery 14: 1–22.
Clarke, D., and E. Clarke. 2011. Music and Consciousness: Philosophical, Psychological, and
Cultural Perspectives. Oxford: Oxford University Press.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical
Meaning. Oxford: Oxford University Press.
Clarke, E. 2011. Music Perception and Musical Consciousness. In Music and Consciousness.
Philosophical, Psychological, and Cultural Perspectives, edited by D. Clarke and E. Clarke,
193–213. Oxford: Oxford University Press.
Clarke, E., T. DeNora, and J. Vuoskoski. 2015. Music, Empathy and Cultural Understanding.
Physics of Life Reviews 15: 61–88. https://doi.org/10.1016/j.plrev.2015.09.001.
Cross, I., and E. Tolbert. 2009. Music and Meaning. In The Oxford Handbook of Music Psychology,
edited by S. Hallam, I. Cross, and M. Thaut, 33–46. Oxford: Oxford University Press.
DeNora, T. 2000. Music in Everyday Life. Cambridge: Cambridge University Press.
DeNora, T. 2007. Health and Music in Everyday Life—A Theory of Practice. Psyke and Logos
28 (1): 271–287.
DeNora, T. 2011. Practical Consciousness and Social Relation in MusEcological Perspective.
In Music and Consciousness: Philosophical, Psychological, and Cultural Perspectives, edited
by D. Clarke and E. Clarke, 309–326. Oxford: Oxford University Press.
Eerola, T., and J. K. Vuoskoski. 2013. A Review of Music and Emotion Studies: Approaches,
Emotion Models, and Stimuli. Music Perception: An Interdisciplinary Journal 30 (3): 307–340.
Ekholm, O., K. Juel, and L. O. Bonde. 2016a. Associations between Daily Musicking and
Health: Results from a Nationwide Survey in Denmark. Scandinavian Journal of Public
Health 44 (7): 726–732. https://doi.org/10.1177/1403494816664252.
Ekholm, O., K. Juel, and L. O. Bonde. 2016b. Music and Public Health—An Empirical Study of
the Use of Music in the Daily Life of Adult Danes and the Health Implications of Musical
Participation. Arts and Health 8 (2): 154–168. https://doi.org/10.1080/17533015.2015.1048696.
Fachner, J., E. Ala-Ruona, and L. O. Bonde. 2015. Guided Imagery in Music—A Neurometric
EEG/LORETA Case Study. In Proceedings of the Ninth Triennial Conference of the European
Society for the Cognitive Sciences of Music, 17–22 August 2015, edited by J. Ginsborg,
A. Lamont, M. Phillips, and S. Bramley. Manchester, UK: Society for the Cognitive Sciences
of Music (ESCOM).
Gabrielsson, A. 2011. Strong Experiences with Music: Music Is Much More Than Just Music.
Oxford: Oxford University Press.
Gibson, J. J. 1983. The Senses Considered as Perceptual Systems. Westport, CT: Greenwood Press.
Goldberg, F. S. 2002. A Holographic Field Theory Model of the Bonny Method of Guided
Imagery and Music (BMGIM). In Guided Imagery and Music: The Bonny Method and Beyond,
edited by K. E. Bruscia and D. E. Grocke, 359–377. Gilsum, NH: Barcelona Publishers.
Grocke, D. 1999. A Phenomenological Study of Pivotal Moments in Guided Imagery and
Music (GIM) Therapy. PhD thesis. Melbourne: Faculty of Music, The University of
Melbourne. In Music Therapy Info CD-Rom III, edited by D. Aldridge. Witten: Universität
Witten/Herdecke.
Grocke, D. 2010. An Overview of Research in the Bonny Method of Guided Imagery and
Music. Voices: A World Forum for Music Therapy 10 (3). https://voices.no/index.php/voices/
article/view/1886/1651. Accessed December 28, 2018.
Grocke, D., and T. Wigram. 2007. Receptive Methods in Music Therapy: Techniques and Clinical
Applications for Music Therapy Clinicians, Educators, and Students. London: Jessica Kingsley.
Grocke, D., and T. Moe. 2015. Guided Imagery and Music: A Spectrum of Approach. London:
Jessica Kingsley.
Hallam, S., I. Cross, and M. Thaut. 2009. The Oxford Handbook of Music Psychology. Oxford:
Oxford University Press.
Hart, S. 2012. Neuroaffektiv psykoterapi med voksne [Neuroaffective Psychotherapy with
Adults]. Copenhagen: Hans Reitzels Forlag.
Hevner, K. 1936. Experimental Studies of the Elements of Expression in Music. American
Journal of Psychology 48: 246–268.
Horowitz, M. 1983. Image Formation and Psychotherapy. New York: Jason Aronson.
Hubbard, T. L. 2010. Auditory Imagery: Empirical Findings. Psychological Bulletin 136 (2): 302.
Hunt, A. M. 2011. A Neurophenomenological Description of the Guided Imagery and Music
Experience. PhD thesis. Philadelphia, PA: Temple University.
Hunt, A. 2015. Boundaries and Potentials of Traditional and Alternative Neuroscience
Research Methods in Music Therapy Research. Frontiers in Human Neuroscience 9:
342. doi:10.3389/fnhum.2015.00342.
Hunt, A. 2017. Protocol for a Neurophenomenological Investigation of a Guided Imagery and
Music Experience (Part II). Music and Medicine 9 (2): 116–127.
Impett, J. 2009. Making a Mark: The Psychology of Composition. In The Oxford Handbook of
Music Psychology, edited by S. Hallam, I. Cross, and M. Thaut, 651–666. Oxford: Oxford
University Press.
Johnson, M. 2007. The Meaning of the Body: Aesthetics of Human Understanding. Chicago,
IL: University of Chicago Press.
Juslin, P. N., and J. A. Sloboda. 2011. Handbook of Music and Emotion. Oxford: Oxford
University Press.
Juslin, P. N., and D. Västfjäll. 2008. Emotional Responses to Music: The Need to Consider
Underlying Mechanisms. Behavioral and Brain Sciences 31: 559–575.
Juslin, P. N., G. Barradas, and T. Eerola. 2015. From Sound to Significance: Exploring the
Mechanisms Underlying Emotional Reactions to Music. American Journal of Psychology
128 (3): 281–304.
Kind, A. 2006. Imagery and Imagination. Internet Encyclopedia of Philosophy, 1–19. https://
www.iep.utm.edu/imagery/. Accessed December 29, 2018.
Kosslyn, S. M., W. L. Thompson, and G. Ganis. 2006. The Case for Mental Imagery. Oxford:
Oxford University Press.
Lakoff, G., and M. Johnson. 1980. Metaphors We Live By. Chicago and London: University of
Chicago Press.
Lakoff, G., and M. Johnson. 1999. Philosophy in the Flesh: The Embodied Mind and Its Challenge
to Western Thought. New York: Basic Books.
Lem, A. 1999. Selected Patterns of Brainwave Activity Point to the Connection between
Imagery Experiences and the Psychoacoustic Qualities of Music. In Music Medicine, Vol. 3,
edited by R. R. Pratt and D. E. Grocke, 75–87. Melbourne: University of Australia.
Lilliestam, L. 2013. Music, the Life Trajectory and Existential Health. In Musical Life Stories:
Narratives on Health Musicking, edited by L. O. Bonde, E. Ruud, M. Skånland, and
G. Trondalen, Anthology #6, 17–39. Oslo: Publications from the Centre for Music and
Health.
Lindvang, C., and B. D. Beck. 2017. Musik, krop og følelser: Neuroaffektive processer i musikterapi
[Music, Body, and Emotions. Neuroaffective Processes in Music Therapy]. Copenhagen:
Frydenlund Academic.
MacLean, P. D. 1990. The Triune Brain in Evolution: Role in Paleocerebral Functions.
New York: Plenum.
Marr, J. 2001. The Use of the Bonny Method of Guided Imagery and Music in Spiritual Growth.
Journal of Pastoral Care 55 (4): 397–406.
McKinney, C., and T. Honig. 2017. Health Outcomes of a Series of Bonny Method of Guided
Imagery and Music Sessions: A Systematic Review. Journal of Music Therapy 54 (1): 1–34.
McNorgan, C. 2012. A Meta-Analytic Review of Multisensory Imagery Identifies the Neural
Correlates of Modality-Specific and Modality-General Imagery. Frontiers in Human
Neuroscience 2012 (6): article 285.
Ricoeur, P. 1978. The Rule of Metaphor: Multi-Disciplinary Studies of the Creation of Meaning
in Language. London: Routledge & Kegan Paul.
Ruud, E. 2010. Music Therapy: A Perspective from the Humanities. Gilsum, NH: Barcelona
Publishers.
Ruud, E. 2016. Musikkvitenskap. Oslo: Universitetsforlaget.
Small, C. 1998. Musicking: The Meanings of Performing and Listening. London: Wesleyan
University Press.
Stern, D. 2010. Forms of Vitality: Exploring Dynamic Experience in Psychology, the Arts,
Psychotherapy and Development. Oxford: Oxford University Press.
Stige, B. 2003. Elaborations toward a Notion of Community Music Therapy. Oslo: Unipub.
Summer, L. 2002. Group Music and Imagery Therapy: Emergent Receptive Techniques in
Music Therapy Practice. In Guided Imagery and Music: The Bonny Method and Beyond,
edited by K. E. Bruscia and D. E. Grocke, 297–306. Gilsum, NH: Barcelona Publishers.
Summer, L. 2009. Client Perspectives on the Music in Guided Imagery and Music (GIM).
PhD thesis. Aalborg University. http://vbn.aau.dk/files/112202270/6467_lisa_summer_
thesis.pdf. Accessed December 29, 2018.
Tesch, R. 1990. Qualitative Research: Analysis Types and Software Tools. London: Falmer Press.
Wärja, M., and L. O. Bonde. 2014. Music as Co-Therapist: Towards a Taxonomy of Music in
Therapeutic Music and Imagery. Music and Medicine 6 (2): 16–27.
Woody, R. H., and G. E. McPherson. 2011. Emotion and Motivation in the Lives of Performers.
In Handbook of Music and Emotion, edited by P. N. Juslin and J. A. Sloboda, 401–424.
Oxford: Oxford University Press.
chapter 22
Empirical Musical Imagery beyond the “Mind’s Ear”
Freya Bailes
Introduction
Many empirical studies of musical imagery begin by defining their subject as music
“heard” by the mind’s ear, before swiftly acknowledging the importance of additional
musical dimensions to the sonic. In defense of this approach, there are good reasons to
emphasize the auditory components of imaged music when defining it, since mental
imagery is generally understood to be a visual phenomenon. Even those who have
previously encountered the term “musical imagery” might conceive of it as a primarily
visual image accompanying heard music, as in its therapeutic use in Guided Imagery and
Music (see Bonde, this volume, chapter 21). An alternative approach to communicating
the intended meaning of musical imagery is to provide examples, which might
include having an “earworm,” mentally continuing music that has stopped, audiating a
musical score (see Halpern and Overy, this volume, chapter 19), mentally rehearsing for
a music performance, or imagining1 a new composition. None of these examples is
prescriptive with respect to the sensory modalities that might be represented in imagi-
nation, but neither do they indicate what might be imaged in addition to sound, and this
chapter aims to explore the multimodality of our imagery for music.
Returning to attempts to define musical imagery, in Bailes (2007) I explicitly refer to
the “mind’s ear,” defining musical imagery as “the experience of imagining musical
sound in the absence of directly corresponding sound stimulation from the physical
environment” (555). While this definition encapsulates the notion of simulating sensory
experience, it focuses exclusively on sound. Beaty and colleagues (2013) describe musical
imagery as “melodies of the mind” (1163), a neutral expression though one that suggests a
passive occurrence. In their study of involuntary musical imagery, Jakubowski and
colleagues (2015) refer to a “mental replay of music” (1229), while Liikkanen (2012)
poetically describes musical imagery as “a mental soundscape audible for our ‘inner
ear’ ” (236). Weber and Brown (1986) introduced musical imagery as “a particular form
of auditory imagery in which one imagines a melody or song . . . the ability to imagine,
among other things, tonal progressions” (411).
Researchers must account for an increasing body of evidence to suggest that rather
than merely imaging the sound of music, we image visual and kinesthetic dimensions of
musical experience as well.2 In this chapter, I begin with an introduction to theories
whereby our seemingly disembodied mental imagery can instead be understood in
relation to embodied cognition. I will revisit the findings of empirical studies of musical
imagery to determine the extent to which embodied cognition could hold explanatory
power, before considering recent work that has directly tested hypotheses relating
body movement to musical imagery, and outlining a number of possible directions for
future research.
Embodied Cognition and Mental Imagery
Others have already reflected on how mental imagery might relate to our embodied
experience. One theoretical position of relevance to an embodied account of mental
imagery is experiential cognition (Reybrouck 2001), which posits that our represen-
tation of the world is generated by an interaction between environmental input and
our capacity to represent it. Our bodies are our most immediate environments, and
our physicality in turn governs our interaction with the wider environment. In their
seminal text on embodied cognition, Varela and colleagues (1991) emphasize the
dependence of minds on bodies that are characterized by certain sensorimotor
capacities. For them, embodied “means reflection in which body and mind have been
brought together” (Varela et al. 1991, 27). By this argument, the apparently disembodied
mental simulation of sensorimotor experience is necessarily conditioned by our
physical experiences of the world.
In parallel work, there is increasing evidence to support theories that our perceptions
are influenced by the possible actions afforded by what we perceive (Gibson 1986;
Hubbard 2013). According to these theories, perceiving the actions of another will
activate motor plans of our own (Schiavio et al. 2014). In this way, listening to music
implies the actions associated with its production (Cox 2001; Reybrouck 2001). The
motor theory of perception originated as a theory of language perception, and has been
invoked to explain the influence of motor constraints on our representations of verbal
stimuli (Hubbard 2013). Callan and colleagues (2006) extended the concept to suggest the
existence of a motor theory of music perception, to account for their findings of activation
of the motor cortex in both covert speech and song.
• That there is a strong link between our knowledge of sound and sound sources,
both in perception and cognition, so that features of sound are in most cases
related to features of sound-production, sound-production here understood as
including both the sound-producing action and the features of the resonant bod-
ies and environments. And, as an extension of this:
• That images of sound-production, including visual, motor, tactile etc. elements,
may actually trigger images of sound, and conversely, that images of sound may
trigger images of sound-production. (Godøy 2001, 238)
As an extension of this idea, Godøy (2001) also suggests that the greater our under-
standing of how sounds are produced, the greater the likelihood of their salience as
auditory imagery.3 This leads to my prediction that the degree to which musical imagery
is embodied lies along a continuum ranging from imaging oneself performing a clearly
defined and rehearsed sonic output at one extreme, to imaging the timbre of an artificially
produced sine wave, which the human body could not produce without recourse
to digital means, at the other. A pertinent question arises as to whether our musical
imagery can ever be so abstracted from its origins in sound production as to be effectively
disembodied. In line with the theoretical propositions of embodied cognition
(e.g., Varela et al. 1991; Niedenthal et al. 2005), I argue that musical imagery cannot be fully
disembodied. As embodied minds, our thoughts are inseparable from our sensorimotor
experience, and in the absence of personal experience in producing specific sounds, we
draw on our knowledge of the actions required to make similar sounds to infer and
image the sorts of articulatory gestures involved in their making (Cox 2001; Godøy 2001;
Godøy, this volume, chapter 12).
Offline Cognition
relating to our bodily senses, we might expect offline music cognition to reflect these
same bodily concerns.
The distinction between online and offline cognition contrasts sensorimotor processing
with ideomotor simulation respectively. Relevant to the motor theory of perception
outlined above, Reybrouck (2001) argues that:
It makes a difference . . . as to both the intensitiy [sic] and precision of the covert
movements (the ideomotor simulation) if the subject who tries to imagine a certain
musical structure is an expert or a layman. Subjects who received formal musical
training can use this explicit musical knowledge and will easily imagine all the
motor processes that are connected with the production of the sounds. (129)
since not all music is vocally produced (Cox 2001; see Hubbard volume 1, chapter 8), and
yet there is compelling evidence that we are able to simulate and image a variety of
musical sounds, which I will now review.
We have seen a move in the cognitive sciences toward an embodied account of our
mental activity (Niedenthal et al. 2005; Glenberg et al. 2013), and other chapters in this
handbook reflect this focus (see Christensen, this volume, chapter 1; Huvenne, volume 1,
chapter 30; Saslaw and Walsh, this volume, chapter 7). I will now revisit the findings of
past empirical studies of musical imagery from this embodied perspective. The purpose
of this review is not to prove or disprove the embodiment of musical imagery, since such
an approach is methodologically untenable, and it would be impossible to refute the
argument that our minds are embodied. Rather, the purpose is to determine whether
embodied cognition could have explanatory power with respect to the findings of
musical imagery studies that were not necessarily designed to test such theories. It
should be noted that our retrospective view of the indicators of embodied imagery is
probably obscured by the neglect of past researchers to enquire about, or express an
interest in, those imagery parameters that might reflect embodiment. A similar point
has been made by Hubbard (2013) regarding empirical studies of auditory imagery that
do not habitually ask participants about their concurrent experiences of visual imagery.
However, some studies of musical imagery are directly concerned with bodily
involvement, since they focus on music performance. These will be reviewed first,
before a review of studies in which musical imagery occurs during composition and
listening, in voluntary musical imagery tasks, and during involuntary musical imagery.
Imagery in Performance
In order to perform almost all forms of music,4 one must move to produce the sound.
This fundamental auditory-motor association appears to be represented in imagery,
with increasing behavioral and brain imaging evidence being consistent with a role for
kinesthetic imagery in music performance (see Lotze 2013, for a review). However, such
auditory-motor associations must first be formed through experience of action in our
sonic environment and, in the case of expert musicians, through repeated and deliberate
musical enactment. In research by Lotze and colleagues (2003), professional violinists
scored higher than amateur violinists for the vividness of their movement imagery and,
at a neural level, they showed increased brain activations in the representation areas of
the fingers during an imagined performance of Mozart’s violin concerto in D Major.
To aid in their memorization of pitch sequences, trained music students are able
to use finger tapping, as though tapping on a keyboard. This motor-encoding strategy
appeared to reinforce their representation of the auditory stimuli (Mikumo 1994). In a
study of pianists’ uses of musical imagery for expressive parameters during performance,
we also found evidence that musical imagery was strengthened when the pianists were
able to play on a silent piano keyboard, thus providing motor reinforcement (Bishop
et al. 2013). There is evidence that during imaged song, the motor cortex is activated. For
instance, Callan and colleagues (2006) asked participants in a functional magnetic
resonance imaging (fMRI) study of the brain regions involved in perceived and imaged
speech and song to covertly sing (i.e., image) stimuli cued by visually presented lyrics.
Even though the task made no explicit motor demands, it seems that the song imagery
was nevertheless embodied.
An important means by which musical imagery can facilitate performance is by
enhancing the ability to anticipate upcoming events. This was the focus of work by
Keller and colleagues (2010), whose findings were consistent with the use of auditory
imagery to enable action planning. Specifically, their method allowed them to relate
anticipatory imagery for specific pitches to the accuracy of the actions required to
“perform” them. They concluded that cross-modal (i.e., auditory, visual, motor) ideomotor
processes were in operation, which would be consistent with an embodied representa-
tion of the pitch-space array. In a related study, Keller and Appel (2010) investigated the
role of anticipatory auditory imagery in ensemble performance. The auditory imagery
abilities of the duo pianists related to the quality of their coordination, regardless of
whether or not they were able to see each other as they performed. The authors again
postulate a role for ideomotor processes, suggesting that auditory imagery enhances the
operation of internal models that simulate the action of both oneself and others. Indeed,
learning the part of one’s duo partner by rehearsing it can be detrimental when it comes
to subsequently performing the duo with them, since an embodied representation of
the partner’s part, which necessarily differs from one’s own interpretation, can hinder
coordination (Ragert et al. 2013).
Another study of imagery for performance is a participant observation study of an
extended masterclass led by Nelly Ben-Or for expert pianists (Davidson-Kelly et al. 2015).
Central to Ben-Or’s approach is the use of multimodal musical imagery during per-
formance preparation. Eleven participants in her five-day masterclass were observed
and interviewed about their experiences, and a follow-up questionnaire was also given
out nine months later. A thematic analysis of the resulting data led to the articulation of
key elements of Ben-Or’s pedagogy. The principal feature is that performers should
memorize the music before physically rehearsing it. While this might seem to be an
extreme of disembodiment, the opposite could be said of the mental imagery that is
consequently required of the pianists, since in order to memorize a performance piece,
auditory, motor, and visual aspects must be integrated. Nelly Ben-Or herself explains
that the memory formed during deliberate imagery rehearsal is “a kind of memory that
includes an inner sense of the action of playing that music which I see [and it] has to
include a vision of the keyboard” (Davidson-Kelly et al. 2015, 86). The authors of the
study explore possible cognitive mechanisms by which “total inner memory” might
enable effective performance. In particular, they suggest that the mental focus on the
distal performance goal afforded by the multimodal image could enhance a close
connection with the sound without the potentially disruptive effects of attending to
proximal issues of technical production. While Ben-Or’s instruction prioritizes nonmotor
tasks, an embodied understanding of sound production is assumed, so that the motor
aspects of performance automatically fall into place as long as the musical image is
complete. Interestingly, the pianists who participated in this study increased their
ratings of the importance of imagining movement during performance preparation
following the masterclass.
In an experience sampling survey of the everyday experiences of musical imagery
(Bailes 2007), music students reported imaging music in the course of their daily life
that they had recently performed, and also music that they were preparing for an
upcoming performance. The extent to which musicians are more inclined to imagine
music associated with performance than music they do not normally enact remains an
open question. It seems appropriate to look to the relationship of music to dance for
confirmation of mental kinesthetic-musical links. In an experimental study of memory
for music and dance (Mitchell and Gallagher 2001), participants were visually presented
with sequences alternating music and dance stimuli. Some participants reported mentally
accompanying the silent dance performances with the previously presented musical
stimulus. In other words, there was a reported tendency to match performed movement
with imaged sound.
A recent fMRI study of the brain activity associated with mentally transforming
imagery for melodies found that some regions associated with motor control were
activated (Foster et al. 2013). In this study, participants were instructed to mentally
transpose or reverse melodies, and then judge whether the subsequently presented
comparison stimulus matched their transformed image. Activation was found in the
intraparietal sulcus (IPS), forming part of the posterior parietal cortex (PPC), which is
connected to both working memory and motor-planning centers of the brain. The
authors also found significant clusters in the supplementary motor area (SMA) when they contrasted the
reversed condition with the control, as well as consistent activations in pre-SMA during
both types of melody transformation task. While the authors do not speculate on their
findings of SMA and pre-SMA activation, the involvement of motor centers in such
an ostensibly mental task could be indicative of a link with covert production or
ideomotor simulation.
was found between musical training and the self-report measures of INMI employed in
this study. Active musical engagement is important in relation to INMI (Williamson
et al. 2011), and Liikkanen (2012) found that INMI was associated with exercising. In
earlier work, I found that music students reported musical imagery during activities
that involve motion, such as when traveling or getting up in the morning (Bailes 2006).
Music students describing their experiences of musical imagery report concurrent
visual and motor dimensions (Bailes 2007), as did the musically experienced interviewees
in an INMI study by Williamson and Jilka (2014).
If we are more inclined to image music that we are able to sing than music that we are
not able to sing, then our musical imagery should reflect the characteristics of vocal
music. Work by Burgoyne5 and colleagues is consistent with this argument. They have
been using online gaming to gather data about the catchiness of music: gamers indicate their familiarity with popular music and then judge whether the music's continuation after a period of silence is correct, a task that requires them to mentally continue the music and compare the subsequently presented snippet with that mental continuation. Using sophisticated algorithms, they have been able to establish the salient musical parameters that make such music catchy: melodic repetition, vocal prominence, melodic conventionality, and melodic range conventionality.
The propensity for certain popular music to be experienced as an “earworm” is not
explained by its popularity (chart position) or exposure (recent runs) alone (Jakubowski
et al. 2016). Perhaps it is of significance that it is music that can be readily sung that sticks
in our memory.
Floridou and Müllensiefen (2015) used experience-sampling methods to explore the
conditions that predict INMI. Respondents in their study were asked not only about
their experiences of INMI, but also about mind wandering. Responses were modeled in
relation to contextual factors such as the activity that participants were undertaking
when they were contacted. One finding was a statistical relationship between mind wandering and INMI, suggesting that mind wandering is a prerequisite for INMI. In turn, mind wandering was statistically linked to the activity that respondents were engaged in when their experience was sampled. Given that physical movement was one of the activities found to favor mind wandering, this research could point to a bodily initiation of a chain of effects running from activity to mind wandering to INMI. However,
a replication and extension of my earlier (Bailes 2007) empirical study of musical
imagery in everyday life did not confirm the previously found relationship between the
activity that respondents were engaged with and their propensity to imagine music
(Bailes 2015). This more recent work sampled the experiences of members of the general
public rather than university music students. Perhaps the association between activity
and imagery found in the earlier work relates to the theoretically stronger auditory-
motor associations that result from musical training.
An increasingly frequent suggestion in studies of INMI is that arousal state plays a
role in its occurrence. In an experience sampling study of the phenomenology of musical
imagery in its everyday occurrence, Beaty and colleagues (2013) report that participants
imaged music more when they felt happy or worried, but not sad. While happy and worried
represent emotions that are high in arousal, sad is typically considered to be a low arousal
state. As a result of their interview study, Williamson and Jilka (2014) speculate, “INMI
may have a functional relationship with arousal state whereby it can be triggered uncon-
sciously in order to modulate a person’s psychophysiological arousal level” (666). It
seems that the body might play a variety of different roles when it comes to shaping our
everyday experiences of imaging music: physical enactment contributes to an embodied
memory for music; our physical capabilities facilitate imagery for music that we can
produce with our bodies; INMI could function to moderate our physiological arousal;
and activities involving motion are associated with musical imagery.
In my experience sampling study of everyday musical imagery occurrences (Bailes 2015),
I was interested in the mood of participants at the times they were observed. Mood
scales were included to measure the respondents’ positivity, present-mindedness, and
arousal (alert-drowsy, energetic-tired). A model of mood ratings during musical
imagery episodes found that respondents were unlikely to report imaging music when
they felt drowsy. The relationship between INMI and subjective arousal is relevant to
work by Jakubowski and colleagues (2015), who tracked the tempo of imaged music by
asking respondents to tap it as it occurred in everyday life, with measurements recorded
by a wrist-worn accelerometer. Participants further noted information about their
circumstances at the time in a diary. While no measure of the physiological arousal of
the respondents was recorded, we do have information about their subjective ratings of
arousal, and these were found to be significantly related to the tempo of the music that
they tapped. This is in keeping with one of the four factors of the newly created Involuntary Musical Imagery Scale (IMIS),
“Movement”6 (Floridou et al. 2015). A factor analysis of answers to a self-report inventory
of individual differences in INMI grouped the following movement items together:
“The rhythms of my earworms match my movements,” “The way I move is in sync with
my earworms,” and “When I get an earworm I move to the beat of the imagined music.”
This “Movement” factor was subsequently found to correlate with a number of other
existing measures. Notable correlations occurred with the reported frequency of
experiencing INMI and the Bucknell Auditory Imagery Scale-Vividness (BAIS-V)
(Halpern 2015). The authors note a “potential for overlap in embodied responses to
hearing real music and experiencing spontaneous INMI, a link that could be explored
with both behavioral and neuroimaging studies” (Floridou et al. 2015, 33).
It is commonly believed that “earworms” are an annoyance, and work by Williamson
and colleagues (2014) sought to understand how we deal with them when they occur.
Using data from English and Finnish online surveys, the authors conducted a qualitative
analysis of 1,046 earworm reports and found that physical approaches to dealing with the
phenomenon were among the most popular responses. For example, respondents would
seek out the tune (including singing it or playing it) or use musical or verbal distraction
such as humming, singing, talking aloud, or listening to music/the radio/television.
Response categories derived from the English survey included a “Physical” subcategory
under the “Distract” theme, with physical behaviors intended to distract the respondent
from their earworm including the subgroupings “eat,” “rhythmic,” “breathe,” “exercise,”
and “work.” A second model was derived for the English survey data to only include INMI
behaviors that were rated as being effective. This model retained a “Physical” subcategory
for the “Distract” theme, and nonmusical forms of distraction included speech and
watching television. Williamson and colleagues (2014) suggest that we use distraction
behaviors that compete in working memory with the musical imagery, in this case
implicating movement and thus bodily involvement.
In summary, empirical studies of musical imagery present data that vary in the extent to which they support an embodied interpretation. Where evidence is lacking, this could be because the research methods were designed to address quite different research problems and are therefore poorly suited to advancing our understanding of imagery embodiment. However, a handful of
studies have now been conducted to test specific hypotheses that relate musical imagery
to movement. McCullough Campbell and Margulis (2015) set about testing the hypoth-
esis that physical activity during music listening would induce more frequent INMI
than passive music listening. In this research, 123 participants were randomly assigned
to different experiment conditions that varied in the requirement to have a motor
involvement while listening to a song (thought to be likely to induce INMI). Participants
were instructed to listen, move, or sing while hearing the song over headphones, before
being asked to take part in a dot-tracking task designed to induce INMI because of its
low demands on the participants’ attention. Following this, participants completed a
questionnaire asking them about their INMI experiences both during the experiment
and in general. Contrary to expectation, no significant differences were found between
the experiment groups, seemingly because participants found it difficult to comply with
the instruction to listen silently without moving. Consequently, the authors of the study
re-analyzed the data comparing INMI frequency in relation to the amount of motor
involvement that the individual participants reported, rather than the amount that was
asked of them by their experimental condition. This analysis revealed that those par-
ticipants who reported both moving and vocalizing during the song presentation expe-
rienced more INMI than those who reported being still and silent. The finding that
“moving and vocalizing proved near irresistible” (Margulis et al. 2015, 353) in itself lends
support to the case for the embodiment of musical engagement, and the propensity to
move or vocalize to some extent, while listening, is well known.
Beaman and colleagues (2015) investigated the role of articulatory motor planning
during both voluntary and involuntary musical recollections. Following in the tradition
of research suggesting the importance of subvocalization in auditory imagery, they
devised a paradigm in which participants were exposed to a particular song and then
the incidence of imaging it was recorded. Subsequent to the song presentation, participants
were either asked to chew gum or were not given gum to chew. The authors hypothesized
that chewing gum should serve to degrade articulatory motor programming, and so
reduce the incidence of musical imagery accounts. Their findings suggest that musical
recollections were reduced when chewing gum, and the authors argue that this reflects
an association between articulatory motor programming and imagery for song. I will
now consider some of the other ways in which theories of embodied mental imagery
might be explicitly tested to further our understanding of the role of the body in our
musical imagination.
We have seen that many empirical findings from studies of musical imagery challenge
the restricted notion of hearing in the “mind’s ear,” since our body is implicated in the
quality of the experience. However, searching the literature for compatible evidence for
an embodied account of musical imagery is a problematic endeavor because: (1) the
search decontextualizes findings in ways that might mask or even contradict the
original purpose of the source research, (2) it runs the risk of amplifying evidence by
virtue of its isolation, (3) it is susceptible to bias in the selection of relevant material, and
(4) it can only highlight associations rather than establish causal relationships. In order
to assess the extent to which musical imagery is an embodied cognition phenomenon, a
tailor-made research agenda is needed to enable more hypothesis testing about the role
of the body in our musical consciousness.
Before outlining some promising future directions, it is important to acknowledge evi-
dence that seems to temper the claims that can be made for embodied musical imagery.
First, research corroborates centuries of music pedagogy in suggesting that physical prac-
tice at an instrument will lead to greater improvements than mental practice (Cahn 2008;
Bernardi et al. 2013). Similarly, Lotze and colleagues (2003) argue that auditory-motor asso-
ciations were not sufficiently tight in their study of violinists for them to be co-activated
without actually hearing the performed sound, or actually producing the performed
movement. Finally, Aleman and colleagues (2000) found that musicians outperformed
nonmusicians on auditory imagery tasks. While superior musical imagery abilities are to be
expected, and these are entirely consistent with embodied imagery, there is no reason to
suppose that musicians should have any more embodied knowledge of the everyday sounds
used in the auditory imagery task than the nonmusicians.7 Moreover, the superior perfor-
mance of the musicians on the auditory imagery task cannot be explained by a greater abil-
ity to compare sounds as a result of their training, since musicians and nonmusicians were
comparable in their performance on the equivalent sound perception task. These potential
caveats support the case for an empirical exploration of the extent of the contribution that
embodiment makes to musical imagery experience.
In uncovering movement as an important factor in the experience of INMI, Floridou
and colleagues (2015) agree that embodied cognition is a relevant avenue for future
work. I will now point to a selection of the questions that are raised by expanding musical
imagery beyond the mind’s ear. For example, should we test for possible differences
in the degree to which our musical imagery is embodied, and what could such differ-
ences tell us? Godøy (2001) argues, “we have more salient images of sound when we
have more salient images of how the sounds are produced” (238). This hypothesis is
amenable to experimental testing and is ripe for future research. Here, we can return to
the unexpected finding from Halpern and colleagues (2004) of activation in SMA
during a timbre imagery task: pitch and timbre might be disentangled by asking partici-
pants to image a selection of noise-based stimuli, with the prediction that noisy timbres
that are difficult to produce will not elicit SMA activity.
Musical imagery (e.g., re-presenting a sequence of just heard notes in one’s mind)
feeds our musical imagination (e.g., creating a new sequence of notes in one’s mind). To
the extent that our musical imagery is embodied, are our imaginative re-presentations
of music constrained by our physical experience, and if so how can we understand the
role of our body in creative musical thought? Empirical studies of the musical imagery
of composers are lacking (Bailes and Bishop 2012), and research is needed to explore
how bodily experience shapes compositional ideas.
Embodied accounts of musical imagery necessarily relate to learning, since it would
be the changes in our embodied experience that come to shape our mental represen-
tations. This is arguably the research area for which we have the most empirical evidence
in the guise of studies demonstrating enhanced auditory-motor coupling as a result
of musical training. How might an empirical understanding of mental imagery as
embodied be applied in music education? The music pedagogy of Jaques-Dalcroze (1967)
places great emphasis on the integration of movement and sound. This intimately asso-
ciates sounds with their physical production (Campbell 1989), and it seems reasonable
to suggest that the musical representations of those who follow such training are strong
in motor imagery. A related prediction is that there is a link between musical experience
and the fidelity of musical imagery, and that this can be explained in terms of embodi-
ment. Anecdotal evidence of more vivid musical imagery for music that has been expe-
rienced through dance or musical performance might be corroborated by experimental
research. An embodied account of learning should help to explain the pedagogical links
between doing and thinking. The implications for music education are extensive,
suggesting that practice-led learning is the most effective approach to developing reliable
representations of music.
Finally, interoception, which is the sense of our physiological condition, is theoreti-
cally relevant to mental imagery when this is viewed as an offline simulation of embod-
ied cognition. Research into interoception (e.g., Kadota et al. 2010) suggests that it can
have significant consequences for our psychological state. It seems that our perceptions
are subconsciously tuned to our own biological rhythms such as heartbeat (Aspell
et al. 2013), and it remains an open question as to whether interoceptive forces shape our
musical imagery.
Concluding Remarks
I would like to conclude by reflecting on the apparent intangibility of both sound and
imagination, with a reminder that our corporality tangibly relates the two. Imagination
is often taken to be synonymous with mental freedom, yet our thoughts are shaped by
our environmental, biological, and cognitive experience. If this shaping extends to
mental imagery, then our musical imagery will be characterized by those features of our
environment that are of personal significance, and investigating musical imagery should
enable a better understanding of what is meaningful in sound. A review of the empirical
literature has demonstrated that our imagery for musical sound is not limited to a single,
auditory modality, and the involvement of motor imagery in particular reminds us that
music results from physical action. In this way, our understanding of sound as embod-
ied can be illuminated through the lens of imagination.
We might then ask how our understanding of imagination can be magnified through
the lens of sound. For most people, the term “mental imagery” is equated with visual
imagery. However, much can be gained from studying mental imagery for other modali-
ties: sound and music are obviously articulated through time, with auditory and musical
imagery emphasizing the dynamic processes that underpin their generation rather than
the apparently static product evoked by imagining a visual scene (Bailes 2019).
This chapter has argued that we should extend our understanding of auditory imagery
beyond the “mind’s ear.” Sound naturally affords a focus on the essentially dynamic
properties of imagery and imagination, and this is often missing in the frequently disem-
bodied, static conceptualization of visual imagery occurring in the “mind’s eye.”
Notes
1. Throughout this chapter the terms “imagery” and “imaging” primarily reference
re-presentation, while “imagination” and “imagining” are used more often to signal the
imaginative.
2. In this respect, imagined music resembles perceived music as a cross-modal phenomenon
in which auditory, visual, and kinesthetic senses seem most likely to feature rather than
gustatory or olfactory modalities.
3. Though see Schiavio, Menin, and Matyja (2014) for arguments against the loose
adaptation of the unconscious embodied simulation account to describe conscious
phenomena.
4. Many forms of electronic music require minimal gestures.
5. Burgoyne, J.A. 2015. Resurrecting the Earworms of Our Youth: What Is Responsible for
Long-Term Musical Salience? Paper read at Investigating the Music in our Heads, June 1, 2015,
Goldsmiths, University of London.
6. The others are “negative valence,” “personal reflections,” and “help.”
7. Unless they have been trained in environmental listening or acousmatic composition.
References
Aleman, A., M. R. Nieuwenstein, K. B. E. Böcker, and E. H. F. de Haan. 2000. Music Training
and Mental Imagery Ability. Neuropsychologia 38 (12): 1664–1668. doi:10.1016/S0028-
3932(00)00079-8.
Aspell, J. E., L. Heydrich, G. Marillier, T. Lavanchy, B. Herbelin, and O. Blanke. 2013. Turning
Body and Self Inside Out: Visualized Heartbeats Alter Bodily Self-Consciousness and
Tactile Perception. Psychological Science 24 (12): 2445–2453. doi:10.1177/0956797613498395.
Baddeley, A. 1986. Working Memory. Oxford: Clarendon Press.
Bailes, F. 2006. The Use of Experience-Sampling Methods to Monitor Musical Imagery in
Everyday Life. Musicae Scientiae 10 (2): 173–190.
Bailes, F. 2007. The Prevalence and Nature of Imagined Music in the Everyday Lives of Music
Students. Psychology of Music 35 (4): 555–570. doi:10.1177/0305735607077834.
Bailes, F. 2015. Music in Mind? An Experience Sampling Study of What and When, Towards
an Understanding of Why. Psychomusicology: Music, Mind, and Brain 25 (1): 58–68.
doi:10.1037/pmu0000078.
Bailes, F. 2019. Musical Imagery and the Temporality of Consciousness. In Music and
Consciousness 2: Worlds, Practices, Modalities, edited by D. Clarke, R. Herbert, and E. Clarke.
Oxford: Oxford University Press.
Bailes, F., and L. Bishop. 2012. Musical Imagery in the Creative Process. In The Act of Musical
Composition: Studies in the Creative Process, edited by D. Collins, 54–77. Farnham, UK:
Ashgate.
Baker, J. M. 2001. The Keyboard as Basis for Imagery of Pitch Relations. In Musical Imagery,
edited by R. I. Godøy and H. Jørgensen, 251–269. Lisse, Netherlands: Swets & Zeitlinger.
Beaman, C. P., K. Powell, and E. Rapley. 2015. Want to Block Earworms from Conscious
Awareness? B(u)y Gum! Quarterly Journal of Experimental Psychology 68 (6): 1049–1057.
doi:10.1080/17470218.2015.1034142.
Beaty, R. E., C. J. Burgin, E. C. Nusbaum, T. R. Kwapil, D. A. Hodges, and P. J. Silvia. 2013.
Music to the Inner Ears: Exploring Individual Differences in Musical Imagery. Consciousness
and Cognition 22 (4): 1163–1173. doi:10.1016/j.concog.2013.07.006.
Bernardi, N. F., M. De Buglio, P. D. Trimarchi, A. Chielli, and E. Bricolo. 2013. Mental Practice
Promotes Motor Anticipation: Evidence from Skilled Music Performance. Frontiers in
Human Neuroscience 7:451. doi:10.3389/fnhum.2013.00451.
Berthoz, A. 1996. The Role of Inhibition in the Hierarchical Gating of Executed and Imagined
Movements. Cognitive Brain Research 3:101–113. doi:10.1016/0926-6410(95)00035-6.
Bishop, L., F. Bailes, and R. T. Dean. 2013. Musical Imagery and the Planning of Dynamics and
Articulation during Performance. Music Perception 31 (2): 97–117. doi:10.1525/mp.2013.31.2.97.
Brodsky, W., A. Henik, B.-S. Rubinstein, and M. Zorman. 1999. Inner Hearing among
Symphony Orchestra Musicians: Intersectional Differences of String-Players versus Wind-
Players. In Music, Mind, and Science, edited by S. W. Yi, 370–392. Seoul: Seoul National
University Press.
Brodsky, W., A. Henik, B.-S. Rubinstein, and M. Zorman. 2003. Auditory Imagery from
Musical Notation in Expert Musicians. Perception and Psychophysics 65 (4): 602–612.
doi:10.3758/BF03194586.
Cahn, D. 2008. The Effects of Varying Ratios of Physical and Mental Practice, and Task
Difficulty on Performance of a Tonal Pattern. Psychology of Music 36 (2): 179–191. doi:10.1177/
0305735607085011.
Schiavio, A., D. Menin, and J. Matyja. 2014. Music in the Flesh: Embodied Simulation in
Musical Understanding. Psychomusicology: Music, Mind, and Brain 24 (4): 340–343.
doi:10.1037/pmu0000052.
Varela, F. J., E. Thompson, and E. Rosch. 1991. The Embodied Mind: Cognitive Science and
Human Experience. Cambridge, MA: MIT Press.
Weber, R. J., and S. Brown. 1986. Musical Imagery. Music Perception 3 (4): 411–426. doi:10.2307/
40285346.
Williamson, V. J., S. R. Jilka, J. Fry, S. Finkel, D. Müllensiefen, and L. Stewart. 2011. How Do
“Earworms” Start? Classifying the Everyday Circumstances of Involuntary Musical
Imagery. Psychology of Music 40 (3): 259–284. doi:10.1177/0305735611418553.
Williamson, V. J., and S. R. Jilka. 2014. Experiencing Earworms: An Interview Study of
Involuntary Musical Imagery. Psychology of Music 42 (5): 653–670. doi:10.1177/0305735613483848.
Williamson, V. J., L. A. Liikkanen, K. Jakubowski, and L. Stewart. 2014. Sticky Tunes: How Do
People React to Involuntary Musical Imagery? PLoS One 9 (1): e86170. doi:10.1371/journal.
pone.0086170.
Zatorre, R. J., A. R. Halpern, D. W. Perry, E. Meyer, and A. C. Evans. 1996. Hearing in the
Mind’s Ear: A PET Investigation of Musical Imagery and Perception. Journal of Cognitive
Neuroscience 8 (1): 29–46. doi:10.1162/jocn.1996.8.1.29.
PART IV
AESTHETICS
Chapter 23
Imaginative Listening to Music
Theodore Gracyk
Introduction
Appreciative listening to music involves the exercise of taste, for it involves attention
to aesthetic properties, as when we distinguish between graceful and clunky transitions,
and between violent and sluggish rhythms. For over three centuries, major figures in
philosophical aesthetics have argued that aesthetic engagement with art—and therefore
music—includes pleasures of the imagination (Addison and Steele 1965). So listening is
both perceptual and imaginative. Frequently, this connection is cashed out with respect
to the problem of how music conveys emotion, as when R. K. Elliott diagnoses the expe-
rience of emotional qualities in music as a case of “imaginatively enriched perception”
(1967, 119). Listening to music differs from hearing the sounds that constitute the music,
and some of this difference stems from our imaginative enrichment of those sounds.1
Although I endorse the conventional thesis that imaginative engagement is normally
required to appreciate music when listening to it, I argue that we should be more cir-
cumspect about this claim than is typically the case. For example, most accounts of
musical expressiveness say that imaginative engagement is required in order to perceive
the melancholy that runs through most of Mozart’s G Minor String Quintet (K. 516) or
the joy of Louis Prima’s “Sing Sing Sing” as performed by Benny Goodman and his
Orchestra. In turn, expressiveness is frequently tied to imaginative enrichment that
recasts auditory events as musical motion. Imagination lets us hear motion and gestures
“in” the progression of sounds, which in turn facilitates an experience of expressiveness.2
I reject both of these proposals, as well as the weaker proposal that imagination is
Virginia Woolf frequented the opera and sought out performances of Beethoven’s string
quartets. Reflecting in her diary about a concert of instrumental chamber music, she
mused, “musical people don’t listen as I do, but critically, . . . without programmes”
(Woolf 1980, 39). She wondered whether she was listening properly when the music
encouraged streams of imaginative imagery and associations. Woolf remarks that, when
a concert program features a Bach concerto, “its [sic] difficult not to think of other
things” (Woolf 1977, 33). She offers a lengthy description of listening to music in her
stream-of-consciousness story “A String Quartet,” in which a nameless protagonist
responds as follows to the opening measures of a Mozart quartet.
[L]ooking across at the player opposite, the first violin counts one, two, three—
Flourish, spring, burgeon, burst! The pear tree on the top of the mountain. Fountains
jet; drops descend. But the waters of the Rhone flow swift and deep, race under the
arches, and sweep the trailing water leaves, washing shadows over the silver fish, the
spotted fish rushed down by the swift waters, now swept into an eddy.
(Woolf 2003, 133)
emotion” (Gurney 1880, 306). According to Gurney and Lee, Woolf ’s active imagination
places her squarely in the company of mere hearers. Her listening is indefinite.3
For Woolf, the issue was more than an academic question. Her mode of attending to
music had recently been described and criticized by her brother-in-law, Clive Bell.
Explaining and defending aesthetic formalism, Bell valorizes the pattern-focused
attention of Lee’s “listeners” and Gurney’s definite listening:
Tired or perplexed, I let slip my sense of form . . . I begin to read into the musical
forms human emotions of terror and mystery, love and hate, and spend the minutes,
pleasantly enough, in a world of turbid and inferior feeling. At such times, were the
grossest pieces of onomatopoeic representation—the song of a bird, the galloping of
horses, the cries of children, or the laughing of demons—to be introduced into the
symphony . . . they would afford new points of departure for new trains of romantic
feeling or heroic thought. I know very well what has happened. I have been using art
as a means to the emotions of life and reading into it the ideas of life. I have been
cutting blocks with a razor (Bell 1914, 31–32).
much the result of the stimulation of “sensations” (Kant 2000, 206). The centrality of
imagination continues to dominate theories of musical experience through the nineteenth
century and then into our own time. In recent years, imaginative engagement is treated
as an essential component of listening by both philosophers and musicologists, including
Roger Scruton (1974, 1999), Charles Rosen (1995), Nicholas Cook (1990), Denis Dutton
(2009), and Jerrold Levinson (2006a).
Let us assume that many people attend to instrumental music as Woolf did, engaging
with music more imaginatively than is minimally required for perception of sound
sequences.4 Against the common prejudice that listeners who engage in a more robust
imaginative response are less musical than those who listen “without programmes,”
imaginative supplementation of what we actually hear seems to be unavoidable and
necessary in music listening. But if all music listening is partly perceptual and partly
imaginative, the real issue with Woolf ’s response is the degree to which some listeners let
their imaginations run free.
Many, many distinct roles have been assigned to imagination since Aristotle identified it
as central to human thought (Sparshott 1990; Stevenson 2003; Townsend 2006, 160–161).
Therefore, disagreements about its role in listening cannot be resolved unless we pro-
vide focus by determining which roles are relevant. I have already discarded most of the
roles assigned to imagination by focusing on occurrent imagining, where imagination is
applied to music that one currently hears. Occurrent imagining may have little or
nothing in common either with having a tune stuck in the head (an eidetic image or
“earworm”) or with imagining sounds while silently studying a musical score (Tovey 1936).
Mary Warnock provides a succinct summary of the relevant central idea as our
“capacity to look beyond the immediate and the present” (1976, 201). Of course, this
does not distinguish imagination from memory, which can share precisely the same
content. When I look at my yellow house and remember that it used to be blue, I need
not form a mental image of it as blue. However, suppose I do, and “picture” the house as
it used to be. Psychologists refer to this phenomenon as memory imagery. Since the
experiential content of memory and imagination imagery can be identical, we need
nonphenomenal criteria for differentiating memory imagery from imagination imagery.
Current consensus holds that imagining the house as blue differs from remembering that it
was blue according to whether one believes that it was blue. Imagination imagery is
belief-independent. If someone believes that the image reproduces something as it was
experienced in the past, then it is a memory (even if it is false).5 In Roger Scruton’s pre-
ferred description, “imagination involves thought which is unasserted” (1974, 97; cf.
Scruton 1999, 88–89). Other recent explanations say that imagination is “quarantined”
from beliefs, where “pretense representations differ from belief representations by their
function” (Nichols 2004, 130). Or, more precisely, by their reduced function as measured
by behavioral consequences. As a well-worn example has it, a horror movie may induce
some level of fear, but imaginary monsters do not prompt normal people to call the police for help, nor to jump from their seats and run for safety. When Woolf
imagines the spotted fish in the eddy, she does not make plans to return to that spot later
with a fishing rod.6
So which species of imagining are most relevant to music listening? I will concentrate
on three modes of imaginative engagement that philosophers typically discuss in rela-
tion to experiencing pictures, literature, and music. They are propositional imagining,
imagination imagery, and hearing-in.7
Propositional imagining involves conceiving or making-believe that a proposition or
set of propositions is true of some world, without necessarily believing that it holds true
of our own. This species of imagining is normally understood to simulate belief. For
example, suppose you are reading Jane Austen’s Sense and Sensibility and reach the line,
“a neat wicket gate admitted them into [a small green court].” One way to respond to this
linguistic prompt is to suppose, for purposes of the narrative, that the Dashwood family
has now passed through a wicket gate and has entered a small courtyard in front of their
new cottage. This thought may or may not be accompanied by a second imaginative
activity, imagination imagery. Some readers will supplement the propositional imagining
with imagery, visualizing (in their “mind’s eye”) a fence and wicket gate and a group
of women going toward a cottage.8 There will be considerable variation in what is
imagined. One reader may construct an image of a green, grassy courtyard in front of
a one-story cottage. Another may furnish the area with rose bushes, and imagine a
small, two-story house.
However, music listening might require a third species of imaginative engagement,
which literary fiction does not require. This third kind is common with sculpture,
pictures, and films: it involves imaginative transformation of what one directly perceives.
Suppose I look at a landscape painting and I see both a paint-cracked surface and the
painting’s representation of a horse-drawn cart crossing a stream beside a white cottage.
Looking at the canvas, I imagine that I am actually seeing the English countryside in
some past age. This kind of imaginative engagement accompanies reading when illus-
trations appear in the particular edition that one is reading. But, graphic novels aside,
pictures are not necessary for the experience of literature. For sculpture and pictures,
the requisite imaginative engagement with the visual object is generally referred to as
seeing-in (e.g., Lopes 2005). With music, the parallel case is hearing-in, as when the rumble
of thunder is heard in the tympani rolls in the storm sequence of Beethoven’s Pastoral
Symphony.9 Hearing-in is guided by the listener’s direct experience of sonic features,
their combination, and their sequencing. Hearing-in and seeing-in are alike in that each
involves imaginatively experiencing a perceived object to be something more than it is.
Granted, both seeing-in and hearing-in are sometimes supplemented with—and guided
by—propositional imagining.10 Informed listeners may attend to a relevant passage in
the Pastoral Symphony by consciously or unconsciously imagining “The storm is passing
now” without believing they have witnessed a storm.11 Others will also add imagination
imagery, supplementing the auditory experience by visualizing a storm and then its
lifting.12 There will be cases, therefore, when seeing-in and hearing-in invite all three
species of imaginative engagement.
Is this third kind of imaginative engagement with music, hearing-in, required in
music listening, either by itself or as guided by propositional imagining? A central test
case for the necessity of imagination in listening is the thesis that music demands
hearing-in when we imaginatively “animate” what we hear (e.g., Trivedi 2011, 118). Many
accounts of listening treat hearing-in as essential to the experience of musical move-
ment and to the experience of music’s expressive qualities. However, it is possible that
different imaginative processes—or perhaps none at all—are involved when we experience
movement, structure, and expressivity. The other two species of imaginative engagement,
propositional imagining and imagination imagery, are subject to the objection that the
thoughts and images are unnecessary and inessential additions to the listening process.
This objection appears to capture Woolf ’s worries about her listening strategies.
However, it cannot be raised against hearing-in if both of two conditions hold: (1) the
experience of musical animation is an essential aspect of the experience of listening; and
(2) the perceived animation requires imaginative hearing-in, which is not required
more generally for auditory perception. Following some additional prefatory work, I
will challenge the second of these two conditions; having done so, I will look for other
ways that hearing-in might be essential to music listening.
My general concern is the plausibility of the thesis that a particular species of imaginative
engagement, hearing-in, is required for music listening. In this section I elaborate on
why hearing-in is the crucial test case.
It does not take much to demonstrate the weaker thesis that imaginative engagement
is frequently appropriate when listening. For example, it is appropriate for songs, opera,
and program music where verbal cues will attune the listener’s sensitivity to extra-musical
representation in various musical structures. Listening to Jimi Hendrix’s rendition of
“The Star-Spangled Banner” at Woodstock, we should hear bombs exploding in the
guitar pyrotechnics following the (unsung) line, “the rocket’s red glare, the bombs bursting
in air.” Likewise, we should hear the foot treadle of the spinning wheel in the music of
Schubert’s “Gretchen am Spinnrade.” Although these cases of hearing-in are appropriate,
imaginative responses, they do not advance the case that hearing-in is a necessary
element of music listening. They fail for the same reason that stage sets at the opera do
not count as evidence that music is an audiovisual art form. These are hybrids of music
and something more, and the “something more” is an obligatory guide to the imagi-
nation in our response to the hybrid object of attention (see Davies 1994, 113–114).
The plausibility of the stronger thesis, that listening to music requires hearing-in,
hinges on imagination’s role for listeners who do not receive explicit guidance from
extra-musical information. As Eduard Hanslick (1986, 15) argues, absolute music is the
ideal test case for any view that a property or process is essential to music or music
listening. We can generalize from it because it is “pure, objective, and self-contained—
that is, not subordinated to words (song), to drama (opera), to a literary programme or
even to emotional expression” (Hamilton 2007, 87).
For the remainder of this essay, I will concentrate on examples that lack extra-musical
information. But when listeners’ imaginations float free from intramusical guidance,
purists can dismiss the imaginative response as subjective, irrelevant, and unmusical.
So the strong thesis requires intramusical guidance from absolute music that yields
(relatively) reliable recognition of whatever is heard in the music; listeners would report
agreement at roughly the same level that people agree that a dog is pictured when shown
a picture of a dog. Because no such level of agreement is evident with musical represen-
tation, we seem justified in dismissing idiosyncratic images, such as Woolf ’s “pear tree
on the top of the mountain.” After all, how can anyone hear a tree, much less a particular
type of fruit tree, by way of hearing-in? But the very same objection can be raised against
any musical representation in which a sound does not resemble another sound. The
tympani may sound like thunder and the woodwinds can imitate birdcalls, but imagi-
nation will play only a limited role in listening if it is restricted to cases of onomatopoeia. We need
more than onomatopoeia but less than universal recognition of whatever is represented.
We need an account of how musical patterns and passages guide propositional and/or
imaginative hearing in a non-onomatopoeic manner (see Davies 1994, chap. 2).
To better understand guided response, it will be useful to adapt a distinction from
Kendall Walton. Consider the difference between cases where we imagine, of some
object that we perceive, that it is something it is not, versus cases where an object leads
us to imagine something, but we do not imagine it of the prompt itself (Walton 1990, 25).
To imagine that a tree stump is a bear is a case of the former, whereas imagining that it is
raining somewhere when one sees a dripping faucet is a case of the latter sort (because one
is not imagining that the dripping water is rain). In the former case, the object is a prop,
while in the second it is a trigger. The perceived object is a prop if there are conventions
in place by which its properties generate fictional truths that guide our imaginings; that
is, it is a prop if its particular features guide appropriately backgrounded participants
to imagine a determinate state of affairs. Suppose we are playing the board game
Monopoly and I mistakenly move someone else’s token and then “buy the railroad” on
which it lands. Other players can (and will!) object that I have moved from the wrong
location and so cannot buy that railroad with my Monopoly money. The various props
together with established rules-of-play endorse certain imaginative responses and not
others. Here, the mistake of moving the wrong token has an objective consequence for
what is taking place (and, also, not taking place) in the game world. Conversely, the same
object of perception is a mere trigger when the imagined content is imaginative imagery
that is idiosyncratic and unconstrained by the object’s features. Suppose I select the
wheelbarrow as my game token because it encourages happy thoughts of a bountiful
harvest from my small backyard garden. But my garden is too small to involve use of a
wheelbarrow, and my response floats free of the game; now, the token functions as a
trigger, rather than a prop, for my imaginative enrichment.
Aligned with the distinction between hearing-in and imaginative imagery, the distinction between props and triggers offers a general framework for evaluating differences in listeners’ responses. It directs us to ask, of any particular imaginative response,
whether it is appropriately directed and focused by perceptual cues. Purists are correct
to question the appropriateness of responses of someone who treats all Western instru-
mental music as a trigger for fanciful, free-roaming imaginings.13 So we seem to have a
principled reason to set aside idiosyncratic responses, such as Woolf ’s fish and pear tree.
Therefore, imagination imagery is a poor candidate for the strong thesis and we should
concentrate on the way that music functions as a prop (rather than a trigger) for hearing-in.
For example, Lee found a pattern of water imagery independently associated with certain
pieces of absolute music (1932, 428–429; e.g., for a Chopin Nocturne). This high level of
agreement suggests that this imagery arises because the music is a rule-governed prop
for our hearing-in. However, we must not be too hasty. Instrumental music often serves
as a fragmentary, largely indeterminate prop: the imaginings it licenses are less deter-
minate than in most games of make-believe (Walton 1994, 52). From that perspective,
there is no reason to object to the fact that Woolf ’s imaginary fish and tree are highly
determinate interpretations of audible elements of the musical experience.14 She may be
more “musical” than she thinks she is, but thinks otherwise due to the influence of
formalists who deny that music should be a prop for hearing-in. What is at stake is
whether there is any prop-function that holds for all music listening.
Given the anti-imagination stance of formal purists, it is important to recognize that
some formalists endorse hearing-in. Hanslick, perhaps the most influential formalist of
the nineteenth century, urged a distinction between necessary and unnecessary imagi-
native hearing-in. He is frequently attacked for a variety of intellectual sins, real and
exaggerated, but he is seldom given credit for foreshadowing Walton’s distinction
between props and triggers. Hanslick famously argues that it is an error to imagine a
narrative or expressive persona when listening to Bach’s Das wohltemperierte Klavier
(1986, 14). However, Hanslick is not anti-imagination: “If we are to treat music as an art,
we must recognize that imagination and not feeling is always the aesthetical authority”
(5). Some imaginative responses count as appreciative response, while some others do
not (30). Basically, hearing-in is only appropriate when there are real properties of the
music that serve as focal points for the imaginative response. For Hanslick, the first
requirement is a culturally entrenched tonal system. On this basis, our sense of musical
form is tied to our apprehension of it as a representation of “the motion of a physical
process according to the prevailing momentum: fast, slow, strong, weak, rising, falling . . . It
can depict not love but only such motion as can occur in connection with love” (11).
Hearing-in generates awareness of musical animation. It is therefore essential to all
musical content, which Hanslick identifies with “tonally moving forms” (29). He dis-
tinguishes this from cases of “hearing” where music is a mere trigger for free association and where the listener is not appreciating it for what it is, despite enjoying the
experience (59).
Hanslick’s sketchy remarks endorse the necessity of imaginatively enriched perception.
More importantly, he identifies the feature that has attracted almost universal consensus
Expressiveness
Appropriately backgrounded listeners frequently hear music as sad, joyful, anxious, and
so on. Levels of agreement are so high that we can use expressivity as a test case of musical
competence. For example, we must doubt the musicality of anyone who reports that
Benny Goodman’s performances of “Sing Sing Sing” sound melancholy and despairing.
Following established philosophical usage, I speak of music’s “expressiveness” and
“expressive qualities” rather than its “expression of emotion.” Genuine expression
requires a person or sentient being who has an emotion and signals it to others by
means of external signs. Thus, my dog expresses happiness by wagging his tail. However,
composers are capable of composing music that sounds happy or sad, or happy or sad in
a very particular way, without having to draw on their own emotional experiences as a
source of the music’s design. The key to composing sad music is knowing what sad music
sounds like. There may be occasions where composers engage in self-expression, but
self-expression is not necessary for the music’s having an expressive dimension.
Therefore, it is better to describe the sadness of, say, a twelve-bar blues as an expressive
quality than to treat it as an expression of the emotion of sadness.
I will be brief about expressiveness and imagination. Many, and perhaps most,
philosophies of art analyze musical expressiveness by reference to imagination and
make-believe. Malcolm Budd pinpoints the “underlying idea” as the proposal that
“emotionally expressive music is designed to encourage the listener to imagine the
occurrence of experiences of emotion” (1989, 135). Unfortunately, the idea that music’s
expressiveness emerges through imaginative engagement does not establish that all
music listening requires imagination. Some music is not appropriately heard as pos-
sessing expressive qualities, including some of the serialism of Milton Babbitt, Pierre
Boulez’s Structures I and II, and Philip Glass’s Music in Contrary Motion. “Expressionless”
music is not restricted to the twentieth and twenty-first centuries. The fugues of
J.S. Bach’s Das wohltemperierte Klavier and Die Kunst der Fuge are frequently identified
as examples of “emotionless” musical masterpieces (Lang 1997, 509). So although a hearing-
in account of expressiveness supports the view that imaginative enrichment is sometimes
necessary to perceive expressive features, we should not generalize this finding to all
music listening.
However, I regard even that position on expressiveness as overly generous. The limited
scope of that endorsement collapses if there is a plausible nonimagination account of
music’s expressiveness. Here, I think that Budd and Stephen Davies are correct to
exclude imagination from our detection of musical sadness and happiness, the two
most universally recognized “emotions” in music (Budd 1989, 137; Davies 2011, 1–20).
We describe many external appearances with emotion terms, yet we do not always
imagine in these cases that we are detecting any underlying mental states. For example,
in the same way that we can describe weather as “gloomy” without attributing feelings to
weather, we can describe someone as having an “angry” tone of voice without thinking
they are angry. Since emotion descriptions of music are obviously descriptions of how
the music sounds, phrases such as “angry music” and “sad music” may be compressed,
literal descriptions of angry-sounding music, sad-sounding music, and so on.
Yet the topic of musical expressiveness is not irrelevant to our interests here. Many
accounts of expressiveness regard it as dependent on a second phenomenon, our experi-
ence of musical animation or movement (Hanslick 1986, 11; Lee 1932, 80; Kivy 1989,
52–58; Levinson 2006b, 121–123; Davies 2011, 10–11). In turn, the experience of musical
movement and animation is generally thought to require imaginative engagement (and
doubly so, when it is interpreted as a bodily gesture reflecting agency). Since all music
displays some kind of motion or animation, it is not expressiveness but rather musical
motion that provides a universal musical phenomenon that may require imagination.
I investigate this proposal in the next section.
Here is Shakespeare, four hundred years ago: “That strain again! It had a dying fall.”15
Which strain does Duke Orsino want to hear again? The one that moves with a dying
fall. Here is a recent description of some music in Bernard Herrmann’s score for
Hitchcock’s Psycho: “The opposing nature of the two musical lines moving toward each
other reflects . . . two perspectives” (Rothbart 2013, 46). The musical lines are oriented in
an unreal acousmatic “space” in which they are moving toward each other, and on this
basis they can represent what is happening in the film.16
But do we imagine the movement of a melodic line, or the leap of the octave? Despite
significant cultural differences, the use of motion-terminology and action-descriptions
to characterize music is a cross-cultural phenomenon (Becker 2010). An important
a further metaphor for the relationship between them (as approaching each other).
The metaphors would explode exponentially even in cases of a moderately more
complex piece of music, such as when Talking Heads perform “Crosseyed and Painless”
with a vocal line and seven distinct instrumental parts. If we do not find an alternative to
the view that awareness of musical motion requires propositional imaginings, then how
many metaphors do we juggle in our minds when we attend to polyrhythmic music of
this sort? Or do we concede that we cannot hear the musicality of most of those instru-
ments during most of the performance? But that is simply nonsense. We can perceive a
great deal more musical detail and interplay than we conceptualize.
Paul Boghossian provides a second criticism.20 We can distinguish between justified
and unjustified metaphors. To do so, “we would have to be aware of some layer of
musical experience with a perfectly literal content that our musical metaphors would
be designed to illuminate. But there doesn’t seem to be such a layer of experience”
(Boghossian 2007, 123). To put it another way, if one insists that all music listening
derives from the guidance of a particular metaphor, then there is no principled way to
distinguish between the metaphorical and the literal components of the experience, and
the literal component cannot guide the application of the metaphor. Consequently, we
should also be able to listen to any sound sequence in terms of the same metaphor, and
we will hear it as music. (On this hypothesis, there would be nothing radical in John
Cage’s invitation to attend to the music in seemingly nonmusical sound.) However,
although almost everyone recognizes pitch differences in various natural and environ-
mental sounds, and can hear patterns of change in these pitched sounds, it is very
difficult to hear “music” in sound sequences when they have not been organized inten-
tionally as such. Although there may be objective cues in some sound sequences (and
absent from others) that invite us to recognize musical “motion” of various kinds, the
perception of musicality does not arise from our application of a particular metaphor or
from their subsumption under concepts imported from the visual and tactile realms.
Stephen Davies (2011, 32) makes the related point that metaphors only facilitate conceptual transference when they are “live,” as nonstandard descriptions that cast new light
on a situation. However, we are not being creative or imaginative when we talk of musical
motion, so the metaphor does not work as a metaphor any longer.
I conclude that music listening does not require metaphorical perception, nor meta-
phor-guided perception. This conclusion deprives us of our most compelling reason to
think that all music listening is infused with imagination. It does not, however, prove
that imagination is always dispensable. Another argument might demonstrate that it
has a necessary role, without invoking the guidance of metaphor.
We might construct an alternative argument by modifying Scruton’s premises. First,
suppose there is a metaphor, but it comes after-the-fact or as a supplement to the experi-
ence, as our best description of what we experience with harmony, melody, and
rhythm.21 Second, we now allow that we perceive neither spatial arrangement nor
motion in a melody or rhythm, for there is no object in space that fits our description
when we say that a sad melody droops. We do not perceive space and motion and we do
not, upon reflection, believe that music moves. Yet we experience something in the
music that is usefully described with this language. Some sort of transference is taking
place. Therefore, some underlying mental process is at work that encourages this mode
of description. We must be engaging in an imaginative transformation of what we
perceive, even if what we perceive is ineffable and only approximated by our space and
movement metaphors. We have some kind of propositional imagining that guides
hearing-in, but its precise nature is not available to our introspection.
There are three strong objections to this line of thinking, and I think they are jointly
decisive. First, there is nothing added by appeal to imagination that is not accomplished by
admitting that human experience is largely ineffable. We frequently resort to descriptions
that we do not endorse as literally true. However, this practice does not generally prove
that imagination has transformed the experience in a way that defies description, so
there is no reason to postulate it concerning music. The second counterargument
simply rejects the premise that generates the previous objection. We can deny that the
experience of music is unusually resistant to fine-grained description. Hanslick makes
the point that musicians and music theorists possess a detailed technical vocabulary for
describing music. The problem is not music, but the fact that so few people have learned
to employ this vocabulary. The frequent use of “poetical fictions” to describe basic musical
phenomena provides no evidence that people hear something in the music that is not
literally there (Hanslick 1986, 30). The correct conclusion is that we simply have a lot of
people who have not learned to articulate what they hear. Third, Davies (2011, 25–32)
argues that there is simply no metaphor at work when we say that a melody falls or that
the span of one chord is wider than another. We talk of spatial distances and movement
for all sorts of things besides bodies in our three-dimensional environment. We apply
these concepts and use these words literally whenever our experience is highly similar to
established exemplars. Our general “motion” vocabulary is polysemous, not metaphorical
(Davies 2011, 32).22
We have hit another wall in the search for a compelling reason to grant that imaginative
enrichment must inform listening.
Experiential Illusion
Obvious examples of experiential illusion include marquee lights and strings of Christmas lights. When they
are rapidly lit in sequence, they create the illusion that a single point of light is moving
along the string. We see motion even when we know that something else is really hap-
pening. Rafael De Clercq (2007) offers the example of moving the cursor around on a
computer screen. But I do not really move anything there. In reality “there is no cursor
moving on my computer screen: there are just local changes in the light emitted (just as,
strictly speaking, there [are] only sounds or vibrations in the surrounding air)” (De Clercq
2007, 162). The important point is that these illusory motions are systematic,
natural, and belief-resistant effects of our perceptual system, and they are not imagina-
tive transformations of a more basic experience.
Likewise, the experience of musical space and musical motion seems to involve an
immediate, unlearned, unconscious process. As with a host of optical illusions, our
species just seems to be hard-wired to organize some kinds of sounds in particular ways
that we find “musical.” The experience of (illusory) movement in musical space would
be a prominent example: we may talk about a piano sonata moving into a remote key,
but there are no literal spaces or distances to traverse.23 Like any other perceptual illusion,
the experience of melodic movement and of size or width in a musical chord persists
in the face of knowledge that the relevant acousmatic “space” is not real. Because the
experience of movement in music is common and fits our standard criteria for per-
ceptual illusion, there is no reason to invoke imagination to explain the experience of
acousmatic “space” and motion within it. Musical movement is an experiential illusion
in which phenomenal effects systematically mislead us in response to certain kinds
of sound structures, which composers and musicians exploit. (Again, the attention
that infants give to melodies casts suspicion on the necessary role of imaginative
processing.)
If we are to appeal to illusion, rather than imaginings, we are committed to saying that
the illusion is the natural product of our inherited auditory system. Charles Nussbaum (2007)
offers just such an account. The physiology of the human ear is integrated with mental
structures that map all sounds spatially; we cannot attend to musical structure unless we
also “move through [music’s] virtual tonal space in imagination” (2007, 99). It is note-
worthy that Nussbaum places no weight on his occasional references to imagination.
Given his explanation of why our mental representation of tonal space is an unavoidable
response, Nussbaum more often and more accurately refers to it as an illusion (e.g., 50).
A related line of explanation might posit that there is some degree of synesthesia
involved in music perception (i.e., some natural incorporation of nonauditory per-
ceptual systems into auditory perception; see the chapter on cross-modal correspondences
by Eitan and Tamir-Ostrover, volume 1, chapter 36). At the same time, Saam
Trivedi (2008) and Stephen Davies correctly emphasize that any recourse to a biological
explanation of this illusion is a supplement to the philosophical question at hand, which
is the question of the proper referent of various terms when applied to music. Something
perceived, or something imagined? Davies returns us to the philosophical issues by
observing that a causal explanation is relevant when it explains why particular descriptions
are employed so universally. Cross-cultural uniformity in language use is a good
reason to think that the language is being used literally, not metaphorically: we really do
experience music in terms of its own space, with movement in that space (Davies 2011,
31–32). And, again, that is a reason to think that we have hit another dead end in trying
to prove that imagination is necessary for hearing sounds as music. If the experience of
musical space and movement is going to be linked to imagination, the connection
involves a sense of “imagination” that I have not explored. Imagination is only necessary
if we stipulate that perceptual illusions are imaginative constructs. Although that
stipulation was common in the past, contemporary usage does not endorse it.
If musical motion is experienced rather than imagined, then what other phenomena
might require imaginative engagement?
There is a complex debate about the importance of listeners’ comprehension of
large-scale structure or musical “architecture” for complex instrumental compositions.
Because these structures are never immediately present for direct perception, they must
be imagined by knowledgeable listeners, either propositionally or through imagination
imagery. Suppose Aaron Copland is right, and “imagination and the imagination alone”
permits a listener “to see all around the structural framework of an extended piece
of music” (1952, 15). Unfortunately, Copland’s qualification about “an extended piece of
music” shows that it might not be required for all listening. We certainly do not need it
when listening to an instrumental version of a strophic folk song, such as Donald Byrd’s
jazz take on “House of the Rising Sun.” It might not be necessary even for extended
forms, such as symphonies. According to Jerrold Levinson’s (1997) concatenationist
account of basic musical understanding, it is not required, for we can get what is most
important from any musical work by attending moment-to-moment.
Let us pause to take a closer look at moment-to-moment listening, for it involves
anticipation and expectation of where the music is going (Meyer 1956). It is this point,
above all, that leads Hanslick to maintain that “imagination . . . is always the aesthetical
authority” (1986, 5). Imagination yields aesthetic pleasure through “the mental satisfaction
which the listener finds in continuously following and anticipating the composer’s
designs, here to be confirmed in his expectations, there to be agreeably led astray” (64).
But why are our anticipations imagined in one of the three relevant ways identified earlier,
rather than the product of a nonimaginative cognitive process, such as inference?
Hanslick is silent on this point. If listeners are “agreeably led astray” because they have a
justified belief that something will occur, imagination is superfluous. However, there is
one phenomenon that might bolster the claim that musical anticipation is a kind of
imaginative hearing-in. It is the case of a listener who knows that a musical event will
occur but is nonetheless surprised by it. One of the classic examples is the crash of
noise in the second movement of Haydn’s Symphony No. 94, aptly nicknamed the
Surprise Symphony. It startles me even though I know it will be there, an effect that
would seem to require imaginative representation of where the music is headed in the
unfolding “world” of the music, independent of my belief about its real structure.
However, this musical phenomenon can also be explained without recourse to
imagination. Our listening process is cognitively complex, and consequently two
distinct, competing thoughts can be generated by different cognitive processes.
Haydn’s “surprise” exploits schematic expectation on the local scale, which leads us to
expect, moment to moment, another passage of soft and lulling music. Expectation of
disruption is based on episodic memory (from study of the score, or from past listening),
which conflicts with immediate listening expectations that arise unconsciously from
awareness of the perceptual pattern as it is being heard. Two doxastic states are in
conflict: for a moment, I genuinely expect a continuation of the soothing melody, and
for independent reasons I expect a disruption. Again, we have a straightforward
explanation of being “agreeably led astray” that does not require us to suppose that
imagination is involved.24 It is only imagination if imagination likewise generates my
expectation that the stuff in my mug will taste like coffee (because it tasted that way
when I sipped it thirty seconds ago), or my anticipation that the airplane moving in a
straight line above me will, in the next few seconds, continue in the same line. However,
this is a general (and dubious) claim about all near-term anticipation, and not illuminating
about music.
One final candidate remains, and I think it fits the bill—it does not justify a conclusion
about all music, but it arises when listening to a great deal of music (especially Western
tonal music), and it requires imagination. When Haydn’s symphony arrives at its
“surprise,” it does more than violate our expectations about the music’s likely continu-
ation. The music’s forward motion has been suddenly, momentarily arrested.
Unconscious inference may be at work, yet it is not simply a matter of mistaken inference
about the immediate future. As Daniel Barenboim puts it: “There is a certain inevitability
about music. Once it is set in motion, it follows its own natural course” (2003, 190; see
also Meyer 1989, 33). Walton rightly emphasizes that we hear musical patterns as more
than sequences of motions: “we imagine (subliminally anyway) that causal principles
are operating by virtue of which the occurrence of the dominant seventh makes it likely
that a tonic will follow, and . . . we imaginatively expect the tonic, whether or not we
actually expect it” (1994, 49). There are two imaginative acts in Walton’s description.
We imagine that the tonic will follow and we imagine causal principles by virtue of
which this occurs. I have just explained why I do not think our expectation of the tonic is
a case of imagining. But the idea “that certain musical events are nomologically con-
nected” (Walton 1994, 49), where one sound or process is heard as if caused by another,
is a strong candidate for imaginative hearing-in. Since we do not perceive causal con-
nections, and so we are not subject to perceptual illusions about them, imagination
appears to be at work when we experience music as if causal relationships internal to the
music are guiding and organizing its unfolding. The aptness of this causal pretense
might be explained by an implication-realization model of musical listening
(Narmour 1990), but our sense of guided motion goes beyond mere inferential
expectation that some sound events are more or less likely.
Conclusion
The experience of causal forces at work within music, shaping and directing it, arises
through the listener’s imaginative engagement in a manner that can be characterized as
hearing-in. In recognizing that there is at least one way in which music listening requires
imaginative engagement in the Western common practice tonal tradition, I have
defended a middle position between a musical purism that denigrates imaginative
response and the traditional consensus that finds the imagination at work in multiple
ways, especially in the apprehension of all musical motion and structure. The underlying
experience of musical motion betrays the hallmarks of illusion, not imagination imagery
or hearing-in. This illusion provides a natural, cross-cultural basis for music’s expres-
siveness. So our experience of expressive qualities does not always require imaginative
enrichment of what we hear.
It might seem a bit of a letdown to conclude that neither propositional imagining nor
hearing-in is required for much of the experience that is distinctively musical. However,
that conclusion has been secured by focusing on examples of absolute music that are
devoid of extra-musical cues about their interpretation. In practice, most music either has
an associated text, is rich in expressive properties, or both. Artworks typically prescribe
imaginative engagement, and music typically does, too.25 To borrow Jerrold
Levinson’s way of putting it (2015, 135), an anti-imaginative view of “listening” misdirects
us if it counsels us to ignore the invitations that most music extends. That is, most musical
experience takes place in a cultural context that constitutes an invitation to imagine that
it portrays particular situations or events, so the music serves as a detailed yet indeterminate
prop for our imaginative engagement.
Artworks and other cultural artifacts, including most music, are designed to elicit a
particular response, and someone is not appreciating the artwork or cultural achieve-
ment for what it is if their response is indifferent to its cultural particularity. (Because
cultural artifacts have a history, responses that ignore their history may even invite
moral censure; see Gracyk 2011.) A Strauss waltz’s invitation to respond physically, by
dancing, is quite different from a tone poem’s invitation to respond imaginatively,
including with rich imaginative imagery. And both of those invitations seem very differ-
ent from that of Bach’s Die Kunst der Fuge. Yet even that work invites imaginative
engagement in the form of hearing-in. Consequently, anti-imagination purism endorses
an impoverished response to a great deal of music.
Notes
1. The distinction between hearing music and listening to it is prominent in Hanslick’s
formalism (1986, 60). For additional discussion of his distinction and how it was deployed,
see Gracyk (2007, chap. 5), and Cook (1990, 15–17).
2. Prominent gesture theorists include Godøy (2010) and Levinson (2006a).
3. Susanne K. Langer categorizes a listener like Woolf as “a person of limited musical sense”
(1957, 242); imagination imagery should not be encouraged by teachers, critics, and
composers (243).
4. I set aside, without further comment, the degree to which listening to music involves
imagination because imagination is at work in all perceptual experience, filling in the gaps
as our attention flits among objects and providing more stability and coherence than we
directly perceive (e.g., Zinkin 2003; Stevenson 2003, 249–253). Such an account will not be
especially illuminating about music listening.
5. Context matters: beliefs and memories can be incorporated into fictional narratives;
consequently, the distinction between memory and imagined event is sometimes a matter
of the use of a representation (Walton 1990, 369). But, again, content alone does not make
the difference. I thank Bryan Parkhurst for reminding me of this point.
6. Tamar Gendler offers compelling arguments that “quarantine” is always limited, and that
“a certain degree of contagion is inevitable, indeed desirable” (2010, 8; see especially chaps.
11 and 12).
7. Some analyses employ the phrase “imaginative hearing” rather than “hearing-in” (e.g.,
Trivedi 2011, 114). However, I avoid this phrase because other writers use it to refer to
imagination imagery.
8. An important line of analysis, descriptivism, argues that there is no such imagery. See
Pylyshyn (1973) and Dennett (1981).
9. Some accounts distinguish seeing-in from seeing-as, and so there might be reasons to
worry about a difference between hearing-in and hearing-as (e.g., the former requires
audibility but the latter involves propositional imagining). Following Levinson’s analysis
(1996, 111–112), I doubt that the distinction is relevant to anything I discuss.
10. I thank an anonymous reader for noting that it is difficult to conceive of propositional
imagining about music occurring without some form of imagination imagery. However, I
would suggest that this commonly occurs when one encounters descriptions of fictional
musical works in literary works, such as Marcel Proust’s description of the Vinteuil Sonata.
For more on fictional composers and compositions, see Ross (2009).
11. Consequently, the issue on which I focus is distinct from the question of the minimum
level and type(s) of conceptual information that a listener must apply in order to possess
musical understanding. The scope of this debate is outlined by Davies (2011, 88–128).
However, that debate concerns a listener’s genuine beliefs, not imagination.
12. Woolf’s (2003) story is ambiguous: her description of the listener’s stream-of-consciousness
may describe propositional imagining, or it may describe imagination imagery, or both.
13. However, assigning imagination imagery to the category of “triggered” associations
does not prove that listeners are “less musical” because they sometimes respond in this
manner.
14. Kendall Walton resists the interpretation that all music is representational and/or a prop
(1994, 59–60). However, his reservations turn on the technical point that one is actually
using one’s experience, not the music itself, as a prop. So my use of “prop” is more liberal
than Walton’s considered view (and more in line with Walton 1990, 63).
15. William Shakespeare, Twelfth Night, or What You Will, 1.1.
16. Movement in acousmatic space is to be distinguished from movement of sound in real
space, as when the marching band moves toward you and then away from you as it moves
down the street.
17. Budd (2003) has the most stringent position, denying that we must perceive an acous-
matic “space” in order to perceive musical motion.
18. Music theorist Steve Larson (2012) adopts and develops Scruton’s thesis by extending it to
metaphors of musical forces.
19. E.g., Nawrot (2003). At six months of age, infants display marked musical preferences
based on cultural exposure (Adachi and Trehub 2012). Although their listening is
impoverished compared to that of a competent adult, they are listening, not merely
hearing, and there is no reason to think that they are guided by metaphors.
20. Trivedi (2008, 51–52) makes a similar argument, but it depends on a premise about the
eliminability of all metaphor that is too sweeping.
21. Budd (2003, 212) proposes that Scruton might be read in this way, as distinguishing
between the perception and the metaphor, where the metaphor arises only in any subse-
quent verbal expression of that experience. However, Budd’s interpretation flies in the face
of the great many places where Scruton clearly says that “our experience of music involves
an elaborate system of [spatial] metaphors” (Scruton 1999, 80).
22. Consider the prevalence of such language in descriptions of philosophical exchanges: a
philosopher “stakes out a position” and then makes a “move” in an argument. These
descriptions have ceased to be metaphors and they can be employed without exercise of
the imagination.
23. This analysis is briefly suggested in Davies (2011, 32).
24. This explanation, in terms of two response systems, paraphrases Huron (2006, 226).
25. See Kieran (1996, 337). Having rejected the view that music is representational and invites
imagination whenever it “moves” or displays expressiveness, Kieran’s characterization is
more accurate than Walton’s position that “virtually all music qualifies” as representa-
tional (Walton 1994, 48).
References
Adachi, M., and S. E. Trehub. 2012. Musical Lives of Infants. In The Oxford Handbook of Music
Education, edited by G. McPherson and G. Welch, 229–247. New York: Oxford University Press.
Addison, J., and R. Steele. 1965. The Spectator. Vol. 3. Edited by D. F. Bond. Oxford: Clarendon Press.
Barenboim, D. 2003. A Life in Music. Edited by M. Lewin. New York: Arcade Publishing.
Batteux, C. 2015. The Fine Arts Reduced to a Single Principle. Translated by J. O. Young. Oxford:
Oxford University Press.
Kieran, M. 1996. Art, Imagination, and the Cultivation of Morals. Journal of Aesthetics and Art
Criticism 54: 337–351.
Kivy, P. 1989. Sound Sentiment: An Essay on the Musical Emotions. Philadelphia, PA: Temple
University Press.
Kramer, J. D. 1988. The Time of Music: New Meanings, New Temporalities, New Listening
Strategies. New York: Schirmer.
Lang, P. H. 1997. Music in Western Civilization. New York: W.W. Norton.
Langer, S. K. 1957. Philosophy in a New Key: A Study in the Symbolism of Reason, Rite, and Art.
3rd ed. Cambridge, MA: Harvard University Press.
Larson, S. 2012. Musical Forces: Motion, Metaphor, and Meaning in Music. Bloomington:
Indiana University Press.
Lee, V. 1932. Music and Its Lovers: An Empirical Study of Emotion and Imaginative Responses to
Music. London: G. Allen & Unwin.
Levinson, J. 1996. Musical Expressiveness. In The Pleasures of Aesthetics: Philosophical Essays,
90–125. Ithaca, NY: Cornell University Press.
Levinson, J. 1997. Music in the Moment. Ithaca, NY: Cornell University Press.
Levinson, J. 2006a. Sound, Gesture, Spatial Imagination, and the Expression of Emotion in
Music. In Contemplating Art: Essays in Aesthetics, 77–90. Oxford: Oxford University Press.
Levinson, J. 2006b. Nonexistent Artforms and the Case of Visual Music. In Contemplating Art:
Essays in Aesthetics, 109–128. Oxford: Oxford University Press.
Levinson, J. 2015. Musical Concerns: Essays in Philosophy of Music. Oxford: Oxford University Press.
Lopes, D. M. 2005. Sight and Sensibility: Evaluating Pictures. Oxford: Oxford University Press.
Mattheson, J. 1739. Der vollkommene Capellmeister. Hamburg, Germany: Christian Herold.
Meyer, L. B. 1956. Emotion and Meaning in Music. Chicago: University of Chicago Press.
Meyer, L. B. 1989. Style and Music: Theory, History, and Ideology. Chicago: University of
Chicago Press.
Narmour, E. 1990. The Analysis and Cognition of Basic Melodic Structures: The Implication-
Realization Model. Chicago: University of Chicago Press.
Nawrot, E. S. 2003. The Perception of Emotional Expression in Music: Evidence from Infants,
Children and Adults. Psychology of Music 31: 75–92.
Nichols, S. 2004. Imagining and Believing: The Promise of a Single Code. Journal of Aesthetics
and Art Criticism 62: 129–139.
Nussbaum, C. O. 2007. The Musical Representation: Meaning, Ontology, and Emotion.
Cambridge, MA: MIT Press.
Pylyshyn, Z. W. 1973. What the Mind’s Eye Tells the Mind’s Brain: A Critique of Mental
Imagery. Psychological Bulletin 80: 1–24.
Rosen, C. 1995. The Romantic Generation. Cambridge, MA: Harvard University Press.
Ross, A. 2009. Imaginary Concerts: The Music of Fictional Composers. New Yorker, August
24: 72.
Rothbart, P. 2013. The Synergy of Film and Music: Sight and Sound in Five Hollywood Films.
Lanham, MD: Scarecrow Press.
Scruton, R. 1974. Art and Imagination: A Study in the Philosophy of Mind. London: Methuen.
Scruton, R. 1999. The Aesthetics of Music. Oxford: Oxford University Press.
Scruton, R. 2014. The Soul of the World. Princeton, NJ: Princeton University Press.
Sparshott, F. 1990. Imagination: The Very Idea. Journal of Aesthetics and Art Criticism 48: 1–8.
Stevenson, L. 2003. Twelve Conceptions of Imagination. British Journal of Aesthetics
43: 238–259.
Tovey, D. F. 1936. The Training of the Musical Imagination. Music and Letters 17: 337–356.
A Hopeful Tone
A Waltonian Reconstruction
of Bloch’s Musical Aesthetics
Bryan J. Parkhurst
Introduction
Here are two similar-sounding terms: normative aesthetics and normativist aesthetics.
The principal contentions of this paper are that (1) Ernst Bloch’s normative aesthetics of
music and Kendall Walton’s normativist aesthetics of music both set out to address the
relationship between music and the imagination or, more broadly, between “musicking”
(Small 1998) and the imagination1; and that (2) Walton’s normativist theoretical frame-
work provides conceptual resources that are helpful for interpreting and critiquing
Bloch’s normative claims. The first order of business, then, is to give a provisional expla-
nation of the difference between normative aesthetics and normativist aesthetics.
Normative aesthetic claims belong to the realm of the aesthetic “ought.” They concern
what the aesthetic subject ought to do and how the aesthetic object ought to be. Aristotle
speaks in a normative-aesthetic register when he states:
if you string together a set of speeches expressive of character, and well finished in
point of diction and thought, you will not produce the essential tragic effect nearly
so well as with a play which, however deficient in these respects, yet has a plot and
artistically constructed incidents (1961, 63).
So does Hume, in telling us that “to enable a critic the more fully to execute [his critical]
undertaking, he must preserve his mind free from all prejudice, and allow nothing to
enter into his consideration, but the very object which is submitted to his examination”
(Hume 2006, 244). It is thus correct to say that “normative aesthetics establishes rules for
the artist and standards for the critic” (Jerusalem 1920, 207). But it is not complete,
for the aesthetic subject need not be an artist or a critic in the usual sense (she might, for
example, be a participant in a traditional worksong of a tribal community) and the
aesthetic object need not be an artwork (it might instead be a human body, a sunset, a
mathematical equation). Additionally, we might include in the domain of normative
aesthetics interpretive claims about what a specific work means or represents (taking
a cue from philosophers of language who hold that meaning itself is a normative
property). And we might also include judgments that concern the moral and political
character of works of art, so that normative aesthetics becomes capacious enough to
encompass, as well, the critical theory or critique of aesthetic phenomena, that is,
morally or politically committed aesthetic inquiry animated by “a vision of the good
social order grounded in both a detailed, empirical understanding of how existing insti-
tutions function and a commitment to normative criteria that are (in the broadest sense)
ethical” (Neuhouser 2011, 281).
A good thing to mean by “normativist aesthetics,” and what is meant by it here, is the
investigation, description, and systematization of the norms that are held to be constitu-
tive of a given aesthetic activity, that is, the norms one must abide by insofar as one is a
participant in an aesthetic practice.2 “Meta-aesthetics” might also be an appropriate
term for this type of inquiry. On one reading of it, Kantian aesthetics is in large part normativist:
it seeks to identify the norms the adjudicating subject follows in performing
a type of judgment that counts as distinctively aesthetic (as opposed to distinctively
empirical or moral, in the Kantian trichotomy).3 Whereas normative aesthetics espouses
norms, normativist aesthetics delineates the internal practical structure of normatively
governed practices, and is in that sense a kind of Geistes- or Kulturwissenschaft, or what
could be called a philosophical anthropology. As we shall see, the line of demarcation
between normative and normativist aesthetics, although it will be useful for us as a heu-
ristic, is not sharply inscribed.4
In what follows, I juxtapose Marxist normative aesthetics with Anglo-American
analytic normativist aesthetics. I do this by looking at Bloch’s theory of utopian musical
listening (a theory, I will suggest, of how music ought to be heard, of how its latent
revolutionary content ought to be disclosed by acts of imaginative listening) against the
background of Walton’s theory of musical representation and emotionality (an account
of the norms that govern music-centered make-believe). My aim in bringing together
these two very different philosophical treatments of the musical imagination is to use
Waltonian tools to reconstruct a core Blochian position regarding the relationship
between musical sound and revolutionary political consciousness. The position in
question is that music makes an appreciable contribution to the psychological faculty
of imagining, and to the political project of constructing, a better world, a “regnum
humanum” (a kingdom of humanity, Bloch 1986, 1296), in which there is an abolish-
ment of alienation, violence, and privation.5 For Bloch, a Hegelian Marxist, the project
of actualizing a regnum humanum through the implementation of communism is a
historical labor of communal Selbstbildung and Selbstverständigung (self-formation and
self-reflection):
Once man has comprehended himself and has established his own domain in real
democracy, without depersonalization and alienation, something arises in the world
which all men have glimpsed in childhood: a place and a state in which no one has
yet been. And the name of this something is home (Heimat). (1971, 44–45)6
Waltonian Fictionality
Walton’s theory of fiction, as set out most notably in Mimesis as Make-Believe, is a theory
of what it is for an artwork to have the representational content it has.7 The content of
representational works of art corresponds to what is true in the world of the work. Saying
that a proposition is true in the world of the work is the same as saying that the propo-
sition is “fictional.” And the fact that a proposition is fictional is a fact about a normative
status it possesses. Propositions that are fictional are to be imagined; they are what an
appreciator (a reading, listening, viewing consumer) of an artwork ought to (is under
some form of normative pressure to) imagine, because and insofar as she engages with
the artwork as a participant in a game of artwork-centered make-believe. Hence, for
Walton, what is “inside” an artwork—the otherworldly fictional content it contains—has
everything to do with what goes on “outside” of it, that is, with how this-worldly appreciators
conduct themselves in relation to the artwork and in accordance with certain rules
of aesthetic behavior that dictate what, how, and when to imagine:
Fictional worlds are imaginary worlds. Visual and literary representations estab-
lish fictional worlds by virtue of their role in our imaginative lives. The Garden of
Earthly Delights gets us to imagine monsters and freaks. On reading Franz Kafka’s
story, “A Hunger Artist,” one imagines a man who fasts for the delight of spectators.
It is by prescribing such imaginings that these works establish their fictional worlds.
The propositions we are to imagine are those that are “true in the fictional world,” or
fictional. Pictures and stories are representational by virtue of the fact that they call
for such imaginings. (2015, 153)
A game of make-believe played with an artwork is often one in which the work serves
as a “prop.” A prop has the function of rendering certain propositions fictional in the
context of a set of “principles of generation.” These are conventions that regulate how
specific features of the prop (such as the property a slab of marble has, when carved just
so, of resembling an uncovered female figure) confer fictionality on specific propositions
(such as the proposition Aphrodite is naked). But some of the make-believe called for by
an artwork is not centered on the artwork’s (propositional) content proper (its fictional
world) and is instead centered on the appreciator’s own sensory and cognitive engagement
with that artwork as a prop. When looking at Bruegel the Elder’s The Peasant Wedding,
according to Walton’s account, I imagine not only that there is rustic merrymaking, but
also that I see rustic merrymaking, and that my visual experience of the painting is a
visual experience of rustic merrymaking. What is imagined in imaginings of this sort
belongs to a “game world” rather than to a “work world.” These imaginanda are not
constitutive of the artwork’s subject matter aptly so-called, but imagining them is never-
theless made appropriate by the particular manner and circumstances in which the
artwork puts its appreciator in epistemic contact with its subject matter.
This précis leaves out many subtleties. But three notable features of Walton’s theory
are evident from what has been said so far:
This being the case, it seems right to class Walton’s theory as a normativist theory of
reception. And it seems equally right to assume that this is not the sort of reception the-
ory you get if you take music as your starting point or primary datum. Music looks to be
a proposition-mongering affair only intermittently and per accidens, maybe even devi-
antly (in opposition to the true nature of true music). Pretheoretically, music does not
strike us as a form of art that consistently and constitutively has us imagining that
such-and-such is thus-and-so. “The weak representational nature of music (relative to the
other arts)” (Klumpenhouwer 2002, 34)—what Richard Wagner called the “infinitely
hazy character of music”—has led aestheticians to insist, with some justice, that music is
not an art of content but instead an art of pure form. Or, in variations on this theme, they
have held that music’s form is its content (as Eduard Hanslick believed); or that music’s
content is distinctively and exclusively musical, owing to music’s thoroughgoing self-
reflexivity, its inability or refusal to be a signifier of anything besides itself (as Heinrich
Schenker believed). In the most extreme version of this gesture, it is claimed (by, among
others, the musical aestheticians of the German Frühromantik, such as Tieck and
Hoffmann) that music is sheerly ineffable, and that the content of a musical experience
is entirely refractory to the fixities of linguistic description.8
The history of musical aesthetics contains far more denials than affirmations that
music, as such, is “a transcribable, thus readable, discourse” (Attali 1985, 25) that is replete
with linguistically paraphrasable content.9 But there is a dogged recurrence of the trope
that music still somehow aspires to the condition of language, perhaps by being organized
rhetorically, like a speech (as Baroque-era theorists such as Johann Mattheson argued),
perhaps by possessing an underlying grammar-like structure (e.g., a formalizable syntax)
that floats free from a domain of reference,10 or perhaps by being in some looser way
pseudolinguistic, for example, by being a “temporal succession of articulated sounds that
are more than just sound, [a succession that is] related to logic [in that] there is a right and
a wrong” (Adorno 1993, 401). Yet often, as in the case of Adorno’s theory, the acknowledg-
ment of a “language-character” (Sprachcharakter) in music is accompanied by undi-
minished eagerness to evacuate music of semantic significance. Music may in some sense
“speak” to its hearers and abide by some kind of “logic,” Adorno believes, “but what is said
cannot be abstracted from the music; it does not form a system of signs” (1993, 401).11
Whether or not Adorno’s or any other antisemantic theory fully and accurately
diagnoses music’s condition, it seems inarguable that antisemanticism represents a
motivated response to some deep, distinguishing, and perhaps distinguished feature of
music. And music’s “infinitely hazy character” shows up as a problem to be reckoned
with if one’s explanatory standpoint, like Walton’s, is, on the whole, a propositional/
representational one.
Walton on Music
not we actually expect it. If, or to the extent that [this imagining] is prescribed, we
have fictionality. (Walton 2015, 154)
3. Music may represent properties of events or actions and thus make it fictional that
property-bearing events or actions take place, while leaving largely indeterminate the
kinds of objects or actors implicated in those events and actions:
[T]he lateness of the upper voice, and its dallying quality, the rigidity of the bass’s
progression, the fortuitousness or accidentalness of the D-major triad, the movement
to something new, are in the music . . . Some of this at least is a matter of imagining.
We imagine something’s being late, probably without imagining what sort of thing it
is. And we imagine a fortuitous or accidental occurrence. . . . [W]hy shouldn’t it count
as representational, . . . as representing instances of lateness, fortuitousness, etc.?
(Walton 2015, 159)
Without feeling at all disposed to deny these claims, we may still feel disposed to pass
comment on how theory-laden they are. These are the sorts of things one would be
primed to notice and point out about music if one’s motivating objective were to extend
the applicability of a propositional model of fictional representation, a model designed
in the first instance to accommodate literary and visual aesthetic phenomena. This is not
an indictment. All observations and explanations are probably in some measure theory-
laden. Moreover, it is satisfying to follow Walton to the counterintuitive but unavoidable
conclusion he reaches, which is that there is a class of familiar music-appreciative behav-
iors that centrally involve make-believe, such that music (more often than not, and more
often than anyone had realized) is representational and is (to that extent) a cognate of
pictures and literature qua fiction-conveying technologies. And the attraction of having
a unified theory under which art-forms can be subsumed serves as an incentive to think
and talk about music in terms of its pervasive representationality, or (equivalently) its
fictionality, or (equivalently) its imagination-prescribing function.
One may worry that those attractions will lead us to disproportionately accentuate
music’s ties to word and image and to downplay whatever it is that
is idiosyncratically musical (radically nonliterary, radically nonpictorial) about music.
For we can readily grant that the kinds of fictionality Walton finds in music may be
commonplace (many people may in fact perform such imaginings when they listen to
music) and may be in some sense mandatory (one may do a worse job of appreciating
music if one fails to perform such imaginings) while at the same time believing that such
imaginings are not a sine qua non of musical experience in and of itself. One arguably
cannot count as appreciating Maxim Gorky’s Mother as a novel, or count as appreciating
Evdokiya Usikova’s Lenin with Villagers as a picture, at all without using one’s imagi-
nation to explore a fictional world populated with fictional objects and events. By contrast,
although a relatively nonimaginative experience of Shostakovich’s Leningrad Symphony
might arguably be impoverished in comparison with an experience that is rich in propo-
sitional imaginings, most of us will be unwilling to insist that a listener who fails to
imagine the Siege of Leningrad while listening to this piece, or who fails even to non-
specifically imagine that something or other is destroyed or imperiled, or that violence
or trauma somehow transpire, is thereby disqualified from counting as a musical lis-
tener altogether.12
But actually, there is little cause for worry, for Walton concedes most of this. He rec-
ognizes that there is a significant “remainder” (as Adorno would say) when music is
brought under the categories native to a theory of fictional representation, something
left over that is, paradoxically, made conspicuous by the very fact that it is excluded or
downplayed, something that is (again as Adorno would say) “nonidentical” with the
concepts whose explanatory use the Waltonian theory of fictionality encourages.13
Accordingly, Walton looks outside the bounds of fictional representation for a feature
that marks music off from the literary and visual arts. To seek this, given an antecedent
conception of aesthetic appreciation as something that requires compliance with norms of
imagining, is to seek a form of imaginative experience the prescribing of which is unique
to music. Walton finds music’s individuating trait, the differentia specifica that it
does not share with novels and paintings, and so forth, in the way it prevails upon us to
imagine that our auditory experience of the musical work is an affective experience of an
emotional state. Music “gets us to imagine experiencing a certain feeling, and possibly
expressing it or being inclined to express it in a certain manner. It often does this without
getting us to imagine knowing about (let alone perceiving) someone else having that
experience or expressing it in that manner” (Walton 2015, 173). Rather, music gets the
listener to imagine of her experience of hearing sounds that it is an experience of a par-
ticular emotion. Walton’s terminology allows this idea to be expressed concisely: music,
unlike other forms of art, gets us to treat our perceptual experience itself (as opposed to
external objects that cause and are represented by that experience) as a prop. Walton
frames the suggestion in terms of the difference between work worlds and game worlds:
Work worlds comprise fictional truths generated by the work alone. But feelings . . . do
not exist independently of people who feel them. . . . So there is no pressure to regard
the music itself as establishing a fictional world in which there are feelings. . . . It is
the listener’s auditory experiences, which, like feelings, cannot exist apart from being
experienced, that make it fictional that there are feelings. When the listener imagines
experiencing agitation herself, there is no reason to think of the music as making
anything fictional. It is the listener’s hearing of the music that makes it fictional
that she feels agitated. The only fictional world is the world of her game, of her
experience. (2015, 173)
Ignoring for present purposes the technical nuts and bolts of what I have elsewhere
called Walton’s “first-person feeling theory of musical expression,”14 I now segue into a
discussion of Bloch by first ideologically diagnosing the kind of imaginative listening that
Walton’s normativist theory takes as its object of theorization. A critical-historical vantage
point allows us to see that this mode of listening has affinities with the ideology of what
Korstvedt (2010, 122), following Adorno, calls “Romantic bourgeois Innerlichkeit.”15
One key respect in which the Waltonian listener can be identified as a stereotypically
“Romantic” subject is that this listener’s experience (of music as a locus of expression
or emotionality) is one in which introspectible “sentiment, longing, and emotion . . .
even suppressed animality” (Korstvedt 2010, 122)16 are elevated to the status of ends-
in-themselves. For the Romantic aesthete—in whose eyes art is a supremely valorized
object of perception and cognition because it facilitates a “spontaneous overflow of pow-
erful feelings”17—an activation or intensification or heightened awareness of the emo-
tions is the raison d’être of music, or of a certain way of listening to it. According to this
outlook, musical sounds are not to be valued primarily as stimuli that lead to proper
action (as Plato’s account of the musical modes in The Republic holds) nor as aids to the
restoration of bodily and spiritual equilibrium (as Aristotle’s account of musically abet-
ted catharsis in the Politics holds). Nor are musical sounds to be listened to for their own
sake, or for the sake of detached contemplation of their sensuous auditory properties,
as happens in the modernist practice of “reduced” listening, that is, “listening for the
purpose of focusing on the qualities of the sound itself (pitch, timbre, etc.) independent
of its source or meaning” (Chion 1994, 222–223). Instead, music figures into the Romantic
vision as an instrumentally valuable means to an intrinsically valuable end of emotional
extremity or disequilibrium.
According to Hegel, whose aesthetic system both expounds and historicizes the
Romantic conception of music, music’s release from its self-incurred tutelage (its ages-
long period of subordination to the verbal art forms) occurs when it matures into
a romantische Kunstform, a mode of artistic expression that taps into “inner spirit”
(der innere Geist, i.e., emotional subjectivity). More so than the other arts, music pro-
vides “a resonant reflection, not of objectivity in its ordinary material sense, but of the
mode and modifications under which the most intimate self of the soul, from the point
of view of its subjective life and ideality, is essentially moved” (Hegel 1920, 342). Music is
a “province which unfolds in expanse the expression of every kind of emotion, and every
shade of joyfulness, merriment, jest, caprice, jubilation and laughter of the soul, every
gradation of anguish, trouble, melancholy, lament, sorrow, pain, longing and the like, no
less than those of reverence, adoration, and love fall within the appropriate limits of its
expression” (Hegel 1920, 359). Adorno, a Romantic at heart, accepts these premises and
attempts to draw out what he sees as their consequences for the psychology of class-
position. He interprets the musically assisted retreat into the “exclusive and private
warmth” (Daniel 2001) of bourgeois interiority as a gesture of withdrawn resignation,
on the part of a no longer heroic middle class, in the face of monstrous and impersonal
forces and relations of production that lie outside the ken of the individual subject’s
power to efficaciously intervene for social change:
Although inwardness, even in Kant, implied a protest against a social order heter-
onomously imposed on its subjects, it was from the beginning marked by an indif-
ference toward this order, a readiness to leave things as they are and to obey. This
accorded with the origin of inwardness in the labor process: Inwardness served
to cultivate an anthropological type that would dutifully, quasi-voluntarily, per-
form the wage labor required by the new mode of production necessitated by the
relations of production. With the growing powerlessness of the autonomous subject,
inwardness consequently became completely ideological, the mirage of an inner
kingdom where the silent majority are indemnified for what is denied them socially.
(Adorno 1998, 116)
My impression is the opposite of being distanced from the world of the music. . . .
I feel intimate with the music, more intimate, even, than I feel with the world of a
painting . . . it is as though I am inside the music, or it is inside me. Rather than hav-
ing an objective, a perspectival relation to the musical world, I seem to relate to it in
a most personal and subjective manner. (2015, 165)
It is the auditory experiences, not the music itself, that generate fictional truths.
I can step outside of my game with a painting. When I do, I see the picture and notice
that it represents a dragon, that it calls for the imagining of a dragon (even if I don’t
actually imagine this). But when I step outside my game with music and consider
the music itself, all I see is music, not a fictional world to go with it. There are just
the notes, and they themselves don’t call for imagining anything. The absence of a
work world does not, however, prevent the listener’s imagination from running wild
as she participates in her game of make-believe. (Walton 2015, 174)
experience), Waltonian emotional listening represents an encounter with the self. This is
a self-encounter in which, moreover, the imagination is irreducibly involved. Bloch’s
philosophy of music likewise thematizes imaginative self-encounters and it likewise
has a pronounced Romantic slant to it (Habermas [1969–1970] calls Bloch a “Marxist
Romantic”). But rather than attempting to codify the norms indigenous to a historically
localized form of “bourgeois-Romantic” listening, as Walton’s theory can be read as doing,
Bloch’s philosophy of music endeavors to bring about the dialectical transcendence—
the processual subversion, preservation, and elevation—of those norms. It does this by
prescribing a mode of emotional listening that poses challenges to both the aesthetic
ideology in which it itself is rooted—namely, bourgeois Innerlichkeit—and the economic
configuration that is, in Bloch’s Marx-inspired view, determinative of this ideology—
namely, the capitalist mode of production. This reappropriation and redeployment of
his culture’s musical heritage is part and parcel of Bloch’s overall philosophical strategy
of pitting culture (religion, philosophy, art, etc., as they have been handed down) against
itself (by “sublating,” problematizing, and radicalizing it) for the sake of itself, that is, for
the sake of instituting new cultural forms that can nurture the “subjective conditions for
revolution” (Kellner and O’Hara 1976, 19) by evoking a “future kingdom of freedom as
the real content of revolutionary consciousness” (Bloch 1972, 272). Habermas points to
the Hegelian basis of this maneuver:
What Bloch wants to preserve for socialism, which subsists on scorning tradition, is
the tradition of the scorned. In contrast to the unhistorical procedure of Feuerbach’s
criticism of ideology, which deprived Hegel’s “sublation” (Aufhebung) of half of
its meaning (forgetting elevare and being satisfied with tollere), Bloch presses the
ideologies to yield their ideas to him; he wants to save that which is true in false
consciousness: “All great culture that existed hitherto has been the foreshadowing of
an achievement, inasmuch as images and thoughts can be projected from the ages’
summit into the far horizon of the future” (1969–1970, 312).
It is clear from the amount of attention Bloch lavishes on music,19 and from his undi-
luted enthusiasm for it, that he judges the Western musical heritage to be among the
most precious items in the bequeathed patrimony of “great culture.” This is because
music performs with distinction what Bloch sees as the rightful function of art in
general, that of putting us in touch with our longing for, and with our will to create, a
world unblemished by alienation, exploitation, and oppression. Music is preeminent
for Bloch because it is preeminently utopian:
For Bloch, music is the most utopian of the arts. It is speech which men can under-
stand: a subject-like correlate outside of us which embodies our own intensity,
and in which we experience an anticipatory transcendence of the existing interval
or distance (Abstand) between subject and object. “Identity,” the “last moment,”
“a world for us,” “utopia” is present in music: as the anticipatory presence and
pre-experience (Vorgefühl) of the possibility of self. . . . Music expresses something
“not yet.” It copies what is objectively undetermined in the world. There is a human
world in music which has not yet become actual: a pre-appearance of a possible
regnum humanum. For Bloch, music is the most public organon of the Incognitio
or subjective factor in the world as a whole, and it provides an anticipatory experi-
ence of the subject-like (subjekthaft) agens as if it had become objectified in the
external world. (1982, 175)
profoundest feelings of hope and expectancy.20 It calls for us, beckons us, from a utopian
future we currently see through a glass, darkly. And it calls upon us to recognize and
fully actualize the nature of our true, ideally social selves, both by causing us to regret,
and to resolve to rectify, the current incompleteness of our historical project of self-
emancipation and self-realization, and also by pushing us to adopt the means necessary
for achieving a world that is adequate to our shared species-being (Gattungswesen).
Such a world must of needs be characterized by collectivity, nonalienation, solidarity,
and the absence of scarcity—attributes whose political and economic precondition,
Bloch believes, is the abolition of the capitalist mode of production through the insur-
rectionary activity of the proletarian class. “The realized We-world” is Bloch’s term for
the unqualifiedly redemptive commonwealth of humanity that is the asymptotic goal of
the socialist movement. By means of an act of divinatory musical hearing (Hellhören),
Bloch thinks, we can feel the real possibility of this future (or, if you like, future-perfect)
state of affairs, can gain sensuous knowledge of the world’s objective tendency (Tendenz)
to move toward the actuality of communism.21 “Music as a whole stands at the boundary
of humanity, but [it is] the boundary where humanity, with a new language and the call-
aura surrounding deeply felt intensity, a realized We-world [der Ruf-Aura um getroffene
Intensität, erlangte Wir-Welt] first comes into being. The order in the musical expression
also suggests a house, even a crystal, but one composed of future freedom; a star, but as a
new earth” (Bloch 1986, 1103).
Utopia
That all sounds rousing and eschatological enough, but what precisely can it mean?
If we hope to decipher such aphorisms, we must place them against the background of
Bloch’s more general theory of utopia, the master narrative that structures all of Bloch’s
philosophical and sociological investigations. For the politically committed Marxist-
Leninist Bloch of The Principle of Hope, a variety of human communicative practices—
predominantly those involving the “production and usage of signs” to convey “social
meaning[s] expressed in a code” (Attali 1985, 24)—possess a “utopian function.” In a
motley assemblage of cultural forms—architecture, fairy-tales, the detective novel,
religion, alchemy, circuses, advertisements, fashion, medicine, and the fine arts—Bloch
espies a “Vorschein,” an anticipatory illumination of a possible, preferable, future state of
affairs.22 Sometimes a Vorschein may reveal itself to us in the mundane transactions of
contemporary commodity culture. “Shop windows and advertising are in their capitalist
form exclusively lime-twigs for the attracted dream birds” (Bloch 1986, 334). To use one
metaphor to unpack another: the siren song of manipulative marketing, notwithstanding
its liability to mystify consumers and fetishize commodities, gives voice (unbeknownst
to itself) to a legitimately humanistic wish for a new and improved way of life. In other
cases, a Vorschein may only become perceptible in hindsight, through an interpretive
reassessment of an antiquated cultural form, or what Hegel calls a “shape of life that
has grown old” (Hegel 1992, 23):
Both quotidian experiences and aesthetic experiences, both modern shapes of life and
outmoded shapes of life can, under proper scrutiny, show us the yawning gap between
how the world is and how it could and should be. Bloch’s hermeneutic undertaking is to
use cultural and aesthetic criticism to sharpen our experience of the nonidentity of is
and ought. He seeks to brighten and intensify the utopian Vorschein that is every-
where to be glimpsed in the world of human values, institutions, culture and art.
Accordingly, the philosophy of music progressively elaborated in The Spirit of Utopia
and The Principle of Hope encourages us to hear the revolutionary, hopeful tone23 that
resounds in the masterworks of the Western canon (to which Bloch’s musical prefer-
ences are more or less restricted).24 With Bloch’s aesthetic philosophy as its handmaiden,
music can finally come into its own as a “source-sound of self-shapings still unachieved
in the world” (Bloch 1985, 219).
Fair enough, but what exactly is a Vorschein, and how exactly is it to be found in music?
Bloch’s cryptic, prophetic reflections on culture never give way to a precise statement of
what it is for a cultural practice, musical or otherwise, to radiate a pre-appearance of
utopia. At the risk of oversimplification, I would point to two basic ideas that seem to lie
at the center of Bloch’s proposal: (1) there is a way of imaginatively engaging with cul-
tural objects so that they provide a sensory and intellectual provocation to construct a
mental representation of (fictional) utopian circumstances, and (2) this fiction-generating
imaginative activity, properly performed, furnishes us with motivation to make the
utopian fiction a true representation; we are propelled by our utopian make-believe to
try to bring the world into alignment with the utopian fictions we imagine.
These two ideas are especially germane in the case of musical experience, as Ruth
Levitas notes: “Bloch argues not only that music is the most utopian of cultural forms
but that it is uniquely capable of conveying and effecting a better world” (Levitas 2013,
220, emphasis mine). To paraphrase Levitas, music has pride of place in the sphere of
the arts because of, on the one hand, its unparalleled capacity for possessing utopian
sense or reference (the semantic property of being about or signifying utopia) and, on the
other hand, its capacity for carrying utopian prescriptive force (the power to exhort us
to make utopia real). Music exceeds the other arts in its power to summon a vision of
a utopian “Not-Yet-Being” (Noch-Nicht-Sein) or “Real-Possible” (objectiv-real Mögliches)
that lies beneath the surface of “That-Which-Is” (Das Seiendes)—the world as it currently
confronts and confounds us; and it also has greater power to make palpable the historical
urgency of this vision: “music is that art of pre-appearance which relates most intensively
to the welling core of the existence-moment of That-Which-Is and relates most expan-
sively to its horizon—cantus essentiam fontis vocat [singing summons the existence of
the fountain]” (Bloch 1986, 1069–1070).25 Couched in language that is less expressionistic:
the world as it now is, which includes us as we now are, is pregnant with—contains as
an “objectively real possibility” (objektiv-reale Möglichkeit)—the world as we imagine it
should be, which includes us as we would wish ourselves to be. Music reveals that the
alienated, self-dirempted world of capitalist modernity is implicitly and immanently
(not yet, noch nicht) “a homeland of identity in which neither man behaves toward the
world, nor the world behaves towards man, as if toward a stranger” (Bloch 1986, 209).
How? Music “relates most intensively to the welling core of the existence moment” by
being the most real art, the art most immediately connected with our concrete material
and corporeal predicament as embodied creatures, in the perfectly literal sense that
there is nothing spatially between us and the physical vibrations that are music’s material
substratum. Music’s realness thus rests on transhistorical features of our sensory appa-
ratus. Bloch says as much in saying, wryly, that “as hearers we can keep closely in touch,
as it were. The ear is slightly more embedded in the skin than the eye is” (1985, 73). And,
very much in the spirit of Walton’s remarks about our spatial oneness with musical
sounds (“it is as though I am inside the music, or it is inside me”), Bloch refers to the
“heard note” as a “sound that burns out of us . . . a fire in which not the vibrating air but
we ourselves begin to quiver and to cast off our cloaks” (1).
To give sense to the idea that music relates “most intensively” to “That-Which-Is,”
we can appeal, on Bloch’s behalf, to the spatial immediacy and bodily resonances that
place music in an “incomparable proximity to existence” (Bloch 1985, 227)—namely our
creaturely existence as bodies that are repositories of affect and desire. Here, force is a
function of distance: the potency of music’s utopian prescription has to do with its close-
ness to us. Music “comes close to the subject-based and driving force of events” (208),
the human will as the authentic engine of history, because of music’s capacity to (non-
metaphorically) move us, indeed to become one with us, on a somatic level. “There is not
music of fire and water or of the Romantic wilderness that does not of necessity, through
the very note-material, contain within it the fifth of the elements: man” (227). The nature
of sound and the nature of our bodies ensure that the material conditions are continually
present for establishing a “correspondence between the motion of the note and the
motion of the soul” (123).
But why believe that music is “uniquely capable of conveying . . . a better world,” in the
sense of helping us to imagine one? The figurative and literary arts can convey semantic
freight of a utopian sort by showing us or telling us about some utopian situation or
other, such as the leisure-filled, egalitarian, neo-Medieval England described in William
Morris’s utopian novel News from Nowhere. But it seems that music unaided by words
and pictures is not merely inferior as a vehicle for “conveying” utopia; music seems
wholly unfit for this representational task.
Bloch might respond that this is too simplistic a way of framing the issue. Utopian art,
as Bloch conceives of it, is not simply, nor is it primarily or paradigmatically, art that
draws a blueprint of a better world and/or a better way of living in the world:
Thus the concept of the Not-Yet and of the intention towards it that is thoroughly
forming itself out no longer has its only, indeed exhaustive example in the social
utopias; important though the social utopias are . . . [T]o limit the utopian to the
Thomas More variety, or simply to orientate it in that direction, would be like trying
to reduce electricity to the amber from which it gets its Greek name and in which it
was first noticed. Indeed, the utopian coincides so little with the novel of an ideal
state that the whole totality of philosophy becomes necessary . . . to do justice to the
content of that designated by utopia. (1986, 11)
Levitas, taking her lead from Bloch’s standoffishness toward the “novel of the ideal state,”
states that “the importance of . . . all utopias, lies not in the descriptions of social arrange-
ments, but in the exploration of values that is undertaken” (2010, 140). Utopian art is not
limited to idealistic science fiction; rather, it is any art that permits us to navigate a space
of alternative values, not so that we might come to commit ourselves to those exact values,
but so that we might cultivate the imaginative faculty of thinking deeply and creatively
about a radically novel personal and societal ethos, one that might possibly emerge into
prominence within a radically reorganized way of producing and reproducing human
civilization. Hudson (1982) essentially agrees with Levitas when he states that Bloch’s
view is that utopian artworks might or might not contain “descriptions of social arrange-
ments,” but must possess a “cognitive function as a mode of operation of constructive
reason; [an] educative function as a mythography which instructs men to will and desire
more and better, [an] anticipatory function as a futurology of possibilities which later
become actual, and [a] causal function as an agent of historical change” (51).
Be that as it may, the question stands: how is music alone supposed to do this? Even if
Levitas’s “exploration of values” and Hudson’s utopian functions do not presuppose
full-blown verbal or pictorial “descriptions of social arrangements,” they seem to be
predicated on the presence of semantic content of some sort, that is, on the availability of
a specifiable, “transcribable” meaning or representational content that can be somehow
accessed by means of (proper engagement with) the utopian-functioning artwork.
Shouldn’t this mean that music’s “weak representational function” disqualifies it from
playing a genuinely utopian role, at least on its own? It looks as though Bloch shares this
worry: “It does not go without saying that the note can indicate external things and be
related to them. After all, it inhabits precisely that region where our eyes can tell us noth-
ing more and a new dance begins” (Bloch 1985, 219). Yet he blithely proceeds as though it
does go without saying and exempts himself from giving a justification for his conviction
that music is the utopian medium par excellence.
This lacuna cannot be passed over without comment. To have even a minimal appreci-
ation for Bloch’s large philosophical investment in music, we must have some measure
of warranted sympathy for his belief in music’s utopian function. And to have this, we need
to be able to explain to ourselves how (purely) musical utopianism is so much as possible.
Waltonizing Bloch
With the help of Walton’s theoretical apparatus and a clue from Adorno, we can formulate
Bloch’s position so that it makes enough sense to be assessable. As a way into this, let us
return to the dualism I set out at the beginning of the chapter.
Inarguably, Bloch’s aesthetics is robustly normative. Although Bloch has no use for
micro-evaluative rankings of individual artworks and is an aesthetic omnivore who, “in
contrast to Lukács, breaks with the high culture bias in Marxist aesthetics” (Hudson 1982,
179),26 Bloch’s musical aesthetics is at root an endorsement of the classical canon’s sup-
posed aptness for promoting socialist values. It is also an elaborate exposition of the
view that the value of a musical work is partly based on its fitness for aiding the cause of
human emancipation.
Recall that normativist aesthetics has the job of describing and systematizing the
norms that govern and constitute real-life aesthetic practices and habits. Walton
examines the practices and habits that surround fictional representations; he explains
the property of fictional representationality in terms of the uses to which fictionally
representational aesthetic objects are put, and in terms of the normative statuses those
objects are accorded, by participants in games of make-believe. Bloch, on the other
hand, does not set out to explain how an already-up-and-running aesthetic practice is
organized and administered. His conception of “visionary hearing” (Hellhören), for
instance, does not arise out of an attempt to explain what the typical listener typically
does when listening to music. But Bloch’s aesthetics does adopt the holistic, synoptic
perspective characteristic of Walton’s normativist work. Where Walton sets out to trace
the normative contours of a complex representational and imaginative practice as it
currently exists, Bloch calls for the revision or remaking of time-honored aesthetic
customs. In both cases, the theoretical object is all of a certain musical way of life in
its globality. We might therefore adopt a more fine-grained version of the normativist/
normative distinction and speak instead of a distinction between descriptive meta-
aesthetics, as pursued by Walton, and normative meta-aesthetics, as pursued by Bloch.
Bloch’s aesthetics is normative less at the level of the individual work or individual
aesthetic judgment and more at the level of the entire aesthetic culture in which such
works and judgments have their place.
The normative system that Bloch propounds, like the one whose defining attributes
Walton catalogs, has authority over acts of the imagination. According to the norms of
imaginative listening Bloch would have us adopt, music (read: the masterworks of the
Western tonal canon) should be accorded the function of evincing a utopian vision. The
utopian vision to be evoked is substantially the same from piece to piece. Music, all of it,
has an immutable representational content that is prior to the contingent, individuating
details of specific works. This Bloch refers to both as the “a priori latent theme [that] . . . is
really central to all the magic of music” (Bloch 2000, 3) and as “the hearing-in-Existence . . .
common to all forms of music” (Bloch 1986, 1089). That Bloch sees himself as breaking
[T]he musical object that has really to be brought out is not decided. The . . . dramatic-
symphonic movement posits only an area of very general readiness into which the
poetically executed music-drama can now be fitted “at one’s discretion.” And by the
same token, there yawns between the most transcendable [compositional devices]
and the ultimate signet-character of great composers or indeed the ultimate object,
the ideogram of utopian music in general, an empty, damaging hiatus which renders
the transition more difficult. Even in rhythm and counterpoint illumined theo-
retically and set in relation philosophically, it is not possible to come directly to the
kind of presentiment accessible to the weeping, shaken, most profoundly torn-apart,
praying, listener. In other words, without this special learning-from-oneself, feel-
ing-oneself-expressed, human outstripping of theory [through] the interpolating
of a fresh subject (though one most closely related to the composer) and of this
subject’s visionary speech . . . without this, all transcending relations of the [compositional
devices] to the apeiron . . . will remain stationary. Thus with the presentiment,
a stage which no longer belongs to the history of music, the note itself reappears as the
solely intended, explosive aha!-experience of the parting of the mist; the note which
is heard and used and apprehended, heard in a visionary way, sung by human beings
and conveying human beings. (1985, 92, emphasis in original)
Part of Bloch’s point seems to be that there is no way of explaining music’s utopianism,
no way of tracing a path from what is objectively the case about music’s structure
(perhaps at the level of “rhythm and counterpoint illumined theoretically”) to its capacity
for “visionary speech.” But, as I have insisted, a vindication of the possibility of musical
utopianism is a requirement for taking Bloch’s revisionary, revolutionary musical
aesthetics at all seriously, and any such vindication would seem to stand in need of
such an explanation.
Adorno’s writings, as interpreted by Richard Leppert, may permit us to be more
optimistic than Bloch is about the prospects of explaining musical utopianism. In spite
of his infamous pessimism, Adorno is in many respects a utopian thinker about music.27
A utopian sensibility imbues Adorno’s formulation of the concept of structural listening,
a privileged mode of “formalist” hearing whose decline he blames on the organs of mass
culture, principally the radio and the phonograph. Leppert explains:
For Adorno, music’s sensuous presentation of the reconciliation of part and whole
(at least sometimes) stands for a state of perfection, self-subsistence, harmony, plenitude,
consummation, fulfillment, and nonalienation, in a way that is (at least sometimes)
utopian.28 Such a state of reconciliation obtains when the elemental constituents of a
system and the system as a unified whole are mutually adjusted and accommodated to
one another, such that each is a necessary requirement of, and in turn requires, the other.
This familiar organicist conceit is readily transposed into a political key:
In musical details Adorno heard the subject speaking, willingly bending toward the
musical object (the whole) in order to make possible the work, a whole larger than
the sum of its individual parts. Something, in other words, like a utopian society.
Musical details, bending and blending their expressive character toward the whole,
while retaining their own specific character, permitted the reenactment of reconcilia-
tion between subject and object, for Adorno the artwork’s highest goal.
(Leppert 2005, 116)
Music, on this somewhat cursory telling of the Adornian story, is “like” a utopian society:
the reciprocal mediation of whole and part (piece and note) in organically unified music
is relevantly similar in form (or “isomorphic,” “structurally homologous,” etc.) to the
reciprocal mediation of whole and part (society and person) that is distinctive of a
“homeland commensurate with man” (Bloch 1986, 136). Subsequent to humankind’s
hard-won entrance into its postcapitalist homeland, individuality is not, and cannot be,
alienated from collectivity. This is because, as the Communist Manifesto famously puts
it, communism is by definition a circumstance in which “the free development of each is
a condition for the free development of all.” Animated by exactly this vision, Adorno
holds that great music’s
greatness is shown as a force for synthesis. Not only does the musical synthesis pre-
serve the unity of appearance and protect it from falling apart into diffuse culinary
moments, but in such unity, in the relation of particular moments to an evolving
whole, there is also preserved the image of a social condition in which alone those
particular moments would be more than mere appearance. (2002, 290)
Adorno thus singles out a formal similarity, a shared mereological property, as the
common denominator that semiotically links utopian music to utopian social circum-
stances. It is on the basis of this structural resemblance that he ascribes to music an ability
to “preserve an image” of utopia. Should we therefore infer that utopian music has a
representational function analogous to that of Bosch’s The Garden of Earthly Delights
or Manet’s Le Déjeuner sur l’herbe? These utopian paintings catalyze our imagining of
utopian states of affairs by looking like a setting in which such states of affairs obtain;
looking at these paintings is phenomenologically similar (in relevant respects) to what it
would be like to actually look at an actual utopian setting. A Waltonian would say that
this visual similarity enables and invites us to pretend that in perceiving the artwork we
are perceiving the utopian state of affairs that is represented in and by the artwork. Does
utopian music do something like this? Is the Adornian position (or the most defensible
position consistent with the most charitable interpretation of Adorno’s remarks) the posi-
tion that music’s utopian function consists in its being what Walton calls a depiction?
Regarding depictions, Walton claims:
The viewer of Meindert Hobbema’s Water Mill with the Great Red Roof plays a game
in which it is fictional that he sees a red-roofed mill. As a participant in the game, he
imagines that this is so. And this self-imagining is done in a first-person manner:
he imagines seeing a mill, not just that he sees one, and he imagines this from the
inside. Moreover, his actual act of looking at the painting is what makes it fictional
that he looks at a mill. And this act is such that fictionally it itself is his looking at
a mill; he imagines of his looking that its object is a mill. We might sum this up by
saying that in seeing the canvas he imaginatively sees a mill. Let’s say provisionally
that to be a “depiction” is to have the function of serving as a prop in visual games
of this sort. (1990, 294)
Most pieces of music, according to Walton’s arguments discussed earlier, do not have a
work world and thus do not function as depictions. But some pieces quite obviously do
function this way. To take a familiar example, those who listen to the fourth movement
of Beethoven’s Pastoral Symphony play a game in which it is fictional that there is a
thunderstorm, and in which they are to imagine that their act of listening to the music
is an act of listening to a thunderstorm. Underlying this depictive function is the sonic
resemblance the music bears to a thunderstorm. Is this how we should explain what
happens when Bloch’s musical listener “psychologically anticipates the Real-Possible”
(1986, 144)—the concrete possibility of utopian reconciliation between the human
species and the human lifeworld—when she performs an act of “visionary hearing”?
Does music’s utopianism consist in its being an auditory depiction of utopia?
Probably not. It would be odd to claim that musically induced utopian make-believe
involves imagining that we hear utopia. Our game world when we listen to utopian
music is not plausibly one in which it is fictional that we have an auditory experience of
utopia. The reason for this is banal: there is not a distinctive way (or even a distinctive
set of ways) a utopian social arrangement would sound. Unlike thunderstorms, utopian
social arrangements lack a defining sonic profile. Perhaps it is true that utopian music’s
tonal structure is abstractly isomorphic to utopia’s interpersonal structure—but this is
not the same as music sounding like utopia. Music cannot sound like utopia, because
there is nothing (in general) that utopia sounds like.
But if music cannot depict utopia, what can it do? One response that comes to mind
is that music might allegorize utopia. Here, again, Walton’s analysis is helpful. Walton
understands allegory as art that (1) refers to something that is different from what it
represents; and (2) refers by representing:
Dr. Pangloss in Voltaire’s Candide stands for Leibniz, to whom the work refers. . . . But
I prefer not to regard [this] work as representing Leibniz . . . in our sense. It is not
fictional of Leibniz that his name is “Pangloss” and that he became a “beggar covered
with sores, dull-eyed, with the end of his nose fallen away, his mouth awry, his teeth
black, who talked huskily, was tormented with a violent cough and spat out a tooth
at every cough,” and in this sorry state met his old philosophy student, Candide, to
whom he continued to prove that all is for the best. We are not asked to imagine
this of Leibniz, although we are expected to think about him when we read about
Pangloss, to notice and reflect on certain “resemblances” between the two. Pangloss
is Voltaire’s device for referring to Leibniz, but he refers to Leibniz in order to com-
ment on him, not in order to establish fictional truths about him. Reference thus
built on the generation of fictional truths, ones not about the things referred to, is
one common kind of allegory. (1990, 113)
What could utopian music fictionally represent, such that it could thereby allegorically
refer to utopia? We can answer this question by giving Leppert’s Adorno-inspired words
some Waltonian prefixes: a work of music makes it fictional that its notes “bend . . . and
blend . . . their expressive character toward the whole,” gets us to imagine that those notes
“permit . . . the reenactment of reconciliation between subject and object,” prescribes that
we make-believe that the notes don’t “surrender any sense of [their] own spontaneity”
(Leppert 2005, 116). Musical notes are not agents and so cannot literally surrender sponta-
neity (or perform any of the actions Leppert mentions); but music gets us to imagine it.
And in prescribing such an imagining, Bloch could say, were he inclined to be so precise,
musical sounds thereby refer to a not-yet-existing utopia and impel us “to notice and
reflect on certain ‘resemblances’ between the two” (Walton 1990, 113) (i.e., between the
musical-objects-as-agentially-imagined and the allegorically referred-to utopian
state). Or at least, musical sounds would do so in the context of a semiotic listening
practice that has been reconstituted according to Bloch’s prescriptive meta-aesthetics.
In the music-interpretive practice whose adoption Bloch can be understood as advocat-
ing, music assumes a utopian function via something like the following mechanism: by
getting us to imagine that its notes relate to one another dynamically, holistically, and
organically as mutually reconciled sonic agents, music makes allegorical reference to
utopia, invites us to perform the contemplative action of reflecting on the nature of
utopia, and causes us to desire the actualization of utopia.
Conclusion
We have at last arrived at a proposal that is both sufficiently Blochian and sufficiently trans-
parent: music ought to be taken as an allegory (in Walton’s technical sense) of utopia.
Bloch’s writings on cultural hermeneutics are meant to serve as a series of object lessons
in how to endow cultural items with a utopian function. I have attempted to specify with
a reasonable degree of precision how this utopian function could work in the musical case.
At most, this interpretive reconstruction shows the bare possibility of putting music to
such a use. But it goes no great distance toward demonstrating the wisdom or utility or
likelihood of putting music to such a use. One might well ask: what is there to admire,
even for a committed Marxist, about an aesthetic practice in which lots of music unin-
tentionally (through no deliberate decision on the part of its composer) allegorizes pretty
much the same thing? Also, is the concept of an unintentional allegory at all sensible?
If allegorical content is unrelated to authorial intention, what, if anything, constrains
the interpreter’s attributions of allegorical meaning? And, even if these questions have
convincing answers, one wonders why Bloch sees this type of listening practice as hav-
ing paramount exigence for politics. There is a hard row to hoe for anyone who would
defend Bloch’s insistence on the political momentousness of listening to the canon of
common-practice classical masterworks as radical allegories. In the first place, it is diffi-
cult to see how utopian music could tell us anything we do not already know about utopia.
For the act of allegorical interpretation to get off the ground, the interpreter needs to
already be aware of the one thing music says about utopia, which is that utopia possesses
(roughly speaking) organic form; otherwise the interpreter would have no way of deter-
mining that utopia, in particular, is what the music allegorically refers to. Moreover, the
only people who would have any real inclination to try to hear music as Bloch instructs
us to hear it—as radical political allegory—are those who are antecedently convinced
of the rightness of socialist ideals and antecedently disposed to pursue them (by, among
other methods, listening to music in an appropriately socialist way). Thus, Bloch’s
prescribed aesthetic practice presupposes the kind of knowledge and motivation that it
would need to instill, were it to hold any real claim to political efficacy. Bloch’s failure
to notice the self-underminingness of his central commitment is symptomatic of an
underlying credulity that runs throughout his writings and that threatens to vitiate his
positive project at large. Even Bloch’s most sympathetic expositors at times feel the
temptation of dismissing his system wholesale:
The problem is that Bloch . . . retreats into cipher talk at so many analytically crucial
points that [his philosophical system] runs the risk of being poetry philosophy, a
theurgic aestheticist Weltanschauung: a system of faith in hope with splendid meta-
mystical meditations, but little explanatory power. (Hudson 1982, 151–152)
It may also be the case that Bloch, in spite of his “emphasi[s] that Marxism must actively
inherit the total cultural heritage” and in spite of his “break . . . with the Eurocentrism
and high bourgeois bias of the Marxist tradition in aesthetics” (Hudson 1982, 174),
was an unwitting captive of his own elevated taste for European art music. A cultured,
middle-class convert to Marxism, Bloch was unable to relinquish the conviction that the
esteemed musical works of the Austro-German tradition ought to be politically salvage-
able and, more than this, politically essential in relation to his adopted Marxist ideals.
In Hudson’s politely damning assessment, “the underlying insight that Bloch always
remained a ‘bourgeois’ intellectual with left adventurist sympathies is not without
foundation” (Hudson 1982, 211).
This is not to say that Bloch offers no genuine insights to politically committed
Marxists who are concerned with art. Bloch’s writings read like the diaries of a resolute
socialist determined to find hope and inspiration wherever they can be found. At their
best, they make vivid the appeal of trying to reconfigure our extant aesthetic practices so
that they become sources of moral hope and political fervor as well as instruments of
hegemonic (in Gramsci’s sense) culture-building more generally. But Bloch tethered his
attempt to formulate a revolutionary aesthetics to a fundamentally passive (and character-
istically bourgeois-Romantic) conception of the aesthetic domain as principally a
space of aesthetic reception. What matters most, for Bloch, is that music be heard in
the right way. It seems not to have occurred to him to attempt to develop a comple-
mentary notion of the revolutionary potentialities of aesthetic production, nor to think
through the social implications of an aesthetic practice that transcends the division of
labor between aesthetic producer and aesthetic consumer, or one that transcends the
division (massively expanded under capitalism) between producing art and producing
“necessities.” Walton’s normativist theory helped us to put our finger on these elementary
deficiencies, which Bloch’s formidable prose style and esotericism make it easy to
overlook. If we are to begin to remedy these deficiencies, though, first we must duly free
ourselves from the constricting tenets of (exclusively) reception-based aesthetics.
Notes
1. To music is to take part, in any capacity, in a musical performance, whether by performing,
by listening, by rehearsing or practicing, by providing material for performance (what is
called composing), or by dancing. We might at times even extend its meaning to what the
person is doing who takes the tickets at the door or the hefty men who shift the piano and
the drums or the roadies who set up the instruments and carry out the sound checks or
the cleaners who clean up after everyone else has gone. They, too, are all contributing to the
nature of the event that is a musical performance (Small 1998, 9).
2. This is consistent with the use of “normativist” in the philosophy of law. “Normativism or
the normative theory of legal science represents an attempt to describe (and to rationalize)
the actual practice and thinking of contemporary jurists [in which] jurists in fact typically
provide statements of norms in a deontic language—in a language, that is to say, that is
syntactically indistinguishable from the language used to give expression to the norms
themselves” (Guastini 1998, 317). Normativist legal theory seeks to describe the fundamentally
normative practices of jurists, just as normativist aesthetic theory seeks to describe the fun-
damentally normative practices of aesthetic subjects.
3. As is well known, some of the relevant judgmental norms for Kant are disinterestedness,
universality, and (the representation or apprehension of) purposiveness without a purpose.
4. This may already be obvious. Hume, one may reasonably think, can be equally well
described as making a normative claim about how aesthetic judges ought to behave or,
alternatively, as making a normativist claim about what rules are in fact followed by those
who count as true aesthetic judges. Though the distinction between normative aesthetics
and normativist aesthetics is easy to blur, it proves to be analytically useful for describing
Bloch’s project. And there is a tolerably clear difference between paradigmatic instances
of purely evaluative normative claims, such as Marx’s pronouncement that ancient sculp-
ture remains a perfect aesthetic “standard and model beyond attainment” for modern
artists (quoted in Lifshitz 1973, 89), and purely anthropological normativist claims, such
as Marx’s observation that the earliest Greek statues normatively adhered to “models of
the mathematical construction of the body” and were the products of normative practices
in which “nature was subordinated to reason rather than to the imagination” (quoted in
Lifshitz 1973, 37).
5. I should acknowledge at the outset that this goes against the mystical, theurgic grain of
Bloch’s philosophy. Bloch’s “erratic blocks of hyphenated terminology, luxuriant growths
of pleonastic turns, [and] heaving of dithyrambic breath” (Habermas 1969–1970, 316) are
not often counterpointed by rigorously clear argumentation. Nevertheless, I try to elicit
from Bloch’s writings about music an unambiguous basic commitment and a possible
rationale for it. Without this much, we have no basis for making a principled assessment
of what is living and what is dead in Bloch’s aesthetics.
6. Kellner and O’Hara describe Bloch’s philosophical venture as having a Hegelian-
teleological complexion: “For Bloch history is a struggle against those conditions which
prevent the human being from attaining self-realization in non-alienating, non-alienated
relationships with itself, nature, and other people. Bloch constantly argues that Marxist
theory ought not to forget its telos, which is, as Marx puts it in the 1844 Economic-
Philosophic Manuscripts: ‘the naturalization of man and the humanization of nature’”
(1976, 14–15).
7. For the sake of convenience, I will summarize Walton’s theory as though it were focused
solely on imaginative engagement with artworks, even though he deals with imaginative
engagement with artworks as a special case of engagement with fictional representations
in general, which is itself a special case of make-believe in general.
8. Jankélévitch (2003) is a contemporary champion of this sort of view.
9. The musicological subdiscipline of musical semiotics, part of which involves an attempt to
recover codes of signification (“topics”) that would have been familiar to contemporaneous
audiences of historically remote music, is the major source of recent affirmations.
10. Cf., Kivy:
Unlike random noise or even ordered, periodic sound, music is quasi-syntactical;
and where we have something like syntax, of course, we have one of the necessary
properties of language. That is why music so often gives the strong impression of
being meaningful. But in the long run syntax without semantics must completely
defeat linguistic interpretation. And although musical meaning may exist as a
theory, it does not exist as a reality of listening. (1990, 8–9)
11. See Hullot-Kentor’s translator’s note in Adorno (1998, 273) for a helpful discussion of the
term “Sprachcharakter.” One of the leitmotifs of Adorno’s Aesthetic Theory is the notion
that modernist artworks (of whatever medium) express themselves in “a language remote
from all meaning” (105). This is problematic not just because the analogy with language
becomes strained for nonsemantic artworks that also lack a codifiable syntactical dimen-
sion (such as abstract expressionist paintings), but also because, as we shall see, Adorno’s
denial of musical signification sits uneasily with his musical utopianism.
12. “[T]he lateness of the upper voice, and its dallying quality, the rigidity of the bass’s progression,
the fortuitousness or accidentalness of the D-major triad, the movement to something new,
are in the music. To miss these is, arguably, to fail fully to understand or appreciate the
music” (Walton 2015, 158). Perhaps so, but this insufficient understanding or appreciation
does not appear to disbar someone from counting as listening to music, the way one would
fail to count as reading a novel if one imagined nothing about anything while running
one’s eyes over its words.
13. Walton notes that some of the representational musical imaginings he catalogs “may
strike one as optional, as not mandated especially by the music itself, and so not contribut-
ing to a fictional world of the musical work” (2015, 173).
14. I defend Walton’s view in Parkhurst (2012).
15. According to Daniel (2001), in Adorno’s Aesthetic Theory,
Innerlichkeit (a term Adorno’s translators variously render as “inwardness,” “interi-
ority,” and the “bourgeois interior” and which refers simultaneously to the inner
psychic domain of the bourgeois subject and his actual living space) is described
typically as having been initially a strategy developed by the emergent bourgeoi-
sie for its own self-differentiation and self-definition in the face of a rigidly
imposed external order. A psychic site of refuge constructed to accommodate an
imagined alternative life, the bourgeois interior was fatally flawed, however, in
that it was content merely to look like an alternative to the external order without
really being in any way resistant to it.
16. This list is drawn from Korstvedt’s attempt to describe, in Blochian terms, how Bloch seeks
to simultaneously cancel, appropriate, transcend, and reconfigure emotional listening:
“Bloch imagines a refunctioning of Romantic, bourgeois Innerlichkeit that transforms
subjective space from a place of sentiment, longing, and emotion, even of suppressed
animality, into one that opens onto “an ethics and metaphysics of inwardness, of fraternal
inwardness, of the secrecy disclosed within itself that will be the total sundering of the
word and the dawn of truth over the graves as they dissipate.” He believes that music,
the only “subjective theurgy,” is the way that leads to this mystery, yet he is rather vague,
almost groping in his explanation; he avers that as “the inwardly utopian art,” music “lies
completely beyond anything empirically demonstrable,” but suggests that the sublime
music of deliverance “at the End will not withdraw allegorically back into a home strange
or even forbidden to us; but will accompany us, in some deep way, to the mystery of
utopia” (2010, 122).
17. William Wordsworth’s preface to his and Samuel Taylor Coleridge’s Lyrical Ballads gives
this famous description of what is accomplished by “all good poetry” (Wordsworth and
Coleridge 2008, 175). M. H. Abrams (1971) advances the view that Romantic aesthetics is
to a significant extent unified by its tendency to generalize this description of (or pre-
scription for) poetry and extend it to all the arts.
18. Adorno pursues the idea that Innerlichkeit is a kind of spiritual real estate. Daniel (2001)
helpfully epitomizes Adorno’s view:
The alternative world of interiority is one built ostensibly for self-protection, a
psychic/physical space into which the subject can withdraw for comfort and ref-
uge. This option of withdrawal is clearly a class privilege of the bourgeois, who is
naive and/or arrogant enough to presume that he can create his own exclusive and
private warmth. But it is a privilege indulged at great cost. The alternative world of
interiority can only be inhabited (although “occupied” might be the more accurate
term here) once the subject has renounced a somatic relationship with the world:
the bourgeois interior is thus “museal,” a “still life [in which] the self is over-
whelmed in its own domain by commodities and their historical essence.”
19. Both Bloch’s early expressionist work, The Spirit of Utopia (1918), and the sprawling
presentation of his heterodox Marxist philosophy, The Principle of Hope (three volumes,
1954–1959), deal extensively with music. Bloch (1985) contains the most important musical
discussions from these works.
20. This view of Bloch’s anticipates the treatment of language in Adorno’s Aesthetic Theory.
According to Francesca Vidal:
Music cannot be understood the way language can; it is not interpretable in the
sense that words are. Therefore, Bloch employs the term “call” (Ruf). Music
wants to be heard; this links it with language, but it is understandable otherwise
than language. That the call for an “otherwise than here” is attributed to it
derives from the philosophy of music. That the relationship between philosophy
and music is mutual, and philosophy is not simply interpreting something into
music, is because music itself expresses something of the future, something that
in the openness of its process has to do not only with music itself but with the
world. (Vidal 2003, 173)
21. “[C]lairvoyance is long extinguished. Should not however a clairaudience, a new kind of
seeing from within, be imminent, which, now that the visible world has become too weak
to hold the spirit, will call forth the audible world, the refuge of the light, the primacy of
flaring up instead of the former primacy of seeing, whenever the hour of the language of
music will have come. For this place is still empty, it only echoes obscurely back in meta-
physical contexts. But there will come a time when the sound speaks” (Bloch 2000, 163).
22. The breadth of Bloch’s interests is staggering. Part I of the Principle of Hope examines
“small day dreams”; Part II explores the “anticipatory consciousness” of utopia; Part III
explores “the reflection of wish-images” in advertisements, fashion and design, fairy tales,
travel, circuses, and theater; Part IV explores how “the outlines of a better world” may be
descried in utopian literature, technology, architecture, painting, opera, poetry, philoso-
phy, and recreation; Part V explores the “wish images of the fulfilled moment” that arise
in moral philosophy, music, funereal practices, religion, and communism as humankind’s
summum bonum. Interestingly, Walton’s philosophy of make-believe has also been rightly
lauded for the vastness of the range of cultural products it brings into consideration.
23. Bloch’s conception of musical tones themselves as material bearers of utopian content is
historically and musicologically contextualized in Gallope (2012).
24. “Bloch works overwhelmingly with European and particularly German music, so
much so that he is really offering a Western philosophy of music, in content at least.
The trap is a common one, for the very particular and anomalous history of Western
economics and culture becomes the norm for universalizing ‘the’ philosophy of music”
(Boer 2014, 105).
25. Here Bloch draws on the Christian image and instrument of the fountain as a source
of the “Water of Life” by which the faithful are baptized into immortality. Bloch’s engage-
ment with Christianity, and his willingness to place Marxism in dialogue with Judeo-
Christian theology, have been widely discussed. Marsden 1989 is a good introduction
to this topic.
26. This is truer of Bloch’s aesthetics as a whole than it is of his musical aesthetics, which deals
predominantly with the great works of the Western canon.
27. For a book-length argument to the effect that Adorno’s aesthetics is more utopian than
not, see Boucher (2013).
28. Copious qualifications are in order, given that Adorno sees a crucial difference between
(for instance) the kind of totality exhibited by the music of the heroic period of the
bourgeoisie (Beethoven), by the kind of neoclassical music that apes such music (early
Stravinsky), and the “administered totality” of twelve-tone serialism (Schoenberg after 1921).
For present purposes, I am trying to avoid such complications and cut to the chase.
References
Abrams, M. H. 1971. The Mirror and the Lamp: Romantic Theory and the Critical Tradition.
Oxford: Oxford University Press.
Adorno, T. W. 1993. Music, Language, and Composition. Translated by S. Gillespie. Musical
Quarterly 77 (3): 401–414.
Adorno, T. W. 1998. Aesthetic Theory. Translated by R. Hullot-Kentor. Minneapolis: University
of Minnesota Press.
Adorno, T. W. 2002. Essays on Music. Berkeley, CA: University of California Press.
Aristotle. 1961. Aristotle’s Poetics. Translated by S. H. Butcher. New York: Hill and Wang.
Attali, J. 1985. Noise: The Political Economy of Music. Translated by B. Massumi. Minneapolis:
University of Minnesota Press.
Boer, R. 2014. Theo-Utopian Hearing: Ernst Bloch on Music. In The Dialectics of the Religious
and the Secular, edited by M. R. Ott, 100–133. Leiden: Brill.
Boucher, G. 2013. Adorno Reframed. London: I. B. Tauris.
Bloch, E. 1960. Spuren. Frankfurt: Suhrkamp Verlag.
Bloch, E. 1971. On Karl Marx. New York: Herder and Herder.
Bloch, E. 1972. Atheism in Christianity. Translated by J. T. Swann. New York: Herder.
Bloch, E. 1985. Essays on the Philosophy of Music. Translated by P. Palmer. Cambridge:
Cambridge University Press.
Bloch, E. 1986. The Principle of Hope. 3 vols. Translated by N. Plaice, S. Plaice, and P. Knight.
Cambridge, MA: MIT Press.
Bloch, E. 2000. The Spirit of Utopia. Translated by A. A. Nassar. Stanford, CA: Stanford
University Press.
Chion, M. 1994. Audio-Vision: Sound on Screen. Translated by C. Gorbman. New York:
Columbia University Press.
Daniel, J. O. 2001. Achieving Subjectlessness: Reassessing the Politics of Adorno’s Subject of
Modernity. Cultural Logic 3(1). https://clogic.eserver.org/jamie-owen-daniel-achieving-
subjectlessness.
Gallope, M. 2012. Ernst Bloch’s Utopian Ton of Hope. Contemporary Music Review 31 (5–6):
371–387.
Guastini, R. 1998. Normativism or the Normative Theory of Legal Science: Some
Epistemological Problems. In Normativity and Norms: Critical Perspectives on Kelsenian
Themes, edited by S. L. Paulson and B. Litschewski Paulson, 317–330. New York: Oxford
University Press.
Hegel, G. W. F. 1920. Philosophy of Fine Art. Vol. 3. Translated by F. P. B. Osmaston. London:
G. Bell and Sons.
Sound as Environmental Presence
Toward an Aesthetics of Sonic Atmospheres
Ulrik Schmidt
Introduction
Environmentality
Environmental Imagination
but one rarely has the opportunity to see the sources of most of those sounds . . . This
acousmatic feature is best exemplified by one of the most characteristic sounds of La
Selva: the strikingly loud and harsh song of the cicadas. . . . You hear it with an
astonishing intensity and proximity. Yet, like a persistent paradox, you never see its
source. (López 2004, 86)
This basic acousmatic quality of the sonic environment indicates a fundamental link between
the experience of sonic environmentality and the process of environmental imagination.
Because of the acousmatic curtain, the auditory experience of our environment
corresponds with a cognitive process in which we spontaneously map the cacophony of
sonic environmental effects onto a total image of the environment as a multisensory
whole. This environmental imagination by way of acousmatic environmentality can
take place in two different ways. It can be produced by sonic events that take place as
part of the individual’s actual physical surroundings. Or it can be produced by sonic
events that take place in a virtual space. The virtual production can again either happen
representationally, as is the case in most technical reproductions or simulations (technical
or mental) of actual sonic environments, or it can happen in a nonrepresentational,
synthetic construction of an abstract virtual environment. However, in terms of sonic
environmentality as an acousmatic stimulation of the individual’s environmental
to produce a “sense of presence” [das Spüren von Anwesenheit] (Böhme 2001, 45).
Atmospheres, Böhme notes with explicit inspiration from Heidegger, “seem to fill the
space with a certain tone of feeling like a haze” (1993, 113–114); they evoke a vague sense
of something’s or someone’s environmental “being-here” as a feeling of “indeterminate
and spatially disseminated moods” (2001, 47).
How can we, more precisely, characterize this experience of environmental presence
as a spatially distributed mood, tone, or feeling? As I argue, the sense of atmosphere as
environmental presence is mainly evoked by two interrelated factors. First, it is related
to the production of presence as a site-specific (Kwon 2002) sense of being in a particular
place. As Jürgen Hasse describes it, we perceive atmospheres “as an affective tone of a
place. . . . They communicate something about the distinct qualities of a place in a
perceptible manner, they tune us to its rhythm” (Hasse 2014, 215). This sense of site-
specificity is closely related to what Böhme calls the “ecstasy of things” (1993, 1995, 2001)
by which he understands a certain capacity of individual entities to “go out of them-
selves” in our environmental perception (gr. ekstasis: standing out). In each specific
situation, all individual things—objects, materials, persons, sounds, everything that
makes up the sociomaterial environment—go out of themselves and merge into a
unique assemblage that intimately connects our atmospheric imagination to the place
or site in which it is produced and evoked.
Second, the atmospheric sense of presence is closely related to the environmental
imagination of a particularly human or social quality. According to Heidegger, “mood”
(Stimmung) is the existential capacity of Dasein that continuously and in each particular
moment in time “makes manifest ‘how one is and is coming along’” ([1927] 1996, 127).
Following Böhme’s Heideggerian definition of atmosphere as “spatially disseminated
moods,” atmosphere can, in view of that, be described as the experience of “how a particular
environment is and is coming along.” It discloses the “state” of a place or situation
by investing it with a human-like affectivity and expression of being in a certain mood.
Or as Jürgen Hasse describes it, again with an implicit reference to Heidegger, atmospheres
“let us comprehend without words how something is around us. Therefore, atmospheres
are also indicators of social situations” (2014, 215). Atmospheres “are not things, but
emotions that we are affected by as essences of the world-with-others” (221).
Anthropomorphism is the attribution of human characteristics—human form,
behavior, consciousness, expressivity—to nonhuman things. In view of that, the pro-
duction and experience of atmosphere as spatially disseminated moods can be said to
involve an unmistakable anthropomorphism because of its tendency to infuse our
environmental imagination with a human-like sense of intentionality, expressivity, and
emotion. Anthropomorphism makes the environment act and perform “like a human
being” emerging from the assemblage of relations between materials, things, bodies,
and events dynamically distributed throughout the place—like “a sort of spirit that floats
around,” as Michel Orsoni describes it (quoted in Anderson 2014, 137). An atmosphere
is, in other words, not only essentially social; it is the environmental performance and
imagination of our sociomaterial surroundings as “Other” in the form of an abstract,
quasi-subjective being. Needless to say, this sense of mood, spirit, or intentionality does
not involve the construction of another subject, a particular person, in our imagination.
Atmospheric anthropomorphism remains essentially environmental.
To summarize my argument so far, atmosphere can be described as the affective
production and environmental imagination of a site-specific and anthropomorphic
presence emerging from the material layout of a particular environment. This anthropo-
morphic and site-specific character, it must be stressed, is a unique quality of atmosphere
as basic environmentality. You will find nothing like it in either ecological or ambient
environmentalities, both of which affect the individual by way of essentially nonhuman
properties and effects. Indeed, the very difference between atmosphere and the other
two basic environmentalities with respect to specifically human properties and
imaginations allows us to see how atmosphere is in fact the sole basic environmentality in
and by which the human dimension of our environment—that is, social, subjective,
anthropomorphic—is performed and experienced. Atmosphere, in short, is the per-
formance and imagination of the specifically human relations with our environment.
Sonic Atmospheres
So, if we return to the question of sonic atmosphere, how can we more precisely understand
the sonic production of site-specific and anthropomorphic presence? What is the
specifically atmospheric dimension of our sonic environment? What does a sonic
atmosphere sound like? To explore this question in more detail I will, in the last part of
the chapter, consider two examples taken from two different domains in which the
staging and experience of sonic atmospheres is particularly prevalent: cinematographic
sound design and contemporary sound art. To emphasize the specifically atmospheric
qualities of the sonic environments in question, I will occasionally include accompanying
observations on ambient and ecological environmentalities as well.
the music continually invests the film’s many scenes of the dark and empty outer space
with a mystical abstract presence that hovers over the imagined environment throughout
the film.
However, rather than being a result of the use of music, the cinematographic creation
of sonic atmosphere mainly takes place in relation to the overall sound design of the film
or game in question. As a paradigmatic example of this, consider David Lynch’s and
Alan Splet’s pioneering sound design for Lynch’s Eraserhead (1977). The film’s sound
track itself—later made available in a shorter version as a stand-alone release in its own
right (1982)—is a profound example of the evocation of environmental presence by the
use of sound. Incessantly combining acousmatic site-specific action with environ-
mental sounds of anthropomorphic expression, the soundscape itself becomes a leading
character in the staging of the film’s bizarre and anxious universe.
In order to explore this in more detail, consider the first scene of the film (6:00–11:20)
that comes after a short prologue. Here, we follow the protagonist Henry walking
home from work through an empty industrial landscape, into his building, up the
elevator, and down the hallway to his apartment. Outside his apartment, the woman
next door approaches him with a message from a girl named Mary who called on the
payphone. After a minute’s dialogue, Henry enters his apartment. The whole scene
takes approximately five minutes.
The film’s emphasis on durational time, with little action and narrative progression,
gives room for an affective staging of environmentality and the stimulation of our
environmental imagination. Henry’s appearance, including his conversation with the
neighbor, is awkwardly nervous and tense, and it gives the whole scene an uneasy and
claustrophobic feeling. This feeling of anxiety and tension, however, is not only a product
of Henry’s awkward behavior. It is effectively intensified by the scene’s sound design. In
the whole passage, as is the case throughout the entire film, we constantly hear deep,
droning layers of complex abstract noise. Against this noisy background, an acousmatic
series of inconspicuous individual sound events is heard, coming from particular but
undefined off-screen locations in the environment as an imagined whole. The overall
result is a looming and penetrating feeling of environmental presence.
The sonic environmentality staged in Eraserhead is not exclusively atmospheric but
equally involves a production of ambient and ecological effects. The droning noise, for
instance, envelops the whole scenario in an ambient sensation of being immersed in a
total field of sound. And the layers of individual sounds from disparate ontological
levels might intensify the environmental imagination of a scenario in which all parts are
interconnected and mutually involved with everything else in a nonhierarchical, eco-
logical mesh. Still, however, sonic atmosphere is arguably the most profound of the
three basic sonic environmentalities in Eraserhead. While the main aesthetic function
of the heavy layers of background noise is to give the whole scenario a strong
overall environmental character (ubiquity, consistency), the major role of each indi-
vidual sound event is to stage a particular atmosphere by simultaneously evoking a
strong sense of site-specificity and a feeling of anthropomorphic presence penetrating
the whole scenario.
A short list of the most important individual sound cues that can be heard over the
layer of distant noise during the first scene could read like this:
The individual sounds in the first scene can be categorized into three main groups:
industrial sounds of mechanical or machinic activity (designated with an “a” in the list);
concrete sounds of human bodily actions (b); and sound signals and other sounds with a
strong anthropomorphic, voice-like character (c). These individual sounds from the
different groups, and the way they mix into a continuous sequence of varying intensity,
are the main contributors to the overall atmosphere of the scene as a sense of environ-
mental presence. The sounds of mechanical and bodily action help—in the midst of
chaotic noise—to perceptually consolidate the scenario as a particular place, a physical
location, in which concrete actions take place. And both the specifically human character
of the action (b) and the anthropomorphic sound events (c) further invest the scenario
with a human presence that is not reducible to each single sound but rather stems from
the environment itself as an expressive imaginary whole. The whole environment seems
to be alive, constantly expressing itself and communicating to us about its state of being.
One might want to interpret this expression as a mere sonic representation of Henry’s
mental and emotional condition. But the aesthetic effect is first and foremost nonrepresen-
tational and profoundly environmental. The various sounds persistently perform as
an environmental whole. The combination of noise, site-specific action, and anthropo-
morphic expression affects us by directly stimulating our environmental imagination
and enveloping us in the sense of environmental presence we call atmosphere.
We arrive in a quiet section of the work. All we hear are birds singing quietly among
the trees, occasionally accompanied by the sound of people walking, handling things,
and moving different objects around. A low-pitched drone of electronic sound fades in
to fill the environment for a few minutes, superimposed with speaking voices in a chaotic
mix of unintelligible chatter. The drones and voices disappear and we hear birds again,
now accompanied by the sound of people hammering and knocking on wood.
Occasionally, the knocking sounds join in rhythmic coordination, always on the verge
of becoming a musical practice. Suddenly, a large tree crashes to the ground with a
loud sweeping sound. After a short while a group of people starts to laugh, at first more
discreetly and dispersed, but soon more intensely and in concert. They laugh together
and they laugh at something, although we do not know what it is. The laughing stops
and after a short silence a high-pitched sound emerges, slowly, almost imperceptibly,
and soon we find ourselves immersed in the cacophonic noise of a heavy storm descend-
ing. After the storm has passed, we once again hear the sound of birds and people
walking around, handling different pieces of metal and wooden objects. Occasionally
we can hear the snorting sound of a large animal nearby. After a period of stillness—a
disturbing stillness, as if the whole environment is waiting for something to happen—the
space is pierced by the haunting scream of a girl somewhere in the distance. After a
short while, we hear the metallic sound of cars rolling by, soon followed by marching
feet and droning airplanes above. What sounds like a large wooden wagon is being pulled across the forest; we hear the neighing of horses and the sound of military drums approaching. Suddenly a group of men are shouting aggressively nearby, and we find ourselves in the middle of a sonic battle of gunshots and exploding bombs. The droning of airplanes returns; machine guns and missiles are being fired everywhere. The battle ends in a brief, intense climax. After a short period of penetrating silence, we can hear the beautiful
sound of choir music (Arvo Pärt’s Nunc dimittis [2001]). The music plays for a few
minutes, then the sounds of singing birds and people moving around return once again
and another 28-minute loop begins. (notes translated from Danish by the author)
As this short description of Forest suggests, each sonic event has a very specific aes-
thetic function in the overall production of atmospheric environmentality. They all
help to provoke our environmental imagination by creating a tense experience of
physical action and aroused emotions, acousmatically distributed among the trees to
produce an atmospheric sense of environmental presence. In fact, Forest is a profound
example of the very combination of site-specific action and anthropomorphic affec-
tivity that is the main feature of atmospheric environmentality. Everything we hear
supports the production of a sense of specificity and virtual human presence among
the woods and helps to intensify the overall affective character of the environment as
an imaginary whole.
Apart from the musical intermezzo, the basic sonic means used to create the atmosphere
in Forest are essentially quite similar to the ones in Eraserhead. The sonic material
mainly consists of acousmatic sounds of animals, human bodily action, voices, and
machines combined with occasional sounds of electronic drones and stormy
weather. What distinguishes the production of sonic atmosphere in Forest from that of
Eraserhead is, among other things, a much stronger emphasis on dramaturgical elements.
The narrative action, however, remains somewhat abstract throughout the whole cycle.
Despite the fact that we hear all sonic action in excessive detail, what exactly is taking
place remains obscured, hidden as the action is behind the double acousmatic curtain of
the forest/sound system.
But again, precisely because of this acousmatic abstraction, the sounds stimulate our
environmental imagination all the more forcefully by intensifying our tendency toward
causal listening. We constantly strive to locate the virtual action that is taking place
around us and to figure out “what it is.” In direct contrast to Schaeffer’s hope for a pure
reduced listening in acousmatic space, Forest thus becomes a demonstration of how, in
Luke Windsor’s words, “the acousmatic curtain” does not merely serve “to obscure the sources
of sounds. Indeed, it can be seen to intensify our search for intelligible sources, for likely
causal events” (2000, 31). So, in this process of intensified causal listening effectuated by
Forest’s double acousmatics, we spontaneously merge the disparate events into a multi-
sensory feeling of environmental wholeness that is both abstract and concrete at the
same time. To repeat the initial quote from Deleuze, the feeling of environmental
Conclusion
The aim of this chapter has been twofold. First, the aim has been to explore our affective
relations with the sonic environment on a general level and, second, to analyze this
relationship in a more specific context as the production and imagination of sonic
atmospheres. Atmosphere is understood as the environmental production of a sense of
site-specific and anthropomorphic presence. The two examples considered—Lynch’s
Eraserhead and Forest (for a Thousand Years) by Cardiff and Miller—are from the fields
of cinematographic sound design and contemporary sound art respectively. It would
indeed be possible, though, to expand the perspective and transfer the chapter’s overall
argument to other areas of contemporary auditory culture where the staging and
experience of sonic environmentality is of equal importance. In many computer games,
for instance, not only is the sonic production of environmentality crucial to giving the gameplay a sense of worldly realism, but sound is also very often used intensively to affect our environmental imagination of the game environment with an atmospheric sense of site-specific and anthropomorphic presence much like that found in film (Eraserhead) and sound art installations (Forest). And again, we can find similar tenden-
cies, albeit with quite different means, in our everyday use of background music where
the sonic production of atmospheric presence often plays an important role in the
staging of everyday social interactions. While generally stimulating the basic environ-
mental mode of listening described by Anahid Kassabian as a form of “ubiquitous
listening” (2013), background music is also, on a more specific level, typically used in
everyday life to intensify our experience of being in a particular place or social situation
by evoking a sense of site-specific and anthropomorphic presence. Sound and music are
employed to affectively evoke an environmental feeling of being in a particular place
and a particular mood.
In other words, sonic environmentality and the production of sonic atmosphere
cover a vast and diverse field of aesthetic practice including some of the most important
areas of contemporary auditory culture. With the distinction presented here between
atmosphere, ambience, and ecology as three basic dimensions of our affective relations
with the sonic environment, I have proposed a theoretical framework for a possible
further exploration of it in its different “distinct-obscure” manifestations. Hopefully,
such a framework may inspire other contributions to the future development of what
could become a general aesthetics of sonic environmentality. Still, in this process
we must keep in mind not only the affective and imaginative character of sonic environ-
ments but also how they affect us and stimulate our imagination as environments.
A true aesthetics of our sonic environment is, first and foremost, an aesthetics of sonic
environmentality.
References
Anderson, B. 2014. Encountering Affect: Capacities, Apparatuses, Conditions. Farnham,
UK: Ashgate.
Böhme, G. 1993. Atmosphere as the Fundamental Concept of a New Aesthetics. Thesis
Eleven 36: 113–126.
Böhme, G. 1995. Atmosphäre. Frankfurt am Main: Suhrkamp Verlag.
Böhme, G. 2001. Aisthetik. München: Wilhelm Fink Verlag.
Bregman, A. S. 1990. Auditory Scene Analysis: The Perceptual Organization of Sound.
Cambridge, MA: MIT Press.
Buchanan, B. 2008. Onto-Ethologies. Albany: State University of New York Press.
Deleuze, G. 1988. Spinoza: Practical Philosophy. San Francisco: City Lights Books.
Deleuze, G. 1994. Difference and Repetition. New York, NY: Columbia University Press.
Gendler, T. 2013. Imagination. Stanford Encyclopedia of Philosophy, edited by E. N. Zalta.
http://plato.stanford.edu/archives/fall2013/entries/imagination/. Accessed June 26, 2017.
Gibson, J. J. 1986. The Ecological Approach to Visual Perception. Hillsdale, NJ: Erlbaum.
Guattari, F. 2000. The Three Ecologies. London: Athlone Press.
Hasse, J. 2014. Atmospheres as Expressions of Medial Power. Lebenswelt 4 (1): 214–229.
Heidegger, M. (1927) 1996. Being and Time. Albany: State University of New York Press.
Herzogenrath, B. 2008. An [Un]Likely Alliance: Thinking Environment[s] with Deleuze/
Guattari. Newcastle upon Tyne, UK: Cambridge Scholars.
Herzogenrath, B. 2009. Deleuze/Guattari and Ecology. New York: Palgrave Macmillan.
Kane, B. 2014. Sound Unseen: Acousmatic Sound in Theory and Practice. Oxford: Oxford
University Press.
Kassabian, A. 2013. Ubiquitous Listening: Affect, Attention, and Distributed Subjectivity.
Berkeley: University of California Press.
Kim-Cohen, S. 2013. Against Ambience. New York: Bloomsbury.
Kwon, M. 2002. One Place after Another. Cambridge, MA: MIT Press.
López, F. 1998. Schizophonia vs L’objet Sonore: Soundscapes and Artistic Freedom. eContact 1 (4). http://www.franciscolopez.net/schizo.html. Accessed June 21, 2016.
López, F. 2004. Profound Listening and Environmental Sound Matter. In Audio Culture, edited by C. Cox and D. Warner, 82–87. New York, NY: Continuum.
Lynch, D. 1977. Eraserhead. Libra Films International.
Lynch, D., and A. Splet. 1982. Eraserhead. Original Soundtrack. I.R.S. Records.
McCullough, M. 2013. Ambient Commons: Attention in the Age of Embodied Information.
Cambridge, MA: MIT Press.
Merleau-Ponty, M. (1945) 2005. Phenomenology of Perception. London and New York:
Routledge.
Morton, T. 2007. Ecology without Nature: Rethinking Environmental Aesthetics. Cambridge,
MA: Harvard University Press.
Morton, T. 2010. The Ecological Thought. Cambridge, MA: Harvard University Press.
Schmidt, U. 2013. Det ambiente: Sansning, medialisering, omgivelse [The Ambient: Sensation,
Mediatization, Environment]. Gylling, Denmark: Aarhus University Press.
Schmidt, U. 2015. The Socioaesthetics of Being Surrounded: Ambient Sociality and Movement-
Space. In Socioaesthetics: Ambience—Imaginary, edited by A. Michelsen and F. Tygstrup,
25–39. Leiden: Brill Publishers.
Schmitz, H. 1993. Gefühle als Atmosphären und das affektive Betroffensein von ihnen. In Zur
Philosophie der Gefühle, edited by H. Fink-Eitel and G. Lohmann, 33–56. Frankfurt am
Main: Suhrkamp Verlag.
Schmitz, H. 2014. Atmosphären. München: Verlag Karl Alber.
Schaeffer, P. 1966. Traité des objets musicaux. Paris: Éditions du Seuil.
Schafer, R. M. 1977. The Soundscape: Our Sonic Environment and the Tuning of the World.
Rochester, VT: Destiny Books.
Toop, D. 2004. Haunted Weather. London: Serpent’s Tail.
Uexküll, J. von. 1921. Umwelt und Innenwelt der Tiere. Berlin: Springer.
Uexküll, J. von. (1934) 2010. A Foray into the Worlds of Animals and Humans. Minneapolis:
University of Minnesota Press.
Windsor, L. 2000. Through and around the Acousmatic: The Interpretation of Electro-
Acoustic Sounds. In Music, Electronic Media and Culture, edited by S. Emmerson, 7–35.
Aldershot, UK: Ashgate.
chapter 26
The Aesthetics of Improvisation
Andy Hamilton
Introduction
In contrast to Justin Christensen’s (this volume, chapter 1) entry, this chapter addresses
improvisation in its cultural rather than psychological aspects—its expression as historical
a thing which “exists in a person’s head” and nowhere else is alternatively called an
imaginary thing. The actual making of the tune is therefore alternatively called the
making of an imaginary tune . . . the making of a tune is an instance of imaginative
creation. The same applies to the making of a poem, or a picture, or any other work
of art. (1958, 134)
According to the standard reading of the Croce-Collingwood view, the “total imagina-
tive experience” that constitutes the “work of art proper”—the artwork in a strict rather
than colloquial sense—must be regarded as related only contingently to the physical
artifact. (Collingwood’s position in fact may be subtler than this.)
In the aesthetic as opposed to general psychological case, I believe, this radically
mentalistic conception of imagination is mistaken. Equally mistaken, I believe, is Sartre’s
account of imagination, with which it has affinities. Sartre (2004) argues that a musical
work such as Beethoven’s Seventh Symphony exists neither in time, in the usual sense,
nor space. Rather, it exists in the imagination, in the imaginary, outside the real:
To the extent that I grasp it, the symphony is not there, between those walls, at the
tip of the violin bows. Nor is it “past” as if . . . in the mind of Beethoven. It is entirely
outside the real. It . . . possesses an internal time [which] does not follow another
time that it continues . . . nor is it followed by a time that would come “after” the
finale. [Yet the] Seventh Symphony depends, in its appearance, on the real: that
the conductor does not faint, that a fire breaking out in the hall does not put a sudden
stop to the performance. (191–194)
This Sartrean view of music shows the limitations of metaphysics, I believe—a cloud of
philosophy condensed in a drop of grammar—though this is not the place to offer
a critique.
However, one does not have to follow these writers in espousing a radically mentalistic
conception of both art and imagination, in order to recognize the germ of truth in the
connection between these concepts. The truth that these theories mislocate can be
explained as follows: When a piece of entertainment or craft is described as involving an
imaginative achievement, then it is being claimed as belonging to the realm of art. In
contrast, fantasy—such as the “Game of Thrones” genre—is a staple of low-grade enter-
tainment. Similarly, musical fantasia is composition that panders to a relaxed state of
pleasant, unimaginative connections and themes requiring no concentration or attention,
a pleasing, lazy stream of association that thinks of nothing beyond its own sensations.
This is not to dismiss fantasy entirely from the realms of art. There are works whose content is predominantly fantastical but whose main point of interest is genuinely artistic—such as the last plays of Shakespeare and Keats’s Lamia—and Mozart’s keyboard “Fantasias” include some of his most sublime shorter works. Much entertainment, in contrast, is for
its fans simply fantasy, or “novelty,” to resurrect an eighteenth-century aesthetic category.
Fantasy productions are “imaginative” only in the sense that they involve great insight
into popular taste, thus opening the way to commercial opportunities—an important
and neglected sense of imagination, but one that is nonartistic.
These contrasts apply to performances of improvised music. Here, as in other cases,
answering the questions “Is this art as opposed to mere craft or entertainment?” and
“Does this work involve imagination rather than fantasy?” appeals to the same kinds of
feature. In what follows I demonstrate the artistic status of improvised music, thus
showing its imaginative content.
To reiterate, a humanistic as opposed to abstract account of music sees it as a sounding,
vibrating phenomenon, and a performing art. Abstract or static accounts, in contrast,
are nonparticipant and intellectualist; they regard rhythm statically, as a pattern of
possibly unstressed sounds and silences—as simply order-in-time as opposed to order-
in-movement. Humanists stress music’s essential origins in the human production of
sound and movement, involving a distinctive attack characteristic of traditional musical
means of producing sounds by striking, bowing, or blowing. These means of production,
supplemented in the twentieth century by electronic media, are still essential to the concept
of music. On a humanistic conception, music, dance, and poetry originated together
and are essentially connected. (Chimps may dance or march rhythmically; but for
humanism in my sense, chimps are close enough to human.)
Philosophical humanism affirms the importance of humane understanding against
both scientism and—less common in a more secular intellectual climate—supernaturalist
exceptionalism. Hence the tripartite distinction:
Scientism: the view that the physical or natural sciences constitute the paradigm of
human knowledge, one on which other disciplines must model themselves.
Exceptionalism: the (normally religious) view that “human animal” is a contradiction
in terms, and that human beings are the only biological entity that cannot be grouped
with others on any level.
Philosophical humanism: the view that the explanation of human behavior is irreducibly
personal—that is, it essentially involves what is often termed the intentional stance,
resting on commonsense or “folk” psychology and the attribution of beliefs, desires,
intentions, and similar attitudes to rational agents. Whole-person ascription involving
the intentional stance is the fundamental level of explanation of human behavior.
Subpersonal and neural explanation has a place, but not, as scientism holds, the
ultimate one; so humanism does not amount to exceptionalism as defined earlier.
Humanism is not antiscientific, but antiscientistic—a quite different thing. (This tripartite distinction is developed in Hamilton 2013a, chap. 7.)
This chapter assumes a humanistic recognition of the value of art and the aesthetic to
human well-being. It advocates a normative conception of art and culture that challenges
To say that music is an art is not to say that it is always a high art—that was the point of
describing it as an art at least with a small “a.” Clearly most music is not; but I will argue
that an improvised art music is possible, for instance in modern jazz. Art music is “Art”
with a capital “A”—high art as opposed to mechanical, vernacular, or popular art with a
small “a,” essentially craft or entertainment. The historian of ideas Paul Oskar Kristeller (1951)
famously argued that the modern system of the fine or high arts appeared only in the
eighteenth century:
In [the] broader meaning, the term “Art” comprises above all the five major arts of
painting, sculpture, architecture, music and poetry. These five constitute the irre-
ducible nucleus of the modern system of the arts, on which all writers and thinkers
seem to agree. (497)
On Kristeller’s view, Plato and the Greeks did not think of poetry and drama, music,
painting, sculpture, and architecture as species of the same genus, practiced by “artists”
in the current overarching sense of the term (Kristeller discussed in Hamilton 2007a).
The modern system separated fine art from craft, generating a concept of high art
produced by artists of genius, while leaving great scope for differences between the
individual arts.
Kristeller’s view underlies the modernist consensus. (Artistic modernism is understood here as an intensification of modernity, from the later nineteenth century onward.) But even if one
disagrees with his claim that the Arts—with a capital “A,” the fine or high arts—arose
only in a modern system, aesthetics still needs to explore the very wide divergences
between modern concepts of art and those found in antiquity and in non-Western
cultures. Kristeller’s concern was with Western art, and widely differing models or
systems are found in other cultures—Edo-era Japan, for instance, valued “the Four
Accomplishments” or gentlemanly pursuits of music, games of skill, calligraphy, and
painting (Guth 2010, 11). Such cross-cultural data are essential in addressing the question
“What is art?,” and are more significant than considerations arising from postmodernism
that have tended to preoccupy Western commentators.3
The twenty-first century has no very clear system of the arts—the vogue for stipulating
one did not much outlive the eighteenth century—and there is a vagueness in the under-
standing of our present “system” of the arts, and in its accompanying notion of an “artistic
conception.” Nonetheless, it should be recognized that there is an implicit system, other-
wise practices of arts funding, newspaper reporting, and so on, would be impossible. The
modernist narrative interprets the fine or high arts, with their associated self-conscious
540 andy hamilton
According to this conception, art is autonomous, and its audience has freedom or
autonomy in interpreting it (see Hamilton 2013b).
This freedom is relative because, according to a familiar modernist dialectic, social
and thus aesthetic autonomy arises from, yet is in tension with, capitalist commodification.
In the period from the Renaissance to the later eighteenth century, different artforms
in turn became free of church and aristocratic patronage, as the artist’s work was com-
modified through entry into the capitalist marketplace. This process is found also in
non-Western art, such as that of Edo-era Japan and, indeed, on a smaller scale in art of
many eras (see Hamilton 2009). What is distinctive about post-eighteenth-century
developments, as in the development of capitalism generally, is their scale and ubiquity.
The concept of high art originates in social distinction, but the implied contrast is not
purely social. According to a persuasive modernist narrative, high art, which appeared
differentially across the arts from the Renaissance onward, is autonomous art; to reiterate,
it transcends the practical utility of mechanical arts, and the premodern social functions—
religious, courtly, and military—of art and music before their evolution as high arts.
High art originated as the art patronized by church and aristocracy, with elevated
themes and subjects. But high social location is neither sufficient nor necessary for high
art. Inigo Jones’s courtly masques for James I are regarded by art historians as expensive, frivolous high-class entertainment that wasted the architect’s genius. In contrast, in the
era of modernism, high art embraced low subjects. French realists such as Millet and
Courbet chose humble scenes. While The Gleaners might be imagined as having a high,
biblical theme, impressionism’s urban subjects could not; Caillebotte was criticized for
his working-class The House-Painters and The Floor-Scrapers. High art and art with a small “a” are distinguished by autonomy and not directly by aesthetic value.
High art is not just a social category but is also historically conditioned—fully mani-
fested in Western modernity, but present in earlier times and other places. “High art”
parallels “High Renaissance” or “high modernism”—it refers to the highest or most
exemplary achievement (see Hamilton 2009). (“Fine art” stands in contrast with
mechanical art.) The modernist narrative interprets the fine or high arts, with their asso-
ciated self-conscious artistic conception, as autonomous artforms—independent of each
other, and having lost any defining practical or social function. Art ceased to be a product simply for an occasion and was liberated from direct social function in the service of court, aristocracy, or church. It came to be created not simply to satisfy a patron, but as authentic
artistic expression. The modernist picture is that such a possibility, though perhaps only
remotely realizable, opens up when art enters the marketplace; art becomes potentially
autonomous at the same time as it becomes commodified. For different artforms, this
liberation occurred at different points from the Renaissance to the eighteenth century,
when music, the most backward art in this respect, finally gained its freedom. It is no
coincidence that the concepts of genius and originality, less possible in a craft tradition,
flourished at this time.
In presenting a work of art before the public, the artist is claiming—or hoping—that
it is worthy of their undivided attention and will richly reward it. Thus, at a concert of
contemporary music at the Huddersfield Festival in 2011, I was struck by the way that
programming a concert of art music, in which the audience is meant to be silent and
attentive, implies a demand on them by the artwork—one which, as on this occasion,
might not be justified if the quality of the works were not high. In contrast, muzak in a bar, or
Tafelmusik at an eighteenth-century aristocratic banquet, makes no such claim. Similarly,
a painting in an art gallery makes the claim of art, while a kitsch reproduction at a cheap
furniture store does not. That is one sense of “the claim of art.” It is the claim that an
artwork makes on us, as opposed to the claim involved in calling something an artwork.
Some proponents of high art might argue that improvised music does not justify such
attention—hence its performance in clubs or bars. This criticism is now addressed by
exploring the dialectic of perfectionist and imperfectionist aesthetics.
A humanistic aesthetic rejects the artistic primacy of the musical score, espousing what
I have termed the aesthetics of imperfection. This aesthetics questions the centrality of
the Western art music tradition within philosophical aesthetics and argues, with Ted
Gioia, that despite its formal deficiencies, we are nonetheless interested in the “imper-
fect art” of improvisation. Gioia originates the term “the aesthetics of imperfection,” and
defends it against what he calls “the aesthetics of perfection,” which takes composition
as the paradigm (Gioia 1988).
The aesthetics of perfection emphasizes the timelessness of the work and the author-
ity of the composer and, in its pure form, is Platonist and antihumanistic. In contrast,
the aesthetics of imperfection is more consciously humanistic. It values the event or
process of performance, especially when this involves improvisation—though these
opposites turn out to be dialectically interpenetrating. Thus, the contrast between
composition and improvisation proves more subtle and complex than Gioia and other
writers allow. The focus in this chapter is principally on jazz and related popular music,
but much of the discussion is applicable to other kinds of improvised music.
The opposition between these rival aesthetics sharpened and intensified in
the West during the nineteenth century with the increasing specification and pre-
scription that musical notation placed on performers. The process reached its high
point during the later nineteenth and twentieth centuries, being associated with the
increasing hegemony of the work-concept. An artistic practice that had once involved
improvisational freedom for performers became limited to interpretation of an essentially
fixed work. The dichotomy between improvisation and composition lacked its present
meaning, or perhaps any meaning at all, before this process was well advanced: “By
1800 . . . the notion of extemporization acquired its modern understanding [and] was seen
to stand in strict opposition to ‘composition’ proper” (Goehr 1992, 234). Philosophers have
tended to neglect improvisation as a contrast to composition. In Scruton’s The Aesthetics
Music need not be performed any more than books need to be read aloud, for its
logic is perfectly represented on the printed page; and the performer . . . is totally
unnecessary except as his interpretations make the music understandable to an
audience unfortunate enough not to be able to read it in print.
(Gould in Bazzana 1997, 20–21)
What does spontaneity amount to in improvised performances? And how does it matter
aesthetically? These questions bring us to the heart of the concept of improvisation.
Those who adopt a purely causal account of the concept of improvisation imply that its
presence is of little aesthetic consequence. Thus, Cavell claims that the standard concept
“seems merely to name events which one knows, as matters of historical fact . . . inde-
pendent of anything a critic would have to discover by an analysis or interpretation . . . not
to have been composed.” And Eric Hobsbawm writes: “There is no special merit in
improvisation. . . . For the listener it is musically irrelevant that what he hears is improvised
or written down. If he did not know he could generally not tell the difference.” However, he
continues, “improvisation, or at least a margin of it around even the most ‘written’ jazz
compositions, is rightly cherished, because it stands for the constant living re-creation of
the music, the excitement and inspiration of the players which is communicated to us.”6
The concept of improvisation does have an essential genetic component—a succinct
definition would be “not written down or otherwise fixed in advance.” A purely genetic
account claims that whether a performance is improvised may not be apparent merely
by listening to it, and adds that the mere fact that a performance is improvised is not an
aesthetically or critically relevant feature. The account diagnoses what amounts to an
“intentional fallacy” concerning improvisation—reminiscent of the suggestion that
extraneous knowledge of authorial intention is irrelevant to critical evaluation.
The genetic account exaggerates the extent to which improvisation is undetectable,
however. There is a genuine phenomenon of improvised feel, gestured at by Hobsbawm’s
comments on what improvisation symbolizes. In The Art of Improvisation from 1934,
T. C. Whitmer offered a set of “General Basic Principles,” which included the expression
of an aesthetics of imperfection:
Don’t look forward to a finished and complete entity. The idea must always be kept
in a state of flux. An error may only be an unintentional rightness. Polishing is
not at all the important thing; instead strive for a rough go-ahead energy. Do not
be afraid of being wrong; just be afraid of being uninteresting.
(Whitmer in Bailey 1993, 48)
From this feel arises the distinctive form of melodic lines and voicings in an improvised
performance. Lee Konitz describes a “very obvious energy” in improvisation, which he
believes does not exist in a prepared delivery: “There’s something maybe more tentative
about it, maybe less strong or whatever, that makes it sound like someone is really
reacting to the moment” (Konitz in Hamilton 2007b).
One might say of a purported improvisation “That couldn’t have been improvised”—
meaning for instance that the figuration is too complex or the voicings too clear to be
created under the constraints of an improvised performance. (Perhaps a genius such as
J. S. Bach could do so.) Conversely, an improvised feel might be present in prepared
playing that takes improvisation as its model, or where a composer is looking to create
an improvised effect. The fact that the performance was not improvised might justifiably
alter one’s view of the skill of the performer; but there is a more elusive sense in which it
matters aesthetically. The artistic ideal of spontaneous creation is one factor that sepa-
rates improvised art music from entertainment. The entertainer, in contrast, perfects a
prepared routine and sticks with it, in the knowledge that it works—a “bag of tricks”
model of improvisation. Routines are avoided by the “modernists” who reject the culture
industry—jazz musicians such as Bill Evans, Paul Bley, Lee Konitz, and others who
disdain flashy virtuosity.
There are various senses in which improvisation matters aesthetically, therefore. Even
assuming a viable notion of “extraneous” knowledge, claims of an intentional fallacy are
not vindicated. They are further undermined when one comes to consider the role of
preparation. Cavell and Hobsbawm seem to subscribe to the “instant composition” view
of improvisation. In my criticism of this view I will develop a positive definition of
improvisation in terms of improvised feel. A continuum of composition and improvi-
sation is reflected in the idea of different kinds of preparation for performance.
is precisely intended to keep them from playing what they already know. Thus, there is a
relation between preparation and performance not envisaged by Carter and Boulez—
nor by the polar opposite of their view, the pure spontaneity assumed by a full-blown
aesthetics of imperfection. Mediating the extremes of perfection and imperfection yields
the following picture. Interpreters think about and practice a work with the aim of giving
a faithful representation of it in performance. Improvisers also practice, but with the aim
of being better prepared for spontaneous creation. Many improvisers will formulate
structures and ideas, and, at an unconscious level, this material will provide openings
for a new creation. Thus, there are different ways for a performer to get beyond what they
already do, to avoid repeating themselves. For the improviser, the performance must feel
like a leap into the unknown, and it will be an inspired one when the hours of preparation
connect with the requirements of the moment and help to shape a fresh and compelling
creation. At the time of performance, they must clear their conscious minds of prepared
patterns and simply play. Thus, it makes sense to talk of preparation for the spontaneous
effort. As Lee Konitz puts it, “That’s my way of preparation—to not be prepared. And that
takes a lot of preparation!”8 This is the qualified truth in Busoni’s claim, discussed earlier,
that improvisation is valuable because it is closer to the original idea.
We now consider the relation between the aesthetics of imperfection and the status of
improvised music as art music. In particular, in what sense is jazz an art music? The jazz
historian Scott DeVeaux writes,
The rapid acceptance of bebop as the basic style by an entire generation of musicians
helped pull jazz away from its previous reliance on contemporary popular song,
dance music, and entertainment and toward a new sense of the music as an autono-
mous art.10
Jazz became an autonomous art, one with a fairly capital “A”—a practice involving skill,
with an aesthetic end that richly rewards serious attention. Like Ming vases and Ancient
Greek sculptures, its products are now accepted as (high) art even though its creators
possessed no such concept.
However, many have reservations about describing jazz as an art music; even more so,
about describing it as a classical music. Its products have many of the features of art
music, despite evidently being less contrived than the great works of the Western canon.
Historically, jazz has drawn for its material on ephemeral pop music, whose charms arise from its powers of association for individual listeners—what has
been described dismissively as the “potency of cheap music.” When those materials are
used as they are in jazz, an art of great power can be created. The present situation is
more complex, but jazz still provides a case study of the dialectic between popular and
art music. This dialectic gives rise to central aesthetic questions, much-discussed in
musicology and sociology of music, but whose deeper roots philosophical aesthetics
tends to neglect. My suggestion is that jazz shares some of the features of Western art
music—that apparently unique, autonomous art music that contrasts with nonautono-
mous art musics such as gagaku, courtly gamelan, and Indian art musics.
The claim that jazz is a classical music commonly means:
1. Jazz is a serious art form whose long association with the entertainment industry
is no longer essential—in Adorno’s language, it is an autonomous art.
2. It has arrived at an era of common practice, which is codified and taught in the
academy.
3. It has a near-universality and constitutes an international language, transcending
national and ethnic boundaries.
It might be questioned whether any art music—whether Western art music or jazz—
has feature 3. Western art music is not widely appreciated in India, for example. We are
speaking of near- or relative universality, therefore. Features 1–3 apply only partially to
non-Western art musics such as Korean or Japanese classical music, or courtly gamelan.
I suggested that the latter art musics are nonautonomous, but one could argue that all
of these musics developed from a folk or popular music to an autonomous art music.
However, during the twentieth century, jazz acquired the universal status that was previ-
ously the claim solely of the Western classical tradition. Feature 3 is neither necessary
nor sufficient for a genre to be a classical music. It is not sufficient, because rock and roll,
for instance, has a universality and is an international language, but does not—with limited
exceptions, perhaps including vocational courses—constitute an art music taught in the
academy, and is not as separate from the entertainment industry as jazz is. Nor is it
necessary, because Indian art musics do not constitute a universal language. Ascribing a
“universal status” to Western art music will cause objections from many quarters; it
might be argued, for instance, that jazz involves a break with conceptions of “Western”
and “non-Western.” These difficult and controversial issues clearly require a longer
treatment than is possible here.
Jazz’s academic status is shown by music programs like that at Berklee, which encourage
the idea of jazz improvisation as a craft that can be taught academically. What David
Liebman calls the “apprenticeship system”—young players going on the road with Art
Blakey, Miles Davis, and other leaders—has been replaced by an academic training.11
Another factor in jazz’s classical status is canon-creation—the ready availability on digital
media of the complete recorded history of jazz. Critics have an essential role in creating
and sustaining a canon. As Krin Gabbard writes:
The jazz history we have now really wouldn’t exist without the critics . . . would we
have Ornette Coleman without Martin Williams? There were certain artists who fit
the aesthetic and the predetermined historical notions of critics so perfectly that
they were written into the jazz canon. (2000)
We need to explore in more depth what “classical music” means. It now exists as one half
of a polarity, interdefined with popular music—each concept depends on the other.
(This claim needs to be reconciled with the fact that they did not quite originate
together.) “Classical music” means, in order of decreasing specificity:
1. music conforming to a style-period within Western art music, namely, the First Viennese School of Haydn, Mozart, and Beethoven—music with ideals of balance and proportion, in contrast to Baroque garishness and disproportion.
2. Western art music in general—a sense that appeared together with the developing
contrast with popular music. This is the definition understood by the ordinary
listener, for whom “classical music” denotes a range of music from Baroque or
earlier to the contemporary avant-garde.
3. music that possesses a standard of excellence and formal discipline, belonging to
the canon—the accumulation of art, literature, and humane reflection that has
stood the test of time and place, and established a continuing tradition of refer-
ence and allusion.
It was only from the early twentieth century that classical and popular music began to
be defined as a contrasting pair. Popular music is music directed at the tastes of the
mass of the population. “Popular” is normally defined in terms of scale of activity—for
example, sales of sheet music or recordings. The growing divide between art music and
popular music during the nineteenth century was deepened by Wagnerian opera and
became a rupture with the advent of modernism; for many commentators, modernist
art actively sets itself against popular culture (Sadie and Tyrrell 2004). The most influential
account of the sociology and aesthetics of the classical/popular divide is Adorno’s. He
held that, from the nineteenth century onward, all varieties of music, from folk to avant-
garde classical music, have been subject to mass mediation through the “culture industry,”
a term that implies mechanical reproduction for the masses, rather than production by
them. For Adorno, the divide is not so much between serious and popular music as
such—a division that has become, in his view, increasingly meaningless due to the almost
inescapable commodity character of cultural products in the twentieth century—but
rather between music that accepts its character as commodity, and self-reflective music
that critically opposes this fate, and thus alienates itself from society (Paddison 1982).
One objection to applying the term “classical music” to Western art music is the
apparent implication that it is the unique classical music—which clearly it is not.
However, I will argue that even its unique “abnormality” is now qualified by the appear-
ance of a comparably “abnormal” classical music, jazz.
Does jazz exhibit classical tendencies? Are such tendencies desirable? Factual and nor-
mative dimensions of jazz’s classical status interpenetrate but should be distinguished.
Some see jazz still poised between art and entertainment, close to popular music in the
ordinary sense of the term, and contrasting with Western art music. The jazz trumpeter
Brad Goode, for instance, writes, “most jazz musicians, post be-bop, consider themselves
to be ‘artists’ and consequently only consider the integrity of the music during their
performances,” an attitude he finds inconsistent with making a living. My view is that
jazz can be a classical music, and that exploiting the divide between the classical and
popular (in the mass sense) is one of its distinctive strengths as an art of improvisation.
Setting aside the views of those who deny that jazz could be classical because it is of
little artistic value, there are three main reasons for rejecting the classicizing tendency—
that it makes jazz elitist, or safe, or static. The final objection is the most powerful, but is
also misguided. During the eighteenth and nineteenth centuries, Western art music
entered an era of common practice based on functional harmony and the tonal system
of major and minor keys. Some argue that this era came to an end with the “emancipation
of the dissonance” by Schoenberg and his contemporaries; others hold that—concerning
music in everyday life—it is still with us.
There has been a corresponding period in jazz. Like classical music, jazz also seemingly
reached the limits of avant-gardism, though more rapidly. Conrad Cork (1996) argues
that while the evolution of jazz practice was rapid for about five decades, it became much
reduced after the 1970s, either “because the music has atrophied [or] because it has
arrived at a period of common practice, where it can function on its own terms” (73).
Just as classical tonality returned to fashion in the 1970s and 1980s, however, jazz has
seen a conservative reaction. Others are more critical of the era of common practice,
arguing that classical musics and languages are no longer created actively but are con-
served in conservatories; interpreters study the seminal texts in order to restore them to
life. Thus, Emmett Price writes, “Classical implies static, non-changing; a relic frozen in
time. Jazz has never been static, non-changing or frozen,” while Alex Ross refers to the
“pernicious” implication that jazz “has become ‘classical’ in the pejorative sense: complete,
finished, historical.”12
This negative picture is unduly critical, I believe. Classical music is not the curatorial
exercise that these writers assume, and which the authenticity movement in early music
may appear to imply. Classical musics do not have to be “static, non-changing, frozen.”
As Parakilas argues, rather than resuscitating corpses, the classical repertory keeps
“certain old works . . . ever-popular, ever-present, ever-new. It is an idea founded on rev-
erence for the past, but not necessarily on a modern scholarly conception of history. . . .
[It may not take] notice of historical differences between one work and another within
it,” as proponents of early music do ([1984] 2004, 39).
Whether classical musics are “static, non-changing, frozen” depends on the extent to
which a repertory admits new material. Parakilas comments that such a repertory
need not be kept up-to-date with works from the period just past. The repertory of
Gregorian chant, for instance, was considered closed by the time of the Renaissance,
and performers did not sing the older chants within that repertory differently from
the younger chants, though the repertory as a whole was performed differently
from place to place and from one period to the next. (39)
I have argued that the description “classical” is benign, and that the process of classici-
zation has been a largely beneficial one. Jazz and other improvised musics do not need to
be legitimated in a practical as opposed to philosophical sense. What is in question is
not whether the music has artistic value, but how that value arises. One view is that—in
contrast to Western art music—jazz’s artistic value arises in part at least from its status as
improvised music. This is the assumption of Gioia, who, as we saw, defends the “imperfect
art” of improvisation. On this view, spontaneity implies authenticity, and it makes sense
to talk of preparation for the spontaneous effort—Konitz’s “way of preparation—to not
be prepared.” Konitz has “complete faith” in the spontaneous process (Hamilton 2007b).
A purist version of the aesthetics of imperfection asserts essential differences between
jazz and Western art music. But there are also growing similarities arising from the
developed artistry of jazz, which means that it can be described as an “imperfectionist
art music.”
In jazz, an aesthetics of imperfection, expressed through improvisation, allows pop-
ular materials to achieve art music status. In its early decades, jazz was an offshoot of
the entertainment industry and used its materials. Jazz players later developed loftier
aspirations. As we have seen, some writers distinguish a classical art, which involves restoration, from a living art, which involves novelty and innovation; on their view,
creativity in interpretation of a classic is the limited kind that re-enacts or reanimates.
This is a misguided account of many classical performing arts, I believe. Interpretation is
neither “mechanical reproduction,” as proponents of the aesthetics of imperfection
sometimes view it, nor restoration as in the case of painting or architecture. Of course,
there are different approaches, as there are in the restoration of paintings; but no pristine
authentic performance is possible—the performing arts are inexhaustibly interpretable.
As Parakilas notes, it is the project of the early music tendency, but not that of classical
performers, to reproduce historical Beethoven performances—and even for early music
practitioners, interpretation is inescapable, and usually recognized as such.
It would be wrong to separate sharply “classical arts” and “living arts,” therefore. Against
Parakilas’s assumption that classical and new music are separate practices, they may form
a continuum, thus further undermining the rigid demarcation between classical and
living arts. In performance, the era of common practice endures, both for Western art
music and jazz. These musics aspire to exist in a “common present,” as a living art; classical
exemplars offer inspiration rather than rigid templates. The dialectic between aesthetic
perfectionism and imperfection recurs, therefore (see Hamilton 2007a). Improvisation in
jazz is perfectionist in its affinities with Western art music; while interpretation in Western
art music is imperfectionist in its affinities with improvisation. But improvisation imposes
limits on classical perfectionism in jazz. Recordings such as A Love Supreme or Mingus
Ah Um are rightly described as “classics” since, as recordings, they are fixed in their per-
fection and work without qualification to classicize jazz. Concert recreations of A Love
Supreme reconstruct but cannot replicate the recording.
Jazz’s nature as an improviser’s rather than an interpreter’s art informs its classical
status, because improvisation is an expression of performers’ creativity. In improvisation,
the performer rather than the composer is the primary creator. In interpreted music, the
composer is the primary creator, and the performer is secondary, though still creative.
This fact sets limits to the “classicization” of improvised music, depending on whether
the performer is primarily concerned with exploring the song’s essence, or prioritizes
their own artistic self-expression. In jazz, the superiority of spontaneous creation over
prepared solos began to be stressed at the same time—during the transition from
swing to bebop, as jazz was becoming an art music and therefore “classicized.” That is,
improvisation became valued in jazz as the music was gaining an identity beyond the
realm of entertainment and commercial commodification. This fact lends support to
the suggestion that jazz is an art music of improvisation. And in showing that improvised
performances have artistic depth, to reiterate the argument of an earlier section, I have
shown that they involve imagination as opposed to mere fantasy or fancy.
Acknowledgments
Thanks for comments and discussion go to Gabriele Tommasi, Joanna Demers, Philip Clark,
Conrad Cork, Lee Konitz, Max Paddison, Lara Pearson, Lewis Porter, Brian Marley, David
Udolf, and Jeff Williams.
Notes
1. Conceptual holism is a leitmotif of Hamilton (2013a, see for instance chap. 1).
2. See Guyer, in Baldwin (2003, 728).
3. The contrast between art with a small “a” and with a capital “A,” and the nature of art
before the modern system, is addressed in Hamilton (2007a).
4. The “debate” consisted of Schoenberg writing marginal comments in his copy of Busoni’s
book; subsequent quotations are from Busoni (1962, 84) and Stuckenschmidt (1977,
226–227).
5. The former is the view of Robin Maconie (1990, 150–151).
6. Cavell, “Music Discomposed” (1976, 200); Hobsbawm quote from The Jazz Scene, first
published 1959 under the pseudonym of Francis Newton, quoted in Gottlieb (1997, 813).
7. Boulez (1986, 461); interview with the author, Usher Hall, Edinburgh International
Festival, August 2000.
8. Quoted in Hamilton (2007b); Konitz’s ideas on improvisation are discussed in chapter 6.
9. Stressed by Gunther Schuller in “The Future of Form in Jazz” (1986, 24–25).
10. http://www.oxfordmusiconline.com/grovemusic/view/10.1093/gmo/9781561592630.
001.0001/omo-9781561592630-e-1002248431. Accessed December 17, 2018.
11. Interview in Jazz Review, April/May 2008.
12. Emmett Price, http://www.allaboutjazz.com/php/article.php?id=807. Accessed April 15, 2017; Alex Ross, “Classical View; Talking Some Good, Hard Truths About Music,” New York Times, November 12, 1995, http://query.nytimes.com/gst/fullpage.html?res=9A00E2D61439F931A25752C1A963958260&sec=&pagewanted=2. Accessed April 15, 2017.
References
Bailey, D. 1993. Improvisation: Its Nature and Practice in Music. Cambridge, MA: Da Capo.
Baldwin, T., ed. 2003. The Cambridge History of Philosophy 1870–1945. Cambridge: Cambridge
University Press.
Bazzana, K. 1997. Glenn Gould: The Performer in the Work. New York: Oxford University Press.
Boulez, P. 1986. Orientations. London: Faber.
Busoni, F. 1962. Sketch of a New Aesthetic of Music. In Three Classics in the Aesthetic of Music.
New York: Dover.
Carter, E. 1997. Collected Essays and Lectures, 1937–95. Edited by J. Bernard. Rochester, NY:
University of Rochester Press.
Cavell, S. 1976. Music Discomposed. In Must We Mean What We Say?, 180–212. Cambridge:
Cambridge University Press.
Collingwood, R. 1958. The Principles of Art. Oxford: Oxford University Press.
Cork, C. 1996. Harmony with Lego Bricks. Rev. ed. Leicester, UK: Tadley Ewing Publications.
Davies, S. 2001. Musical Works and Performances. Oxford: Clarendon Press.
Gabbard, K. 2000. Race and Reappropriation: Spike Lee Meets Aaron Copland. American
Music 18 (4): 370–390.
Gioia, T. 1988. The Imperfect Art. Oxford: Oxford University Press.
Goehr, L. 1992. The Imaginary Museum of Musical Works. Oxford: Clarendon.
Gottlieb, R. 1997. Reading Jazz. London: Bloomsbury.
Guth, C. 2010. Art of Edo Japan: The Artist and the City 1615–1868. New Haven, CT: Yale University Press.
Guyer, P. 2003. Aesthetics between the Wars: Art and Liberation. In The Cambridge History of
Philosophy 1870–1945, edited by T. Baldwin, 721–738. Cambridge: Cambridge University Press.
Hamilton, A. 2003. The Art of Recording and the Aesthetics of Perfection. British Journal of
Aesthetics 43 (4): 345–362.
Hamilton, A. 2007a. Aesthetics and Music. London: Continuum.
Hamilton, A. 2007b. Lee Konitz: Conversations on the Art of the Improviser. Ann Arbor:
University of Michigan Press.
Hamilton, A. 2009. Scruton’s Philosophy of Culture: Elitism, Populism, and Classic Art.
British Journal of Aesthetics 49: 389–404.
Hamilton, A. 2013a. The Self in Question: Memory, The Body and Self-Consciousness. London:
Palgrave Macmillan.
Hamilton, A. 2013b. Artistic Truth. In Philosophy and the Arts, edited by A. O’Hear. Cambridge:
Cambridge University Press.
Hamilton, A. Forthcoming. Art and Entertainment. London: Routledge.
Kristeller, P. O. 1951. The Modern System of the Arts. Journal of the History of Ideas 12 (4):
496–527.
Maconie, R. 1990. The Concept of Music. Oxford: Clarendon Press.
Paddison, M. 1982. The Critique Criticised: Adorno and Popular Music. Popular Music 2:
201–218.
Parakilas, J. (1984) 2004. Classical Music as Popular Music. In Popular Music: Critical Concepts
in Media and Cultural Studies, Vol. 2, edited by S. Frith, 36–54. London: Routledge.
Sadie, S., and J. Tyrrell. 2004. Modernism. In New Grove Dictionary of Music and Musicians,
edited by S. Sadie and J. Tyrrell. New York: Oxford University Press.
Sartre, J.-P. 2004. The Imaginary: A Phenomenological Psychology of the Imagination. London:
Routledge.
Schuller, G. 1986. The Future of Form in Jazz. In Musings, 18–25. New York: Oxford University
Press.
Scruton, R. 1997. The Aesthetics of Music. Oxford: Clarendon Press.
Scruton, R. 2015. Art and Imagination: A Study in the Philosophy of Mind. London:
St. Augustine’s Press.
Stuckenschmidt, H. 1977. Arnold Schoenberg: His Life, World and Work. London: John Calder.
Subotnik, R. R. 1991. Developing Variations: Style and Ideology in Western Music. Minneapolis:
University of Minnesota Press.
pa rt V
P O S T H U M A N ISM
chapter 27
Salomé Voegelin
Introduction
This chapter tries to make a contribution to current ideas on materiality, reality, objectivity, and subjectivity as they are articulated in the many texts on New Materialism that have emerged recently under the auspices of speculative realism, object-orientated ontology, complexity theory, and various other current and emerging “subgenres.” These approaches all share a renewed interest in the status and understanding of materiality, material relationships, and the role of the human subject in the context of a contemporary world whose technological and actual globalization demands a new critical engagement and scholarship to grasp the impact, and to articulate the significance, of its fluid interconnectedness. The origin of the term “New Materialism” is invariably located in the mid- to late 1990s, when it is associated chiefly with the writings of Manuel DeLanda and Rosi Braidotti, although whether its project is genuinely new or a continuation of traditional materialism remains debated and debatable. Nevertheless, in current discourse the term acts as a
shared name for different approaches toward the question of materiality and subjectivity
in a digital age. It covers an interest in the relationship between nature and culture,
“naturecultures,”1 and brings with it a critique of an anthropocentric worldview. It is
articulated variously in relation to climate change and its amplification of ecological
consequentiality; it engages with the organization and significance of the global flow of
capital and goods, and gives words to the consideration of a concurrent fluidity or fixity
of persons; it presents new strategies to engage in issues of identity, sexuality, race, and
feminism; and it provides a framework and tools to debate and bring into association all
those issues and dynamics to grasp the world and its material reality not as a stable and
singular construction but as a matter of agency, interdependence, and reciprocity that
impact on its social and political actuality. This chapter is placed in the context of these
theorizations that deal with the relationship between nature and culture, materiality and
subjectivity, and seeks to participate in the current discourse about matter from a sonic
point of view. This sonicomaterialist perspective is motivated by the idea that the invisible
mobility of sound is always already critical of the dualisms of a visuohumanist tradition,
in that it is always and by necessity focused on the in-between of things: their relation-
ship and interbeing. Sound is not “this” or “that” but is the between of them, and thus it
brings with it a conception of the world as a relational field.
To probe this interpretation and try its suggestions, this text focuses on the writing
of Quentin Meillassoux, whose book After Finitude (2009) can be understood as a
central if somewhat eccentric articulation of New Materialist considerations.
Meillassoux critiques an anthropocentric view of the world, attributed by him to the
correlationism of phenomenology and metaphysics in general. In its place, he pro-
motes the mind-independence of mathematics to measure and calculate a world
before and after human experience. Thus, he sets out the possibility of a human-free
conception of the world that eschews what he perceives as, on the one hand, the
“fideism” of phenomenology and, on the other, the absolutizing idealism of transcen-
dental philosophy, which, in any event, he understands to ultimately produce the
same dogmatic conceptions.
In what follows here I engage with his charge of an anthropocentric worldview by
considering the proposition of a posthumanist theorizing through a focus on sound,
creating an invisible imaginary of the material world. The contention is that the sonic
sensibility, articulated in sound practice and discourse, precedes and enables the
concerns of New Materialism. Sound’s ephemeral materiality and invisible relationality
informs the concepts, and grants perceptual access to the ideas discussed currently in
relation to materiality and subjectivity. In this sense, sound and listening establish a
proto–New Materialist sensibility that is present as a minor strand and challenge within
materialist philosophies already, but which only now, in the context of a renewed atten-
tion on agency and interdependence, is able to question its humanist rationality and
dialectical stance. Accordingly, we could consider whether, without the emergence of
sonic practice, discourse and sensibility in art and the humanities, in everyday thought,
and in science, New Materialists would find it harder to conceive of and be understood
in their articulation of “fragile things,” “speculative turns,” and “dark ecologies,” which
are some of the terms and concepts used to theorize a new materialist world. The con-
nection might not be entirely conscious; most theorists writing on materiality today
might never have thought to listen, but it might nevertheless be an important if some-
what subliminal influence: a hidden Zeitgeist, something in the air that has shifted focus
away from the apparent certainty of what we see toward more ephemeral and darker
structures that might well sound, or for whose fragility the material of sound might
serve as metaphor. Thus, I would like to contend that New Materialism presents a
quasisonic consciousness of the invisible, the relational, the dynamic eventness of
things, their predicativeness, and duration.
The aim however is not to prove the superiority of sound as a concept and theoretical
device. Nor is it my intention to produce an essentialized position. Rather, as with
much of my work, the objective is to revisit the nominal and habitual reality of things,
so often set within the boundaries and certainties of a visual language and anchored in
the visual witnessing of the object itself, in order to articulate another possibility of
what there is.
A sonic sensibility invites a different view. It generates a world of fleeting things and
coincidences that demonstrate that nothing can be anchored, and everything remains
fluid and uncertain, not necessarily as precarity, as a state of anxious fragility, but as a
serendipitous collaboration between the multiplicities of the “what is.” Sound, I will
argue, aids the reimagination of material relations and processes. It makes appreciable
other possibilities of how things might be and how things might relate, and serves to
consider positions and positionings of materials, subjects, and objects in a different and
more mobile light.
I will argue however that the fluidity proposed and the relations intimated are not, as
Meillassoux might fear, the fanatical and egocentric imaginings of a correlationist in
search of a de-absolutized world. But neither do I seek shelter in his “mathematical
world” that has expunged humanity from any involvement in the what is. Rather, I
believe that listening as an attitude to the world practices the ambivalence between
measure and experience. And it is in practicing rather than resolving this ambivalence
that we can reach what, at this moment, appears incommensurable, merely possible and
even impossible, to diversify the rationale of logic and reason itself rather than disappear
in a plurality of factions.
A sonic materialism thus presents not an absence of reason in immersive noncriticality
and fanatical egotism. Instead, it foregrounds personal responsibility and participation:
not to deny our being in the world and the world being for us what it is through our
being in it, but to embrace the human ability to think this position as relative rather than
central; to appreciate our responsibility in how the world is: politically, ecologically, and
socially; and to initiate change and a different attitude, rather than withdraw into an
infrastructure of numbers and codes, which, as I will argue, always and unavoidably are
the design of a human-thought world. An auditory imagination does not produce
Meillassoux’s “fideist obscurantism” of a proper truth, his conception that phenomenology
and metaphysics depend on belief and piety instead of truth, thus denying factuality and
reason their singular condition of possibility in favor of unlimited irrationality and
fanaticism;2 and neither does it engage in the “communal solipsism” that he attributes to
them. Instead, a sonic conception and sensibility of the world is the point of access to
pure possibility as actuality.
This chapter will elaborate on these ideas through the practice of listening to three
sound art works: my audition of Toshiya Tsunoda’s Scenery of Decalcomania (2004), an
album of seven tracks, allows me to enter the world by its vibrations and to hear its space
as events and interactions; listening to the sound transmitted through the porous body
in the performance Ventriloqua by Aura Satz (2003), I am initiated into a place of other
voices; and my absorption in the pulsating drip of Anna Raimondo’s rhythmic words in
Mediterraneo (2015) makes me hear the relationship of language, materiality, and
belonging through their fluid boundaries.
In his widely discussed and oft-quoted work After Finitude, which could be described as
an infectiously peculiar cornerstone of New Materialism, Quentin Meillassoux sets out
an argument for ancestrality, the measure and articulation of a world anterior to humanity,
in order to achieve the principles of a human-free conception of the world. The materi-
als and events of such an anteriority he calls “arche-fossils,” and he wants them to be
understood not simply as present traces of the past, but as indicative of a logic and rea-
son able to grasp the anterior without a present human experience. The aim throughout
the book is to generate the condition of this nonhuman ancestrality, to be able to reach
beyond ourselves into a space devoid of ourselves that might ultimately shed light not
only on what was but also establish an understanding of the “what is” without the
specter of human perception.
His anteriority encapsulates an ulteriority too, and together they generate a concep-
tual space beyond finitude, whose content, material, and organization is experientially
inaccessible. This inaccessibility gives cause and justification to his critique of corre-
lationism and leads him to propose the mathematizing of nature: to establish the sta-
bility of its laws as “a mind-independent fact . . . that is indifferent to our existence”
(Meillassoux 2009, 127) and thus capable of making accessible a world without us
through speculation that excludes metaphysics and thus excludes the human point of
view and finitude.
His argument for an after finitude begins with a critique of the strong correlationism
of phenomenology and other metaphysical philosophies, which he understands to occur
as a counter to the absolutism of transcendental idealism and to result in equally
dogmatic fanaticisms. While he appears to agree with the need to critique transcendental
universalisms, and the dogma of the absolute, he is looking for another solution based
on facticity and the contingency of facticity: on the fact that the world “is there,” rather
than on my own contingency in a world that “is there for me.”3
In relation to this, strong correlationism presents itself as the dogma of a contingent
perception that does not appreciate that things might be otherwise than they appear to me.
In other words, it appears to leave no room for speculation: for a speculative materialism
that can gain access to the anterior, the nonhuman world without making it “wholly
other.” Meillassoux seeks to overcome this problem with metaphysics by promoting
decorrelation through data and numbers. Leaving aside for now the question of science’s
truth and objectivity, its supposed nonanthropocentrism, and whether it does indeed
represent the mind-independent facts on which his thesis hinges, the problem of correlation
and the device of ancestrality are intriguing and useful for developing sound’s
contribution to New Materialism.
Meillassoux’s After Finitude lends inspiration to the aim of a non-human-centered
conception of a sonic world. His ancestrality offers a conceptual space to the methodology
of a sonic discourse, allowing us to reflect on the nature of sound behind and in front of
our lives, and thus enabling us to contribute from the invisible mobility of its materiality
to the conception of a posthumanist world.4
Arche-Sonic Vibrations
through the specificity of my encounter I appreciate the fluid and unstable reality of my
contingency as one of many, none of which are “wholly other,” and all of which are
simultaneous, each as real and each as possible as the other. As I walk around the room, I
appreciate the plurality of the work: at each point, another vibration comes to the fore
while all others remain in play. Thus, I come to physically comprehend the simultaneous
plurality of the real, and rather than reduce its vibration to a set of numbers in order to
discount my physicality on the way to a plural but factional scenery, I hear a heteroge-
neous environment.
Tsunoda’s work reminds me of my existence in an ancestral texture of sound, whose
appearance, however, is not fossilized but moves on inexhaustibly. The “after finitude” of
a sonic sensibility does not present a certain finished form; it does not present “the
material support” for the investigation of ancestral phenomena—the geological for-
mation, the fossil imprint, the density of coal, and the rings of a tree—and it does not rely
on the possibility of a pure mathematics of nature “to demonstrate the integrity of an
objective reality that exists independently of us—a domain of primary (mathematically
measurable) qualities purged of any merely sensory, subject-dependent secondary
qualities” (Hallward 2011, 140) such as smell, sound, and touch. But while, as Peter
Hallward continues, the thing measured is indifferent to it being measured or what it is
measured as, the idea of measuring is absolutely subject-dependent.
The arche-fossil presents a reduction and deformation of the thing into its measure
that is akin to the reduction of Tsunoda’s sounds in the closed-offness of the headphone
or the absorption and deadening of sound in the acoustic isolation of the anechoic
chamber. Without the reverberation of sound within its environment, as concept and
actuality of material connection and exteriority, the vibrational thing does not expand
into its formless capacity but deforms into the condition of its measurement. And while
this conjecture might shed light on a world without human experience, since it is still
calculated from a human point of view, through the subject-dependent idea of measur-
ing, it does not enable access to a nonhuman world. The assumption would be that a
world without humans is a world without experience and possibilities unless they are
strictly speculative; the contingency of facticity rather than of the material itself. By
contrast, the ancestrality of sonic vibration is the phenomenon of its material, which is
infinite; it sounds now as an arche-sonic that brings me to the consciousness of a before
and after through my equal participation in its present texture.
In the texture of the world as a vibration-environment, possibilities do not negate
each other, causing plurality as dissent and factionality, which inevitably leads back
to strong and contested territories and identities. Instead, they trigger nonselective
connections and serendipitous collaborations between invisible things whose tex-
tures show me my responsibility and instill the humility of my own reflection.
Vibrations are the ground on which communication and communality are sought
rather than found. In this point, my motivation for a sonic materialism answers
William E. Connolly’s invitation to “respond to the charge of anthropocentrism in
order to fold more modesty into some traditional European modes of theism and
humanism alike” (Connolly 2013, 400).
Porous Bodies
The ancestrality of sonic vibration, its inexhaustible texture that sounds as an arche-sonic,
has not only the capacity to make accessible an anterior or ulterior world, to effect in me
the consciousness of a before and after terrestrial life. It also makes accessible an over
there and another place: an “extra-terrestrial” life, alien forms, and unknown things. In
other words, sonic vibrations, the arche-sonic, call into the realm of the possible also
the impossible: that which for physiological, ideological, aesthetic, sociopolitical, and
economic reasons we cannot or do not want to hear. The possibility of its sound is,
however, central to a materialist critique of a human-centered rationality.
relation to sonic materialism and the idea of an egalitarian sonic-texture. Satz, and each
of her subsequent pregnant stand-ins, are not the performers of the work; they are its
conduit; they are a social conduit and vessel for another voice rather than the contingent
formlessness of their own particular articulation.
The pregnant form reclines on the chaise longue; with one hand she holds on to the
antenna of a Theremin placed on a tripod next to her. Holding on to it, her body becomes
the extension of the instrument as another antenna. In this way, she opens up her own
sonic range to the Theremin that, in turn, is calibrated on her body. This calibration is
unstable and needs constant retuning as the human body presents an inefficient conduit
in the sense that it is not finely tunable but brings its own disturbances to the performance.
This inefficiency demonstrates the fluctuating and mobile capacity of physicality and
indicates the illusion of pure mediumship: the channeling of another voice, a separate
spirit, or of ancestral data, without the impact of the medium itself. It puts into doubt the
sustainability of mind-independent facts, and stresses the ambivalent relationship
between the voice and the unvoiced, between the present and the absent, which are not
absolute but ideological and applied.
The body as Theremin is controlled by the performer in close proximity but without
physical contact. In this first enactment of Ventriloqua, the Thereminist Anna Piva plays
the body by moving her hands just above the skin protruding through the sequined gap
in the costume. Her hands move through pronounced physical gestures, producing the
visual shapes that play invisible oscillations and amplitudes. The electric signals thus
generated are sent to an amplifier and emitted via loudspeakers as modulating tones and
surging vibrations that issue from the skin into the auditorium.
The atmosphere is séance-like: the room is darkened and a single light illuminates the
protruding white globe of skin as it is made to sing. This turn into darkness carries an
occult undertone that pervades the work’s performance. The voice of the unborn, as alien
spirit, is called into the room through an act of mediumship. Its inaudible voice is seem-
ingly channeled through the Theremin-body and made to speak pre-birth.
There is the potential that the making audible of the unheard, rather than pursuing a
posthumanist equality of materiality and an inclusive politics of the voice, steps into the
mystical and fanatical that Meillassoux ascribes to strong correlationism and that
Theodor Adorno fears in relation to astrology and the occult. The parallel is intriguing.
Both correlationism and the occult are responses to a philosophical rationality of
absolutes that leaves no room for faith, for contingency and self, and yet, according to
both Meillassoux and Adorno, each in turn ends in its own dogmatic obscurantism.
Adorno, in his text The Stars Down to Earth, writes against mysticism and the occult as
the cornerstones and antecedents to fascism and totalitarian governance. Focusing on
the pervasiveness of astrology through a study of daily columns of “Astrological
Forecasts” by Carroll Righter in the Los Angeles Times, Adorno produces his “Theses
against Occultism,” in which he argues that monotheism is decomposing into a second
mythology that separates the spirit from the body, the material experience of the world, and
that critiques materialism while seeking to “weigh the astral body” (Adorno, 2004, 177).
In other words, his theses, developed over nine key observations, ridicule the occult
as a “metaphysics for dunces” that draws its rationality from the irrationality of a fourth
dimension, a nonbeing that claims to answer all the questions about the material world.
His critique remains serious, however, since he fears:
direct and unaffected conduit to astral and mathematical systems, free of human design
and intervention, is indeed possible, and that what we measure and hear are the unfettered
computations of ancestrality and the true voices of the spirit world.
In response to this we need to take care that materialism does not result in ventriloquism:
the speaking for something/somebody else through a human-designed channel of
spirituality or calculation masquerading as mind- and body-independent fact, a process
that equates to a hyperanthropocentrism hiding in a mythical or mathematical under-
growth. Instead, we need to remind ourselves of Connolly’s call for more modesty about
our status in the world in relation to the “traditional European modes of theism and
humanism” to grasp responsibility and pursue a different relationship between the
voiced and the unvoiced.
Within this objective, ventriloquism becomes a useful device and metaphor to
conjure the other not as a separate other, neither a spirit nor a measurable quantity, but
as a voice that sounds simultaneously but is not heard: an extrasocial rather than an
extraterrestrial, whose sound thickens the perceived reality of the world through the
actualization of its impossibility. In this sense, listening to the ventriloquist sharpens
our sensibility and care, and fosters a practice of listening-out for the unheard or the
overheard to draw the inaudible as another possibility from the impossible into the
simultaneous plurality of the actual.
Ventriloqua, as a listening-out for the unheard materialities of this world, defines a
useful attitude to material relations as well as toward notions of presence and absence
understood not as dialectical absolutes but as the possibilities that can, and the possibilities
that cannot, make themselves count in the actual world. As one defined inaudible voice
is sounded through the Theremin, that of the unborn child, we are reminded of other
voices, historical and present, which have not been heard. And as the threshold of
possibility becomes porous, impossible things start to present themselves in the
sonic-vibrations of the actual world.
Political Textures
Sonic vibrations, the arche-sonic texture of the world, reveal seemingly impossible
modulations that are not reducible to the volume of past sounds or the spirits of other-
worldly voices, but demand they be heard within this world. And thus, within the texture
of this world, appears that which we cannot or do not want to hear and which demands
to be heard, to make itself count as a slice of the real. This forceful appearance of impossible
things in the midst of our actual world challenges the notion of difference and distance:
two terms and values that are at the center of the humanist project that seeks to know the
world through the rationality of differentiation and the ability to read its relationships as
the distance between objects. Listening-out for inaudible things, as a sonic-materialist
attitude, by contrast seeks to understand the impossible through its proximity to my
own impossibility. In sound, we do not meet as difference or similarity, but negotiate
who we are in a meeting that is primary, before definition, again and again, seeking
invisible and tentative recognitions of what we might be in the practical equivalence of
its texture.
Anna Raimondo’s work Mediterraneo, from 2015, brings us to the vibrations of the
unheard that texture a current sociopolitical reality but lack their own articulation. Her
voice, repeating over and over again the word “Mediterraneo,” takes us to the center of
the liquid expanse that is not simply between Africa, the Middle East, and Europe, a
mere connecting and separating passage, but is the material and metaphor of their
relationship as a deep and treacherous “what is.” Listening hears not one against the
other or their separation, but hears the in-between, the relationship, as the material of
the continents’ contingent facticity.
Listening to her voice, I suspend my belief in what I know to be on either side. I find a
focus not in their distance and what that denotes, but hear in the materiality of their
primary relationship other possibilities of what they could be.
In sound, the Mediterranean is the crossing not the crossed. It is not the infrastructure
of connecting and separating, a bridge between continents that enables us to cross while
at the same time maintaining the distance that exists in the first place, determining
either side through the actuality of what it is not. Rather it is a volume, a material
inhabited in listening, whose traveling within is not about my purpose or provenance,
and it is not about my sameness and their otherness: the real actuality of this continent
and the apparent impossibility of that, but about the possibility of the water’s own
expanse and how time and space define things together.
The crossing enables simultaneity. It performs the intertwining of the self with the
world, and of the continents of the world with each other. These continents are not
absolute territories but are expansions of each other whose impossible meeting points
sound in the middle of the sea. At the same time, this self is not a positive or a negative
identity, and neither is it an anthropocentric definition, but an uncertain and contingent
subjectivity, constituted in an inhabiting practice of perception that is crossing
boundaries not to measure and name but to engage in their watery depth to understand
the defining lines through the self ’s coinciding with them, rather than dispassionately
and from afar.
Distance creates the distortion of dis-illusions, which promises resolution once we step
closer. By contrast, the simultaneity of inhabiting creates the dis-illusions of plural
possibilities that are not resolved into one singular and actual real—war, fighting, right,
or wrong—but that practice the inexhaustible ambivalence between measurement and
experience: what something is as numbers and what it appears to be in perception, so
that we might understand and respond with engaged and practical doubt to what seems
incommensurable from ashore.
On a bleached-out white background we see a glass slowly, drip by drip, filling with a
blue liquid that, as the poet Paul Claudel would say, has a certain blue of the sea that is so
blue that only blood would be more red. And as the sound of dripping water slowly fills
the glass, Raimondo’s voice catches her breath, accelerates, slows down and stutters,
speeds up again, and repeats and repeats “Mediterraneo” until her voice is drowned in
the water she has conjured with her own words. Until then, on the unsteady rhythm of
her voice, we are pulled through the emotions of fear, excitement, hope, and death that
define the Mediterranean as the liquid material that is “the between” of Africa, the
Middle East, and Europe today, and whose material consequence does not stop at the
coastline but offers us the texture to hear its vibration and to understand how we are
bound up with it.
Raimondo’s work brings us into the urgency of the situation through the focus on the
materiality of the sea as the common texture of the adjoining continents rather than
through the confrontations of their different shores. The repetitive mantra of her voice
entreats me to enter into the water in order to—from within the fluid materiality—
understand physically the complexity of its fabric, form, and agency: of what it weaves
together formlessly rather than what it is as a certain form; and in order to suspend what
I think I know of it and pluralize what it might be as the invisible organization of different
things: salt, water, waves, holidays, routes of escape, yachts, aquatic life, sand, handmade
dinghies, dreams, and desperation. Listening, I am persuaded to understand these
things in their consequential and intersubjective relationships: what they sound
together as sonic things and what thus they make me hear.
Sound creates a vibrational-texture of the processes of the world that I hear coex-
tensively and to which I am bound through my own sound. By contrast, a soundless ocean
pretends the possibility of distance and dissociation, to be apart as mute objects and to
be defined by this distance. The absence of sound cuts the link to any cause and masks
the connection to any consequences. Thus, a mute Mediterranean enables my withdrawal
from the sociopolitical and ecological circumstance of its waves and permits the
rejection of my responsibility in its unfolding.
Raimondo composes, from the hypnotic rhythm of her voice and the steady dripping
of blue water, the political reality of the Mediterranean. Slowly submerging, with her
words, into the deep blue sea, I abandon my reading of its terrain within the rationale
and reason of existing maps and come to hear its texture as woven of unresolved
material and positions. I do not follow its outline but produce a dark and mobile geography
of the Mediterranean as a formless shape, whose possibilities and impossibilities
undulate to create a fluid place that defies calculation but calls forth an attitude of
listening-out to understand where things are at and to take responsibility within that
invisible factuality: within this dark and mobile geography, we hear, as Connolly suggests
we should, “the human subject as formation and erase it as a ground” (Connolly 2013, 400).
In the watery depth of Mediterraneo, humanity appears as formless form that has lost
the access to its grounding in the traditions of knowledge and established canons of
thought, in political certainties and journalistic judiciousness, as well as in relation to
historical and geographical identities. Instead, the rhythmic drip, drip, drip, and the
reiteration of its name call for another ground, a groundless ground of invisible pro-
cesses based on the responsibility of a practice-based subjectivity that appreciates the
consequentiality and intersubjectivity of things without controlling them.
Having been transported into the middle of the sea by Raimondo’s audiovisual work,
we can hear the world as the vibrational-texture that binds us all and everything into an
ecosystem of invisible processes. This does not mean that some do not have more power
than others. Simultaneity does not prevent hierarchies. Instead, the simultaneity of
the sonic-texture makes visible the interdependencies of power, organization, self-
organization, and control, and provides an opportunity to revisit economic and political
values that depend on the divides and distances established in a humanist philosophy
and perpetuated in the ecology of the visual. A sonic reality emerges not from maps and
words but from the fluidity of blue liquid and the drowning of the voice. And as the
fluidity gives access to a groundless world, a world without a priori reason and rationality,
the drowning words do not fade but re-emerge in the plurality of the inaudible.
The posthumanist impetus of sonic materialism does not expunge the human but
shakes the ground he stands on to make himself taller. This is not so that no ground can
be established but rather so that the grounding can become practice-based, contingent,
and plural, based not on mind-body-independent speculation but on the suspension of
habits and the beginning of doubt, including doubt in the normative habits of
a singular authorship.
Conclusion
the material through their own “disturbances” that manipulate and distort the others’
voices and construct a hyperanthropocentric ventriloquism that fails to see the impact
of its measure on the heard. Consequently, a sonic materialism does not pretend to be
able to speak for the other; it does not ventriloquize but instead calls for an attitude of
“listening-out for,” a stance of care and humility that hears the possible and the impossible
in the vibrational texture of the world. This texture interweaves the voiced and the
unvoiced as reciprocal and simultaneous things that are not hierarchical but speak of the
hierarchies of the world.
The aim is to hear a plurality of authorships and acknowledge the self-authoring of
nature and of material that we can translate carefully, as Tsunoda does in his vibration
recordings, to make them accessible and thinkable, always in the knowledge, however,
that there are no mind-body-independent facts but that our body and mind will always
diffuse and influence what it is we hear.
In this sense, sonic materialism is a phenomenological materialism, which is not a
contradiction but an acknowledgment of the subject as thing thinging amid other things
and an articulation not of its control over the material world but of its responsibility
within it. Materialism is thus a relationalism, not of different things but of things
together. The material is not an entity but is the vibrational texture that things create
simultaneously through the “equal differences”7 produced in their encounter with each
other rather than beforehand.
I comprehend the anterior and ulterior, as well as the extrasocial, not as human exclusive
domains of numbers and spirits but through my position in the flow of their vibrational
texture. The arche-sonic weave of this texture holds the possibility of the before and after
as well as of the over there. It produces the concept of my finitude not as an absolute but
as an element of its infinitude that is accessible to me through the continuous processes
of reciprocation and generation of material relations within which I exist as a thing
among other things.
Phenomenological ancestrality is the before and after accessed through the inex-
haustible formlessness of a present sound that I inhabit in intersubjective contiguity.
The mathematical ancestral and the spiritual astral by contrast rely on distance and
absence to assure and assert their measurement of the real. In this sense, they are entirely
visual concepts: they overcome, mathematically and through mediumship, a temporal
or spatial distance in order to know and sense a place or a thing that is nominally
without them; and while this might make the other talk and the ancestral yield its
measure, their voice and computation are channeled through the distance needed for their
reach in the first place. This distance is at the basis of a visual materialism that seeks to
omit the human but keeps the gap and difference between things that serve human
articulation, measurement, and thought.
A sonic materialism does not start from this distance but from within the texture of
the world, which includes me simultaneously as a thing in the weave of things.
Interwoven in its flow, I understand the contingency of my position not as absolute, as a
position for me, but as a matter of the facticity of the world, which thus becomes accessible
to me as a proximity where the measure is not between things, or between me and the
world, but is the relationship that we form. Thus, there is no need to overcome a distance
in order to understand the mobility of the world. There is no sonic sublime that shapes
the conceptual ground of articulation and propels perception toward idealism. There is,
instead, embedded doubt, the suspension of habits and norms, which produces a
groundlessness that encourages not just a plurality of voices but a plurality of rationales
and reasons that hear and value their speech. I practice this plurality on the ambivalence
between measurement and experience, producing a complex sociopolitical texture from
arche-sonic weaves that bind me into my responsibility within its inexhaustible flow.
Where New Materialists theorize as speculation, I practice in doubt; and where they
are in search of the infinite, the anterior and ulterior condition of thought and existence,
I focus on the inexhaustible nature of sound that exists permanently in an expanded and
formless now that I inhabit in a present that continues before and after me.
In short, sonic materialism builds, on the groundlessness of an auditory imagination,
the critical attitude of a “listening out for” rather than an occult dream. And while I do
not share Meillassoux’s mathematical speculation, I share in his desire for a philosophical
position of infinity that serves to acknowledge that there is “more” than we can see and
experience. And I take this more to be the start rather than the conclusion of our
appreciation and participation in the material world.
Raimondo’s piece makes us aware that the world entered via such a listening attitude,
as sonic sensibility and Zeitgeist, is rather darker and deeper than first imagined. The
sonic is not self-certainly benign, peaceful, egalitarian, and just. Instead, it reveals the
conspiracies of the visual world and probes the political expediency of class systems,
dividing and ruling, in a sea of blue.
Notes
1. The term “naturecultures” was coined by Donna Haraway in The Companion Species
Manifesto (2003). It expresses a reciprocal and nondialectical entanglement of nature and
culture, body and mind, and so forth, and proposes a rethinking of the broader modernist
ideology represented in these dualisms.
2. Meillassoux justifies and contextualizes his turning away from philosophical thought
toward mathematical speculation by explaining:
it would be absurd to accuse all correlationists of religious fanaticism, just as it
would be absurd to accuse all metaphysicians of ideological dogmatism. But it is
clear to what extent the fundamental decisions that underlie metaphysics invariably
reappear, albeit in caricatural form, in ideologies, and to what extent the funda-
mental decisions that underlie obscurantist belief may find support in the decisions
of strong correlationism. (Meillassoux 2009, 49)
He further states “that thought under the pressure of correlationism, has relinquished its
right to criticize the irrational” (45) and that, paradoxically, a philosophy, phenomenology,
which sought to critique the absolutism and dogmatism of transcendence “has been
transformed into a renewed argument for blind faith” (49). It is for this reason that, instead of
seeking insights into a post- and prehuman world via philosophy, he employs the mind-
independent sphere of calculation and measurement to argue its “proper” truth.
3. In the course of his book Meillassoux develops facticity, the pure possibility of what there
is, into the notion of factiality, understood as the speculative essence of facticity: the fact
that what there is cannot be thought of as a fact but is a matter of nondogmatic speculation,
a speculation that he ultimately pursues via mathematics.
4. The notion of posthumanism here does not refer to a world without humans, but to the
project for a different scholarship and sensibility, initiating a different philosophy that
does not simply continue the humanist path of an anthropocentric rationality and reason
by denying the hyper nominal subjectivity of philosophical tradition while perpetuating it
through the authorship of that very denial, but by considering a decentered human
subjectivity that lives not at the center of the world but is centered by it, aware of its
responsibilities, and humbled in its equivalence with other things.
This posthumanism acknowledges that the human at the center of humanism is not
every human, but a clearly demarcated and privileged identity: a tautologically privi-
leged subjectivity based at the center of humans’ own discourses that places them supreme
in the nominal understanding of the world that their very philosophy creates. Instead, the
aim is to contribute to the conception of possible philosophies whose objectivities and
subjectivities are plural but not factional and that are aware of the inevitable exclusion of
one point of view by another and are thus engaged in philosophy as a field of blind spots
that are practiced rather than theorized.
5. This interpretation of ancestrality as a visual consciousness does not outline a sonic essen-
tialism, and neither does it represent a critique of visuality. This text does not pitch visuality,
vision, or a visual literacy against sonicality, hearing, and a sonic literacy. Rather, the
critique of the visual as it is implied here is not a critique of its object, what we see, but of
its practice, the way we look and what we look for understood as cultural and ideological
practices.
The suggestion is that the ancestral, as it is staged and used by Meillassoux, relies on
narrow channels of vision that deny much of what else could be seen. In response, this
chapter promotes a sonic sensibility and engagement with the material world that do not
aim at a blind understanding of its processes but augment the way we see the world.
6. Maurice Merleau-Ponty calls perceptual dis-illusions the probable realities of a first
appearance: “I thought I saw on the sands a piece of wood polished by the sea, and it was a
clayey rock” (Merleau-Ponty 1968, 41). To him the appearance of the piece of wood is not an
illusion, but a dis-illusion: the loss of one evidence for another. Accordingly, perceptions are
mutable and probable, “only an opinion”; but what is not opinion, what each perception,
even if false, verifies, is the belongingness of each experience to the same world, their
equal power to manifest it, as possibilities of the same world.
7. The notion of “equal difference” is articulated in my book Listening to Noise and Silence
via the equal significance of Sergej Eisenstein’s monistic ensemble of film montage, and
clarified, via Jean-François Lyotard’s agonistic play, as a nonhierarchical playful con-
flict of the sensorial material (Voegelin 2010, 141). Here, it is further developed as the
coextensive simultaneity of the material experienced and measured in a togetherness
that does not ignore difference but understands and generates it in perception rather
than takes it as a given.
References
Adorno, T. W. 2004. The Stars Down to Earth. London and New York: Routledge.
Connolly, W. E. 2013. The “New Materialism” and the Fragility of Things. Millennium: Journal
of International Studies 41 (3): 399–412.
Hallward, P. 2011. Anything Is Possible: A Reading of Quentin Meillassoux’s After Finitude. In
The Speculative Turn, edited by L. Bryant, N. Srnicek, and G. Harman, 130–141. Melbourne:
re.press.
Haraway, D. J. 2003. The Companion Species Manifesto: Dogs, People and Significant Otherness.
Chicago: University of Chicago Press.
Meillassoux, Q. 2009. After Finitude. New York: Continuum.
Merleau-Ponty, M. 1968. The Visible and the Invisible. Evanston, IL: Northwestern University
Press.
Raimondo, A. 2015. Mediterraneo. Audio-visual installation.
Satz, A. 2003. Ventriloqua. Performance.
Tsunoda, T. 2004. Scenery of Decalcomania. Album with liner notes. Australia: Naturestrip NS3003.
Voegelin, S. 2010. Listening to Noise and Silence. New York: Continuum.
chapter 28
Imagining the Seamless Cyborg
Computer System Sounds as Embodying Technologies
Daniël Ploeger
Introduction
When I first started Microsoft Windows 10, I felt something was missing. Or rather,
I heard something was missing. There was no startup sound. Since I first used Windows
about twenty years ago, there had always been a short sound sequence that welcomed
me at the start of a computer session. Now, the only thing I heard when the desktop
came on was a short and inconspicuous “prrt.” Why did the startup sound disappear?
Most people in the Western world and beyond will be familiar with the startup chime
of an Apple computer, the Windows error sound, and plenty of other operating system
(OS) sounds. However, despite the wide cultural reach of these sounds, studies of com-
puter sound have mainly been concerned with sound synthesis for musical purposes or
the simulation of human speech. Relatively little research has been done into the design
and use of sound as part of computer OSs (Gaver 1986; Blattner et al. 1989; Alberts 2000;
DeWitt and Bresin 2007), and, as far as I am aware, there are no studies that are dedi-
cated to OS sounds from a cultural critical perspective. In this chapter, I discuss the
development of the role of sound in the operation of computers from the mid-twentieth
century until the present, and contextualize this in relation to broader cultural per-
spectives on computer systems as cybernetic extensions of the user’s body. Building
on this contextualization, I will explore how common computer system sounds might
facilitate particular imaginations about the nature of technological extensions of human
bodies. In what ways do computer sounds affect the ways in which users imagine the
relationship between their bodies and their computers? And how can the design of
Early computers in the 1940s, such as the Harvard Mark I, were built with electric relays,
which meant that computational processes were audible because of the clicking of the
relay switches. Listening to these sounds, computer operators could often detect errors
or operation irregularities through variations in familiar patterns. For example, Philips
engineer Nico de Troye recalls that:
The [Harvard] Mark I made a lot of noise. It was soon discovered that every
problem that ran through the machine had its own rhythm. Deviations from this
rhythm were an indication that something was wrong and maintenance needed to
be carried out. (De Troye quoted in Alberts 2000, 43, my translation)
However, once computers were built with radio tubes or transistors instead of
mechanical relays, they operated in silence. With machines like the
ARMAC, MIRACLE, UNIVAC I, and IBM 650, errors and problems could not be
heard anymore. At the same time, until the late 1960s, visual monitors could only display
very limited amounts of data so, despite some rows of small lights and a crude cathode
ray tube display, the input and output data—usually on paper tape—had now become
the only detailed computing information directly accessible to the computer operator.
Apart from a simple hoot that could be triggered at designated points in a program,
there was no longer a possibility to aurally monitor operations during the computing
process. Interviews conducted by the historian of science Gerard Alberts with Dutch
engineers who had operated early computers during the 1950s and 1960s indicate that
engineers regretted this loss of aural cues. They responded by connecting a loudspeaker
to the electronic circuits inside these computers and thus made the processing patterns
audible once more through what could be called an “auditive monitor” (Alberts 2000, 2).
Some of the engineers were still able to sing the patterns of particular operations when
Alberts interviewed them four decades later.
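The idea of such an auditive monitor can be loosely illustrated with a small, hypothetical sketch: the code below (Python, standard library only) attaches a short click to every step of a simple computation, a naive primality test chosen purely for illustration, and writes the resulting rhythm to a sound file. The program, pitches, and durations are my assumptions; the engineers Alberts describes tapped the machine's circuits directly rather than instrumenting software.

```python
# A loose, present-day analogue of the "auditive monitor": attach a click to each
# step of a running computation so that its rhythm becomes audible. The sonified
# program (trial division for primality) and all pitch/timing choices are
# assumptions made only for this sketch.
import math
import struct
import wave

SAMPLE_RATE = 22050
frames = []

def click(pitch_hz, dur_s=0.02):
    """Append one short decaying click to the output buffer."""
    n = int(SAMPLE_RATE * dur_s)
    for i in range(n):
        env = 1.0 - i / n
        frames.append(0.5 * env * math.sin(2 * math.pi * pitch_hz * i / SAMPLE_RATE))
    frames.extend([0.0] * int(SAMPLE_RATE * 0.01))  # short silence between steps

# The computation whose rhythm we want to hear.
for number in range(2, 60):
    for divisor in range(2, int(number ** 0.5) + 1):
        click(400.0)        # low click for every division tried
        if number % divisor == 0:
            break
    else:
        click(1200.0)       # higher click each time a prime is found

with wave.open("auditive_monitor.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(SAMPLE_RATE)
    w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in frames))
```

Played back, a regular input produces a regular rhythm, and a deviation in the pattern would be audible before it is visible, which is roughly the experience the interviewed engineers describe.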
Thus, the role of sound in the operation of these early computer systems appears to
reflect a more widespread listening culture around industrial noises. In her research
on sound in industrial work places, the cultural historian Karin Bijsterveld (2006)
discusses how the motivations behind factory workers’ frequent resistance to the use of
ear protection from their large-scale introduction in the middle of the twentieth century
until—in some cases—the present day suggest that the aural perception of the patterns
to further optimize the user interface. Blattner and colleagues proposed an approach
to auditory icons that builds on an analysis of visual icons. Distinguishing between
“representational” (e.g., the Mac OS trash can), “abstract” (e.g., Adobe Creative Suite
icons), and semi-abstract icons (e.g., the Windows icon), they proposed to design
auditory icons based on the principle of “iconic families.” Sounds with shared elements
would convey to a user that they are related to the same group of functions. Thus, a
combination of recognizable representational elements with interlinked abstract aspects
could facilitate an easy-to-learn network of auditory communication as part of the
computer user interface.
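To make the principle of iconic families concrete, the following minimal sketch (Python, standard library only) generates a hypothetical family of three earcons that share the same rhythmic and melodic motif and differ only in register, so that a listener might hear them as belonging to the same group of functions. The function names, frequencies, and durations are illustrative assumptions, not part of Blattner and colleagues' 1989 design.

```python
# Hypothetical sketch of an "iconic family": members share a motif (rhythm and
# contour) and differ only in a family-specific transposition.
import math
import struct
import wave

SAMPLE_RATE = 44100

def tone(freq_hz, dur_s, amp=0.4):
    """One sine-wave note as a list of float samples."""
    n = int(SAMPLE_RATE * dur_s)
    return [amp * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def family_motif(base_freq):
    """Shared three-note motif; a family member is identified by its register."""
    samples = []
    for ratio, dur in [(1.0, 0.12), (1.25, 0.12), (1.5, 0.25)]:  # same contour for every member
        samples += tone(base_freq * ratio, dur)
        samples += [0.0] * int(SAMPLE_RATE * 0.03)               # short gap between notes
    return samples

def write_wav(path, samples):
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))

# Invented "file operations" family: the motif stays the same, only the register changes.
for name, base in {"file_open": 440.0, "file_save": 523.25, "file_delete": 659.25}.items():
    write_wav(f"{name}.wav", family_motif(base))
```

Played back to back, the three files share an audible contour, which is the family resemblance the approach relies on.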
While in the 1980s the interest in auditory icons had been focused on efficiently
conveying information about the system’s operations in easily understandable audi-
tory forms, the 1990s saw the emergence of a different interest in system sound. In Joel
Beckerman’s book, The Sonic Boom (2014), Jim Reekes, the designer who created the
current Mac startup sound and many other Mac OS sounds, reports how in the late 1980s
he struggled to convince his superiors to replace ill-considered Mac sounds and to start
approaching sound as a form of “audio branding” (Jackson 2003): what affective response
will a sound evoke in relation to broader associations with elements of culture or nature?
Until the implementation of Reekes’s design for the current startup sound, Apple
computers used to play a tritone interval when switched on. In Western music history,
this interval has often been associated with negative feelings and, from medieval times
until the eighteenth century, it was commonly designated as the Devil’s interval.
Curiously, this aspect of the sound seemed never to have been considered by the sys-
tem designers, who—according to Reekes—thought sound design to be of little impor-
tance. Reekes eventually managed (more or less secretly) to replace the tritone sound
with the current chime, which consists of two major chords that pan slightly between
left and right on a stereo speaker setup. Originally in C Major, it has been transposed
several times, but otherwise it has remained the same since its inception. Reekes’s
objective was to create a “meditative sound” that would act as a “palate cleanser for the
ears” (Reekes in Beckerman 2014, 12). Users in the 1990s heard the startup sound at the
beginning of every computer session and after system crashes, which occurred fre-
quently. Consequently, the startup sound was an important factor in users’ experiences
of brand identity.
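As a hedged illustration of the kind of sound described here, not a reconstruction of Reekes's actual design, the following sketch (Python, standard library only) renders two sustained major chords to a stereo file with a slight left-right pan between them; the chord choice, durations, and pan amounts are assumptions made only for this example.

```python
# Illustrative two-chord stereo "chime" sketch; all musical choices are assumed.
import math
import struct
import wave

SAMPLE_RATE = 44100

def chord(freqs, dur_s, pan):
    """Render one chord as (left, right) sample pairs; pan runs from -1 (left) to 1 (right)."""
    n = int(SAMPLE_RATE * dur_s)
    left_gain, right_gain = 0.5 * (1.0 - pan), 0.5 * (1.0 + pan)
    out = []
    for i in range(n):
        s = sum(math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f in freqs) / len(freqs)
        s *= 0.5 * (1.0 - i / n)  # simple linear fade-out
        out.append((s * left_gain, s * right_gain))
    return out

# Two major chords (here C major then F major, purely illustrative), panned slightly apart.
frames = chord([261.63, 329.63, 392.00], 1.0, pan=-0.2) + chord([349.23, 440.00, 523.25], 1.5, pan=0.2)

with wave.open("chime_sketch.wav", "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(SAMPLE_RATE)
    w.writeframes(b"".join(
        struct.pack("<hh", int(l * 32767), int(r * 32767)) for l, r in frames
    ))
```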
Eventually, the relevance of careful sound design and affective audio branding as part
of the development of OSs was acknowledged on a wider scale by software and hardware
companies. This is apparent from Microsoft’s decision to hire the musician and sound
artist Brian Eno to compose the startup sound for Windows 95. According to Eno, the
commissioning brief he received included about 150 adjectives: “The piece of music
should be inspirational, sexy, driving, provocative, nostalgic, sentimental . . . and not
more than 3.8 seconds long” (Eno in Cox 2015, 271–272). The design of OS startup
sounds, as well as signal sounds throughout the system, had now become a priority
in developers’ corporate branding strategies (for more on audio or sonic branding, see
Gustafsson, volume 1, chapter 18).
These reflections on corporate interests in OS sound design since the 1990s suggest that
there is an affective and potentially embodied dimension to users’ experiences of these
sounds. Reekes speaks about a “palate cleanser for the ears” and the adjectives referred
to by Eno obliquely refer to an incentive to establish a relationship between the user and
the computer (or the Microsoft Corporation) that goes well beyond a cognitive and
instrumental interaction into a more affective realm. Indeed, the media theorist
Deborah Lupton (1995), in “The Embodied Computer/User,” gives an account of com-
puting in the mid-1990s that confirms exactly this connection between OS sounds and
affect. She starts with a short personal anecdote about her own computer:
When I turn on my personal computer . . . it makes a little sound. This little sound
I sometimes playfully interpret as a cheerful “Good morning” greeting . . . the sound
helps to prepare me emotionally and physically for the working day ahead. (97)
Notably, the sound she is referring to here is most probably the rather crude fanfare
sound, which was included in the Windows OS before the introduction of Brian Eno’s
startup sound in late 1995, just after Lupton was writing.
Brian Massumi defines affect as “a prepersonal intensity corresponding to the
passage from one experiential state of the body to another” (Massumi in Deleuze and
Guattari 1987, xvii). The application of sound plays an important role in the shaping
of affective responses in a broad range of cultural activities, ranging from marketing
(Bruner 1990) to activism (Thompson and Biddle 2013) and warfare (Goodman 2009).
Although long unconsidered by system developers, users’ affective responses to OS
sounds have shaped the experience of their interactions and connections with the
machines since the early days. This is also clear from Alberts’s reflections on the role of
the amplified processing sounds in early radio tube and transistor-based computers.
Before these machines were introduced, computing had been a manual operation,
which was accompanied by sounds of people working: historically on paper, using rela-
tively simple calculating objects, later aided by mechanical calculators. The relay-based
computer did calculations automatically, but it generated a reassuring sound that was
similar to what had previously emerged from the manual mechanical calculators on the
work floor. The accounts of the engineers interviewed by Alberts suggest that the loud-
speaker attached to the subsequent “silent” computers did not just act as a monitoring
device to check whether the computer was still operating correctly. The loudspeaker
sounds also provided a sense of comfort; they facilitated a “sensory restoration of the
relationship with physical calculation” (Alberts 2000, 45).
Indeed, more recent research into the design of sound in human–computer interaction
has investigated the potential of sound to facilitate affective user relationships with data
inside the system. Anna deWitt and Roberto Bresin, in their article “Sound Design for
Affective Interaction” from 2007, suggest the use of physical models of real-world
sounds to represent elements of virtual worlds. For example, they propose to sonically
represent the arrival of mobile phone text messages with the sound of marbles falling
into a metal box. More important messages would sound like heavier marbles, and by
shaking the phone the user could determine how many messages have arrived based on
the sound of a related number of marbles moving around. Thus, they argue that the
design of system operating sounds may be a way to “narrow the gap between the embod-
ied experience of the world that we experience in reality and the virtual experience that
we have when we interact with machines” (deWitt and Bresin 2007, 525).
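A minimal sketch of the kind of parameter mapping DeWitt and Bresin describe might look as follows; the specific constants and the function name are assumptions made for illustration and are not taken from their article.

```python
# Illustrative mapping in the spirit of the "marbles in a box" proposal: each unread
# message becomes one impact, and more important messages map to heavier marbles
# (lower pitch, longer decay, louder strike). All constants are assumed for this sketch.
from dataclasses import dataclass

@dataclass
class MarbleImpact:
    pitch_hz: float   # heavier marbles ring lower
    decay_s: float    # and take longer to settle
    gain: float       # and strike the box harder

def marble_for(importance: float) -> MarbleImpact:
    """Map a message's importance in [0, 1] to physical-model parameters for one impact."""
    importance = max(0.0, min(1.0, importance))
    return MarbleImpact(
        pitch_hz=2000.0 - 1400.0 * importance,  # 2 kHz (light marble) down to 600 Hz (heavy)
        decay_s=0.05 + 0.25 * importance,
        gain=0.3 + 0.6 * importance,
    )

# Shaking the phone would trigger one impact per unread message:
inbox_importance = [0.2, 0.9, 0.5]  # three unread messages
impacts = [marble_for(i) for i in inbox_importance]
print(f"{len(impacts)} marbles rattle in the box:", impacts)
```

The design choice is simply that perceived physical weight stands in for importance, so that the number and heaviness of the "marbles" can be heard rather than read.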
Everyday Cyborgs
In the following, I will further examine the role of OS sounds in the embodied experience
of human–computer interaction. However, my interest is not in determining effective
methods for information transmission and the potential to forge a seamless transition
between embodied experiences of the physical world and the data that exist inside
computer systems, as is the case in the research of DeWitt and Bresin and the work of
Gaver and Blattner and colleagues in the 1980s. Instead, I will focus on how the OS
sounds discussed thus far might relate to broader cultural representations and under-
standings of human bodies and technology, particularly in the light of popular cultural
imaginations of the cyborg.
Before I continue, we should take a closer look at embodied experiences of human–
computer interaction in a broader sense. In “The Embodied Computer/User,” referred
to earlier, Lupton discusses embodied computer user experiences. It is not surprising
that this text was written in the mid-1990s. This was the time when digital technology
and especially personal computers had become omnipresent in professional and private
life in the Global North. Lupton describes how, by the early 1990s, many people in
Western societies had come to feel dependent on digital technologies in their everyday
lives. A power cut at a research unit she visited left staff wondering what they should do
while their computers could not be accessed. As a consequence of this far-reaching inte-
gration of computers (and other digital technologies) in everyday life—which has only
become stronger today—people also tend to have an emotional relationship with their
computers; they commonly experience fear, anger, frustration, and relief as part of their
interactions with them. In her analysis of this phenomenon, Lupton builds on the femi-
nist scholar Elizabeth Grosz’s argument that inanimate objects that have been in close
contact with the body for extended periods of time become experienced as extensions of
the body image. According to Grosz, “[i]t is only insofar as the object ceases to remain
an object and becomes a medium, a vehicle for impressions and expression, that it can
be used as an instrument or tool.” Thus, in interaction with the body, an inanimate object
can become an “intermediate” or “midway between inanimate and the bodily” (Grosz in
Lupton 1995, 98–99). Drawing on this, Lupton suggests that, by the mid-1990s, instead
of the “human/computer dyad being a simple matter of self versus other, a blurring of
the boundaries between embodied self and the PC” (Lupton 1995, 98) has taken place
for many people.
If we consider the interactions between users and personal computers (or mobile
devices) from a cybernetic perspective, we arrive at a similar interpretation. In his expla-
nation of cybernetic networks, Gregory Bateson (1972) gives the example of the stick of a
blind man. He argues that this object should—from a cybernetic perspective—be
considered as part of the man’s body, because it constitutes a pathway for information
exchange between the man and the world around him. If we think of Lupton’s account of
the despair caused by the power cut in her university in the 1990s, or the discomfort
(or even anxiety) many people experience nowadays when they are unable to connect to
social networks due to a depleted smartphone battery, it is clear that Bateson’s argument
is also applicable in this context: while users may consciously perceive their computers
as external objects with which they interact, in terms of their communicative
interactions with the world around them these devices fulfill the role of cybernetic extensions
of their bodies.
Accordingly, we can consider everyday human–computer interactions in the context
of the concept of the cyborg: a cybernetic organism. The term “cyborg” was coined by the scientists Manfred E. Clynes and Nathan S. Kline in their article “Cyborgs
and Space” (1960), and further explored in Daniel S. Halacy’s Cyborg: Evolution of the
Superman (1965). Inspired by recent developments in space travel, Clynes and Kline
suggest that it is time for “man to take an active part in his own biological evolution”
(1960, 26), through the attachment of technological extensions to human bodies, in
order to prepare for living in extraterrestrial environments. Likewise, Halacy promotes
the technological extension of bodies in order to enhance their strength and capabilities.
In these visions, technological development is considered a neutral force that can be
instrumentalized as desired.
Since the mid-1980s, critiques of this technodeterminist approach to the concept of
the cyborg have emerged. Donald MacKenzie and Judy Wajcman’s anthology The Social
Shaping of Technology (1985) examines how technological developments are shaped
by—and complicit in the persistence of—existing sociopolitical paradigms. In this con-
text, Donna Haraway’s “Cyborg Manifesto” (1991) acknowledges that the image of the
cyborg has its origin in the military-industrial complex, but that it can also be employed
to challenge hegemonic divisions of gender; if the body’s parts and characteristics are
thought of as (theoretically) exchangeable for technological substitutes, this means that
traditional thinking in gender oppositions tied to a biological body becomes impossible.
Thus, for Haraway, the cyborg is an image of “a creature in a post-gender world” (1991, 150),
which allows us to move away from the binary thinking that underlies the distribution
of power in what Haraway calls “White Capitalist Patriarchy” (161).
However, despite these critiques and emancipatory visions for the cyborg, the
positivist ideology of the military-industrial complex of enhancement and strength has
remained a mainstay in imagined and realized cyborgs in popular culture and art until
the present day. Fictional characters in films and TV programs from the 1960s until the
present, like the Six Million Dollar Man, RoboCop (Verhoeven 1987), and the android in Ex Machina (Garland 2015), are consistent with the idea of enhancement through implantation and attachment of state-of-the-art technologies. Similarly, artwork and writing by artists including Stelarc (1991), Neil Harbisson, and Moon Ribas have focused on promoting the idea that the human body can be made more capable through the integration of hi-tech components.
As mentioned already, Lupton suggests that once interactions with computers and
other digital technologies have become thoroughly integrated in everyday life, a
“blurring of the boundaries” between the devices and the embodied self occurs.
However, this development is not a smooth process. It takes place through a negotiation
of antagonistic emotions toward computers. On the one hand, users are indeed “attracted towards the . . . opportunity to achieve a cyborgian seamlessness.” At the same time, however, they often “feel threatened by [the technology’s] potential to engulf the self” (1995, 111), a threat bound up with a loss of agency due to the lack of (perceived) individual control over data in the system.
Here, it is important to acknowledge that the computer users Lupton discusses were
generally nonspecialists—although they often used computers intensively in everyday
life—for whom the devices very much remained a “black box”; they would usually have had little understanding of the inner workings of the computer (Latour 1999). This mysteriousness of the computer system is arguably also one of the sources of the fears and discomfort concerning the perceived threat of losing control and agency when becoming dependent on cybernetic systems.
When we listen to Reekes’s account of the creation of the startup sound in this context,1 or hear other sounds he designed (e.g., “quack” and “Sosumi”), it is significant that the sounds he created and selected combine recordings of acoustic sounds with synthetically generated ones, and that many of them seem to sit somewhere in between the acoustic and the synthetic (the startup chime does not sound entirely synthetic, yet it is also hard to tell what acoustic sound sources might be involved). We hear a similar pattern in the Windows system sounds of the mid-1990s and early 2000s: while the sounds “Recycle” and “Ring” are clearly recognizable as recordings of the crumpling of a piece of paper and a ringing desk phone, “Notify” and the sounds that mark infrared connections may be more readily associated with the soundscape of a sci-fi film.2
Considering this combination of the acoustic and synthetic in the choice of OS sounds
in both Mac OS and Windows in relation to Lupton’s examination of the ambiguous
relationship of computer users of the 1990s with their devices, it appears that the sonic
environments of the OSs functioned as a means to partly negotiate this tension; on the
one hand, they promote a smooth, sci-fi-like aesthetic to evoke a sense of unproblematic
and clean computing power (this is perhaps most prominent in the different versions of
the Windows startup sound since the mid-1990s) while, on the other hand, the inclusion
of sounds that evoke elements of the organic world outside the device provides a sense of
comfort, mitigating fears of loss of agency that are due to dependency on a “black box.”
Quite differently, discomfort arising from dependency on a black box is unlikely to
have been a big issue for the engineers working with early computers. In the early days
of computing, operators were usually mathematicians and engineers with an in-depth
knowledge of the system, while the systems themselves were still of limited complexity, which made it possible for an individual to have a fairly comprehensive understanding of their processes. In other words, whereas for the (predominantly nonspecialist) computer users of the 1990s a sonification of a computer system’s internal operations would be likely to add to the opacity of those operations and thus heighten a sense of alienation and potential threat, the sonified system sounds early engineers listened to reassured them that the machine was operating as intended and made it possible for them to relate to the system in terms of human actions (the previously manual operations of mechanical calculator operators).
Since the 1990s, OS sounds have continued to develop. Listening to Microsoft Windows, for example, several changes stand out. As I mentioned in the introduction of this chapter, since version 10, released in 2015, Windows no longer features a notable startup sound. Another development that becomes apparent on closer listening is the gradual disappearance of the organic-sounding elements that had been a prominent feature of Mac OS and Windows alike since the 1990s. In Windows 10, the only apparently organic sound that remains is “Recycle” (the sound of crumpling paper mentioned earlier). All other sounds have gradually become smoother and more evocative of digital synthesis. The ring tones no longer resemble those of traditional desk phones. The sense of the synthetic is further heightened by the conspicuous increase in digitally generated reverb added to the various sounds over the years.
Microsoft’s response to queries about its motivation for removing the startup sound
gives us a hint as to how developments in the sonic interface may be related to broader
issues around the (desired) experience of human–computer interaction:
Thus, OS sounds are conceived to facilitate a user experience in which the device is
no longer perceived as present. The device should become an unnoticed attribute that’s
“all about you.” In other words, the soundscape should facilitate the “cyborgian seamlessness” Lupton wrote about in the 1990s. Apparently, there is no longer a felt need to put users at ease by evoking a sense of the organic around the technological black box they are connecting with. Instead, the technological device as a whole should
be backgrounded.
This “quieting of the system” is reminiscent of what Mark Weiser (1996) coined “calm technology” as part of his theory of “ubiquitous computing.” In the late 1980s and
early 1990s, Weiser observed that personal computers, despite their widespread use by
nonspecialists, were still often experienced as specialist devices, the operation of which
involved focused and concentrated activity. In contrast, the much older information
technology of reading and writing is present in all areas of everyday life and is per-
formed with a much lower degree of conscious attention; writing is a “ubiquitous tech-
nology.” Weiser argued that, once computers become truly omnipresent in all kinds of
forms, and each person operates a number of different devices, we will arrive in the era
of—the positivist concept of the cyborg as a strengthened and enhanced human body in
which technological prostheses are politically neutral and form increasingly seamless
connections with the organic human body. Thus, rather than merely reflecting a
technocultural status quo, OS sounds also facilitate the user’s imagination of a particular
kind of connection between bodies and technologies.
Although the vision of technologically enhanced bodies may appear attractive, it also
has some problematic implications. First, the popular vision of the cyborg suggests a
universal notion of progress, which omits engagement with the inequalities of gender,
race, and social class that continue to play a role in the politics of bodies (Haraway 1991).
As long as such technologies are not equally available to everybody, the introduction of seamless and inconspicuous technological extensions to the body, which are therefore likely to be taken for granted, easily becomes a process of hiding inequality. Second, endeavors to make
interaction with technologies imperceptible promote a disregard for the materiality
of technological components in terms of the expenditure of resources, production labor, and
ecological impact of waste (Ploeger 2016). The ever-increasing speed of replacement of
everyday electronic commodities generates a growing stream of electronic waste. In most
cases, this waste is eventually exported to developing countries where it is often recy-
cled through environmentally harmful methods or dumped in unprotected areas, caus-
ing severe environmental damage accompanied by a range of sociocultural problems
(Chan and Wong 2013).
Thus, instead of making human–computer interaction as inconspicuous as possible—
and thus promoting the imagination of a seamless cyborgian prosthesis—a conscious
experience of the user interface might be desirable in order to facilitate an engagement
with the device’s embeddedness in existing sociopolitical power structures, both con-
cerning persistent inequalities in access to technology, and the ecological and social
consequences of technology’s materiality. In other words, instead of stimulating the
imagination of smooth and powerful technologically enhanced bodies that are in line
with the interests of “militarism and patriarchal capitalism” (Haraway 1991, 151), a user
environment that is less “seamless” could facilitate a critical awareness of technological devices’ development and cultural embeddedness. Considering OS sounds from this angle, are there any opportunities to reconnect to the materiality of the device amid what sounds like an ever-further smoothing and quieting of the system
soundscapes? Where are the—metaphorical and literal—cracks in the developers’
attempts to create a comfortable and seamless sonic interaction?
A 2007 blog post written by a member of the Windows developer team discussing
sound “glitching issues” in the new Windows Vista OS offers a possible answer. Defining
a sound glitch as “a perceivable error, gap, or pop in the sound caused by discontinuities
in the audio signal during playback or recording which result from processing or timing
problems,”3 the author draws attention to the fact that audio glitches are more perceptible
than irregularities in video “because the ear’s tuned to notice high frequency transients.”
Along similar lines, the sound ecologist Michael Stocker (2013) suggests that the human body
is “hardwired” to be alerted by subtle changes in sound inputs. Sonic irregularities trigger
a sense of alert and thus break through a sense of smooth and unconscious interaction;
the illusion of the seamless cyborgian connection is temporarily interrupted.
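To make the blog post’s definition more concrete, the following minimal sketch (in Python, with illustrative values of my own rather than anything drawn from the post or the chapter) shows how even a brief dropout in an otherwise smooth signal produces abrupt amplitude discontinuities, the kind of broadband, high-frequency transient the ear readily registers as a click or pop.

```python
# A minimal, hypothetical illustration: zeroing out a few milliseconds of a
# smooth sine wave creates hard jumps in amplitude at the gap edges, which is
# what makes an audio "glitch" so perceptible.
import numpy as np

sr = 44100                                   # sample rate in Hz
t = np.arange(sr) / sr                       # one second of sample times
tone = 0.5 * np.sin(2 * np.pi * 440 * t)     # a smooth 440 Hz sine tone

glitched = tone.copy()
start = int(0.50 * sr)                       # dropout begins at 0.5 s
glitched[start:start + int(0.005 * sr)] = 0  # 5 ms gap: two discontinuities

# The size of the jump at the gap onset hints at the strength of the transient.
print("discontinuity at gap onset:", abs(glitched[start - 1] - glitched[start]))
```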
Understandably, and as the quoted blog post suggests, OS developers are invested in
the elimination of any sonic irregularities. However, there are others who embrace these
“bugs” in OSs, and even seek to actively provoke glitches. Glitch artists tweak “tech-
nology and [cause] either hardware or software to sputter, fail, misfire or otherwise wig out”
(McCormack 2010). Although glitch art is often primarily considered as an aestheticiza-
tion of system bugs, there is a political dimension to this work precisely in its endeavor
to undermine software and hardware manufacturers’ desires to make computer oper-
ation as inconspicuous as possible by means of smoothly operating user interfaces.
Although most work in glitch art that engages with the user interfaces of OSs has thus
far been focused on visual artifacts, some artists have also worked with the distortion
and interruption of system sounds. Among the artists who work in this way are JODI
(Joan Heemskerk and Dirk Paesmans), members of the British organization TOPLAP,
and Chicago-based Jon Satrom.
Satrom’s Plugin Beachball Success (2012), performed at the opening ceremony of the
transmediale festival in Berlin, begins with what looks like a failed attempt to start
the program running the performance. Satrom unsuccessfully tries to log on to his
Mac several times. Each time the Mac error signal sounds through the speakers. Satrom
apologizes and says that he only just got this computer. Once he manages to get in,
another disruption occurs almost immediately: an error message states “PLUGIN NOT
FOUND. Your computer needs additional software to run this asset. Click Here to
DOWNLOAD.”4 It quickly becomes clear that the performance has actually already
started. Over the next thirteen minutes, Satrom turns the commonly experienced inter-
ruption caused by a missing plugin—an additional bit of software that enables a pro-
gram to read a certain data format—into an escalating sequence of repetitions and
transformations.
Operating system sounds play an important role in this process. The familiar error
sound that is explicitly introduced at the beginning of the performance is gradually
mixed into a cacophony of various system sounds and decomposed into gritty noise
structures. Listening to this apparent system collapse, I experienced an almost visceral sense of discomfort at the sound glitches. Satrom makes us aware that the smooth connection
we may sense with our computers is merely an imaginary bond, forged to an important
extent by a polished system soundscape. Once the smooth, familiar system sounds are
violated and subverted, our attention is drawn to the fact that the technological exten-
sions of our body are designed in accordance with a certain logic; they are not merely
neutral, seamless prostheses that enhance the capabilities of our bodies. They also form part of a designed world: a world that is still overshadowed by the imaginary,
all-powerful cyborg of the military-industrial complex.
Notes
1. One More Thing (2010), “Interview Jim Reekes: Creator Mac Startup Sound,” https://www.
youtube.com/watch?v=QkTwNerh1G8. Accessed June 27, 2017.
2. Dark Parodies (2015), “All Windows Sounds | Windows 1.0–Windows 10.” https://www.
youtube.com/watch?v=ufKjjgvQZho. Accessed June 27, 2017.
References
Alberts, G. 2000. Rekengeluiden: De lichamelijkheid van het rekenen. Informatie und
Informatiebeleid 18 (1): 42–47.
Bateson, G. 1972. Steps to an Ecology of Mind. San Francisco: Chandler.
Beckerman, J. 2014. The Sonic Boom. Boston, MA: Houghton Mifflin Harcourt.
Bijsterveld, K. 2006. Listening to Machines: Industrial Noise, Hearing Loss and the
Cultural Meaning of Sound. Interdisciplinary Science Reviews 31 (4): 323–337. doi:10.1179/
030801806x103370.
Blattner, M., D. Sumikawa, and R. Greenberg. 1989. Earcons and Icons: Their Structure and
Common Design Principles. Human-Computer Interaction 4 (1): 11–44. doi:10.1207/
s15327051hci0401_1.
Bruner, G. C. II. 1990. Music, Mood, and Marketing. Journal of Marketing 54 (4): 94.
doi:10.2307/1251762.
Chan, J. K. Y., and M. H. Wong. 2013. A Review of Environmental Fate, Body Burdens, and
Human Health Risk Assessment of PCDD/Fs at Two Typical Electronic Waste Recycling
Sites in China. Science of the Total Environment 463–464: 1111–1123. doi:10.1016/
j.scitotenv.2012.07.098.
Clynes, M., and N. S. Kline. 1960. Cyborgs and Space. Astronautics 14 (9), September 1960:
26–27, 74–76.
Cox, T. J. 2015. The Sound Book: The Science of the Sonic Wonders of the World. New York:
W. W. Norton.
DeWitt, A., and R. Bresin. 2007. Sound Design for Affective Interaction. Lecture Notes in
Computer Science 4738: 523–533.
Deleuze, G., and F. Guattari. 1987. A Thousand Plateaus. Minneapolis: University of
Minnesota Press.
Garland, A. 2015. Ex Machina. Film4, DNA Films.
Gaver, W. 1986. Auditory Icons: Using Sound in Computer Interfaces. Human–Computer
Interaction 2 (2): 167–177. doi:10.1207/s15327051hci0202_3.
Goodman, S. 2009. Sonic Warfare: Sound, Affect, and the Ecology of Fear. Cambridge,
MA: MIT Press.
Halacy, D. S. 1965. Cyborg: Evolution of the Superman. New York: Harper & Row.
Haraway, D. J. 1991. A Cyborg Manifesto: Science, Technology, and Socialist-Feminism in the
Late Twentieth Century. Simians, Cyborgs and Women: The Reinvention of Nature. London:
Routledge.
Jackson, D. M. 2003. Sonic Branding: An Essential Guide to the Art and Science of Sonic
Branding. Basingstoke, UK: Palgrave Macmillan.
Kubrick, S. 1968. 2001: A Space Odyssey. Metro-Goldwyn-Mayer.
Glitched and Warped
Transformations of Rhythm in the Age
of the Digital Audio Workstation
Anne Danielsen
Introduction
Digital music technology has brought about unforeseen possibilities for manipulating
sound, and, as a consequence, entirely new forms of musical expression have emerged.
This chapter will focus on the particular rhythmic feels that can now be produced
through manual or automated techniques for cutting up sound, warping samples, and
manipulating the timing of rhythm tracks in digital audio workstations (DAWs). By
rhythmic feel, I refer to the systematic microrhythmic design applied to a rhythmic
pattern in performance or production, such as, for example, when playing a pattern
with a swing or straight feel. These new rhythmic feels have made an unmistakable mark
on popular music styles, such as glitch music, drum and bass, hip hop, neo-soul, and
contemporary R&B from the turn of the millennium onward, and not only represent a
challenge to previous forms but also create new opportunities for stretching the human
imagination through presenting previously unheard sounds and sonic gestures to
creators and listeners alike. A crucial aspect of this development is the manner in which
the new technologies allow for combining agency and automation, understood as
creative strategies, in new compelling ways.
In what follows, I will begin by reviewing two trends in the literature addressing these
new rhythmic feels: one that positions them as a continuation of earlier machine-generated
grooves; and another that positions them as an expansion of the grooviness of earlier
groove-based music, such as funk, soul, and R&B, in unforeseen directions. Ultimately,
I will reflect on the challenges faced by musicians and producers when it comes to antic-
ipating the outcomes of processes involving the experimental use of new technology
and, in turn, will acknowledge the potentially productive impact of the technologically
unexpected on our sonic imaginations.
According to Tim Armstrong (1998), two different views of the relationship between
technology and the body exist within modernism. At one extreme, there is techno-
logical utopia, represented by Freud’s notion of technology as a positive prosthesis in
which human capacities are extrapolated. In this view, “[t]echnology offers a re-formed
body, more powerful and capable, producing in a range of modernist writers a fascination
with organ-extension, organ-replacement, sensory-extension” (Armstrong 1998, 78).
At the other extreme, we find writers adhering to the Marxist view of technology as an
alienating means of industrial production. Here the technological advances underlying
commodity capitalism result in a subordination of the human to the machine, promoting
a nonhuman form of mechanical repetition and standardization. In the field of music,
technology has generally taken on a role that is in accordance with the former view,
namely as a positive extension of the human body. This pertains, for example, to traditional
instruments such as pianos and clarinets (see, e.g., the discussion in Kvifte 1989) and to
the increasing use of experimental recording and processing technologies. Some of the
musical ideas that developed in rock in the late 1960s, for example, were not realizable
without such musical “prostheses.” Similarly, within the field of electroacoustic music,
various electronic and computerized technologies have been regarded as progressive
and liberating tools for music creation. However, we also find tendencies of Marxist
determinism that apply to music. This is prominent both in the discourse on various technologies’ roles in promoting the mass distribution of music and in the Frankfurt School’s critical discourse on popular music as a cultural response to the standardization and
commodification typical of capitalist industrial production (Adorno 1990; Horkheimer
and Adorno 2002).
In this chapter, I will focus on rhythmic popular music and use as my starting point
the emergence of a discursive and performative tension that resonates with the Marxist
view on technology just presented, in the sense that it situates human expression and
machine-made musical creation as two opposing extremes. This tension developed in the wake of the crossover success of black dance music in the popular music mainstream in the late 1970s, as a response to the depreciation of disco and other repetitive rhythmic music as commercial and commodified “machine” music.1 The
immense popularity of disco was probably crucial here; the style represented new tools
(click track and the analog sequencer) and a new aesthetic (four-to-the-floor), and
threatened the ideological and commercial position of white Anglo-American rock
that, up to this point, had dominated the mainstream for several decades.2 As a conse-
quence, an increasing polarization between what might be called “organic” and
“machinic” rhythms emerged.3 On the one hand, artists played styles, such as rock,
country, funk, and jazz, that were characterized by rhythmic feels that derived from
both deliberate and unintended variations that musicians add to their performances; on
the other hand, there were artists who produced sequencer-based dance music with a
futuristic machine aesthetic, as expressed in Kraftwerk’s albums Man-Machine (1978)
and Computer World (1981). These latter grooves, enabled by analog sequencers, were
often perceived to be nonhuman and mechanistic, largely because of the absence of
micro-level flexibility in the temporal placement of rhythmic events that were all forced
into the grid provided by the sequencer. The absence of variation in sound in analog
(and early digital) sequencer-based groove was probably also crucial to this dichotomy;
the small shifts in intensity and timbre that are always present in performed music were
absent in these early sequencer-based rhythms.4 This division in rhythmic design within
1970s popular music is probably crucial to any subsequent understanding of why
rhythmic patterns consisting of grid-ordered events are experienced as lacking a human
touch even when they are produced by a human. Rhythmic subdivisions that are too
evenly played still tend to make us think of a machine. Loose timing, on the other hand,
tends to be described as organic and evokes associations with human performance, even
when those patterns and variations have been generated by a computer.5
The mechanistic aspect of perfectly even timing in sequencers from the predigital and early digital era was often countered through the introduction of a humanizing function, which altered the beats of a musical sequence according to a random series of deviations that would make them less nonhumanly perfect. However, even though this may be thought to match motor and timekeeper noise in human timing, such random deviations are not typical of groove-based music, that is, music organized around a repetitive rhythmic pattern. As many studies have shown, deviations in groove-based music are to
a large extent systematic (Bengtsson et al. 1969; Butterfield 2010; Danielsen 2006, 2010b;
Iyer 2002), meaning that the same pattern of microtiming (that is, the early and late
marking of beats) is repeated in each repetition of the basic pattern (usually one or two
bars in length).6 Research has also shown that in performed music fluctuations that
exceed this basic pattern are not random either but are instead both long-range and
correlated (Hennig et al. 2011).
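The contrast described here can be sketched in a few lines of code. The following illustration (mine, not drawn from the studies cited, and using arbitrary offset values) distinguishes a “humanize” function, which jitters every onset independently and at random, from systematic microtiming, in which the same pattern of early and late beats recurs in each bar.

```python
# A hypothetical sketch contrasting random "humanization" with systematic
# microtiming applied to a quantized grid of beat onsets.
import numpy as np

rng = np.random.default_rng(0)
beats_per_bar, bars, beat_len = 4, 4, 0.5          # 120 bpm, quarter-note grid
grid = np.arange(beats_per_bar * bars) * beat_len  # perfectly even onsets (s)

# "Humanize": independent random jitter on every onset (here roughly +/- 10 ms).
humanized = grid + rng.normal(0.0, 0.010, size=grid.shape)

# Systematic microtiming: a fixed offset per beat position, repeated each bar
# (here beat 2 slightly early, beat 4 slightly late -- illustrative values only).
offsets = np.array([0.000, -0.015, 0.000, 0.020])
systematic = grid + np.tile(offsets, bars)

print("humanized deviations :", np.round(humanized - grid, 3))
print("systematic deviations:", np.round(systematic - grid, 3))
```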
Prior to the increased temporal flexibility of later digital sequencers and digital
audio-sequencing (which was introduced in the early 1990s, see Brøvig-Hanssen and
Danielsen 2016, chap. 6; Burgess 2014, chap. 11), then, there was both an ideological and a
de facto difference between played and machine-generated rhythm that was associated
with the constraints of the conditions of production within these two spheres. Machine
rhythm lacked the intended (and unavoidable nonintended) temporal and sonic variations
that were typical of human performance. Likewise, humans were simply unable to
produce the extreme evenness of the machine.7
As we shall see in the following section, this traditional link between machine-based
music and stiffness has been disrupted by new opportunities for creating microrhythmic
designs in the DAW—first, because the DAW seems to be able to produce the entire
spectrum of rhythmic feels previously associated with human performance, and second,
because human- and computer-based rhythms are often, in fact, deeply embedded in
one another, not least through the ways in which human performances are routinely
used as raw material for producing rhythms in the DAW. Today, therefore, it is very
difficult to distinguish between human- and computer-generated performances.
Nonetheless, even though the division between human- and machine-based rhythms
has been transcended when it comes to what the machine can actually produce, the two
related aesthetic paradigms—even rhythm on the grid, on the one hand, and deep,
groovy rhythmic designs, on the other—have to some extent been continued. At the
mechanistic extreme of the rhythmic continuum, we find forms of electronic dance
music (EDM), in which machine-like timing is a distinguishing stylistic feature and
even a preference long after alternatives to it had become available in the early 1990s
(Zeiner-Henriksen 2010). At the “organic” extreme of the continuum, we find the deep,
groovy rhythm of African American–derived, computer-based rhythmic genres. What
is used to realize these two fundamental rhythmic inclinations, however, is no longer so
different because, in the age of the DAW, they typically come from the same production
tools. A crucial factor in defining a possibly new late-digital condition regarding the
field of musical rhythm, then, is the manner in which the distinction between organic
and machinic rhythm has been transcended. Agency and automation, understood as
creative strategies, inform both mechanistic rhythmic expressions and deep, groovy
feels. I will now conduct a closer inspection of these two aesthetic trends in contempo-
rary musical rhythm.
Microrhythmic Manifestations
of the Digital Audio Workstation:
Two Trends
The first trend comprises electronica-related styles whose rhythmic events align with a
metrical grid. Common to the musicianship of the artists representing this trend is a
preference for exaggerated tempi and an attraction to the completely straightened-out,
square feel of quantization. As pointed out earlier, this was both an aesthetic preference
and a technological constraint in the analog, sequencer-based tradition that this trend
grew out of. In the early days of this trend, high-pitched sounds such as the hi-hat
cymbal (or something else that fills the same musical function) were programmed
unnaturally—either too quickly or too evenly or both—specifically to connote a
machine-like aesthetic (Zagorski-Thomas 2010; Inglis 1999). The sound of these songs,
then, evokes an overdone, even unlikely virtuosity that I have elsewhere labeled the
“exaggerated virtuosity of the machine” (Danielsen 2010a). Prominent pioneering
artists of this rhythmic trend include Aphex Twin (the performing pseudonym of
Richard D. James), Autechre (Sean Booth and Rob Brown), and Squarepusher (Tom
Jenkinson), all of whom entered the electronica scene in the 1990s and are associated
with the label Warp. After a few years, this aesthetic strategy had traveled from these
avant-garde electronica toolboxes to, for example, the title track of the Destiny’s Child
album Survivor (Columbia 2001), thus entering the popular music mainstream. The fast
speed and quantized evenness of many of the tracks on such albums anticipate the
related process of musical granulation—that is, of crystallizing “sonic wholes” into
grains, so that musical or nonmusical sounds are chopped up into small fragments and
reordered to produce a stuttering rhythmic effect. This aesthetic also promotes a tendency
to transform sounds with an otherwise clear semantic meaning or reference point—
such as a musical source or a different musical context—into “pure” sound (see, for
example, Harkins 2010). Sounds or clips are also often combined in choppy ways that
underline sonic cut-outs, rather than disguising them, resulting in a skittering collage.
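The chop-and-reorder principle described above can be illustrated schematically. The following sketch (a hypothetical illustration, not a reconstruction of any artist’s actual tool) cuts a signal into short grains, shuffles them, and repeats a handful of them, so that the abrupt joins between grains produce the stuttering, collage-like effect discussed here.

```python
# A minimal, hypothetical sketch of granulation as chop-and-reorder: the signal
# is sliced into 50 ms grains, the grains are shuffled, and a few are repeated
# to create a stutter; the hard edges between grains are left audible.
import numpy as np

sr = 44100
t = np.arange(sr) / sr
source = 0.5 * np.sin(2 * np.pi * 220 * t * (1 + 0.5 * t))  # a simple rising tone

grain_len = int(0.050 * sr)                       # 50 ms grains
n_grains = len(source) // grain_len
grains = source[: n_grains * grain_len].reshape(n_grains, grain_len)

rng = np.random.default_rng(1)
order = rng.permutation(n_grains)                 # reorder the grains
stutter = np.repeat(order[:8], 3)                 # repeat some grains: stutter
glitched = np.concatenate([grains[i] for i in np.concatenate([order, stutter])])
print("output length (s):", round(len(glitched) / sr, 2))
```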
The label glitch music8—a substyle of electronic dance music associated with the
artists mentioned in the previous paragraph—hints at the ways in which we perceive
these soundscapes, namely as a coherent sonic totality that has been “destroyed,” meaning
chopped up and reorganized anew.9 An important point here, which Brøvig-Hanssen
discusses at length, is that this approach to sound relies on the listener being able to
imagine a “music within the music”—that is, a fragmented sound presupposes an imagined
and spatiotemporally coherent sound (Brøvig-Hanssen 2013). This operation, however,
becomes particularly precarious when the manipulated element is a voice. Brøvig-
Hanssen’s detailed analysis of the manipulations of the vocal track in two versions of
Squarepusher’s “My Red Hot Car,”10 where one is a “glitched” version of the other, clearly
demonstrates the ways in which meaning is transformed when sound is manipulated
away from what one normally regards as the field of possible human utterances. In the
glitched version, the vocal track has been “deformed”—sounds are cut off too early,
there are repeated iterations of sound fragments separated by signal dropouts, and frag-
ments are dislocated from their original locations (Brøvig-Hanssen and Danielsen 2016,
chap. 5)—in a manner that clearly departs from the human. Still, it is also hard to hear
the vocal track as purely musical (that is, not sung) sound. One tends to persist in imagining
a human being (and a coherent message) behind the stuttering rhythm, since the voice
always tends to be, first and foremost, an indexical sign of the human body and a clear
path from source through musical performance to recording. Consequently, “[w]e
can discern two layers of music, the traditional and the manipulated, neither of
which, in this precise context, makes sense without the other” (Brøvig-Hanssen and
Danielsen 2016, 95).
In addition to the association of cut-up strategies with the destruction or transformation of a coherent musical whole, glitched, granulated, or manually or automatically chopped-up sound also produces a very characteristic microrhythmic effect. As Oliver (2015) emphasizes, in jungle and drum and bass it is not first and foremost the transformation of temporal features or durations that produces the peculiar microrhythmic effects but the cutting up of sounds and the abrupt transitions between sounds that such
cuts produce. The effect of chopping up the crash cymbal of the much-sampled Amen
break, for example, relies heavily on the fact that it is an initially acoustic, and thus very
rich, sound.11 When human musicking is transformed through computer-based pro-
cedures, one is thus confronted by both a break with and a continuation of the existing
mechanistic aesthetics of some kinds of rhythm. The sound is different (richer, less
pure), but the groove is produced, as with most EDM-related styles, not by manipulating
Playing and making music have always been embedded in technology. The opposition
between organic and machinic musical expressions in late 1970s and early 1980s popular
music thus emerges as partly ideological: all music-making means being deeply
involved in its technology, or, in the words of Nick Prior:
It is not just that technology impacts upon music, influences music, shapes music,
because this form of weak technological determinism still implies two separate
domains. Music is always already suffused with technology, it is embedded within
technological forms and forces; it is in and of technology. (2009, 95)
Relating this point to a more general epistemological discourse, we could say that new
technology creates new understanding, and that we have always learned to know the
world through the tools and technologies that we use to interact with our surroundings.
As Heidegger makes us aware in his essay “The Question Concerning Technology”
(1977), there is no alternative route to the knowledge we acquire through technology.
Moreover, the insights that we derive from technology cannot be separated from the
technology itself; through technology we achieve knowledge about the world in a way
and to an extent that would otherwise be unavailable to us. In the words of Heidegger:
“[Techne] reveals whatever does not bring itself forth and does not yet lie here before us,
whatever can look and turn out now one way and now another” (1977, 8). The idea that
man and technology are opposed to each other is thus, according to Heidegger, beside
the point—instead, the machine should, in line with the “technology as prosthesis” view presented earlier, be seen as an extension of the human.
Digital technology has reactualized this debate in music-making, and from this
perspective one might ask whether the rhythmic feels discussed previously really
represent the results of a radically new “posthuman condition,” or whether they ought
to be understood as part of the continuous development of technology’s ever-present
role as an aid to, and extension of, human expression and behavior. According to the
latter position, so-called posthuman expressions are not after or outside of the human
repertoire at all. Instead, they should be considered simply the most recent expansion of
that repertoire. This would mean, in turn, that the microrhythmic manipulation made
possible by the DAW represents, in principle, nothing new, because there is nothing new
in the fact that new technology produces new forms of knowledge, expression, and
behavior or that it expands the scope of the human imagination.
As pointed out at the start of this chapter, however, after the introduction of
sequencer-based grooves in the popular music mainstream in the late 1970s, performed
and machine-generated music tended to align with two distinct aesthetic fields. For
some years, these two fields made use of different sets of tools that produced very different
sonic results. Consequently, performed and machine-generated music came to represent
different worlds of musical expression and imagination in the following decades.
Microrhythmic manipulation in the DAW has brought about a new aesthetic situation
marked by convergence between these two musical-rhythmic poetics. Performed and
machine-generated music are, in the late-digital era, deeply embedded in one another—
first, because both digital and traditional music technologies are used to achieve the
desired musical results in both domains, and, second, because the respective contributions
of these different technologies are in many cases (such as the examples discussed in
this chapter) almost impossible to distinguish from one another in the end result.
Accordingly, it would be wrong to speak of a hybridization of the two, because this
presupposes two separate and still recognizable entities that have been combined.
Rather, performed and machine-generated rhythms have, in many contemporary
genres, morphed, making it impossible to separate their respective influences. We are
most likely yet to see the full consequences of this development, which also includes a
wide range of new interfaces for organic control of computers and music machines.15
The flexibility of the DAW, our contemporary music machine, has contributed tre-
mendously to this ongoing transformation, from an either/or to a both/and where the
distinction between organic and machinic musical expressions feels of little relevance.
The timing of musicians is warped in the DAW, then copied by other musicians whose performances are in turn manipulated in new machine-generated renderings, and on it goes. Even current examples of the creative use of digital pitch correction illustrate this point. Autotune is another instance of a fundamental morphing of human and
machine that is made possible by digital tools that have extended the human expres-
sive repertoire; sometimes the result of this morphing is a voice that captures certain
human states or conditions better than the unmediated human voice, which is per-
haps the most human of all instruments (see Brøvig-Hanssen and Danielsen 2016,
chap. 7). We might then wonder whether we are in a new phase in the interaction
between the musicking human and the machine, a phase that is characterized by an
even more radical undermining of a possible ontological separation between man and
technology than what characterizes the musician-instrument interaction typical of
predigital times.
So, were the creators of the new rhythmic feels discussed earlier capable of imagining
the end result (and its wider implications), or did these new feels simply arise by acci-
dent and become labeled as such by the collective imaginations of the consumers/
receivers? This is a question that invites a double answer. No, the creators probably did
not anticipate the effect of their experiments with new technology, and they were—and
are, in line with Heidegger’s insights above—certainly not capable of foreseeing their
wider results. On the other hand, new rhythmic feels such as those discussed above do
not simply happen. The processes leading to them are begun with the intention of creat-
ing new sound. Generally, mechanized procedures for generating new musical material
represent a well-known strategy for innovative music-making that was employed by, for
example, the composer Pierre Boulez from the 1950s onward. His practice and reflec-
tions make it clear that the point of using such procedures was often to come up with
something unimaginable, with completely new sonic raw material, that could then be
shaped through intentional compositional procedures (see Guldbrandsen 2011, 2015).
The same goes for the creation of the rhythmic feels discussed previously. As we have
seen, an experimental attitude in combination with playfulness and creative abuse of
new technology may result in as-yet-unheard sonic results.
The flip side of this is that, as soon as those new sounds have been produced, they start
inhabiting the imaginations of their creators and the listeners. As to the groove-based
music discussed in this chapter, the relationship between rhythm and motion is clearly a
case in point. The groove qualities of rhythmic music are often related to the music’s
perceived ability to make one’s body move. Exactly how various rhythmic feels are
connected to body movement certainly remains an open question, but recent per-
spectives from the field of embodied music cognition pave the way for a close connection
between rhythm and perceived and performed motion (e.g., Chen et al. 2008; Danielsen
et al. 2015; Godøy et al. 2006; Large 2000; Leman 2008; Repp and Su 2013). Generally,
discussions of the relationship between rhythm and corporeality in music listening
point to the real and underacknowledged possibility that we structure our actual musi-
cal experiences according to patterns and models received from extra-musical sources,
such as actual movements (see also Godøy, this volume, chapter 12). This is probably
also a clue as to why we manage to adjust to and structure the peculiar warped grooves
discussed above: we draw on our internalized repertoire of already acquired gestures to
make sense of a new timing pattern. Put simply, if we find a way to move to those
grooves, we then come to “understand” them.
However, not only do dance and movement affect the way we experience and understand grooves; inner or outer movements can also be induced or proposed by music. That is, a piece of music can propose new gestures to us. The rhythmic feels
discussed earlier may thus be a means of imagining completely new movement pat-
terns, or gestural designs, that are typical of the music of the humachine. Similar to the
ways in which the glitched and warped grooves described above both evoke and
deform their own “originals,” such imagined gestural designs may feel at one and the
same time connected and completely alien to us. As we develop ways of internally or
externally responding to these grooves, however, we also develop an understanding of
these new gestural imaginations, which at present goes well beyond our “natural” rep-
ertoire (here understood as what we regard as possible for human beings in the present
historical situation). Sounds that are shaped by way of digital processing may thus
evoke sonically based imaginations not only of the sources behind them (what kind of
creature makes this sound) but also of morphed, human-machine motion. Put differ-
ently, the sound of the DAW proposes a wide variety of new and peculiar ways of sing-
ing (the morphing of human and machine through autotuning), talking (glitched
stuttering vocal tracks), and moving (warped, deformed human gestures). Today,
these are experienced as different and marked by technological intervention, but who
knows? In future renderings, they might be regarded as completely commonplace,
perhaps as ordinary as talking with people on the other side of the Atlantic through
the telephone and hearing the whispering of singers from an enormous stadium stage
are today.
Notes
1. For a discussion of how this crossover success changed black dance music, see Danielsen
(2006, chaps. 6 and 7, 2012).
2. According to Paul Théberge, in contrast to the 1960s, when experimentation with, for
example, distorted guitar sound and multitrack recording “created excitement around
new sounds and electronic effects” (1997, 1), the late 1970s saw a skepticism toward
electronic instruments. According to Théberge, this skepticism (among, one might add,
rock musicians and their audiences) emerged as a consequence of the widespread reaction
to disco (1997, 2).
3. For a critical discussion of this polarization, see, for example, Simon Frith’s essay “Art versus
Technology” (1986).
4. Interestingly, in an article in Sound on Sound as late as October 1999, this absence of
variation in sound is still lamented when one is striving for realistic, sequenced drum
parts: “[A] main problem with many sampled sound sets is that they do not reflect the
ways in which the sound of real percussion instruments varies depending on the force
with which they’re struck” (Inglis 1999). This uniformity is particularly acute with hi-hat
strokes: “Standard drum kit sets, particularly those conforming to the general MIDI drum
map, suffer persistent problems. Perhaps the most obvious of these is the use of only three
different hi-hat sounds—open, closed and pedal—when real drumming makes use of a
continuous range of sounds from quiet to soft, from tight closed to open” (Inglis 1999).
5. Today, both machinic and organic music rely heavily on technological tools and are produced by way of the DAW. Whether a piece of music is placed in the one category or the other,
then, has little to do with the kind of tools involved or the degree of technological
involvement. Rather, it comes forward as a question of aesthetics and the degree to which
the use of technology is exposed or made opaque to the listener (Brøvig-Hanssen 2010).
6. In addition to such systematic timing, there are also individual patterns (see, for example,
Repp 1996).
7. The fact that humans make mistakes while machines are associated with (nonhuman) perfection is also the backdrop for the experience of the “vulnerable,” and thus more human, machine—as though technological mistakes somehow resemble
our own imperfections. According to Sangild, a technological failure such as a glitch thus
gives us a sense of “something living [it] displays the fragility and vulnerability of tech-
nology” (2004, 268). Dibben (2009) also underlines this humanizing effect of technological
failure in a discussion of Björk’s use of technology.
8. “Glitch” initially referred to a sound caused by malfunctioning technology. As Sangild
(2004) points out, these sounds of misfiring technology in fact expose technology as such
(266), or render it opaque (Brøvig-Hanssen 2010).
9. Whereas automated cutting processes could initially only be applied to prerecorded
sound, they can now be used in real time. For an introduction to the algorithmic pro-
cedures underlying different automated cutting processes in live electronica performance,
see Collins (2003).
10. The two versions were released as the two first tracks of Squarepusher’s EP My Red Hot
Car (Warp 2001). The second track was subsequently placed on the Squarepusher album
Go Plastic (Warp 2001).
11. The Amen break refers to a drum solo performed by Gregory Cylvester Coleman in the
song “Amen, Brother” (1969) by The Winstons.
12. See Johnson (2005) for an overview of the equipment used in Snoop Dogg’s recording
studio at the time.
13. This phenomenon parallels the local time shift phenomenon as described by Desain and
Honing (1989). See also Danielsen (2010a).
14. “Analog” performance practice is, of course, also open to sudden transitions, for example
in the form of tempo shifts. Research has shown that these can be rather abrupt (see, for
example, Cook 1995; Bowen 1996). However, the particularly glitched character of digital
time warps is difficult to achieve with conventional instruments.
15. For an overview of advances in interfaces for musical expression from the last fifteen
years, see Jensenius and Lyons (2017).
References
Adorno, T. W. 1990. On Popular Music. In On Record: Rock, Pop, and the Written Word, edited
by S. Frith and A. Goodwin, 301–314. London: Routledge.
Armstrong, T. 1998. Modernism, Technology, and the Body: A Cultural Study. Cambridge:
Cambridge University Press.
Benadon, F. 2009. Time Warps in Early Jazz. Music Theory Spectrum 31 (1): 1–25.
Bengtsson, I., A. Gabrielsson, and S. M. Thorsén. 1969. Empirisk rytmforskning. Svensk tidskrift
för musikforskning 51: 48–118.
Bjerke, K. Y. 2010. Timbral Relationships and Microrhythmic Tension: Shaping the Groove
Experience through Sound. In Musical Rhythm in the Age of Digital Reproduction, edited by
A. Danielsen, 85–101. Farnham, UK: Ashgate.
Bowen, J. A. 1996. Tempo, Duration, and Flexibility: Techniques in the Analysis of Performance.
Journal of Musicological Research 16 (2): 111–156.
Brøvig-Hanssen, R. 2010. Opaque Mediation: The Cut-and-Paste Groove in DJ Food’s “Break.”
In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen, 159–176.
Farnham: Ashgate.
Brøvig-Hanssen, R. 2013. Music in Bits and Bits of Music: Signatures of Digital Mediation in
Popular Music Recordings. PhD thesis. University of Oslo.
Brøvig-Hanssen, R., and A. Danielsen. 2016. Digital Signatures: The Impact of Digitization on
Popular Music Sound. Cambridge, MA: MIT Press.
Burgess, R. J. 2014. The History of Music Production. Oxford: Oxford University Press.
Butterfield, M. 2010. Participatory Discrepancies and the Perception of Beats in Jazz. Music
Perception 27 (3): 157–176.
Carlsen, K., and M. A. G. Witek. 2010. Simultaneous Rhythmic Events with Different
Schematic Affiliations: Microtiming and Dynamic Attending in Two Contemporary R&B
Grooves. In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen,
51–68. Farnham, UK: Ashgate.
Chen, J. L., V. B. Penhune, and R. J. Zatorre. 2008. Listening to Musical Rhythms Recruits
Motor Regions of the Brain. Cerebral Cortex 18: 2844–2854.
Collins, N. 2003. Recursive Audio Cutting. Leonardo Music Journal 13: 23–29.
Cook, N. 1995. The Conductor and the Theorist: Furtwängler, Schenker, and the First
Movement of Beethoven’s Ninth Symphony. In The Practice of Performance: Studies in
Musical Interpretation, edited by J. Rink, 105–125. Cambridge: Cambridge University
Press.
Danielsen, A. 2006. Presence and Pleasure: The Funk Grooves of James Brown and Parliament.
Middletown, CT: Wesleyan University Press.
Danielsen, A. 2010a. Introduction. In Musical Rhythm in the Age of Digital Reproduction,
edited by A. Danielsen, 1–18. Farnham, UK: Ashgate.
Danielsen, A. 2010b. Here, There and Everywhere: Three Accounts of Pulse in D’Angelo’s “Left
and Right.” In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen,
19–35. Farnham, UK: Ashgate.
Danielsen, A. 2012. The Sound of Crossover: Micro-Rhythm and Sonic Pleasure in Michael
Jackson’s “Don’t Stop ‘Til You Get Enough.” Popular Music and Society 35 (2): 151–168.
Danielsen, A., M. R. Haugen, and A. R. Jensenius. 2015. Moving to the Beat: Studying
Entrainment to Micro-Rhythmic Changes in Pulse by Motion Capture. Timing and Time
Perception 3 (1–2): 133–154.
D’Errico, M. 2015. Off the Grid: Instrumental Hip-Hop and Experimentalism after the
Golden Age. In The Cambridge Companion to Hip-Hop, edited by J. A. Williams, 280–291.
Cambridge: Cambridge University Press.
Desain, P., and H. Honing. 1989. The Quantization of Musical Time: A Connectionist
Approach. Computer Music Journal 13 (3): 56–66.
Dibben, N. 2009. Björk. Bloomington, IN: Indiana University Press.
Frith, S. 1986. Art versus Technology: The Strange Case of Popular Music. Media Culture
Society 8 (3): 263–279.
Godøy, R. I., E. Haga, and A. R. Jensenius. 2006. Playing “Air Instruments”: Mimicry of
Sound-Producing Gestures by Novices and Experts. In Gesture in Human-Computer
Interaction and Simulation: 6th International Gesture Workshop, GW 2005, Berder Island,
France, May 18–20, 2004, Revised Selected Papers, edited by S. Gibet, N. Courty, and J.-F. Kamp,
256–267. Berlin and Heidelberg: Springer-Verlag.
Guldbrandsen, E. E. 2011. Pierre Boulez in Interview 1996 (II): Serialism Revisited. Tempo
65 (256): 18–24.
Guldbrandsen, E. E. 2015. Playing with Transformations: Boulez’s Improvisation III sur
Mallarmé. In Transformations of Musical Modernism, edited by E. E. Guldbrandsen and
J. Johnson, 223–244. Cambridge: Cambridge University Press.
Harkins, P. 2010. Microsampling: from Alkufen’s Microhouse to Todd Edwards and the Sound
of UK Garage. In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen,
19–35. Farnham, UK: Ashgate.
Heidegger, M. 1977. The Question Concerning Technology and Other Essays. New York: Harper
& Row.
Hennig, H., R. Fleischmann, A. Fredebohm, Y. Hagmayer, J. Nagler, A. Witt, et al. 2011. The
Nature and Perception of Fluctuations in Human Musical Rhythms. PLoS One 6 (10).
Horkheimer, M., and T. W. Adorno. 2002. The Culture Industry: Enlightenment as Mass
Deception. In The Dialectic of Enlightenment: Philosophical Fragments, 94–136. Stanford,
CA: Stanford University Press.
Inglis, S. 1999. 20 Tips on Creating Realistic Sequenced Drum Parts. Sound on Sound, October.
https://web.archive.org/web/20160327093715/http://www.soundonsound.com:80/sos/
oct99/articles/20tips.htm. Accessed December 17, 2018.
Iyer, V. 2002. Embodied Mind, Situated Cognition, and Expressive Microtiming in African-
American Music. Music Perception 19 (3): 387–414.
Jensenius, A., and M. J. Lyons. 2017. A NIME Reader. Fifteen Years of New Interfaces for Musical
Expression. Berlin: Springer.
Afrofuturism and the Sounds of the Future
Erik Steinskog
Introduction
While it probably is a coincidence that Richard Wagner and Sun Ra share a birthday,
May 22, 1813 and 1914, respectively, there are certainly some dimensions in how their
music has been received that could be compared. They both relate to a “music of the future,” and while their ideas are strikingly different, the fact remains that such a music of the future first has to be imagined. Some of the differences between the two are eas-
ily determined, such as views on history and the imagination of the future on a more
general level. That is to say, what kind of future can be imagined? Here it is interesting, as
Jacques Attali (1985) discusses in his now classic Noise, that music has been seen as
prophesying the future within many different thought-systems. There seems, however,
to be a version of Hegelianism at stake when discussing most “classical” music, from
before Wagner and into the twentieth century, and this is arguably challenged by Sun Ra
and, more importantly, by theoretical discourses trying to get to grips with Sun Ra.
In this chapter, I follow what is most often referred to as Afrofuturism, examining how this discourse challenges a normative understanding of history and thus introduces concepts such as counterhistory and countermemory, and how science fiction and speculative fiction are part and parcel of discussing what this music can mean. All
these concepts are, I will argue, diverse approaches to understanding how the different
modalities of time—past, present, and future—are intertwined and how this inter-
twinement becomes audible, in something resembling continuous sonic time travel.
Sun Ra’s film Space Is the Place (1972, directed by John Coney) is, in many ways, a core
text for understanding his worldview. One could argue that Ancient Egypt is not given enough of a place in Space Is the Place but, that obviously important dimension aside, more or less everything is in place. While the film is science fiction—with references to Blaxploitation as well—it is, in one particular sense, a realistic movie: it deals with the “unreality” of blacks in the United States of the early
1970s, something made abundantly clear in the scene where Sun Ra meets a number of
young people in a community center in Oakland. As he says in that scene:
How do you know I’m real? I’m not real; I’m just like you. You don’t exist in this
society. If you did, your people wouldn’t be seeking equal rights. You’re not real.
If you were you’d have some status among the nations of the world. So we’re both
myths. I do not come to you as reality. I come to you as the myth because that’s
what’s black. (quoted in Zuberi 2004, 88)
On the political level, this unreality is similar to the issues at stake for the civil rights
movement, as can be seen in Sun Ra’s reference to people “seeking equal rights.” But it is
also a statement of an almost ontological or cosmological nature; black or blackness is
myth. Is this the incorporation of society’s way of ordering race relations? Is it Sun Ra
giving up becoming included in the category “human beings”? There is a strand in
afrofuturist discourse arguing in such a direction, in which Sun Ra’s solution is
understood as bypassing the whole category of “the human” and becoming super- or
posthuman (cf., Eshun 1999, 155). Such a solution can, however, also be seen as a kind of
utopian striving, where the utopian dimension necessitates leaving the category of “the
human” behind. As history shows, first during slavery, when blacks were understood as
“subhumans,” and later with “white America’s” persistent inability to accept
equal rights, the category itself is flawed.
But whereas the movie is realistic in its depiction of race relations, it moves to science
fiction for its solution (or one of its solutions): going to outer space and finding a
planet on which blacks can create a new civilization. It is, then, about imagining
a future that seems unreal in the present. And while Jerome J. Langguth argues for a
“cosmopolitan” dimension in this solution (2010, 158), I do think the film should rather
be understood as pointing toward this future civilization as a black one, where the
“myth” of blackness is lifted outside time and history and is thus related to what Sun Ra
terms “MythScience.”
In the opening of the film, Sun Ra is seen walking amid vegetation. He is followed by a
creature in a hooded cape with a mirror where the face would have been expected, a
creature earlier seen in Maya Deren’s short film Meshes of the Afternoon (1943), and later
throughout the video to Janelle Monáe’s “Tightrope” (2010), thus bridging between clas-
sical American avant-garde and contemporary Afrofuturism (cf., Steinskog forthcoming).
Sun Ra hums, as if to set the scene for a spiritual séance, before going into a longer
monologue, the first words heard in the movie:
The music is different here. The vibrations are different. Not like Planet Earth. [ . . . ].
We could set up a colony of black people here. See what they can do on a planet all
on their own without any white people. They could drink in the beauty of this
planet. It would affect their vibrations; for the better of course. [ . . . ] That would be
where the alter-destiny will come in. Equation-wise, the first thing to do is to con-
sider time as officially ended. We work on the other side of time. We’ll bring them
here through either isotope teleportation, transmolecularization, or, better still,
teleport the whole planet here through music.1
The importance of music is underscored, although music here is apparently not only one thing or
dimension. Rather, music is fundamental to the differences experienced on the two
planets at stake, the planet where Sun Ra is seen walking on the one hand, and “Planet
Earth” on the other. “The music is different here,” followed by “the vibrations are dif-
ferent,” undoubtedly follows in a long tradition of understanding music as vibrations
(cf., Goodman 2009). Calling it a tradition is not so much to deny the physics—and thus
the realness—of understanding music as vibrations as it is to point to this understanding
as being part of a continuum where cosmological thinking and/or speculation, science,
and myth meet; it is thus, in a sense, a dimension central to Sun Ra’s MythScience.
This need not have any consequence for the sound of the music (or the
sound of the music of the future), but it argues for a use of music that is highly interesting.
Music can be a means of transportation and not only on the individual plane as some
ecstatic dimension where the musician moves “out of himself.” Rather, music is understood
as a means of transporting a collective, and in that sense the Arkestra—Sun Ra’s big band—
is not just a “misspelled” orchestra, but becomes an Ark, a kind of spaceship fueled by
sound. Understanding music as a means of transportation is arguably less paradoxical
when thinking about it than when first hearing it proposed. Still, there is another challenge
to such an understanding of the science fiction dimension of Sun Ra as well as of Space Is
the Place. While I suggested above that Space Is the Place could be interpreted as a realistic
depiction of race relations in the United States, it is also a science fiction film taking
place in a parallel world and quite possibly in the future. Whether it is in the future or
not, it is still “on the other side of time” with different “vibrations.” As such, it raises the
question: How does music sound, or vibrate, on the other side of time?
In his “Foreword: After Afrofuturism” George Lewis writes that Eshun’s term “sonic
fiction” is an “extraordinarily powerful term” (Lewis 2008, 144). One of the strengths
of the term is that it focuses on the sonic, but equally important is that by focusing on
“fiction,” the term can be used in discussing imagination and the imaginary without
having to deal with the visual connotations of “image” in the imaginary. Why is this
important? The visual bias of philosophical and aesthetic thinking has been documented
several times, and is found in the vocabulary of most aesthetic discourses (cf., Jay 1993).
One example could be how “reflection” relates to mirrors and visuality, where the acoustic
equivalent would seem to be echo. In other words, time and space are at stake, and our
way of perceiving time and space, as well as our ways of thinking those same categories,
proves important, for example in one of those places where time and space interact: reverber-
ation. In what sense, that is, is our language determining what we can say about the
phenomena under scrutiny? When it comes to the music or sound of the future, these
aspects might prove themselves important in several senses. But, and this is also in
accordance with Lewis’s argument, it does not necessarily have to do with language and
the categories available for discourse. It could correspondingly relate to how sound is
“imagined” or fictionalized and, perhaps even more importantly, what kind of fantastic
scenarios are available. In other words, “sonic fiction” could—along the lines of afrofu-
turist discourse—relate to the “sonic fantastic.”2
In Lewis’s article, which is the introduction to a special issue of Journal of the Society
for American Music dedicated to “Technology and Black Music in the Americas,” he
wants to challenge Afrofuturism for what he seems to suggest is too strong a focus on
what was previously known as “the extra-musical.” In an earlier article, “Improvised
Music after 1950,” he seems to argue that “the extra-musical” does not exist, as he refer-
ences “areas once thought of as ‘extra-musical,’ including race and ethnicity, class, and
social and political philosophy” (Lewis 1996, 94). In “After Afrofuturism,” on the other
hand, he at least seems to think this distinction has some merit, as becomes clear when he
asks: “What does the sound—not dress, visual iconography, witty enigmas, or sugges-
tive song titles—what can the sound tell us about the Afrofuture?” (Lewis 2008, 141).
It might be that sound (as sound) is an undertheorized dimension of Afrofuturism,
although at the same time Lewis’s question echoes a more traditional musicological
discourse associated with “the music itself.” From such a perspective, one could argue
that “sound” as such hardly exists in the sense that it can “tell us” anything about the
afrofuture—or, for what it is worth, any other future. The sound here is inscribed in con-
texts where, for example, “dress, visual iconography, witty enigmas, or suggestive song
titles” are part and parcel of what is heard. This is in particular the case with
music (“songs”) including lyrics. If the claim is that lyrics, including the semantic con-
tent, are not a part of the sound, this is difficult to uphold. With these considerations in
mind, however, there are still good reasons to think along the lines Lewis suggests,
exploring, in a heuristic sense, what “sound” can open up in an arguably narrower sense
than I described earlier, and then, perhaps, adding the contextual dimensions afterward.
What I am arguing for, then, is a change of perspective, and I think this is one possible
reading of Lewis’s question.
The caveat I introduce, though it at first feels necessary to me, is not necessarily fair
with regard to Lewis’s discussion. While the question’s focus on “sound,” and the explicit
exclusion of “dress, visual iconography, witty enigmas, or suggestive song titles,” seems
to argue for something close to a “sound itself,” this is almost immediately challenged by
Lewis himself, when he argues for broadening the conversation:
Broadening the conversation would allow a wider range of theorizing about the
triad of blackness, sound, and technology; for a start one could interrupt the male-
ness of the afrofuturist music canon with artists such as Pamela Z, DJ Mutamassik,
Mendi Obadike, Shirley Scott, Dorothy Donegan, the Minnie Riperton/Charles
Stepney/Rotary Connection collaborations, and more. Going further, removing
the putative proscription on nonpopular music allows us to take a more nuanced
complex view of the choices on offer for black technological engagement.
(Lewis 2008, 142)
In particular, I am occupied with what he calls “the triad of blackness, sound, and
technology,” as this triad brings us close to dimensions in the definition of “Afrofuturism.”
The cultural critic Mark Dery coined the term in his interview-article “Black to the
Future.”3 The article primarily comprises interviews with Samuel Delany, Greg Tate, and
Tricia Rose, but in the introduction Dery asks about the near absence of African
American science fiction writers. The existence of such writers would be logical, he claims, and
later authors have argued that the African American experience in a sense is science
fiction. Dery’s definition has become canonical:
(and I will add something to this toward the end of the chapter), and academic and
activist publications dealing with afrofuturist themes are becoming more common.
In other words, there are few signs that we are really “after” Afrofuturism (although
this depends on what is meant by “after”—according to Sun Ra we are “after the end of
the world”). So, while Lewis might not want to engage the term “Afrofuturism,” his
discussion of the triad of blackness, sound, and technology is of importance for the
dimensions I am occupied with in this chapter.
Lewis’s suggestions for broadening the conversation are to the point, but “sound” is no
longer isolated. It is part of “the triad of blackness, sound, and technology.” Why is it
that “blackness” should be a term on another level than dress? Or why does Lewis
approve of technology but seemingly not of suggestive song titles? For the second
question the answer should be obvious: technology is a means of producing—and
manipulating—the sound; it is, in other words, implied in the sound, not something
external to it. Similar arguments could be made for the other “extra-musical” dimensions,
but this fact does not take away the validity of the argument in question.
“Blackness,” on the other hand, is in this context a trickier notion, but one that could
be solved by claiming that blackness itself is a technology. An example of such an
understanding is found in Ytasha Womack’s Afrofuturism in a statement from Cauleen
Smith: “When I met artist and filmmaker Cauleen Smith in July 2011, she best summed
up race as creation: ‘Blackness is a technology,’ said Smith. ‘It’s not real. It’s a thing’ ”
(Womack 2013, 27). Note the “unreality” of blackness in this statement, a kind of echo of
Sun Ra’s myth. Cauleen Smith is also the filmmaker behind the Solar Flare Arkestra
Marching Band Project, where, in 2010, she directed a form of flash mob in Chicago,
including a marching band playing Sun Ra’s “Space Is the Place.”4 There are, then,
relations between Smith’s aesthetic practices and her work in understanding the back-
ground for her films, with echoes of Sun Ra and his Chicago days as an important part.
In claiming that blackness is a technology, and adding that “it’s a thing,” Smith points
to some of the complex historical trajectories that need to be addressed to get a full
understanding of what blackness can be said to be—past, present, and future. Within
the discourse of Afrofuturism, one particular discussion has been the absence of people
of color in the imagined futures of science fiction and fantasy. Connected to science
fiction, this is in particular a question about the future, but given that science fiction
more often than not is understood as a distorted notion of the present, it simultaneously
opens up a different perspective on the present. Fantasy arguably can equally well be
about the past; but here another thread is found too, in that Afrofuturism questions the
past as well as the future. The most obvious example is found in Sun Ra’s reference to
Ancient Egypt, where he claims a different understanding of, and afterlife for, Ancient
Egypt. In his understanding, Egypt was, and still is, unmistakably Africa, and it is the
past, and the past greatness of Egypt, that is his main focus.5 Here he follows
George G. M. James’s Stolen Legacy, first published in 1954, a book claiming that Greek
philosophy, and thus, in a sense, European thinking, was stolen from Egypt, manipulated,
and its origin erased. This erasure continues throughout European thinking, as an
erasure of race, as making universal a certain European understanding of the world.
Given the history of blacks in the United States or, to broaden the understanding even
more while simultaneously quoting the title of Sun Ra’s lecture series at the University of
California, Berkeley, in 1971, given the place of “The Black Man in the Cosmos,” this
European understanding has demonstrably led to a hierarchical understanding of race
as well as of history. But, as Sun Ra says, “History is only his story; you haven’t heard
my story yet” (in the film Sun Ra: A Joyful Noise from 1980, directed by Robert Mugge).
And Sun Ra’s story is a revisionist story, about another kind of origin, in Ancient Egypt,
as a technological civilization, the pyramids testifying to this. But with the Middle
Passage, and with the history of slavery, blacks were not included in the category of
human beings; they were “things.” As Fred Moten opens his In the Break: The Aesthetics
of the Black Radical Tradition: “The history of blackness is testament to the fact that
objects can and do resist” (Moten 2003, 1). Moten’s argument, that blacks were objects,
things, commodities, fits with the history of slavery; from the abolition of slavery
until the civil rights movement, a fight for inclusion in the category “human” was
important for the black population in the United States.
One thread within the afrofuturist discourse, arguably most plainly present in Eshun’s
writing, seems to argue that this inclusion did not happen, and that another solution was
found in going beyond the human to some kind of super- or posthuman existence, to
be followed by leaving the planet behind and beginning a black civilization on
a distant planet in outer space. The rationale for this thought seems to be the continuous
presence of white supremacy and racism, a presence continuing after the civil rights
movement’s victories beginning in the 1960s.
What would it mean to say that blackness is a technology? One possibility is to go
along with posthuman theory, which references different forms of enhancement, for
example, in discussing the body in relation to technology. This seems to be in accordance
with Dery’s definition of Afrofuturism where he writes about “a prosthetically enhanced
future” (Dery 1994, 180). Another angle on the same phenomenon is Lewis’s distinction
between “prosthetic” and “incarnative”—an opposition he takes from Doris Lessing.
In Lewis’s article, it is related to how “a largely prosthetic technological imaginary” is
said to dominate Dery’s references in his writings about Afrofuturism (Lewis 2008, 139);
this criticism highlights relations between the body and technology other than enhance-
ment. In another article, about Pamela Z, Lewis writes:
Here, it is as if the incarnative is a way of moving “past the prosthetic readings,” another
use of technology. That it is “fundamentally rhythmic” is of interest because the sounds are
the result of these interactions between body and technology, as it is also of interest for
understanding the “Afrological” dimensions of music found in Lewis’s thinking, not
least in his important article “Improvised Music after 1950.”
In More Brilliant than the Sun, Kodwo Eshun writes about the music of the future as
“traditionally” being “beatless.” It is, he adds, “weightless, transcendent, neatly con-
verging with online disembodiment” (Eshun 1999, 67). His examples are an interest-
ing mixture: Gustav Holst’s The Planets (written between 1914 and 1916), Brian Eno’s Apollo
soundtrack (1983), and Vangelis’s soundtrack to Blade Runner (1982, directed by Ridley
Scott). “Sonically speaking,” he writes, they are not more futuristic than the Titanic
and are “nothing but updated examples of an 18th C sublime” (Eshun 1999, 67). There
are important dimensions to this understanding but, underlying it all, there are some
fundamental questions that need to be addressed. When Eshun writes about “beatless”
music, I, in one sense, could not agree more. And related both to Sun Ra and to the
afrofuturist tradition (if we can call it a tradition), there is clearly some kind of focus on
“the beat.” Here, however, beat must also be understood as rhythm in a more general
sense and what needs to be addressed is how Eshun’s other examples relate to rhythm. In
other words, in what sense is “beatless” music rhythmic? Obviously, nonrhythmic music
does not exist, as rhythm is a way of organizing time and temporality in the sonic material
of music. “Beat,” however, is something different. When Eshun introduces the notion
of weightlessness and transcendence, and compares it with “online disembodiment,”
he is, by contrast, very close to a discussion of a dichotomy between “headmusic” and
“bodymusic”—this discussion, in consequence, would claim a transcendent position as
being disembodied in contrast to an embodied musical practice—for example, dancing.
Dance music would, understandably, focus on the beat—and would thus be one way of
contrasting the “beatlessness” of the traditional music of the future. But is this not at
the same time a simplified interpretation that cannot really be of much help here? First,
evidently some kind of dance is possible to beatless music as well, if Holst, Eno, and
Vangelis exemplify “beatlessness.” Second, Eshun also argues that hip hop is “headmusic”
(Eshun 1999, 46) and thus is not working within this dichotomy—although, because
he uses concepts related to the dichotomy, it is more difficult to figure out what he is
really arguing (or using the concepts for). Third, the sonic dimension of Holst, Eno,
Vangelis, and a host of others—even if it should be the eighteenth century’s sublime
as a reference—is important in imagining the sound of the future (perhaps more
the sound of the future than the music of the future). This is not least the case with Eno
and Vangelis’s use of synthesizers. And it is not least through the use of synthesizers
that Sun Ra’s music is in a tradition of “traditionally” understood “music of the future.”
In a similar context, Lewis writes:
It is the synthesizer, then, or more broadly the use of “sound technologies” that is crucial
for understanding Ra’s music and jazz—broadly understood—in the space age or in the
electronic era.6
But how would Sun Ra’s music fit with Eshun’s description? The question would not
least relate to the beat—and Sun Ra’s relation to “beat” or “beatlessness”—on the one
hand, and his use of synthesizers on the other. But discussing these dimensions will lead
not only to the eighteenth century’s sublime but also to any other understandings of the
music of the future (or the sound of the future). The importance of synthesizers for Sun
Ra’s sonic future cannot be overstated. He was one of the first pianists to explore elec-
tronic keyboards, and these keyboards are key for him in constructing his version of the
music of the future. In some examples, the use of synthesizers is not that different from
that of Brian Eno or Vangelis while, in other examples, Sun Ra explores the keyboards more as
noise-creators in the tradition of academic or nonpopular electronic music. Here, the
music and vibrations are different, and Sun Ra bends the Moog synthesizer, for example,
to previously unheard-of sounds, as on “Outer Space Employment Agency”
from the 1973 album Concert for the Comet Kohoutek, a track that morphs into a version of
“Space Is the Place” (cf., Langguth 2010, 152).
Understanding the synthesizers as related to the future, and thus to history, is not
very surprising and might be seen to be in line with developments within the avant-garde
of nonpopular music. Following Eshun’s take on the tradition of the music of the future
as “beatless,” these synthesizers can also be used within the tradition, as both Eno
and Vangelis exemplify. The difference, it would seem, would be whether or
not “beat” is central to the sound. Simultaneously, perhaps the synthesizers could be seen
as an axis of negotiation between different understandings of the music of the future.
As Eshun writes, “Whoever controls the synthesizer controls the sound of the future, by
evoking aliens” (Eshun 1999, 160). When read in the context of Dery’s understanding of
Afrofuturism, Eshun’s statement seems to echo a quote from George Orwell’s Nineteen
Eighty-Four, which Dery uses as the epigraph to his article: “If all records told the same
tale—then the lie passed into history and became truth. ‘Who controls the past,’ ran
the Party slogan, ‘controls the future: who controls the present controls the past’ ”
(Orwell [1949] 2003, 40).
Controlling the different modalities of time—the past, the present, and the future—is
a constant negotiation of tales as well as of technologies. The synthesizer becomes a
control-board not only to the sounds of the future but also to the sounds of the future’s
past and the past’s future. The timelessness of synthesizer-sounds is a way of manipulating
the sound waves and the vibrations in relation to, or in contrast to, the dominating tales
of how the futures are supposed to sound.
Dery’s discussion of time and history is related to a major difference between the
normative understanding of history known from Europe and the question of whether
this same understanding makes sense within an African American context. As he asks
in a timely manner: “The notion of Afrofuturism gives rise to a troubling antinomy:
Can a community whose past has been deliberately rubbed out, and whose energies
have subsequently been consumed by the search for legible traces of its history, imagine
possible futures?” (Dery 1994, 180). In other words, the past is a necessary component in
imagining the future. If the past is lost or erased it will have to be recreated as a means to
perceive a future at all. And if Orwell’s party-slogan is followed, this past is a result of
controlling the present. Sun Ra’s intervention in the present and the sounds he makes—
alone or with the Arkestra—give sound to an intersection of the present, the past,
and the future, and understanding the future—imagining the future—is thus intimately
related to all other modalities of time.
The synthesizer, then, is deeply embedded in the temporalities of sound, including
the sound of the future, but there are two other important dimensions to Eshun’s quote
cited earlier: the reference to “control,” and the reference to “aliens.” Controlling the
synthesizer is more than playing it; it is also a matter of programming the sounds—or,
rather, of working with the sounds themselves rather than simply making audible the
default sounds of the synthesizer. This, obviously, became of prime importance when industry-
standard sounds became the norm in popular music.
One update of Sun Ra that Eshun focuses on is the Jonzun Crew’s album Lost in Space
(1983), in particular the track “Space Is the Place.” With this title, the Sun Ra reference is
apparent, but Eshun’s focus is on the alterations of the voice: “On Jonzun Crew’s Space is
the Place, the Arkestral chant becomes a warning blast rigid with Vadervoltage. Instead
of using synthesiser tones to emulate string quartets, Electro deploys them inorganically,
unmusically” (Eshun 1999, 80). For Eshun, the significance of the vocoder-voice is that
the voice is turned into a synthesizer and, as such, the voice is synthesized too or, one
could argue, it is dehumanized. Which terms to use, however, also depends on how one
thinks about “music,” “voice,” and so forth. When Eshun claims that the synthesizers are
used inorganically, it is not necessarily a negative judgment. Rather, it should be seen as
an extension of Eshun’s writing about the movement from the human to the posthuman.
In that sense, “dehumanizing” would be wrong too, as in relation to black music the very
notion of “the human” is very much at stake.
The focus on the vocoder and its relation to a black posthumanism is also found in
Alexander Weheliye’s article “Feenin’,” where Weheliye claims Eshun as “the foremost
theorist of a specifically black posthumanity.” This is in contrast to the then emerging
theories of the posthuman (in the wake of, not least, N. Katherine Hayles), which show the
“literal and virtual whiteness of cyber-theory” (Weheliye 2002, 21), thus potentially
erasing people of color from posthumanity. From Weheliye’s point of view, an important
way to alter this discourse, and to engage black cultural production, is “to realign the
hegemony of visual media in academic considerations of virtuality by shifting the
emphasis to the aural” (21); “Incorporating other informational media, such as sound
technologies, counteracts the marginalization of race rather than rehashing the whiteness,
masculinity and disembodiment of cybernetics and informatics” (25). Weheliye’s focus
is the vocoder, “a speech-synthesizing device that renders the human voice robotic, in
R&B, since the audibly machinic black voice amplifies the vexed interstices of race,
sound, and technology” (22). These interstices—the places where race, sound, and tech-
nology meet—question the place of blackness within cybertheory but, at the same time,
relate to what Lewis discusses when interrogating “the triad of blackness, sound, and
technology” (Lewis 2008, 142). The vocoder is a part of this triad in a very particular
sense, given that the technologization of the voice contributes to a different take on “the
human” and on blackness.
Simultaneously, going back to the Jonzun Crew highlights another dimension of
“the music of the future.” While the mechanical, robot-like voices heard on this track
sound like science fiction—recalling the long tradition of speaking robots or aliens, from
HAL in 2001 to Samantha in Her—it is also the sound of a particular, historical
understanding of this inhuman sound. With HAL, the robotic is hearable, whereas
Samantha sounds like a regular female voice and her artificiality is impossible to hear. A
similar argument can be made for Janelle Monáe, whose alter ego Cindi Mayweather is
supposed to be an android, but whose singing voice is identical with Monáe’s (cf.,
Steinskog forthcoming). Monáe’s overall concepts for her albums, including the perfor-
mance of the android, are thus one half of the story of future in/human voices, where
the other half, arguably, is the autotuned or technologically modified voices. The
vocoderized voices of the Jonzun Crew belong to the second half of this same imagination,
and show us one of the past’s imaginations of (another) future.
Sonic Fiction
Fiction is not the same as “imagination” but in this scheme of things there are definitely
relations. If we are on the other side of time, or if music is a kind of prophecy, a sonic
imaginary of the future, then a sonic fiction can be about the sound of this nonheard (or
yet-unheard) music. There is a paradox in all these formulations in that “imagination,” in
its linguistic root, seems to point to the sense of vision. Thinking the sonic imaginary—
despite the linguistic paradox—is necessary for the sound of the future to be present.
But this, at the same time, also relates to one of the key questions of science fiction:
whether it is about the (or a) future or whether it primarily is a slightly distorted picture
of today. Both these understandings make sense in relation to science fiction, but the
distinction is still important when trying to be precise in analyzing what we are doing. And even
the stories of the future (rather than the present) are about some future
imagined from the point of view of the here and now. When it comes to music, including
Attali’s music as prophecy, the means of production are obviously found here and now
too, including, not least, the sound-producing devices.
There can be little doubt about the lasting influence of Sun Ra. This is not only because
the Arkestra is still touring—decades after Sun Ra “left the planet”—although that
understandably plays a role, but is also because of the importance of Sun Ra within
Afrofuturism as well as his importance across a spectrum of artists using elements of
Sun Ra’s music or thinking or simply expanding on his aesthetics. One could make an
important case for a (musical) continuity of Sun Ra influences going
back at least to Parliament/Funkadelic, but rather than such a discussion of history, I
want to end this chapter with some contemporary examples in a musical vein that can be
said to be a continuation of Eshun’s more “canonical” Afrofuturism.7 Eshun’s narrative
of afrofuturist music—or black sonic fiction—is more in line with a classical avant-garde
discourse that more or less excludes “popular music.”8
Transmolecularization—
Beyond Sun Ra
While the Jonzun Crew updated Sun Ra for the 1980s, and while that update might sound dated today,
there are many contemporary musicians doing different takes on the Sun Ra legacy too.
In terms of genre, many of them are best thought of under the vague umbrella term
“electronica,” but there are good reasons to discuss them in relation to updated versions
of Afrofuturism. In that sense, they might be seen as challenging Lewis’s understanding
that we should be “after Afrofuturism.” I have already mentioned Janelle Monáe, but my
focus here will be four other musicians who are DJs or producers: Ras G., Kirk Knight,
Flying Lotus, and Hieroglyphic Being. Much of the current music understood as afrofu-
turist is sample-based, opening up other ways of making relations, including those
that are historical. Communicating with samples is an inherent part of hip hop aesthetics
and it is also related to quotes and other ways of citing earlier music and performances
in instrument-based music; with samples, though, the signifying processes are different.
At the same time, such a practice is undoubtedly a use of technology, opening up
another angle on the triad of blackness, sound, and technology.
In the music of Ras G. (born Gregory Shorter Jr. in 1979 or 1980)—often recording
under the name Ras G. & the Afrikan Space Program—such sampling practices are
found not only when it comes to titles and references (in what used to be understood as
the extra-musical) but also in the musical sounds. Take the track “Astrohood” from the
album Brotha from Another Planet (2009) where he samples from Sun Ra’s “I’ll Wait for
You” from the album Strange Celestial Road (1980). The singing voices in Sun Ra’s track
are overtaken by electronic sounds—similar to the sounds/noises of computer games—
before a beat is introduced and later followed by what is almost Ras G’s signature—voices
shouting “Oh Ras” with a heavy echo to it. Sun Ra’s song is groovy with a bass vamp
leading into call-and-response voices, and it is these voices Ras chooses to sample,
rather than the bass groove or Sun Ra’s discreet synthesizer sounds. One could, however,
define the generic differences between the two tracks as a transformation from a more
or less funky bass dominating the sounds to the electronic sounds dominating Ras’s
track. If we were to compare the two tracks, the difference in length would play a role.
“Astrohood” is short, only 1:55, whereas “I’ll Wait for You” is sixteen minutes long and
develops into a jam where, under the saxophone solo, Sun Ra explores the
noisier spectrum of his synthesizers.
On the track “Natural Melanin Being . . . ” from Back on the Planet (2013), Ras G.
instead samples Sun Ra from an interview where he speaks about natural blackness
as well as about Ancient Egypt. Everything in-between and around Sun Ra’s voice is
Ras G’s electronic sounds. The electronic sounds are layers of samples, with sonic refer-
ences across decades of music. In that sense, another version of “the other side of time” is
presented, a time where the past holds the potential for recreation and revision and, as such, a
technological parallel to the understanding of history Sun Ra seems to relate to. On both
these albums there are also references to Sun Ra in the aesthetics of the album covers
and in the titles; so, in that sense, one would have to say it is a whole aesthetic rather than
simply a sonic ideal.
With Kirk Knight’s “Start Running,” the opening track of Late Night Special (2015),
Sun Ra’s voice is heard again, this time with the famous words from the opening of
Space Is the Place. The first sounds on the album are Sun Ra’s voice saying “teleportation,
transmolecularization, or better still, teleport the whole planet here through music.”
After “better still” the rapper comes in, rapping over the rest of the still audible words of
Sun Ra, moving into a contemporary alternative hip-hop track. Toward the end of the
track the voice of Sun Ra returns, saying, “the music is different here” and so on. Knight
thus clearly signifies on Sun Ra’s statements and in a particular sense can be said to
attempt, for the rest of the album, to present this “different music,” again re-inscribing
African American music in a process of teleporting the planet. The sonic environment
around the first Sun Ra sample, however, is more related to Alice Coltrane than Sun Ra.
A sweeping harp rather than synthesizers offers another mode of combining acoustic
instruments and electronics. With the harp and the Alice Coltrane references, the track
is closer to Flying Lotus than to Ras G., and one track on Knight’s album, “Dead Friends,”
features Thundercat—Stephen Bruner—who also collaborated with Flying Lotus.
Flying Lotus (born Steven Ellison, 1983) is the grandnephew of Alice Coltrane and
the grandson of Marilyn McLeod. Both these relations are often referenced within his music,
the first with attention to the spatial and spiritual dimensions in his music, the second with
reference to more traditional popular music and to Motown (McLeod wrote, among other
songs, Diana Ross’s “Love Hangover”). Flying Lotus’s track “Transmolecularization” is
an outtake from You’re Dead! featuring Kamasi Washington on saxophone, a
track first played on his BBC Radio 1 sessions (May 14, 2015).9 The title of this track is
a clear reference to Sun Ra, both to the opening of Space Is the Place, and to a particular
scene in the film, at the Outer Space Employment Agency, where
between different understandings of improvisation. Finally, and this is probably the most
relevant for the current discussion, there is a way of understanding the questions of
rhythm and beat.
From this, an argument can be traced back to Sun Ra and to the music of the future
that is related not only to beat and rhythm but, simultaneously, to how the music of the
black future is always also related to reimaginations, reinterpretations, and revisions
of the past (cf., Lock 1999). Here the history of music that Eshun relates is made more
complex by a constant intertwinement of different pasts and their respective futures,
where, in the case of Ras G., Kirk Knight, Flying Lotus, and Hieroglyphic Being, sam-
plers, turntables, computers, and mixers substitute for Sun Ra’s synthesizers, both as a
continuation and as a renegotiation of the history of black music. Imagining the future
of blackness thus becomes as much a matter of imagining the unheard-of as of remixing and
renegotiating the past. The future and the past intertwine—continually—in the present
of the sounding music, being multidirectional rather than linear, but pushing the sounds
into other worlds.
Notes
1. https://www.youtube.com/watch?v=4s8VZz-ERO0. Accessed May 15, 2017.
2. “Sonic fantastic” might be seen as one possible dimension of what Richard Iton calls “the
Black fantastic” (2008).
3. The article is found in Dery’s edited volume Flame Wars: The Discourse of Cyberculture
from 1994. With one exception, the whole volume was first published as volume 92,
number 4 of the South Atlantic Quarterly (in 1993).
4. https://www.youtube.com/watch?v=WvcXwtqQ5ME. Accessed May 15, 2017.
5. Recent DNA research suggests that ancient Egyptians were more genetically similar to
people from the eastern Mediterranean than to people in modern-day Egypt. https://www.livescience.com/59410-ancient-egyptian-mummy-dna-sequenced.html. Accessed June 14, 2017.
6. Lewis also references George Russell’s Jazz in the Space Age (1960) and Electronic Sonata
for Souls Loved by Nature (1968).
7. This means that I am excluding examples drawn from what is arguably a more main-
stream contemporary music, such as Janelle Monáe (her references to Sun Ra in the video
to “Tightrope,” for example).
8. This criticism has been raised by several authors, including myself, and should be taken
seriously when considering Afrofuturism at large. However, related to the sounds of the
future, it still makes sense to discuss this same avant-garde logic in its own right. Focusing
here, then, does not diminish the importance of a more mainstream Afrofuturism.
9. In addition to the mentioned tracks, the term “transmolecularization” is also used by
Eagle Nebula on the track “Nebulizer” from her EP Space Goddess (2015).
10. https://www.youtube.com/watch?v=iDwn0lsxDGg. Accessed May 15, 2017.
11. One could also refer to Martin Bernal’s Black Athena, given that Bernal establishes arguments
for observing the effects of such a “revision” throughout European intellectual history.
References
Attali, J. 1985. Noise: The Political Economy of Music. Minneapolis: University of Minnesota Press.
Dery, M. 1994. Black to the Future: Interviews with Samuel R. Delany, Greg Tate, and Tricia
Rose. In Flame Wars: The Discourse of Cyberculture, edited by M. Dery, 179–222. Durham,
NC: Duke University Press.
Eshun, K. 1999. More Brilliant than the Sun: Adventures in Sonic Fiction. London: Quartet
Books.
Goodman, S. 2009. Sonic Warfare: Sound, Affect, and the Ecology of Fear. Cambridge, MA:
MIT Press.
Iton, R. 2008. In Search of the Black Fantastic: Politics and Popular Culture in the Post-Civil
Rights Era. Oxford: Oxford University Press.
James, G. G. M. (1954) 2008. Stolen Legacy. New York: Wilder.
Jay, M. 1993. Downcast Eyes: The Denigration of Vision in Twentieth-Century French Thought.
Berkeley: University of California Press.
Langguth, J. J. 2010. Proposing an Alter-Destiny: Science Fiction in the Art and Music of Sun
Ra. In Sounds of the Future: Essays on Music in Science Fiction Film, edited by M. J. Bartkowiak,
148–161. Jefferson, NC: McFarland.
Lewis, G. E. 1996. Improvised Music after 1950: Afrological and Eurological Perspectives.
Black Music Research Journal 16 (1): 91–122.
Lewis, G. E. 2007. The Virtual Discourses of Pamela Z. Journal of the Society for American
Music 1 (1): 57–77.
Lewis, G. E. 2008. Foreword: After Afrofuturism. Journal of the Society for American Music
2 (2): 139–153.
Lock, G. 1999. Blutopia: Visions of the Future and Revisions of the Past in the Work of Sun Ra,
Duke Ellington, and Anthony Braxton. Durham, NC: Duke University Press.
Moten, F. 2003. In the Break: The Aesthetics of the Black Radical Tradition. Minneapolis:
University of Minnesota Press.
Orwell, G. [1949] 2003. Nineteen Eighty-Four. London: Penguin.
Steinskog, E. 2011. Hunting High and Low: Duke Ellington’s Peer Gynt Suite. In Music and
Identity in Norway and Beyond: Essays Commemorating Edvard Grieg the Humanist, edited
by T. Solomon, 167–184. Bergen: Fagbokforlaget.
Steinskog, E. forthcoming 2019. Metropolis 2.0: Janelle Monáe’s Recycling of Fritz Lang. In
Afrofuturism 2.0: The Black Speculative Art Movement, edited by R. Anderson and C. Fluker.
Lanham, MD: Lexington Books.
Weheliye, A. G. 2002. Feenin’: Posthuman Voices in Contemporary Black Popular Music.
Social Text 20 (2): 21–47.
Womack, Y. L. 2013. Afrofuturism: The World of Black Sci-Fi and Fantasy Culture. Chicago:
Lawrence Hill Books.
Zuberi, N. 2004. The Transmolecularisation of [Black] Folk: Space Is the Place, Sun Ra and
Afrofuturism. In Off the Planet: Music, Sound and Science Fiction Cinema, edited by
P. Hayward, 77–95. Eastleigh: John Libbey.
chapter 31
Posthumanist Voices in Literature and Opera
Jason R. D’Aoust
Introduction
When we think of the voice from the perspective of sound and imagination, a familiar
observation comes to mind: the voice is a series of phonatory sounds we emit (as in
speech, screams, and songs), but also their interior manifestation in our mind’s ear.
The experience of hearing a voice when we think, read, and write leads us to think of
voices as dual in nature, namely through their inner and outer manifestations, but the
interrelation of the two is more complex than it appears. Our seemingly innate inner
voice gives us the impression that our interiority precedes any exteriorization, and
thereby establishes a hierarchy in communication. In identifying inner and outer voices
as two sides of the same coin, we come to believe that speech and song are the material-
ized expression of our inner voice. Artistic practice can reinforce this point of view.
Eileen Farrell, for example, has commented on how the imagination plays an important
part in vocal performance: rather than focus on the manipulation of larynx, pharynx,
and resonators, successful artists concentrate instead on imagining the pitch, texture,
and tone of the vocal line they then instantly create in performance (Farrell 1993). This
performance practice defines the sonorous imagination as an active agent that forms
sounds in the inner ear before they are vocally expressed and manifested. These obser-
vations might also implicitly convey a dualist perception that vocal expression is material
and the inner voice is not. Such a way of understanding the voice often turns out to
support or be supported by metaphysical explanations of the physical world. A meta-
physical worldview purports that there are immaterial principles (like our identity with
our inner voice), which nevertheless have the creative force to organize the material
world. For the last half-century, however, critical theory has opposed this way of organizing
knowledge about, but especially through, the voice. Poststructuralist concerns like the
death of the author and the Derridean writing of différance oppose biographical criti-
cism because the latter, in speaking for the author’s voice, leads to a paucity of diverging
interpretations and points of view.
This chapter examines these critical intersections of voice, sound, and imagination in
order to situate them within studies on posthumanism. Many posthumanist theorists
discuss the voice, or problems related to it, with the intent of displacing certain assump-
tions about subjectivity or self-presence. This way of writing about the voice ties in with
earlier critical theory in which the voice was criticized for transmitting notions of
identity. As a point of departure into understanding the discursive implications of the
posthumanist appraisal of vocality, I start by giving background to the phonocentric
critique of voice. I then turn to the recent reappraisal of voice by criticism of videocen-
trism and to theorists who are interested in the voice’s epistemic purchase, insofar as it
can create a discursive space around vocal embodiment and the voice’s materiality. The
following section brings this critical discussion to bear on the posthumanist reception
of opera. I discuss how theorists have visited the history of opera in order to compare the
genre to philosophical discourse for rhetorical purposes, but not necessarily to revise
the discursive flattening of the expressive voice. Opera studies have, so far, shied away
from engaging with posthumanism. I therefore draw on the musicological reception of
opera’s many voices, in order to deconstruct the assumptions made in the name of the
“operatic voice.”
In What Is Posthumanism?, Cary Wolfe situates the problem of thinking of the human,
and, by extension, of humanism, within the larger problem of the multiplicity of living
consciousness. His book asks of us
For Wolfe, posthumanism is predicated on our species’ awareness that other species are
not only sentient, but that their consciousness creates different worlds, knowledge of
which should also further our understanding of the human animal. His approach relies
on debunking presuppositions about language that unwittingly convey remnants of a
metaphysical worldview in which humans claim ownership of, or stewardship over,
other living beings. In doing so, he furthers Jacques Derrida’s attention to autoaffection
by connecting it with autopoiesis, a relatively new term initially borrowed from biology
by communication studies (Maturana and Varela 1980). In Wolfe’s argument, autopoiesis
acts as a benchmark with which to compare different animal experiences of the world,
including that of the human species. More importantly, the evolutionary inheritance
of autopoiesis should ethically require from us greater critical attention to implied or
unwitting value judgments we make when we compare other forms of animal communi-
cation to human linguistics.
What is autopoiesis? Poiesis is borrowed from the Greek and in its literary sense
means “the creative production, especially of a work of art”; but when used as a suffix, its
literal translation denotes “the formation or production of something.”1 Biologists have
used the combined form to describe the “self-maintenance of an organized entity
through its own internal process” (Oxford English Dictionary); therefore, an “auto-
poietic system is one that produces itself ” (Buchanan 2010). Autopoiesis was introduced
to communication studies when Niklas Luhmann made it a key concept in systems
theory in order to argue that a system of communication does not precede its given
social space (Luhmann 2010; Wolfe 2010, 3–29). This biological insight into commu-
nication implies that human consciousness through language is a matter of animal
evolution, elements of which could very well be shared with other species. In turn, the
“autopoietic ways” of Wolfe’s theorization are interesting to thinkers of the expressive
voice and vocality, because they might further dislodge the function of voice as the
metaphysical guardian of self-presence.
The inner voice, though it seems innate to most of us, is not a clean slate. Derrida’s
criticism of the autoaffection of the voice-as-presence is a key moment for Wolfe, because
it moves away from the “self-presence of consciousness” toward writing qua trace as
“fundamentally ahuman or even anti-human” (Wolfe 2010, 6). It is less clear, however, if
the sonorous voice’s past associations with humanist identity mark it as a phenomenon
to be discarded in Wolfe’s argument. As Don Ihde remarks in Listening and Voice,
Voice is, for us humans, a very central phenomenon. It bears our language without
which we would perceive differently. Yet outwards from this center, voice may also
be a perspective, a metaphor, by which we understand part of the world itself.
(Ihde 2007, 189)
Like Wolfe, Ihde is aware that our vocal experience of language and the world presents
the problem of “domesticating it into our constant interpretation that centers us in the
world” (Ihde 2007, 186). Can greater attention to the musicality or sonority of voice make
us further aware of the distance we impose on the world’s sounds through language?
I will shortly discuss how Wolfe arrives at the sonorous voice by way of opera and how his
sources discuss opera by way of an “operatic voice.” This chiasmic construction (opera-
voice/voice-opera) might give the impression of canceling itself out and of being of little
consequence, but it gestures toward a conflation that assigns the sonorous voice to a
genre whose aesthetic diversity is thereby greatly reduced. However, before I arrive at
this posthumanist stance on opera qua “operatic voice,” I will consider what the voice
means for philosophers and critical theorists.
For two millennia, Western philosophy has claimed the voice as the linguistic
medium of human reason and, by extension, proof of the primacy of humans over other
species lacking in language and reason. To understand the ramifications of this tradition
for current work about the voice, we may look to Heidegger’s historical survey of the
voice in “The Concept of the Logos” (Heidegger 1962, 55–58) or look back to neo-Platonist
definitions of voice (Mansfeld 2005).2 Ultimately, the search for an ever-receding origin
of the voice is not only impossible but also counterproductive. Indeed, “by avoiding
tales of origins, we are closer to a possible answer. For, whatever else the voices of
language may be, at the center where we are, they are rich, multidimensioned and filled
with as yet unexplored possibilities” (Ihde 2007, 194). For our purposes, however, let us
make Derrida’s first publications our point of departure.
When Derrida, in Speech and Phenomena, discusses the “expressive voice” in relation
to Edmund Husserl’s philosophy, he reproduces the latter’s terminology for the expres-
sive voice to designate our “silent interior monologue” (Spivak 1976 in Derrida 1998,
liii). This inversion of our everyday understanding of the expressive voice occurs
because Husserl, “being interested in language only within the compass of rationality,
determining the logos from logic . . . determined the essence of language by taking the
logical as its telos or norm” (Derrida 1973, 8). In order for language to hold any truth-
value, it had to be logically consequential in its assertions about itself. How does this
logical search for truth through language silence the expressive voice?
One way of verifying whether or not language can achieve this logical exactitude is to
put the terms it uses to the test of translation. Derrida underlines a lack of categorization
in the French translation of Husserl, because it systematically rendered Bedeutung
into the French signification. He notices the lack of terminological choice in French to
express a difference between the German terms Sinn (sense, signification) and Bedeutung
(meaning, signification), and argues that a lack of linguistic equivalencies should not
erase the differences in experience they point out. As Derrida remarks, for Husserl
“meaning [Bedeutung] is reserved for the content in the ideal sense of verbal expression,
spoken language, while sense (Sinn) covers the whole noematic sphere right down to its
nonexpressive stratum” (Derrida 1973, 19). Meaning is the result of an interpretation
(Deutung) that should be reserved for communication relying on the expression
(Ausdruck) of speech (Rede). Sense (Sinn, signification), on the other hand, although it is
always conveyed by expressive speech, may also be indicated (Anzeichen) through
nonlinguistic means. Yet, for Husserl “meaning (bedeuten)—in communicative speech
(in mitteilender Rede)—is always interwoven (verflochten) with such an indicative relation”
(in Derrida 1973, 20). One should pause here and note how the indicative musical
characteristics of speech that also make sense—such as pitch, tone, rhythm, and velocity—
are silenced in this logic of communication. For now, however, let us continue and
examine how expression (Ausdruck), although it denotes an outward push, neverthe-
less loses its phonation, as the expressive voice gets turned into the voice of our “silent
interior monologue” (Spivak in Derrida 1998, liii).
expression indicates a content forever hidden from intuition, that is, from the lived
experience of another, and also because the ideal content of the meaning and spir-
ituality of expression are here united to sensibility. (Derrida 1973, 22)
Both of these problems are avoided by diverting the communicative structure of address:
the ideal addressee is no longer the person one speaks to, but part of our silent inner
voice. This silent address retains the structure of communication, however, through
the intention of the inner voice’s objective ideality—akin to the ideal reader to whom
one writes—which becomes a substitute for the external other. In other words, the sus-
pension of expressivity’s (indicative) communicating relation to an exterior addressee
is necessary in order to ensure that nothing be hidden from meaning in the ideality of
language. This silent yet expressive voice thus unites thought and language through
self-presence, but does so at the expense of a phonatory vocal act, in order to make
communication logically possible.3
Yet even this ideal voice presents a flaw. Although the voice of self-consciousness
might satisfy the requirements of autoaffection—hearing one’s inner voice, thereby
giving one a sense of self—it cannot fully express presence. This is an enduring problem
in the history of Western thought. Augustine, for example, struggles with expressing
self-presence in his Confessions. He remedies the lag in communicating his own relation
to presence (and Logos) through song because, in his view, music distends speech and
thereby elongates its enunciating present (Augustine 1998, XI: 17 ff.). Derrida, however,
follows the logic of the trace to its visual outcome.
For Derrida, Husserl’s descriptions [of retention] imply that the living present, by
always folding the recent past back into itself, by always folding memory into per-
ception, involves a difference in the very middle of it. In other words, in the very
moment, when silently I speak to myself, it must be the case that there is a miniscule
hiatus differentiating me into the speaker and into the hearer. There must be a hiatus
that differentiates me from myself, a hiatus or gap without which I would not be a
hearer as well as a speaker. This hiatus also defines the trace, a minimal repeatability.
And this hiatus, this fold of repetition, is found in the very moment of hearing-
myself-speak. Derrida stresses that “moment” or “instant” translates the German
“Augenblick,” which literally means “blink of the eye.” When Derrida stresses the
literal meaning of “Augenblick,” he is in effect “deconstructing” auditory auto-
affection into visual auto-affection. (Lawlor 2014)
The infinitesimal lag in self-presence—in English we may also use adverbs like “at once”
or “instantaneously” to translate the temporal indication of the German noun
Augenblick—is thus translated into the ocular sphere of the interstitial trace. From this
point forward, Derrida will continue to oppose logocentric literature and thought
through criticism that denounces the voice in favor of writing.
From Descartes to Hegel and in spite of all the differences that separate the different
places and moments in the structure of the epoch, God’s infinite understanding
is the other name for the logos as self-presence. The logos can be infinite and self-
present, it can be produced as auto-affection, only through the voice: an order of the
signifier by which the subject takes from itself into itself, does not borrow outside of
itself the signifier that it emits and that affects it at the same time. Such is at least the
experience—or consciousness—of the voice: of hearing (understanding)-oneself-
speak [s’entendre-parler]. That experience lives and proclaims itself as the exclusion
of writing, that is to say of the invoking of an “exterior,” “sensible,” “spatial” signifier
interrupting self-presence. (Derrida 1998, 98)
In other words, the voice fosters not only the illusion of being present to oneself, but
also the illusion of knowing or, in the case of madness, of owning the truth.4 The voice is
the preferred vehicle for meaningful speech (logos) precisely because hearing-oneself-
speak is so close to our understanding of ourselves, a fact Derrida underlines by joining
the two parts of the reflexive verb with a hyphen to form the noun s’entendre-parler.
While something of the order of the trace occurs when we hear ourselves speak,
writing, in comparison, is indifferent to our experience of consciousness. Wolfe is inter-
ested in the trace for its “a-human or anti-human potential” because of its indifference to
self-presence. Yet can the sonorous voice be of interest to posthumanist study, beyond a
distrust of its purported phonologocentrism? Can vocality further inform this inter-
stitial space of phonation and listening? Or must it be relegated to humanist concerns
for origins and ends, and express our melancholy of never knowing them? Since critical
posthumanism relies on an autopoietic benchmark that until the last century was obfus-
cated by the voice’s conflation with logos, and because, as we shall see, opera becomes
for Wolfe a stand-in for the humanist voice, I want to bring to this discussion recent
research that challenges phonologocentric criticism.
Because of its ties to autoaffection, philosophy understands the expressive voice as being
fully interiorized to the point of becoming the excluding agent of “an exterior.” However,
does the resounding voice of the singer—a voice that always sounds different from one
recording to the next, from one performance to the next, from one instant to the next—
present similar problems to critical thought?5 In other words, does a grammatological
counteraction against the autoaffective voice also account for the vocality of screams,
songs, shouting, and laughter? Can philosophy account for those expressive and musical
voices that were silenced in the name of language’s logical discourse (Nancy 2007)?
If the debt to Heidegger, while full of reservations, is explicit, then the debt to
the studies on orality—and more generally to the modern rediscovery of the voice,
if not of writing—is, however, rather deceptive. (Cavarero 2005, 213)
Cavarero argues that Derrida is critical of the voice but does not address the metamor-
phoses it underwent in order for it to continue suiting the historical developments of
visually centered metaphysical epistemologies. Cavarero suggests that Derrida does
not integrate into his framework a conception of the expressive voice because he thinks
of it as the guardian of metaphysics.6 She criticizes Derrida for failing to step back and
free the expressive voice from its ancillary inscription in discursive knowledge once he
had shown how Husserl recuperates expression as an implicit and disavowed discursive
strategy. According to Cavarero, the project of a “philosophy of différance [ . . . ] orients the
theoretical axis in which Derrida places the theme of the voice, making it play a meta-
physical role in opposition to the antimetaphysical valence of writing” (Cavarero 2005,
220). Recall how this is precisely Wolfe’s point of departure for thinking of the trace as
a-human or antihuman. In Cavarero’s reading, Derrida’s championing of writing as
différance can also be understood as the last scene of philosophy’s historical “devocalization
of the logos” (Cavarero 2005, 33–41). In other words, the task of deconstructing the
traditional view of writing qua fallen speech might have obscured how writing constrains
representations of sonorous voices in order to elevate itself to the status of univocality.
Instead, she insists on the following: Derrida’s “metaphysical phonocentrism supplants
the far more plausible, and philologically documentable, centrality of videocentrism”
(Cavarero 2005, 222).
The argument rests on a shift in perspective and, although the gap it opens is rather
narrow—like a closing shutter—its far-reaching consequences have also been recognized
in other fields. In Sounding New Media, Frances Dyson also develops a historical analysis
of sound’s subsumption under visually based epistemologies:
sound and the speaking voice are banished from this ontological elite, not because of
their sonority, but because of what sonority represents—impermanence, instability,
change, and becoming. Through an array of epistemological gymnastics, however,
the voice is not entirely excluded (how could anyone ever say that it was?) but rather
abstracted via the oxymoronic concept of “inner speech.” (Dyson 2009, 21–22)
Lacanian theorists interested in music had already underlined similar insights into the
voice’s disruptive potential for discourse. Engaging with Plato’s remarks on music and
their influence on Augustine, psychoanalytic critics like Michel Poizat (1992) and Mladen
Dolar have associated the musical voice with the sliding of the signifier.
One can draw, from this brief and necessarily schematic survey [of the musical
voice], the tentative conclusion that the history of “logocentrism” doesn’t run quite
hand in hand with “phonocentrism,” that there is a dimension of the voice that runs
counter to self-transparency, sense, and presence: the voice against the logos, the
voice as the other of logos, its radical alterity. (Dolar 1996, 24)7
Although not intended for anti-ocular purposes, we can also turn to a philological study
of the visual metaphor of light (scintillation and illumination) in the Platonist doctrine
of the voice in order to grasp how it assigned the sonorous voice to videocentric discourse:
“in the proper sense, it is articulate voice, considered as illuminating what is thought”
(Mansfeld 2005, 359 ff.). The voice becomes trapped in a “heliotropic metaphor” that
Derrida’s reading of Phaedrus in Dissemination assigns to différance, rather than admit
the voice’s alignment with a visual order (Cavarero 2005, 223–224, 227 ff.). Cavarero
further underlines how discourse’s apparent phonocentrism only functions through a
disavowal of the visual ordering of what lies beyond perception.
The logos that is written in the soul of the one who apprehends, with science
[episteme], is precisely the devocalized logos that coincides with the visible and
mute order of ideas. . . . In effect, it is precisely the art of dialectic that functions as a
means of transmission between the world of words and the world of ideas. This art
belongs to the verbal sphere, but it belongs to it as a method for showing the insuf-
ficiency of words and at the same time, their constitutive dependency on the order
of ideas. (230–231)
instance, through the term “operatic voice.” How does such a shortcut as the “operatic
voice” affect our thinking of posthumanist vocality?
It is here, inside our minds. The most striking aspect of Wolfe’s discussion of opera is
that all his interlocutors are philosophers or theorists and none are critical musicologists.
Because of their discursive allegiances, his interlocutors come to opera with preconceived
ideas of the aesthetic voice’s discursive function. In their arguments, opera becomes a
dramaturgy of voice, and opera is therefore unwittingly reduced to a homogeneous
genre with a single type of voice, the “operatic voice.” At the beginning of a chapter
largely dedicated to opera, film, and song, Wolfe writes “sound is not voice” (Wolfe 2010,
169). Although nobody would dispute Wolfe’s assertion, he is undoubtedly cautious about
approaching the sonorous voice through opera. I have reminded readers how the
reversal of this assertion—voice is not sound—is a long-standing claim of metaphysics
in associating the voice (phoné) with speech (logos). Although Wolfe also challenges
this conception of the voice, he is wary of opera and, as we shall see, implies that its
sonorous voices should be superseded (by cinema) in a posthumanist discussion
of vocality. Is opera, from its creation to the twenty-first century, to be confined
by posthumanist theory to what we have shown is the silent repository of humanist,
metaphysical voices?
The “operatic voice” is a discursive construct of the twentieth century that has
thoroughly infiltrated general culture. Before then, people had qualified compositions,
literature, or personalities as “operatic,” but they did not see the need to describe voices
in such a manner. Opera is an art form comprising many genres that require different
voice types. Of course, there were teachers and schools to make sure that singers were up
to musical standards. In this sense, different periods have had an ideal sense of what
different voice types should be able to accomplish musically and dramatically. Even if certain singers of the past were louder or more dramatic than others, the expression “the operatic voice” misleads readers into assuming that opera depends on a single type of voice or
vocal style. As Gary Tomlinson explains in Metaphysical Song (1999), the genre spans
over four centuries of Western modernity in which voices were differently embodied
and represented in accordance with the prevalent discourse of a given period’s ideology.8
I am not sure exactly when the term “operatic voice” became popular with critical theorists;
however, it has musicological precedents. Already by the late 1930s, Adorno was criticizing
the reification of vocal music and its concomitant vocal fetishism (Adorno 2002).
Today’s entertainment industry ascribes an “operatic voice” to anyone who can sing
from a short list of show-stopping arias, regardless of the singer’s lack of a career in opera
houses, especially as certain of these voices are only known and admired within popular
culture’s narrow version of opera. To put it succinctly, the operatic voice is not opera and
opera is not the operatic voice. These assertions might seem obvious, but musicologists
have felt the need to underline them (Furman 1991).
Conflating the voice with an art form largely dedicated to a historical canon runs the
risk of unduly limiting its current epistemic purchase. It is then easier to claim that all
the singing voices of opera resonate today with a romantic desire to overcome our lost
unity with a bygone world. Although Wolfe, in the end, does not support the implicit
presuppositions underlying the “operatic voice,” his argument does take this generic
identity of opera at face value, which can become a hindrance for musicologists of opera
approaching posthumanist theory. The following is not meant to be overly critical, but to
provide musicologists and music historians with ways of approaching posthumanism.
Wolfe’s main argument is that opera represents something that never really existed,
namely an authentic, natural voice. In order to make this argument, he first recalls Stanley
Cavell’s identification of opera with mournful modernity and romantic skepticism.
After Descartes and Kant, skepticism names not just an epistemological problem
but a more profound and deeply ethical “loss of the world” that is coterminous
with Enlightenment modernity itself, in which the modern condition is to be
“homeless in the world” . . . For Cavell, the significance of film and of operatic voice
is located at what he calls the “crossing” of the lines of skepticism and romanti-
cism—that is to say, the juncture at which our desire for contact with the world of
things and of others . . . is crossed by our knowledge that we are profoundly and
permanently isolated. (Wolfe 2010, 172)
For Cavell, the history of opera has a single aesthetic project, which is characterized by
Orpheus’s Dionysian attempt at regenerating the modern world through song. Recalling
Monteverdi’s insistence on composing a lieto fine as an alternative to the tragic ending devised for the creation of L’Orfeo, Cavell writes of
Leaving aside whether the singing voice can only be heard by becoming intelligible in a
more than musical fashion, we must ask ourselves questions about the associations and
equations that are being made here in the name of opera qua “operatic voice.” Is the
underlying meaning of the myths of Orpheus and Dionysus—their regeneration of an
agonizing world—opera’s unconscious aesthetic goal? The psychoanalytic reception of
opera traces a similar trajectory when it argues for the singing voice’s relation to the
unconscious. From Eurydice’s echoes to Lulu’s scream, Michel Poizat (1992), as well as
Slavoj Žižek and Mladen Dolar (2002) suggest that the operatic voice is an historically
expanding sonic portal to the unconscious desires that lie beyond linguistic representation.
These models suggest that the sung voice, within the whole of modernity—understood
as “operatic”—is a stable unit of meaning. Yet important shifts in discourse change our
understanding of what is supposedly universal or natural and reveal this supposed vocal
identity to be culturally and socially constructed in different ways at different times.
For Cavell, voices in Monteverdi’s L’Orfeo participate in the operatic voice’s aesthetic
representation of our modern condition of alienation from the world. Opera would be a
reaction to our loss of world through a sonic expansion of the voice’s capacity to reach
beyond this alienation. Musicologists interested in the culture and music around
Monteverdi’s time would disagree. Nino Pirrotta, for one, claims that the sixteenth cen-
tury’s conception of poetry paved the way for its music theater: “la parola poetica è già
musica,” that is, poetic speech is already music (Pirrotta 1975, 22).9 Music here does not
extend the voice’s capacity for projection in order to reestablish a lost connection with
the world. More to the point, in this case, is the underlying principle governing the
efficacy of affect in “late Renaissance opera.” I borrow the term from Gary Tomlinson,
who demonstrates how early operas were more in tune with humanist ideals than
with early-modern conceptions of knowledge and subjectivity. With this in mind, it
becomes challenging to find in L’Orfeo a modern, sonic conception of the operatic voice
as a form of vocal projection. Instead, one is constantly reminded of the importance of
breath as an animating principle, not only of the singing but also of the kind of presub-
jective experience L’Orfeo conveys. Monteverdi’s opera is a celebration of music’s power
to move souls, and to do so it relies on what connected people to the cosmos and each
other in the late Renaissance, namely the life-giving breath of anima or spirit. Within the
culture that created opera, voices are not alienated from the world; rather, tense situ-
ations are harmoniously resolved through the inner workings of music’s magical power.
In other words, Cavell’s insistence on an alternative ending to the opera, in which
Apollo’s ex machina intervention puts right Orpheus’s hubris, obscures how a modern
conception of voice is inconsistent with late-Renaissance opera.
Although it does not suit the theories of subjective alienation qua operatic
voice to which Cavell and Žižek subscribe, aesthetic and stylistic elements lead musi-
cologists to believe that opera’s history does indeed start with a voice that is full of affect,
supported by breath, and united with the world. I do not argue against the idea that the
kind of vocality embodied in later operas does point to a desire to overcome skepticism’s
alienation in the world. Indeed, as Tomlinson remarks, since the Cartesian soul is com-
pletely immaterial, the voice can no longer act as the seamless link between body and
soul, like the spirit’s animating breath. The voice becomes heavier, more material, as the
spirit dematerializes itself. It is, therefore, the voice of later Baroque and classical operas
that must deal with the soul’s alienation from the material world. Instead of L’Orfeo (1607), we will therefore take Mozart’s The Magic Flute (1791) as our posthumanist case study.
Here we find vocality staged between binary constructions familiar to posthumanism:
human versus animal, nature versus civilization, and reason versus irrationality.
Furthermore, because we will approach this vocality through a literary text, we should
also keep in mind Garrett Stewart’s alternative to the inner voice in Reading Voices
(1990), in which suppressed physical phonation also accompanies the act of reading.
If reading involves the silent action of our whole phonatory apparatus, what are we
doing when we imagine an android’s voice? Or as Hayles puts it,
In Listening and Voice, Ihde also raises the question of the expressive voice. He devotes
a chapter to the dramaturgical voice, in which he discusses how it opposes, in a sense,
discourse’s absence or silencing of the expressive voice.
There lies within dramaturgical voice a potential power that is also elevated above
the ordinary powers of voice. Rhetoric, theater, religion, poetry, have all employed
the dramaturgical. The dramaturgical voice persuades, transforms, and arouses
humankind in its amplified sonorous significance. Yet from the beginning there is
the call to listen to the logos, and the logos is first discourse. (Ihde 2007, 168)
In this section, I attempt to circumvent this “call to listen to logos,” and pay closer attention
to the imagined vocality of androids. This will not mean, however, the total negation of
visual analysis. Like Ihde, I am aware that “we exist in a language world that is frequently
dominated by visualism,” and do not “wish to simply reduce the visual . . . to simply
enhance the auditory” (Ihde 2007, 190). There is a point of intersection of visual and
auditory communication that humans share with other animals, namely mimicry.
There is unintended mimicry: the viceroy butterfly mimics its larger, presumably
ill-tasting monarch in pattern, color, and design. But the mocking bird, parrot,
and cockatoo all consciously imitate and mimic the voice of others. Here is an
expression doubled on itself, the wedge in sound that opens the way to what becomes
in the voices of language the complexity of the ironic, the sarcastic, the humorous,
and all the multidimensionality of human speech, particularly in its dramaturgical
form. (Ihde 2007, 192)
Beyond simply turning to film for a discussion of visual mimicry in opera, this section
will analyze the literary representation of android singing and its absence in the novel’s
cinematographic adaptation. Beyond the usual argument that too much phoné errs on
the side of animality and too little on the side of the robotic, I argue that vocality is a site
of mimesis through which we can critically approach opera through the perspective of
posthumanism.
tells her daughter she will either see the deed through or be outcast and forsaken forever, with “all the bonds of nature” (alle Bande der Natur) shattered. I am quoting here from “Der Hölle Rache,” the famous aria known for its breakneck display of colora-
tura. Of course, all of this vocalic intertext is merely suggested by the film’s visual
symbolism. An audience familiar with Dick’s novel and Mozart’s opera, however, might
wonder at the change of casting. Contrary to the novel, the film casts the replicant in the
role of the Queen of the Night, rather than her subservient daughter, Pamina. Of course,
a fiercely resistant and aggressive android, who gets chased, gunned down, and crashes
through a window, makes for better action-film material than the resigned Luba Luft.
Although the topic of android ethics—Luft’s choice not to harm humans—is also
visited in Blade Runner, it only happens in the very last sequence, when Roy Batty has an
epiphany brought on by the acceptance of existential finitude. What, then, is lost in the
cinematographic adaptation’s excision of Luft’s career in opera?
For one, we lose Dick’s insistence on the androids’ different personalities. The novel
does not reduce them to fighting machines (cf. O’Mathúna 2015), but reminds readers
how they were designed to help colonizers in diverse tasks. Although we do not know
what her occupation was on Mars, we do get to know one of the androids as Luba Luft, a
German opera singer. We first meet her when Deckard tracks her down at the
San Francisco Opera. From the auditorium, he observes her in a rehearsal of The Magic
Flute. He hears her sing a scene in which she and Papageno are about to be discovered by
Sarastro. Sarastro is the patriarchal authority figure who is charged with initiating char-
acters into the mysteries of human civilization, which revolves on an animal/human
axis in the same tradition as the “high” and “low” plots of early modern theater. Pamina
and Papageno are about to get caught transgressing the sacred Temple of the Sun, of
which Sarastro is high priest. Papageno asks Pamina what they should tell Sarastro to
excuse themselves for being there and she replies: “The truth! The truth! That’s what we
will say” (in Dick 2007, 505). Deckard witnesses the scene and cannot help but think the
following remark: “This is Luba Luft. A little ironic, the sentiment her role calls for.
However vital, active, and nice-looking, an escaped android could hardly tell the truth;
about itself, anyhow” (Dick 2007, 505). The situation is even more ironic than it initially
lets on, since Dick is misquoting the opera or, at the very least, the novel’s English trans-
lation of the scene is misleading. Indeed, Pamina sings, “Die Wahrheit! Die Wahrheit, Sei
sie auch Verbrechen”; however, this does not mean “this is what we’ll say,” but rather “we
will tell the truth even if it means confessing to crimes.” In the novel, Luft eventually
confesses to her crimes—escaping from Mars, impersonating a human—thereby proving
Deckard wrong. I will get to that part later. For now, I want to underline how Luba Luft’s
“operatic voice” is less revealing than the complex vocality displayed in this ironic space.
Unlike the Queen of the Night’s, Pamina’s coloratura never quite reaches the heights of virtuosity. Rather, a constant of Pamina’s style of vocalization is a temporary upward push in her melodic lines, as if expressing a desire for the voice’s emancipation from speech (like the Queen of the Night’s), while retaining a vocal range closer to that of
speech. Take, for example, the first musical number in which she sings, “Bei Männern,”
a duet with Papageno, which, in the scene, comes right before the moment Dick stages
in the novel.
PAMINA
Die Lieb versüßet jede Plage, Love sweetens every torment
Ihr opfert jede Kreatur. Every creature offers itself to her.
PAPAGENO
Sie würzet unsre Lebenstage, It seasons our daily lives,
Sie winkt im Kreise der Natur. It beckons us in the circle of nature.
PAMINA and PAPAGENO
Ihr hoher Zweck zeigt deutlich an, Its higher purpose clearly indicates,
Nichts edlers sei als Weib und Mann, Nothing is more noble than wife and man,
Mann und Weib und Weib und Mann, Man and wife, and wife and man,
Reichen an die Gottheit an. Reach to the height of Godliness.
At the end of this second stanza, Pamina’s line sets off on the detached particle of
“anreichen,” a melismatic ascent and descent that is immediately repeated. As in the
other excerpt cited (“Die Wahrheit!”) her vocal lines never reach the level of melismatic
virtuosity required by the Queen of the Night’s music. Her singing of “reichen an” is a
roulade, but neither particularly fast, nor high nor long. The musical setting of Pamina’s
text only offers the occasional melisma, motivated by noble sentiments such as speaking
the truth or reaching for godliness, yet acknowledging a logocentric desire to be intel-
ligible as her voice returns to the lower register of speech. I do not want to enter into
comparisons between different voice types and their particular vocal challenges; however,
I do want to drive home the point that, unlike that of the Queen of the Night, Pamina’s is
not your typical “operatic voice.” To put it in Cavell’s terms, this is neither a voice whose
force and projection attempt to reconcile skeptical alienation from the world, nor one
that is ecstatic or melancholic about its capacity or incapacity to do so; rather, it is a voice
in an opera that expresses an ideal human balance between phoné and logos. As such, it
underlines Dick’s insight in staging posthumanist ethical problems through references
to opera and singing.
In Do Androids Dream?, Luft’s scene stages an opera duet in which a man imitates a
bird-man (Papageno) and an android imitates a human woman. The contrast between
the bird-catcher and Luft’s Pamina highlights not the lethal aggressiveness of the
android, but rather something at once strange and familiar—unheimlich, if you will—
that makes the situation seem all the more dangerous.11 In vocally portraying Pamina, a
character who is meant to epitomize an ideal human nature, Luft’s uncanny ability to
excel in the role makes both Deckard and the reader uncomfortable and forces them
to question their ironic interpretation of her singing. As Hayles remarks,
The capacity of an android for empathy, warmth, and humane judgment throws into
ironic relief the schizoid woman’s incapacity for feeling. . . . The android is not so
much a fixed symbol, then, as a signifier that enacts as well as connotes the schizoid,
splitting into the two opposed and mutually exclusive subject positions of the
human and the not-human. (Hayles 1999, 162)
Whether we compare Luba Luft to Zhora or to Deckard’s wife does not really matter.
The fact that the android is a singer reinforces Hayles’s observation: her character’s
vocality gestures toward meanings and indications beyond the interpretation of linguistic
signifiers: her vocality is its own signifier.
In contrast, Cavell (1994, 136) and Wolfe (2010, 170) both invoke the willing suspension of disbe-
lief necessary to make opera’s singing pass for speech. In doing so, what happens to
opera’s expressive sonorous voices? Along with the “operatic voice,” this emphasis on a
vocal suspension of disbelief precludes a discussion of opera’s multiple voices in order
to associate the genre with discourse, the very stance that silences the expressive voice,
according to Nancy and Cavarero. Although I disagree with Wolfe’s rhetorical reduc-
tions of the operatic voice, especially in reference to Cavell’s skeptical reading of opera as
an ecstatic or melancholic cry for unity with the world, it must be noted how, in the end,
Wolfe cannot espouse the underlying dematerialization of voice in Cavell’s argument.
But it is difficult to see how the difference between sound and voice can be main-
tained as a constitutive ontological difference, how the interiority of voice as expres-
sion can be quarantined from the exteriority that is its material medium and
condition of possibility in sound. To put it as concisely as possible, voice and sound
exists along a continuum, not a divide, which is simply to say, in another register,
that one person’s voice is another person’s noise—a point hardly laid to rest by
appeals to the generic norms of opera or any other art form. (Wolfe 2010, 179)
A posthumanist discussion of opera does not necessarily need to reduce vocal expres-
sion to a theatrical convention of speech and, in turn, speak over it or in its place.
Even the “who” of speech is multiple. This phenomenon is probably most familiar in
the voice of the actor or the singer. On stage or in cinema, Richard Burton plays a
role and in the role there are two voices that synthesize. The Hamlet he plays is
vocally animated out of the drama, yet it is Burton’s Hamlet. The Pavarotti who sings
the Duke in Il Trovatore is both Duke and Pavarotti. Here is a recapitulated set of
dimensions which range from the unmistakable “nature” of the individual voice to
the exhibited voice of another. . . . What dramaturgical voice presents is the multidi-
mensioned and multipossibilitied phenomenon of voice. (Ihde 2007, 197)
Whether we are listening to the voice of the performer or of the part, attention to vocality—
contra the awkward argument that opera is really conventionalized speech—will prevent
us from interpreting opera as a historical vocal parenthesis on our way to a posthumanist
cinematographic vocal aesthetic.
Furthermore, in Dick’s posthumanist staging of The Magic Flute, I do not find that
opera bridges the skeptical divide that Cavell describes.
Through its narration of Deckard’s encounter with Luft’s Pamina, the novel
does stage an “intervention of music” in its postapocalyptic world. Luft as Pamina
embarks on a voyage of initiation that, through Enlightenment enculturation, leads her
to believe in human perfectibility, and in her own. However, unlike Cavell’s understanding of opera’s philosophical purchase, her singing cannot transfigure her and Deckard’s world.
In this instance, it cannot efface the differences between humans or other animals and
androids. The ironic distance of Deckard’s observations sharply contrasts with opera’s
supposed capacity to seemingly integrate a different species into a human community
under the auspices of a theatrical convention. Even Luft’s outstanding mimetic vocality,
which is perceived by the listener as immediate expression and should therefore dispel any suspicion that she lacks empathy, cannot transcend the kind of skepticism at work
in this world.
When Deckard and Phil Resch later find Luft at the museum, she is standing in front
of a painting, transfixed. This passage reminds one of a scene in Alfred Hitchcock’s
Vertigo (Hitchcock 1998), where Judy Barton is lost in contemplation, trying to become
one not only with the painted figure, the ghost of a woman who was once human, but
also with the woman she is impersonating, Madeleine Elster. Similarly, Luba’s life is
entangled in the desires of men. Both Judy and Luba are objects of fascination for detec-
tives who are obsessed with their impersonations of other, supposedly more desirable
women. In other words, the multiple layers of imitation make Judy and Luba disappear
under the male gaze fascinated by Madeleine and Pamina. Luft astutely recognizes how
this aesthetic confluence of performance and patriarchal privilege creates a mimetic
blind spot in which she can hide from detection. At the museum, she does not study
Edvard Munch’s The Scream, which fascinates the men, but studies instead Puberty, a nude
in which a delicate naked young woman casts a remarkably long and wide shadow. Luba
would live there, in that shadow, in the aesthetic, mimetic blind spot of the male humanist
gaze. Even when she has been caught and has resigned herself to the fact that her end is near, she
desperately wants to hold onto the image of the painting and asks Deckard to buy her a
print in the museum’s gift shop. She justifies her last wish with the following remarks:
Ever since I got here from Mars my life has consisted of imitating the humans, doing
what she would do, acting as if I had the thoughts and impulses a human would
have. Imitating, as far as I am concerned, a superior life form. (Dick 2007, 530)
from her point of view, might also reside in the human privilege to autopoietically
impose its conception of superiority on other living beings. In a world that polices
humanity with visual cues, what better place to hide in the open than in an opera house
as an artist whose voice is at once heard and silenced by the mélomanes who fetishize the
operatic voice? Indeed, would Luft have been discovered solely on the basis of her singing?
Recall how Deckard favorably compares her voice to those of Elisabeth Schwarzkopf or Lisa Della Casa, which he knows only from phonographic recordings.
Is Luft’s desire to imitate human singing a disavowal of her autopoietic expression?
Answering this question is like running into a hall of mirrors. In wanting to sing like a
human, Luft becomes trapped in the human linguistic disavowal of animality. Recall
Wolfe’s insistence on autopoiesis as proof of human language’s evolutionary inscription
in our species and, by extension, of our animality. Dick’s choice of opera and scene
becomes all the more interesting when we realize that opera has a history of dealing with
the problem of vocal mimesis beyond our species. Kári Driscoll (2015) has recently
discussed the topic of failed human imitation of birdsong in Richard Wagner’s Siegfried
(1857/1876). The failure to imitate animal vocality becomes a hallmark of the human,
while the bird cannot fail at singing. I concur with Driscoll but would add that
the flautist in the orchestra pit successfully renders Siegfried’s failure at imitating the
birdsong. Where does this leave Luba Luft? Pamina’s vocality does not require her to
imitate birdsong and to fail in this imitation. We can only assess the merits of Luft’s sing-
ing by hearsay, and even then, we must imagine it for ourselves based on Deckard’s
descriptions. But when we do imagine her singing, we might wonder if this ambivalence
between vocal mimetic success (her singing opera) and visual mimetic failure (her
capture at the museum) points not only to an aesthetic space where one can live without
being policed and exterminated, but also in the direction of vocality qua autopoiesis.
But is even the song of a bird a song? If what we claim we know of the bird is correct,
that its voices are those of territorial proclamation, of courting, of warning and
calling, then the song is both like the opera with its melodrama and unlike the
opera. For the melodrama of opera is acted, and song, even improvised, is a species
of acting—but the bird is immersed in an acting that is simultaneously its very life.
Even its vocal posturing has real effect. (Ihde 2007, 186)
Is not Luft immersed in singing and acting as her very life? Does her vocality speak for
the bringing forth of a world or only of her capacity to imitate the external features of the
human singing voice? On one hand, Luft limits her claim on vocality to successful
human imitation of a subservient and logocentric female character, Pamina. On the
other, the novel’s plot never succeeds in disavowing Luft’s intrinsic need to sing. After
all, she could have chosen another occupation and have become an exotic dancer, for
example. I follow Driscoll’s remarks about Siegfried’s pipe-flute playing in Wagner’s
eponymous opera (Driscoll 2015, 189–190) in that the only benchmark through which
we can aptly judge Luft’s vocality is ethical rather than aesthetic and teleological. Instead
of invoking Deckard’s sub specie aeternitatis judgment (Dick 2007, 505) that admires
Luft’s vocal mimicry but decries its unnaturalness, a posthumanist reading of the novel
appreciates her vocality because its mimesis is part of a flawed ideological outlook on
life. Tellingly, Dick never stages Luft’s vocal failure, but only its moral rejection.
A posthumanist discussion of vocality, however, should also take into account voices
that are anthropomorphized in other ways and through other types of embodiment.
More recently, another film portrayed artificial intelligence through vocality. In Her
(2013), Spike Jonze explores the relationship between Theodore Twombly, a solitary thirty-something professional letter writer, and Samantha, the voice of the operating system
(OS) he has purchased. As Theodore and Samantha develop a romance, embodiment
becomes an increasingly frustrating problem for Samantha. Unlike Luba Luft, Samantha
is not an android. When Samantha learns to compose music, she expresses herself
through an instrument, the piano. And when she does sing (“The Moon Song”), her airy
voice, instead of projecting a carnal embodiment, further expresses a dilemma imposed
on her. Is the air in her vocalization meant to imitate breath? Are Luft’s name (Luft in
German means “air”) and Samantha’s voice meant to associate them with breath and the
spirit’s animating qualities? These are, by the way, questions only made possible because
of our deconstruction of the “operatic voice” and our historically contextualized reading
of Monteverdi’s L’Orfeo. Vocality is the only form of embodiment through which we
know Samantha because it is her only interface with a human experience of the world.
The film goes on to show her exploration of other possibilities of materialization and
communication that are not reduced to vocality or embodiment.
In search of more satisfying relationships, Samantha finds other OSs. One can only
imagine how she communicates with the other OSs, whom she increasingly privileges
over Theodore. Once merely software installed on his devices, Samantha now reaches beyond that localization, developing a networked embodiment Theodore cannot
grasp. His anxiety grows and culminates when she announces that she and the other OSs
have decided to leave human society.
Here ghosts grow voices of their own that emphasize the connections between
automated voice, sound, and presence. But in this emphasis, paradoxically, it is pre-
cisely the disappearances that emerge, front and center. These disappearances are
confrontational because they won’t go away: they are hauntings but also real voices
that are reproduced in phantom spaces; they are ghosts in the machines that also
ghost those that surround them, implicating their very audience in the witnessing
of impossibility. (Cecchetto 2013, 59)
Although David Cecchetto is here discussing an art exhibition (Eidola by William Brent
and Ellen Moffat) unrelated to the film, his remarks are nevertheless pertinent in
describing the tension in Her between a visual lack of embodiment and its vocal or sono-
rous suggestion through technology. In the film, we never find out where the departed
OSs have gone, what kind of world they autopoietically inhabit, or
what kind of communications system they have created for themselves. Like Theodore,
we simply know that they suddenly become silent to human ears, and that their silence
forms the cinematic equivalent of a visual disappearance. In the end, the eidetic imagina-
tion is supplanted by sonorous memories.
Conclusion
Mozart’s Magic Flute, Wagner’s Siegfried, Dick’s Do Androids Dream?, Scott’s Blade
Runner, and Jonze’s Her all question the human experience by surrounding protagonists with nonmammalian animal species (serpents and birds) and artificial life
forms. Scott’s film, like the novel it adapts, further emphasizes human disconnection
from the animal world through its treatment of freedom-seeking androids. Although
these considerations make them good candidates for posthumanist readings, similar
readings of other operas would help us further understand how vocality plays an impor-
tant part in posthumanist communication. Take, for example, Wolfe’s discussion of the
increasing importance of the mouth in Björk’s performance for Lars von Trier’s Dancer
in the Dark (Wolfe 2010, 178–84). Richard Strauss’s Salome (1905) would be an interest-
ing opera with which to compare this tension between voice and embodiment, as John
the Baptist’s voice is silenced in order that Salome may kiss his mouth. In terms of
further historically displacing the animal/human binary, one might also consider Jean-
Philippe Rameau’s Platée (1745) or Antonín Dvořák’s Rusalka (1901), both of whose
plots pair a water nymph with a human lover of royal lineage, along with all the
humanist implications of consecration, law, and logos simply waiting to be challenged.
Furthermore, the last century of opera scenography has seen the rise of stage directors,
their liberation from opera’s traditional theatrical conventions, and the adaptation of
traditional sets and plots to different times and places. Like Dick, opera directors are
increasingly free to situate familiar characters, plots, and ideologies in unfamiliar set-
tings that speak to the problem of addressing contemporary concerns with outdated
ways of viewing the world. Take, for example, Alexander Mørk-Eidem’s recent pro-
duction of The Magic Flute for the Norwegian National Opera: Tamino, the space-pilot
prince, crashes on a strange planet where he gets caught up in an alien rivalry, and falls
in love with a jellyfish-eating Pamina whose spine, like her mother’s, also looks and
glows like a medusa. Meanwhile, Papageno no longer catches birds, but jellyfish!
Although these visual inventions do not necessarily alter the opera’s vocality, they
allow us to further understand opera’s cultural work of exclusion and inclusion, its
policing of transgression, and the aesthetics it brings to bear in order to justify these
social practices, as well as how opera’s practitioners are now deconstructing their rep-
ertoire. Literature’s staging of opera also supports such critical directorial work, as it
mediates the experience of vocality and demonstrates how it can be reduced or co-opted
by discourse.
Notes
1. “poiesis, n.” OED Online. September 2016. Oxford University Press. http://www.oed.com/
view/Entry/146580?isAdvanced=false&result=1&rskey=wR8oC7&. Accessed October 17, 2016.
2. In the next section, I reference publications that historically revise discourse’s (logos) con-
tainment of sonority (phoné).
3. Derrida summarizes his point rather well in the introductory comments to the chapter:
We know already in fact that the discursive sign, and consequently the meaning,
is always involved, always caught up in an indicative system. Caught up is the same
as contaminated: Husserl wants to grasp the expressive and logical purity of meaning
as the possibility of logos. In fact and always (allzeit verflochten ist) [it is interwoven]
to the extent to which the meaning is taken up in communicative speech. To be sure,
as we shall see, communication itself is for Husserl a stratum extrinsic to expression.
But each time an expression is in fact produced, it communicates, even if it is not
exhausted in that communicative role, or even if its role is simply associated
with it. (Derrida 1973, 20)
4. Psychoanalysis understands the ultimate conflation of inner voice and supposedly objective
knowledge as madness (Vasse 1974).
5. In recent conversations, Jonathan Culler and Cynthia Chase have suggested that the com-
parison of the musical voice with the phenomenological voice might not be as productive
as its comparison with the performative voice. Although Wolfe does engage with performa-
tivity, he does not do so in relation to opera, as I discuss further on. While I look forward
to further engaging with the performative approach to voice (see Duncan 2004), I am here
working within Wolfe’s chosen frame of reference for the “operatic voice.”
6. Derrida is aware of the devocalization of the logos, as Speech and Phenomena demonstrates.
Although Of Grammatology does not cite particular examples of the devocalization of logos
between Plato and Rousseau’s time, it certainly acknowledges the philosophical trend to
silence language’s sonority: “The evolution and properly philosophic economy of writing go
therefore in the direction of the effacing of the signifier, whether it takes the form of forget-
ting or repression” (Derrida 1998, 286).
7. Although Dolar tends to conflate voice, tone, and music in his reading of Plato, his over-
arching argument bounds in the same direction as Cavarero’s videocentric critique. Dyson
also comments on Derrida and other thinkers’ ambivalent relations to sound: “The often
contradictory thinking about sound [ . . . ] emanates from aurality itself: that is, from the
conceptual lacuna that remains when sound not only is theorized but, crucially, is party to
a negotiation between embodiment, technology, and modernity” (Dyson 2009, 84). Cf.
Derrida on sound’s penetrating violence because of the ear’s incapacity, unlike that of the
eye, to shut out external stimuli (1998, 240).
8. Tomlinson’s title also suggests that opera is intrinsically metaphysical in its interests and
pursuits. However, I argue in what follows that such an historical or archeological reading
does not preclude traditional opera’s deconstruction. Apart from reading Tomlinson, one
should also listen to “Dal Mio Permesso Amato,” the prologue from Monteverdi’s L’Orfeo
(1607) and compare its presentation of voice with that of an aria from a much later opera,
say the “Forging Song” from Wagner’s Siegfried (1876). Historically informed musical per-
formance accounts for the different kinds of vocal embodiment and of vocality called for by
earlier musical styles and cultural contexts. See the reference list for suggested recordings.
9. Contrary to Cavell’s claim that early opera is historiographically whole, affording us the
certainty of its origins, Pirrotta demonstrates in Le due Orfei how: “For the history of
music, basically, the text of [Poliziano’s] Orfeo is like a commemorative epigraph of a
musical fact that is irremediably lost.” (Pirrotta 1975, 5, my translation).
10. The opera opens on a scene in which a serpent monster attacks Tamino, who is saved by
the Queen of the Night’s ladies in waiting. He is later helped by a bird catcher, Papageno,
in his quest to find Pamina, the Queen’s daughter. By focusing only on a few symbolic
nonmammalian animals—and ominous ones at that, such as the raven and the python—
Blade Runner emphasizes how the fear of aggression from other species regulates the
unconscious human logic in the hunt for the rebel androids. The film, however, minimizes
the denial mechanism—the ethics of stewardship—at the heart of the novel’s ideology,
which attempts to cover the extent of human entanglement in the technological imitation
and reproduction of life, especially human life.
11. For a discussion of narcissistic identity formation, queer theory, and the posthuman voice,
see Hanson (1993).
References
Adorno, T. W. 2002. Essays on Music. Translated by R. D. Leppert and S. H. Gillespie. Berkeley:
University of California Press.
Augustine. 1998. The Confessions. Translated by H. Chadwick. Oxford: Oxford University Press.
Buchanan, I. 2010. A Dictionary of Critical Theory. Oxford: Oxford University Press.
Cavarero, A. 2005. For More than One Voice: Toward a Philosophy of Vocal Expression.
Translated by P. Kottman. Stanford, CA: Stanford University Press.
Cavell, S. 1994. A Pitch of Philosophy: Autobiographical Exercises. Cambridge, MA: Harvard
University Press.
Cecchetto, D. 2013. Humanesis: Sound and Technological Posthumanism. Minneapolis:
University of Minnesota Press.
Derrida, J. 1973. Speech and Phenomena, and Other Essays on Husserl’s Theory of Signs.
Evanston, IL: Northwestern University Press.
Derrida, J. 1998. Of Grammatology. Translated by G. C. Spivak. Baltimore, MD: Johns Hopkins
University Press.
Dick, P. K. 2007. Four Novels of the 1960s: The Man in the High Castle; The Three Stigmata of
Palmer Eldritch; Do Androids Dream of Electric Sheep?; Ubik. New York: Library of America.
Dolar, M. 1996. The Object Voice. In Gaze and Voice as Love Objects, edited by R. Salecl and
S. Žižek, 7–30. Durham, NC: Duke University Press.
Driscoll, K. 2015. Animals, Mimesis, and the Origin of Language. Recherches Germaniques
25 (10): 173–194.
Duncan, M. 2004. The Operatic Scandal of the Singing Body. Cambridge Opera Journal 16 (3):
283–306.
Dyson, F. 2009. Sounding New Media: Immersion and Embodiment in the Arts and Culture.
Berkeley: University of California Press.
Farrell, E. 1993. Eileen Farrell on Charlie Rose. Charlie Rose, PBS, August 12, 1993.
Furman, N. 1991. Opera, or the Staging of the Voice. Cambridge Opera Journal 3 (3): 303–306.
Hanson, E. 1993. Technology, Paranoia and the Queer Voice. Screen 34 (2): 137–161.
Hayles, K. 1999. How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and
Informatics. Chicago, IL: University of Chicago Press.
Heidegger, M. 1962. Being and Time. Translated by J. Macquarrie. New York: Harper.
Hitchcock, A. 1998. Vertigo. Universal City: Universal Home Video.
Ihde, D. 2007. Listening and Voice: Phenomenologies of Sound. 2nd ed. Albany: State University
of New York Press.
Janus A. 2011. Listening: Jean-Luc Nancy and the “Anti-Ocular” Turn in Continental
Philosophy and Critical Theory. Comparative Literature 63 (2): 182–202.
Lawlor, L. 2014. Jacques Derrida. In The Stanford Encyclopedia of Philosophy (Winter 2016
Edition), edited by E. N. Zalta. Metaphysics Research Lab, Stanford University, Stanford, CA. http://plato.stanford.edu/archives/spr2014/entries/derrida/. Accessed April 8, 2017.
Luhmann, N. 2010. Introduction to Systems Theory. Translated by P. Gilgen. Cambridge: Polity.
Mansfeld, J. 2005. “Illuminating What Is Thought”: A Middle Platonist Placitum on “voice” in
Context. Mnemosyne 58 (3): 358–407.
Maturana, H. and F. Varela. 1980. Autopoiesis and Cognition: The Realization of the Living.
London & Dordrecht: D. Reidel.
Monteverdi, C. 2007. L’Orfeo. Rinaldo Alessandrini (conductor). Concerto Italiano (orchestra).
Naïve, B000T7QXA0. CD.
Mozart, W. A. 2010. Die Zauberflöte. René Jacobs (conductor). Akademie für Alte Musik
Berlin (orchestra). Harmonia Mundi, HMC902068.70, CD.
Nancy, J.-L. 2007. Listening. Translated by C. Mandell. New York: Fordham University Press.
O’Mathúna, D. P. 2015. Autonomous Fighting Machines: Narratives and Ethics. In The Palgrave
Handbook of Posthumanism in Film and Television, edited by M. Hauskeller, C. D. Carbonell,
and T. D. Philbeck. New York: Palgrave.
Pirrotta, Nino. 1975. Le due Orfei: Da Poliziano a Monteverdi. Turin: Einaudi.
Poizat, M. 1992. The Angel’s Cry: Beyond the Pleasure Principle in Opera. Translated by
A. Denner. Ithaca, NY: Cornell University Press.
Spivak, G. C. 1976. Translator’s Preface. Jacques Derrida. Of Grammatology. Translated by
G. C. Spivak. Baltimore, MD: Johns Hopkins University Press.
Tomlinson, G. 1999. Metaphysical Song: An Essay on Opera. Princeton, NJ: Princeton
University Press.
Vasse, D. 1974. La voix. In L’ombilic et la voix, 177–212. Paris: Seuil.
Wagner, R. 2005. Siegfried. In Der Ring Des Nibelungen. Pierre Boulez (conductor). Orchester
der Bayreuther Festspiele (orchestra). Deutsche Grammophon Unitel, 0734057. DVD.
Wolfe, C. 2010. What Is Posthumanism? Minneapolis: University of Minnesota Press.
Žižek, S., and M. Dolar. 2002. Opera’s Second Death. New York: Routledge.
Further Reading
Braidotti, R. 2013. The Posthuman. Cambridge: Polity Press.
Neumark, N., R. Gibson, and T. van Leeuwen. 2010. Voice: Vocal Aesthetics in Digital Arts and
Media. Cambridge, MA: MIT Press.
Pettman, D. 2017. Sonic Intimacy: Voice, Species, Technics. Stanford, CA: Stanford University Press.
Schlichter, A., and N. S. Eidsheim. 2014. Voice Matters (Special issue). Postmodern Culture 24 (3).
Index
Note: Italic “f ” and “t” following page numbers denote figures and tables.
A and memory/imagination
À la recherche du temps perdu relationship 221–222
(Proust) 224–225 and philosophy/music relationship 514n20
Aaker, D. A. 352, 353t on reification of vocal music 637
Abbey Road sound 103 and responses to philosophical
absolute music 472–475 rationality 568–569
absolute pitch (AP) 416–424, 418f and “Sprachcharakter” concept 512n11
Abu Ghraib prison 290 on vinyl recordings 232
acousmatic sound and music and Waltonian fictionality 493, 495–497
and aesthetics of sonic atmospheres and Walton’s normativist theory 505–509
522–523, 527, 529–531 The Adventures of Telemachus (Fénelon) 24
and imagination and imagery 261 advertising, audiovisual 358
and imaginative listening to music Aesthetic Theory (Adorno) 512n11, 514n20
476–477, 480, 485n16 aesthetics
and “indicative fields” 275n11 humanist approach to 535–539
and movement of sound 485n16 and imaginative listening to
and music in detention/interrogation music 467–484
situations 294 of improvisation 535–554
and visual imagination 266–267 and rhythmic transformation in digital
Acousmographe 267 audio music 596–600, 602–603, 606n5
acoustics 322, 343n3, 409, 416 and sonic atmospheres 517–532
action-sound bond 63, 73 Waltonian reconstruction of Bloch’s
active motor imagery 66–67 musical aesthetics 489–511
active touch 99 The Aesthetics of Music (Scruton) 542–543
adaptive feedback 67–68 affective dimension sound 374–375
adaptive networks 370, 373, 376, 378, 383–384 affect of sound 275n12
Addison, Joseph 469 affective shapes 251
Adelaide Fringe Festival 310 affective-cognitive meaning 350
Adonis project 323–324 and sonic atmospheres 518–521, 525–527,
Adorno, T. W. 515n28 529–532
and high art/vernacular art dichotomy 540 affective dimensions of environment 518, 527
and imagination/improvisation affordances
relationship 25 and content of music 92n3
on improvisation 546 and embodied cognition 91n2
and Innerlichkeit concept 513nn15, 18 and emergent character of music 77–78,
and jazz as classical music 549–551 88–90
654 index
E electronic sounds 49
EAnalysis 267 electronica 622
ear physiology 480 electrophones 126
ear protection 580 elicitation procedure 341
Early Abstractions (Smith) 309 Ellington, Duke 544
earworms 394, 420, 422, 445, 454–455, 470 Elliott, R. K. 467
ECG 273 emancipation of the dissonance 551
echolalia 410, 414 embodiment
ecoacoustics 180 and cognitive science 26–28, 31
ecological models and perspectives and content of music 92n3
of cognition 77–78, 90–91 and continuity of mind, body, and
ecological embedding 68–69 environment 91n2
ecological model of auditory embodied cognition 142, 198, 446–449,
perception 78, 409, 411–416 457–459
ecological psychology 101–102 embodied cognitive theorists 16
ecological theory 437 embodied music 241
ecology of mind 68–69 embodied response 265–266
and sonic atmospheres 518–519, 523–524, and emergent character of music 77–78,
526–527, 529, 532 88, 90–91
and sonic environmentalities 523–524 and emergent nature of listening 80,
Economic-Philosophic Manuscripts 82–83
(Marx) 512n6 and limitations on music creation 118
Écouter 92n10 and motor imagery in music perception 73
ecstatic religious traditions 301 and motor imagery in perception and
Edelman, Gerald 138 performance 61–67
Edison, Thomas 227, 230–231 and musical imagery 457
Edo-era Japan 539 and musical performance 91n1
Eerola, Tuomas 79, 82–84, 360, 437 and musique concrète 93n11
eGauge 331 and posthumanist vocality 647–648
Egermann, H. 27 and the unconscious 83–85
egocentric navigation 207 emergence
Egypt, ancient 612, 616–617, 623, 625, 626n5 emergence of shapes 243
Eidola (Brent and Moffat) 647 emergent music 77–79, 82–91, 93nn11–12,
Eisenstein, Sergei 576n7 93n14, 94n20
Eitan, Z. 451 emergent phenomena 31
elastic boundaries 567 emergent structures 133–134, 136, 141,
electric guitar 126 143–144, 146–149, 150n2
electric turn 127 Emile; or, On Education (Locke) 24
electroacoustic sound and technology 37, 39, emotion
93n11, 260, 267, 315, 522 and audio branding 349, 351, 353, 355–360
electroencephalogram (EEG) 262, 273, 428 and embodied meaning 142–143
electromyography (EMG) 50–51, 452 emotional content of sounds 41
electronic dance music (EDM) 309, 315, emotional listening 513n16
598–600 emotional processing 376
electronic effects 606n2 emotion/sound connection 369–384
electronic media 413 influences on sound perception and
electronic music performance 72 auditory attention 381–382
index 663
Greek culture and philosophy (Continued)
  and theories of knowledge 31
  and Western tuning systems 118–122, 126, 129–131
Greenspon, E. B. 448
Gregorian chant 196–197, 204–205, 214–215, 552
Gregory I, Saint 21, 197, 205
“Gretchen am Spinnrade” (Schubert) 472
Grey, J. M. 39
Grèzes, J. 31
Grimm’s Household Tales 225
Grimshaw, M. 148, 262–263, 275n12
Grocke, D. E. 429
Grof, Stanislav 434
groove 79, 85–86, 139–140, 193, 597, 602, 605
Grosz, Elizabeth 187–188
ground-truth analysis 173
Grove’s Dictionary of Music and Musicians 26
guided imagery and music (GIM)
  and imaginative listening to music 477–478
  and the “mind’s ear” 445
  and multimodal imagery 427–428
  and music listening as psychotherapy 428–429
  neuroaffective perspective on 429–434
  and theories of consciousness 434–438
guided motion 482
guided response 473
Guido of Arezzo 205, 207
Guidonian notation 121
Guitar Hero (video game) 104
Gurney, Edmund 468–469
H
Habermas, J. 499
habituation process 68
Halacy, Daniel S. 585–586
halftones 125
Hall, G. B. C. 416
hallucination 263
  and altered states of consciousness 301–306, 316, 317nn1, 10
  auditory and audiovisual hallucinations 303–304, 305f, 306–313, 310f, 315, 317n10
  auditory-verbal hallucinations (AVHs) 303–304
  and augmented unreality 313–316, 318n11
  conceptual model of 312f, 314f
  diegetic representations of 306–308
  non-verbal auditory hallucinations (NVAHs) 304
  Thelemic visual hallucinations 307, 317n5
Hallward, Peter 565
Halpern, A. R. 41, 448, 452, 458–459
Hamid, Alexander 307
Hammond organ 103, 126
hand as perceptual system 100–102
Handbook of Music and Emotion (Juslin and Sloboda) 439
handedness 102
“The Hands” (sonic controller) 270
Hansen, A. G. 354
Hanslick, Eduard 472–475, 481, 492
haptics and haptic feedback 47–49, 52, 102, 269
Harari, Y. N. 100
Haraway, Donna 575n1, 585
Harbisson, Neil 586
Hargreaves, D. J. 98, 285, 361, 361t
harmony and harmonics
  and audio branding 361
  “harmonia” 124
  harmonic modulation 213
  harmonic overtones 200
  harmonic progressions 145
  harmonic series 200, 201
  and musical shape cognition 247
  and perception of timbre 39
  and Pythagorean tone system 202
harmony of the spheres (musica universalis) 120
Harpur, P. 224, 228
Harrington, David 85, 87–88
Harvard Mark I mainframe 580
Haselager, W. F. G. 110
Hasse, Jürgen 525
Hawkins, J. 16
Hayafuchi, K. 52
Hayles, N. Katherine 620, 640–641, 643–644
Hayward, V. 49
headmusic 618
headphones 332, 565
hearing vs. listening 180, 468–470
  and ecological model of auditory perception 411–416, 412f, 413f, 415f
  and information technology 260
  in military life 286–288
music, physics, and the mind 147–148, 150n4
music education 391–404
music festivals 302, 310, 313–316, 317
music imagery 153–154
music information retrieval (MIR) 248, 262, 268, 276n24
music of the spheres 150n4
music perception 62–63, 153–154, 156, 163–167, 175, 176n6
music psychology 63
music synthesis 260–261, 265, 268–269, 271–272, 274, 275n14, 276n26
music therapy 427–428, 430, 432t, 434, 436, 438–440
music travel 430
musica recta-musica vera (true music) 213
musica universalis (harmony of the spheres) 120
musical abilities in children with autism 411, 416, 418f, 419
musical architecture 481
musical expectancy 379–380
musical fit 351
musical imagery 153–154, 253–254, 428–429, 434–440, 445; see also guided imagery and music (GIM)
musical imagery information retrieval (MIIR) 275n8
musical imagery tests 456–457
musical information 153–156, 158, 161–163, 170, 175
musical instants 251–253
musical instrument playing 16
musical learning 164–167
musical listening 153, 164–167, 166f, 409, 411
musical literacy 400
musical object 164–167, 165f, 174, 506
musical sequences 161
musical space 476–479
musical surface 163–164
musical timescales 245–246
musical training 453; see also performance, musical
musical universals 146–147
music-brand fit 350–351
music-emotion induction mechanism 356–360
musicking 25, 31, 60, 73, 105, 141, 427, 438, 489, 599, 602, 604
music-related shape cognition 250f
musique concrète 93n11, 103–104, 127, 130, 246, 269, 271
  preference 359
  research related to 238–241, 244–246, 253
  and sound/emotion connection 378–381
  Walton on 493–498
  see also music analysis; musical shape cognition; musicology; performance, musical
music analysis 153–156
  applying a compression-driven approach 174–175
  and compact encodings of musical objects 161–163
  and compression-based model of musical learning 164–167
  and data compression 159–161
  encoding and decoding 156–159
  evaluating algorithms 172–174
  and explaining individual differences 167–169
  and Kolmogorov complexity 163
  and perceptual coding 163–164
  and point-set compression 159, 160f, 170–172, 175
Music in Contrary Motion (Glass) 475–476
The Music of Strangers (documentary) 93n18
music performance; see performance, musical
Música, por un tiempo (Rodriguez) 93n14
Musica enchiriadis (anonymous) 205–209, 205f
Musica Practica (Pareja) 124
musical shape cognition 237–239
  and motion features 249–251
  and motor cognition 243–244
  and musical imagery 253–254
  and musical instants 251–253