
Audio Engineering Society

Conference Paper 84
Presented at the International Conference on
Immersive and Interactive Audio
2019 March 27–29, York, UK

This paper was peer-reviewed as a complete manuscript for presentation at this conference. This paper is available in the AES E-Library (http://www.aes.org/e-lib); all rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

AI and Automatic Music Generation for Mindfulness


Duncan Williams1, Victoria J. Hodge1, Lina Gega2,3, Damian Murphy1, Peter I. Cowling1 and Anders Drachen1
1 Digital Creativity Labs, University of York, UK.
2 Dept of Health Sciences, University of York, UK.
3 Hull York Medical School, University of York, UK.
Correspondence should be addressed to Duncan Williams ([email protected])

ABSTRACT
This paper presents an architecture for the creation of emotionally congruent music using machine learning-aided sound synthesis. Our system can generate a small corpus of music using Hidden Markov Models; we can label the pieces with emotional tags using data elicited from questionnaires. This produces a corpus of labelled music underpinned by perceptual evaluations. We then analyse participants' galvanic skin response (GSR) while they listen to our generated music pieces, together with the emotions they describe in a questionnaire completed after listening. These analyses reveal a direct correlation between the calmness/scariness of a musical piece, the users' GSR readings, and the emotions they describe feeling. From these, we will be able to estimate an emotional state using biofeedback as a control signal for a machine-learning algorithm, which generates new musical structures according to a perceptually informed musical feature similarity model. Our case study suggests various applications, including gaming, automated soundtrack generation, and mindfulness.

1 Introduction
We employ a generative music system designed to create bio-signal synchronous music in real-time according to an individual's galvanic skin response (GSR), using machine learning (ML) techniques to determine similarity between an emotion index determined by perceptual experiment, and musical features extracted from a larger corpus of source files. This work has implications for the future design and implementation of novel portable music systems and in music-assisted mindfulness training and coaching.

1.1 Background
Mindfulness increases awareness of thoughts, feelings, and sensations, while keeping an open mind, free from distraction and judgment [1]. It can benefit mental health and general well-being [1]. The process involves the practitioner or patient concentrating their attention and awareness in a deliberate manner. Chambers [2] showed that this is correlated to galvanic skin response, heart rate variability, and the ratio of alpha and beta waves in brain imaging techniques, amongst other physiological metrics. Bondolfi [3] and Economides [1] proposed mindfulness as a therapeutic treatment and suggest there is evidence to link physiological changes with mindfulness training. Mindfulness may be considered a consolidated emotion - or more accurately, an affective state. The distinction between affective state, emotion, and mood is complex, but in the context of sound and music, cognitive scientists have suggested that the temporal nature of the response can be a useful method of delineating between such descriptors [4].

Existing work has shown that there is a neurological and physiological connection to music [5]. When listening to our favourite music, our bodies respond physically, inducing reactions such as pupil dilation, increased heart rate, blood pressure, and skin conductivity [6].
Thus, there is a potential crossover between mindful action, physiological reaction, and musical stimulation. We are attempting to fruitfully exploit this crossover to gamify mindful interactions and create a music-based training system for the end-user, using machine learning to automate the process. For example, mood-based regulation may be a target for the user. This might be adapted in the creative industries to designs that use physiological metrics as control systems, for example in video games [7], [8], in which case the player might be subjected to targeted mood disruption (i.e., being deliberately excited or even scared).

Machine learning (ML) is a field of computer science covering systems that learn "when they change their behaviour in a way that makes them perform better in the future" [9]. These systems learn from data without being specifically programmed. Many ML algorithms use supervised learning. In supervised learning, an algorithm learns from a set of labelled example inputs, generates a model associating the inputs with their respective labels or scores, and then classifies (or predicts) the label or score of unseen examples using the learned model. This can emotionally label music pieces for our system.
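As an informal illustration of this kind of supervised labelling (a minimal sketch with invented features and scores, not the system's actual implementation), a regressor can be trained on feature vectors from human-labelled pieces and then used to score unseen pieces:

```python
# Minimal sketch of supervised emotion labelling for music pieces.
# Feature names, values and scores are invented placeholders, not data from this study.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Each row describes one labelled piece: [mean_pitch, pitch_range, note_density, mean_velocity]
X_train = np.array([
    [60.0, 12.0, 2.5, 64.0],   # calm piece
    [62.0, 10.0, 2.0, 60.0],   # calm piece
    [70.0, 30.0, 6.0, 100.0],  # scary piece
    [68.0, 28.0, 5.5, 96.0],   # scary piece
])
y_train = np.array([1.0, 2.0, 9.0, 8.0])   # listener-derived scores (0 = calm, 10 = scary)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

X_new = np.array([[65.0, 20.0, 4.0, 80.0]])  # an unlabelled generated piece
print(model.predict(X_new))                  # estimated score for the new piece
```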
Kim et al. [10] and Laurier & Herrera [11] give literature overviews of detecting emotion in music and focus on the music representations. Laurier & Herrera [11] also analyse the ML algorithms used. Classification algorithms used in the literature include C4.5, Gaussian mixture models, k-nearest neighbour, random forest and support vector machines [10]–[12]. Regression techniques include Gaussian mixture model regression, multiple linear regression, partial least-squares regression and support vector regression. ML has been used to retrieve music by mood, and a personalized approach was found to be more consistent than a general approach [12]. A significant area for further work is the need to better understand the whole process and to be more intelligent with respect to music, users and emotions. Thus, we underpin our system with results from human experiments. These are only feasible on a small amount of music due to the required participant sample sizes, so we augment our human-labelled data using ML. We build on these findings to deliver a personalized AI approach to target mindfulness.

1.2 Emotional responses to music
There are a number of approaches for modelling emotional responses to musical stimuli [13]. Often, these borrow from conventional models used to quantify and qualify emotion, such as the circumplex (two-dimensional) model of affect [14]. This model places valence (as a measure of positivity) and arousal (as a measure of activation strength) on the horizontal and vertical axes respectively. Emotional descriptors (e.g., happy, sad, angry) can be mapped onto this space, though some descriptors can be problematic in terms of a duality of placement on the model. For example, angry and afraid are different emotions, but both would be considered negative valence and high arousal, and are thus difficult to differentiate on this type of emotion space.

Another problem in evaluating emotional responses to music is the distinction between perceived and induced emotions [15]; this is also relevant for multimodal stimuli such as film [16]. This might be broadly summarised as the listener or viewer understanding what type of feeling the stimulus is supposed to express (perceived), versus describing how it makes them feel (induced). For example, a sad piece of music may be enjoyable to an individual listener in the right context, despite being constructed to convey sadness.

2 System Overview
Recent advances in the portability, wearability, and affordability of biosensors now allow us to explore evaluations considering the above distinction. Biophysiological regulation may circumnavigate some of the problems of self-reported emotion (e.g., users being unwilling to report particular felt responses, or confusing perceived responses with felt responses). Real-world testing of systems using bio-signal mappings in music generation contexts has become an emerging field of research. For example, [17] generate simple music for specific emotional states using Markov chains. The Markov chains are used to generate music while the user wears a heart-rate sensor to monitor their bio-physiological response to the created music. That system was able to generate emotionally responsive music in a limited trial considering basic emotional descriptors.

We have developed another such system, which assumes lower skin conductance variability as a correlate for mindfulness. It attempts to generate emotionally congruent music as a training tool to promote positive affective states in the context of mindfulness. In the future, this system could also work in reverse by using skin conductance variability as a control signal to inform musical feature mapping for non-linear music generation.

Fig. 1. Listeners rank musical excerpts, which are analysed for features to train an ML model for construction of new excerpts.

The system detects the user's current emotional level and the ML algorithm picks musical pieces to influence their future emotional level to achieve their desired mood. This whole process requires musical pieces that have an associated emotional label (score) to allow the selection of appropriate pieces. We use two tasks to achieve this. The first task (Fig. 1, section 2.1) is to generate and expand a human-labelled corpus to provide sufficient labelled pieces for the system to operate. The music generation process is described in detail in section 2.1.1. The second task (Fig. 2, section 2.2) is to analyse the user's galvanic skin conductivity and to select the most appropriate music from the corpus according to the user's emotional requirement. There is also a feedback loop to adapt the corpus scores according to the user's actual experience.

Fig. 2. We determine the change required for the user to attain their goal. The ML model selects a new piece that is musically consistent but at the required new calmness. Finally, the system has a feedback loop to remove pieces that do not influence the user's emotional level.
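The loop implied by Fig. 2 can be sketched as follows; this is a hypothetical outline under assumed score ranges, sensor values and update rule, not the authors' implementation:

```python
# Hypothetical sketch of the selection/feedback loop shown in Fig. 2.
# Corpus scores, the GSR-to-score mapping and the update rate are assumptions.
corpus = {  # piece id -> calmness/scariness score (0 = calm, 10 = scary)
    "gen_001": 1.0, "gen_002": 2.5, "gen_003": 6.0, "gen_004": 9.0,
}

def estimate_level(gsr_value, baseline):
    """Map a baseline-corrected GSR reading onto the 0-10 scale (toy linear mapping)."""
    return max(0.0, min(10.0, (gsr_value - baseline) * 10.0))

def select_piece(target_level):
    """Pick the corpus piece whose stored score is closest to the target level."""
    return min(corpus, key=lambda pid: abs(corpus[pid] - target_level))

def feedback(piece_id, intended_level, observed_level, rate=0.1):
    """Nudge a piece's score towards the level it actually induced in the listener."""
    corpus[piece_id] += rate * (observed_level - intended_level)

baseline = 0.40                            # from a one-minute calibration period (section 2.2)
current = estimate_level(0.90, baseline)   # latest reading while listening
target = max(0.0, current - 2.0)           # goal: make the listener calmer
piece = select_piece(target)
observed = estimate_level(0.75, baseline)  # reading after the new piece has played
feedback(piece, target, observed)
print(piece, round(corpus[piece], 2))
```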

2.1 Task 1
To generate a database of labelled musical pieces, we elicited scores using user data accumulated through an anonymous online survey. This is an initial feasibility evaluation to assess whether human labelling is possible; hence, we used an anonymous voluntary survey. We ran small pilot evaluations on the best survey questions and determined that binary comparison of two pieces elicited the most consistent results. We surveyed 53 participants using a Qualtrics online survey (www.qualtrics.com). We distributed the URL link to the survey via email lists to colleagues, who responded anonymously but are English speakers, which is important for understanding the emotional labels. For this development system, we selected music that is unknown to the participants. As discussed in [11], emotions induced in the listener are influenced by many different contextual elements, such as personal experiences, cultural background, music they have only recently heard, or other personal preferences, so using generated music as a stimulus may help to eliminate some of these confounds as preconceptions are removed. There is much debate regarding adjectives as emotional descriptors and how they might best be interpreted, particularly considering ambiguities across various languages. In this work we use the labels mindful (calm/not scary) and tense (scary), as these can be considered diametrically opposite on the circumplex model of affect [14].

We therefore consider "not scary vs scary" as analogous to "high vs low mindfulness" respectively, and designed the online survey asking listeners to evaluate a generated pool of training material accordingly.

Each participant evaluated four musical excerpts, two not scary (N1 and N2) and two scary (S1 and S2), in a bipolar ranking across six pairs, choosing the scariest in each pair {N1 vs S1, S2 vs N2, S1 vs N2, N1 vs S2, S1 vs S2 and N1 vs N2}. The survey presented an initial question to allow the user to familiarize themselves with the format and then presented the six questions. The Qualtrics questionnaire allowed us to specify that each track played in full to each participant, to ensure that the participant adapted fully to the track. We randomized the order of presentation of the questions (pairs of tracks) to each of the participants to reduce contextual effects. Participants were not required to answer every question in order to complete the evaluation.

2.1.1 Material
Source material was generated by training a Hidden Markov Model (HMM) and creating new permutations of the HMM with deliberate feature constraints, following the procedure described in [4]. We use a transformative algorithm based on a second-order Markov model with a musical feature matrix. It allows discrete control over five parameters in a 2-dimensional model. The model is generative and can be used to create new state sequences according to the likelihood of a particular state occurring after the current and preceding states. Fig. 3 and 4 show two example scores from the stimulus set.

Fig. 3. Generated scary/angry source material.

Fig. 4. Generated calm/mindful source material. Note the pitch range and the three sharps in the bass clef, which imply A major.
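The state-sequence idea in section 2.1.1 can be illustrated with a generic second-order Markov chain over MIDI pitches; the training melody below is an arbitrary placeholder, and this is not the HMM/feature-matrix procedure of [4]:

```python
# Generic second-order Markov chain over MIDI pitches: new states are sampled
# according to how often they followed the two preceding states in the training
# material. Placeholder melody; not the authors' generation procedure from [4].
import random
from collections import defaultdict

training_melody = [69, 71, 73, 74, 73, 71, 69, 71, 73, 71, 69, 68, 69]  # MIDI note numbers

counts = defaultdict(lambda: defaultdict(int))   # (prev2, prev1) -> {next: count}
for a, b, c in zip(training_melody, training_melody[1:], training_melody[2:]):
    counts[(a, b)][c] += 1

def generate(length, seed=(69, 71)):
    """Sample a new pitch sequence from the second-order transition counts."""
    sequence = list(seed)
    while len(sequence) < length:
        options = counts.get(tuple(sequence[-2:])) or counts[seed]  # fall back on unseen context
        pitches = list(options)
        weights = [options[p] for p in pitches]
        sequence.append(random.choices(pitches, weights=weights)[0])
    return sequence

print(generate(16))
```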
2.1.2 Apparatus
Musical stimuli were rendered using a range of synthesizer timbres intended to convey the intended emotional range across an assumed emotional space.

Fig. 5. Chart of a rotated valence-arousal space, after [12], bisected with a mindfulness scale assuming that high mindfulness is a combination of high valence and low arousal, and vice versa for low mindfulness (with some suggested adjective labels at each end of the scale).

In this space, low mindfulness might be equated dimensionally to high arousal and low valence, or descriptively to adjectives such as scary, tense, afraid, angry, etc., whilst high mindfulness might be equated to low arousal and high valence, or to adjectives like calm, content, relaxed, etc. This 'mindfulness' scale can be plotted via a rotation of the traditional circumplex model of affect, as shown in Fig. 5.
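A minimal sketch of that rotation, using rough illustrative valence-arousal coordinates for a few descriptors (the coordinates are assumptions, not values from the paper):

```python
# Project valence-arousal coordinates onto a "mindfulness" axis obtained by rotating
# the circumplex 45 degrees, so that high valence with low arousal scores highest.
# Descriptor coordinates are rough illustrative guesses.
import math

descriptors = {            # (valence, arousal), each in [-1, 1]
    "calm":    ( 0.7, -0.6),
    "content": ( 0.8, -0.3),
    "tense":   (-0.6,  0.6),
    "scary":   (-0.7,  0.8),
}

def mindfulness(valence, arousal):
    """Signed projection onto the high-valence/low-arousal diagonal."""
    return (valence - arousal) / math.sqrt(2.0)

for name, (v, a) in descriptors.items():
    print(f"{name:8s} mindfulness = {mindfulness(v, a):+.2f}")
```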
We generate the source MIDI files in near real-time and render them to audio with minimal latency using a DAW. Previous studies showed that the length of each music excerpt needs to be between 30 and 60 seconds to successfully induce emotions [18]. All tracks were >30 seconds long, including a fade out to ensure they did not have an abrupt ending (which might otherwise also influence emotional response in the participants).

2.1.3 Results
Data from 53 participants was collected for analysis. Table 1 provides an overview of the composition of the six bipolar questions and Table 2 details the participants' responses.

Table 1. The pair of tracks in each question (Q1-Q6), with one question per column. Scary tracks are labelled Sn.

      Q1  Q2  Q3  Q4  Q5  Q6
      N1  S2  S1  N1  S1  N1
      S1  N2  N2  S2  S2  N2

Table 2. The number of participants who picked each track as the scariest of the pair in each question; each question (Q1-Q6) is one column, and the rows correspond to the track rows of Table 1.

      Q1  Q2  Q3  Q4  Q5  Q6
       2  37  41   5  18  24
      42   5   3  41  25  20

2.1.4 Analysis
Responses to the musical stimuli in Table 2 suggest that listeners found it relatively easy to discriminate the affective states between stimuli rendered using different synthesized timbres. As expected (see Figs. 3 and 4), shorter durations and larger pitch ranges were considered lower in mindfulness ("scarier/more tense") than longer durations with a more restricted pitch range, regardless of the timbre being used. The tracks we expected to be labelled "scary" were labelled "scary" by the participants, and the tracks we expected to be labelled "not scary" were labelled "not scary". Questions 5 and 6 compare the two "scary" tracks and the two "not scary" tracks respectively. Here the results are closer, as we might expect: 58.1% of participants thought S2 scarier than S1, while 54.6% felt N1 was scarier than N2.

For S1, 94.5% and 93.2% of participants rated it scarier than N1 and N2 respectively. For S2, 88.1% and 89.1% of participants rated it scarier than N1 and N2 respectively. Yet 58.1% of the participants rated S2 scarier than S1, despite S2 having lower scariness than S1 when compared against the non-scary tracks. Similarly, for N1, 4.6% and 10.9% rated it scarier than S1 and S2 respectively, while for N2, 6.8% and 11.9% rated it scarier than S1 and S2 respectively. This presents a similar contradiction as for the scary tracks, as N1 has the lowest scariness rating yet was rated scarier than N2 by 54.6% of participants. We cannot explain this.

Although we randomized the order of presentation of the questions to the individual participants, we did not alter the order of presentation within the questions. This may have contextual effects on the participants and needs to be considered. However, we note that the participants rated the second track as scariest in Q5 and the first track as scariest in Q6, indicating that the intra-question ordering is unlikely to be significant.

From these comparisons, we obtained sufficient data to calculate a ranked order (score) for the pieces from these pairwise comparisons [19]. From above, 58.1% of participants thought S2 scarier than S1, while 54.6% felt N1 was scarier than N2. Hence, the ranking is that S2 is scarier than S1 and N2 is calmer than N1. We produce a scored label, in contrast to Laurier & Herrera [11], who used a Boolean label, for instance a song is "happy" or "not happy". However, a Boolean label does not provide the fine-grained differentiation we require to select emotionally relevant pieces, so we produce a score from 0-10 for each musical piece, where 0 is completely calm, 10 is completely scary, and 5 is the midpoint: neither calm nor scary.
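One way to derive such a 0-10 score from the Table 2 counts is to fit a Bradley-Terry-style strength to each track and rescale. The paper cites [19] for ranking from pairwise comparisons, so the particular recipe below (standard MM updates plus min-max scaling) is an illustrative assumption rather than the authors' method:

```python
# Fit Bradley-Terry strengths to the pairwise "which is scarier" counts of Table 2
# using the standard MM update, then min-max scale to a 0-10 scariness score.
# The scoring recipe is illustrative; the paper cites [19] for pairwise ranking.
wins = {  # wins[(a, b)] = participants who rated track a scarier than track b
    ("S1", "N1"): 42, ("N1", "S1"): 2,
    ("S2", "N2"): 37, ("N2", "S2"): 5,
    ("S1", "N2"): 41, ("N2", "S1"): 3,
    ("S2", "N1"): 41, ("N1", "S2"): 5,
    ("S2", "S1"): 25, ("S1", "S2"): 18,
    ("N1", "N2"): 24, ("N2", "N1"): 20,
}
items = ["N1", "N2", "S1", "S2"]
strength = {i: 1.0 for i in items}

for _ in range(200):
    new = {}
    for i in items:
        w_i = sum(w for (a, _), w in wins.items() if a == i)          # total wins of i
        denom = sum((wins.get((i, j), 0) + wins.get((j, i), 0)) / (strength[i] + strength[j])
                    for j in items if j != i)
        new[i] = w_i / denom
    total = sum(new.values())
    strength = {i: len(items) * p / total for i, p in new.items()}    # keep the scale bounded

lo, hi = min(strength.values()), max(strength.values())
scores = {i: round(10.0 * (strength[i] - lo) / (hi - lo), 2) for i in items}
print(scores)   # 0 = calmest and 10 = scariest of the four excerpts
```

With these counts the scary excerpts receive markedly higher strengths than the calm ones, consistent with the ranking described above.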
2.1.5 Enhancing the corpus
Human experiments are only feasible on a small set of music pieces: n pieces require n(n-1)/2 pairwise comparisons (six for our four excerpts), along with enough survey participants to provide sufficient responses for each comparison. Using human participants to generate a sufficiently large database of labelled pieces for our work is very time-consuming and complex. To augment our small labelled database, we need to use ML to label new music and to provide a corpus sufficiently large for task 2 to be feasible.

2.2 Task 2
The second task is to analyse the user's galvanic skin conductance and then to select the most appropriate music. This process is similar to that of Huang and Cai [17], who analysed heart rate to reflect emotions and to select appropriate music pieces. Huang demonstrated varying heart rates (beats per minute, bpm) of participants according to the emotional label of the piece played: happy music induced the highest bpm and sad music the lowest. However, the music they labelled angry induced a very similar bpm to the music labelled joyful. We analyse skin conductivity, or galvanic skin response (GSR). When the skin's sweat glands secrete sweat, this changes the balance of positive and negative ions in the secretion, thus increasing the skin's conductivity. Measurement of GSR has been shown to be a robust metric for analysis of emotional responses to music [6], [22], [23].

The first step of task 2 requires us to compare the user's GSR signal, the emotional tag they describe after listening, and the calmness level of the piece the participant is listening to. To analyse GSR, we used the Shimmer3 wireless GSR+ unit (http://www.shimmersensing.com/products/shimmer3-wireless-GSRsensor), which has been validated for use in biomedical-oriented research applications, can detect very small changes of GSR, and can stream data in real time [24]. It can also connect to recording software and export the data for extended analysis. The Shimmer3 needs to be calibrated on each use: the user wears the device for one minute to establish a baseline skin conductance signal. The baseline of each person varies due to many factors, including skin dryness, nervousness (due to unfamiliarity with the experimental procedure) and ambient temperature. The captured reading for each user under analysis is their skin conductance response whilst undertaking the listening exercise, minus their individual skin conductance response baseline. After listening to each piece, the users completed a questionnaire describing the emotion they felt while listening, which we compared to the GSR data [24].
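The baseline subtraction described above can be sketched as follows (invented sample values; this does not use the Shimmer streaming software):

```python
# Baseline correction: average a one-minute calibration recording, then subtract that
# baseline from the readings captured while the participant listens. Sample values
# are invented for illustration.
def baseline_correct(calibration, listening):
    """Return listening-phase skin conductance relative to the resting baseline."""
    baseline = sum(calibration) / len(calibration)
    return [sample - baseline for sample in listening]

calibration_uS = [0.41, 0.42, 0.40, 0.43, 0.42]   # microsiemens, one-minute rest period
listening_uS   = [0.44, 0.47, 0.55, 0.61, 0.58]   # while hearing a "scary" excerpt

corrected = baseline_correct(calibration_uS, listening_uS)
print([round(x, 3) for x in corrected])
print("mean response:", round(sum(corrected) / len(corrected), 3))
```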
2.2.1 Evaluation
In Williams et al. [24], 30 participants evaluated two automatically generated pieces against two well-known pieces of scary music (the themes from the Psycho and Jaws films, which were composed to be scary). The two generated pieces induced emotional responses where the response described by the participant in a questionnaire tallied with their bio-physiological responses as measured by GSR sensors. Our analyses [24] revealed that there is a direct correlation between the scariness of a musical piece, the users' GSR reading, and the emotions they describe feeling in a questionnaire survey conducted after listening. Users display elevated GSR for scary pieces, which they also labelled as scary in the questionnaire, and lower GSR and appropriate labels for calmer pieces. Our preliminary experiments also highlighted that familiarity influences people's responses. For this reason, we focus on generating novel music to ensure that the user responds emotionally rather than responding to memories evoked by the music. We also keep to two labels, "not scary/calm" and "scary/tense", to limit confusion by reducing complexity.

This indicates that we are able to compose emotionally relevant music using HMMs. For the two film pieces, where participants were familiar with the music, the emotional responses they described in the questionnaire tallied with the expected response for a scary film but did not necessarily tally with their bio-physiological responses. For this reason, we have focussed on auto-generating music to build our own corpus and using these pieces to induce mindfulness and relaxation, rather than selecting music from a known corpus (e.g., using streaming platforms like Spotify), which will contain pieces of music with varying degrees of familiarity for the listeners.

3 Future Work
For task 1, we will use our human-labelled data to analyse a number of ML methods to identify the best ML method with respect to accuracy foremost, but also flexibility, scalability, and adaptability. Classification and regression algorithms need a rich data description of each piece for learning. We have developed a multi-feature music representation to enable this.

We couple the symbolic musical feature data from a MIDI file, which represents the structure of the melody, chords, rhythm and other musical dimensions, with Mel-Frequency Cepstral Coefficient (MFCC) features [20] obtained from the entire piece to represent the piece's quality (character). This dual representation is more flexible and richer than simply using MIDI or signal-based audio content features. We use only numerical data features to describe each piece and perform feature selection to identify the most significant set of features, as described in [21]. Using this reduced set of significant features, the ML model will predict the "calmness" score of new music pieces by determining the similarity between pieces using their respective sets of features.
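A compact sketch of this pipeline, with placeholder feature vectors standing in for the symbolic and MFCC descriptors (the concrete choices of selector and regressor below are illustrative assumptions, not the paper's):

```python
# Placeholder feature matrix (e.g. symbolic MIDI descriptors plus MFCC statistics),
# feature selection to keep the most significant columns, and a regressor that
# predicts a 0-10 calmness score for each piece. Illustrative choices throughout.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_pieces, n_features = 40, 20                      # e.g. 8 symbolic + 12 MFCC features
X = rng.normal(size=(n_pieces, n_features))
# Pretend the "true" calmness depends on a few features, plus noise.
y = np.clip(5 + 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.3, size=n_pieces), 0, 10)

model = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_regression, k=5),     # keep the most significant features
    SVR(kernel="rbf", C=1.0),
)
model.fit(X, y)
print(model.predict(X[:3]))                        # predicted calmness for three pieces
```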
In task 2, from the GSR analysis, we can calculate the new calmness level required to achieve a mindfulness goal (e.g., make the listener calmer if they are over-stimulated). This new calmness value allows us to retrieve all pieces of music in the scored song corpus at this new calmness level. To select the most appropriate piece from this matching set, we will match the input piece against each matching piece using the selected set of features. We will use the same music data representation as task 1 and the identical ML model, to ensure consistency and to stop the system introducing contextual biasing and irregularities. We summarize task 2 in Fig. 2. The features we have selected are input to the ML model; we will calculate the musical similarity score for each piece using the ML model and then recommend the music piece that is at the correct calmness level and is most similar (musically contiguous) to the user's current state.
user's current state. To support this we intend to generate a larger corpus
As we continue monitoring the participant’s GSR we of pieces and recruit further listeners to bootstrap the
will assess whether the new piece has achieved the generation of a larger corpus. A rich data description
desired level of calmness. This difference (error of human labelled pieces will allow machine-
between actual and required GSR) will feed back learning algorithms to label new pieces
into the corpus of scored pieces to adjust the independently, which would mean we can expand
stimulus calmness score (essentially a calmness the corpus to any size required for a task.
index). We will adjust both the global score to Once we have a sufficiently large labelled corpus of
ensure the system correctly rates each piece and the our auto-generated music, we will use these to select
person’s own scoring mechanism to provide pieces to play to users according to their galvanic
personalized music for their mindfulness skin response. Our previous work [24] showed that
requirements. we can combine auto-generated music and GSR
We can enhance the monitoring further by using monitoring to induce emotions and that these
additional sensors. We have proven GSR sensors for emotions correspond with those felt by the listener
this task but other sensors such as heart-rate sensors (as self-reported via questionnaires). The ultimate

AES Conference on Immersive and Interactive Audio, York, UK, March 27–29, 2019
Page 7 of 10
Williams, (Hodge, Gega, Murphy, Cowling and Drachen) AI Automated Music Generation for Mindfulness

The ultimate goal of the system would be to generate a calmer piece in direct response to a listener's physiological reaction and promote the necessary emotional state for enhanced mindfulness. Physiologically informed feedback is vital for this process: any error (between actual and required GSR) feeds back into the corpus of scored pieces to adjust that particular piece's calmness score by a small error factor. This requires a large dataset, as the corpus will be adjusted gradually and incrementally to maximize the available emotion space. By using our own automatically generated pieces, we can minimize confounds of familiarity and the need to actively rank music whilst listening (in itself a process which might break mindfulness or relaxation). Thus the use of biophysiological sensors is critical in the development of suitable systems for audio generation in the context of mindfulness or relaxation.

Generative music technology has the potential to produce infinite soundtracks in sympathy with a listener's bio-signals, in a biofeedback loop. There are promising applications for linking music with emotions, especially in the creative industries, art and therapy, and particularly for relaxation. Enhancement of well-being using music, and the emotions music induces, is becoming an emerging topic for further work. The potential of cheaper wearable biosensors to collect large amounts of data for training machine learning algorithms suggests that gamifying emotions through musical sound synthesis might be possible in the near future. For example, this type of audio stimulus generation need not be restricted to a given extracted bio-signal value: in future, trials with target emotional values could be conducted, i.e., encouraging the listener to move towards a specific emotional correlate or Cartesian co-ordinate in a dimensional emotion model, such as a gamified approach to mindfulness, or a biosensor-driven thriller or horror game. We note that the music generation software using HMMs allows us to generate this music rapidly, so in future we can generate on-the-fly and on-demand rather than selecting pre-generated tracks from a corpus. This auto-generation is much richer, more varied, more adaptive and more personalized than selecting from a playlist.

However, such work also needs to heed the potential drawbacks of emotional manipulation using AI and related systems. There is potential for emotional manipulation for marketing purposes or for social control. However, the promising everyday applications for mindfulness, and the potential therapeutic applications of this work, provide a strong argument to continue investigating this area.

5 Acknowledgements
This work was supported by the Digital Creativity Labs (www.digitalcreativity.ac.uk), jointly funded by EPSRC/AHRC/Innovate UK under grant no. EP/M023265/1.

References
[1] M. Economides, J. Martman, M. J. Bell, and B. Sanderson, "Improvements in Stress, Affect, and Irritability Following Brief Use of a Mindfulness-based Smartphone App: A Randomized Controlled Trial," Mindfulness, vol. 9, no. 5, pp. 1584–1593, Oct. 2018.
[2] R. Chambers, E. Gullone, and N. B. Allen, "Mindful emotion regulation: An integrative review," Clin. Psychol. Rev., vol. 29, no. 6, pp. 560–572, 2009.
[3] G. Bondolfi, "Depression: the mindfulness method, a new approach to relapse," Rev. Med. Suisse, vol. 9, no. 369, p. 91, 2013.
[4] D. Williams, A. Kirke, E. R. Miranda, E. Roesch, I. Daly, and S. Nasuto, "Investigating affect in algorithmic composition systems," Psychol. Music, vol. 43, no. 6, pp. 831–854, 2014.
[5] I. Daly et al., "Automated identification of neural correlates of continuous variables," J. Neurosci. Methods, vol. 242, pp. 65–71, 2015.
[6] S. D. Vanderark and D. Ely, "Cortisol, biochemical, and galvanic skin responses to music stimuli of different preference values by college students in biology and music," Percept. Mot. Skills, vol. 77, no. 1, pp. 227–234, 1993.
[7] K. Garner, "Would You Like to Hear Some Music? Music in-and-out-of-control in the Films of Quentin Tarantino," Film Music Crit. Approaches, pp. 188–205, 2001.

[8] M. Scirea, J. Togelius, P. Eklund, and S. Risi, "Affective evolutionary music composition with MetaCompose," Genet. Program. Evolvable Mach., vol. 18, no. 4, pp. 433–465, 2017.
[9] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016.
[10] Y. E. Kim et al., "Music emotion recognition: A state of the art review," in Proc. ISMIR, 2010, pp. 255–266.
[11] C. Laurier and P. Herrera, "Automatic detection of emotion in music: Interaction with emotionally sensitive machines," in Machine Learning: Concepts, Methodologies, Tools and Applications, IGI Global, 2012, pp. 1330–1354.
[12] A. C. Mostafavi, Z. W. Ras, and A. Wieczorkowska, "Developing personalized classifiers for retrieving music by mood," in Proc. Int. Workshop on New Frontiers in Mining Complex Patterns, 2013.
[13] M. Zentner, D. Grandjean, and K. R. Scherer, "Emotions evoked by the sound of music: Characterization, classification, and measurement," Emotion, vol. 8, no. 4, pp. 494–521, 2008.
[14] J. A. Russell, "A circumplex model of affect," J. Pers. Soc. Psychol., vol. 39, no. 6, p. 1161, 1980.
[15] A. Gabrielsson, "Emotion perceived and emotion felt: Same or different?," Music. Sci., vol. 5, no. 1 suppl, pp. 123–147, 2002.
[16] L. Tian et al., "Recognizing induced emotions of movie audiences: Are induced and perceived emotions the same?," in Proc. Seventh Int. Conf. on Affective Computing and Intelligent Interaction (ACII), 2017, pp. 28–35.
[17] C.-F. Huang and Y. Cai, "Automated Music Composition Using Heart Rate Emotion Data," in Proc. Int. Conf. on Intelligent Information Hiding and Multimedia Signal Processing, 2017, pp. 115–120.
[18] T. Eerola and J. K. Vuoskoski, "A review of music and emotion studies: approaches, emotion models, and stimuli," Music Percept. Interdiscip. J., vol. 30, no. 3, pp. 307–340, 2013.
[19] F. Wauthier, M. Jordan, and N. Jojic, "Efficient ranking from pairwise comparisons," in Proc. Int. Conf. on Machine Learning, 2013, pp. 109–117.
[20] B. Logan et al., "Mel Frequency Cepstral Coefficients for Music Modeling," in Proc. ISMIR, 2000, vol. 270, pp. 1–11.
[21] V. J. Hodge, S. O'Keefe, and J. Austin, "Hadoop neural network for parallel and distributed feature selection," Neural Netw., vol. 78, pp. 24–35, 2016.
[22] D. C. Shrift, "The galvanic skin response to two contrasting types of music," University of Kansas, Music Education, 1954.
[23] I. Daly et al., "Towards human-computer music interaction: Evaluation of an affectively-driven music generator via galvanic skin response measures," in Proc. 7th Computer Science and Electronic Engineering Conference (CEEC), IEEE, 2015, pp. 87–92.
[24] D. Williams, C.-Y. Wu, V. J. Hodge, D. Murphy, and P. I. Cowling, "A Psychometric Evaluation of Emotional Responses to Horror Music," in 146th Audio Engineering Society International Pro Audio Convention, Dublin, March 20-23, 2019.
[25] K. R. Scherer, "Acoustic Concomitants of Emotional Dimensions: Judging Affect from Synthesized Tone Sequences," in Proc. Eastern Psychological Association Meeting, Boston, Massachusetts, 1972.
[26] K. R. Scherer, "Which Emotions Can be Induced by Music? What Are the Underlying Mechanisms? And How Can We Measure Them?," J. New Music Res., vol. 33, no. 3, pp. 239–251, Sep. 2004.
[27] J. Berg and F. Rumsey, "Spatial Attribute Identification and Scaling by Repertory Grid Technique and Other Methods," in Proc. AES 16th International Conference: Spatial Sound Reproduction, 1999.
[28] F. Rumsey, B. de Bruyn, and N. Ford, "Graphical elicitation techniques for subjective assessment of the spatial attributes of loudspeaker reproduction: a pilot investigation," in Proc. 110th Audio Engineering Society Convention, Amsterdam, 2001.

[29] O. Ben-Tal and J. Berger, "Creative aspects of sonification," Leonardo, vol. 37, no. 3, pp. 229–233, 2004.
[30] T. Neher, T. Brookes, and F. Rumsey, "A Hybrid Technique for Validating Unidimensionality of Perceived Variation in a Spatial Auditory Stimulus Set," J. Audio Eng. Soc., vol. 54, no. 4, pp. 259–275, 2006.
