Disambiguating Music Emotion Using Software Agents

Dan Yang, Wonsook Lee
Music retrieval systems typically lack query-by-emotion, leaving it up to the user to know which artist, album, and genre names correlate with the desired musical emotion. Progress in this area is hindered by a "serious lack of annotated databases that allow the development of bottom-up data-driven tools for musical content extraction" [8]. In this paper we show that machine learning techniques can be embedded in annotation tools using software agents, in support of high-level design goals. One high-level design choice is to model musical […]

The structure of emotion studied by psychologists includes core affect [14], emotion, mood, attitude [13], and temperament [21]. Psychological perspectives on music and emotion have tended to focus on people's verbal reports of feeling states and on whether these emotion words can be structured according to how many and which dimensions [14][15][19].

Dimensional ratings are quick single-item test scales for eliciting emotions [15], suitable for repeated use in applications such as annotating short musical segments. The aim is not to confuse the user with the need to discriminate between similar emotions, so related emotions are placed close together on the dimensional scale. This makes the single-item rating a useful entry point for eliciting emotion. Some studies directly relate dimensional scale ratings to musical features such as tempo or perceived energy [12]. Others have depicted the relationship between dimensional scales and musical features with greater complexity [9]. Music emotion is generally seen as irreducible to just one or two dimension ratings. For example, the online All Music Guide [4] uses over 160 discrete emotion categories (e.g. trippy, quirky) to describe artists, applying up to 20 emotion words to describe the tone of an artist's career. Baumann's Beagle system [1] mines text documents to collect the emotion-related words (e.g. ooh, marry, love, wait, dance, fault) used in lyrics or online reviews.

Psychological research has explored ways of unifying the dimensional and discrete approaches to emotion ratings. Sloboda and Juslin [16] note that dimensional and discrete models can be complementary. One accessible approach is the PANAS-X test scale [22], which has two dimensional ratings called Positive Affect (PA) and Negative Affect (NA). The dimensional ratings function as entry points to more detailed ratings of discrete emotions under each axis (e.g. Fear under NA). The two PANAS-X dimensions can be mathematically related to Russell's circumplex model [14]: Russell's Arousal is the sum of PA and NA, while Russell's Valence is the difference (PA − NA). Tellegen, Watson and Clark [19] use the Valence dimension (pleasant-unpleasant) as the top-level entry point of a 3-layer model. This unified model offers the benefits of dimensional ratings, plus a theoretical basis that links the entry point of the hierarchy to the discrete emotion categories at the base (as shown in Figure 1).
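The arithmetic linking PANAS-X to the circumplex is simple enough to state directly; the short Python sketch below encodes it, together with an illustrative two-branch slice of the Figure 1 hierarchy (the word lists are examples from the figure, not the complete model):

    # Illustrative slice of the Figure 1 hierarchy; the word lists are
    # examples taken from the figure, not the complete model.
    HIERARCHY = {
        "Positive Affect": ["delighted", "excited", "happy", "alert"],
        "Negative Affect": ["angry", "distressed", "fearful", "ashamed"],
    }

    def russell_from_panas(pa, na):
        """Russell's circumplex [14] from the two PANAS-X axes:
        Arousal = PA + NA, Valence = PA - NA."""
        return pa + na, pa - na

    # A segment rated high on both axes is high-arousal but nearly
    # neutral in valence (e.g. tense, agitated music).
    print(russell_from_panas(0.75, 0.5))  # (1.25, 0.25)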
2.2. Systems for music annotation

A number of online systems exist for annotating popular music emotion. The All Music Guide collects user responses from the web in dimensional form (e.g. exciting/relaxing, dynamic/calm). Moodlogic.com collects user emotion ratings from the web in bipolar dimensional form (general mood positive or negative), in multivalent dimensional form (e.g. brooding, quirky), as well as in discrete terms (e.g. love, longing). Moodlogic.com allows query-by-emotion using 6 discrete emotion categories (aggressive, upbeat, happy, romantic, mellow and sad). Songs are regularly labelled with two or more emotions. A query for two emotions together, 'both happy and upbeat', retrieved about half the songs in the database. There was no way to disambiguate the results into happy and upbeat as separate emotions.

Microsoft MSN.com has 115 Mood/Theme discrete category names. No theory of emotion is used in this taxonomy, which is based on a mix of artist names, emotion names, country names, and names of parts of the daily routine such as workout, dinner, etc. Musicat [3] also used names of everyday contexts such as bedtime, as well as moods (happy, romantic), to label listening habits. The system learns which songs are played in which listening habit.

The above taxonomies tended to be ad hoc lists mixing together words for feelings, thoughts and everyday activities, instead of systematically examining the affective, cognitive and behavioural aspects of emotion.

2.3. Systems for music data mining

Human listening is very effective at organizing the stream of auditory impulses into a coherent auditory image. If digital signal processing primitives can be used to discern features of interest to a human listener, then these are useful to add to the music emotion annotation environment. The evaluation of the best features is hindered by a lack of standardized databases [5]. Current feature extraction tools are very low-level, such as the MPEG-7 Low Level Descriptors [6].

Recently, wavelet techniques have been developed that tile the acoustic landscape into smaller features corresponding to musical elements such as octaves [20]. Another innovation is the automatic discovery of feature extractors. Sony's Extractor Discovery System (EDS) [12] uses genetic programming to construct trees of DSP operators that are highly correlated with human-perceived qualities of music.

Baumann's Beagle system [1] demonstrates the relevance of mining music reviews and lyrics for emotion words co-occurring with artist names and song names. State-of-the-art performance on extracting individual names from text is about 90%, but accuracy falls below 70% for compound relations such as (artist, song, emotion), because errors multiply.

3. EMO SYSTEM

3.1. Motivation

Initially we were looking for online datasets of music already annotated with discrete emotion labels, and what we found was an ad hoc mix of discrete terms, including cognitive, behavioural and affective words. Single-item dimensional ratings did not separate like emotions, such as happy/upbeat or anger/fear. Hierarchical models of music emotion recognition have been reported by Liu [10], and we decided to extend the hierarchical approach further, to the level of discrete emotion categories. The key problem we found in annotating music in greater detail was the cognitive load on the annotator, in both listening and reporting in detail. The solution we are developing uses software agents that learn to make the annotation task more efficient.
3.2. Emotion model

In this paper we focus on Negative Affect because negative emotions are less distinguishable from each other than positive emotions are from each other [21]. This finding is reflected in the way emotions such as fear and anger are highly correlated in the dimensional model. In real terms, it can be seen in the way related episodes of negative emotion such as hostility, paranoia and sadness occur in depressive illness.

The Watson model [19][21] explains negative emotion as the threat-avoidance function in the structure of emotions. Taking fear and anger as an example, these categories are both high in negative affect on a dimensional scale (high Negative Affect in Figure 1), but they can be distinguished by their pattern of response to the threat. Fear anticipates a threat and triggers flight, while anger can involve a fight against the threat. By finding more information about the pattern of response to a threat, using information from both the lyrics and the music, negative emotions can be separated into discrete classes.

Taking anger and guilt as an example, these are both high in negative affect on a dimensional scale (high Negative Affect in Figure 1) and practically uncorrelated with Positive Affect. Guilt is related to feelings that persist a little time after some event, while Hostility is directly involved with some threat event.

Sadness and guilt show some separation at the middle level of the emotion model, because sadness is slightly correlated with Positive Affect while guilt is not. This implies that sadness and guilt can be better differentiated by assessing the Positive Affect component, given the lack of separability in terms of Negative Affect.

The Tellegen-Watson-Clark model discussed in Section 2.1 is useful in linking the dimensional and discrete levels of emotion. The experiments and results in Section 4 are based on this model, shown in Figure 1.

4. EXPERIMENTS

4.1. Music emotion intensity prediction

This first experiment was designed to implement a classifier for music emotion intensity, understood in terms of the psychological models of Russell [14] and Tellegen-Watson-Clark [19], where intensity represents the sum of the PA and NA dimensions (in Figure 1).
[Figure 1 graphic: discrete emotion words arranged around two solid axes, High/Low Positive Affect (PA) and High/Low Negative Affect (NA), crossed by a dotted Pleasantness-Unpleasantness dimension. Examples from the figure: delighted, excited, happy (High PA); angry, distressed, fearful, afraid, scared (High NA); at rest, calm, relaxed (Low NA); sleepy, quiet, still, tired, sluggish (Low PA); surprised, astonished (high arousal); sad, ashamed, downhearted, discouraged (Unpleasantness).]
Figure 1. Elements of the Tellegen-Watson-Clark emotion model [19][21]. Dotted lines are top-level dimensions. The Positive Affect (PA) and Negative Affect (NA) dimensions, shown as solid lines, form the middle of the hierarchy and provide the heuristics needed to discern the specific discrete emotion words based on function. Discrete emotions close to an axis are highly correlated with that dimension, e.g. sad is slightly correlated with positive affect.
The emotional intensity rating scale was calibrated to Microsoft's emotion annotation patent [17], using the same method to train a volunteer. The initial database consisted of 500 randomly-chosen rock song segments of 20 seconds each, taken beginning a third of the way into the song.

Acoustic feature extraction used a number of tools, to give a broad mix from which to select the best features. Wavelet tools [20] were used to subdivide the signal into bands approximating octave boundaries, and then energy extraction and autocorrelation were used to estimate beats per minute (BPM). Other acoustic attributes included low-level standard descriptors from the MPEG-7 audio standard (12 attributes). Timbral features included spectral centroid, spectral rolloff, spectral flux, and spectral kurtosis. Another 12 attributes were generated by a genetic algorithm, using the Sony Extractor Discovery System (EDS) [12] with simple regression as the population fitness criterion.

Labels of intensity from 0 to 9 were applied to instances by a human listener reporting the subjective emotional intensity, following exactly the human listening training method in the Microsoft patent [17].

The WEKA package [23] was used for machine learning. The results below were calculated using Support Vector Machine (SVM) regression.
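The timbral descriptors named above have textbook definitions, so a rough sense of the feature extraction stage can be given in Python with numpy. This is a sketch under our own assumptions (a Hann window, an 85% rolloff threshold, an onset-energy envelope for the tempo step), not a reproduction of the paper's toolchain; the wavelet octave decomposition [20] and the EDS-evolved attributes [12] are omitted.

    import numpy as np

    def spectral_features(frame, sr):
        """Timbral descriptors of one audio frame: centroid, rolloff, kurtosis.
        `frame` is a 1-D array of samples, `sr` the sample rate in Hz."""
        mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
        power = mag ** 2
        centroid = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Rolloff: frequency below which 85% of the spectral energy lies
        # (85% is a common convention, assumed here).
        cumulative = np.cumsum(power)
        rolloff = freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])]
        # Kurtosis of the magnitude spectrum [7].
        m, s = mag.mean(), mag.std() + 1e-12
        kurtosis = np.mean(((mag - m) / s) ** 4)
        return centroid, rolloff, kurtosis

    def spectral_flux(prev_frame, frame):
        """Frame-to-frame change in the magnitude spectrum."""
        a = np.abs(np.fft.rfft(prev_frame))
        b = np.abs(np.fft.rfft(frame))
        return np.sqrt(np.sum((b - a) ** 2))

    def estimate_bpm(onset_envelope, frames_per_second, lo=60, hi=180):
        """Rough tempo estimate: pick the autocorrelation peak of an
        onset-energy envelope within a plausible BPM range."""
        env = onset_envelope - onset_envelope.mean()
        ac = np.correlate(env, env, mode="full")[len(env) - 1:]
        lags = np.arange(1, len(ac))
        bpm = 60.0 * frames_per_second / lags
        valid = (bpm >= lo) & (bpm <= hi)
        best = lags[valid][np.argmax(ac[1:][valid])]
        return 60.0 * frames_per_second / best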
4.1.1. Results and analysis for emotion intensity

This experiment confirmed the results of Liu [10], which found that emotional intensity is highly correlated with rhythm and timbre features. We achieved almost 0.90 correlation (mean absolute error 0.09), the best features being BPM, sum of absolute values of the normed Fast Fourier Transform (FFT), and spectral kurtosis [7].

4.2. Disambiguation of emotion using text mining

In our compositional model of musical emotion, non-acoustic features such as lyric words or social contextual content play a role in focusing specific emotions, and help account for the range of emotional responses to the same song. One problem with this approach is the size of the vocabulary used in expressive lyrics, which could be over 40,000 different words for songs in English. Various feature-reduction strategies are used in classic machine learning, but it is not certain how well these apply to emotion detection. We chose an established approach, the General Inquirer [18], to begin to explore the available techniques for verbal emotion identification. This system was chosen for its good coverage of most English words and its compactness of representation, with 182 psychological features.

Of 152 30-second clips of Alternative Rock songs labelled with emotion categories by a volunteer, only 145 songs had lyrics. The emotion categories of the PANAS-X schedule [22] were used. Lyrics text files were transformed into 182-feature vectors using the General Inquirer package, and the WEKA machine learning package was used. The original text files were also examined using the Rainbow text mining package [11].
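Conceptually, the General Inquirer step is a lexicon lookup that turns each lyric into a fixed-length vector of category counts. The sketch below shows the shape of that transformation; the two categories and their word lists are invented stand-ins, since the real General Inquirer [18] defines 182 categories over most of English.

    import re

    # Hypothetical mini-lexicon: the real General Inquirer [18] maps most
    # English words into 182 psychological categories; these two are stand-ins.
    LEXICON = {
        "NegAffect": {"fight", "burn", "never", "no"},
        "PosAffect": {"love", "life", "feel", "hold"},
    }
    CATEGORIES = sorted(LEXICON)

    def featurize(lyric_text):
        """One fixed-length count vector per song: how many lyric tokens
        fall in each lexicon category."""
        tokens = re.findall(r"[a-z']+", lyric_text.lower())
        return [sum(t in LEXICON[c] for t in tokens) for c in CATEGORIES]

    print(featurize("Love me or fight me, never let go"))  # [2, 1]

In the experiments the real 182-feature vectors were passed to WEKA; any learner that accepts dense numeric vectors could be substituted.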
4.2.1. Results and analysis for text disambiguation

This experiment tested the idea that general psychological features driving emotions, such as seeking and attaining goals or reacting to threats, can be conveyed specifically in text and can add focus to the way music is interpreted.

Hostility: expletives, not, get, got, want, never, don't, go, no, my, oh, fight, burn, show, had, you
Sadness: love, life, time, say, slowly, hold, feel, said, say, go, if
Guilt: one, lost, heart, face, alone, sleep, mistake, memory, lies, eyes, die, silence, remember

Table 1. Lyric words that distinguish lyrics by negative emotion, i.e. have high information gain.

Hostility, Sadness and Guilt

The negative affect behaviours are related to threat avoidance, so words strongly related to distinguishing each negative emotion from the other negative emotions were ranked using their information gain [23]. The data shown in Table 1 include forms of hostile display in the form of threatening expletives, and sounds showing lack of constraint, such as ah, oh. Other words tended to connote commands or threats, such as no, don't, etc. There were references to 'weapons' and destruction, such as fire, burning, etc. Words that favoured guilt over hostility are related to waking/sleeping, mistakes, and reflection. The references to low-energy activities in guilt are interesting, considering that the music is as arousing as hostility.

Sadness was also interesting in strongly referring to positive words such as love, life, feel, etc. This higher correlation of sadness with positive affect is predicted by the emotion model.
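The information-gain ranking behind Table 1 treats each word as a binary presence/absence split over the labelled songs. A minimal sketch follows; the four toy documents and labels are invented, and the experiments themselves used WEKA's ranking [23]:

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a list of class labels, in bits."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(word, docs, labels):
        """Drop in label entropy from splitting songs on presence of `word`.
        `docs` is a list of token sets, `labels` the emotion class per song."""
        have = [y for d, y in zip(docs, labels) if word in d]
        lack = [y for d, y in zip(docs, labels) if word not in d]
        if not have or not lack:
            return 0.0
        n = len(labels)
        split = (len(have) / n) * entropy(have) + (len(lack) / n) * entropy(lack)
        return entropy(labels) - split

    # Toy corpus: rank a small vocabulary against two negative-emotion labels.
    docs = [{"fight", "burn"}, {"no", "fight"}, {"love", "life"}, {"love", "hold"}]
    labels = ["hostility", "hostility", "sadness", "sadness"]
    vocab = sorted(set().union(*docs))
    ranking = sorted(vocab, key=lambda w: -information_gain(w, docs, labels))
    print(ranking[:2])  # ['fight', 'love']: each splits the classes perfectly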
Table 2 shows the psychological features mined from lyrics using WEKA's implementation of C4.5. Informally, these features appear to make sense, and the machine learning has mined verbal patterns that one would expect to correspond with these negative emotions, such as needing-type words associated with anger.

Hostility: Words about no gain from love or friendship; Not being a participant in love or friendship; Words about not understanding; Words expressing a need or intent
Sadness: Loss of well-being; Words about a gain of well-being without color or relationship
Guilt: Saying-type words; Talking about gains from love or friendship; Passive-type words

Table 2. General Inquirer [18] psychological features of lyrics text that most distinguish lyrics by negative emotion in the WEKA C4.5 decision tree.

Love, Excitement, Pride

Table 3 shows the psychological features that WEKA mined from song lyrics associated with positive emotions. Informally, these results are recognizable in terms of our common-sense understanding of emotions.

Love: Not knowing-type words; Not political; Not loss of well-being; Not negative; Not failing; Gains from love and friendship; Passive-type word; Not saying-type word
Excitement: Gains of well-being from relationship; Animal-type words
Pride: Political words; Respect; Initiate change; Knowing-type words
Attentive: Knowing-type words; Color words
Reflective: Passive-type words
Calm: Completion-of-a-goal-type words

Table 3. General Inquirer [18] features that most distinguished lyrics by positive emotion in the WEKA C4.5 decision tree.
4.3. Disambiguation of emotion using data fusion

This experiment fused both acoustic and text features to maximise classification accuracy. There was an increase in successful classification accuracy from 80.7% to 82.8%, and a decrease in mean error from 0.033 to 0.0252. Root relative squared error decreased from 30.62% to 25.04%.

These results do not distinguish very much between the two procedures on such a small training set without testing, but the numbers do not contradict the informal discussion in the preceding section. A larger study would have more scope for examining the different types of errors in classification from the acoustic and text features.
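The fusion here can be read as early fusion: concatenating each song's acoustic and text vectors before training one classifier. Below is a minimal sketch with scikit-learn standing in for WEKA [23]; the feature matrices and labels are random placeholders, so the printed accuracies show only the plumbing, not any real result.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n_songs = 145
    acoustic = rng.normal(size=(n_songs, 40))   # placeholder acoustic features
    text = rng.normal(size=(n_songs, 182))      # placeholder General Inquirer features
    labels = rng.integers(0, 3, size=n_songs)   # placeholder emotion classes

    # Early fusion: one concatenated vector per song, one classifier.
    fused = np.hstack([acoustic, text])
    for name, X in [("acoustic", acoustic), ("text", text), ("fused", fused)]:
        accuracy = cross_val_score(SVC(), X, labels, cv=5).mean()
        print(f"{name}: {accuracy:.3f}")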
5. CONCLUSION AND FUTURE WORK

This paper evaluated a structured emotion rating model for embodiment in software agents to assist human annotators in the music annotation system Emo. In experiments we found the structured emotion model useful in the context of a compositional model of musical meaning and emotion, where text features focused attention on more specific music emotions. Experiments were designed to explore this model, and we focused on negative emotions, where there is the greatest ambiguity.

Results were given for a single-attribute test to rate emotion intensity (the sum of positive and negative energy in the model), based on 500 songs. About 90% accuracy was achieved using both timbral and rhythmic features. For learning to distinguish like-valenced emotions, a sample of 145 full-text lyrics showed promising results. Informally, the verbal emotion features based on the General Inquirer appeared to correlate with significant emotion experiences reported by listeners. The small sample size precluded robust testing at this exploratory stage. The allmusic.com [4] song browser, with thousands of songs classified by mood, could be one way to increase the sample size significantly.

Future work will investigate the way in which a compositional model of musical meaning and emotion can be deployed using graphical user interface devices. The system tracks the focus of attention as each emotion is experienced by the user, and the resulting annotation trees can be mined to help confirm the theory of music as compositional. Subtle shifts in cognitive focus can correspond with shifts in musical meaning and emotion. Different methods of verbal emotion identification will be investigated, as this is a new and rapidly growing area of machine learning research. The existing bootstrap database appears adequate, and with further use more songs will be added to the database, complete with appropriate stimulus material such as lyrics and cultural data. The emerging Semantic Web also provides further opportunities to find musical stimulus material by means of a shared music ontology [1]. Graphical media types are also relevant as stimulus material for popular music, such as music videos. Visual features could be extracted from music videos as MPEG-7 video descriptors and related to the function of each emotion. Some researchers believe that a more promising approach to rating emotion is to use direct physiological means [2]. This type of input could be added to the resource hierarchy of Emo.
6. REFERENCES

[1] Baumann, S. & Klüter, A., "Super-convenience for Non-musicians: Querying MP3 and the Semantic Web", Proceedings of the International Symposium on Music Information Retrieval, Paris, France, 2002.

[2] Cacioppo, J.T., Gardner, W.L. & Berntson, G.G., "The Affect System Has Parallel and Integrative Processing Components: Form Follows Function", Journal of Personality and Social Psychology, Vol. 76, No. 5, 839-855, 1999.

[3] Chai, W., "Using User Models in Music Information Retrieval Systems", Proceedings of the International Symposium on Music Information Retrieval, Plymouth, MA, USA, 2000.

[4] Datta, D., "Managing Metadata", Proceedings of the International Symposium on Music Information Retrieval, Paris, France, 2002.

[5] Downie, J.S., "Towards the Scientific Evaluation of Music Information Retrieval Systems", Proceedings of the International Symposium on Music Information Retrieval, Baltimore, MD, USA, 2003.

[6] ISO/IEC JTC1/SC29/WG11, "ISO/IEC 15938-6 Information Technology – Multimedia Content Description Interface (MPEG-7) – Part 6: Reference Software", Geneva, Switzerland, 2000.

[7] Kenney, J.F. & Keeping, E.S., Section 7.12 in Mathematics of Statistics, Pt. 1, 3rd ed., Van Nostrand, Princeton, NJ, USA, 102-103, 1962.

[8] Leman, M., "GOASEMA – Semantic Description of Musical Audio". Retrieved from http://www.ipem.ugent.be/.

[9] Leman, M., Vermeulen, V., De Voogdt, L., Taelman, J., Moelants, D. & Lesaffre, M., "Correlation of Gestural Music Audio Cues and Perceived Expressive Qualities", Lecture Notes in Artificial Intelligence, Vol. 2915, 40-54, Springer Verlag, Heidelberg, Germany, 2004.

[10] Liu, D., Lu, L. & Zhang, H.J., "Automatic Mood Detection from Acoustic Music Data", Proceedings of the International Symposium on Music Information Retrieval, Baltimore, MD, USA, 2003.

[11] McCallum, A., "Bow: A Toolkit for Statistical Language Modelling, Text Retrieval, Classification and Clustering". From www-2.cs.cmu.edu/~mccallum/bow.

[12] Pachet, F. & Zils, A., "Evolving Automatically High-Level Music Descriptors from Acoustic Signals", Lecture Notes in Computer Science, Vol. 2771, 42-53, Springer Verlag, Heidelberg, Germany, 2004.

[13] Russell, J.A., Weiss, A. & Mendelsohn, G.A., "Affect Grid: A Single-Item Scale of Pleasure and Arousal", Journal of Personality and Social Psychology, Vol. 57, No. 3, 495-502, 1989.

[14] Russell, J.A., "Core Affect and the Psychological Construction of Emotion", Psychological Review, Vol. 110, No. 1, 145-172, Jan 2003.

[15] Scherer, K.R., "Toward a Dynamic Theory of Emotion", Geneva Studies in Emotion, No. 1, 1-96, Geneva, Switzerland, 1987.

[16] Sloboda, J.A. & Juslin, P.N., "Psychological Perspectives on Emotion", in Juslin, P.N. & Sloboda, J.A. (eds.), Music and Emotion, Oxford University Press, New York, NY, USA, 2001.

[17] Stanfield, G.R., "System and Methods for Training a Trainee to Classify Fundamental Properties of Media Entities", US Patent Application No. 20030041066, Feb 27, 2003.

[18] Stone, P.J., The General Inquirer: A Computer Approach to Content Analysis, MIT Press, Cambridge, MA, USA, 1966.

[19] Tellegen, A., Watson, D. & Clark, L.A., "On the Dimensional and Hierarchical Structure of Affect", Psychological Science, Vol. 10, No. 4, July 1999.

[20] Tzanetakis, G., "Manipulation, Analysis and Retrieval Systems for Audio Signals", PhD Thesis, Princeton University, Princeton, NJ, USA, 2002.

[21] Watson, D., Mood and Temperament, Guilford Press, New York, NY, USA, 2000.

[22] Watson, D. & Clark, L.A., "The PANAS-X: Manual for the Positive and Negative Affect Schedule – Expanded Form". Retrieved from http://www.psychology.uiowa.edu/

[23] Witten, I.H. & Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, San Francisco, CA, USA, 2000.