Affect and Emotion in Human-Computer Interaction – From Theory to Applications
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
University of Dortmund, Germany
Madhu Sudan
Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Christian Peter · Russell Beale (Eds.)
Volume Editors
Christian Peter
Fraunhofer Institute for Computer Graphics
Joachim-Jungius-Str. 11, 18059 Rostock, Germany
E-mail: [email protected]
Russell Beale
University of Birmingham, School of Computer Science
Edgbaston, Birmingham, B15 2TT, UK
E-mail: [email protected]
ISSN 0302-9743
ISBN-10 3-540-85098-8 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-85098-4 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2008
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12445844 06/3180 543210
Preface
Interacting with computers is ever changing; the activities computers are used for, the
domains they are used in, the people who use them, and the way that they are used
have been constantly evolving. But over the past few years an additional dimension
has been added to this interaction, which considers the utility and effectiveness of
incorporating emotion into the interface. Over ten years ago, Rosalind Picard coined
the phrase “affective computing” for computing that relates to, arises from, or deliber-
ately influences emotion or other affective phenomena1. Since then, numerous re-
searchers all over the world have devoted themselves to studying the role affect and
emotion play in our interactions with technology. The affective computing community
is growing rapidly. A number of workshops, symposia, and even conferences make
emotion in HCI their subject. This book actually developed out of some of those
events. When Springer approached us to compile a book on the subject, we were just
organizing the third workshop on emotion in HCI at the British HCI Group’s annual
conference, held in the UK. While the workshop had numerous contributions from the
traditionally very active British and European community, we also received contribu-
tions from overseas.
We decided early on not to make this merely a book about the workshops, but to
broaden it: this book is intended to give a wider overview on the developments in the
field. We hence decided to also issue an open call for contributions. We aimed for a
balanced report with as wide a spectrum of research presented as possible, addressing
the topics
− Theoretical foundations (e.g., emotion representations, ethical and legal issues)
− Emotion and affect as input (e.g., sensor systems, multimodal sensor networks,
sensor fusion, data enhancement and analysis)
− Emotion and affect as output (e.g., desktop applications and agents, Web-based
services and applications, presence and smart environments, mobile applica-
tions, robots)
− User experience studies, usability, and design issues
− Community reports (e.g., on related research networks, national and interna-
tional research programs, or standardization efforts)
This list evolved out of our experiences at the workshops, reflecting the topics people
were most interested in. Looking at the contributions and comparing them with the list
above, many of our experiences at the workshops are confirmed: people show an interest in many aspects of this young discipline, as well as an awareness of its challenges and risks. There is something of a bias towards applications and studies, rather fewer contributions on sensing, and actually none on ethics (although at each workshop all participants agree this is a very important subject!). The community section was also largely ignored, which might be interpreted as the affective computing research landscape being fairly fragmented at present – but this scattering of work reflects the fact that this area is a relatively young, fast-moving field.
We received numerous high-quality submissions covering most of the themes, which made it very difficult to decide which should be included in the book and which should not. All papers were extensively refereed and revised by the authors, and if we had accepted every submission suggested for inclusion by the Scientific Committee members, the book would be nearly twice as big as it is now. So we had to
decide against many very good papers, from known and (as yet) unknown authors.
Finally, based on the accepted contributions, we restructured the book as follows:
– Theoretical Considerations: with contributions raising awareness on typical pit-
falls and shortcomings of currently common approaches, suggesting new views,
and working on well-known problems or open issues
– Sensing Emotions: namely, on speech-related issues and multimodal data fusion
– User Experience and Design: with interesting studies and suggestions on how to
motivate and make use of affect and emotion in real-world applications
– Affective Applications: the largest section, showcasing a wide range of projects
across diverse domains
We hope this fine selection of the state of the art will contribute to providing solid foundations for this fast-growing research field, help make it less fragmented, and give up-to-date orientation on developments in the domain.
We would like to thank all members of the Scientific Committee for their valuable
assistance, feedback, and suggestions, and all authors who took the time to write about
their fascinating ideas, projects, and results. Thanks also to the Springer team, for
giving us the opportunity to compile this volume as well as for their highly profes-
sional support. And finally: thanks to you for considering this book worth reading!
Scientific Committee
Elisabeth Andre Augsburg University, Germany
Ruth Aylett Heriot-Watt University Edinburgh, UK
Nick Campbell ATR, Japan
Lola Cañamero University of Hertfordshire, UK
Pabini Gabriel-Petit Spirit Softworks, USA
Roland Göcke Seeing Machines & Australian National University,
Australia
Kristina Höök KTH/SICS, Sweden
Nicola Millard British Telecom plc, UK
Ana Paiva Instituto Superior Técnico, Portugal
Karina Oertel Fraunhofer IGD Rostock, Germany
Marc Schröder DFKI, Germany
Jianhua Tao Chinese Academy of Sciences, China
John Waterworth Umeå University, Sweden
Robert Ward University of Huddersfield, UK
Ian Wilson neon.AI, Japan
Table of Contents
Introduction
The Role of Affect and Emotion in HCI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Russell Beale and Christian Peter
Theoretical Considerations
Don’t Get Emotional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Leysia Palen and Susanne Bødker
Computational Affective Sociology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
William Sims Bainbridge
Comparing Two Emotion Models for Deriving Affective States from
Physiological Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Antje Lichtenstein, Astrid Oehme, Stefan Kupschick, and
Thomas Jürgensohn
Consideration of Multiple Components of Emotions in
Human-Technology Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Sascha Mahlke and Michael Minge
Auditory-Induced Emotion: A Neglected Channel for Communication
in Human-Computer Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Ana Tajadura-Jiménez and Daniel Västfjäll
Sensing Emotions
Automatic Recognition of Emotions from Speech: A Review of the
Literature and Recommendations for Practical Realisation . . . . . . . . . . . . 75
Thurid Vogt, Elisabeth André, and Johannes Wagner
Emotion Recognition through Multiple Modalities: Face, Body Gesture,
Speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Ginevra Castellano, Loic Kessous, and George Caridakis
The Composite Sensing of Affect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Gordon McIntyre and Roland Göcke
Affective Applications
Affective Human-Robotic Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Christian Jones and Andrew Deeming
The Role of Affect and Emotion in HCI
Russell Beale and Christian Peter
Abstract. Affect and emotion play an important role in our everyday lives: they
are present whatever we do, wherever we are, wherever we go, without us being
aware of them for much of the time. When it comes to interaction, be it with
humans, technology, or humans via technology, we suddenly become more
aware of emotion, either by seeing the other’s emotional expression, or by not
getting an emotional response while anticipating it. Given this, it seems only
sensible to commit to affect and emotion in human-computer interaction, to in-
vestigate the underlying principles, to study the role they play, to develop
methods to quantify them, and to finally build applications that make use of
them. In this introductory chapter we discuss and give short accounts of present developments in the field, covering theoretical issues, user experience and design aspects, and sensing issues, and we report on some affective applications that have been developed.
1 Introduction
When one talks of giving emotions to computer systems, it is hard not to imagine the
science fiction futures embodied in books and film, that of computers becoming self-
aware sentient machines that then suffer the same dreams of self-importance or even
megalomania that afflict their human creators. These systems tend to go bad and try to
take over the world, and have to be brought back down to size by a hero who engages in some derring-do and in doing so reasserts the values of being properly human.
However, the reality is that most researchers in the field are not trying to produce this
future, real or imagined. Adding emotion into computers is not about making new
forms of artificially intelligent, human-like systems – as Rosalind Picard wrote more
than ten years ago:
“Computers do not need affective abilities for the fanciful goal of becoming hu-
manoids, they need them for a meeker and more practical goal: to function with intel-
ligence and sensitivity towards humans” ([1], p. 247).
protracted, ongoing relationships that we are now building with our computer sys-
tems, for it is becoming rarer to find one-time interactions, and much more common
to find areas in which we have much longer term involvement. These interactions
may be sporadic and disjointed, but they span increasing amounts of time: just con-
sider the increasing frequency with which people order groceries online, buy airline
tickets and use the online checking facility, or drive everywhere with their satellite
navigation system giving them directions. This increasing trend for extended, repeated
relationships with technology suggests that we need to consider further aspects of in-
teraction than have traditionally been covered, especially those that relate to longer-
term relationships, and hence emotion has become a very interesting area for many.
This is not to argue that all interactions should be emotionally-mediated, or even that
long-term ones need be so – we find it hard to envisage the benefits of an emotional
airline booking system, whether it was detecting our emotions, or displaying some of
its own. However, we do see some benefit of developing a stronger, more personal
relationship with an in-car navigation system, which could both learn the sorts of
routes preferred to drive and so offer more than the choice of shortest or fastest – and
one that nagged us if we were to drive too fast, especially if it was late at night and we
seemed tired, or stressed, may eventually convey some significant benefits.
The present book provides an account of the latest work on variety of aspects re-
lated to affect and emotion in human-technology interaction. It covers theoretical
issues, user experience and design aspects, sensing issues, and reports on some affec-
tive applications that have been developed.
The first five papers deal with general Theoretical Considerations. We start with
a paper from Leysia Palen and Susanne Bødker, Don’t get emotional! (Chapter 2).
While this sounds a little provocative in a book on affect and emotion, Palen and
Bødker are indeed right: they warn us that while the community is currently very euphoric about what has been done and what seems possible with the technologies at hand, we should be careful not to look for emotions everywhere and at all times. We should not put the focus on emotion in all our endeavours, but rather keep it on the interaction and overall experience, and on how it should be conceptualized in the particular context, including affective aspects. Palen and Bødker state that
“emotion is so large an idea that it can be descriptive of all kinds of interaction and
non-interaction, meaning that it is fundamental to these things, and so therefore it is
not powerful to us as analysts of human computer interaction. We need instead … to
rely on more pointed understandings of what interaction means in different contexts
and at different units of analysis, and give those experiences their proper notice and
place.” This article reminds us that affect and emotion are just parts of a bigger, very
complex whole which needs to be conceptualized anew for each considered scenario.
In fact, affect and emotion can, in many cases, be considered as “just” additional
contextual information for the system. If we keep this in mind when designing affec-
tive systems or applications, the chances increase for our products being accepted as
useful and meaningful. If not, if we put all our efforts in the emotion part and pass
over the traditional design aspects, it is likely that our system will not attract much
positive attention. If you are a designer and have proudly demonstrated your affective
product to ordinary people, you might have experienced that heartsinking feedback
from them: but what is it good for? We certainly have. And in the course of discussion it often turns out that the main value of the application is not the affective bit,
but the bigger idea behind it. The affective component just adds functionality that
makes the product friendly, nice, attentive, … more likable. And that’s what affective
computing should be all about: making systems more pleasant for the user, and the interaction with them a positive experience. The examples in the Applications section of
this book provide good evidence for this. There are times when the affective compo-
nent is stronger than this, however: for some applications, the incorporation of emo-
tion produces a much stronger experience for the user, and this may trigger some
significant change – an alteration in mood, or even in long-term behaviours. A func-
tionally similar system without the emotional contribution may not be sufficiently
engaging to demonstrate that effect.
William Bainbridge contributes an essay on personality and its influence on and importance for affective interaction (Computational Affective Sociology – Chapter 3). He encourages us to draw on the rich and well-grounded knowledge of the social sciences, suggesting that the use of sociological methods will benefit the design of interactive software.
He is in a sense in line with Palen & Bødker, arguing that personality is just another facet
in the colourful picture of human-computer interaction. Just as Palen and Bødker declare that knowledge of affect and emotion is meaningless for a system without knowledge of the context and the interaction involved, Bainbridge adds that knowledge of the user – here, specifically, of personality traits – is an important prerequisite for systems that analyze and make use of emotion-related information. In fact, one wonders how interaction designers could manage for so long without considering these aspects. Or, put the other way around, it becomes clearer why some people cannot get on with one product but are fine with others, while other people experience exactly the reverse. Consider the Big
Five personality model [3] favoured in Bainbridge’s article. Its five dimensions (Neuroti-
cism, Extraversion, Openness, Agreeableness, Conscientiousness) clearly specify aspects
of a user's behaviour patterns, the user's experience of a product, and the attitude towards it and the interaction with it. Neuroticism deals with the user's experience of emotions, i.e. whether the user is emotionally stable or tends to experience negative emotions more easily. Extraversion marks the tendency to be active and open to (interpersonal) interaction. Taking this further and considering Reeves' and Nass' remarks on computers as social actors [2], extraversion would indicate a person's tendency to actively engage with a system or to wait to be invited to interact by it. Openness adds to this a person's curiosity and openness to new ideas and experiences. For interaction, it indicates whether a user is inclined to explore a system's capabilities and new interactions out of curiosity, or whether the system is expected to stick to familiar interaction patterns. Agreeableness is about a person's tendency to be cooperative and trusting, or sceptical and hostile; for designing interactions this is of particular importance, since gaining the trust and cooperation of the user is often key to successful interactions. Conscientiousness, finally, describes the tendency to act diligently and in a planned manner versus superficially and spontaneously. This might be interesting for designing personalized agents or other supportive
applications. These five dimensions, even in their raw form, provide some interesting
concepts with which to analyse problems and interactions, and to guide the design of
solutions to them, depending on the people they are meant to interact with and the im-
pression that they are trying to convey.
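To make the link between trait scores and interaction design more concrete, here is a minimal Python sketch – our own illustration, not code from Bainbridge's chapter – of how a Big Five profile might be encoded and mapped to coarse design hints. The threshold values and hint names are arbitrary assumptions chosen only to show the idea.

```python
from dataclasses import dataclass

@dataclass
class BigFiveProfile:
    """Big Five trait scores, each normalised to the range 0.0-1.0 (assumed scale)."""
    neuroticism: float
    extraversion: float
    openness: float
    agreeableness: float
    conscientiousness: float

def suggest_interaction_style(p: BigFiveProfile) -> dict:
    """Map trait scores to coarse interaction-design hints (illustrative only)."""
    return {
        # Emotionally less stable users may need gentler, reassuring feedback.
        "soften_negative_feedback": p.neuroticism > 0.6,
        # Less extraverted users may need to be invited into interactions.
        "proactive_prompts": p.extraversion < 0.4,
        # Open users are more willing to explore novel interaction patterns.
        "expose_experimental_features": p.openness > 0.6,
        # More sceptical users may need additional trust cues.
        "show_trust_indicators": p.agreeableness < 0.4,
        # Conscientious users tend to appreciate planning aids.
        "offer_planning_tools": p.conscientiousness > 0.6,
    }

profile = BigFiveProfile(0.7, 0.3, 0.8, 0.5, 0.4)
print(suggest_interaction_style(profile))
```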
The fourth chapter, Comparing Two Emotion Models for Deriving Affective States
from Physiological Data, by Lichtenstein et al., also elaborates on research results from other domains and their applicability in the HCI context. Their focus is on the applicability of emotion models which have their roots in psychology. Such models are often applied to study affect or emotion in the HCI domain without thorough reflection. That the choice of the right model is a delicate issue has been addressed before by others [4, 5, 6], as has the question of whether models from psychology can be used in HCI without alteration [e.g. 7]. Lichtenstein et al. now provide a study comparing the two emotion models most commonly used in HCI, the Basic Emotion model [8] and the dimensional approach of valence and arousal [9]. The aim of their study was to investigate the models' suitability for self-assessment of emotions as well as for deriving affective states from physiological readings. Their results can be used to negotiate between the advocates of one model or the other: according to Lichtenstein et al., the Basic Emotion model is better suited for assigning pre-selected states to physiological patterns, while the dimensional approach has been found to be more accurate for self-assessments.
On the other hand, the participants of their study found it more difficult to assess their
affective state using Lang’s [10] self-assessment manikins (SAM) used for valence
and arousal measures, compared to pictures and descriptive words for Ekman’s basic
emotions. How these findings affect each other (i.e. how accurate are results which
are based on a model assessed as difficult to use, and how useful are results based on
pre-selected states) is also briefly discussed by Lichtenstein et al., but remains an open question for now.
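To make the contrast between the two representations tangible, the following Python sketch – our own illustration, not material from Lichtenstein et al. – encodes an affective state both as a categorical Basic Emotion label [8] and as a point in the valence-arousal plane [9]. The coordinates are rough, assumed placements for illustration, not empirical values from either model.

```python
from enum import Enum

class BasicEmotion(Enum):
    """Ekman's basic emotion categories [8]."""
    ANGER = "anger"
    DISGUST = "disgust"
    FEAR = "fear"
    HAPPINESS = "happiness"
    SADNESS = "sadness"
    SURPRISE = "surprise"

# Dimensional representation [9]: (valence, arousal), each assumed in [-1.0, 1.0].
# The placements below are rough illustrative guesses, not values from the chapter.
APPROXIMATE_PLACEMENT = {
    BasicEmotion.ANGER:     (-0.7,  0.8),
    BasicEmotion.DISGUST:   (-0.6,  0.3),
    BasicEmotion.FEAR:      (-0.8,  0.7),
    BasicEmotion.HAPPINESS: ( 0.8,  0.5),
    BasicEmotion.SADNESS:   (-0.7, -0.5),
    BasicEmotion.SURPRISE:  ( 0.2,  0.8),
}

def to_dimensional(label: BasicEmotion) -> tuple:
    """Convert a categorical label to an approximate (valence, arousal) point."""
    return APPROXIMATE_PLACEMENT[label]

print(to_dimensional(BasicEmotion.FEAR))  # -> (-0.8, 0.7)
```

One practical consequence of the study's findings is visible even in this toy form: mapping from the categorical to the dimensional representation is straightforward, while the reverse mapping from a continuous point back to a single basic emotion is ambiguous.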
Chapter 5, Consideration of Multiple Components of Emotions in Human-
Technology Interaction builds on another model to describe emotions. Mahlke and
Minge use Scherer’s appraisal approach [11] to analyze emotions in a multi-
component fashion. According to Scherer’s component process model, emotional
states can be seen as being composed of states of five components: cognitive ap-
praisal, physiological activation, motor expression, behaviour intentions, and subjec-
tive feeling. Mahlke and Minge now make use of these five components to better
distinguish between the effects each of them has on the overall emotional experience.
While this approach faces new challenges in accessing the particular information, we
think it opens a new perspective for better understanding and exploiting emotion
information in interactive systems.
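As a minimal illustration of what a multi-component emotion record might look like inside an interactive system, here is a Python sketch with one field per component of Scherer's model [11]. The field contents, value ranges, and example measures are our own assumptions and not Mahlke and Minge's operationalization.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class EmotionComponents:
    """One observation of an emotional episode, split into the five components
    named above; the keys and 0.0-1.0 value ranges are illustrative assumptions."""
    cognitive_appraisal: Dict[str, float] = field(default_factory=dict)
    physiological_activation: Dict[str, float] = field(default_factory=dict)
    motor_expression: Dict[str, float] = field(default_factory=dict)
    behaviour_intention: Dict[str, float] = field(default_factory=dict)
    subjective_feeling: Dict[str, float] = field(default_factory=dict)

# Example: a frustrating episode during a form-filling task (invented values).
episode = EmotionComponents(
    cognitive_appraisal={"goal_conduciveness": 0.2},
    physiological_activation={"heart_rate_increase": 0.6},
    motor_expression={"brow_furrow": 0.8},
    behaviour_intention={"abort_task": 0.7},
    subjective_feeling={"frustration": 0.9},
)
print(episode.subjective_feeling)
```

Keeping the components separate in this way is what allows their individual contributions to the overall emotional experience to be compared, which is exactly the distinction Mahlke and Minge are after.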
Taking this section's papers that discuss psychological emotion theories together, we can see that their application to interactive systems analysis and design raises as many questions as it answers – there seems to be no one right model, but a range of
approaches that all contribute different facets to our understanding. Indeed, one of the
other open questions in the field is whether there needs to be an expansion or alterna-
tive modelling paradigm for human-computer emotional modelling – in other words,
are there key differences between human emotions and human-computer emotions?
The last contribution to the Theoretical Considerations draws our attention to the
audio channel. Tajadura-Jiménez and Västfjäll give an account of the so-far quite rare research on the influence of sound on affect and present the results of a study they
performed. In Auditory-Induced Emotion: A Neglected Channel for Communication
in Human-Computer Interaction (Chapter 6) they suggest that acoustic stimuli can be
used to influence a user’s emotional state in a gentle, almost unnoticeable manner.
Since humans can and often do perceive sounds without turning their attention to
them, this channel seems well suited to stimulating a certain mood or emotional
state (just think of the different music styles people pick for different activities). To
make use of auditory induced emotion, Tajadura-Jiménez and Västfjäll suggest
considering four aspects of affective reactions to sound which they call the physical,
psychological, spatial, and cross-modal determinants. Physical aspects refer to quali-
tative characteristics of a sound, e.g. intensity and frequency. While it seems obvious that a sound becomes more unpleasant the louder it gets, it is interesting to learn that
certain frequency bands are experienced differently as pleasant or unpleasant depend-
ing on the intensity of the signal. The psychological determinants which deal with the
subjective judgment of a sound are more difficult to assess. There are differences in
perceiving artificial sounds (e.g. the fan in a laptop computer) and natural sounds (e.g.
a dog barking, birdsong, or footsteps). Additionally, natural sounds can be divided
further as to their affective evaluations, depending on their sources and our emotional
associations with them. It also seems clear that the emotional evaluation of a sound
should be strongly dependent on the context the user is in. The full sound of a car engine might be enjoyable for the young driver, but not for the person just crossing the road. The sound of footsteps might be a relief for someone who finds himself locked in a museum, but scary for someone who thought they were alone at home. So again, context and emotion seem to be very closely related to each other.
For the spatial dimension, Tajadura-Jiménez and Västfjäll show that spatial informa-
tion is important for emotional assessment of auditory information and that the sub-
jective sense of presence will increase with the spatial resolution of the sound. From
this it can be concluded that the degree of immersion in a task or activity can be influenced by the spatial characteristics of the sound used. Finally, the cross-modal determinant is about how an affective evaluation of information from one channel
(e.g. sound) interferes with the affective evaluation of information from other chan-
nels, like sight or touch. For instance, there is evidence for the perception of visual
information being influenced by affective sounds. This underlines the importance of
the audio channel when it comes to effectively conveying affective or emotional
information.
Vogt et al. also deal with the audio channel as they open the Sensing section with
an overview and guidelines on how to address the technical challenges of extracting
emotion information from speech in the HCI context (Chapter 7: Automatic Recogni-
tion of Emotions from Speech). They begin with more general theoretical and practical
considerations of emotion recognition in speech and point out that all results reported
in scientific papers so far are only of limited value for real-world application design-
ers, due to a number of restrictions. First, most of them work on acted data – even
acted by trained actors with often exaggerated, single emotions. This makes the cor-
pus less valuable for real-world application designers since in real life emotions are
usually quite shallow and might overlap with each other (known as “blended” or
“mixed emotions”). Second, reported results are often not comparable due to different
underlying corpora and/or the different application fields they have been designed for,
so their results can only provide for interesting considerations for building one’s own
dedicated corpus. Last but not least there are high requirements on speed, robustness
and accuracy of real-time emotion recognition algorithms, since failure of one of
these might result in the application being not acceptable to users. But Vogt et al. also
raise hope. They suggest that it does not always need to be the best algorithm avail-
able, that not all speech features need to be considered, and that one could indeed try
to work with one of the standard (i.e. staged) corpora to train one’s classifiers, and
one can get satisfying results nonetheless. It all depends on the requirements of the
application and the context. Emotion recognition in general is not perfect – neither by machines nor by humans. And as we have learned before, emotion informa-
tion is just one piece of the bigger picture, so it should be sufficient for emotion
recognition to provide just hints on this ever-changing element of the context.
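The pragmatic strategy outlined above – a small, coarse state set, a modest feature vector, and a standard, robust classifier – can be sketched in a few lines of Python. The scikit-learn toolkit, the feature layout, and the toy data below are our own assumptions, not the setup used by Vogt et al.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row: [mean_pitch_hz, pitch_range_hz, mean_energy, energy_variance, speech_rate]
# (a deliberately small, assumed feature set; real systems would extract these from
# audio and train on hundreds of labelled utterances from a corpus).
X_train = np.array([
    [220.0, 80.0, 0.60, 0.020, 4.5],   # agitated-sounding sample
    [180.0, 30.0, 0.35, 0.005, 3.0],   # calm-sounding sample
    [240.0, 90.0, 0.70, 0.030, 5.0],   # lively-sounding sample
])
y_train = ["negative", "neutral", "positive"]   # coarse, quickly separable states

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

# At runtime the classifier provides a hint about the current state rather than
# a definitive emotion label.
new_utterance = np.array([[230.0, 85.0, 0.65, 0.025, 4.8]])
print(model.predict(new_utterance))
```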
The two other articles in the Sensing section report on approaches to increase accuracy
of emotion recognition by merging results from different modalities. Castellano et al.
(Emotion recognition through multiple modalities – Chapter 8) merge information from
facial expressions, body gestures and speech. They report on two approaches to this.
While in the first they perform modality-specific classifications and merge the results, in the second they put the features of all modalities into one large feature set and work with
these merged data. They found that by merging multiple modalities, recognition rates can
increase by up to thirty percent compared to classification results of single modalities.
Comparing the two approaches, they observed that merging data at the feature level
yields slightly better results than merging them at the decision level, at least for their
specific setup.
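The difference between the two fusion strategies can be shown with a small Python sketch; the synthetic data, feature dimensions, and choice of classifier below are illustrative assumptions and not the authors' actual setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=60)                         # two emotion classes
face  = rng.normal(size=(60, 5)) + labels[:, None] * 0.8     # facial features
voice = rng.normal(size=(60, 4)) + labels[:, None] * 0.5     # acoustic features

# Feature-level fusion: concatenate the modalities and train one classifier.
early = LogisticRegression(max_iter=1000).fit(np.hstack([face, voice]), labels)

# Decision-level fusion: one classifier per modality, combine their outputs
# (here by simply averaging the per-class probabilities).
clf_face  = LogisticRegression(max_iter=1000).fit(face, labels)
clf_voice = LogisticRegression(max_iter=1000).fit(voice, labels)

def late_fusion_predict(f, v):
    probs = (clf_face.predict_proba(f) + clf_voice.predict_proba(v)) / 2.0
    return probs.argmax(axis=1)

print("feature-level:", early.predict(np.hstack([face[:5], voice[:5]])))
print("decision-level:", late_fusion_predict(face[:5], voice[:5]))
```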
While Castellano et al. use acted data and put restrictions on subjects’ clothing,
faces, and the background, the focus of McIntyre and Göcke’s Chapter 9 is on emo-
tion detection in real-world, natural settings. In The Composite Sensing of Affect they
remark that affective expressions are very dynamic, versatile, and specific to each individual, and are influenced by the context the person is in (as we have already learned). McIntyre and Göcke suggest dealing with those challenges by providing
semantic clues based on background information (e.g. social, cultural, context) for
reasoning about affective content in the collected sensor data. To support this process,
they propose a generic model for affective communication accompanied by a domain-
specific ontology, which offers a way to include context information into affective
reasoning.
To sum up the sensing section, we have learned that, with the current state of the
art, it is still not possible to accurately infer the whole range of affective states in real-
time, in real-world settings. The main reasons for this are: first, that the corpora avail-
able to train emotion classifiers are not of sufficient quality for use in general
real-world applications; second, the quality of the sensing devices is still at laboratory
level, requiring well-controlled environments to provide good measurements; and
third, the sheer complexity of the real world with its effect on affect and emotion in
the human exposed to it. We have also learned that, because of this, we might be better off restricting ourselves to detecting just a few states which can be distinguished quickly and reliably with robust algorithms, instead of trying to identify a variety of states with high uncertainties and long processing times. A common theme emerges: don't be too ambitious in trying to always identify an emotion – your system will (or at
least should) work as well without emotion updates every few seconds. The affective
applications portrayed in this book are good examples of this.
The User Experience and Design section starts with an investigation in design
and its influence on people’s emotional experiences (Chapter 10: Emotional Experi-
ence and Interaction Design). Just as Don Norman and Andrew Ortony have de-
scribed [12, 13], Lim et al. work out that emotions can indeed be designed, and
support this with experimental results. In their study, they examine product quality
based on Norman’s three levels of emotional response (visceral, behavioural, reflec-
tive), by use of semantic differentials. They found that the interaction-related features
of a product can significantly influence the emotional experience. They also found
that the functional as well as the interaction qualities of a product are directly linked to emotional experiences, leading to the conclusion that emotion cannot be considered independent of functionality and interaction, but results from these. This is a strong hint
to designers that emotions need to be consciously designed into products and systems,
since people seem to perceive emotions and their effects even if they are not explicitly
present.
In Chapter 11, How is it for you?, Walker & Prytherch motivate us to make use of psychologists' insights from behaviour theory and to consider users' responses to our designs as a result of the mental processes of perception, cognition, and emotion. They even suggest that design, by shaping users' experiences, can influence users' behaviour, which can be exploited to motivate the user. Highly motivated users will respond more interactively to a system, which in turn results in better user involvement and understanding of the product. Since motivated users might be more prepared and
willing to cope with usability problems, Walker & Prytherch argue for extending our
perception of usability to include user motivation.
Porat and Tractinsky contribute a model for studying the influence of design characteristics on users' affect and emotion in Chapter 12 (Affect as a Mediator between Web-Store Design and Consumers' Attitudes towards the Store). They propose to apply Mehrabian & Russell's environmental psychology model [14] and add HCI design aspects to it. The Mehrabian-Russell model of environmental influence is well suited here since it allows HCI design variables to be treated as "environmental parameters" influencing people's emotional states, leading to an "approach/avoidance response", which in the given context corresponds to staying with or leaving a particular web site or, more generally, using or rejecting a product or service. Porat and Tractinsky
offer nine propositions regarding antecedents and consequences of emotion, specifi-
cally in the context of e-commerce, most of which apply also to other products and
services.
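As a toy illustration of the mediation chain described above – design attributes influencing pleasure and arousal, which in turn drive an approach or avoidance response – consider the following Python sketch. The attribute names, weights, and thresholds are invented for illustration and are not part of Porat and Tractinsky's propositions.

```python
def affective_response(design: dict) -> tuple:
    """Map design ratings (assumed 0.0-1.0) to rough pleasure/arousal estimates."""
    pleasure = 0.6 * design["aesthetics"] + 0.4 * design["usability"]
    arousal = 0.7 * design["visual_complexity"] + 0.3 * design["interactivity"]
    return pleasure, arousal

def approach_or_avoid(pleasure: float, arousal: float) -> str:
    """Approach (stay, browse, buy) versus avoidance (leave the site)."""
    return "approach" if pleasure > 0.5 and arousal < 0.9 else "avoid"

store = {"aesthetics": 0.8, "usability": 0.7,
         "visual_complexity": 0.4, "interactivity": 0.6}
print(approach_or_avoid(*affective_response(store)))   # -> "approach"
```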
While Porat & Tractinsky focus on e-retail applications, Harbich & Hassenzahl
emphasize the need for positive affective experiences to improve the overall quality of products for the workplace. In Chapter 13: Beyond Task Completion in the Workplace, they argue that traditional usability measures should be extended with non-utilitarian aspects. Their e4 (for execute, engage, evolve, expand) model of user experience is especially targeted at workplace products, which often suffer from having to be used rather than being products people want to use, as e-stores, games, or social network sites are. Particularly for these products, it is essential that their value does not
draw solely on the functional, utilitarian aspects, but also on hedonistic qualities
which are perceived by the user independently of the actual task to be performed. The
e4 approach is behaviour oriented, with its four components complementing each
other. The Execute component considers traditional aspects related to accomplishing a
task like effectiveness and efficiency. The Engage component is about motivating the
user to use the product, ideally enjoying it and looking forward to using it in the future, regardless of its efficiency. An example of this is a user who prefers one text proces-
sor over another, despite the fact that composing and formatting the text might actu-
ally take longer with it. The Evolve component considers the desire of humans to
explore, manipulate and adjust their environment to their ideas. The design of a
product should invite exploration of its functionality and offer ways to do so in an
unobtrusive manner. With the text processor example, this could be the program
offering alternative or advanced editing or formatting options related to the user’s
context (with the user’s affective state being part of that context). The fourth compo-
nent, Expand, can be considered as the icing on the cake, sweetening and intensifying
the effect of the other three components. If users have the opportunity to develop new
working methods or usage scenarios with existing products, they feel much more involved in the whole working process, giving them the feeling that their creativity is appreciated, which will in turn make them more engaged and motivate them to
evolve further approaches. A text processor offering possibilities to, for example,
create document templates or dedicated plug-ins would be an example here.
The final chapter in this section explores in detail how users respond to displays of
emotion from affective agents (Simulated Emotion in Affective Embodied Agents –
Chapter 14). Creed and Beale review the research in this space, highlighting a lack of
comparative studies between emotional and unemotional agents and hence a lack of
definitive results to demonstrate what effects are due to emotion and what to other
causes. They also highlight that emotions are developed over time, and so longer-term
studies are needed in order to properly develop an understanding of this. They note
that in human-human interaction, people we like and trust exert a much stronger in-
fluence on us, and examine whether this is true for human-affective agent interaction
too. Could agents that are emotionally expressive and empathetic be better able to exert an influence on users and effect more significant changes in their behaviour than unemotional ones? They test this through an experiment that first checks whether users can recognize the emotions represented by the agent, and then tests user perceptions of the emotionally expressive agent as well, finding that it was
more likeable and caring. This experiment forms the basis of a future longitudinal
experiment into behaviour change, attempting to improve people’s eating habits over
a long course of interaction. This work ties in with the tenets of Walker & Prytherch,
explored earlier, in that emotionally engaged users are more likely to be affected than
those less engaged.
The Affective Applications section presents a series of contributions on systems
that utilize affective components to achieve some significant interaction effect. These
cover a range of domains, from call centres to gaming to car driving. In Chapter 15:
Affective Human-Robotic Interaction Jones and Deeming look at the future for affec-
tive entertainment in the home through the development of a robotic dog that can
recognize emotional states of its owner through acoustic cues in the owner’s speech,
and perform appropriate actions – for example, if the owner is perceived as being
happy, the dog makes a playful bark and wags its tail. This makes it appear more
emotionally intelligent and ‘alive’. Trying to work with the mood of call centre em-
ployees is the focus of Millard and Hole’s research (Chapter 16: In the Moodie: Using
‘Affective Widgets’ to Help Contact Centre Advisors Fight Stress). They create ani-
mated, affective widgets called Moodies that can be released to wander over the
screen as a way of combating stress in advisors. These widgets allow the advisors to
use humour to alleviate tension and stress by allowing the moodie to stomp about on
the screen. By producing a visual record of their internal emotional turmoil, they are
also able to indicate this to their colleagues and so get support from them as well.
In Chapter 17: Feasibility of Personalized Affective Video Summaries Money and
Agius look at a very different application: they use physiological measurements of
emotional response, and ascertain that users have significant, measurable affective
responses to different video segments. This allows them to identify emotionally sig-
nificant changes in video streams and opens the way for a personalized video sum-
mary that is based on key moments in the video that generate an emotional response
in the viewer. This differs from the more usual approach of analysing the semantic
content of the video. Jones and Sutherland (Acoustic Emotion Recognition for Affec-
tive Computer Gaming – Chapter 18) also use recognition of user emotion (in
speech), but they take their recognized emotions and map them onto a character in a
computer game, changing the characteristics of that avatar. When it has to undertake some energetic activity, the character is more likely to succeed if it is imbued with
upbeat, lively emotions – conversely, if the task requires slow, careful movement and
planning, then downbeat, quiet emotions make it more likely that it can be successful.
This game was evaluated with users, who found that it made the game more engaging,
though the input of emotional state through talking to the game made some users
uncomfortable. These two applications, one using detailed physiological measure-
ments, and the other a simpler voice recognition approach, do demonstrate the wide
range of potential input approaches that can be used.
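The core idea behind Money and Agius' physiological summarisation, mentioned above, can be sketched in a few lines of Python: find the windows of a recorded signal in which the affective response is unusually strong and keep them for the summary. The signal type, window length, and threshold below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def affective_segments(signal: np.ndarray, sample_rate: float,
                       window_s: float = 5.0, z_threshold: float = 1.5) -> list:
    """Return (start_s, end_s) windows whose mean response is unusually high."""
    win = int(window_s * sample_rate)
    means = np.array([signal[i:i + win].mean()
                      for i in range(0, len(signal) - win, win)])
    z = (means - means.mean()) / (means.std() + 1e-9)
    return [(i * window_s, (i + 1) * window_s)
            for i, score in enumerate(z) if score > z_threshold]

# e.g. a skin-conductance trace sampled at 8 Hz over a 10-minute clip
gsr = np.random.default_rng(1).normal(size=8 * 600)
gsr[2400:2480] += 3.0                 # simulate a strong response around minute 5
print(affective_segments(gsr, sample_rate=8.0))   # e.g. [(300.0, 305.0), (305.0, 310.0)]
```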
Loviscach & Oswald present a system that uses either sensor input or the state of a
computer game to determine probable emotional state, and then uses this information
to select and play appropriate music (Chapter 19: In the Mood: Tagging Music with
Affects). The user tags some music as it is entered into the system, which is used as the
basis for an automatic tagging of the rest of the music library based on feature extrac-
tion and pattern matching with the tagged items. This has some similarities to the work
by Voong & Beale [15] on tagging music with colour to represent emotion, though the
playback mechanisms in their work are explicitly user defined, not calculated.
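The auto-tagging step lends itself to a simple nearest-neighbour sketch in Python; the feature set, mood labels, and values below are invented for illustration and do not reflect the actual features or matching scheme used by Loviscach & Oswald.

```python
import numpy as np

# Each feature vector: [normalised tempo, energy, mode (0 = minor, 1 = major)]
seed_features = np.array([[0.9, 0.8, 1.0],    # track the user tagged "upbeat"
                          [0.3, 0.2, 0.0]])   # track the user tagged "melancholic"
seed_tags = ["upbeat", "melancholic"]

def auto_tag(track_features: np.ndarray) -> list:
    """Assign each untagged track the tag of its most similar tagged seed."""
    tags = []
    for vec in track_features:
        distances = np.linalg.norm(seed_features - vec, axis=1)
        tags.append(seed_tags[int(distances.argmin())])
    return tags

library = np.array([[0.8, 0.7, 1.0],    # untagged track, close to the first seed
                    [0.4, 0.3, 0.0]])   # untagged track, close to the second seed
print(auto_tag(library))                # -> ['upbeat', 'melancholic']
```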
Chapter 20, the final paper in this section, by Jones and Jonsson (Using Paralinguistic Cues in Speech to Recognise Emotions in Older Car Drivers), considers the detection of emotion in car drivers, in particular older ones, and demon-
strates that it is possible to create a system that can recognize and track the emotional
state of the driver. This information can then be used to modify the car’s navigational
voice so as to respond more appropriately to the driver. This system therefore has the
potential to de-stress and relax a tense driver, or to make them happier about the con-
ditions, which increases their concentration and hence makes them safer. Emotional
systems have clearly moved on from just being interesting add-ons: this demonstrates
that they have an active role to play in safety-critical situations as well.
Looking at the papers collectively in this volume, we can draw some general con-
clusions. Firstly, the case for considering affect in interaction design seems to have
been successfully made, and won – affect exists within systems whether we design it
there or not, and leveraging it appropriately can lead to more engaged users who are
more committed to the systems and more affected by their influences. Secondly, the
field is complex: trying to develop independent scientific metrics for emotional meas-
ures often requires controlled laboratory-style setups, in which users may not behave
as naturally, and in which the complex interplay of different elements that is critical to
the overall effect is lost. Conversely, other more holistic experiments demonstrate some-
thing useful but potentially domain-specific, and obtaining generalisable results is hard.
We have also seen, however, that if we consider emotion as a component in an interac-
tion, our often coarse and imprecise measurements can provide sufficiently useful data
for us to be able to make a significant impact on the quality of the interaction: we do not
need perfection in detection or understanding in order to make a useful difference. But
the major lesson is that there is a large amount of work that remains to be done before
we can claim to understand, model, design and use emotion effectively in interactive
systems.
References
1. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
2. Reeves, B., Nass, C.: The Media Equation: How People Treat Computers, Television, and
New Media Like Real People and Places. The Center for the Study of Language and Infor-
mation Publications (1996)
3. Wiggins, J.S. (ed.): The five-factor model of personality: Theoretical perspectives. Guilford,
New York (1996)
4. Russell, J.A.: How shall an emotion be called? In: Plutchik, R., Conte, H. (eds.) Circumplex
Models of Personality and Emotion, pp. 205–220. APA, Washington (1997)
5. Douglas-Cowie, E., et al.: HUMAINE deliverable D5g: Mid Term Report on Database Exemplar Progress (2006), http://emotion-research.net/deliverables/D5g%20final.pdf
6. Schröder, M., Devillers, L., Karpouzis, K., Martin, J.-C., Pelachaud, C., Peter, C., Pirker, H.,
Schuller, B., Tao, J., Wilson, I.: What should a generic emotion markup language be able
to represent? In: Paiva, A., Prada, R., Picard, R.W. (eds.) ACII 2007. LNCS, vol. 4738,
pp. 440–451. Springer, Heidelberg (2007)
7. Peter, C., Herbon, A.: Emotion Representation and Physiology Assignments in Digital Sys-
tems. Interacting With Computers 18(2), 139–170 (2006)
8. Ekman, P.: An argument for basic emotions. Cognition and Emotion 6 (3/4) (1992)
9. Russell, J.A.: A Circumplex Model of Affect. Journal of Personality and Social Psychol-
ogy 39, 1161–1178 (1980)
10. Lang, P.J.: Behavioral treatment and bio-behavioral assessment: computer applications. In:
Sidowski, J.B., Johnson, J.H., Williams, T.A. (eds.) Technology in Mental Health Care De-
livery Systems, pp. 119–137. Ablex, Norwood (1980)
11. Scherer, K.R.: On the nature and function of emotion: A component process approach. In:
Scherer, K.R., Ekman, P. (eds.) Approaches to emotion, pp. 293–317. Erlbaum, Hillsdale
(1984)
12. Norman, D., Ortony, A.: Designers and Users: Two perspectives on emotion and design. In:
Proc. of the Symposium on Foundations of Interaction Design at the Interaction Design Insti-
tute, Ivrea, Italy (2003)
13. Norman, D.A.: Emotional Design: Why We Love (Or Hate) Everyday Things. Basic Books
(2005)
14. Mehrabian, A., Russell, J.A.: An Approach to Environmental Psychology. MIT Press, Cam-
bridge (1974)
15. Voong, M., Beale, R.: Music organisation using colour synaesthesia. In: CHI 2007 extended
abstracts on Human factors in computing systems, pp. 1869–1874. ACM, San Jose (2007)
Don’t Get Emotional
1 Introduction
It also was not long ago when the human-computer interaction community dis-
missed usability outcomes that amounted to claims such as, “Our users thought that
was cool!” or “It made them feel happy.” Where was the substance in that kind of
assessment, many wondered. Now, with emotion having come into vogue, we might
judge these remarks to be newly appropriate indications of a system’s viability and
suitability.
Re-interpreting these claims that many suspected were important, but could not
validate on the terms we had historically come to expect in the field of human-
computer interaction, is a worthy objective. Defining what emotions are and envision-
ing what we might do with them in human-computer interaction, is far from trivial.
However, existing side-by-side with these benefits is a danger. By foregrounding
emotion, we place it as a feature of interaction, a sub-component of a greater whole,
and perhaps to the exclusion of other features of interaction. We submit that it is more
powerful to simply consider emotion as a fundamental quality of interaction; in other
words, to the extent that there are different, discernable features of interaction, emo-
tion underlies all of them. Marking emotion has the surprising consequence of trivial-
izing its function, and, in some cases, problematizes how we then address emotion in
design and critique.
We further submit that emotion has come into vogue as a dichotomic reaction to
the troublesome emphasis put on efficiency and productivity in human computer inter-
action for so long. This focus on emotion could be just as problematic.
And so, with our tongue-in-cheek title, we turn the otherwise unpleasant directive
on its head by claiming that emotion is far too important and integral to treat as some-
thing distinct in interaction. We use this paper to argue why we should temper the
attention given to emotion, explore alternatives, and consider the challenges brought
on by this current focus in human-computer interaction (HCI).
3 Tempest in a Teapot
We consider a few of the perspectives that have informed the attention to emotion, as
well as some alternative perspectives that differently address some of the same mat-
ters. Following the cognitivist frame represented by Norman’s teapot [14]—which we
figuratively (and apologetically) borrow for our subtitle to illustrate the fervor the
topic of emotion has reached—we describe Boehner, DePaula, Dourish, and Sengers' [3]
work that represents the interactionist frame.
With his book, Emotional Design [14], Don Norman offered an outline for how
HCI might deal with emotion. Pointing out that design is emotional was an important
step that helped bring additional attention to existing ideas of affective computing [18],
and we believe, like many others, that this issue is deserving of further attention.
First, however, we will look at how Norman deals with interaction in relation to
emotion. He discusses at length his collection of teapots, and in particular, his favorite
teapot. In this discussion, he moves the discussion of emotion beyond primary emotions
of, for example, fear, and beyond the immediacy of affect and arousal. Norman con-
nects emotions to experiences and things, and he points out how in addition to relating
to things cognitively, human beings relate to them emotionally.
As the emotional response to digital artifacts started to become an acceptable topic of
the science and design of HCI, we saw the rise of another view: that emotion is not an
isolated experience resident in the head of a single person; it is instead an interactional
and social experience. Boehner et al. [3] point out how Norman is constrained to the
cognitivist paradigm, which means that he sees emotions as add-ons to cognition.
Boehner et al. [3] tackle the topic of emotion from a social and interactionist point
of view, arguing that meaning in emotion is generated by the interaction of people,
and that emotion is understood and often modified by interaction with others—rather
than an internally-generated experience. Whereas Norman focuses on the single hu-
man being and her relation to things, Boehner et al. focus on the relationships be-
tween human beings, in a framework that does not easily address, for example,
personal history— a further issue we believe is important to consider.
Accordingly Norman’s perspective on emotion in HCI as an add-on to cognition
has many limitations: it ignores how emotions are created in interaction among hu-
man beings, how these emotions arise from many types of human needs, some of
which are created outside the object of emotion itself. And it ignores how emotions
are created in historical processes where experiences are not only created but also
recreated through anticipation and recounting. Norman really seems to be thinking
about single-user experiences with the machine and information that the machine
presents.
Boehner et al.’s interactional framing is problematic because it puts emotion in the
foreground as a feature of interaction, whereas we believe it is more helpful to think
of all interaction as emotional—even when it is allegedly unemotional, it is still de-
fined by emotion through, once again, negation (as in an attempt to remove, or hide,
or “keep in check”). So in some ways, Boehner et al. make too little of emotion by
making too much of it, and ignore richer ways of understanding emotions than the
ones that are directly tied to the interactional situation.
We now turn to one of our own recent experiences that helps illustrate how the cogni-
tivist, interactionist, and—then—experientialist perspectives might interpret a series
of computer-mediated events, and the recounting of those events.
Susanne remembers:
Not long ago, I went to friends’ for dinner. It was quite a long drive,
and on the way there I remember thinking—do I need gas? No, no, I
would probably make it.
On the way back, at almost midnight, I decided to stop for fuel anyway.
Off the highway and into the nearest town I went. At the major road
crossing was a gas station. It was quite deserted, but I was pleased to see
it. I pulled up, took out my credit card and went out to punch in my pin-
code and open the gas tank. “Please insert the card”—I did—a pause—
“This card cannot be read.” I tried to turn the card, but that did not help.
Who cares?, I thought, I’ve got more cards! I inserted another card, and
the same happened. Now I started to worry—what had happened to my
credit cards? Did I have cash? No. Could I make it home without extra
gas? Oh misery!
I decided to move the car to the pump behind me as a last attempt to
identify if it was the card or the pump causing the trouble. And I still ne-
gotiated with myself whether I could make it home. Somehow the new gas
pump did not share the problem—it read my card, and asked for the pin-
code. What a delight! There was no problem with my cards, and I would
not be stranded on the freeway in the middle of the night! I fuelled the car
while another car pulled up across the aisle. An elderly couple—who, given the way they talked, were on a date rather than married—got out
and the woman offered to use her card. That was not accepted by the ma-
chine either, and they started to futz about “doing things right.” At that
point, I had gotten my fuel, and I felt that I could help out by telling them
that I had problems too. Somehow, the man’s card worked. So in the end,
it probably wasn’t because I moved the car that the problem resolved;
rather it seemed due to some sort of periodic error in the payment system.
Based on Susanne’s experience (and the couple’s lack of it), she “read” the situa-
tion differently than the couple. They thought it was their mistake, whereas she had
ruled that out when it happened to her. Susanne explains that she has had earlier dra-
matic problems with credit cards not working, and those experiences triggered a host of
emotions. This type of analysis could probably be made using Norman’s understand-
ing of emotion. However, these emotions were not made better with the risk of being
stuck alone on the freeway, a level of analysis that is not quite included in Norman's
thinking. This new experience changed her emotional state and compelled her to help
the poor people at the other gas pump. This “interactional emotion” could also be
investigated from the perspective presented by Boehner et al., but neither would their
framework consider Susanne’s past history. The question for us is how these emerg-
ing and historical bits of the emotional situation get addressed.
5 Beyond Emotion
From our perspective, the useful effect of these experiential perspectives is that they
provide ways of discussing interaction, with its emotions, affective actions, etc. in
ways that reach out of the direct encounter between the human being and the com-
puter, as processes over time, where researchers can be allowed to focus on emotional
phenomena other than only those reactive emotions to which beautiful teapots and frustrating gas pumps give rise.
Our example illustrates how emotion is not, after all, detached from efficiency and
safety, security and all those more ordinary concerns of HCI. Emotion is also not
only “good”; just as Norman offers that “attractive things work better” [14], we know that emotion includes fear, terror, anxiety, etc. that are often counterproductive to efficiency.¹
Susanne’s interaction with the system and with the couple and their interaction
with the system were fuelled by a larger set of emotional interpretations that had eve-
rything to do with the experience at the gas pump. In our example, the potential trou-
ble in the space of interaction came from many levels—and potential units of
analysis—including typing in and receiving feedback about the pin-code; pushing the
right buttons on the gas pump; being stranded on the freeway; interpreting the cou-
ple’s experience and their interpretation of events, including their personal relation-
ship and possible embarrassment about the status of their credit cards; and so on.
When emotion is the foregrounded object of design and analysis, the burden at some point or another falls on bounding the units of analysis within which emotion is considered. If emotion is a cognitive activity, and an interactive one, and an experiential one, then the obligation to honestly consider its relational contexts becomes impossible to meet. Perhaps we are earnestly working toward a poorly framed target? Put differently, framing human-computer interaction as emotional as well as efficient has not made it any clearer how the field should adjust its focus or methodology. Again this dilemma
has historical parallels: When Mumford and others [12] presented socio-technical
design as an alternative to traditional machine thinking in information systems and
organizational design, the problem was exactly that, while their thinking shifted the design focus away from the machine and efficiency and toward the “emotional man,” they then struggled to analyze and design for this new frame. This problem yielded odd analytical comparisons. For example, in questionnaires designed to analyze a workplace, they had workers rate both how clean the bathrooms were and how much influence that had on the detailed planning of their work; and clean toilets have almost the qualities of Norman’s teapot (“attractive things work better”)!
1 This is indeed a well-known reason why Taylorism had to be substituted or supplemented with other forms of production: workers were simply more productive when they “felt better.” This has been achieved through socio-technical improvement of production [12], by making workers an item of concern (the Hawthorne effect), or through more radical ways of giving up industrial production altogether.
Our point is not that we should ignore emotion, but rather that we need to think more critically about how the community uses fashionable trends like emotion to productive ends, and even about how we conceptualize interaction in the first place. Such conceptualizations have critical outcomes.
Perhaps we can help ourselves understand what it is we get with the focus on emo-
tion by looking beyond and anticipating what some instances of the next “wave” will
be in HCI. Emotion is a segue to this next wave, as a negation of work and efficiency,
but perhaps other concepts highlight different attributes of interaction. Some of these
might later be thought of as fashions or trends, or maybe they will endure. Perhaps all we can do now is try on for size a “negation of the negation” in dialectical terms: something that emerges in reaction to the limits of efficiency or emotion, or both, and becomes something “third” [5].
Such an exercise might anticipate the limits we will encounter with emotion. How
does it differ when we think about, for example, play, relaxation, laziness, sentimen-
tality, inefficient togetherness—all themselves additional conceptual contrasts to pro-
ductivity and efficiency? And what about solitude, a contrast to or negation of the
current focus on emotion? Much research has, at one time or another, put these ideas into the foreground; the question is whether they should still be there now. Is it emotion that should be foregrounded, or other states of being and human activity that emotion, instead, helps realize?
We focus here on the idea of solitude in relation to emotion. Solitude seems almost inevitable as a coming fashionable topic in HCI. Disconnectedness, privacy, and access are certainly known needs of our digital society, although much of our use of the state of the art dismantles understandings around being alone and being together. Furthermore, we
believe solitude is a predictable contrast to some essential matters we face when we
focus on emotion, and yet clearly differs from what efficiency offers as a contrast.
Efficiency-emotion plays on the rational-irrational tension. Emotion emphasizes the fascination that we (mostly) collectively have with the power of digital connection.
Though we are not ignorant of its problems and compromises, these are the heady
days of ICT development and use, where ICT has penetrated the everyday—the
irrational, messy, mundane everyday—but is still fairly “visible” and not quite taken-
for-granted. We are still discovering the ways we might leverage digital connection
anytime and anywhere. This itself is an emotional experience, as the interaction with
digital devices and with each other sometimes shatters boundaries (including bounda-
ries other than spatial and temporal ones) that constrained our previous experiences.
ICT itself is an impetus for emotional response, as were the rocket to the moon, the television, the printing press, the car, and so on (and they continue to be, as we think of some of these memories with wistfulness, or a retroactively applied sense of doom, or relief, or whatever one’s experiences might bring to these points in time).
So whatever our emotional reaction to ICT, we are also naturally fascinated by the
power of digital connection and the social connections that result. These forms of
social connections are rich and meaningful in spite of constraints, even those imposed by ASCII text (e.g., SMS or text messaging), and so we rightly attribute the richness of these interactions to our emotions. And now, through deliberate design, we hope to
further facilitate the expression of these emotions in our connectedness with things
and with people through things.
That brings us to what it is the community is ultimately trying to do through our
calling out of emotion for design and analytical purposes—trying to bring computer-
mediated interaction to its fullest fidelity in support of this pervasive connection.
Emotion, then, is on par with connectedness, which is why it is more powerful to
think about emotion as interaction, rather than a feature of interaction. It is also the
basis for our proposal that solitude will be an imminent conceptual turn for innova-
tion. That concept is, of course, already percolating in terms of complaints among
people at large in reaction to the interruptive quality of their lives. However, we propose that solitude will be explored as a countermove to some of the issues that are at the core of our current discussion around emotion.
Take, for example, mobile telephony. The mobile phone offers connection to others. Its early appeal to many new users rested first on meeting perceived needs of safety, though the benefits to convenience and interpersonal connection very quickly became clear and valued [15]. Eventually, as the means to these personal connections, the
mobile phone came to embody them. Some age groups especially want to tailor and
customize their phones, signaling an emotional connection with the device itself. We
note that here, where the desire to have emotional connection with others translates
into an emotional connection with the phone device, is where the cognitivist and in-
teractionist frames work in complement. But as soon as these telephonic connections were possible, we began to see a backlash against this pervasive availability and its effect on our other lived experiences. With the possibility of all this digital connection came
the recognition (or reinforcement of existing belief) that disconnection is also impor-
tant and valuable. Disconnection offers “emotional release” from others—the pleasure
of solitude.
Emotion is so large an idea that it can be descriptive of all kinds of interaction and non-interaction; it is fundamental to these things, and therefore it is not, by itself, powerful to us as analysts of human-computer interaction. We need instead (or at the very least, in addition) to rely on either more pointed understandings of what
interaction is at different units of analysis, or on the composites of emotion and cir-
cumstance that give rise to constructs such as solitude, play, sentimentality, and so on,
in human life.
6 Closing Remarks
We would be remiss if we did not communicate our support for all things irrational.
Our interest in reframing and expanding our understanding of interaction in this paper
still argues for a place for emotion in HCI, though perhaps it is now time that this
place should be more explicitly considered. We have pointed out how history in some
ways seems to repeat itself with respect to emotion and efficiency. While there is
certainly more to be learned from this past, as well as from other elements of the dis-
cussion around emotion than what we have brought here, we hope that our considera-
tion of how emotion relates to interaction—and what happens when we conceptualize
emotion as a waypoint rather than as a destination—supports progress in this
innovative area. We are hopeful that HCI research will benefit from the pursuits and
interests represented in this volume on emotion.
References
1. Ballegaard, S.A., Bunde-Pedersen, J., Bardram, J.E.: Where to, Roberta?: reflecting on the
role of technology in assisted living. In: Mørch, A., Morgan, K., Bratteteig, T., Ghosh, G.,
Svanaes, D. (eds.) Proceedings of the 4th Nordic Conference on Human-Computer Interac-
tion: Changing Roles, Oslo, Norway, October 14–18, 2006. NordiCHI 2006, vol. 189,
pp. 373–376. ACM Press, New York (2006)
2. Bertelsen, O.W., Petersen, M.G.: Erotic Life as a New Frontier in HCI. In: Ramduny-Ellis, D.,
Rachovides, D. (eds): Proceedings of the 21st British Computer Society HCI Group Confer-
ence, vol. 2, pp. 7–10 (2007)
3. Boehner, K., DePaula, R., Dourish, P., Sengers, P.: Affect: from information to interaction.
In: Bertelsen, O.W., Bouvin, N.O., Krogh, P.G., Kyng, M. (eds.) Proceedings of the 4th De-
cennial Conference on Critical Computing: between Sense and Sensibility, Aarhus, Denmark,
August 20–24, 2005, pp. 59–68. ACM Press, New York (2005)
4. Brewer, J., Kaye, J., Williams, A., Wyche, S.: Sexual interactions: why we should talk about
sex in HCI. In: CHI 2006 Extended Abstracts on Human Factors in Computing Systems,
Montréal, Québec, Canada, April 22–27, 2006, pp. 1695–1698. ACM Press, New York
(2006)
5. Bødker, S.: When second wave HCI meets third wave challenges. In: Mørch, A., Morgan, K.,
Bratteteig, T., Ghosh, G., Svanaes, D. (eds.) Proceedings of the 4th Nordic Conference on
Human-Computer interaction: Changing Roles, Oslo, Norway, October 14–18, 2006, vol. 189,
pp. 1–8. ACM Press, New York (2006)
6. Carroll, J.M.: Community computing as human-computer interaction. Behaviour and Infor-
mation Technology 20(5), 307–314 (2001)
7. Cockton, G.: Designing worth is worth designing. In: Mørch, A., Morgan, K., Bratteteig, T.,
Ghosh, G., Svanaes, D. (eds.) Proceedings of the 4th Nordic Conference on Human-
Computer interaction: Changing Roles, Oslo, Norway, October 14–18, 2006, vol. 189,
pp. 165–174. ACM Press, New York (2006)
8. Edwards, W.K., Grinter, R.E.: At Home with Ubiquitous Computing: Seven Challenges. In:
Abowd, G.D., Brumitt, B., Shafer, S. (eds.) UbiComp 2001. LNCS, vol. 2201, pp. 256–272.
Springer, Heidelberg (2001)
9. Hirsch, T., Forlizzi, J., Hyder, E., Goetz, J., Kurtz, C., Stroback, J.: The ELDer project:
social, emotional, and environmental factors in the design of eldercare technologies. In: Pro-
ceedings of the 2000 Conference on Universal Usability, Arlington, Virginia, USA, November
16–17, 2000, pp. 72–79. ACM Press, New York (2000)
10. Hochschild, A.R.: The Time Bind: When Work Becomes Home and Home Becomes Work.
Metropolitan Books (1997)
11. Ishii, H., Wisneski, C., Brave, S., Dahley, A., Gorbet, M., Ullmer, B., Yarin, P.: Ambien-
tROOM: integrating ambient media with architectural space. In: CHI 1998 Conference
Summary on Human Factors in Computing Systems, Los Angeles, April 18–23, 1998,
pp. 173–174. ACM Press, New York (1998)
12. Land, F., Hawgood, J., Mumford, E.: A Participative Approach to Forward Planning and Sys-
tems Change, Information Systems Methodologies. LNCS, vol. 65, pp. 39–61. Springer,
Heidelberg (1978)
13. McCarthy, J., Wright, P.: Technology As Experience. MIT Press, Cambridge (2004)
14. Norman, D.: Emotional Design: Why we love (or hate) everyday things. Basic Books, New
York (2004)
15. Palen, L., Salzman, M., Youngs, E.: Discovery and Integration of Mobile Communications in
Everyday Life. Personal and Ubiquitous Computing Journal 5(2), 109–122 (2001)
16. Palen, L., Liu, S.B.: Citizen Communications in Crisis: Anticipating a Future of ICT-
Supported Participation. In: Proceedings of the ACM Conference on Human Factors in
Computing Systems 2007, pp. 727–736. ACM Press, New York (2007)
17. Palen, L., Vieweg, S., Sutton, J., Liu, S., Hughes, A.: Crisis Informatics: Studying Crisis in a
Networked World. In: Proceedings of the Third Annual Conference on E-Social Science,
Ann Arbor, MI, USA, October 7–9 (2007),
http://ess.si.umich.edu/papers/paper172.pdf
18. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
19. Petersen, M.G.: Remarkable computing: the challenge of designing for the home. In: CHI
2004 Extended Abstracts on Human Factors in Computing Systems, Vienna, Austria, April
24–29, 2004, pp. 1445–1448. ACM Press, New York (2004)
20. Weiser, M., Seely Brown, J.: Designing Calm Technology. Xerox PARC (1995),
http://sandbox.xerox.com/hypertext/weiser/calmtech/calmtech.htm
Computational Affective Sociology
1 Introduction
The primary tradition in affective computing is represented by Picard’s work [1-3],
but other traditions also exist, and this is a good time to begin to integrate them with
it. In particular, much work has gone into computer-assisted research on the dimen-
sions of human personality, and recently we have begun to see a range of potential
applications of this research in human-centered computing. In this context, the concept may be defined thus: personality is the constellation of relatively stable propensities to feel and act in certain ways in response to a range of stimuli that distinguishes one individual human being from another.
Several fields of social and behavioral science have something to say about the
topic. Personality psychology is the twin of social psychology, as personality greatly
determines how an individual will behave in social interaction, and social interaction
shapes the individual [4]. However, many social psychologists are sociologists rather than psychologists, and they use terminology such as identity, ideology, attitudes, preferences, or values, which in one way or another overlaps with the concept of personality. Two things these perspectives have in common are a strong linkage between emotions and behavior, and a tendency to link emotion with cognition, especially through such feeling-laden cognitions as self-image and self-esteem. Clearly, personality is an interdisciplinary topic, and applying it to affective computing compounds the complexity
* Any opinion, finding, and conclusions or recommendations expressed in this material are those
of the author and do not necessarily reflect the views of the National Science Foundation.
Conscientiousness: Love order and regularity; Find it difficult to get down to work
Emotional Stability (opposite of Neuroticism): Seldom get mad; Get overwhelmed
by emotions
Imagination (comparable to Openness to Experience): Love to think up new ways of
doing things; Try to avoid complex people.
To explore the logistics of personality capture, I ported Goldberg’s 2,036 items
into a pocket computer. A portable device allows the person to answer in convenient
moments during the day, and facilitates administering very long questionnaires. Sub-
ject A was comfortable rating how well each item described him about every five seconds, so the entire set of 2,036 items took about 3 hours. Then the respondent went through the list again, rating each item as he would wish himself to be in the future. This took another 3 hours and provided a contrast between the person as he currently views himself and the ideal he would wish to achieve: his self-image versus his values.
The primary goal of personality archiving is simply to get the individual’s data into
the information system. However, secondary goals can help motivate the person to do
the work necessary for archiving, such as providing him or her with insights about
fundamental desires, social characteristics, and strategies for action. For example,
Table 1 shows how the subject scored on the Big 5.
A high score means the respondent feels he has the characteristic (yourself as you
generally are now) or wants it (yourself as you wish to be in the future). The differ-
ence shows how much the respondent needs to gain the characteristic, according to
his own personal wants. This particular research subject wants to be more extraverted,
and more emotionally stable, but is less interested in becoming more agreeable in
dealings with other people.
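For readers who want to reproduce this kind of scoring, the arithmetic can be sketched in a few lines of Python; this is an illustration rather than the software actually used in the study, and the item values and reverse-key flags are invented for the example. Each scale score is the mean of its item ratings (with reverse-keyed items flipped), and the gap between the "now" and "wish" ratings indicates how much the respondent wants to change.

# Illustrative sketch (not the study's software): scoring a Big 5 scale from
# item ratings and contrasting "as I am now" with "as I wish to be".
from statistics import mean

def scale_score(ratings, reverse_flags):
    # ratings are on a 1-5 scale; reverse-keyed items are flipped before averaging
    adjusted = [6 - r if rev else r for r, rev in zip(ratings, reverse_flags)]
    return mean(adjusted)

# toy example: three Conscientiousness items, the last one reverse-keyed
now_ratings  = [4, 5, 2]   # "yourself as you generally are now"
wish_ratings = [5, 5, 1]   # "yourself as you wish to be in the future"
reverse      = [False, False, True]

now  = scale_score(now_ratings, reverse)
wish = scale_score(wish_ratings, reverse)
print(f"Conscientiousness: now={now:.2f}, wish={wish:.2f}, gap={wish - now:+.2f}")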
Of course personality is not just a person’s self-image and values; it is also re-
flected in what other people feel about us. Table 2 summarizes responses from two
sisters who were asked to rate each other as well as themselves, on Goldberg’s 100
Big 5 items. Note how the two siblings rate on Extraversion. Subject B rates herself at
2.7 on the scale from 1 to 5, whereas her sister rates her higher at 3.5. In contrast, both
agree that Subject C should be rated around 2.0 or 2.1. Both research subjects rate the
other higher on Agreeableness than the other rates herself, and indeed they rate each
other higher than they rate themselves. Subject C rates Subject B higher on Conscien-
tiousness than Subject B rates herself. Each rates the other higher on Intellect or
Imagination than the rater rates herself.
                     Subject B              Subject C
                     Rated by:              Rated by:
                     self      sibling      self      sibling
Extraversion         2.7       3.5          2.1       2.0
Agreeableness        2.9       3.5          3.1       3.7
Conscientiousness    3.1       3.9          2.9       2.7
Emotional Stability  3.7       3.5          3.0       2.9
Imagination          3.5       4.7          4.4       4.0
The fact that one person’s description of another may not perfectly match the indi-
vidual’s own self-description does not invalidate the approach. Rather, the two kinds
of data reflect different aspects of the person, and thus combined provide a more
complete description. It should be kept in mind that the Goldberg items concern per-
ceptions of the individual. Very different measures would be used to capture attitudes,
beliefs, preferences, observable behavior, and meanings.
3 Affective Semantics
Sociology tends to conceptualize emotion in terms of the meanings people use to
make decisions and those chains of decisions that constitute planning. This orientation
can be integrated with computer and information science in many ways, but immedi-
ately we can see a connection to decision-support systems and rule-based approaches
in artificial intelligence. For example, one of the most influential classics was the
1897 book, Suicide, by the French pioneer sociologist Emile Durkheim [21]. Rather
than conceptualizing the affective component of suicide psychologically as depression
or despair, Durkheim argued it could often be the result of anomie, which is meaning-
lessness, normlessness, or the state of being without effective rules for living.
Whereas Durkheim believed society was a unit and provided a coherent system of
meaning for all its inhabitants, later sociologists used questionnaire survey research to
uncover the many subcultures in society that could have somewhat different systems
of meaning, and the socio-economic variables that could cross-cut each other and thus
produce a wide range of individual differences.
More than half a century after Durkheim, Talcott Parsons and a group of col-
leagues at Harvard published Toward a Theory of Action, synthesizing the work since
Durkheim and packaging its ideas in a comprehensive categorization scheme [22, 23].
Notably, they argued that anything people respond to can be described in terms of five
dimensions called pattern variables, the first of which was affectivity versus affective
neutrality. That is, a crucial aspect of people’s responses to things, people, and actions
is whether they elicit strong feelings or not. In a separate analysis, the book also dis-
tinguishes cognition, cathexis, and evaluation. These are three distinguishable but
intertwined functions of the human mind: 1) how a person understands something, 2)
how the person feels about it, and 3) how the person decides which of several actions
to perform in response to it. Taken from personality psychology, the concept of
[34, 35]. For example, if you know a given culture’s semantic differential ratings for “tired,” “mother,” “scold,” and “child,” you can accurately predict how the culture rates “The tired mother scolded her child.” I have been mindful of Heise’s work not primarily
because he participated in the Artificial Social Intelligence workshop, nor because he
published his seminal paper with Schneider in the same issue of Mathematical Sociology
where I published one on using neural networks to model human racial prejudice [36],
but because he consistently showed over the years how development of a very large
number of well-calibrated questionnaire items was necessary for the creation of really
effective AI avatars of particular people.
For many groups of adjectives – such as those describing physical, mental or social
characteristics – the correlation between having a quality and judging that it is good is
properly described as self-esteem. This research subject was consistent in evaluating pairs of antonyms, but I designed the software to point out any such inconsistencies so that the person could ponder them. Pilot research has suggested, as Heise would have pre-
dicted, that people tend to agree about which qualities are good or bad, but tend to
rate themselves quite variously in terms of how much they have the characteristics.
From the standpoint of traditional semantic differential research, it might seem
overkill to have 1,600 adjectives, each with two rating scales, for a total of 3,200
the second collecting negative emotions, and the third collecting four emotions that
describe the energetic character of the stimulus: excitement, surprise, indifference,
and boredom.
The bad-good question asked about the evaluation dimension of the semantic dif-
ferential, and the pocket PC program asked the respondent to rate each stimulus in
terms of additional weak-strong and passive-active scales, to capture the potency and
activity dimensions. In fact, this particular respondent did not distinguish between
potency and activity, combining them mentally in what I am here calling an energetic
dimension. Figure 2 graphs the twenty emotions in terms of the average evaluation at
the two points in time versus the average of the potency and activity scales (ener-
getic), based on correlations with how much the 2,000 stimuli would elicit each emo-
tion in this individual.
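The computation behind such a map can be sketched as follows. This is an illustrative Python fragment rather than the program actually used: for each emotion, its elicitation ratings across all stimuli are correlated with the bad-good (evaluation), weak-strong (potency), and passive-active (activity) ratings, and the potency and activity correlations are averaged into the energetic coordinate. The data here are random placeholders standing in for the respondent's judgments of the 2,000 stimuli.

# Hedged sketch of how the map coordinates could be derived.
import numpy as np

rng = np.random.default_rng(0)
n_stimuli = 2000
evaluation = rng.normal(size=n_stimuli)        # bad-good rating per stimulus
potency    = rng.normal(size=n_stimuli)        # weak-strong rating
activity   = rng.normal(size=n_stimuli)        # passive-active rating
elicitation = {e: rng.normal(size=n_stimuli)   # how much each stimulus elicits e
               for e in ["anger", "joy", "fear"]}

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

for emotion, ratings in elicitation.items():
    x = corr(ratings, evaluation)                                # evaluation axis
    y = (corr(ratings, potency) + corr(ratings, activity)) / 2   # energetic axis
    print(f"{emotion:>6}: evaluation={x:+.2f}, energetic={y:+.2f}")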
The map for another person would be both similar and different. Many members of
many cultures would distinguish potency from activity, but Subject E does not. Inter-
estingly, he considers most positive emotions to be more energetic than most negative
emotions. Fear is the only exception. This is consistent with the prevalent American
ideology that people can actively achieve good things for themselves, but if they re-
main passive they may become victims of bad things. Note that some emotions generally considered similar occupy almost exactly the same spots on the map: anger and hate, satisfaction and joy.
Fig. 2. The twenty emotions plotted by evaluation (horizontal axis, roughly -0.60 to 0.80) against the energetic dimension (vertical axis, roughly -0.60 to 0.60): excitement, satisfaction, pride, surprise, joy, love, lust, pleasure, gratitude, desire, fear, pain, anger, hate, shame, disgust, sadness, frustration, indifference, and boredom
References
1. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
2. Cohen, I., Huang, T.S., Chen, L.S.: Affective Computing. In: Bainbridge, W.S. (ed.) Ency-
clopedia of Human-Computer Interaction, Berkshire, Great Barrington, Massachusetts, pp.
7–10 (2004)
3. el Kaliouby, R., Picard, R., Baron-Cohen, S.: Affective Computing and Autism. In: Bainbridge,
W.S., Roco, M.C. (eds.) Progress in Convergence, pp. 228–248. New York Academy of Sci-
ences, New York (2006)
4. Zajonc, R.B.: Emotions. In: Gilbert, D.T., Fiske, S.T., Lindzey, G. (eds.) The Handbook of
Social Psychology, pp. 591–632. McGraw-Hill, Boston (1998)
5. Bainbridge, W.S.: Personality Capture. In: Bainbridge, W.S. (ed.) Encyclopedia of Human-
Computer Interaction, Berkshire, Great Barrington, Massachusetts, pp. 546–551 (2004)
6. Bainbridge, W.S., Brent, E.E., Carley, K., Heise, D.R., Macy, M.W., Markovsky, B.,
Skvoretz, J.: Artificial Social Intelligence. Annual Review of Sociology 20, 407–436 (1994)
7. Frank, R.H.: Passions within Reason. Norton, New York (1988)
8. Frank, R.H.: The Strategic Role of the Emotions. Rationality and Society 5, 160–184 (1993)
9. Brent, E.E., Glazier, J., Jamtgaard, K., Wetzel, E., Hall, P.M., Dalecki, M., Bah, A.: Erving:
A Program to Teach Sociological Reasoning from the Dramaturgical Perspective. Teaching
Sociology 17, 38–48 (1989)
10. Bainbridge, W.S.: Survey Research: A Computer-assisted Introduction. Wadsworth, Belmont, California (1989)
11. Bainbridge, W.S.: Massive Questionnaires for Personality Capture. Social Science Computer
Review 21, 267–280 (2003)
12. Bainbridge, W.S.: Cognitive Technologies. In: Bainbridge, W.S., Roco, M.C. (eds.) Manag-
ing Nano-Bio-Info-Cogno Innovations: Converging Technologies in Society, pp. 203–226.
Springer, Berlin (2006)
13. Da Cunha, A.D., Greathead, D.: Does Personality Matter? An Analysis of Code-Review
Ability. Communications of the ACM 50, 109–112 (2007)
14. McCrae, R.R., Costa, P.T.: Reinterpreting the Myers-Briggs Type Indicator From the Per-
spective of the Five-Factor Model of Personality. Journal of Personality 57, 17–40 (1989)
15. McCrae, R.R., Costa, P.T.: Toward a New Generation of Personality Theories: Theoretical
Context for the Five-Factor Model. In: Wiggins, J.S. (ed.) The Five-Factor Model of Person-
ality: Theoretical Approaches, Guilford, New York, pp. 51–87 (1996)
16. Kshirsagar, S., Magnenat-Thalmann, N.: A Multilayer Personality Model. In: ACM Symp.
on Smart Graphics, Hawthorne, NY (2002)
17. Arya, A., Di Paola, S.: Multispace Behavioral Model for Face-Based Affective Social
Agents. EURASIP Journal on Image and Video Processing (2007)
18. Su, W.-P., Pham, B., Wardhani, A.: Personality and Emotion-Based High-Level Control of
Affective Story Characters. IEEE Trans. Visualization and Computer Graphics 13, 281–293
(2007)
19. Goldberg, L.R.: The Structure of Phenotypic Personality Traits. American Psychologist 48,
26–34 (1993)
20. Goldberg, L.R.: A Broad-bandwidth, Public Domain, Personality Inventory Measuring the
Lower-level Facets of Several Five-factor Models. In: Mervielde, I., Deary, I., De Fruyt, F.,
Ostendorf, F. (eds.) Personality Psychology in Europe, vol. 7, pp. 7–28. Tilburg University
Press, Tilburg (1999)
21. Durkheim, E.: Suicide. Free Press, New York (1897)
22. Parsons, T., Shils, E.A. (eds.): Toward a General Theory of Action. Harvard University
Press, Cambridge (1951)
23. Parsons, T.: Some Comments on the States of the General Theory of Action. American So-
ciological Review 18, 618–631 (1953)
24. Osgood, C.E., Suci, G.J., Tannenbaum, P.H.: The Measurement of Meaning. University of
Illinois Press, Urbana (1957)
25. Osgood, C.E., May, W.H., Myron, M.S.: Cross-Cultural Universals of Affective Meaning.
University of Illinois Press, Urbana (1975)
26. Bainbridge, W.S.: Semantic Differential. In: Asher, R.E., Simpson, J.M.Y. (eds.) The Ency-
clopedia of Language and Linguistics, pp. 3800–3801. Pergamon, Oxford (1994)
27. Lawler, E.J.: An Affect Theory of Social Exchange. American Journal of Sociology 107,
321–352 (2001)
28. Massey, D.S.: A Brief History of Human Society: The Origin and Role of Emotion in Social
Life: 2001 Presidential Address. American Sociological Review 67, 1–29 (2002)
29. Heise, D.R.: The Semantic Differential and Attitude Research. In: Summers, G.F. (ed.) Atti-
tude Measurement. Rand McNally, Chicago, pp. 235–253 (1970)
30. Heise, D.R.: Controlling Affective Experience Interpersonally. Social Psychology Quar-
terly 62, 4–16 (1999)
31. Morgan, R.L., Heise, D.R.: Structure of Emotions. Social Psychology Quarterly 51, 19–31
(1988)
32. Heise, D.R., Calhan, C.: Emotion Norms in Interpersonal Events. Social Psychology Quar-
terly 58, 223–240 (1995)
33. Heise, D.R.: Project Magellan. Electronic Journal of Sociology 5 (2001),
http://www.sociology.org/content/vol005.003/mag.html
34. Schneider, A., Heise, D.R.: Simulating Symbolic Interaction. Journal of Mathematical Soci-
ology 20, 271–287 (1995)
35. Heise, D.R.: Enculturing Agents With Expressive Role Behavior. In: Payr, S., Trappl, R.
(eds.) Agent Culture, pp. 127–142. Lawrence Erlbaum Associates, Florence (2004)
36. Bainbridge, W.S.: Minimum Intelligent Neural Device: A Tool for Social Simulation.
Journal of Mathematical Sociology 20, 179–192 (1995)
37. Bainbridge, W.S.: Validity of Web-based Surveys. In: Burton, O.V. (ed.) Computing in the
Social Sciences and Humanities, pp. 51–66. University of Illinois Press, Urbana (2002)
38. Castronova, E.: Synthetic Worlds: The Business and Culture of Online Games, p. 93. Uni-
versity of Chicago Press, Chicago (2005)
39. Merrick, K., Maher, M.L.: Motivated Reinforcement Learning for Non-player Characters in
Persistent Computer Game Worlds. In: Proceedings of the 2006 ACM SIGCHI international
conference on Advances in computer entertainment technology, Hollywood, California
(2006)
40. Bainbridge, W.S.: The Scientific Potential of Virtual Worlds. Science 317, 472–476 (2007)
Comparing Two Emotion Models for Deriving Affective
States from Physiological Data
1 Introduction
The European Integrated Project e-SENSE1 (Capturing Ambient Intelligence for Mo-
bile Communications through Wireless Sensor Networks) aims at enabling Ambient
Intelligence in “Beyond 3G Systems” using wireless sensor networks (WSN) for
providing context-rich information to applications and services. Dey [1] describes
context as ‘any information that can be used to characterize the situation of an en-
tity.’ With entity, he refers to a ‘person, place, or object that is considered relevant to
the interaction between a user and an application, including the user and applications
themselves.’ According to his definition, a system that is context aware provides relevant information and/or services to the user. The relevance of the information naturally depends on the user’s task.
1 www.e-SENSE.org; e-SENSE is an Integrated Project (IP) supported by the European 6th Framework Programme. The study is a result of a collaboration between Human-Factors-Consult HFC, Berlin, and the University of Surrey UniS, Guildford.
Fig. 1. Danger Warning Scenario selected from e-SENSE D1.2.1 ([2] p.13)
Figure 1 depicts one of the envisioned scenarios aiming at personal security. The
collection of voice, breathing rate, heart rate, noise, and position is symbolized. The
woman depicted is riding a public transport vehicle all by herself when a dangerous
person enters and begins to approach her. Fear is inferred from Body-Sensor-Network
data and an alert is triggered to inform security staff members at the next stop [2].
The experiment reported in the following is a first step toward processing captured context information. Emotional states are induced with short audio-visual stimuli and captured via body sensors.
Several authors (e.g., [3, 4, 5, 6, 7, 8]) based their experimental set-ups on the basic emotion theory, which assumes the existence of a certain number of universal emotions that can be distinguished clearly from one another [9]. These are anger, disgust, fear, happiness, sadness, and surprise. Even though results were often inconsistent across those studies, some physiological correlates of emotions could be identified more frequently than others: increases in heart rate, skin conductance level, and systolic blood pressure have been associated with fear (e.g., [7]), while increases in heart rate and in systolic and diastolic blood pressure have been associated with anger (e.g., [6]). Sadness has been found sometimes to lead to an increase (e.g., [11]) and sometimes to a decrease (e.g., [7]) of heart rate. Palomba et al. [12] could not find a significant heart rate difference from the measured baseline at all.
A second approach to structuring emotions is the dimensional one. Authors who have based their empirical studies on this theoretical assumption (e.g., [13, 14, 15, 16]) have used two or three dimensions (1. valence, 2. arousal, 3. dominance) to define the emotional states induced in their subjects. The theory goes back mainly to work done by Russell and Feldman Barrett (e.g., [17, 18, 19]). Using this approach, physiological parameters are correlated not with particular emotional states but with the underlying dimensions. In most of these studies, valence was found to correlate positively with
heart rate while arousal correlated positively with skin conductance level. Herbon et al.
[20] additionally measured pupil diameter and found it to be a rather strong indicator
of the valence of an emotion with small values for negative emotions and larger values
for positive emotions. The assessment of facial muscle activity has resulted in the iden-
tification of two muscles (zygomatic and corrugator EMG) which correlate with the
dimension of valence.
The question of which of the two approaches is more suitable to be used in emo-
tion measurement studies has not been answered yet. Ritz et al. [21] showed in an
experiment on different respiration parameters that both the dimensional and the basic emotion model can be valuable in explaining the assessed data. They found that
arousal and valence alone could not explain vagal excitation differences between two
positive emotions. This indicates either the necessity of adding another dimension to
differentiate between two particular emotions – which does not have to be essential to
differentiate between others – or the added value of additionally taking into account
the basic emotion model for data analysis. Feldman Barrett [22] suggested that the
applicability of one of the two models might differ individually.
Another way to describe emotions is the appraisal-based approach, with Klaus Scherer's component process model being the one most commonly used in the HCI context. Mahlke and Minge's contribution to this book [23] is dedicated to this approach and
elaborates on its prospects for affective computing research.
Most of the above-mentioned studies have used photographs to induce positive or negative emotions in their subjects. This strategy is troublesome insofar as the emotions induced are usually not very intense. The project, however, requires the measurement of intense emotions in everyday life, since only meaningful messages are to be sent by the e-SENSE applications, e.g., in fearful situations when an alert for the security staff is triggered. This reduces false alarms and generally enhances the perceived significance of the system. A continuous stream of triggered mood icons could likewise negatively impact the acceptance, and thereby the usage, of the system in a more private context.
More suitable in the e-SENSE context, Herbon et al. [20] and also Mahlke and Minge [23] used interactive computer scenarios of some minutes' length to ensure that the emotional response would be strong enough to reliably measure physiological changes. However, interactive scenarios are hard to control and therefore result in a non-estimable amount of error variance, which is to be kept at a minimum in this first phase of the e-SENSE project. The current study therefore used 4-minute film clips to a) present a stimulus long enough to induce a strong emotional response and b) stan-
dardize the experimental situation.
3 Experiment
Both theoretical models presented in Chapter 2 coexist and are currently used in psy-
cho-physiological studies. However, their results cannot be compared. Developers of
emotionally intelligent systems have to decide which model to base their systems on
and can then only use data from respective studies. Results of studies based on the
competing model are useless and dispensable. One objective of the experiment was therefore to compare the two approaches as to their suitability to form the basis for emotion measurement studies, and to identify the approach that researchers should commit to in the future in order to avoid redundant parallel research. Findings on correlations of physiological measures with emotions were to be used to implement emotionally intelligent applications in the e-SENSE project. Following general practice in psychological methodology, an experimental design was chosen in order to support the classification approach by collecting data with a minimal noise ratio, i.e., by controlling as much as possible the interfering variables that are unavoidable in field studies. This procedure firstly enables the actual detection of effects if there are any, and secondly avoids measuring effects that may not be replicable because they are based on variations in variables that cannot be pinned down in the field.
If this fundamental research step is successful, further steps will be taken to gradually adapt the classification algorithms to the field.
The test sample consisted of 40 subjects (27 male), recruited from the Center for
Communication Systems Research CCSR and Surrey’s School of Management, who
voluntarily took part in the experiment and were not rewarded. Mean age was 30.1 years, with an age range from 22 to 54. The sample was multicultural, with representatives from 16 different countries participating.
The experiment was performed at the I-Lab of the University of Surrey, England, over a period of five days. Subjects sat in a separate 3.2 m x 4.5 m windowless test room at a distance of 2 m from a 2.4 m x 1.35 m screen (projector resolution: 1920x1080 pixels). Stereo sound was provided by two front speakers. The investigators observed the trial via a glass pane from a separate room. Microphones ensured that test leaders and subjects could talk to each other in case of any problems.
Subjects were equipped with the HealthLab System (Koralewski Industrie-Elektronik
oHG) for physiological measurement, which included a chest belt for respiration
measurement, ECG electrodes, EDA electrodes (on the palm of the subject’s left hand),
EMG electrodes for facial muscle measurement and a wristband for measurement of skin
temperature. The procedure was explained to them and it was made clear that they were
to watch short films and afterwards state how they had felt in doing so.
The basic emotion model and the dimensional model imply two different methods
of collecting statements about the actual emotional state of a person: According to the
basic emotion model, a person who is experiencing an emotion is expected to be able
to choose one emotion out of the six universal ones that represents her actual feeling
best. In the experiment, subjects were prompted after each emotion induction phase with validated pictures of people making angry, disgusted, fearful, happy, sad, or surprised faces. The corresponding emotion word was written underneath each picture, and subjects were asked to pick one. If subjects had trouble choosing one of the pictures/words simply because none of them matched how they felt, they were asked to say so and, in addition, to state whether they felt a different emotion (and, if so, to name it) or did not experience an emotion at all.
The dimensional approach implies a very different rating method, which is based
on the assumption that emotions can be described by their degrees of valence and
arousal. Lang [24] introduced rating scales for these dimensions which consist of
pictures of manikins, called SAM for Self Assessment Manikin (see Figure 2).
In the experiment, after each emotion induction phase, subjects were presented with each of the scales separately and indicated their respective level by saying the letter written underneath a manikin or between two manikins.
The emotion induction material consisted of five films of a mean duration of
approximately 4.15 minutes. The main requirement of the material was to give considera-
tion to both approaches of modeling emotions, since both models were to be compared to
one another and constructing material based on only one of the approaches might have
had the effect of decreasing the goodness of fit of the other approach. The films were to
induce emotions in all four quadrants of the coordinate system shown in Figure 3. In
addition, they had to represent one of the six basic emotions anger, disgust, fear, happi-
ness, sadness or surprise.
During the construction phase, all films were constantly validated and adjusted in
three trials of 5–7 subjects each. Film 1 was to induce a positive-arousing emotion. The
target emotion from the basic emotion list was happiness, meaning that an emotion
similar to what is usually referred to as “happiness” was to be induced. For this purpose,
a cartoon was chosen. Film 2 was to induce a negative-arousing emotion, with the target
emotion anger from the basic emotion list. Different scenes from a political drama
movie were carefully chosen. Film 3 was to induce a negative-arousing emotion also,
but the target emotion from the basic emotion list was fear. Different scenes from a
horror movie were chosen. Film 4 was to induce a negative-calming emotion. The target basic emotion for this film was sadness. It consisted of black-and-white slides, to which special music in a minor mode and slow tempo was added (for empirical evidence on the emotional character of music see Peretz et al. [25]). Film 5 was to induce a positive-calming emotion. There is no basic emotion that can be considered to be located in the fourth quadrant, which is why no target emotion from the basic emotion list could be set for this film. An emotion like contentment was kept in mind while the film was constructed. Again, special music was utilized.
At the end of the experiment, subjects were asked to fill out a brief questionnaire
on their experience using the two different rating methods. Preference, difficulty and
advantages of the rating scales were assessed.
Data was recorded via Bluetooth on a laptop. There were separate data records for
each film and person, which contained the data of all sensors. In the following sec-
tion, the utilised physiological measures and derived parameters are described.
Electrocardiogram (ECG): An important parameter which can be extracted from the ECG is heart rate variability (HRV), which is calculated by analyzing the time series of beat-to-beat intervals. Heart rate is the average number of heart beats per minute. Different studies indicated fairly strong correlations with valence [26, 27]. The proportion of adjacent beat-to-beat intervals that differ by more than 50 msec is measured with pNN-50. As mental stress increases, HRV decreases, and thus pNN-50 should decrease as well. A frequency domain analysis was performed to investigate effects of the sympathetic and parasympathetic nervous system.
The LF/HF ratio (low-frequency to high-frequency power) seems to be the most promising parameter from the frequency domain analysis, as it describes the relation between the activating (sympathetic) and deactivating (parasympathetic) influence of the autonomic nervous system [28]. Larger values indicate an increased activity level.
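To make these parameters concrete, the following Python sketch derives heart rate, pNN-50, and an LF/HF ratio from a list of R-R intervals. It is a simplified stand-in for the actual processing chain, and the 0.04-0.15 Hz and 0.15-0.4 Hz band limits are the conventional ones, assumed here rather than taken from the study.

# Minimal sketch of the ECG-derived parameters, starting from R-R intervals in ms.
import numpy as np
from scipy.signal import welch
from scipy.interpolate import interp1d

def ecg_parameters(rr_ms):
    rr = np.asarray(rr_ms, dtype=float)
    hr = 60000.0 / rr.mean()                        # mean heart rate (bpm)
    diffs = np.abs(np.diff(rr))
    pnn50 = 100.0 * np.mean(diffs > 50.0)           # % of successive diffs > 50 ms

    # LF/HF: resample the irregular R-R series to 4 Hz, then integrate the Welch
    # spectrum over the LF (0.04-0.15 Hz) and HF (0.15-0.4 Hz) bands.
    t = np.cumsum(rr) / 1000.0                      # beat times in seconds
    fs = 4.0
    t_even = np.arange(t[0], t[-1], 1.0 / fs)
    rr_even = interp1d(t, rr, kind="cubic")(t_even)
    f, pxx = welch(rr_even - rr_even.mean(), fs=fs, nperseg=min(256, len(rr_even)))
    lf = np.trapz(pxx[(f >= 0.04) & (f < 0.15)], f[(f >= 0.04) & (f < 0.15)])
    hf = np.trapz(pxx[(f >= 0.15) & (f < 0.40)], f[(f >= 0.15) & (f < 0.40)])
    return hr, pnn50, lf / hf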
Breathing rate data were collected via a chest strap and underwent pre-processing similar to that of the heart rate data. After baseline and shock removal, the peaks of individual breaths were detected. Breathing rate was measured as the number of breaths per minute. If the amplitude met certain criteria, the inspiration was marked as a deep breath. As the number of deep breaths increases in calm situations, a negative correlation with arousal is assumed.
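A rough sketch of this breathing-rate processing might look as follows; the peak-detection and deep-breath thresholds are placeholders rather than the criteria used in the study.

# Detect breath peaks in a respiration signal and flag unusually deep breaths.
import numpy as np
from scipy.signal import find_peaks

def breathing_parameters(resp, fs, deep_factor=1.5):
    resp = np.asarray(resp, dtype=float)
    resp = resp - resp.mean()                           # crude baseline removal
    peaks, props = find_peaks(resp, distance=int(fs * 1.5), prominence=0.1)
    duration_min = len(resp) / fs / 60.0
    breathing_rate = len(peaks) / duration_min          # breaths per minute
    amp = props["prominences"]
    deep_breaths = int(np.sum(amp > deep_factor * amp.mean())) if len(amp) else 0
    return breathing_rate, deep_breaths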
Electrodermal activity (EDA) is a measure of the potential difference between two areas of the skin and is often associated with levels of emotional arousal [29]. By and large, electrodermal activity consists of two components: the tonic component is a low-frequency baseline conductivity level, while the phasic component is of higher frequency. Especially for EDA, highly significant correlations with valence and arousal have been observed [13]. Arousal has been hypothesized to be a drive state or a non-specific energizer of behavior, something that describes the intensity of an experience. The EDA-median of the raw signal provides an impression of the average level of electrodermal activity. Statistically meaningful tonic parameters are: EDA-changes
(total number of turning points of the low-pass filtered signal) and EDA-increase
(percentage of increasing data of the low-pass filtered data curve). Selected phasic
parameters are: EDA-responses (total number of short term variations of the signal
with amplitudes larger than 3 kOhm and a duration of the decrease of 0.5–5 sec [30])
and EDA-slope (average slope of a response).
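The tonic and phasic parameters listed above could be approximated along the following lines; the filter cutoff and the response amplitude threshold are assumptions for illustration, not the study's settings (which used a 3 kOhm amplitude criterion).

# Hedged sketch of tonic and phasic EDA parameters.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def eda_parameters(eda, fs):
    eda = np.asarray(eda, dtype=float)
    median = np.median(eda)                                   # EDA-median

    # tonic component: low-pass filtered signal
    b, a = butter(2, 0.05 / (fs / 2.0), btype="low")
    tonic = filtfilt(b, a, eda)
    d = np.diff(tonic)
    turning_points = int(np.sum(np.diff(np.sign(d)) != 0))    # EDA-changes
    increase = 100.0 * np.mean(d > 0)                         # EDA-increase (%)

    # phasic component: short-term responses riding on the tonic level
    phasic = eda - tonic
    peaks, props = find_peaks(phasic, prominence=0.05)        # EDA-responses
    if len(peaks):
        rise = (peaks - props["left_bases"]) / fs             # rise time (s)
        slope = float(np.mean(props["prominences"] / rise))   # EDA-slope
    else:
        slope = 0.0
    return median, turning_points, increase, len(peaks), slope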
The electromyogram (EMG) measures electrical potentials generated by the muscles during activation and at rest. Zygomatic EMG (corners of the mouth) and corrugator
EMG (eyebrows) were used in this study. Very strong correlations with the valence
dimension are described in [13]. The following parameters were extracted: EMG-
reactions (total number of typical contractions of the respective muscle),
EMG-activity (percentage of the data in which any activity is measured) and EMG-sd
(standard deviation of the signal). Higher values indicate stronger muscle activity.
Skin temperature was measured on the wrist. It varies inversely with skin resistance. Two parameters were extracted from the signal: TEMP-mean (mean
value of the smoothed signal) and TEMP-increase (gradient of the linear regression
function of the signal).
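The EMG and skin-temperature parameters described in the two preceding paragraphs are straightforward to compute; the sketch below shows one possible implementation, with an assumed activity threshold and smoothing window.

# Illustrative EMG and skin-temperature parameters.
import numpy as np

def emg_parameters(emg, threshold):
    emg = np.asarray(emg, dtype=float)
    active = np.abs(emg - emg.mean()) > threshold
    activity_pct = 100.0 * active.mean()             # EMG-activity (% of samples)
    sd = float(emg.std())                            # EMG-sd
    reactions = int(np.sum(np.diff(active.astype(int)) == 1))   # EMG-reactions
    return reactions, activity_pct, sd

def temperature_parameters(temp, fs, smooth_win=25):
    temp = np.asarray(temp, dtype=float)
    smoothed = np.convolve(temp, np.ones(smooth_win) / smooth_win, mode="same")
    t = np.arange(len(temp)) / fs
    gradient = np.polyfit(t, temp, 1)[0]             # TEMP-increase (slope per s)
    return float(smoothed.mean()), float(gradient)   # TEMP-mean, TEMP-increase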
This vast number of parameters was reduced after preliminary computations to the
nine most promising ones that were then included in further analyses. Results are
reported in the following chapter.
Pearson correlations of the physiological parameters and the two dimensions valence
and arousal were calculated. Table 1 summarizes the results of this analysis. While
changes in ECG and EMG parameters are mainly associated with changes in valence,
EDA measures correlate with arousal only.
                            Valence     Arousal
ECG          HR             -.248**     -
BR           BR              .142*       .152*
             deep breaths    -           -
EDA          Median          -          -.330**
             Increase        -          -.149*
             Responses       -           -
EMG          corr. SD       -.162*      -.162*
             zyg. Act %      .315**      -
Temperature  Increase        -           -
For analyses of basic emotion ratings, repeated measures ANOVAs were conducted. While the experimental set-up was originally a within-subjects design, the analyses had to be conducted in a between-subjects design because hardly any subject felt every one of the six emotions on the list,2 but instead usually four or five different ones. However, results were corrected for the number of test subjects in order not to bias effect sizes. Results are reported parameter-wise.
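As an illustration of the kind of test whose results are reported below, the following sketch runs a one-way between-groups ANOVA on a single physiological parameter grouped by the reported basic emotion and derives an eta-squared effect size; the group data are random placeholders, not the study's measurements.

# One-way between-groups ANOVA with an eta-squared effect size.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
groups = {e: rng.normal(loc=m, scale=5.0, size=30)       # e.g., heart rate values
          for e, m in [("happy", 78), ("sad", 72), ("afraid", 80)]}

samples = list(groups.values())
f_stat, p_value = f_oneway(*samples)

grand_mean = np.concatenate(samples).mean()
ss_effect = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in samples)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in samples)
eta_sq = ss_effect / (ss_effect + ss_error)              # effect size
print(f"F={f_stat:.2f}, p={p_value:.4f}, eta^2={eta_sq:.3f}")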
Heart rate varied significantly with the basic emotion subjects reported to have ex-
perienced (F(5, 635)=3.8, p<.01, part. Eta2=.119) and pair-wise comparison revealed group
differences between happy and sad. Breathing rate varied significantly (F(5, 635)=4.0,
p<.01, part. Eta2=.125), which was due to differences between happy and sad also. The
number of deep breaths did not show a significant effect for basic emotion rating. While
analyses did not reveal significant variations for EDA-increases, EDA-responses varied
significantly (F(5, 635)=3.0, p<.05, part. Eta2=.097), mainly due to group differences be-
tween happy and sad again. EDA-median varied significantly (F(5, 635)=3.5, p<.01, part.
Eta2=.106) and differentiated between afraid and angry, afraid and sad, happy and sad,
and sad and surprised. Both corrugator and zygomatic EMG varied significantly, with F(5, 635)=3.6, p<.01, part. Eta2=.112 and F(5, 635)=5.8, p<.01, part. Eta2=.170, respectively. While corrugator EMG mainly differed between happy and sad, zygo-
matic EMG significance was additionally due to group differences between a) happy
and angry and b) happy and disgusted. Temperature parameters did not reach signifi-
cance in this ANOVA.
2 Boredom and Contentment were additionally stated by a number of subjects, even though they were not on the list. They are included in the classification analyses later in this chapter.
(Table: physiological parameters that significantly differentiated pairs of emotion-induction films, with rows for Happiness, Anger, and Fear and columns for films No 1 (Happiness) through No 5 (Contentment); the parameters involved are HR, BR, EDA-median, EDA-increase, EDA-response, zygomatic EMG (ZEMG, ZEMG-activity), corrugator EMG (CEMG, CEMG-SD), and Temp-increase.)
4.3 Classification
As in the film-based approach, two SVMs were trained for the dimensional model and one for the basic emotion model, this time based on the self ratings. Normalized self ratings were divided into two dimensions (arousal and valence), each with two groups (high and low). In contrast to the film-based dimensional analysis, here valence was predicted slightly better than arousal. However, the percentage of correct estimations of arousal was about 10% lower (see Figure 4). Predictive parameters for arousal were EDA-increase, breathing rate, and EDA-reactions. Valence was predicted best with heart rate and EDA-reactions.
Fig. 4. Comparison of film based (left) and rating based (right) dimensional classification
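The rating-based dimensional classification can be illustrated with the following sketch, which is not the study's implementation: one SVM per dimension predicts high versus low valence or arousal from the physiological parameters, with cross-validated accuracy. The feature matrix and labels here are random placeholders.

# One SVM per dimension (valence, arousal), evaluated with 5-fold cross-validation.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 9))            # nine physiological parameters per trial
valence_high = rng.integers(0, 2, 200)   # self rating split into high/low
arousal_high = rng.integers(0, 2, 200)

for name, y in [("valence", valence_high), ("arousal", arousal_high)]:
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean cross-validated accuracy = {acc:.2f}")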
For the basic-emotion approach, subjects had to choose between different emotion
categories, which differed in number from the presented films. Additional basic emo-
tions were ‘boredom’ and ‘surprise’ as well as ‘contentment’ for the sake of compara-
bility. Numbers of chosen basic emotions were not uniformly distributed. This and the
larger number of classes (8 vs. 5) resulted in inferior classification rates (about 30%)
in comparison to the film-based analysis. Even utilizing specific instead of identical
parameter sets did not lead to better results (boredom: 50%, surprise: 42.5%, fear:
35.4%, disgust: 33.3%, contentment: 30%, sadness: 28.3%, happiness: 26.6%, anger:
25%). Most frequently used parameters were EDA-increase, breathing rate, deep
breaths, zygomatic-EMG activity and TEMP-increase.
When asked whether they preferred one of the two rating methods, i.e. the dimen-
sional or the basic emotion approach, 18 subjects stated that they preferred using the
basic emotions and 22 stated that they preferred using the dimensional scale. Advan-
tages were identified through content-analysis of written choice-explanations.
Eight subjects chose the basic emotion model for its intuitiveness, seven for its
clearness. A sample statement from one of the subjects describing this issue was: “It’s
more close to the fact, very clear.” Three subjects stated that they liked the facial photographs more than the manikins from the dimensional rating scale, and three subjects preferred the basic emotion approach because it takes into account the
Fig. 5. Difficulty ratings for the dimensional scales and the basic emotion list (number of subjects per category, from “very easy” to “very hard,” for the Valence, Arousal, and Basics scales)
5 Discussion
5.1 Subjective Ratings and Physiological Reactions
in comparison to the nine-point valence and arousal scales. Similar to factor analyses,
loss of information here can be due to the fact that the dimensional model constitutes
some sort of underlying structure of a categorical approach.
Several parameters that differentiated between films did not differentiate between subjective emotional states. Analysis of subjective ratings showed a strong effect for the happy and sad conditions, but few effects for the others. This bias towards happy and sad differentiation could be observed for the film-based analysis as well, and the authors propose that this is due to the fact that happy and sad are the two states (and films, respectively) that differ from each other most. In the valence-arousal plane, happiness is located in quadrant 1 (top right), while sadness is located in quadrant 3 (bottom left). Other emotional states induced in the experiment differ from one another in only one of the two dimensions, valence or arousal, while anger and fear do not differ on either dimension at all. This greater similarity obviously makes differentiation a lot harder.
However, there were other films where a differentiation based on physiological re-
actions was possible, which are worth a discussion: Zygomatic muscle activity
appeared to be a very valid parameter for detecting something like happiness. It sig-
nificantly differentiated the induced emotion of film 1 from all other emotional states.
Temperature was especially relevant for film 5, something like contentment. In con-
trast, heart rate and even more so electrodermal activity appear to be useful in a more
general sense. Both – combined or individually – are important parameters for the
differentiation of a number of groups, e.g. happiness/sadness, happiness/anger, con-
tentment/anger, fear/anger, fear/sadness and fear/contentment. Corrugator muscle
activity was again a very specific parameter.
It is apparent that the film and rating ANOVAs are very different. The film ANOVA led to a much greater number of significant results. This indicates that, at least when a basic emotion model is utilized, objective data on induced emotions are superior to subjects' self-assessments. We do not assume that really all of our subjects had so very
different emotional reactions to the induction material, but instead believe that rating
differences are due to different standards of evaluation. Emotion words and pictures
from the basic emotion scale were apparently interpreted individually. An aggravating
factor specific to our experiment was a non-representative sample. Subjects scored
above-average for rationality and control, while answering slightly below average for
extraversion. This indicates that self-assessment might not have been perfectly
reliable.
5.3 Classification
SVM analyses led to very satisfactory results, especially for the basic emotion model.
Correct estimations were made up to 72.5% of the time for film 1 (happiness), with a
guessing probability of 20% in the ‘one out of five’ model. Film 2 (anger), 3 (fear),
and 4 (sadness) were also classified fairly well with correct estimations up to 60%.
Even the estimation for film 5 (contentment) with 52.5% was still 2.6 times greater
than guessing probability. Naturally, estimation rates dropped when more categories
were taken into account with the self-assessment, but when guessing probability
(12.5% in the ‘one out of eight’ model) is taken into account, differences are not as
large after all. However, a slight difference remains and in the end ‘film’ still seemed
to form a better basis for classification analysis than self-assessment. Film-wise esti-
mation rates were three times greater than guessing probability on average while they
were about 2.7 times greater for subjective ratings.
Estimation rates for the dimensional model were inferior to those for the basic
emotion model with a range from 65 to 82% and a guessing probability of 50%. This
finding confirms the results of the statistical analyses, in which the basic emotion
model accounted for larger explained variance.
Results of the post-experimental questionnaire did not yield any differences between
the two approaches. Preference and difficulty ratings were approximately balanced
and advantages were stated for both scales.
6 Outlook
Following the results reported in this study, the authors suggest utilizing subjective ratings throughout stimulus design in order to evaluate and check the subjective validity of the material. However, since subjective measures in this field of research are strongly biased by inter-individual differences in the interpretation of nomenclature, adding their variance to the already highly variable physiological data may lead to decreased effect detection. A more stable and comparable result can be reached by using validated material that is also the basis for the analysis of data variance. In the study reported here, films were used to induce different emotional states. Comparisons based on films led to greater selectivity than those based on subjective ratings.
The selection algorithm developed so far can now be applied in a standardized setting to demonstrate overall system usage, i.e., the emotional state of a person can be measured and used to trigger a message such as ‘This movie is funny, I would recommend it’ in a private context. In real life such a situation is quite controllable, since a person watching a movie usually sits in a chair and concentrates. For livelier scenarios, the algorithms will have to be trained with a higher noise ratio in
semi-controlled and finally in field trials. In doing so, adjustments will have to be
made on a scenario-by-scenario basis.
For the development of ambient intelligent systems, two variants of system training can be proposed. Firstly, subjective ratings can be provided in various situations by the users themselves; inter-individual differences and the above-mentioned bias have no influence when systems can be customized in this way. Secondly, and less obtrusively, a learning algorithm can be supported by additional context information besides the emotional state, thereby correlating certain situations (places, things, individuals, etc.) with the user’s affective data. This, however, is a hard challenge for the development of such systems, yet it would represent a true integration of context information.
Following our literature review, voice parameters seem to offer additional predictive value for affective state recognition. Unobtrusive measurement via microphone could easily be utilized for ambient intelligence. To collect such data, however, a different test setting has to be chosen to ensure voice samples that are valid enough for analysis. Besides the advantage of easy use, this points to a disadvantage of the measure as such: the user has to talk for a sufficiently long period.
The following research will be dedicated to additional data collection for the detection-algorithm learning process, including an improved, more focussed emotion induction. Besides SVM, other algorithms will be tested (e.g., neural networks and fuzzy clustering).
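As an illustration of how such a comparison between classifiers could be set up, the sketch below trains an SVM and a small neural network on a placeholder feature matrix of physiological parameters; the feature dimensions, labels, and data are invented for illustration and do not reproduce the authors’ actual pipeline.

```python
# Sketch: comparing an SVM with a small neural network on a physiological
# feature matrix (placeholder random data, not the study's recordings).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))      # 200 trials x 6 features (e.g. HR, EDA, EMG)
y = rng.integers(0, 5, size=200)   # labels: five induced emotions (films 1-5)

for name, model in [("SVM", SVC(kernel="rbf")),
                    ("MLP", MLPClassifier(hidden_layer_sizes=(16,),
                                          max_iter=2000, random_state=0))]:
    clf = make_pipeline(StandardScaler(), model)
    accuracy = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean cross-validated accuracy {accuracy:.3f} (chance = 0.200)")
```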
References
1. Dey, A.K.: Understanding and Using Context. Personal and Ubiquitous Computing Jour-
nal 5, 4–7 (2001)
2. Forest, F., Oehme, A., Yaici, K., Verchère-Morice, C.: Psycho-Social Aspects of Context
Awareness in Ambient Intelligent Mobile Systems. In: 15th IST Mobile & Wireless Com-
munication Summit, Myconos (2006),
http://www.ist-esense.org/index.php?id=149
3. Ax, A.: The physiological differentiation between fear and anger in humans. Psychosomatic
Medicine 55, 433–442 (1953)
4. Ekman, P., Levenson, R.W., Friesen, W.: Autonomic nervous system activity distinguishes
among emotions. Science 221, 1208–1210 (1983)
5. Palomba, D., Stegagno, L.: Physiology, perceived emotion and memory: responding to film
sequences. In: Birbaumer, N., Öhmann, A. (eds.) The Structure of Emotion, pp. 158–168.
Hogrefe & Huber Publishers (1993)
6. Fredrickson, B.L., Mancuso, R.A., Branigan, C., Tugade, M.M.: The undoing effect of posi-
tive emotions. Motivation and Emotion 24, 237–257 (2000)
7. Christie, I.C.: Multivariate discrimination of emotion-specific autonomic nervous system ac-
tivity. MSc Thesis, Virginia Polytechnic Institute and State University (2002)
8. Nasoz, F., Alvarez, K., Lisetti, C.L., Finkelstein, N.: Emotion recognition from physiological
signals for presence technologies. International Journal of Cognition, Technology, and
Work 6 (2003)
9. Ekman, P.: An argument for basic emotions. Cognition and Emotion 6(3/4) (1992)
10. Johnstone, T., Scherer, K.R.: Vocal communication of emotion. In: Lewis, M., Haviland-
Jones, J. (eds.) Handbook of Emotions, 2nd edn., pp. 220–235. Guilford Press, New York
(2000)
11. Levenson, R.W., Ekman, P., Friesen, W.V.: Voluntary facial action generates emotion-
specific autonomic nervous system activity. Psychophysiology 27, 363–384 (1990)
12. Palomba, D., Sarlo, M., Agrilli, A., Mini, A., Stegagno, L.: Cardiac response associated with
affective processing of unpleasant film stimuli. International Journal of Psychophysiology 36,
45–57 (1999)
13. Bradley, M., Greenwald, M.K., Hamm, A.O.: Affective picture processing. In: Birbaumer,
N., Öhmann, A. (eds.) The Structure of Emotion, pp. 48–65. Hogrefe & Huber Publishers,
Toronto (1993)
14. Detenber, B.H., Simons, R.F., Bennett, G.G.: Roll ’em!: the effects of picture motion on
emotional responses. Journal of Broadcasting and Electronic Media 21, 112–126 (1998)
15. Anttonen, J., Surakka, V.: Emotions and heart rate while sitting on a chair. In: CHI 2005
Conference Proceedings, pp. 491–499. ACM Press, New York (2005)
16. Peter, C., Herbon, A.: Emotion Representation and Physiology Assignments in Digital Sys-
tems. Interacting With Computers 18, 139–170 (2006)
17. Russell, J.A.: A circumplex model of affect. Journal of Personality and Social Psychology 39,
1161–1178 (1980)
18. Russell, J.A., Feldman Barrett, L.: Core Affect, Prototypical Emotional Episodes, and Other
Things Called Emotion: Dissecting the Elephant. Journal of Personality and Social Psychol-
ogy 76(5), 805–819 (1999)
19. Russell, J.A.: How shall an emotion be called? In: Plutchik, R., Conte, H. (eds.) Circumplex
Models of Personality and Emotion, APA, Washington, pp. 205–220 (1997)
20. Herbon, A., Peter, C., Markert, L., van der Meer, E., Voskamp, J.: Emotion studies in HCI –
a new approach. In: Proceedings of the 2005 HCI International Conference, Las Vegas
(2005)
21. Ritz, T., Thöns, M., Fahrenkrug, S., Dahme, B.: Airways, respiration, and respiratory sinus
arrhythmia during picture viewing. Psychophysiology 42, 568–578 (2005)
22. Feldman Barrett, L.: Discrete Emotions or Dimensions? The Role of Valence Focus and
Arousal Focus. Cognition and Emotion 12(4), 579–599 (1998)
23. Mahlke, S., Minge, M.: Consideration of Multiple Components of Emotions on Human-Tech-
nology Interaction. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-Computer
Interaction. LNCS, vol. 4868. Springer, Heidelberg (2008)
24. Lang, P.J.: Behavioral treatment and bio-behavioral assessment: Computer applications. In:
Sidowsky, J.B., Johnson, J.H., Williams, T.A. (eds.) Technology in mental health care deliv-
ery systems, pp. 119–137. Ablex, Norwood (1980)
25. Peretz, I.: The nature of music from a biological perspective. Cognition 100, 1–32 (2006)
26. Prkachin, K.M., Williams-Avery, R.M., Zwaal, C., Mills, D.E.: Cardiovascular changes during
induced emotion: an application of Lang’s theory of emotional imagery. Journal of Psychoso-
matic Research 47, 255–267 (1999)
27. Neumann, S.A., Waldstein, S.R.: Similar patterns of cardiovascular response during emotional
activation as a function of affective valence and arousal and gender. Journal of Psychosomatic
Research 50, 245–253 (2001)
28. Malik, M., Bigger, J., Camm, A., Kleiger, R.: Heart rate variability - Standards of measurement,
physiological interpretation, and clinical use. European Heart Journal 17, 354–381 (1996)
29. Cook, E.W., Lang, P.J.: Affective judgement and psychophysiological response. Dimensional
covariation in the evaluation of pictorial stimuli. Journal of psychophysiology 3, 51–64
(1989)
30. Roedema, T.M., Simons, R.F.: Emotion-processing deficit in alexithymia. Psychophysiol-
ogy 36, 379–387 (1999)
31. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (1999)
Consideration of Multiple Components of Emotions in
Human-Technology Interaction
1 Introduction
A necessary precondition for studying emotions and affect as part of the user experi-
ence is the assessment of product-driven emotional reactions. To measure emotional
reactions, a variety of methods can be applied. They range from physiological meas-
ures, such as heart rate and electrodermal activity (EDA), electromyography (EMG)
or pupil responses, and the analysis of facial expressions captured by video, to various
kinds of survey methods, like questionnaires, interviews, etc. No single parameter can
index emotional states unambiguously. Psychological research suggests that emotions
represent a complex phenomenon consisting of reactions on various component levels
[2, 3]. Therefore the assessment of emotional reactions can be improved by combin-
ing methods that are associated with different components of emotions.
(Figure: the components of emotion – cognitive appraisals, behavioral tendencies, motor expressions and physiological reactions – grouped around the emotion triad)
methods and assessment scales. Therefore in this section, we discuss a selected vari-
ety of quantitative measurement approaches and consider their application in previous
research in human-technology interaction.
Several methods can be used to gain information on physiological reactions. The most promising way to determine emotional connotations is the measurement of electrodermal activity (EDA). Common parameters are skin conductance response (SCR), skin resistance response (SRR) and skin potential response (SPR). EDA is controlled solely by sympathetic activation. Previous research suggests that larger decreases in EDA are correlated with more negatively valenced situations [11, 12].
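As an illustration of how an SCR parameter can be extracted from a raw skin conductance trace, the sketch below detects responses via their amplitude (prominence); the sampling rate, the 0.05 µS threshold, and the synthetic signal are assumptions made for illustration, not values taken from the studies cited.

```python
# Minimal SCR-detection sketch: find responses in a skin conductance signal
# (microsiemens) whose amplitude exceeds an assumed 0.05 muS threshold.
import numpy as np
from scipy.signal import find_peaks

fs = 32                                   # Hz, assumed sampling rate
t = np.arange(0, 60, 1 / fs)
# Synthetic trace: tonic level, one phasic response around t = 20 s, plus noise.
eda = 2.0 + 0.3 * np.exp(-((t - 20) ** 2) / 4) + 0.01 * np.random.randn(t.size)

peaks, props = find_peaks(eda, prominence=0.05)
print("number of SCRs:", len(peaks))
print("SCR amplitudes (muS):", np.round(props["prominences"], 3))
```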
A second way to gain information on physiological activation is to record heart activity with an electrocardiogram. There are a variety of parameters for analyzing and interpreting the raw signal. Common time-related parameters are heart rate, inter-beat interval and heart rate variability [13]. However, these show inconsistent results for predicting emotional valence in human-computer interaction. While Ward & Marsden [11] describe a decrease in heart rate in negatively valenced situations, other authors found a positive correlation between heart rate and valence [14]. In summary, heart activity seems to be a more reliable indicator of arousal and mental workload than of emotional valence [13].
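For concreteness, the following sketch derives the time-related parameters just mentioned (heart rate, inter-beat interval, and two common time-domain heart rate variability indices) from a series of R-peak times; the timestamps are invented for illustration.

```python
# Heart rate, inter-beat interval (IBI) and time-domain HRV (SDNN, RMSSD)
# derived from R-peak timestamps (illustrative values, not recorded data).
import numpy as np

r_peaks = np.array([0.00, 0.82, 1.66, 2.46, 3.31, 4.12, 4.97])  # seconds
ibi = np.diff(r_peaks) * 1000                  # inter-beat intervals in ms
heart_rate = 60000 / ibi.mean()                # beats per minute
sdnn = ibi.std(ddof=1)                         # overall variability (ms)
rmssd = np.sqrt(np.mean(np.diff(ibi) ** 2))    # beat-to-beat variability (ms)

print(f"HR: {heart_rate:.1f} bpm, mean IBI: {ibi.mean():.0f} ms")
print(f"SDNN: {sdnn:.1f} ms, RMSSD: {rmssd:.1f} ms")
```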
Finally, pupillometry is the study of the dilatation of the pupil. Previous research consistently suggests that, like heart activity, pupillometry is a powerful measure of autonomic responses and mental workload. The more demanding a process is, the larger the pupil is supposed to be [15]. Additionally, Hess & Polt [16] found a significant correlation between dilatation and the valence of a stimulus: more pleasant stimuli tend to produce more dilated pupils.
Measurements of motor expressions relate to facial and body expressions, gestures and speech characteristics. In particular, the relation between emotions and facial expressions has been studied extensively. The Facial Action Coding System (FACS) is one approach to assessing facial expression. It is based on the analysis of 44 facial muscles that are activated singly or in combination by an individual. A trained observer categorizes the observed pattern of activity with respect to the six culture-independent basic emotions fear, anger, joy, disgust, grief, and surprise [17]. To gain reliable information, FACS requires intensive training in observation. Video- and computer-based analysis of facial activity does not yet lead to comparable results [18].
Electromyography (EMG) measures spontaneous muscle activity and is another way to study facial expression. Sensors are placed over the muscle regions and sensitively detect the minimal voltages caused by muscle activity. Facial EMG studies have found that
activity of the corrugator supercilii muscle, which lowers the eyebrow and is involved
in producing frowns, varies inversely with the emotional valence of presented stimuli
and reports of emotional state. The activity of the zygomaticus major muscle, which
controls smiling, is positively associated with positive emotional stimuli and positive
affect [19, 20].
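A crude way to turn these two EMG channels into a single valence indicator is to contrast baseline-normalised zygomaticus and corrugator activity. The sketch below illustrates this idea with synthetic data; it is a simplification for illustration, not a validated index from the cited studies.

```python
# Simplified facial-EMG valence indicator: positive values mean relatively
# more zygomaticus (smiling) than corrugator (frowning) activity compared
# with baseline. Window length and example data are assumptions.
import numpy as np

def rms(x):
    return np.sqrt(np.mean(np.asarray(x, dtype=float) ** 2))

def valence_index(zygomaticus, corrugator, zyg_baseline, corr_baseline):
    """Contrast of baseline-normalised RMS activity of the two muscles."""
    return rms(zygomaticus) / rms(zyg_baseline) - rms(corrugator) / rms(corr_baseline)

rng = np.random.default_rng(1)
baseline = rng.normal(0, 1.0, 512)        # resting EMG (arbitrary units)
smiling = rng.normal(0, 2.5, 512)         # elevated zygomaticus activity
frowning = rng.normal(0, 2.5, 512)        # elevated corrugator activity

print(valence_index(smiling, baseline, baseline, baseline))   # > 0 (positive affect)
print(valence_index(baseline, frowning, baseline, baseline))  # < 0 (negative affect)
```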
Another approach based on measuring motor expressions is the analysis of speech characteristics, such as speed, intensity, melody, and loudness. Empirical research suggests that these qualities are highly correlated with emotional feelings and are therefore reliable indicators of emotional reactions [21].
Different approaches examine the role of cognition in the development process of emo-
tions [5, 22, 23, 24]. In other words, they deal with the problem that the same situation
can induce different emotions depending on how a person interprets the situation. All
theories of cognitive appraisals of emotions offer a set of dimensions that are proposed
to be relevant for the interpretation of a situation in relation to the development of emo-
tions. Although the proposed dimensions differ to some extent, efforts have been re-
cently made to find an integrative model of cognitive appraisals [25].
Based on the appraisal dimensions, quantitative and qualitative methods can be used [26]. A quantitative approach is the GAF (Geneva Appraisal Questionnaire) by
Scherer [6]. The items of the GAF represent the five dimensions of Scherer’s cogni-
tive appraisal theory: intrinsic pleasantness, novelty, goal conduciveness, coping
potential, and norm/self compatibility [1]. Novelty is connected to familiarity and
predictability of the occurrence of a stimulus, while the intrinsic pleasantness dimen-
sion describes whether a stimulus event is likely to result in positive or negative
emotion. A goal relevance check establishes the importance of a stimulus for the
momentary hierarchy of goals/needs. The dimension of coping potential relates to the
extent an event can be controlled or influenced by an agent, and norm/self compatibil-
ity is connected to internal and external standards.
Additionally, various qualitative methods can be used. The thinking aloud method, which is very common in human-technology interaction, can also be used to gain appraisal-related information. People are encouraged to state and describe every emotional reaction they feel during interaction with a technological system. The statements have to be recorded properly, analyzed by a qualitative procedure and related to the appraisal dimensions. To prevent the assessment from interfering with natural usage, the thinking aloud method can be applied retrospectively, e.g. by means of a video confrontation after the interaction has been completed.
Scherer defines the role of behavior tendencies as preparing reactions. Indicators of these reactions can be measured in different ways. Behavioral measures, such as performance-related variables, have a long tradition in human-technology interaction research. Central indicators of performance are speed of reaction (e.g. the time required for single input operations or for completing a defined goal), the accuracy of reaching a goal, the number of errors, and the number of creative ideas during interaction with a system. Findings of Partala & Surakka [27] indicate that behavioral data are related to EMG values. Their results demonstrated that the less the corrugator supercilii muscle was activated during interaction with a usably designed system, the higher was the rate of successful and goal-conducive reactions. Unspecific questionnaires about intention of use or intention to purchase can also be mentioned as indicators of behavioral tendencies. However, these indicators suffer from some reliability problems [3].
In summary, we have discussed a number of methods that can be associated with the
five components of emotion. The question remains to what extent a combination of
methods based on the component approach offers a comprehensive way to understand
different aspects of the emotional user experience in the context of HCI.
The aim of the following study was to investigate the relations that exist between the
five components of an emotional experience in an interactive context. Therefore we
composed a combination of various methods representing the full range of compo-
nents of Scherer’s model. Emotion-related changes on the components were re-
corded while participants interacted with two versions of a computer-based simulation
of a mobile phone.
4.1 Method
Participants: Thirty individuals (15 men, 15 women) with an average age of 25.9
years took part in the study. Most of them were students at Technische Universität
Berlin. They were familiar with usage of mobile phones and had moderate to high
computer experience. Participants were paid for taking part in the study based on their
performance.
Participants started with one version and completed the first set of tasks. Then they
switched to the other system to solve the remaining tasks. Maximum time for each
task was two minutes. Heart rate, EDA and EMG were measured during task comple-
tion. After each task, participants filled in the SAM scales. After the electrodes were
removed, we started the video confrontation. The participants watched their video-
taped task completion behavior and were asked to think aloud what they felt during
system use. After the presentation of each task they filled in the short appraisal ques-
tionnaire. To ensure a realistic emotional involvement, participants were paid
depending on their performance. They started with a credit of 15 Euro, which was
reduced by 1 Euro whenever a task could not be completed. Participants were in-
formed about each reduction and were constantly aware of the amount of money that
was left. A session lasted about 75 minutes. Time for task completion was about fif-
teen minutes overall.
Data reduction: Heart rate and EDA were measured as differences from the individual baseline level in order to reduce inter-individual differences and allow comparisons between subjects. For the heart rate data, we converted the time series to single points by averaging the time series for each task. Regarding EDA, we decided to interpret the amplitude of the maximum reaction for each task with respect to the latency and recovery time of the EDA signal. The EMG data was integrated and t-transformed. All utterances obtained from the retrospective thinking aloud method were categorized with respect to the appraisal dimensions. Affirmative utterances were offset against negating ones, and means were estimated for each dimension and all participants. As an indicator of the behavioral component, the time for task completion was divided by the number of inputs to obtain the average time per input.
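The data-reduction steps described above can be summarised in a few lines of Python. The sketch below follows the description (baseline correction, per-task averaging, maximum EDA amplitude, t-transformed EMG, average time per input) but uses invented numbers, assumes a t-transformation to mean 50 and standard deviation 10, and simplifies the handling of EDA latency and recovery time.

```python
# Sketch of the data reduction described above (invented example values).
import numpy as np

def baseline_corrected_mean(task_series, baseline):
    """Heart rate: average the task time series relative to the individual baseline."""
    return np.mean(task_series) - baseline

def max_eda_response(task_series, baseline):
    """EDA: amplitude of the maximum reaction during the task relative to baseline."""
    return np.max(task_series) - baseline

def t_transform(integrated_emg):
    """EMG: t-transform integrated activity (assumed mean 50, SD 10 across tasks)."""
    x = np.asarray(integrated_emg, dtype=float)
    return 50 + 10 * (x - x.mean()) / x.std(ddof=1)

def time_per_input(task_duration_s, n_inputs):
    """Behavioral component: average time per input operation."""
    return task_duration_s / n_inputs

hr_task = np.array([78.0, 80.5, 82.0, 79.5])              # bpm samples during one task
print(baseline_corrected_mean(hr_task, baseline=75.0))     # mean heart rate change
print(max_eda_response(np.array([2.1, 2.6, 2.4]), 2.0))    # maximum EDA amplitude
print(t_transform([10.0, 14.0, 9.0, 12.0]))                # EMG t-scores per task
print(time_per_input(95.0, 38))                            # seconds per input
```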
4.2 Results
Table 2. Correlations between dependent variables of the emotion triad (*p < .05; **p < .01)
The results demonstrate a significant correlation between the valence dimension and the arousal dimension of the SAM (r = -0.32, p < .01). This connection between the two theoretically independent dimensions may be caused by the stimuli we chose. The “ill-designed” version led to highly arousing reactions with negative valence, while positive and less arousing emotions were experienced when the well-designed version was used. On the other hand, no significant correlations were found between the data recorded with the two physiological measurement methods, i.e. EDA and heart rate.
Regarding the two methods we used to assess cognitive appraisals, we also found differences. Correlations between intrinsic pleasantness, goal/need conduciveness, and coping potential lay between r = 0.44 and r = 0.71 and were highly significant (p < .001). On the other hand, no significant correlations were found for the dimensions of novelty and norm/self compatibility.
In another step, we analyzed the relations between the elements of the emotion triad and the other two components. The dimensions of the appraisal questionnaire were highly correlated with the valence dimension of the Self-Assessment Manikin (see Table 3). Valence correlated positively with pleasantness (r = 0.73), goal/need conduciveness (r = 0.64), coping potential (r = 0.64), and norm/self compatibility (r = 0.64). Valence and novelty were negatively correlated (r = -0.44). All correlations were statistically significant (p < .01). Smaller correlations were found between the arousal dimension of the subjective feeling component and the appraisal dimensions. Physiological and motor expression data correlated slightly with some of the appraisal dimensions, between r = 0.13 and r = 0.23. The correlation between the data gained with the retrospective thinking aloud method and the emotion triad differed for the two appraisal dimensions novelty and norm/self compatibility: correlations were smaller for these dimensions with respect to all components of the emotion triad.
The average time per input – our measure of the behavioral component – was significantly higher in situations that were experienced as less pleasant, less goal-conducive, offering less coping potential, and less norm/self compatible (see Table 4). The behavioral data also correlated with the valence and arousal dimensions of the subjective feeling component (r = -0.23 and r = 0.14). No significant correlations were found between the behavioral component and the physiological or expressive ones (see Table 5).
Table 3. Correlations between dependent variables of the emotion triad and data from the
cognitive appraisal questionnaire (*p < .05; **p < .01)
Table 4. Correlations between dependent variables of the cognitive appraisal questionnaire and
the behavioral component data (*p < .05; **p < .01)
Table 5. Correlations between dependent variables of the emotion triad and the behavioral
component data (*p < .05; **p < .01)
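Correlation tables of the kind reported here can be produced directly from the per-task measures. The sketch below computes Pearson correlations with significance markers for a few of the variables named above, using random placeholder data rather than the study’s values.

```python
# Building a correlation table between emotion-component measures
# (random placeholder data, not the study's results).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n = 120  # e.g. 30 participants x 4 tasks
measures = {
    "SAM valence":   rng.normal(size=n),
    "SAM arousal":   rng.normal(size=n),
    "EDA amplitude": rng.normal(size=n),
    "heart rate":    rng.normal(size=n),
    "time/input":    rng.normal(size=n),
}

names = list(measures)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r, p = pearsonr(measures[a], measures[b])
        flag = "**" if p < 0.01 else "*" if p < 0.05 else ""
        print(f"{a} x {b}: r = {r:+.2f}{flag}")
```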
4.3 Discussion
The results support the assumption that emotional user experience is determined by a
number of different but related components. Summarizing the correlations between
the components, we found high correlations between cognitive appraisal and subjec-
tive feeling data. Both are connected significantly but with smaller correlations to the
other three components of emotions. No significant correlations were found between
physiological and expressive reactions, and both components also did not show any
connection to behavioral tendencies.
Looking at the correlations in more detail, the results regarding valence are consis-
tent with our expectations. Measures showed lower EDA values and less activity of
corrugator supercilii when experienced emotions were rated as rather positive. These
results are consistent with earlier findings [11, 27]. Moreover, EDA measures and
heart rate correlated positively with the arousal dimension.
Although the detected pattern of correlations is rather coherent and consistent, not
all methods led to the expected results. In particular, our results regarding the activity of the
zygomaticus major differ from most other studies, which found higher activity in
relation to positive emotions [27]. Instead, our data points in the same direction as
other experiments which also detected high activity of the zygomaticus major for
negative emotions [29]. Hence, it seems that the activity of the zygomaticus major is
not a reliable indicator for positive feelings. An alternative explanation could be that
we did not induce strong positive emotions in our setting. These results point to an
area for future research concerning emotional expressiveness in different situations of
interactive system usage.
Another point for discussion is the extent of the correlations we found between
subjective feeling, physiological and expressive measures. Although other studies on
emotions showed similar correlations between physiological measures and ratings and
discussed this as a problem of emotion research [12], these results may not only be
caused by measurement uncertainties, but also by the theoretical premise that the
components of emotions represent different aspects of emotions that are only corre-
lated in a specific way. The second assumption would lead to the conclusion that only
Acknowledgements
This research was supported by the German Research Foundation (DFG) as part of
the Research Training Group ‘Prospective Engineering of Human-Technology Inter-
action’ (no. 1013).
References
1. Scherer, K.R.: On the nature and function of emotion: A component process approach. In:
Scherer, K.R., Ekman, P. (eds.) Approaches to emotion, pp. 293–317. Erlbaum, Hillsdale
(1984)
2. Larsen, R.J., Fredrickson, B.L.: Measurement issues in emotion research. In: Kahneman, D., Diener, E., Schwarz, N. (eds.) Well-Being: The Foundations of Hedonic Psychology, pp. 40–60. Russell Sage Foundation, New York (1999)
3. Brave, S., Nass, C.: Emotion in HCI. In: Jacko, J., Sears, A. (eds.) The Human-Computer-
Interaction Handbook, pp. 81–96. Lawrence Erlbaum, Mahwah (2003)
4. Izard, C.E.: Human Emotions. Plenum Press, New York (1977)
5. Lazarus, R.S.: Emotion and adaptation. Oxford University Press, New York (1991)
6. Scherer, K.R.: Appraisal considered as a process of multi-level sequential checking. In:
Scherer, K.R., Schorr, A., Johnstone, T. (eds.) Appraisal processes in emotion: Theory,
methods, research, pp. 92–120. Oxford University Press, New York (2001)
7. Mehrabian, A., Russell, J.A.: An approach to environmental psychology. MIT Press, Cam-
bridge (1974)
8. Russell, J.A.: A circumplex model of affect. Journal of Personality and Social Psychol-
ogy 39, 1161–1178 (1980)
9. Lang, P.J.: Behavioral treatment and bio-behavioral assessment: Computer applications. In:
Sidowski, J.B., Johnson, J.H., Williams, T.A. (eds.) Technology in Mental Health Care Delivery Systems, pp. 119–137. Ablex Publishing, Norwood (1980)
10. Russell, J.A., Weiss, A., Mendelssohn, G.A.: The Affect Grid: A single-item scale of pleas-
ure and arousal. Journal of Personality and Social Psychology 57, 493–502 (1989)
11. Ward, R.D., Marsden, P.H.: Physiological responses to different WEB page designs. Interna-
tional Journal of Human-Computer Studies 59, 199–212 (2003)
12. Herbon, A., Peter, C., Markert, L., van der Meer, E., Voskamp, J.: Emotion studies in HCI - a
new approach. In: HCII 2005 proceedings, Lawrence Erlbaum, Mahwah (2005)
Auditory-Induced Emotion: A Neglected Channel for Communication in HCI
1 Introduction
Human communication is essentially emotional: most human interactions with the
surrounding environment entail affective processes, a natural predisposition of our
organism to feel either attraction or rejection towards the objects, people or even ideas
involved in these interactions [1]. Emotions have a great impact on human behavior,
since they influence processes such as perception, attention, learning, memory or
decision-making [2]. The same mechanisms are activated in individual’s interaction
with all forms of media [3]. Thus, an efficient human-computer interaction (HCI) is
highly dependant on the ability of computer-based applications to express emotions,
interpret users’ affective states and understand the role of external and internal influ-
ences on affective responses [4].
The link between HCI and emotional reactions is becoming an important new research area. While comparatively much research has addressed the relation between visual media content and affective reactions, the link between auditory form/content and emotion remains under-researched. Yet it is well established that sounds evoke emotions and can provide affective information, perhaps more effectively
than many other forms of information channels that are available to HCI designers and
researchers [5]. In our everyday life, sound often elicits emotional reactions in the lis-
tener. People can be startled by the sudden sound of a door slamming or thunder in a
storm, annoyed by the noise of cars in the street, pleased by the sound of a water stream
in the forest, tired after a full day of work in a noisy environment, etc. Nowadays,
sounds are used in many HCI scenarios. For instance, auditory information/feedback in the form of earcons (abstract musical sounds) and auditory icons (sound caricatures of the action the user is supposed to take or has taken) are used in desktop computers and many other applications. In some systems (e.g. vehicles and working envi-
ronments) sounds are used to convey different forms of alerts and warnings. Today,
many of these signals have to be learned through statistical association between the
sound going off and an event taking place. Designing sounds that immediately convey
affective information (e.g. danger) and create an affective reaction (e.g. mild fear or anxi-
ety) also helps facilitate correct action [6].
If we consider that humans are continuously exposed to sound, both in real world
and when interacting with many media applications, it seems surprising that so little is
known about how users respond to affective auditory stimuli. Therefore, there is a
great need for a theory, a systematic approach and a consensus for measuring human
emotional responses to sound [7]. Understanding the role of sound in evoking human
affective responses might improve our quality of life by helping to design spaces and
media applications which are emotionally optimized.
This article provides some examples of the research targeting affective reactions to
everyday sounds. To date, research in this area has been trying to connect physical
sound properties and basic emotions [7]. However, it seems more meaningful to di-
vide ongoing research on affective reactions to sounds into several main categories
including: 1) physical determinants, 2) psychological determinants, 3) spatial deter-
minants and 4) cross-modal determinants [8].
Psychological determinants concern other variables related to subjective interpreta-
tion and meaning that should be considered because different sources evoke different
subjective evaluations [9] (e.g. dog barking vs. rock music). Spatial determinants deal
with the role of the auditory space in creating an emotional response (for instance, a barking dog will have different emotional effects if the spatial cues suggest a small rather than a large space). Finally, cross-modal effects concern the relation between
different modalities in producing an affective reaction [10]. While much research still
is needed to fully understand the different determinants of affective reactions to dif-
ferent categories of sounds, we highlight these four categories in the presented case
study. We show how reproduction parameters and meaning of sound, apart from
physical features, can affect emotional experience.
Even though the physical properties of a sound undoubtedly play a big role in the affective reactions induced, research on ecological sounds, i.e. sounds surrounding us in everyday environments, has shown that other psychological factors, related to subjective interpretation, need to be considered. Ecological sounds can be divided into artificial and natural sounds. Studies with artificial sounds comprise, for instance, the
experiments by Bisping [32-33] with sounds in the interior of a car, or the ones by
Västfjäll et al. [7] with sounds in the interior of a commercial aircraft. These studies
showed the validity of a two-dimensional space (with pleasantness and either power-
fulness [32-33] or activation [7] as coordinates) to classify different affective reac-
tions to artificial sound. In addition, Bisping reported that the different classifications
of the engine sounds in the affective space were mainly based on the envelope of the
low frequency components. Västfjäll et al. found significant correlations between
affective reactions and perceptual and cognitive ratings: pleasantness correlated with
loudness and naturalness of the reproduced sound, while activation was related to
sharpness (high frequency components), fluctuation strength (amplitude- and fre-
quency modulation between 15 and 300 Hz) and prominence ratio (tonal vs. noise
spectra components). Their experiments also showed that pleasantness increases with
loudness and audibility of tones, while activation increases with noise spectra level.
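As an illustration only, two crude numerical proxies for parameters of this kind can be computed with plain numpy: an RMS level as a rough loudness proxy and the spectral centroid as a rough sharpness proxy. These are not the standardised psychoacoustic metrics (loudness, sharpness, fluctuation strength, prominence ratio) used in the studies cited.

```python
# Crude proxies, for illustration only: RMS level (loudness proxy) and
# spectral centroid (sharpness proxy). NOT the standardised metrics.
import numpy as np

def rms_level_db(x):
    """RMS level in dB relative to full scale."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def spectral_centroid_hz(x, fs):
    """Magnitude-weighted mean frequency; higher values ~ 'sharper' sound."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
# Toy 'engine-like' signal: a 120 Hz tone plus broadband noise.
signal = 0.5 * np.sin(2 * np.pi * 120 * t) + 0.05 * np.random.randn(t.size)

print(f"level: {rms_level_db(signal):.1f} dBFS")
print(f"spectral centroid: {spectral_centroid_hz(signal, fs):.0f} Hz")
```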
Natural sounds have also been characterized in terms of their affective quality. For
instance, Björk’s [34] studies placed 15 natural sounds in the valence-arousal dimen-
sional space and Jäncke et al. [9] observed the physiological effect of environmental
sounds with different valences (bird song, church bell and baby’s crying). In a more
ambitious project, Bradley and Lang [35] used graphical scales and psychophysi-
ological measures to characterize the affective experience when exposed to sixty
natural acoustic stimuli. Physiological changes were shown to be highly correlated with
self-reported emotional reactions in terms of valence, arousal and dominance. In addi-
tion, they looked at a physical determinant, the equivalent sound level, and found
almost no correlation with valence ratings (r = .07) and only a moderate one with
activation ratings (r = .38). These correlations accounted for at most 14% of the vari-
ance, thus supporting the suggestion made in this article that emotional reactions are
due to other aspects of the stimuli apart from physical properties. The set of digitized
sounds used by Bradley and Lang [17], together with their normative affective rat-
ings, served to develop the International Affective Digitized Sounds (IADS) for use in
emotions research [36].
Human speech and vocal cues are more complex acoustic stimuli which are
frequently included in human-computer interfaces and have also been often used in
emotion research (e.g. emotional speech synthesis [37]; prosody of speech [38]; see
also [39-41], in this volume). Wexler et al. [42] designed an experiment with semanti-
cally emotional words. Emotional and neutral words were overlapped to form dichotic
stimulus pairs and presented in such a way that in many cases emotional words were not
consciously heard. Physiological measures (electroencephalography and facial electro-
myography) provided evidence that emotional processing occurred even in the cases
where words did not reach conscious awareness. In another study, Hietanen et al. [43]
explored how tendencies to approach or withdraw varied in response to vocal affective
expressions. Results suggested that emotional expressions may be contained in vocal
cues. In human social interaction, vocal cues are as important as visual cues, and human
speech contains many features (melody, rate, pauses, intonation, etc.) which inform us
about the speaker’s affective state. It should however be noted that vocal cues often
exhibit the same properties as sounds [5, 44].
A particular case of ecological sounds is self-representation sounds [45] which can
be associated with a person’s own body (e.g. heartbeat, breathing) and its embodied
activity (e.g. footsteps, chewing or smoking). These sounds increase body awareness
in listeners and they might have a stronger potential for inducing an emotional experi-
ence. In [46], adding natural breath intake sounds to synthetic speech helped listeners
to recall sentences. Eating has been described in [47] as an emotional experience
which involves “being aware of and listening to the crunch of each bite and noise of
the chewing sound in your head”. Fast heartbeat sounds have been shown to increase
self-reported arousal (e.g. [48]). In particular, body sounds such as a heartbeat may
force a listener to physiologically mimic the external auditory stimulation, due both to
the fundamental tendency of our organism to couple internal biophysiological rhythm
to external auditory drivers, and also to additional cognitive effects (self-identification
with that sound or empathy, see next section, [49] and references therein). These
changes at the physiological level can affect one’s emotional experience (e.g. [50-51])
or induce a particular mood state.
Finally, music has the ability both to express and to produce emotions in listeners,
even though it does not have obvious implications for our life goals [52]. Its role in
evoking emotions is still not understood, but music is already used in many applica-
tions which try to elicit affective responses such as film music, marketing and music
therapy [53]. There are a number of studies giving evidence of emotional responses to
music (for an extensive overview see [54]; see also [55], in this volume). In general,
results of these studies indicate that judgments of musical emotions are quick, innate,
hard-wired and automatic, i.e. attention does not need to be involved.
New media applications, such as computer games, home theatre systems or virtual
reality, incorporate the most advanced digital sound reproduction systems. However,
there are few studies exploring the impact of sound rendering or spatial reproduction
techniques on emotional responses. In [1], the startle response to auditory probes presented by means of different techniques was measured; binaural sound and monaural sound stimulating either the left or the right ear were tested. Their results
showed that binaural cues have the largest startle effect. It was also found that the
monaural left-ear stimulation was more effective in producing a startle effect than right-ear stimulation. The authors suggested that this finding might be due to right-hemisphere dominance when processing affective stimuli. Other studies have also shown a greater sensitivity of the left ear to emotional speech and music (see [1] and references
therein).
The experienced emotions and the subjective sense of presence in auditory virtual
environments were investigated in [56]. Presence can be defined as the sensation of
'being there' in mediated environments such as virtual reality, simulators, cinema, televi-
sion, etc. [57]. In this study, mono, stereo and six-channel loudspeaker reproductions
were used. Results showed that emotion and presence were inextricably linked, and that both increase when the spatial resolution of the sound reproduction is improved (see also [58-60]).
Another study on this topic compared the loudspeaker and headphone reproduction
conditions when listening to news [61]. It was hypothesized that the close sound
condition (headphones) would provide a more immersive experience and shorten the
interpersonal distance between the user and the news anchor, thus providing a more
intense, arousing and pleasant experience. Generally, results showed that headphone listening was preferred over loudspeakers, even though it was judged as ‘less realistic’ by the participants. However, no significant differences in elicited arousal were found between the two conditions.
Recent studies in multisensory research have shown that information which is only
available in one sensory modality is in many cases integrated and used by other sen-
sory modalities (for a recent review see [62]). This is also true for emotional informa-
tion. An example can be found in the multisensory integration of the information
expressing emotions in seen faces and heard speech, where the perceived hedonic
valence of stimuli in one sensory modality is altered by the valence of stimuli in other
sensory modality [10]. Another study [63] showed how emotional arousal evoked by
pictures depicting threatening stimuli (snakes and spiders) is transferred to the tactile
modality. Similarly, results from the Bradley and Lang experiment with natural sounds described above [35] were similar to those obtained with pictorial stimuli, which indicates that affective processing and reactivity are not specific to a particular sensory modality. Future research needs to address the issue of how emotion is transferred or interacts between sensory modalities.
The ongoing mood can affect the responses to subsequent emotional events (situation
appraisal). For instance, pre-existing cognitive processes and emotional states may
influence the judgment of an auditory event [25]. The study reported in [64] success-
fully showed that both current mood and individual noise sensitivity are as important
as noise characteristics when judging noise annoyance. It has also been shown [1] that affective responses can vary in magnitude depending on the previously induced emotional state: a reflex with the same valence as the current emotional state will be enhanced, and inhibited otherwise.
Individual differences, such as personality traits, may also play a role in the re-
sponse to sound. In studies considering factors such as speech rate, loudness and dis-
tance to sound [61], it was shown that people preferentially process auditory stimuli
emotionally congruent with their personality traits. In the same way that speech can
express a variety of emotional states (e.g. arousal is revealed in increased pitch, loud-
ness and rate), and individual characteristics (e.g. fast speech can be related to extra-
version), people tend to prefer voices which match their personality traits or their
current emotional state. This can be extrapolated to other audio characteristics (e.g.
background music) and to media messages (see [61] and references therein).
In each trial, heartbeat sounds were presented for 50 seconds, and the subjects’ task was to rate a photograph viewed during the last 6 seconds of the trial. Heartbeat sounds could be presented at a medium or high rate (60 versus 110 beats per minute),
and at close or far distance (headphones versus loudspeakers). Sound level was set at
approximately 60 dB(A). Silence conditions were also included as a baseline for
comparisons. Thirty-four photographs with positive or negative valence were selected from
the International Affective Picture System (IAPS, a set of normative pictures rated in
an arousal/valence/dominance dimensional affective scale [66]), according to their
medium-arousal value (5 on a 9-point scale) and valence (moderate negative and
positive valence – 3 and 7 respectively on a 9-point scale) and presented on a flat
projection screen placed at 1.7 meters distance from participants (768x576 pixels
resolution and 33°×26° field-of-view).
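To make the stimulus manipulation concrete, the sketch below synthesises a simple two-pulse (‘lub-dub’) heartbeat-like signal at a given rate; it is an illustrative reconstruction under assumed parameters (pulse length, spacing, sampling rate), not the recorded heartbeat sounds actually used in the study.

```python
# Illustrative synthesis of a heartbeat-like stimulus ('lub-dub' pulses)
# at 60 or 110 beats per minute; not the sounds used in the study.
import numpy as np

def heartbeat_stimulus(bpm, duration_s=50.0, fs=44100):
    """Return a mono signal with two short low-frequency pulses per beat."""
    signal = np.zeros(int(duration_s * fs))
    pulse_t = np.arange(0, 0.06, 1 / fs)                     # 60 ms pulse
    pulse = np.sin(2 * np.pi * 60 * pulse_t) * np.hanning(pulse_t.size)
    beat_period = 60.0 / bpm
    for beat_start in np.arange(0, duration_s - 0.3, beat_period):
        for offset in (0.0, 0.18):                           # 'lub', then 'dub'
            i = int((beat_start + offset) * fs)
            signal[i:i + pulse.size] += pulse
    return signal / np.max(np.abs(signal))                   # normalise to +/- 1

slow = heartbeat_stimulus(60)     # medium-rate condition (60 bpm)
fast = heartbeat_stimulus(110)    # high-rate condition (110 bpm)
print(f"duration: {slow.size / 44100:.0f} s, peak amplitude: {np.abs(fast).max():.2f}")
```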
Participants’ peripheral heartbeat signals were collected by means of a heart rate
sensor attached to an earclip. Self-reported valence and arousal ratings for the pic-
tures were collected using the Self-Assessment Manikin (SAM), a 9-point picto-
rial scale developed by Lang [67]. Finally, a free-recall task for the photographs
shown was implemented at the end of the experiment and memory performance
scores were collected.
Results showed a small but significant (p < 0.05) effect of sound on physiology (heart rate changed by around one beat per minute after forty seconds of heartbeat sound presentation). A significant influence of sound was also observed on the emotional responses to pictures: fast heartbeat sounds made people rate pictures as more arousing and increased memory performance, while slow heartbeat sounds had a relaxing effect when
facing negative pictures. As hypothesized, there was a stronger effect of rate for the
close sound reproduction condition (headphones); the effect was observed both at the
physiological level and on the self-reported arousal ratings.
In summary, the presented results give further support to the idea of the amodal
character (in terms of sensory modality) of affective reactions, since here sounds
affected emotional judgments of visual stimuli. They also suggest that the emotion-eliciting power of the auditory modality is influenced by the perceived distance, with close stimulation being more affective than distant stimulation. Close is intimate, arousing, engaging [3]. Moreover, this study highlights the possibility of considering the affective power of self-representation sounds or other stimuli related to one’s body (here, body sounds). Future research needs to clarify whether the effects found can be attributed only to the distance cues or also to the fact that the sound used was from the self-representation category. The implications of these findings for the design of media
applications are discussed in the next section.
4 Conclusions
Traditionally, the visual domain has captured the main interest in multimedia applications. In the search for pictorial realism, other sensory modalities were often neglected. However,
there is evidence to affirm that people are much more sensitive to audio fidelity than
to visual fidelity [3]. This means that sound may compensate for visual imperfections.
Therefore, audiovisual applications can be optimized by making use of audio tech-
nologies, which tend to be technically less complex than visual rendering systems. In
particular, sound might be considered in the design of affective human-computer
interfaces. Sound is capable of eliciting a full range of emotions, which can vary ac-
cording to factors such as the physical properties of the acoustical signal, the subjec-
tive interpretation and meaning of the sound, sound rendering techniques (the spatial
dimension), situation appraisal and as a result of cross-modal interaction with other
sensory modalities.
An open question in this area is the possibility of reliably predicting the emotional response to sound. Research has shown that it is possible to separate cognitive
evaluations and emotional reactions to auditory stimuli, and thus, in theory we might
potentially establish a relation between sound physical properties and affective re-
sponse [7]. When considering the meaning attributed to sound, this article provides
some examples showing that both artificial and natural sounds have an affective
power which is not only dependent on their physical properties. For instance, self-
representation sounds like a heartbeat, breath or footsteps might facilitate self-
identification with that sound (an embodied experience) and this might induce strong
affective processes. In particular, this type of sound might be used in the design of
multimodal virtual environments, where research has already shown that visual cues
representing one’s body increase engagement [68]. In these virtual environments,
self-representation sounds would form part of a user’s multimodal virtual body [45].
Sound spatial dimension (resolution, distance to sound) has also been mentioned as
a determinant of listener’s emotional arousal. Arousing or intense emotional experi-
ences accompany engagement in media applications [69], something desired, for
instance, in e-learning environments, because arousal and positive experiences facili-
tate memory for events and encourage users to go on with the tasks [70].
In addition, the result of using different sensory modalities in combination needs to
be considered, given the amodal character of emotional processes. Future research
should also consider other factors which may influence emotional reactions to audi-
tory stimuli, such as situation appraisal and individual differences, personality traits or
individual goals.
Although the present review covers only a small sample of everyday sounds, the principles considered here are likely to extend to virtually any kind of sound perception, and therefore to the design of any system interacting with humans. Re-
search in human affective responses might help to improve our quality of life by
contributing to the design of spaces, objects and applications which are emotionally
optimized. For instance, our personal everyday life can be enhanced by including affec-
tive human-computer interfaces in workplace systems, because they increase motivation and persistence of users [70-71] (see also [72], in this volume); in the area of
health they can help in telemedicine applications (see [73], in this volume) or in tools
to fight against stress (see [74], in this volume) or fear (e.g. of public speak-
ing [75]); in e-learning environments, they may enhance memory [70]; and they can
even be useful in telerobotics, where adding an affective value makes people understand
and empathize with the tasks and needs of robots [76] (see also [40], in this volume).
Acknowledgments. The work presented in this paper was supported by the EU grant
PRESENCCIA-IST-2006-27731 (www.presenccia.org). We thank Dr. Aleksander
Väljamäe, anonymous reviewers and editors for their helpful feedback in the manuscript
preparation.
References
1. Lang, P.J., Bradley, M.M., Cuthbert, B.N.: Emotion, Attention, and the Startle reflex. Psy-
chological Review 97, 377–395 (1990)
2. Phelps, E.A., LeDoux, J.E.: Contributions of the Amygdala to Emotion Processing: From
Animal Models to Human Behavior. Neuron 48, 175–187 (2005)
3. Reeves, B., Nass, C.: The Media Equation: How People Treat Computers, Television, and
New Media Like Real People and Places. Cambridge University Press, New York (1996)
4. Picard, R.W., Daily, S.B.: Evaluating Affective Interactions: Alternatives to Asking What
Users Feel. In: CHI Workshop on Evaluating Affective interfaces: Innovative Approaches,
Portland (April 2005)
5. Juslin, P., Västfjäll, D.: All Emotions are not Created Equal: Mechanism Underlying
Musical Emotions. Behavioral Brain Sciences (in press)
6. Sköld, A., Bergman, P., Västfjäll, D., Tajadura-Jiménez, A., Larsson, P.: Emotional Reac-
tions to Information and Warning Sounds. Acta Acustica (submitted)
7. Västfjäll, D., Kleiner, M.: Emotion in Product Sound Design. In: Proceedings of Journées
Design Sonore, Paris, March 20-21 (2002)
8. Västfjäll, D., Tajadura-Jiménez, A., Väljamäe, A., Juslin, P.: Non-Vocal, Non-Musical De-
terminants of Auditory Induced Emotions (in preparation)
9. Jäncke, L., Vogt, J., Musial, F., Lutz, K., Kalveram, K.T.: Facial EMG Responses to Audi-
tory Stimuli. International Journal of Psychophysiology 22, 85–96 (1996)
10. De Gelder, B., Bertelson, P.: Multisensory Integration, Perception and Ecological Validity.
Trends in Cognitive Sciences 7, 460–467 (2003)
11. Levenson, R.W.: Human emotions: A functional view. In: Ekman, P., Davidson, R.J. (eds.)
The Nature of Emotion: Fundamental Questions, pp. 123–126. Oxford University Press,
New York (1994)
12. Lazarus, R.S.: Emotion and Adaptation. Oxford University Press, New York (1991)
13. Shiv, B., Fedorikhin, A.: Heart and Mind in Conflict: Interplay of Affect and Cognition in
Consumer Decision Making. Journal of Consumer Research 26, 278–282 (1999)
14. Ekman, P., Friesen, W.V., Ancoli, S.: Facial Signs of Emotional Experience. Journal of
Personality and Social Psychology 39, 1125–1134 (1980)
15. LeDoux, J.E.: The Emotional Brain: The mysterious Underpinnings of Emotional Life.
Simon & Schuster, New York (1996)
16. Levenson, R.W.: Emotion and the Autonomic Nervous System: A Prospectus for Research
on Autonomic Specificity. In: Wagner, H. (ed.) Social Psychology: Perspectives of Theory
and Clinical Applications, pp. 17–42. Wiley, London (1988)
17. Wundt, W.: Lectures on Human and Animal Psychology. Macmillan, New York (1896)
(J.E. Creighton & E.B Titchener, Trans.)
18. Osgood, C., Suci, G., Tannenbaum, P.: The Measurement of Meaning. University of
Illinois Press, Urbana (1957)
19. Lang, P.J.: The Emotion Probe: Studies of Motivation and Attention. American Psycholo-
gist 50, 372–385 (1995)
20. Russell, J.A.: The Circumplex Model of Affect. Journal of Personality and Social Psychol-
ogy 39, 1161–1178 (1980)
21. Palen, L., Bølen, S.: Don’t Get Emotional. In: Peter, C., Beale, R. (eds.) Affect and
Emotion in Human-Computer Interaction. LNCS, vol. 4868. Springer, Heidelberg (2008)
22. Bainbridge, W.S.: Computational Affective Sociology. In: Peter, C., Beale, R. (eds.) Af-
fect and Emotion in Human-Computer Interaction. LNCS, vol. 4868. Springer, Heidelberg
(2008)
23. Lichtenstein, A., Oehme, A., Kupschick, S., Jürgensohn, T.: Comparing Two Emotion
Models for Deriving Affective States from Physiological Data. In: Peter, C., Beale, R.
(eds.) Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868. Springer,
Heidelberg (2008)
24. Mahlke, S., Minge, M.: Consideration of Multiple Components of Emotions in Human-
Technology Interaction. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-
Computer Interaction. LNCS, vol. 4868. Springer, Heidelberg (2008)
25. Blauert, J., Jekosch, U.: Sound-Quality Evaluation: A Multi-Layered Problem. Acta
Acustica 83, 747–753 (1997)
26. Guski, R.: Psychological Methods for Evaluating Sound Quality and Assessing Acoustic
Information. Acta Acustica 83, 765–774 (1997)
27. Widman, U.: Aurally Adequate Evaluation of Sounds. In: Proc. of Euro Noise, vol. 98, pp.
29–46 (1998)
28. Todd, N.: Evidence for a Behavioral Significance of Saccular Acoustic Sensitivity in Hu-
mans. Journal of the Acoustical Society of America 110, 380–390 (2001)
29. Sokolov, E.N.: Perception and the Conditioned Reflex. Pergamon Press, Oxford (1963)
30. Dimberg, U.: Facial Electromyography and Emotional Reactions. Psychophysiology 27(5),
481–494 (1990)
31. Kjellberg, A., Skoldstrom, B., Tesarz, M., Dallner, M.: Facial EMG Responses to Noise.
Perceptual and Motor Skills 79(3 Pt 1), 1203–1216 (1994)
32. Bisping, R.: Emotional Effect of Car Interior Sounds: Pleasantness and Power and their
Relation to Acoustic Key Features. SAE paper 951284, 1203–1209 (1995)
33. Bisping, R.: Car Interior Sound Quality: Experimental Analysis by Synthesis. Acta Acus-
tica 83, 813–818 (1997)
34. Björk, E.A.: The Perceived Quality of Natural Sounds. Acustica 57, 185–188 (1985)
35. Bradley, M.M., Lang, P.J.: Affective Reactions to Acoustic Stimuli. Psychophysiology 37,
204–215 (2000)
36. Bradley, M.M., Lang, P.J.: International Affective Digitized Sounds (IADS): Stimuli, In-
struction Manual and Affective Ratings (Tech. Rep. No. B-2). Gainesville, FL: The Center
for Research in Psychophysiology. University of Florida (1999)
37. Schröder, M.: Emotional Speech Synthesis: A Review. In: Proceedings of Eurospeech
2001, Scandinavia (2001)
38. Hermann, T., Ritter, H.: Sound and Meaning in Auditory Data Display. Proceedings of the
IEEE 92(4) (2004)
39. Vogt, T., André, E., Wagner, J.: Automatic Recognition of Emotions from Speech: a Re-
view of the Literature and Recommendations for Practical Realisation. In: Peter, C., Beale,
R. (eds.) Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868. Springer,
Heidelberg (2008)
40. Jones, C., Deeming, A.: Affective Human-Robotic Interaction. In: Peter, C., Beale, R.
(eds.) Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868. Springer,
Heidelberg (2008)
41. Jones, C., Sutherland, J.: Acoustic Emotion Recognition for Affective Computer Gaming.
In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-Computer Interaction. LNCS,
vol. 4868. Springer, Heidelberg (2008)
42. Wexler, B.E., Warrenburg, S., Schwartz, G.E., Janer, L.D.: EEG and EMG Responses to
Emotion-Evoking Stimuli Processed Without Conscious Awareness. Neuropsycholo-
gia 30(12), 1065–1079 (1992)
43. Hietanen, J.K., Surakka, V., Linnankoski, L.: Facial Electromyographic Responses to
Vocal Affect Expressions. Psychophysiology 35, 530–536 (1998)
44. Scherer, K.R.: Acoustic Concomitants of Emotion Dimensions: Judging Affect from Syn-
thesized Tone Sequences. In: Weitz, S. (ed.) Nonverbal Communication: Readings with
Commentary, pp. 249–253. Oxford University Press, New York (1974)
45. Väljamäe, A., Larsson, P., Västfjäll, D., Kleiner, M.: Sound Representing Self-Motion in
Virtual Environments Enhances Linear Vection. Presence: Teleoperators and Virtual Envi-
ronments (to appear)
46. Whalen, D.H., Hoequist, C.E., Sheffert, S.M.: The Effects of Breath Sounds on the Percep-
tion of Synthetic Speech. Journal of Acoustic Society of America 97, 3147–3153 (1995)
47. Albers, S.: Eating Mindfully: How to End Mindless Eating and Enjoy a Balanced Rela-
tionship with Food. New Harbinger Publications (2003)
48. Woll, S.B., McFall, M.E.: The Effects of False Feedback on Attributed Arousal and Rated
Attractiveness in Female Subjects. Journal of Personality 47, 214–229 (1979)
49. Scherer, K.R., Zentner, M.R.: Emotional Effects of Music: Production Rules. In: Juslin,
P.N., Sloboda, J.A. (eds.) Music and Emotion: Theory and Research. Oxford University
Press, New York (2001)
50. James, W.: The Principles of Psychology. Holt, New York (1890)
51. Schachter, S., Singer, J.E.: Cognitive, Social and Physiological Determinants of Emotional
State. Psychol. Review 69, 379–399 (1962)
52. Juslin, P.N., Västfjäll, D.: Lost in a Feeling? A Model that can Guide the Study of Music
and Emotion (submitted)
53. Thaut, M.H.: Neuropsychological Processes in Music Perception and their Relevance in
Music Therapy. In: Unkeler, R.F. (ed.) Music Therapy in the Treatment of Adults with
Mental disorders, pp. 3–31. Schirmer books, New York (1990)
54. Juslin, P.N., Sloboda, J.A. (eds.): Music and Emotion: Theory and Research. Oxford University Press, New York (2001)
55. Loviscach, J., Oswald, D.: In the Mood: Tagging Music with Affects. In: Peter, C., Beale,
R. (eds.) Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868. Springer,
Heidelberg (2008)
56. Västfjäll, D.: The Subjective Sense of Presence, Emotion Recognition, and Experienced
Emotions in Auditory Virtual Environments. CyberPsychology & Behavior 6(2), 181–188
(2003)
57. Freeman, J., Avons, S.E., Pearson, D.E., Ijsselsteijn, W.A.: Effects of Sensory Information
and Prior Experience on Direct Subjective Ratings of Presence. Presence: Teleoperators
and Virtual Environments 8(1), 1–13 (1999)
58. Frijda, N.H.: Emotions are Functional, Most of the Time. In: Ekman, P., Davidson, R.J.
(eds.) The nature of emotion, pp. 112–122. Oxford University Press, New York (1994)
59. Hendrix, C., Barfield, W.: The sense of Presence Within Auditory Virtual Environments.
Presence: Teleoperators and Virtual Environments 3, 290–301 (1996)
60. Larsson, P., Väljamäe, A., Västfjäll, D., Kleiner, M.: Auditory Induced Presence in Mediated Envi-
ronments and Related Technology. In: Biocca, F., IJsselsteijn, W.A., Freeman, J.J. (eds.)
Handbook of Presence. Lawrence Erlbaum, Mahwah (in press)
61. Kallinen, K., Ravaja, N.: Comparing Speakers versus Headphones in Listening to News
from a Computer: Individual Differences and Psychophysiological Responses. Computers
in Human Behavior 23, 303–317 (2007)
62. Spence, C., Driver, J. (eds.): Crossmodal Space and Cross-modal Attention. Oxford Uni-
versity Press, Oxford (2004)
63. Poliakoff, E., Miles, E., Li, X., Blanchette, I.: The effect of Visual Threat on Spatial Atten-
tion to Touch. Cognition 102(3), 405–414 (2007)
64. Västfjäll, D.: Influences of Current Mood and Noise Sensitivity on Judgments of noise an-
noyance. The Journal of Psychology 136, 357–370 (2002)
65. Tajadura-Jiménez, A., Väljamäe, A., Västfjäll, D.: Self-Representation in Mediated Envi-
ronments: The Experience of Emotions Modulated by Auditory-Vibrotactile Heartbeat (in
press)
66. Lang, P.J., Bradley, M.M., Cuthbert, B.N.: International Affective Picture System (IAPS):
Affective Ratings of Pictures and Instruction Manual. Technical Report A-6. Univ. Flor-
ida, Gainesville, FL (2005)
67. Lang, P.J.: Behavioral Treatment and Bio-Behavioral Assessment: Computer Applications.
In: Sidowski, J.B., Johnson, J.H., Williams, T.A. (eds.) Technology in Mental Health Care
Delivery Systems, pp. 119–137. Ablex Publishing, Norwood (1980)
68. Slater, M., Usoh, M.: Body Centred Interaction in Immersive Virtual Environments.
In: Magnenat Thalmann, N., Thalmann, D. (eds.) Artificial Life and Virtual Reality, pp.
125–148. John Wiley and Sons, Chichester (1994)
69. Watson, D., Tellegen, A.: Toward a Consensual Structure of Mood. Psychological Bulle-
tin 98, 219–235 (1985)
70. Graf, C., Niebuhr, S., Kohler, K.: Enhancing Business Software through Fun-of-Use: A
Pattern-based approach. In: Position Paper for Workshop on The Role of Emotion in HCI
2006, London, September 12-15 (2006)
71. Walkinshaw, O.: A Photo a Day: Is it Work, Rest or Play? In: Position Paper for Work-
shop on The Role of Emotion in HCI 2006, London (September 12-15, 2006)
72. Harbich, S., Hassenzahl, M.: Beyond Task Completion in the Workplace: Execute,
Engage, Evolve, Expand. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-
Computer Interaction. LNCS, vol. 4868. Springer, Heidelberg (2008)
73. Creed, C., Beale, R.: Simulated Emotion in Affective Embodied Agents. In: Peter, C.,
Beale, R. (eds.) Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868.
Springer, Heidelberg (2008)
74. Millard, N., Hole, L.: In the Moodie: Using Affective Widgets to Help Contact Centre Ad-
visors Fight Stress. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-Computer
Interaction. LNCS, vol. 4868. Springer, Heidelberg (2008)
75. Pertaub, D.P., Slater, M., Barker, C.: An Experiment on Public Speaking Anxiety in Re-
sponse to Three Different Types of Virtual Audience. Presence: Teleoperators and Virtual
Environments 11, 68–78 (2002)
76. Shick, A.: Duckie and Dudle: a preliminary demonstration using affective display to com-
municate task status. In: Position Paper for Workshop on the Role of Emotion in HCI
2006: Engage, London, September 12-15 (2006)
Automatic Recognition of Emotions from
Speech: A Review of the Literature and
Recommendations for Practical Realisation
1 Introduction
Automatic emotion recognition from speech has in the last decade shifted from a
side issue to a major topic in human-computer interaction and speech processing. The aim is to enable a very natural interaction with the computer by speaking instead of using traditional input devices, and to have the machine understand not only the verbal content but also more subtle cues, such as affect, which any human listener would easily react to. This can be used in spoken dialogue systems, e.g.
in call center applications. However, so far real-time emotion recognition has
scarcely been attempted and if so, only in prototypical applications, as there are
still many problems that are not yet solved appropriately.
In this article, we focus on technical challenges that arise when equipping
human-computer interfaces with the ability to recognize the user’s vocal emo-
tions. Therefore, we will start with a short introduction to the acoustic proper-
ties of voice that are relevant for emotions as identified by psychological studies
and move on to a discussion of databases with emotional speech. To give the
reader an idea of how to employ information on the user’s emotional state in
human-computer interfaces, we then present a number of promising application
fields. After that, we address the three main parts of automatic emotion recogni-
tion, namely finding appropriate audio units, feature extraction, and classifica-
tion, that pose the hardest problems. Last, we exemplify the major difficulties of
real-time emotion recognition by means of a sample application. The overall ob-
jective of the paper is to help readers to assess the feasibility of human-computer
interfaces that are sensitive to the user’s emotional voice and to provide them
with guidelines of how to technically realize such interfaces.
3 Databases
Databases with emotional speech are not only essential for psychological studies,
but also for automatic emotion recognition, as standard methods are statistical
and need to learn by examples. Generally, research deals with databases of acted,
induced or completely spontaneous emotions. Of course, the complexity of the
task increases with the naturalness. Research on automatic vocal emotion recognition, which started seriously in the mid-90s, thus began with acted speech [4] and has since shifted towards more realistic data [5,6].
Prominent examples of acted databases are the Berlin database of emotional speech [7] and the Danish Emotional Speech corpus (DES) [8], which hold recordings of 10 and 4 speakers, respectively, who were asked to speak sentences of emotionally neutral content in 7 and 5 basic emotions, respectively. Examples of induced data are the SmartKom corpus [9] and the German Aibo emotion corpus [10], for which people were recorded in a lab setting while fulfilling a task intended to elicit e.g. anger or irritation, without the subjects knowing that their emotional state was of interest. The call center communication dealt with by Devillers and colleagues [5] is fully realistic, as it is obtained from live recordings.
The labeled emotions in the databases — and consequently also the emotions
that are going to be recognised — can be a classic set of basic emotions like
joy, anger, sadness, disgust. Alternatively, emotion states can be placed within
a dimensional model of two or three affective dimensions (see Fig. 1). The di-
mensions are usually valence (from positive to negative) and arousal (from high
to low); sometimes a third dimension like stance (from open to closed) is added.
A dimensional model allows for a continuous description which is very suitable
Fig. 1. A two-dimensional emotion space with a valence and an arousal axis. Basic
Emotions are marked as areas within the space.
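As a toy illustration of such a dimensional description (not taken from this chapter), the sketch below places a few basic emotions at assumed (valence, arousal) coordinates and maps a continuous estimate back to the nearest basic emotion; the coordinates and the set of emotions are placeholders, since the exact placement varies between studies.

```python
# Illustrative only: assumed (valence, arousal) coordinates in [-1, 1] for a few
# basic emotions; the placement is a rough placeholder, not values from the text.
EMOTION_SPACE = {
    "joy":     ( 0.8,  0.5),
    "anger":   (-0.7,  0.8),
    "sadness": (-0.7, -0.6),
    "boredom": (-0.3, -0.8),
}

def nearest_basic_emotion(valence, arousal):
    """Map a continuous (valence, arousal) estimate to the closest basic emotion."""
    return min(EMOTION_SPACE,
               key=lambda e: (EMOTION_SPACE[e][0] - valence) ** 2
                           + (EMOTION_SPACE[e][1] - arousal) ** 2)

print(nearest_basic_emotion(0.6, 0.4))  # -> "joy"
```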
Fig. 2. Types of databases used for emotion recognition and their difficulty
obvious settings are the distinction between problem and no problem in a dialogue [16], or the detection of engagement [12]. The term "emotion" is thus interpreted widely and comprises all affect-related user states that occur in human-computer interaction. However, the more realistic the data is, the smaller the number of classes that can feasibly be processed. Figure 2
illustrates how the difficulty of emotion recognition increases with the type of
data used.
4 Applications
Call centre conversations are among the most popular applications for approaches to the automated recognition of emotions from speech. On the one hand, a system may provide human operators with information regarding the emotions their voice might portray. That is, the system serves as a kind of "Affective Mirror" [17] that helps users to improve their interaction skills. One example is the Jerk-O-Meter, which monitors attention (activity and stress) in a phone conversation based on speech feature analysis and gives the user feedback, allowing her to change her manners if deemed appropriate [18]. On the other hand, mechanisms
for detecting emotions may be employed to sort voice messages according to the
emotions portrayed by the caller. Among other things, a dialogue system may
deploy knowledge on emotional user states to select appropriate conciliation
strategies and to decide whether or not to transfer the caller to a human agent.
An example is the emotion-aware voice portal currently under development
at T-Systems [19]. Furthermore, information on the caller’s emotional state may
be used to predict system error rates. Riccardi and Hakkani-Tür [20] investigate
how the user’s emotional state affects the accuracy of the AT&T “How May
I Help You?” spoken dialogue system and conclude that the detection of the
caller’s emotional state may be beneficial for the adaptation of the system’s
dialogue strategies. In the case of anger, the performance of the dialogue system
tends to go down, for example. This knowledge may again be used to select
appropriate repair strategies.
Recently, methods for the recognition of emotions from speech have also been
explored within the context of computer-enhanced learning. The motivation
behind these approaches is the expectation that the learning process may be
improved if a tutoring system adapts its pedagogical strategies to a student’s
emotional state. For instance, Ai and colleagues [21] consider features extracted
from the dialogue between the tutor and the student, such as the prosody of
speech, as well as features relating to user and system performance for the emo-
tion recognition process in the ITSpoke tutoring system.
In recent years, research has been conducted to explore the fea-
sibility and potential of emotionally aware in-car systems. This work is moti-
vated by empirical studies that provide evidence of the dependencies between
a driver’s performance and his or her emotional state. Emotion recognition
from speech in cars has so far been investigated e.g. in the Emotive Driver
project [22], and in the FERMUS project, a cooperation with the automobile
The goal of the audio segmentation is to segment a speech signal into units
that are representative for emotions. These are usually linguistically motivated
middle-length time intervals such as words or utterances. Though the decision on
which kind of unit to take is evidently important, it has not received much atten-
tion in past research on emotion recognition. Most approaches so far have dealt
with utterances of acted emotions where the choice of unit is obviously just one
utterance, a well-defined linguistic unit within which no change of emotion occurs. However, in spontaneous speech this kind of obvious unit does not exist: neither is the segmentation into utterances straightforward, nor can a constant emotion be expected over a whole utterance. Generally speaking, a good emotion unit has to satisfy two contradicting requirements. It should be (1) long enough to reliably calculate features by means of statistical functions, since the more values a statistical measure is based on, the more expressive it is; and (2) short enough that the acoustic properties of the segment with respect to emotion stay stable, i.e. that no change of emotional state is likely to occur within it and expressive features can still be derived. The latter is particularly important for features based on statistical measures, since e.g. the mean value of a very inhomogeneous segment yields an inadequate description. A compromise between these two requirements therefore has to be found, as illustrated in the sketch below.
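The following sketch is one way to realise that compromise (an illustration under assumed boundaries, not the authors' implementation): word-level time intervals, assumed to come from a recogniser or forced alignment, are merged into emotion units bounded by a minimum and a maximum duration.

```python
# A minimal sketch of the length compromise: merge word-level time intervals into
# emotion units that are at least MIN_LEN seconds long (enough material for stable
# statistics) but never longer than MAX_LEN seconds (so that a single emotion is
# likely to hold within the unit). MIN_LEN/MAX_LEN are illustrative assumptions.
MIN_LEN, MAX_LEN = 1.0, 3.0

def words_to_emotion_units(word_intervals):
    """word_intervals: list of (start_sec, end_sec) tuples, in temporal order."""
    units, current = [], []
    for start, end in word_intervals:
        if current and end - current[0][0] > MAX_LEN:
            units.append(current)          # close the unit before it gets too long
            current = []
        current.append((start, end))
        if current[-1][1] - current[0][0] >= MIN_LEN:
            units.append(current)          # long enough for reliable statistics
            current = []
    if current:
        units.append(current)              # trailing words form a (possibly short) unit
    return [(u[0][0], u[-1][1]) for u in units]

print(words_to_emotion_units([(0.0, 0.4), (0.4, 0.9), (1.1, 1.6), (1.8, 3.5)]))
```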
So far, only a few attempts have been made to compare different types of units. In [27], we compared utterances, words, words in context and fixed time intervals and found that larger, linguistically motivated units tend to perform better. Batliner et al. [16] based their features on words with a varying number of context words. In addition to simple word-level recognition, they also mapped word-level results onto turns and onto chunks within the turns using two different strategies. In a qualitative analysis they found both advantages and disadvantages of units smaller than turns, but did not explore this further quantitatively.
Generally, which unit fits best strongly depends on the data. Most commonly, dialogue turns, utterances or phrases have been used, as e.g. in [5,28,6,29,30], but also words [16,31].
algorithm like principal components analysis (PCA) can be used to encode the
main information of the feature space more compactly. Feature sets originally consisted mainly of pitch- and energy-related features, and these continue to be the most prominent ones. Formants and Mel Frequency Cepstral
Coefficients (MFCC) are also frequently found. Durational and pause related
features are noted in several papers, as well as different types of voice quality
features. Spectral measures and parametric representations other than MFCCs
are less common, but include wavelets, Teager energy operator (TEO) based
features, log frequency power coefficients (LFPC) and linear prediction cepstral
coefficients (LPCC).
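As a minimal sketch of the PCA-based compression mentioned at the start of this paragraph (scikit-learn is an assumption; any linear algebra package would do), a synthetic matrix standing in for a large set of extracted acoustic features is reduced while retaining most of its variance.

```python
# Sketch only, assuming X is an (n_segments x n_features) matrix of already
# extracted acoustic features; the synthetic data merely has a low-dimensional
# structure so that the compression is visible.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 10))
X = latent @ rng.normal(size=(10, 300)) + 0.1 * rng.normal(size=(200, 300))

X_scaled = StandardScaler().fit_transform(X)    # PCA is scale sensitive
pca = PCA(n_components=0.95)                    # keep 95% of the variance
X_compact = pca.fit_transform(X_scaled)
print(X.shape, "->", X_compact.shape)
```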
The raw pitch, energy, etc. contours can be used as they are and are then called short-term features. More often, however, the actual features are derived from these acoustic variables by applying (statistical) functions over the sequence of values within an emotion segment; these are called global statistics features. An example is the mean pitch of a word or an utterance; further typical statistical measures are the maximum or minimum of the segment, but also regression coefficients, derivatives or other more complex functions. The choice of feature type also determines the type of classifier. For global statistics features, a static classifier such as Support Vector Machines (SVM), which processes one instance at a time, has to be used. Short-term features require a dynamic classifier such as Hidden Markov Models (HMM). One can say that in the first case the dynamic properties of emotions have to be captured by the features, while in the latter case they are dealt with by the classifier.
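A minimal sketch of such global statistics features is given below. It assumes that a per-frame pitch contour has already been extracted by some external tool (unvoiced frames marked as NaN); the function name and the exact feature set are illustrative, not the feature set of any particular system reviewed here.

```python
# Derive segment-level "global statistics" features from a per-frame contour.
import numpy as np

def global_statistics(contour):
    """contour: per-frame acoustic values (e.g. F0 in Hz) for one emotion unit."""
    values = np.asarray(contour, dtype=float)
    values = values[~np.isnan(values)]          # keep voiced frames only
    frames = np.arange(len(values))
    slope = np.polyfit(frames, values, 1)[0]    # linear regression over the unit
    return {
        "mean": values.mean(),
        "min": values.min(),
        "max": values.max(),
        "range": values.max() - values.min(),
        "std": values.std(),
        "slope": slope,
    }

f0 = [np.nan, 182.0, 190.5, 201.0, 197.2, np.nan, 175.0, 168.3]
print(global_statistics(f0))
```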
Some suprasegmental acoustic phenomena may also be considered as global emotion features. Batliner et al. [32] used, among others, hyper-clear speech, pauses inside words, syllable lengthening and off-talk, while Devillers et al. [5] used disfluency cues, inspiration, expiration, mouth noise, laughter, crying and unintelligible voice. Though these have mainly been annotated by hand, automatic extraction would also be possible in some cases.
Furthermore, meta-data can be used to enhance recognition accuracy, as e.g. applied by Litman and colleagues [6]: they collected a corpus from a spoken dialogue tutoring system in the physics domain and hence incorporated further application-dependent knowledge into their feature set, such as the identity of the speaker, the gender, and which of five available physics problems was being treated.
Unfortunately, it is rarely possible to compare features across published work, since conditions vary considerably and even slight changes in the general set-up can make results incomparable. For example, most researchers use their own recordings, and different data, and particularly different data types, have a huge impact on the comparability of two approaches. At present, there are no standard databases that could be used for benchmarking. For one database, 50% accuracy may be excellent for a 4-class problem, while for another database recognition rates of 70% to 80% can be reached. This does not mean that the former database was poorly designed, but rather that it poses a harder task, which can be due to many factors. A rule of thumb for natural emotions is that the recognition rate is rarely much more than twice chance level, so for a 4-class problem, 50% is good.
Classifiers, target classes and speaker types also differ between the various publications on automatic emotion recognition, so that no general statement about which features are most successful can be made from a comparison of the literature. Of course, comparisons of features within publications are made, e.g. through relevance ranking by the information gain of single features [29,30] or by rank in a sequential selection method [28,33]. Relevance ranking usually aims at assessing the salience of single features, usually per feature type. However, a single feature's relevance does not necessarily imply its usefulness within a set of features. Another strategy is to form groups of features (e.g. prosodic, lexical, etc.) and to compare the performance of the groups or of combinations of groups, e.g. [6,32,34]. No general conclusion can be drawn from the work on feature evaluation, but pitch features have on various occasions been shown to be less important than previously assumed [27,28,29,30]; so far, however, this has not been confirmed for other emotion classification tasks. The CEICES (Combining Efforts for Improving automatic Classification of Emotional user States) initiative [15] therefore aims at a more general evaluation of features by providing a database under fixed conditions and having different sites use their own features and classifiers.
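For illustration, the following sketch ranks synthetic features by mutual information, which is closely related to the information gain criterion mentioned above; scikit-learn and the synthetic data are assumptions, and, as noted, a high single-feature score does not guarantee usefulness within a feature set.

```python
# Feature relevance ranking by mutual information on synthetic data.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=300)                      # 4 emotion classes
X = rng.normal(size=(300, 20))
X[:, 3] += y                                          # make feature 3 informative

scores = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]
print("most relevant features:", ranking[:5])
```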
5.3 Classification
After the feature calculation, each input unit is represented by a feature vec-
tor, and the problem of emotion recognition can now be considered a general
data mining problem. So, in principle, any statistical classifier that can deal
with high-dimensional data can be used. However, the literature on emotional speech recognition is dominated by static classifiers such as support vector machines, neural networks and decision trees for global statistics features, and by HMMs as a dynamic modelling technique for short-term features. All these classifiers need training data to learn their parameters.
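A minimal sketch of static classification with an SVM on global statistics features follows; the data is synthetic and the pipeline (scaling, RBF kernel, plain cross-validation) is an illustrative assumption rather than a recipe from the literature reviewed here. In practice, speaker-independent splits would be preferable.

```python
# Static classification: one feature vector per emotion unit, SVM classifier.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=240)                 # 4 emotion classes
X = rng.normal(size=(240, 30)) + y[:, None] * 0.3

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("mean accuracy: %.2f" % scores.mean())
```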
Static classification has been more prevalent than dynamic classification in the
work on emotion recognition. It has proved successful for acted data, but for more natural data useful recognition accuracy is only achieved for problems with very few emotion classes. Recent approaches try to enhance the recognition accuracy by multi-layered classification, e.g. by classifying two groups of the target emotion classes in several steps and repeatedly splitting the "winning" group in two, as in the cascade-bisection process [35], or by automatically separating male and female voices before the actual emotion classification [36].
Dynamic classification with HMMs is used less often than static classification,
but is thought to be advantageous for better capturing the temporal activity in-
corporated in speech. So far, HMMs have almost exclusively been applied to
acted data, though they might be even better suited for natural emotions. An HMM is a stochastic finite automaton in which each state models some characteristics of the input signal and in which the probability of passing to the next state depends only on the previous state (cf. Fig. 4). To use HMMs for speech emotion recognition, usually a single HMM is trained for each emotion and an
unknown sample is classified according to the model which describes the derived
feature sequence best. Besides the use of appropriate speech features, the architecture of the HMM has a major influence on its ability to capture the emotional cues that help to distinguish between different emotions. In [37] we examined the three parameters that determine the model topology: the number of states, the connectivity, and the output probabilities (discrete or continuous, and the number of mixtures). Finding general tendencies turned out to be rather difficult, since quite different parameter settings sometimes yielded the same results, whereas a slight parameter change sometimes caused very different performance. Nevertheless, we could conclude that a medium number of 5 to 10 states per model is most often successful, and that continuous probability densities with a low number of mixtures work best for the output probabilities. With respect to the connectivity of the states, we found that high connectivity is not necessarily more suitable. The results also showed that the network design seems to be relatively independent of the source of speech (acted vs. spontaneous) and of the segmentation level (word vs. utterance).
Fig. 4. Some HMM topologies that are suitable for emotion recognition
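The following sketch illustrates the one-HMM-per-emotion scheme described above using the third-party hmmlearn package (an assumption; any HMM toolkit would serve). The sequences are random placeholders for frame-level short-term features such as MFCCs; no topology constraint is imposed here (an ergodic model), although a left-to-right topology could be enforced via the transition matrix.

```python
# Train one Gaussian HMM per emotion and classify by maximum log-likelihood.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)

def random_sequences(n_seq, offset):
    """Placeholder for per-frame feature sequences (e.g. 12 MFCCs per frame)."""
    return [rng.normal(loc=offset, size=(rng.integers(40, 80), 12)) for _ in range(n_seq)]

train = {"anger": random_sequences(20, 1.0), "neutral": random_sequences(20, -1.0)}

models = {}
for emotion, seqs in train.items():
    X = np.vstack(seqs)
    lengths = [len(s) for s in seqs]
    m = GaussianHMM(n_components=5, covariance_type="diag", n_iter=20, random_state=0)
    m.fit(X, lengths)                       # one model per emotion
    models[emotion] = m

test = rng.normal(loc=1.0, size=(60, 12))   # unknown utterance (frame-level features)
print(max(models, key=lambda e: models[e].score(test)))   # best-scoring emotion model
```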
In Fig. 5, a user is engaged in a dialogue with the Greta agent from Pelachaud
and colleagues [46]. The user is reporting on a positive event that happened to
her to which Greta responds with a cheerful facial expression.
7 Conclusion
Acknowledgements
This work was partially supported by the European Community (EC) within the
network of excellence Humaine IST-507422, the eCIRCUS project IST-4-027656-
STP and the Callas project IST-034800. The authors are solely responsible for
the content of this publication. It does not represent the opinion of the EC, and
the EC is not responsible for any use that might be made of data appearing
therein.
References
1. Murray, I., Arnott, J.: Toward the simulation of emotion in synthetic speech: A
review of the literature on human vocal emotion. Journal of the Acoustical Society
of America 93(2), 1097–1108 (1993)
2. Wilting, J., Krahmer, E., Swerts, M.: Real vs. acted emotional speech. In: Pro-
ceedings of Interspeech 2006 — ICSLP, Pittsburgh, PA, USA (2006)
3. Velten, E.: A laboratory task for induction of mood states. Behavior Research &
Therapy 6, 473–482 (1968)
4. Dellaert, F., Polzin, T., Waibel, A.: Recognizing emotion in speech. In: Proceedings
of ICSLP, Philadelphia, USA (1996)
5. Devillers, L., Vidrascu, L., Lamel, L.: Challenges in real-life emotion annotation
and machine learning based detection. Neural Networks 18(4), 407–422 (2005)
6. Litman, D.J., Forbes-Riley, K.: Predicting student emotions in computer-human
tutoring dialogues. In: Proceedings of the 42nd Annual Meeting of the Association
for Computational Linguistics (ACL), Barcelona, Spain (2004)
7. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B.: A database
of German emotional speech. In: Proceedings of Interspeech 2005, Lisbon, Portugal
(2005)
8. Engberg, I.S., Hansen, A.V.: Documentation of the Danish Emotional Speech
Database (DES). Technical report. Aalborg University, Aalborg, Denmark (1996)
9. Schiel, F., Steininger, S., Türk, U.: The SmartKom multimodal corpus at BAS.
In: Proceedings of the 3rd Language Resources & Evaluation Conference (LREC)
2002, Las Palmas, Gran Canaria, Spain, pp. 200–206 (2002)
10. Batliner, A., Hacker, C., Steidl, S., Nöth, E., D’Arcy, S., Russell, M., Wong, M.:
“You stupid tin box” - children interacting with the AIBO robot: A cross-linguistic
emotional speech corpus. In: Proceedings of the 4th International Conference of
Language Resources and Evaluation LREC 2004, Lisbon, pp. 171–174 (2004)
11. Tato, R., Santos, R., Kompe, R., Pardo, J.M.: Emotional space improves emo-
tion recognition. In: Proceedings International Conference on Spoken Language
Processing, Denver, pp. 2029–2032 (2002)
12. Yu, C., Aoki, P.M., Woodruff, A.: Detecting user engagement in everyday conver-
sations. In: Proceedings of Interspeech 2004 — ICSLP, Jeju, Korea, pp. 1329–1332
(2004)
13. Grimm, M., Kroschel, K., Harris, H., Nass, C., Schuller, B., Rigoll, G.,
Moosmayr, T.: On the necessity and feasibility of detecting a driver‘s emotional
state while driving. In: International Conference on Affective Computing and In-
telligent Interaction, Lisbon, Portugal, pp. 126–138 (2007)
14. Kollias, S.: ERMIS — Emotionally Rich Man-machine Intelligent System (2002), retrieved: 09.02.2007, http://www.image.ntua.gr/ermis/
15. Batliner, A., Steidl, S., Schuller, B., Seppi, D., Laskowski, K., Vogt, T., Devillers,
L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V.: Combining efforts for im-
proving automatic classification of emotional user states. In: IS-LTC 2006, Ljubl-
jana, Slovenia (2006)
16. Batliner, A., Fischer, K., Huber, R., Spilker, J., Nöth, E.: How to find trouble in
communication. Speech Communication 40, 117–143 (2003)
17. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1998)
18. Madan, A.: Jerk-O-Meter: Speech-Feature Analysis Provides Feedback on Your
Phone Interactions (2005), retrieved: 28.06.2007,
http://www.media.mit.edu/press/jerk-o-meter/
19. Burkhardt, F., van Ballegooy, M., Englert, R., Huber, R.: An emotion-aware voice
portal. In: Electronic Speech Signal Processing Conference, Prague, Czech Republic
(2005)
20. Riccardi, G., Hakkani-Tür, D.: Grounding emotions in human-machine conversa-
tional systems. In: Proceedings of Intelligent Technologies for Interactive Enter-
tainment, INTETAIN, Madonna di Campiglio, Italy (2005)
21. Ai, H., Litman, D.J., Forbes-Riley, K., Rotaru, M., Tetreault, J., Purandare, A.:
Using system and user performance features to improve emotion detection in spo-
ken tutoring dialogs. In: Proceedings of Interspeech 2006 — ICSLP, Pittsburgh,
PA, USA (2006)
22. Jones, C., Jonsson, I.: Using Paralinguistic Cues in Speech to Recognise Emotions
in Older Car Drivers. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-
Computer Interaction. LNCS, vol. 4868. Springer, Heidelberg (2008)
23. Schuller, B., Rigoll, G., Grimm, M., Kroschel, K., Moosmayr, T., Ruske, G.: Effects
of in-car noise-conditions on the recognition of emotion within speech. In: Proc. of
the DAGA 2007, Stuttgart, Germany (2007)
24. Jones, C., Sutherland, J.: Acoustic Emotion Recognition for Affective Computer
Gaming. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-Computer
Interaction. LNCS, vol. 4868. Springer, Heidelberg (2008)
25. Jones, C., Deeming, A.: Affective Human-Robotic Interaction. In: Peter, C., Beale,
R. (eds.) Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868.
Springer, Heidelberg (2008)
26. Hegel, F., Spexard, T., Vogt, T., Horstmann, G., Wrede, B.: Playing a different
imitation game: Interaction with an empathic android robot. In: Proc. 2006 IEEE-
RAS International Conference on Humanoid Robots (Humanoids 2006) (2006)
27. Vogt, T., André, E.: Comparing feature sets for acted and spontaneous speech in
view of automatic emotion recognition. In: Proceedings of International Conference
on Multimedia & Expo., Amsterdam, The Netherlands (2005)
28. Fernandez, R., Picard, R.W.: Classical and novel discriminant features for affect
recognition from speech. In: Proceedings of Interspeech 2005, Lisbon, Portugal
(2005)
29. Oudeyer, P.Y.: The production and recognition of emotions in speech: features
and algorithms. International Journal of Human-Computer Studies 59(1–2), 157–
183 (2003)
30. Schuller, B., Müller, R., Lang, M., Rigoll, G.: Speaker independent emotion recog-
nition by early fusion of acoustic and linguistic features within ensembles. In: Pro-
ceedings of Interspeech 2005, Lisbon, Portugal (2005)
31. Nicholas, G., Rotaru, M., Litman, D.J.: Exploiting word-level features for emotion
recognition. In: Proceedings of the IEEE/ACL Workshop on Spoken Language
Technology, Aruba (2006)
32. Batliner, A., Zeißler, V., Frank, C., Adelhardt, J., Shi, R.P., Nöth, E.: We are not
amused - but how do you know? User states in a multi-modal dialogue system. In:
Proceedings of Eurospeech 2003, Geneva, Switzerland, pp. 733–736 (2003)
33. Kwon, O.W., Chan, K., Hao, J., Lee, T.W.: Emotion recognition by speech signals.
In: Proceedings of Eurospeech 2003, Geneva, Switzerland, pp. 125–128 (2003)
34. Lee, C.M., Narayanan, S.S.: Toward detecting emotions in spoken dialogs. IEEE
Transaction on speech and audio processing 13(2), 293–303 (2005)
35. Zhang, S., Ching, P.C., Kong, F.: Automatic emotion recognition of speech signal in Mandarin. In: Proceedings of Interspeech 2006 — ICSLP, Pittsburgh, PA, USA (2006)
36. Vogt, T., André, E.: Improving automatic emotion recognition from speech via
gender differentiation. In: Proc. Language Resources and Evaluation Conference
(LREC 2006), Genoa (2006)
37. Wagner, J., Vogt, T., André, E.: A systematic comparison of different hmm de-
signs for emotion recognition from acted and spontaneous speech. In: International
Conference on Affective Computing and Intelligent Interaction (ACII), Lisbon,
Portugal, pp. 114–125 (2007)
38. Nwe, T.L., Foo, S.W., De Silva, L.C.: Speech emotion recognition using hidden
markov models. Speech Communication 41, 603–623 (2003)
39. Petrushin, V.A.: Creating emotion recognition agents for speech signal. In: Daut-
enhahn, K., Bond, A.H., Canamero, L., Edmonds, B. (eds.) Socially Intelligent
Agents. Creating Relationships with Computers and Robots, pp. 77–84. Kluwer
Academic Publishers, Dordrecht (2002)
40. Scherer, K.R., Banse, R., Wallbott, H.G., Goldbeck, T.: Vocal cues in emotion
encoding and decoding. Motivation and Emotion 15, 123–148 (1991)
41. Polzin, T.S., Waibel, A.H.: Detecting emotions in speech. In: Proceedings of Co-
operative Multimodal Communications, Tilburg, The Netherlands (1998)
42. Polzin, T.S., Waibel, A.H.: Emotion-sensitive human-computer interfaces. In:
Workshop on Speech and Emotion, Newcastle, Northern Ireland, UK, pp. 201–206
(2000)
43. Lee, C.M., Yildirim, S., Bulut, M., Kazemzadeh, A.: Emotion recognition based
on phoneme classes. In: Proceedings of Interspeech 2004 — ICSLP, Jeju, Korea
(2004)
44. Nogueiras, A., Moreno, A., Bonafonte, A., Mariño, J.B.: Speech emotion recognition
using hidden markov models. In: Proceedings of Eurospeech, Aalborg, Denmark
(2001)
45. Gratch, J., Okhmatovskaia, A., Lamothe, F., Marsella, S., Morales, M., van der
Werf, R.J., Morency, L.P.: Virtual rapport. In: 6th International Conference on
Intelligent Virtual Agents, Marina del Rey, USA (2006)
46. de Rosis, F., Pelachaud, C., Poggi, I., Carofiglio, V., de Carolis, B.: From Greta’s
mind to her face: modelling the dynamics of affective states in a conversational
embodied agent. International Journal of Human-Computer Studies 59, 81–118
(2003)
Emotion Recognition through Multiple Modalities:
Face, Body Gesture, Speech
1 Introduction
A challenging research issue, and one of growing importance to those working in human-computer interaction, is to endow a machine with emotional intelligence. Such a system must be able to create an affective interaction with users: it must have the ability to perceive, interpret, express and regulate emotions [1]. Recognising users' emotional state is thus one of the main requirements for computers to interact successfully with humans [2]. Many works in affective computing do not combine different modalities into a single system for the analysis of human emotional behaviour: different channels of information (mainly facial expressions and speech) are considered independently of each other. Further, there have been only a few attempts to consider the integration of information from body movement and
*
Ginevra Castellano is now at the Department of Computer Science, Queen Mary, University
of London, UK ([email protected]).
gestures. Nevertheless, Sebe et al. [3] and Pantic et al. [4] highlight that an ideal sys-
tem for automatic analysis and recognition of human affective information should be
multimodal, as the human sensory system is. Moreover, studies from psychology
highlight the need to consider the integration of different behaviour modalities in
human-human communication [5] [6].
In this paper we present a multimodal approach for the recognition of eight acted
emotional states (anger, despair, interest, pleasure, sadness, irritation, joy and pride).
Our approach integrates information from facial expressions, body movement and
gestures and speech: we trained and tested a model with a Bayesian classifier, using a
multimodal corpus with ten subjects collected during the Third Summer School of the
HUMAINE EU-IST project, held in Genova in September 2006. In the following
sections we describe the approach by focusing on the analysis performed for each of
the three modalities considered in this work. We then compare different strategies to
perform the data fusion for the multimodal emotion recognition.
2 Related Work
Emotion recognition has been investigated with three main types of databases: acted
emotions, natural spontaneous emotions and elicited emotions. The best results are
generally obtained with acted emotion databases because they contain strong emo-
tional expressions. Literature on speech (see for example Banse and Scherer [7])
shows that the majority of studies have been conducted with emotional acted speech.
Feature sets for acted and spontaneous speech have recently been compared by [8].
Generally, few acted-emotion speech databases have included speakers with several
different native languages. More recently, some attempts to collect multimodal data
were made: some examples of multimodal databases can be found in [9] [10] [11].
In the area of unimodal emotion recognition, there have been many studies using
different, but single, modalities. Facial expressions [12] [13], vocal features [14] [15],
body movements and postures [16] [17] [18], physiological signals [19] have been
used as inputs during these attempts, while multimodal emotion recognition is cur-
rently gaining ground [20] [21] [22]. Nevertheless, most of this work has considered the integration of information from facial expressions and speech, and there have been relatively few attempts to combine information from body movement and gestures in a multimodal framework. Gunes and Piccardi [23], for example, fused facial expression and body gesture information at different levels for bimodal emotion recognition. Further, el Kaliouby and Robinson [24] proposed a vision-based computational
model to infer acted mental states from head movements and facial expressions.
A wide variety of machine learning techniques have been used in emotion recognition
approaches [2] [12]. Especially in the multimodal case [4], they all employ a large num-
ber of audio, visual or physiological features, a fact which usually impedes the training
process; therefore, it is necessary to find a way to reduce the number of utilised features
by picking out only those related to emotion. One possibility in this direction is to use
neural networks, since they enable us to pinpoint the most relevant features with respect
to the output, usually by observing their weights. An interesting work in this area is
the sensitivity analysis approach by Engelbrecht et al. [25]. Sebe et al. [3] highlight
that probabilistic graphical models, such as Hidden Markov Models, Bayesian networks
and Dynamic Bayesian networks are very well suited for fusing different sources of
94 G. Castellano, L. Kessous, and G. Caridakis
information in multimodal emotion recognition, and can also handle noisy features and missing feature values through probabilistic inference.
In this work we combine a wrapper feature selection approach and a Bayesian clas-
sifier. The former reduces the number of features and the latter was used both for
unimodal and multimodal emotion recognition.
We chose such a setup because the resolution required for the extraction of facial fea-
tures is much larger than the one for body movement detection or hand gesture tracking.
This could only be achieved if one camera zoomed in on the actor’s face. We adopted
some restrictions concerning the actor’s behaviour and clothing. Long sleeves and a
covered neck were preferred since the majority of the hand and head detection algorithms
are based on colour tracking. Further, a uniform background was used to make the back-
ground subtraction process easier. As for the facial features extraction process we consid-
ered some prerequisites such as the lack of eyeglasses, beards, and moustaches.
For the voice recordings we used a direct-to-disk computer-based system. The
speech samples were directly recorded on the hard disk of the computer using sound
editing software. We used an external sound card connected to the computer by IEEE
1394 High Speed Serial Bus (also known as FireWire or i.Link). A microphone
mounted on the actors’ shirt was connected to an HF emitter (wireless system emitter)
and the receiver was connected to the sound card using a XLR connector (balanced
audio connector for high quality microphones and connections between equipments).
The external sound card included a preamplifier (for two XLR inputs) that was used to adjust the input gain and to limit the impact of the recording system on the signal-to-noise ratio. The sampling rate of the recording was 44.1 kHz and the
quantization was 16 bit, mono.
3.2 Procedure
Participants were asked to act eight emotional states: anger, despair, interest, pleasure,
sadness, irritation, joy and pride, equally distributed in the space valence-arousal (see
Table 1).
During the recording process one of the authors had the role of director guiding the
actors through the process. Participants were asked to perform specific gestures that
exemplify each emotion. The director’s role was to instruct the subject on the proce-
dure (number of gestures’ repetitions, emotion sequence, etc.) and details of each
emotion and emotion-specific gesture. For example, for the despair emotion the sub-
ject was given a brief description of the emotion (e.g. “facing an existential problem
without solution, coupled with a refusal to accept the situation"), and if the subject required more details, he or she was given an example of a situation in which the specific emotion was present. All instructions were provided based on the procedure used
during the collection of the GEMEP corpus [10]. For selecting the emotion-specific
gestures we have borrowed ideas from a figure animation research area dealing with
posturing of a figure [26] and came up with the gestures shown in Table 1.
consonants (also known as plosives or stop-plosives) /t/ and /k/ and two identical vowels
/o/. This was done in order to allow the study of certain acoustic correlates. Then
'damato ma gali sa' is supposed to mean something like 'can you open it'. The word 'it'
could correspond to a folder, a file, a box, a door or whatever.
Each emotion was acted three times by each actor, so that we collected 240 posed
gestures, facial expressions and speech samples.
4 Feature Extraction
As a first step, the face was located, so that approximate facial feature locations could be
estimated from the head position and rotation. The face was segmented focusing on
the following facial areas: left eye/eyebrow, right eye/eyebrow, nose and mouth. Each
of those areas, called feature-candidate areas, contains the features whose boundaries
need to be extracted for our purposes. Inside the corresponding feature-candidate
areas precise feature extraction was performed for each facial feature, i.e. eyes, eye-
brows, mouth and nose, using a multi-cue approach, generating a small number of
intermediate feature masks. Feature masks generated for each facial feature were
fused together to produce the final mask for that feature. The mask fusion process
uses anthropometric criteria [27] to perform validation and weight assignment on each
intermediate mask; all the feature’s weighted masks are then fused to produce a final
mask along with confidence level estimation.
Since this procedure essentially locates and tracks points in the facial area, we chose
to work with MPEG-4 FAPs (Facial Animation Parameters) and not Action Units
(AUs), since the former are explicitly defined to measure the deformation of these fea-
ture points. Measurement of FAPs requires the availability of a frame where the sub-
ject’s expression is found to be neutral. This frame is called the neutral frame and is
manually selected from video sequences to be analysed or interactively provided to the
system when initially brought into a specific user’s ownership. The final feature masks
were used to extract 19 Feature Points (FPs) [28]; Feature Points obtained from each
frame were compared to FPs obtained from the neutral frame to estimate facial defor-
mations and produce the FAPs. Confidence levels on FAP estimation were derived from
the equivalent feature point confidence levels. The FAPs were used along with their
confidence levels to provide the facial expression estimation.
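A much simplified sketch of this idea follows; the point indices, the inter-ocular normalisation and the three-point example are assumptions for illustration and make no claim to follow the MPEG-4 FAP definitions exactly.

```python
# Facial deformation as displacement of tracked feature points relative to a
# neutral frame, normalised by an assumed face-size unit (inter-ocular distance).
import numpy as np

def fap_like_features(frame_points, neutral_points, eye_left_idx=0, eye_right_idx=1):
    """frame_points, neutral_points: (n_points, 2) arrays of (x, y) coordinates."""
    frame_points = np.asarray(frame_points, float)
    neutral_points = np.asarray(neutral_points, float)
    unit = np.linalg.norm(neutral_points[eye_right_idx] - neutral_points[eye_left_idx])
    return (frame_points - neutral_points) / unit      # per-point displacement

neutral = [[100, 120], [160, 120], [130, 180]]         # two eyes and a mouth corner
smiling = [[100, 120], [160, 120], [136, 174]]
print(fap_like_features(smiling, neutral))
```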
In accordance with the other modalities, facial features needed to be processed so as
to have one vector of values per sentence. FAPs originally correspond to every frame
in the sentence. A way to capture the temporal evolution of the FAP values was to calculate a set of statistical features over these values and their derivatives. The whole process was inspired by the equivalent process performed on the acoustic features.
Tracking of body and hands of the subjects was done using the EyesWeb plat-
form [29]. Starting from the silhouette and the hands blobs of the actors, we extracted
five main expressive motion cues, using the EyesWeb Expressive Gesture Processing
Library [30]: quantity of motion and contraction index of the body, velocity, acceleration
and fluidity of the hand's barycenter. Data were normalised according to the behaviour shown by each actor, using the maximum and minimum values of each motion cue for that actor, so that data from all the subjects could be compared (see the sketch below).
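A small sketch of this per-actor normalisation (the data layout is an assumption): each motion cue is rescaled to [0, 1] using the minimum and maximum observed for that actor.

```python
# Per-actor min-max normalisation of motion cues.
import numpy as np

def normalise_per_actor(values, actor_ids):
    """values: (n_samples, n_cues) array; actor_ids: (n_samples,) array of actor labels."""
    values = np.asarray(values, float)
    out = np.empty_like(values)
    for actor in np.unique(actor_ids):
        rows = actor_ids == actor
        lo, hi = values[rows].min(axis=0), values[rows].max(axis=0)
        out[rows] = (values[rows] - lo) / np.where(hi > lo, hi - lo, 1.0)
    return out

cues = np.array([[0.2, 5.0], [0.8, 9.0], [0.1, 2.0], [0.5, 4.0]])
actors = np.array([1, 1, 2, 2])
print(normalise_per_actor(cues, actors))
```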
Automatic extraction yields time series of the selected motion cues, sampled at the video frame rate. For each motion cue profile we then selected a subset of features describing the dynamics of the cue over time.
Based on the model proposed in [31] we extracted the following dynamic indicators
of the motion cues temporal profile: initial and final slope, initial and final slope of
the main peak, maximum value, ratio between the maximum value and the duration of
the main peak, mean value, ratio between the mean and the maximum value, ratio
between the absolute maximum and the biggest following relative maximum, centroid
of energy, distance between maximum value and centroid of energy, symmetry index,
shift index of the main peak, number of peaks, number of peaks preceding the main
one, and ratio between the main peak duration and the whole profile duration. This process was carried out for each motion cue of all the videos in the corpus, so that each gesture is characterised by a set of 80 motion features; a few of these indicators are sketched below.
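The sketch below computes a handful of these indicators for a single motion cue profile. It is a simplified illustration: the peak definition, the assumed frame rate and the chosen subset are assumptions, not the 80-feature set used in the study.

```python
# A few dynamic indicators of a 1-D motion cue profile (e.g. hand velocity).
import numpy as np

def profile_indicators(profile, fps=25.0):
    x = np.asarray(profile, float)
    t = np.arange(len(x)) / fps
    peaks = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])) + 1  # local maxima
    centroid = float((t * x).sum() / x.sum())                # centroid of energy
    i_max = int(x.argmax())
    return {
        "max": float(x.max()),
        "mean": float(x.mean()),
        "mean_over_max": float(x.mean() / x.max()),
        "initial_slope": float((x[1] - x[0]) * fps),
        "final_slope": float((x[-1] - x[-2]) * fps),
        "n_peaks": int(len(peaks)),
        "max_to_centroid": float(abs(t[i_max] - centroid)),
    }

velocity = [0.0, 0.2, 0.6, 1.0, 0.7, 0.3, 0.4, 0.2, 0.1, 0.0]
print(profile_indicators(velocity))
```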
Fig. 2. Overview of the recognition framework: a separate Bayesian classifier for each modality's data (face, gesture, speech), whose outputs are combined by decision-level fusion, and a single Bayesian classifier over the joint feature set for feature-level fusion
As shown in the left part of the diagram, a separate Bayesian classifier was used for each modality (face, gestures, speech). All sets of data were normalised. Feature discretisation based on Kononenko's MDL (minimum description length) criterion [34] was applied to reduce the learning complexity. A wrapper approach to feature subset selection (which evaluates attribute sets by means of a learning scheme) was used to reduce the number of inputs to the classifiers and to find the features that maximise classifier performance; a best-first search in the forward direction was used. Further, all systems were trained and tested on the corpus using cross-validation.
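The following sketch is in the spirit of this unimodal pipeline but is not the authors' implementation: scikit-learn's greedy forward selection stands in for the best-first search, no MDL-based discretisation is applied, and the data is synthetic.

```python
# Naive Bayes with wrapper-style forward feature selection and cross-validation.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
y = rng.integers(0, 8, size=240)                  # 8 emotion classes, 240 samples
X = rng.normal(size=(240, 40))
X[:, :5] += y[:, None] * 0.4                      # only the first 5 features are useful

selector = SequentialFeatureSelector(GaussianNB(), n_features_to_select=5,
                                     direction="forward", cv=5)
model = make_pipeline(selector, GaussianNB())
print("accuracy: %.2f" % cross_val_score(model, X, y, cv=5).mean())
```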
To fuse facial expression, gesture and speech information, two different approaches were implemented (right of Figure 2): feature-level fusion, where a single classifier is trained on the features of all three modalities, and decision-level fusion, where a separate classifier is used for each modality and the outputs are combined a posteriori. In the second approach the output was computed by combining the posterior probabilities of the unimodal systems. We experimented with two decision-level fusion strategies. The first consisted of selecting the emotion that received the highest probability across the three modalities (best probability approach). The second (majority voting plus best probability) consisted of selecting the emotion that received the majority of 'votes' from the three modalities; if no majority could be established (for example when each unimodal system outputs a different emotion), the emotion that received the highest probability across the three modalities was selected.
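The two decision-level rules can be sketched as follows, operating on the per-class posterior probabilities produced by the three unimodal classifiers; the probability values shown are illustrative only.

```python
# Decision-level fusion: best probability, and majority voting with fallback.
import numpy as np

def best_probability(posteriors):
    """posteriors: list of (n_classes,) arrays, one per modality."""
    stacked = np.vstack(posteriors)
    return int(np.unravel_index(stacked.argmax(), stacked.shape)[1])

def majority_then_best(posteriors):
    votes = [int(p.argmax()) for p in posteriors]
    counts = np.bincount(votes)
    if counts.max() > 1:                       # at least two modalities agree
        return int(counts.argmax())
    return best_probability(posteriors)        # otherwise fall back to best probability

face    = np.array([0.10, 0.60, 0.30])
gesture = np.array([0.20, 0.30, 0.50])
speech  = np.array([0.05, 0.15, 0.80])
print(best_probability([face, gesture, speech]))     # class with the single highest posterior
print(majority_then_best([face, gesture, speech]))
```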
6 Results
6.1 Emotion Recognition from Facial Expressions
Table 2 shows the confusion matrix of the emotion recognition system based on facial
expressions. The overall performance of this classifier was 48.3%. The best recognised emotions were anger (56.67%) and irritation, joy and pleasure (53.33% each). Pride is partly misclassified as pleasure (20%), while sadness is partly misclassified as irritation (20%), an emotion in the same valence-arousal quadrant.
Table 2. Confusion matrix of the emotion recognition system based on facial expressions
a b c d e f g h
56.67 3.33 3.33 10 6.67 10 6.67 3.33 a Anger
10 40 13.33 10 0 13.33 3.33 10 b Despair
6.67 3.33 50 6.67 6.67 10 16.67 0 c Interest
10 6.67 10 53.33 3.33 6.67 3.33 6.67 d Irritation
3.33 0 13.33 16.67 53.33 10 0 3.33 e Joy
6.67 13.33 6.67 0 6.67 53.33 13.33 0 f Pleasure
6.67 3.33 16.67 6.67 13.33 20 33.33 0 g Pride
3.33 6.67 3.33 20 0 13.33 6.67 46.67 h Sadness
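As a quick sanity check of how the overall rate relates to these matrices (an observation, not part of the original text): with equally many test samples per class, the overall recognition rate is simply the mean of the diagonal of the row-normalised confusion matrix.

```python
# Diagonal of Table 2 (facial expressions); equal class sizes assumed.
import numpy as np

diagonal = np.array([56.67, 40, 50, 53.33, 53.33, 53.33, 33.33, 46.67])
print("overall accuracy: %.1f%%" % diagonal.mean())   # ~48.3%
```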
Table 3 shows the confusion matrix of the emotion recognition system based on body movement and gestures. The overall performance of this classifier was 67.1%. Anger and pride are recognised with very high accuracy (80% and 96.67%, respectively). Sadness was partly misclassified as pride (36.67%).
a b c d e f g h
80 10 0 3.33 0 0 6.67 0 a Anger
3.33 56.67 6.67 0 0 0 26.67 6.67 b Despair
3.33 0 56.67 0 6.67 6.67 26.67 0 c Interest
0 10 0 63.33 0 0 26.67 0 d Irritation
0 10 0 6.67 60 0 23.33 0 e Joy
0 6.67 3.33 0 0 66.67 23.33 0 f Pleasure
0 0 0 3.33 0 0 96.67 0 g Pride
0 3.33 0 3.33 0 0 36.67 56.67 h Sadness
Table 4 displays the confusion matrix of the emotion recognition system based on
speech. The overall performance of this classifier was 57.1%. Anger and sadness are classified with high accuracy (93.33% and 76.67%, respectively). Despair obtained a very low recognition rate and was mainly confused with pleasure (23.33%).
a b c d e f g h
93.33 0 3.33 3.33 0 0 0 0 a Anger
10 23.33 16.67 6.67 3.33 23.33 3.33 13.33 b Despair
6.67 0 60 10 0 16.67 3.33 3.33 c Interest
13.33 3.33 10 50 3.33 3.33 13.33 3.33 d Irritation
20 0 10 13.33 43.33 10 3.33 0 e Joy
3.33 6.67 6.67 6.67 0 53.33 6.67 16.67 f Pleasure
3.33 10 3.33 13.33 0 13.33 56.67 0 g Pride
0 6.67 3.33 10 0 3.33 0 76.67 h Sadness
Table 5 displays the confusion matrix of the multimodal emotion recognition system.
The overall performance of this classifier was 78.3%, which is much higher than the
performance obtained by the most successful unimodal system, the one based on
gestures. The diagonal components reveal that all the emotions, apart from despair,
can be recognised with over 70% accuracy. Anger was the emotion recognised with
highest accuracy, as in all the unimodal systems.
a b c d e f g h
90 0 0 0 10 0 0 0 a Anger
0 53.33 3.33 16.67 6.67 0 10 10 b Despair
6.67 0 73.33 13.33 0 3.33 3.33 0 c Interest
0 6.67 0 76.67 6.67 3.33 0 6.67 d Irritation
0 0 0 0 93.33 0 6.67 0 e Joy
0 3.33 3.33 13.33 3.33 70 6.67 0 f Pleasure
3.33 3.33 0 3.33 0 0 86.67 3.33 g Pride
0 0 0 16.67 0 0 0 83.33 h Sadness
The approach based on decision-level fusion obtained lower recognition rates than
that based on feature-level fusion. The performance of the classifier was 74.6%, both
for the best probability and for the majority voting plus best probability approaches.
Table 6 shows the performance of the system with decision level integration using
the best probability approach. Anger was again the emotion recognised with the highest accuracy, but the recognition rates of most emotions decreased with respect to the feature-level integration.
a b c d e f g h
96.67 0 0 0 0 0 3.33 0 a Anger
13.33 53.33 6.67 0 0 3.33 13.33 10 b Despair
3.33 0 60 3.33 10 13.33 6.67 3.33 c Interest
13.33 6.67 6.67 60 0 3.33 0 10 d Irritation
0 0 10 3.33 86.67 0 0 0 e Joy
6.67 3.33 0 0 0 80 6.67 3.33 f Pleasure
3.33 0 6.67 0 0 10 80 0 g Pride
3.33 3.33 0 10 0 3.33 0 80 h Sadness
We tested our approach on a dataset of 240 samples for each modality (face, body,
speech). Considering the performances of the unimodal emotion recognition systems,
the one based on gestures appears to be the most successful, followed by the one
based on speech and then the one based on facial expressions. We note that in this
study we used emotion-specific gestures: these are gestures that are selected so as to
express each specific emotion. An alternative approach, which may also be of interest, would be to recognise emotions from different expressivities of the same gesture (one not necessarily associated with any specific emotion) performed under different emotional conditions. This would allow a better comparison with contemporary systems
based on facial expressions and speech and will be considered in our future work.
Fusing multimodal data greatly improved the recognition rates in comparison with the
unimodal systems: the multimodal approach gave an improvement of more than 10%
compared to the performance of the system based on gestures. Further, the fusion
performed at the feature level showed better performances than the one performed at
the decision-level, highlighting the processing of input data in a joint feature space as
the most successful approach in this case.
We can conclude that using three different modalities greatly increases the recognition performance of an automatic emotion recognition system. This is also helpful when feature values from some modalities are missing. Moreover, humans use more than one modality to recognise emotions and process signals in a complementary manner, so an automatic system can be expected to show similar behaviour. This study considered a restricted set of data, collected from a relatively
small group of subjects. Nevertheless, it represents a first attempt to fuse together
three different synchronised modalities, which is still uncommon in current research.
Future work will consider new multimodal recordings with a larger and more repre-
sentative set of subjects, as well as the investigation of the mutual relationship be-
tween audio-visual information.
Acknowledgments. The research work has been realised in the framework of the
EU-IST Project HUMAINE (Human-Machine Interaction Network on Emotion), a
Network of Excellence (NoE) in the EU 6th Framework Programme (2004–2007).
References
1. Picard, R.: Affective computing. MIT Press, Boston (1997)
2. Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor,
J.G.: Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine
(January 2001)
3. Sebe, N., Cohen, I., Huang, T.S.: Multimodal Emotion Recognition. Handbook of Pattern
Recognition and Computer Vision. World Scientific, Singapore (2005)
4. Pantic, M., Sebe, N., Cohn, J., Huang, T.S.: Affective Multimodal Human-Computer Interac-
tion. In: ACM Multimedia, Singapore, pp. 669–676 (November 2005)
5. Scherer, K.R., Wallbott, H.G.: Analysis of Nonverbal Behavior. In: Handbook Of Discourse:
Analysis, ch.11, vol. 2. Academic Press, London (1985)
6. Scherer, K.R., Ellgring, H.: Multimodal Expression of Emotion: Affect Programs or Compo-
nential Appraisal Patterns? Emotion 7(1) (2007)
7. Banse, R., Scherer, K.R.: Acoustic Profiles in Vocal Emotion Expression. Journal of Person-
ality and Social Psychology, 614–636 (1996)
8. Vogt, T., André, E.: Comparing feature sets for acted and spontaneous speech in view of
automatic emotion recognition. In: IEEE International Conference on Multimedia & Expo
(ICME 2005) (2005)
9. Gunes, H., Piccardi, M.: A Bimodal Face and Body Gesture Database for Automatic Analy-
sis of Human Nonverbal Affective Behavior. In: Proc. of ICPR 2006 the 18th International
Conference on Pattern Recognition, Hong Kong, China, August 20–24 (2006)
10. Bänziger, T., Pirker, H., Scherer, K.: Gemep - geneva multimodal emotion portrayals: a cor-
pus for the study of multimodal emotional expressions. In: Deviller, L., et al. (eds.) Proceed-
ings of LREC 2006 Workshop on Corpora for Research on Emotion and Affect, Genoa.
Italy, pp. 15–19 (2006)
11. Douglas-Cowie, E., Campbell, N., Cowie, R., Roach, P.: Emotional speech: towards a new
generation of databases. Speech Communication 40, 33–60 (2003)
12. Pantic, M., Rothkrantz, L.J.M.: Automatic analysis of facial expressions: The state of the art.
IEEE Trans. on Pattern Analysis and Machine Intelligence 22(12), 1424–1445 (2000)
13. Ioannou, S., Raouzaiou, A., Tzouvaras, V., Mailis, T., Karpouzis, K., Kollias, S.: Emotion
recognition through facial expression analysis based on a neurofuzzy network. Neural Net-
works 18(4), 423–435 (2005)
14. Cowie, R., Douglas-Cowie, E.: Automatic statistical analysis of the signal and prosodic signs
of emotion in speech. In: Proc. International Conf. on Spoken Language Processing, pp.
1989–1992 (1996)
15. Scherer, K.R.: Adding the affective dimension: A new look in speech analysis and synthesis.
In: Proc. International Conf. on Spoken Language Processing, pp. 1808–1811 (1996)
16. Camurri, A., Lagerlöf, I., Volpe, G.: Recognizing Emotion from Dance Movement: Com-
parison of Spectator Recognition and Automated Techniques. International Journal of
Human-Computer Studies 59(1-2), 213–225 (2003)
17. Bianchi-Berthouze, N., Kleinsmith, A.: A categorical approach to affective gesture recogni-
tion. Connection Science 15(4), 259–269 (2003)
18. Castellano, G., Villalba, S.D., Camurri, A.: Recognising Human Emotions from Body
Movement and Gesture Dynamics. In: Proc. of 2nd International Conference on Affective
Computing and Intelligent Interaction, Lisbon (2007)
19. Picard, R.W., Vyzas, E., Healey, J.: Toward machine emotional intelligence: Analysis of af-
fective physiological state. IEEE Trans. on Pattern Analysis and Machine Intelli-
gence 23(10), 1175–1191 (2001)
20. Pantic, M., Rothkrantz, L.J.M.: Towards an Affect-sensitive Multimodal Human-Computer
Interaction. Proceedings of the IEEE 91(9), 1370–1390 (2003)
21. Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C.M., Kazemzaeh, A., Lee, S., Neumann,
U., Narayanan, S.: Analysis of Emotion Recognition using Facial Expressions, Speech and
Multimodal information. In: Proc. of ACM 6th int’l Conf. on Multimodal Interfaces (ICMI
2004), State College, PA, October 2004, pp. 205–211 (2004)
22. Kim, J., André, E., Rehm, M., Vogt, T., Wagner, J.: Integrating information from speech and
physiological signals to achieve emotional sensitivity. In: Proc. of the 9th European Confer-
ence on Speech Communication and Technology (2005)
23. Gunes, H., Piccardi, M.: Bi-modal emotion recognition from expressive face and body ges-
tures. Journal of Network and Computer Applications (2006), doi:10.1016/j.jnca.2006.09.007
24. el Kaliouby, R., Robinson, P.: Generalization of a Vision-Based Computational Model of
Mind-Reading. In: Proceedings of First International Conference on Affective Computing
and Intelligent Interfaces, pp. 582–589 (2005)
Emotion Recognition through Multiple Modalities 103
25. Engelbrecht, A.P., Fletcher, L., Cloete, I.: Variance analysis of sensitivity information for
pruning multilayer feedforward neural networks. In: IJCNN 1999. International Joint Confer-
ence on Neural Networks, vol. 3, pp. 1829–1833 (1999)
26. Densley, D.J., Willis, P.J.: Emotional posturing: a method towards achieving emotional fig-
ure animation. Computer Animation, 8 (1997)
27. Young, J.W.: Head and Face Anthropometry of Adult U.S. Civilians, FAA Civil Aeromedi-
cal Institute, 1963–1993 (final report, 1993)
28. Raouzaiou, A., Tsapatsoulis, N., Karpouzis, K., Kollias, S.: Parameterized facial expression
synthesis based on MPEG-4. EURASIP Journal on Applied Signal Processing 2002(10),
1021–1038 (2002)
29. Camurri, A., Coletta, P., Massari, A., Mazzarino, B., Peri, M., Ricchetti, M., Ricci, A.,
Volpe, G.: Toward real-time multimodal processing: EyesWeb 4.0. In: Proc. AISB 2004
Convention: Motion, Emotion and Cognition, Leeds, UK (March 2004)
30. Camurri, A., Mazzarino, B., Volpe, G.: Analysis of Expressive Gesture: The Eyesweb Ex-
pressive Gesture Processing Library. In: Camurri, A., Volpe, G. (eds.) GW 2003. LNCS
(LNAI), vol. 2915. Springer, Heidelberg (2004)
31. Castellano, G., Camurri, A., Mazzarino, B., Volpe, G.: A mathematical model to analyse the
dynamics of gesture expressivity. In: Proc. of AISB 2007 Convention: Artificial and Ambient
Intelligence, Newcastle upon Tyne, UK (April 2007)
32. Kessous, L., Amir, N.: Comparison of feature extraction approaches based on the Bark
time/frequency representation for classification of expressive speechpaper submitted to Inter-
speech (2007)
33. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd
edn. Morgan Kaufmann, San Francisco (2005)
34. Kononenko, I.: On Biases in Estimating Multi-Valued Attributes. In: 14th International Joint
Conference on Articial Intelligence, pp. 1034–1040 (1995).
The Composite Sensing of Affect
Abstract. This paper describes some of the issues faced by typical emo-
tion recognition systems and the need to be able to deal with emotions
in a natural setting. Studies tend to ignore the dynamic, versatile and
personalised nature of affective expression and the influence that social
setting, context and culture have on its rules of display. Affective cues
can be present in multiple modalities and they can manifest themselves
in different temporal orders. Thus, fusing the feature sets is challenging.
We present a composite approach to affective sensing. The term compos-
ite is used to reflect the blending of information from multiple modalities
with the available semantic evidence to enhance the emotion recognition
process.
1 Introduction
Recognising emotions from the modulations in another person’s voice and facial
expressions is perhaps one of our most important human abilities. Such inter-
action is inherently multimodal and for computers to adapt and respond in a
natural, yet robust, manner in real-world situations demands a similar capabil-
ity. This is a great challenge. Affective sensing is the neologism used to describe
recognition of emotional cues by machines. It is the process of mapping mea-
surable physical responses to affective states. Several studies have successfully
mapped strong responses to episodic emotions such as happiness, anger and sur-
prise. However, few studies deal with the more subtle emotions such as anxiety
and depression and most research takes place in a controlled environment, ig-
noring the importance that social settings, culture and context play in dictating
the display rules of affect.
At present, reported examples of affective sensing systems tend to be very
application specific [1,2,3,4,5,6,7,8,9,10,11]. However, in a natural setting, emo-
tions can present themselves in many ways, and in different combinations of
modalities. Thus it seems that some level of semantic incorporation is essential.
For instance, during a diplomatic exchange, anger is more likely to be signaled
through verbal content than, say, in an incident during a football game where a
player remonstrates wildly with the referee. In this paper, a novel approach is
presented which integrates semantic descriptions with standard speech recogni-
tion and computer vision feature sets.
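To make the idea concrete, the following minimal sketch (in Python, with invented emotion labels, weights and context priors; it illustrates the general idea rather than the system described in this chapter) shows one simple way of blending per-modality emotion scores with a semantic context prior at the decision level.

# Hypothetical sketch of decision-level fusion with a semantic context prior.
# The labels, priors, and scores are illustrative only.
import numpy as np

EMOTIONS = ["anger", "happiness", "surprise", "neutral"]

# Context-dependent priors, e.g. anger is more plausible at a football game
# than during a diplomatic exchange (values are made up for illustration).
CONTEXT_PRIORS = {
    "diplomatic_exchange": np.array([0.10, 0.30, 0.20, 0.40]),
    "football_game":       np.array([0.40, 0.25, 0.25, 0.10]),
}

def fuse(modality_scores, context, modality_weights=None):
    """Combine per-modality emotion scores with a semantic context prior.

    modality_scores: dict mapping modality name -> score vector over EMOTIONS
    context: key into CONTEXT_PRIORS
    """
    names = list(modality_scores)
    weights = modality_weights or {m: 1.0 / len(names) for m in names}
    # Weighted average of the modality scores (late fusion).
    fused = sum(weights[m] * np.asarray(modality_scores[m]) for m in names)
    # Re-weight by the context prior and renormalise.
    fused = fused * CONTEXT_PRIORS[context]
    return fused / fused.sum()

# Example: face and speech disagree; the semantic context tips the decision.
scores = {
    "face":   [0.5, 0.2, 0.2, 0.1],   # e.g. from a facial-expression classifier
    "speech": [0.2, 0.3, 0.2, 0.3],   # e.g. from a prosody classifier
}
probs = fuse(scores, context="football_game")
print(dict(zip(EMOTIONS, probs.round(3))))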
Some affective states, like anxiety, can influence breathing, resulting in varia-
tions in sub-glottal pressure. Drying of the mucous membrane causes shrinking
of the voice. Rapid breath alters the tempo of the voice. Relaxation tends to
deepen the breath and lowers the voice. Changes in facial expression can also
alter the sound of the voice. Figure 1 represents the typical cues to the six most
common emotion categories [14].
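As an illustration of what such measurable vocal cues might look like computationally, the following sketch (an assumption for illustration only, not the analysis performed in this work) extracts frame energy, a rough autocorrelation-based pitch estimate, and a pause ratio as a crude tempo proxy from a mono signal.

# Illustrative extraction of simple vocal cues from a mono audio signal.
# Thresholds and frame sizes are arbitrary choices for the sketch.
import numpy as np

def frame_features(signal, sample_rate=16000, frame_len=400, hop=160):
    feats = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len].astype(float)
        energy = float(np.sqrt(np.mean(frame ** 2)))        # loudness proxy
        # Autocorrelation-based pitch estimate (very rough).
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        lo, hi = sample_rate // 400, sample_rate // 75       # 75-400 Hz search band
        lag = lo + int(np.argmax(ac[lo:hi]))
        pitch = sample_rate / lag if ac[lag] > 0 else 0.0
        feats.append((energy, pitch))
    feats = np.array(feats)
    pause_ratio = float(np.mean(feats[:, 0] < 0.1 * feats[:, 0].max()))  # tempo proxy
    return {"mean_energy": feats[:, 0].mean(),
            "mean_pitch": feats[:, 1][feats[:, 1] > 0].mean(),
            "pause_ratio": pause_ratio}

# Example with a synthetic 150 Hz tone followed by silence.
t = np.arange(0, 1.0, 1 / 16000)
sig = np.concatenate([np.sin(2 * np.pi * 150 * t), np.zeros(8000)])
print(frame_features(sig))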
affect within the corpus. In the case of automatic recognition of episodic emo-
tions, this approach is plausible, based on the assumption that clear-cut bursts
of episodic emotion will look and sound somewhat similar in most contexts [28].
However, recognition of pervasive emotions presents a much greater challenge
and, intuitively, one would think that awareness of personal and contextual in-
formation needs to be integrated into the recognition process.
Fernandez and Picard [36] used eighty-seven features and concluded that the
recognition rate was still below human performance. One would have to question
how much extrapolation it would take to extend the ASR approach to affective
sensing in a natural setting. Studies by Koike et al. [37] and Shigeno [38] have
shown that it is difficult to identify the emotion of a speaker from a different
culture and that people will predominantly use visual information to identify
emotion. The implications are that the number of feature sets and training samples
required to take into account natural, social, physiological,
and contextual factors would be infeasibly large.
Richard Stibbard [39], who undertook the somewhat difficult Leeds Emotion
in Speech Project, reported:
“The use of genuine spoken data has revealed that the type of data commonly
used gives an oversimplified picture of emotional expression. It is recommended
that future work cease looking for stable phonetic correlates of emotions and
look instead at dynamic speech features, that the classification of the emotions
be reconsidered, and that more account be taken of the complex relationship
between eliciting event, emotion, and expression.”
In keeping with speech recognition, much of the effort to date in emotion
recognition has been concerned with finding the low-level, symbolic representa-
tion and interpretation of the speech signal features. Only a handful of reports
involve real-time facial feature extraction in the emotion recognition process [40]
[30]. Similar points about the need to recognise emotions in natural settings, and
the difficulties of doing so, were made by [41]. To address this deficiency, some
level of semantic reasoning seems essential.
interpret emotions. As Scherer [26] points out, most studies are either speaker
oriented or listener oriented, with most being the former. This is significant when
you consider that the emotion of someone labelling affective content in a corpus
could impact the label that is ascribed to a speaker’s message.
Culture-specific display rules influence the display of affect [28]. Gender and age
are established as important factors in shaping conversation style and content
in many societies.
It might be stating the obvious, but there are marked differences in speech
signals and facial expressions between people of different physiological make-
up, e.g. age, gender and health. The habitual settings of facial features and vocal
organs determine the speaker’s range of possible visual appearances and sounds
produced. The configuration of facial features, such as chin, lips, nose, and eyes,
provide the visual cues, whereas the vocal tract length and internal muscle tone
guide the interpretation of acoustic output [47].
Social factors temper spoken language to the demands of civil discourse [28].
For example, affective bursts are likely to be constrained in the case of a minor
relating to an adult, yet totally unconstrained in a scenario of sibling rivalry.
Similarly, a social setting in a library is less likely to yield loud and extroverted
displays of affect than a family setting.
Internal state has been included in the model for completeness. At the core
of affective states is the person and their experiences. Recent events such as
winning the lottery or losing a job are likely to influence emotions.
and is meant to serve as an example here, rather than a complete list of affective
cues. Finally, the emotion classification method used in the investigation of a
particular dialogue is also recorded.
We use this ontology to describe our affective sensing research in a formal,
yet flexible and extendible way. In the following section, a brief description of
the facial expression recognition system developed in our group is given as an
example of using the ontologies in practice.
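The ontology itself is not reproduced here, but the kind of structured description it is intended to support might, in a much simplified form, look like the following sketch. All class and field names are hypothetical and merely echo the factors discussed above (modality, social setting, culture, internal state, and the classification method used); they do not reproduce the authors' ontology.

# Hypothetical, simplified rendering of an affective-sensing description.
# Field names are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Observation:
    modality: str            # e.g. "face", "speech", "gesture"
    cues: List[str]          # e.g. ["raised eyebrows", "high pitch"]

@dataclass
class AffectiveEpisode:
    subject_age: int
    subject_gender: str
    culture: str             # display rules differ across cultures
    social_setting: str      # e.g. "library", "football game"
    internal_state_notes: str
    observations: List[Observation] = field(default_factory=list)
    classification_method: str = "unspecified"   # method used in the analysis

episode = AffectiveEpisode(
    subject_age=30,
    subject_gender="female",
    culture="unspecified",
    social_setting="library",
    internal_state_notes="recently changed jobs",
    observations=[Observation("speech", ["lowered voice", "slow tempo"]),
                  Observation("face", ["neutral expression"])],
    classification_method="rule-based",
)
print(episode.social_setting, [o.modality for o in episode.observations])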
One of the issues in emotion recognition is that of reuse and verification of re-
sults. However, there is no universally accepted system of describing emotional
content. The HUMAINE project is trying to remedy this through the definition
of the Emotion Annotation and Representation Language (EARL) which is cur-
rently under design [48,49].
Another direction is that of the Moving Picture Experts Group (MPEG) who
have developed the MPEG-7 standard for audio, audio-video and multimedia
description [50]. MPEG-7 uses metadata structures or Multimedia Description
Schemes (MDS) for describing and annotating audio-video content. These are
References
1. McCann, J., Peppe, S.: PEPS-C: A new speech science software programme for
assessing prosody. In: The Fifth Annual Parliamentary Reception for Younger Re-
searchers in Science, Engineering, Medicine and Technology (SET for Britain. Tak-
ing science to parliament: The 2003 great British research and R&D show), the
House of Commons, London (2003)
2. Devillers, L., Vasilescu, I., Vidrascu, L.: F0 and pause features analysis for anger
and fear detection in real-life spoken dialogs. Speech Prosody (2004)
3. Jones, C.M., Jonsson, I.: Automatic recognition of affective cues in the speech of car
drivers to allow appropriate responses. Technical report, School of Mathematical
and Computer Sciences, Heriot-Watt University, Edinburgh, UK and Department
of Communication. Stanford University, California, USA (2005)
4. Jones, C.M., Jonsson, I.: Using paralinguistic cues in speech to recognise emotions
in older car drivers. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-
Computer Interaction. LNCS, vol. 4868. Springer, Heidelberg (2008)
5. Breazeal, C.: Emotion and sociable humanoid robots. Int. J. Human-Computer
Studies 59, 119–155 (2003)
6. Reilly, R., Moran, R., Lacy, P.: Voice pathology assessment based on a dialogue
system and speech analysis. Technical report, Department of Electronic and Elec-
trical Engineering, University College Dublin, Ireland and St James’s Hospital,
Dublin 8, Ireland (2000)
7. Picard, R.: Helping addicts: A scenario from 2021. Technical report (2005)
8. Kaliouby, R., Robinson, P.: Therapeutic versus prosthetic assistive technologies:
The case of autism. Technical report, Computer Laboratory, University of Cam-
bridge (2005)
9. Kaliouby, R., Robinson, P.: The emotional hearing aid: An assistive tool for children
with asperger’s syndrome. Technical report, Computer Laboratory, University of
Cambridge (2003)
10. Petrushin, V.A.: Emotion in speech: Recognition and application to call centres.
In: Artificial Neural Networks in Engineering (1999)
11. Yacoub, S., Simske, S., Lin, X., Burns, J.: Recognition of emotions in interactive
voice response systems. Technical report, HP Laboratories Palo Alto (2003)
12. Ekman, P.: Darwin, deception, and facial expression. Annals New York Academy
of Sciences, 205–221 (2003)
13. Fry, D.B.: The Physics of Speech. Cambridge Textbooks in Linguistics. Cambridge
University Press, Cambridge (1979)
14. Murray, I.R., Arnott, J.L.: Toward the simulation of emotion in synthetic speech. Jour-
nal of the Acoustical Society of America 93(2), 1097–1108 (1993)
15. Ekman, P., Friesen, W.: Unmasking the Face. Prentice Hall, Englewood Cliffs
(1975)
16. Ekman, P., Oster, H.: Emotion in the human face, 2nd edn. Cambridge University
Press, New York (1982)
17. Ekman, P., Rosenberg, E.L.: What the Face Reveals. Series in Affective Science.
Oxford University Press, Oxford (1997)
18. Ekman, P.: Facial Expressions. In: The Handbook of Cognition and Emotion, pp.
301–320. John Wiley and Sons, Ltd., Sussex (1999)
19. McNeill, D.: Gesture and language dialectic. Technical report, Department of Psy-
chology. University of Chicago (2002)
20. Lien, J., Kanade, T., Cohn, J., Li, C.: Automated Facial Expression Recognition
Based on FACS Action Units. In: International Conference on Automatic Face and
Gesture Recognition, pp. 390–395 (1998)
21. Cootes, T., Taylor, C., Cooper, D., Graham, J.: Active shape models - their training
and applications. Computer Vision and Image Understanding 61(1), 38–59 (1995)
22. Nixon, M., Aguado, A.: Feature Extraction and Image Processing. MPG Books
Ltd., Bodmin, Cornwall (2001)
23. Castellano, G., Kessous, L., Caridakis, G.: Emotion recognition through multiple
modalities: face, body gesture, speech. In: Peter, C., Beale, R. (eds.) Affect and
Emotion in Human-Computer Interaction, vol. 4868. Springer, Heidelberg (2007)
24. Fragopanagos, N., Taylor, J.: Emotion recognition in human-computer interaction.
Neural Networks 18, 389–405 (2005)
25. Polzin, T.: Detecting verbal and non-verbal cues in the communication of emotions.
PhD thesis, School of Computer Science. Carnegie Mellon University (2000)
26. Scherer, K.R.: Vocal communication of emotion: A review of research paradigms.
Speech Communication 40, 227–256 (2003)
27. Cowie, R., Cornelius, R.: Describing the emotional states that are expressed in
speech. Speech Communication 40, 5–32 (2003)
28. Cowie, R., Douglas-Cowie, E., Cox, C.: Beyond emotion archetypes: Databases for
emotion modelling using neural networks. Neural Networks 18, 371–388 (2005)
29. Lee, C.M., Narayanan, S., Pieraccini, R.: Recognition of negative emotions from
the speech signal. Automatic Speech Recognition and Understanding (2001)
30. Devillers, L., Abrilian, S., Martin, J.: Representing real-life emotions in audiovisual
data with non basic emotional patterns and context features. Technical report,
LIMSI, Centre national de la recherche scientifique, France (2005)
31. Velten, E.: A laboratory task for induction of mood states. Behaviour Research
and Therapy 6, 473–482 (1968)
32. Schiel, F., Steininger, S., Türk, U.: The SmartKom Multimodal Corpus at BAS.
Technical report, Ludwig-Maximilians-Universität München (2003)
33. Dellaert, F., Polzin, T., Waibel, A.: Recognizing emotion in speech. Technical re-
port, School of Computer Science. Carnegie Mellon University (1995)
34. Lin, Y.L., Wei, G.: Speech emotion recognition based on HMM and SVM. In: Pro-
ceedings (2005)
35. Scherer, K.R.: Humaine Deliverable D3c: Preliminary plans for exemplars: theory
(2004). Retrieved October 26, 2006,
http://emotion-research.net/publicnews/d3c/
36. Fernandez, R., Picard, R.: Classical and novel discriminant features for affect recog-
nition from speech. In: Interspeech, Lisbon, Portugal, pp. 473–476 (2005)
37. Koike, K., Suzuki, H., Saito, H.: Prosodic parameters in emotional speech. In:
International Conference on Spoken Language Processing, pp. 679–682 (1998)
38. Shigeno, S.: Cultural similarities and differences in the recognition of audio-
visual speech stimuli. In: International Conference on Spoken Language Processing,
1057th edn., pp. 281–284 (1998)
39. Stibbard, R.: Vocal expression of emotions in non-laboratory speech: An investiga-
tion of the Reading/Leeds Emotion in Speech Project annotation data. PhD thesis,
University of Reading, UK (2001)
40. Silva, L.D., Hui, S.: Real-time facial feature extraction and emotion recognition.
In: ICICS-PCM. IEEE, Singapore (2003)
41. Ward, R., Marsden, P.: Affective computing: Problems, reactions and intentions.
Interacting with Computers 16(4), 707–713 (2004)
42. Liscombe, J., Riccardi, G., Hakkani-Tür, D.: Using context to improve emotion
detection in spoken dialog systems. In: EUROSPEECH 2005, 9th European Con-
ference on Speech Communication and Technology, pp. 1845–1848 (2005)
43. Devillers, L., Vidrascu, L., Lamel, L.: Challenges in real-life emotion annotation
and machine learning based detection. Neural Networks 18, 407–422 (2005)
44. Athanaselis, T., Bakamidis, S., Dologlou, I., Cowie, R., Douglas-Cowie, E.,
Cox, C.: ASR for emotional speech: Clarifying the issues and enhancing performance.
Neural Networks 18, 437–444 (2005)
45. Cowie, R., Douglas-Cowie, E., Taylor, J., Ioannou, S., Wallace, M., Kollias, S.: An
intelligent system for facial emotion recognition. IEEE, Los Alamitos (2005)
46. Town, C., Sinclair, D.: A self-referential perceptual inference framework for video
interpretation. In: Crowley, J.L., Piater, J.H., Vincze, M., Paletta, L. (eds.) ICVS
2003. LNCS, vol. 2626, pp. 54–67. Springer, Heidelberg (2003)
47. Millar, J.B., Wagner, M., Göcke, R.: Aspects of speaking-face data corpus design
methodology. In: International Conference on Spoken Language Processing 2004,
Jeju, Korea, vol. II, pp. 1157–1160 (2004)
48. Schröder, M., Devillers, L., Karpouzis, K., Martin, J.C., Pelachaud, C., Peter, C.,
Pirker, H., Schuller, B., Tao, J., Wilson, I.: What should a generic emotion markup
language be able to represent? In: Proc. 2nd International Conference on Affective
Computing and Intelligent Interaction (ACII 2007), Lisbon, Portugal (2007)
49. Schröder, M., Zovato, E., Pirker, H., Peter, C., Burkhardt, F.: W3C Emotion Incu-
bator Group final report. Technical report, W3C (2007)
50. MPEG-7 Committee: Retrieved June 2, 2007, http://www.m4if.org/m4if/
51. Chiariglione, L.: Introduction to MPEG-7: Multimedia Content Description Inter-
face. Technical report, Telecom Italia Lab, Italy (2001)
52. Salembier, P., Smith, J.: MPEG-7 Multimedia Description Schemes. IEEE Trans-
actions on Circuits and Systems for Video Technology 11, 748–759 (2001)
53. Rege, M., Dong, M., Fotouhi, F., Siadat, M., Zamorano, L.: Using MPEG-7 to build
a Human Brain Image Database for Image-guided Neurosurgery. Medical Imaging
2005: Visualization, Image-Guided Procedures, and Display, 512–519 (2005)
54. Annesley, J., Orwell, J.: On the Use of MPEG-7 for Visual Surveillance. Techni-
cal report, Digital Imaging Research Center, Kingston University, Kingston-upon-
Thames, Surrey, UK (2005)
Emotional Experience and Interaction Design
Abstract. The emotional experience of an interactive system has been the sub-
ject of a great deal of recent interest and study in the HCI community. How-
ever, many researchers have pointed out the extreme difficulty of predicting
or controlling these emotional experiences through intentional design choices.
The user study we conducted, however, suggests a different point of view.
Although these emotional responses were not always tied directly
to the device itself and were influenced by contextual factors, we discovered that cer-
tain controllable aspects of interactive products showed clear patterns of emo-
tion in the responses of our participants. We discuss our findings and provide
implications for the design of emotional experiences in interactive devices.
1 Introduction
The importance of “affect and emotion in HCI” has become increasingly significant
as we face the so-called “third wave” or “third paradigm” of HCI [6, 12]. One
of the clearest comments on the notion of the third wave in HCI is introduced in
Bødker’s keynote article in NordiCHI 2006 [6] where she builds on Bannon’s view
that there is a shift “from human factors to human actors” [3]. Grudin [11] also
mentioned the movement from the non-discretionary use of technologies to discre-
tionary use, where users adopt technologies for personal purposes, which indicates the
opening-up of this new paradigm.
As we move in this direction, researchers and professionals in HCI have started
to redefine the meaning of “user-centered design” from an emphasis on efficiency
and usability to a broader holistic context of human behavior. In this behavioral context,
we have started to adopt terms like “human-centered” or “experience-centered” design.
2 Background
We have encountered and used various terms related to each other such as affect,
emotion, and experience in HCI. Numerous researchers have proposed definitions of
these terms as well as the relationships among them [4, 5, 8, 9, 16, 18, 19, 20]. In our
research, it is critical to revisit the definitions proposed by other researchers and to re-
establish an integrated perspective of those meanings, since it will explain the position
we take in our research and why we selected it.
In HCI, many researchers do not clearly identify the differences between the mean-
ings of affect and emotion, and frequently use them as synonyms [4]. Bentley et al. [4]
particularly addressed this issue and explained the key differences between the two,
which we accepted for our research. In [4], they introduced the definitions suggested
by other researchers and clearly identified the difference. For example, emotion is
viewed as “overall feeling” that is influenced by various external and internal stimuli
which include “context, past experience, recent experiences, personality, affect, and
the net cognitive interpretation of these influences” (p.2 in [4]). Affect is defined
more as “a short term, discrete, conscious subjective feeling” (p.3 in [4]), which
contrasts with the meaning of emotion.
In our research, our focus is on emotion, as we accept this distinction. We are par-
ticularly interested in feelings that are relatively stable and shaped over longer periods
of time through the experience with interactive products. Bentley et al. [4] also sug-
gest an approach to understand emotions: “In practice, many system evaluations can
account for emotion by asking the user their overall feelings regarding the system
after use. This could be done through a post-use interview or questionnaire.” (p.3 in [4]).
We conducted a type of post-use interview called a “disposable camera study”,
which is a probing technique similar to the “cultural probing” technique [10].
The detailed procedure of conducting this study is introduced in Section 3.
We did not wish to let a single theory of emotional experience constrain the probes or
responses of our participants. However, we needed to have some basic framework to
categorize and organize the various emotional responses that we encountered.
For this purpose, we accept Norman’s three levels of emotional response [20] as a
starting point for the analysis since it is one of the most comprehensive, general, and
explicable definitions of emotion in HCI. The three levels include the visceral-level
which represents “perceptually-induced reactions,” the behavioral-level which repre-
sents “expectation-induced reactions,” and the reflective-level which represents “intel-
lectually-induced reactions” [22]. The visceral level of response is directly related
to our physical senses. Textures, sizes, temperatures, or colors evoke different emo-
tional responses through the physical senses. These reactions usually shape immediate
feelings about the product. The behavioral level of responses mostly involves cogni-
tive processing of the mind. The emotions at this level are formed through the process
of planning, expectation, and learning. It also involves automatic responses and
immediate feelings like the visceral level, but these responses are formed by the proc-
ess of behavioral control over time (for a short period of time), not by physical sens-
ing at a particular moment. The reflective level of responses is the most complex and
challenging to fully understand. It is “the highest level of intellectual functioning in a
person” (p.4 in [22]). If the other two levels are sub-conscious, this level is “conscious
and self-aware” [22]. For example, the same blue color can be interpreted in different
ways and result in different emotions according to different contexts—e.g. situations,
cultures, and past experiences. For example, some people may not accept a blue Ap-
ple Mac Mini [2], because it would go against the recognizable color scheme that
Apple Inc. has established through its brand image.
Then how is emotion related to experience? Emotion can be viewed as “a resource
for understanding and communicating about what we experience” (p. 264 in [9]). That
is, emotion is a significant channel for expressing experience. We can also say
that emotion is what makes experience possible. This is why many of the frameworks
that explain experience, proposed by various researchers, are tightly related to Nor-
man’s three levels of emotion. McCarthy and Wright [19] define experience with
four types of “threads,” including the sensual thread of experience which corresponds
to the visceral level of emotion, the compositional thread of experience which corre-
sponds to the behavioral level of emotion, the emotional thread of experience which
corresponds to the reflective level of emotion, and the spatio-temporal thread of ex-
perience which is indirectly related to both the visceral level and the behavioral level.
The spatial part of the spatio-temporal thread of experience is tightly related to the
visceral level of emotion since the space is mostly sensed by a visual sense—i.e. the
eyes. The temporal part of the spatio-temporal thread of experience is tightly related
to the behavioral level since this level is induced by expectation that requires the
notion of time. Another framework, of pleasurable experience, proposed by Jordan [16]
includes physio-pleasures that correspond to the visceral level, psycho-pleasures
that correspond to the behavioral level, and ideo- and socio-pleasures that
correspond to the reflective level.
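The correspondences described in this paragraph can be restated compactly as a simple data structure; the sketch below adds nothing beyond the mappings already stated in the text and uses hypothetical key names.

# Correspondences between Norman's three levels of emotion and two other
# experience frameworks, as described in the paragraph above.
LEVEL_CORRESPONDENCES = {
    "visceral": {
        "mccarthy_wright_threads": ["sensual", "spatio-temporal (spatial part)"],
        "jordan_pleasures": ["physio-pleasure"],
    },
    "behavioral": {
        "mccarthy_wright_threads": ["compositional", "spatio-temporal (temporal part)"],
        "jordan_pleasures": ["psycho-pleasure"],
    },
    "reflective": {
        "mccarthy_wright_threads": ["emotional"],
        "jordan_pleasures": ["ideo-pleasure", "socio-pleasure"],
    },
}

for level, frameworks in LEVEL_CORRESPONDENCES.items():
    print(level, "<->", frameworks)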
In our study, we particularly focused on emotion as the foundation for experiences
that can be verbalized directly by users and compared. The primary objective of our
study is to examine how different types of interactive product qualities are related to
people’s emotions. The purpose of this study was to gather insights valuable to de-
signers when designing interactive products in relation to the issues of emotions.
We conducted a disposable camera study with twelve participants. Six participants had
some background in design, as they were students of the HCI/d master’s program
in the School of Informatics at Indiana University. The other six participants did not
have any design background, and they were either undergraduate or graduate students
in various universities. The ages ranged from early twenties to early thirties.
The disposable camera study is an interview-based study that combines a
picture-journaling (or probing) activity by participants. We first asked the participants
to take several pictures of five to ten interactive products that they own or have fre-
quently used. We asked them to select the products they particularly liked, disliked, or
valued. This naturally led them to select the ones for which they have some strong
emotional response. While we conducted the studies, we did not directly ask them to
describe their “emotional experiences” since that term can be interpreted in different
ways by different people. Instead, we simply asked them to freely explain any kinds of
experiences and memories they have with the products they selected, while also probing
for semantic differentials in a questionnaire we used during the interview (Table 1). We
did not show the terms like visceral, behavioral, and reflective to the participants. We
also mixed the order of these semantic differentials to minimize any bias.
Table 1. Semantic values selected for representing the three levels of emotion
We provided a digital camera for each participant, which they used to photograph
their devices. Then we met with them to talk about their pictures and to conduct in-depth
interviews about the products they showed. Each interview took about an hour to an
hour and a half. The interview was structured in two parts: (1) general questions about
their demographic information and overall impressions of the selected products in terms
of why they like, dislike, and/or value the product; and (2) semantic differential
questions (Table 1) that cover the three levels of emotion and extract their emotional
experiences with the respective products. For this second part of the interview, we
also asked why they selected a certain value for each semantic differential in terms of
the corresponding product qualities of that specific value they assigned. We presented
these semantic differentials to the participants in a random order.
The semantic values for the semantic differentials (Table 1) were carefully se-
lected in order to investigate how people experience emotion through the respective
product qualities in terms of the three levels of emotion proposed by Norman [20].
The semantic differentials are also referred to simply as “semantic values” or “seman-
tics” occasionally in this paper. The four semantic values for the visceral level are all
related to physical senses, and the four semantic values for the behavioral level are all
related to cognitive processes. We also use eight semantic values for the reflective
level that all require personal reflections in order to decide the corresponding values.
Therefore, a total of 16 semantic values are used for examining the participants’ emo-
tional responses toward the interactive products they selected. We intentionally se-
lected the larger number of reflective-level semantic values so that we could have
more data to characterize this complex portion of Norman’s emotion framework.
We treated the beautiful-ugly dimension as an example of the reflective level since
we originally accepted Hassenzahl’s [14] doubtful perspective on visceral beauty.
However, our data analysis from this study showed a result contrasting with this view,
which we will explain in detail in the next section.
We recorded all the interviews using digital audio recorders, and transcribed all the
comments from the participants using the template shown in Table 2. This template is
designed to extract two major types of information from the participants’ comments
in relation to their emotional experiences. These include (1) the qualities of the inter-
active products corresponding to each semantic differential question, and (2) the
qualities of user experience involved in the use of the selected interactive products
corresponding to each semantic differential question. We organized this information
according to each semantic, numbered all 16 semantic differentials from 1 to 16, and
used that number as the code to indicate each semantic (Table 2).
For the interactive product qualities, we used six types of qualities. These qualities
include:
1. Interaction (coded as ‘PI’) – qualities that enable people’s interactions with the
product, such as interfaces.
2. Visual (coded as ‘PV’) – qualities that people can see such as colors, sizes, shapes,
visual material qualities, etc.
3. Tactile (coded as ‘PT’) – qualities that people can feel by touching or grabbing
such as weight, texture, etc.
4. Content (coded as ‘PC’) – contents that are carried, accessed, or delivered by the
product such as music, news, video, games, etc.
5. Function (coded as ‘PF’) – capabilities and functionality such as playing music,
calling to someone, taking pictures, etc.
6. Performance (coded as ‘PP’) – qualities of how well the product performs the
expected functions such as resolution, sound quality, speed, etc.
Table 2. The template used for transcribing the interview data. (The italicized texts indicate the
example records we transcribed into this template).
These six qualities may not cover all the possible qualities of any type of interactive
product that exists in this world. However, these six qualities cover all the qualities that
were mentioned by our participants from their respective product experiences. All the
transcribed data from our study could be categorized with one of these types of product
qualities. This framework is more comprehensive than the work by Norman and
Ortony [22] who discussed the three levels of emotion only in relation to function and
appearance. For the user experience qualities, we used four types including:
1. Visceral Quality (coded as UXV) – the visceral level of emotion that is emotional
responses formed by physical senses such as “looks nice,” “feel cold,” etc.
2. Cognitive Quality (coded as UXC) – the behavioral level of emotion that is formed
from cognitive processes such as “easy to use,” “simple to use,” “hard to figure
out,” etc.
3. Usefulness Quality (coded as UXU) – an indirect quality related to their overall
needs rather than a certain specific type of emotional response, such as “fulfills
what I need,” “practical”, “does what I want it to do,” etc.
4. Reflective and Social Quality (coded as UXRS) – the reflective level of emotion
such as “it is a trend,” “creates an artificial world,” “everyone has these prod-
ucts,” etc.
The reason for extracting these user experience qualities in addition to the semantic
differential values is to validate if our assumptions on the relationships between the
semantic values and the three levels of emotion are appropriate. The result of this
analysis is described in the next section.
The usefulness quality (UXU) is not directly related to any of the three levels of
emotion, but is included to examine how this aspect of user experience is related to
emotional experiences. We wanted to determine which level of emotion is the rele-
vant one for this quality. The analysis result for this question is also described in the
next section.
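To illustrate how a single transcribed comment would be coded under this scheme, the following sketch uses the codes defined above; the participant identifier, product, comment, and counts are invented for illustration.

# Hypothetical coding of one interview comment using the scheme above.
# PI/PV/PT/PC/PF/PP = product-quality codes, UXV/UXC/UXU/UXRS = experience codes.
PRODUCT_QUALITY_CODES = {"PI": "interaction", "PV": "visual", "PT": "tactile",
                         "PC": "content", "PF": "function", "PP": "performance"}
EXPERIENCE_QUALITY_CODES = {"UXV": "visceral", "UXC": "cognitive",
                            "UXU": "usefulness", "UXRS": "reflective/social"}

# One transcribed comment, coded against a semantic differential (numbered 1-16).
coded_comment = {
    "participant": "P07",                    # invented identifier
    "product": "mp3 player",
    "semantic_id": 5,                        # which of the 16 semantics prompted it
    "comment": "The wheel makes browsing music really easy.",
    "product_quality": "PI",                 # interaction quality
    "experience_quality": "UXC",             # behavioral-level (cognitive) response
}

def tally(comments):
    """Count comments per (semantic, experience-quality) pair, as used later
    for the cluster analysis."""
    counts = {}
    for c in comments:
        key = (c["semantic_id"], c["experience_quality"])
        counts[key] = counts.get(key, 0) + 1
    return counts

print(tally([coded_comment]))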
For the data analysis, each of the six analysts transcribed two participants’
data into the template, and one supervising analyst went through all the data tran-
scribed by every analyst to make sure the data were consistent with the definitions of the
product qualities and the experience qualities. Before starting the data analysis,
the definitions given here were established by consensus. A total of 1,020 com-
ments were collected from all the 12 participants, and a total of 64 interactive
products were discussed.
3.2 Results
We discuss the results to understand (1) how our selection of the semantics is re-
lated to the three levels of emotion, as a first examination before we closely analyze the
relationship between interactive product qualities and the three levels of emotion
outcomes, (2) how usefulness-related user experience can be understood in relation to
the three levels of emotion, (3) what qualities are more significant than others when
people discuss their emotional experience with the product, and (4)
whether the different levels of emotion have distinctive relationships with the differ-
ent types of the interactive product qualities. This part establishes whether the product quali-
ties have a random, non-deterministic effect on emotion.
The second aspect of analysis among these four, regarding the usefulness experi-
ence, was specifically addressed since it has received little examination from
researchers in HCI. One prominent previous research outcome about the relationship
between perceived-usefulness and perceived-affect qualities includes Zhang and Li’s
work [25], but our research outcome adds further valuable insight to this relationship
by examining which level of emotion the usefulness experience is most tightly related
to. To the best of our knowledge this aspect has not been addressed in the literature.
We find this information meaningful in terms of opening up a future research agenda,
i.e. how we can define usefulness in terms of its effect on emotional experiences.
The results from each of these four aspects of analysis are explained in the follow-
ing sections.
The Semantics and the Three-level Emotions. We conducted k-means cluster analy-
sis among the semantic differentials used in the survey. The analysis was based on the
numbers of comments mentioned by the participants about the different types of user
experience qualities for each semantic differential pair. Each distribution of the com-
ments in relation to the user experience qualities (i.e. UXV, UXC, UXU, and UXRS)
for each semantic differential pair was compared with one another to cluster the se-
mantic values. This method was used to examine if the original semantic-emotional
groupings shown in Table 1 are clustered accordingly in our participant results.
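A hedged sketch of this clustering step is given below, assuming scikit-learn is available. The count matrix is invented (only beautiful-ugly, deep-shallow, and sympathetic-unsympathetic are semantics actually named in this chapter; the fourth row is a placeholder), but the procedure, clustering semantics by their distribution of comments over the four experience-quality codes, follows the description above.

# Sketch of clustering semantic differentials by their comment distributions
# over the four experience qualities (UXV, UXC, UXU, UXRS). Counts are invented.
import numpy as np
from sklearn.cluster import KMeans

semantics = ["beautiful-ugly", "deep-shallow",
             "sympathetic-unsympathetic", "other-semantic (placeholder)"]
# Rows: semantics; columns: number of comments coded UXV, UXC, UXU, UXRS.
counts = np.array([
    [18,  4,  2,  9],   # beautiful-ugly: mostly visceral comments in this sketch
    [ 5,  6,  4, 12],
    [ 2,  8,  5, 14],
    [ 3, 20,  6,  2],
])
# Compare distributions rather than raw volumes: normalise each row.
profiles = counts / counts.sum(axis=1, keepdims=True)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(profiles)
for name, label in zip(semantics, kmeans.labels_):
    print(f"{name}: cluster {label}")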
Table 3 shows the result of this cluster analysis. As we see here, only two seman-
tics (i.e. beautiful-ugly and deep-shallow) are clustered into groups other than those
we first assumed. All the semantics related to the visceral and the behavioral levels
are clustered as we originally assumed.
As we briefly mentioned in the previous section, our original assumption based on
Hassenzahl’s claim of “no visceral beauty” [14] needs to be re-examined according to
this result. From our data, we could clearly see that the participants were discussing
their experiences at the visceral level when reasoning about their selection of the
value related to beauty (the beautiful-ugly semantic). This occurred in their de-
scriptions of colors, material feelings, shapes, sound qualities, etc. There were also
several comments that were related to the reflective level such as “trendy,” “cool,”
Table 3. K-means cluster analysis results for clustering semantics in relation to the data about
user experience qualities. (*clusters that are clustered to a different group from the originally
assumed one).
Fig. 1. The means of the numbers of comments the participants mentioned in terms of the
usefulness-related experience (UXU) according to the three levels of emotion. (Each mean for
each level of emotion is calculated by adding the numbers of UXU comments related to each
level of emotion (following our result shown in Table 3) and dividing it with the number of the
semantics relevant to each level of emotion).
Fig. 2. The numbers of comments the participants mentioned regarding all types of their emo-
tional experience per each type of the interactive product qualities
As shown in Fig. 2, the functional and interaction qualities are the most influential
qualities when shaping people’s emotional experience with interactive products, and
the visual and tactile qualities are also relatively important although they are not as
significant as the functional and interaction qualities.
Fig. 3. The rates of the numbers of comments the participants mentioned related to each seman-
tic differential, associated with each type of the interactive product qualities
and interaction qualities are tightly related to the behavioral-level emotion (Fig. 3(b)).
For the reflective-level chart (Fig. 3(c)), the distribution looks like a combination of
both visceral and behavioral levels as we see the functional, interaction, tactile, and
visual qualities are significant. However, the functional quality was shown to be the
most significant influencer at the reflective level, which was not the case at the other
levels.
Most of the distributions turned out as we expected, except for the distribution of
“sympathetic-unsympathetic”, as we can see in Fig. 3(c). It is interesting since it is the
only semantic that stands out for the performance quality compared to the other se-
mantic differentials. From people’s comments, we could understand that they feel
things are sympathetic when they perform as they expected—e.g. “the device is un-
sympathetic because it is unreliable,” or when they make errors—e.g. “feels sympa-
thetic when the machine makes mistakes.”
other types of mp3 players not just because of its look, but also because of its unique
interaction quality that allows people to browse music with the wheel interface. This
supports the notion that interaction quality and useful functionality are significant
components affecting people’s emotional experience with a given interactive product.
Implications of Understanding Emotional Experience for Interaction Design. Our
study supports the fact that the three levels of emotion are clearly distinguishable in
our data. Furthermore, in spite of some researchers’ doubtful perspectives regarding
designing for emotional experience [7, 13], our results showed that different qualities
of interactive products do in fact have different roles in affecting various levels of
emotion.
This tells us that there are at least three research directions potentially valuable in
terms of helping designers to effectively consider and project outcomes of emotional
experiences toward what they design:
1. Continuous research on developing improved models of emotion and experience
(building on the previous work in HCI such as Norman’s [20], Jordan’s [16], and
McCarthy & Wright’s [19]),
2. Continuous research on the relationship between the models of emotion and a
deeper understanding of interactive product qualities (similar to what is presented
in this article), and
3. Research on developing useful design principles for enhancing positive emotional
experiences.
As we briefly mentioned earlier in this paper, designers must not give up on ac-
commodating or directing emotional experiences since it is apparent that how we
design does in fact influence users’ emotions. Although it may not be possible to fully
control emotional experiences by design, our results show that it is possible to utilize
our results about what types of product qualities are more tightly related to which
types of emotional experiences. The results shown in Fig. 3 especially provide valu-
able knowledge about product qualities that can be manipulated through design in
order to shape emotional experiences according to the semantic differential space. For
future research, we need to continue to examine these relationships in even more
depth and detail to increase the resolution of our findings.
Acknowledgments. We would like to thank all the participants in our studies.
References
1. Apple - iPod, Apple, Inc. (August 31, 2007),
http://www.apple.com/ipod/ipod.html
2. Apple - Mac mini, Apple, Inc. (June 21, 2007), http://www.apple.com/macmini/
3. Bannon, L.: From Human Factors to Human Actors: the Role of Psychology and Human-
computer Interaction Studies in System Design. In: Greenbaum, J., Kyng, M. (eds.) Design at
Work: Cooperative Design of Computer Systems, Erlbaum, pp. 25–44 (1986)
4. Bentley, T., Johnston, L., von Baggo, K.: Evaluation Using Cued-recall Debrief to Elicit In-
formation about a User’s Affective Experiences. In: Proc. of CHISIG 2005. ACM Interna-
tional Conference Proceeding Series, vol. 122, pp. 1–10 (2005)
5. Boehner, K., De Paula, R., Dourish, P., Sengers, P.: Affect: from information to interaction.
In: Proc. of CC 2005, pp. 59–68. ACM Press, New York (2005)
6. Bødker, S.: When second wave HCI meets third wave challenges. In: Proc. of NordiCHI
2006, vol. 189, pp. 1–8. ACM Press, New York (2006)
7. Csikszentmihalyi, M.: Design and Order in Everyday Life. In: Margolin, V., Buchanan, R.
(eds.) The Idea of Design, pp. 118–126. MIT Press, Cambridge (1995)
8. Desmet, P., Hekkert, P.: Framework of Product Experience. International Journal of De-
sign 1(1), 57–66 (2007)
9. Forlizzi, J., Battarbee, K.: Understanding experience in interactive systems. In: Proc. of DIS
2004, pp. 261–268. ACM, New York (2004)
10. Gaver, B., Dunne, T., Pacenti, E.: Design: Cultural probes. Interactions 6(1), 21–29 (1999)
11. Grudin, J.: Is HCI homeless?: in search of inter-disciplinary status. Interactions 13(1), 54–59
(2006)
12. Harrison, S., Tatar, D., Sengers, P.: The Three Paradigms of HCI. In: Proc. of alt.chi CHI
2007. ACM, New York (2007)
13. Hassenzahl, M.: Emotions can be quite ephemeral; we cannot design them. Interac-
tions 11(5), 46–48 (2004)
14. Hassenzahl, M.: Beautiful Objects as an Extension of the Self: A Reply. Human-Computer
Interaction 19(4), 377–386 (2004)
15. Isbister, K., Höök, K., Sharp, M., Laaksolahti, J.: The sensual evaluation instrument: devel-
oping an affective evaluation tool. In: Proc. of CHI 2006. ACM Press, New York (2006)
16. Jordan, P.W.: Designing Pleasurable Products: An Introduction to the New Human Factors.
Taylor and Francis, London (2000)
17. Krippendorff, K.: The Semantic Turn: A New Foundation for Design. CRC Taylor & Fran-
cis, Boca Raton (2005)
18. Mahlke, S.: Understanding users’ experience of interaction. In: Proc. of the 2005 Annual
Conference on European Association of Cognitive Ergonomics. ACM International Confer-
ence Proceeding Series, vol. 132, pp. 251–254. University of Athens (2005)
19. McCarthy, J., Wright, P.: Technology as Experience. MIT Press, Cambridge (2004)
20. Norman, D.A.: Emotional Design: Why We Love (or Hate) Everyday Things. Basic Books,
New York (2005)
21. Norman, D.A.: Words matter. Talk about people: not customers, not consumers, not users. In-
teractions 13(5), 49–63 (2006)
22. Norman, D.A., Ortony, A.: Designers and Users: Two Perspectives on Emotion and Design.
In: Proc. of the Symposium on “Foundations of Interaction Design” at the Interaction Design
Institute, Ivrea, Italy (2003)
23. Palen, L., Bødker, S.: Don’t Get Emotional! In: Peter, C., Beale, R. (eds.) Affect and Emo-
tion in Human-Computer Interaction. LNCS, vol. 4868. Springer, Heidelberg (2008)
24. Preece, J., Rogers, Y., Sharp, H.: Interaction Design: Beyond Human-Computer Interaction.
John Wiley & Sons, Inc., Hoboken (2002)
25. Zhang, P., Li, N.: The importance of affective quality. Commun. ACM 48(9), 105–108
(2005)
How Is It for You? (A Case for Recognising User
Motivation in the Design Process)
Abstract. This research suggests that a novel approach for extending the de-
signer’s understanding of the user can be found in combining psychological and
HCI perspectives. The study examines the users’ emotional response to the de-
sign of software in Human-Computer Interaction (HCI), the influence it has on
their attitudes and expectations of the software, and their motivation to use it.
Reference to contemporary theories of human motivation and emotion in psy-
chology, adapted for application to the user at the interface, provide a new
‘lens’ through which to examine the user, and to elicit new data to inform the
design process. The exploration of underrepresented variables in human-
computer interaction establishes the basis for additional tools and measures that
can be employed by the designer to assess the success, or otherwise, of the user
interface. Findings indicate that the opportunity to manage and capitalise on
user interaction in the design process is under-recognised.
1 Introduction
It is likely that it will never be possible to ‘know enough’ about humans and the way
that we behave in different environments and under different conditions, and this is
particularly true of Human-Computer Interaction (HCI). The immense complexity of
human interaction challenges all those that work in this area.
Software design over the last 20 years has led to the predominance of engineering
principles for usability and interface design, where the overall objective lies in opti-
mising user performance and satisfaction by developing usable interfaces. Applied
knowledge of human information processing has enabled designers of interfaces to
exploit the strengths and support the weaknesses of human cognition, and thereby
tailor the design. As new knowledge emerges on the human response, designers need
to recognise and evaluate the implications for software interface design.
This paper seeks to focus on the user response (as a manifestation of the mental
processes of perception, cognition and emotion) which has implications for software
usability through the interactive process between humans and computers. From a
design perspective there appears to be a lack of empirical research investigating the
determinants of important aspects of behaviour such as emotion and motivation, and
2 Hypothesis
The authors believe that human behaviour is under-represented in the software design
process and this important variable in the interaction process has consequences for
interface design and usability. As with all visual communication designed to influence
the viewer, the interface can embody similar techniques for persuading the user, and
go much further as an interactive event with the potential to exploit behaviour patterns
stimulated through perception. The authors hypothesise that the design of the inter-
face and user experience can influence the users’ behaviour and that the use of an
iterative design process presents an opportunity that should be recognised and ex-
ploited. With an expanding user base, in terms of the level of experience and knowl-
edge of many applications, together with the increasing popularity of web-based
applications and requirements for accessibility, a focus on users’ needs presents new
opportunities for the designer. Following an extensive series of user-centred tests
using recent definitions from psychology and applying these principles to HCI it is
suggested that motivation, and its energising component emotion, plays a critical role
in the user’s perceptive, cognitive and evaluative processes, and influences choice
behaviour. A predictive explanatory framework is envisaged within which motivation
can be analysed, measured and evaluated for design implications. It will be possible to
examine why users are motivated and de-motivated at the interface, and measure
relative changes in these states. Such a framework could be used for troubleshooting
and prediction in terms of the ability of the software interface to support the user.
In an attempt to bring the user and computer system closer together Picard [5]
describes research that capitalises on the computer’s strengths to create more mean-
ingful interaction by enabling computers to recognise human emotion, and respond
appropriately, through the use of emotional intelligence. Picard warns that too little
emotion in the interaction process may be detrimental to decision-making, and she
makes a comparison with patients who suffer from a lack of emotions.
This approach is supported by research at MIT [6, 7], which revealed that despite
knowing the computer has particular limitations, people often respond to computers
as they would to other people,
“An analogy with human-computer interaction also holds, where people think of
computers as annoying and stupid, or as intimidating, and of themselves as dum-
mies…we all know better than to treat computers like humans, and yet many of our
default behaviors and responses tend in this direction.” [5]
From the user’s point of view, assumptions (which are a typically human characteris-
tic) can lead to problems in the interaction process, for example, novice users are par-
ticularly susceptible to the notion that the computer must possess human-like qualities,
which can open ‘gaps’ in the conceptual model the user has of the system. The affor-
dances of the system [8], how the user perceives they can perform the task, can differ
from how they actually need to proceed to complete the task successfully, influencing
perception and behavioural response. This ‘gap’ can actually polarise different user
groups; more experienced users can apply ‘typical’ computer constraints to their con-
ceptual model, and are possibly unprepared for a more flexible, intuitive interface.
An important ingredient for a successful interface design is a well-researched de-
sign model; the design model should embody the user’s task requirements, back-
ground, experience and capabilities in terms of handling information and complexity.
The ‘success’ of the interaction process dictates the usability of the system. A simple,
effective design enables the user to have a comprehensible task domain model of what
the system does, and frees up the user’s cognitive capacity, enabling the user to
concentrate on the task in hand. This is associated with “Cognitive Load Theory”
(CLT) [9] which suggests that human cognitive architecture operates by a combina-
tion of working and long term memory and that
“Working memory, in which all conscious cognitive processing occurs, can handle
only a very limited number—possibly no more than two or three—of novel inter-
acting elements.” [10]
As designers, our approach to understanding human behaviour and the interaction
process must recognise the evolutionary development of the user’s knowledge and
experience when describing the requirements of the interface for software design. The
interface provides more than just sequence control: it should clearly represent the
user’s model of the task, and should be able to reflect and facilitate complex behaviour
patterns of the user. As Hooper [11] illustrates:
“The interface is the stage on which our interactions with computer systems are
shown, but it is the nature of play that is of relevance, not just the physical actions
of the players.”
successful the interface is. These gulfs are bridged by addressing issues from either
direction, the computer or the user [16]. The system designer can bridge such ele-
ments by creating interaction mechanisms that better match the psychological needs
of the user, evident from the task model. In an ideal case no effort is required to
bridge these elements; an invisible interface and a foolproof design model are required
to achieve this, emphasising the importance of the relationship between the user’s
model and the design model for successful interaction. Users will have different user
models and it is likely that the design model specified is based on what is considered
a typical user for the application. A careful balance is required when the design is
specified, highlighted by Norman [15]. Automation must concentrate on the tasks that
the user need not get involved in (bridging the gulf). The system design should em-
power the user, as too much reliance on the user to execute tasks that could be auto-
mated dilutes the user’s cognitive focus on the task. As a result the user may feel the
system is unworkable and can feel de-motivated and alienated from it.
A utopian ‘invisible’ interface would simply be a means by which someone entered
the pursuit of an activity, and if such an ideal existed, there might be no justification for
research into HCI. The complexity of the human response makes an ‘invisible’ inter-
face a difficult goal, but it is possible to come close to this ultimate tool through im-
proved recognition of the user’s requirements and needs [17]. A simple illustration
might be the design of a pen. If a manufacturer came up with a design that for some
people was too narrow to grip, they would have misinterpreted the user model for a
pen and how the user likes to write. However, by producing a wide variety of differ-
ent pens, shapes, sizes, colours, the designers are providing for a range of user mod-
els, as well as catering for emotional appeal. The user's model of the system can be described, in part, by their responses during use, which can be observed and recorded. Information revealed about the user in this way can be limited, as not all knowledge is externalised in a form that is traditionally collected by commonly used techniques; other methodologies derived from an interdisciplinary approach are necessary to fill these 'gaps' in the user model. User motivation is a manifestation of the complexity of the human response and is fundamental to the success of the interaction process. Recognition of the role of identified stimuli of motivation can contribute to an assessment of the interface and be one of the indicators of how effectively the design has addressed user-centred needs and requirements.
Motivation is a response to the interaction process, illustrated in Figure 1, that leads to a self-propagating positive cycle: increased motivation leads to improved interaction, which in turn can lead to increased motivation. There is a notable relationship to the model proposed by Porat and Tractinsky [18].
Nielsen and Pullin [19] describe the potential impact of technology on human
behaviour,
“The right technology integration can instil magic, the opposite causes disappoint-
ment, frustration and confusion.”
The ability to recognise and use techniques that enhance user motivation at the software interface is important for all types of user. For example, developing interfaces that stimulate users' motivation may be instrumental in attracting and sustaining the interest of a younger audience, as they may be more prone to motivational stimuli when assessing whether to continue or cease an activity [17]. Highly motivated
users may be more likely to respond interactively with the software and thus gain a
higher level of involvement and therefore understanding; the system is thus more
effective at facilitating the user’s task. A motivated user is more likely to overcome
obstacles and any initial lack of understanding, and take a computer-based task
through to completion. The enhanced energy and focus it provides the user could be
viewed as motivational insurance.
Though individual users will naturally differ in their requirements and needs, extending the data collected with reference to specific psychological models of motivation will advance insights into the relationship between design and the user's
experience. This new information for designers will help to re-prioritise design con-
siderations and, most importantly, may reveal previously undetected problems relat-
ing to usability that are inaccessible to traditional user satisfaction techniques. Recent
research has explored the application of certain psychological theories to software
design [20], even creating systems that recognise human emotion. This work is nas-
cent, and there is only scant research on improving our understanding of the user in
the design process. Designers require a dual approach, both capitalising on what mod-
ern computers can offer in terms of processing power, speed, detailed high definition
graphics, new operating systems and connectivity, and most importantly developing
user-led design research to determine what qualities of the human user can (and need
to) be exploited (through stimulation to create a specific response) and supported to
deliver a more effective and consequently “affective” experience.
The designer’s task is to define through design the interaction process between user
and computer. Norman [16] developed a simple model describing user interaction as a
linear sequence of events. This type of simplification delivers a sense of order to the
unpredictability of human interaction. Experiencing real life practice enables innova-
tive, useful design; the gathering of field data by testing the system with the user is
an essential part of checking and re-modelling the mechanisms to cater for different
user task models. It can highlight any ‘gaps’ between the user’s model of the system
and the design model. The designer will want to note whether the user response is
characteristic or uncharacteristic when considering the user group. The nature of the
positive and negative responses must be elicited to determine how the user’s experi-
ence is enhanced or degraded. The designer should attempt to develop a designed
response to these observations enabling the user to feel they can relate to the interface.
There are opportunities to consider the users’ individual techniques within the design,
and the design model must be flexible enough to cater for those whose behaviour and
approach may be different from the expected “norm”. This highlights the need to
correctly interpret information collected in user tests, assisting the production of an
informed design model that ensures the key issues in human computer interaction are
addressed, moving us closer towards the designer’s goal of an ‘invisible’ interface.
A usable system is, in many cases, required to extend beyond a simple functional interface; it must incorporate other opportunities to support users' needs, for example the human need to be creative, or other personal goals. Usability will suffer if the user is searching for opportunities that don't exist. Moran [21] proposes that computers are over-formal, formalising our behaviour and constraining our thinking in ways that are not necessarily productive, partly because the way users interact has, in many cases, already been predetermined by the designer. Moran believes that computers need to be as informal as pencil and paper:
“There seem to be natural structures for certain kinds of activities. But a lot of activi-
ties aren’t easy to structure, particularly creative activities, where I think computers
haven’t helped us nearly as much as I would like to see. I would like computers to
aid and liberate thinking, not constrain it.”
The overall style and approach of the design concept can have a dramatic impact
on users’ behaviour and interaction. An example of this might be the difference in
attitude of many Apple Macintosh users in comparison with IBM PC compatible and
early Microsoft Windows users. Turkle [22] describes the evolving two cultures of
personal computing that had developed by the late 1980s,
“IBM reductionism vs. Macintosh simulation and surface: an icon of the modernist
technological utopia vs. an icon of post-modern reverie. For years, avid loyalists on
both sides fought private and not-so-private battles over which vision of computing
was “best”.”
Methods of bringing the computer closer to the user derive from opportunities for
involving the end user through the design process. With specific task models in mind, it
is likely the designer will further explore designing an interface that uses metaphors to
represent function, thus developing the perceived affordances for the user. The designer employs the usual techniques for measuring usability, e.g. heuristics; these and other measures of usability provide information on the user's understanding of and ability to use the system. The opportunity exists to influence the attitude and behaviour of the user towards
the system, thus improving its usability. The designer needs to be involved in creative
experimentation, replacing existing metaphors and exploring more appropriate represen-
tations for the user. Developing usability in a system design can be understood in terms
of four components identified by Bennett [23] and later operationalised by Shackel [24]
so that they could be tested. These are shown in Figure 2.
Motivational Systems Theory (MST) offers the potential to enable the designer to attribute design features to components of motivated behaviour in the user. MST integrates the conceptual frameworks of 32 motivational theories around its core concepts. It was also found to compare well with models and theories already used to describe user interaction in HCI, for example the Technology Acceptance Model (TAM) [2].
Fig. 3. Webomatic (A basic web site builder) was used for the tests
A user test was devised, and twelve users, selected for their roles within SMEs (each having some involvement in web site design, content, construction or management), were introduced to Webomatic. All users were given a demonstration of how to use the software before a timed test. Users completed a questionnaire that asked a series of questions about their experience and linked design issues with motivational theory. The
users were allowed thirty minutes to create a basic five page web site (this was easily
achievable). During the test, the users were given keyword tables (emotions) with an
intensity scale to record how they felt during the test. They were also asked to think
out loud as the tests were recorded. Activity on-screen and a camera recording facial
expression assisted in the interpretation and analysis of data collected via question-
naires. Reflective interviews were used to elicit further information about the user's
experience, and in particular for comments on key events.
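To make the data-capture approach concrete, the following is a minimal sketch, not the authors' actual instrument, of how keyword-plus-intensity records from such a timed test might be structured for later analysis alongside the reflective-interview notes; all class and field names are illustrative assumptions.

```python
# Minimal sketch of a data structure for the kind of records described above:
# emotion keywords with an intensity rating, time-stamped against on-screen
# activity, plus reflective-interview notes. Field names are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EmotionRecord:
    minute: int           # elapsed time within the 30-minute test
    keyword: str          # emotion keyword chosen from the supplied table
    intensity: int        # self-rated intensity, e.g. on a 1-5 scale
    on_screen_event: str  # what the participant was doing at that moment

@dataclass
class ParticipantSession:
    participant_id: str
    records: List[EmotionRecord] = field(default_factory=list)
    interview_notes: List[str] = field(default_factory=list)

    def peak_moments(self, threshold: int = 4) -> List[EmotionRecord]:
        """Return the key events worth probing in the reflective interview."""
        return [r for r in self.records if r.intensity >= threshold]

# Example usage with hypothetical data:
session = ParticipantSession("P03")
session.records.append(EmotionRecord(12, "frustration", 4, "editing page layout"))
session.records.append(EmotionRecord(25, "satisfaction", 5, "previewing finished site"))
print([r.keyword for r in session.peak_moments()])
```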
The descriptive terminology (for example, personal goals) used in MST required
modification to improve semantic relevance to the HCI context. The application of the
theory through the user tests provided a new 'lens' through which to view the users' experience, and this proved to be an effective tool during the user tests for gauging their motivation.
References
1. Picard, R.W., Klein, J.: Computers that Recognise and Respond to User Emotion: Theoreti-
cal and Practical Implications. Interacting with Computers 14(2), 141–169 (2002)
2. Davis, F.D.: Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly 13(3), 319–340 (1989); Venkatesh, V.: Determinants of Perceived Ease of Use: Integrating Control, Intrinsic Motivation, and Emotion into the Technology Acceptance Model. Information Systems Research 11(4), 342–365 (2000)
27. Tractinsky, N., et al.: A Few Notes on the Study of Beauty in HCI. Human-Computer Inter-
action 19, 351–357 (2004)
28. Norman, D.A.: Emotional Design: Why we Love (or Hate) Everyday Things. Basic Books,
New York (2004)
29. Norman, D.A.: Emotionally Centred Design. Interactions, 53–71 (May, June 2006)
30. Hassenzahl, M.: The Interplay of Beauty, Goodness, and Usability in Interactive Products.
Human-Computer Interaction 19, 319–349 (2004)
31. Sklar, A., Gilmore, D.: Are you Positive? Interactions, 28–33 (May, June 2004)
32. Lindgaard, G., et al.: Attention Web Designers: You have 50 Milliseconds to make a Good
First Impression! Behaviour & Information Technology. 25(2), 115–126 (2006)
33. Peter, C., Herbon, A.: Emotion Representation and Physiology Assignments in Digital Sys-
tems. Interacting with Computers 18, 139–170 (2006)
34. Partala, T., et al.: Real-Time Estimation of Emotional Experiences from Facial Expressions.
Interacting with Computers 18, 208–226 (2006)
35. Light, A.: Adding Method to Meaning: A Technique for Exploring People’s Experience with
Technology. Behaviour & Information Technology 25(2), 175–187 (2006)
36. Axelrod, L., Hone, K.S.: Affectemes and Allaffects: A Novel Approach to Coding User
Emotional Expression During Interactive Experiences. Behaviour and Information Technol-
ogy 25(2), 159–173 (2006)
Affect as a Mediator between Web-Store Design and
Consumers’ Attitudes toward the Store
1 Introduction
The introduction of e-commerce “is the most wide-ranging and significant area of
current development in marketing” [1]. These developments increase the importance
of how potential consumers view retail stores and how design features of the Web
stores influence consumers' perceptions. The importance of Web-store design has generated considerable business interest in HCI concepts, practice, and design guidelines. The realm of store design can no longer rely on traditional marketing alone, since developing for the Web employs major aspects of information technology and HCI. One of the most intriguing challenges in the marriage of the marketing and
the HCI disciplines is the apparent tension that exists in design that has to accommo-
date traditional marketing principles on the one hand and HCI goals and criteria on
the other hand.
Thus, for example, the field of marketing has been intensively involved in attempts
to influence consumers’ emotions through advertisements, and product and store de-
sign (e.g., [2], [3], [4]). Efficient and accurate information processing, navigation, and
task execution by customers are not of major concern here. In fact, some marketing
techniques attempt to make the information processing or the shopping process even
less efficient for various reasons (e.g., [5], [6], [7], [8]).
The field of HCI, on the other hand, has traditionally been dedicated to the study
and the practice of ease of use, and has emphasized accurate and fast task execution
as its main success criteria. Until recently, the field has refrained from dealing with
the affective aspects of the interaction [9], [10]. Thus, the coupling of these contrast-
ing disciplines in a new business model is challenging for both research and practice.
Currently, research on the merger of marketing and HCI is scarce [1], [11]. Studies
and design recommendations have mainly concentrated on the usability aspects of the
Web store (cf. [12]), but not on the emotional aspects of the interaction. By and large, studies of e-retail environments have not attempted to generate robust conceptual frameworks. Thus, there are only tentative explanations regarding how Web site design affects consumers' emotions, beliefs, attitudes and behavior vis-à-vis a particular vendor.
In this paper, we intend to help bridge the gaps between traditional and online re-
tail research. For this purpose we integrate theoretical concepts and frameworks
from both the field of Marketing and the field of HCI. The main objective of this paper is to propose a theoretical framework that puts consumer emotions at center stage. The framework is based on the environmental psychology model of Mehrabian
and Russell [13], which has been used to describe the atmospheric qualities of various
environments and their effects on people’s emotions and behavior.
2 Background
Marketing research has long studied the effects of store atmospherics. Recently, studies have begun to explore various aspects of the affective
qualities of e-retail environments (e.g., [37], [38], [39]). Richard [40] examined the
impact of Internet atmospheric cues on surfer behavior. One of the key findings
was that entertainment cues (escapism, diversion, aesthetic enjoyment and emotional
release) affected site involvement and site attitudes. Fiore et al. [41] tested linkages
between emotion, hedonic value, consumer characteristics, and responses toward
an online store. Their results revealed positive relations between hedonic value (derived from an image-interactivity feature of an apparel Web site) and emotional pleasure and arousal. Both arousal and pleasure enhanced willingness to patronize the online store.
Mummalaneni [42] used the environmental psychology model [13] to demonstrate the
importance of enhancing Web site navigation through the purchasing stages.
Retailers of electronic stores rely on their Web sites to attract customers, present prod-
ucts, provide services and complete transactions. Over the last decade, a large number
of studies evaluated e-commerce Web sites and their impact on customers’ attitudes,
shopping intentions, and satisfaction. In line with the traditional HCI approach, most of
these studies have taken a cognitive perspective, emphasizing the Web-stores’ function-
ality and usability (e.g., [43], [44], [12], [45]). Users might consider other criteria, though, when shopping on the Web. For example, empirical evidence suggests that aesthetics is an important determinant of Web site preference [46], [47] and of user satisfaction [48]. Consequently, recent studies have concentrated on understanding
the relationships of Web site design with affective reactions [37], [38], [49], aesthetic
perceptions [50], customers’ beliefs [51], and site success [52].
3 Proposed Model
We propose a model that builds on the relation between perceived design qualities of
the Web-store (i.e., e-store atmospherics), the emotions induced by those qualities,
and consumer attitudes. The framework builds on two research streams: (1) an environmental psychology model of affective states [13], [53], which suggests that the affective reaction to environments influences diverse behaviors [54]; and (2) HCI research on Web design, in particular usability, which has been the focus of research on Web site design [44], and the aesthetic qualities of Web sites, a growing area of interest among HCI researchers [50].
The basic environmental psychology model of Mehrabian and Russell (M-R) is de-
picted in Figure 1. It posits that all stimuli are perceived in terms of their affective
qualities. Those emotional states can be reduced into a set of basic emotions or into
underlying dimensions of emotions [55]. Mehrabian and Russell [13] proposed three
such bipolar dimensions, abbreviated PAD: (1) Pleasure, referring to the degree to
which a person feels happy or satisfied in a place; (2) Arousal, concerning the degree
of stimulation caused by an atmosphere; and (3) Dominance, indicating the degree to
which a person feels that he or she has influence over his/her surroundings and is in
control of a situation. Subsequently, the person's emotional state influences her attitudes towards the environment, framed as an "approach-avoidance" response.
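Purely as an illustration (the M-R model itself prescribes no implementation), the three PAD dimensions can be captured in a small data structure of bipolar scores; the value range, class name and helper method below are assumptions.

```python
# Minimal sketch of a PAD (pleasure-arousal-dominance) emotional state,
# following the three bipolar dimensions described above. The [-1.0, 1.0]
# range and the helper method are illustrative assumptions, not a standard.
from dataclasses import dataclass

@dataclass
class PADState:
    pleasure: float   # displeased (-1.0) ... pleased (+1.0)
    arousal: float    # calm (-1.0) ... stimulated (+1.0)
    dominance: float  # controlled (-1.0) ... in control (+1.0)

    def __post_init__(self) -> None:
        for name in ("pleasure", "arousal", "dominance"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [-1.0, 1.0], got {value}")

    def approach_tendency(self) -> float:
        """Crude illustrative proxy: pleasant, in-control states lean towards approach."""
        return (self.pleasure + self.dominance) / 2

# Example: a shopper who is pleased, mildly aroused and feels in control.
state = PADState(pleasure=0.7, arousal=0.3, dominance=0.5)
print(f"approach tendency ~ {state.approach_tendency():.2f}")
```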
This model and its derivatives have been used extensively in the study of physical
retail environments [18], [56], [34], [57], [58], [29], [59]. However, only a few em-
pirical studies (e.g. [60], [61], [41], [42]) have (partially) employed this model in a
virtual shopping context. These studies have demonstrated the applicability of the
model to the online environment in some concrete contexts. Yet, the relations be-
tween design attributes of online stores and consumers’ affective states remain largely
unexplored. The proposed model augments the above studies by offering a systematic
and theory-driven treatment of HCI design variables that affect store atmospherics
and emotions. In particular, it introduces perceived aesthetics as a major aspect of
online retail environments. In addition, the model suggests that given the importance
of usability in Web-store design all three aspects of the PAD model should be re-
tained, as opposed to the tendency in past research to ignore the dominance dimension
(see more detail in Section 3.2.2).
The model proposed in this study suggests that perceptions of the e-retail environment
induce certain emotional states in the consumer, which in turn affect his/her attitudes
toward the store (see Figure 2). In this model, perceptions of the environment (i.e., the
internet store) are represented by perceptions of the two central design concepts: us-
ability and aesthetics. The induced emotional states are based on the environmental
psychology model of Mehrabian and Russell, and the approach-avoidance element of
the M-R model is reflected in the users’ attitudes towards the store.
Fig. 2. The Proposed Model. Arrows indicate relations between constructs. The corresponding
proposition numbers are indicated along the paths.
Usability has been the flagship concept of HCI for the last two decades [63]. It strives
to focus the design and evaluation activities around users’ ability to accomplish their
tasks efficiently. It plays a central role in design recommendations for e-retail [44], [12].
While the usability of systems can be captured both objectively and subjectively (i.e., by
self-reports), the latter method appears more viable in practical settings. As such, usability can be viewed as the equivalent of the ease-of-use construct in the technology acceptance model (TAM, cf. [64]) – a subjective measure that reflects users' perceptions of the degree to which the system facilitates easy interaction.
Recent HCI research, however, suggests that in addition to usability, aesthetics also serves a major role in affecting users' perceptions of Web sites. This argument goes
hand in hand with a research tradition that regards aesthetics as a key factor in how
people perceive their environment (cf. [10]). Lavie and Tractinsky [50] suggested that
the aesthetic evaluation of Web pages can be divided into two dimensions: classical
aesthetics and expressive aesthetics. Classical aesthetics represents the degree to
which the design is clean and balanced. Expressive aesthetics relates to the creativity
and innovativeness of the design. The advantage of this componential view is that it
allows researchers to tease out finer details about how aesthetics relate to users’ per-
ceptions of other attributes of the Web site; how Web site design relates to users’
emotions; and how it influences their attitudes towards the site. Thus, it was found
that classical aesthetics is strongly correlated with evaluations of usability; that both
aesthetic dimensions contributed to user satisfaction [50]; and that expressive aesthet-
ics is a better determinant of immediate attractiveness impressions of Web sites [65].
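As a small, purely illustrative sketch, and not the instrument of Lavie and Tractinsky [50], the two aesthetics dimensions could be scored as sub-scale means over item ratings and related to perceived usability, echoing the reported correlation between classical aesthetics and usability; the item names and data are hypothetical.

```python
# Illustrative scoring of the two aesthetics dimensions as sub-scale means,
# plus a Pearson correlation with a perceived-usability score. Item names
# and ratings are hypothetical.
from statistics import mean, correlation  # correlation requires Python 3.10+

# Hypothetical 1-7 ratings for five participants on a few items per dimension.
classical_items = {"clean": [6, 5, 7, 4, 6], "symmetric": [5, 5, 6, 4, 6]}
expressive_items = {"creative": [3, 6, 5, 2, 4], "original": [4, 6, 5, 3, 4]}
usability_score = [6, 5, 7, 4, 6]  # hypothetical perceived-usability ratings

def subscale(items: dict) -> list:
    """Per-participant mean across the items of one dimension."""
    participants = zip(*items.values())
    return [mean(ratings) for ratings in participants]

classical = subscale(classical_items)
expressive = subscale(expressive_items)

print("classical aesthetics:", classical)
print("expressive aesthetics:", expressive)
print(f"r(classical, usability)  = {correlation(classical, usability_score):.2f}")
print(f"r(expressive, usability) = {correlation(expressive, usability_score):.2f}")
```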
Due to the documented role of emotions in consumer behavior, one of the important consequences of aesthetics in design is its effect on our emotions (e.g., [66], [67], [68]). There are several reasons for this effect, but perhaps the most obvious is that first aesthetic impressions are formed immediately (e.g., [69], [70], [68], [71], [65]). Those first
impressions may linger and color subsequent evaluations of objects. Thus, to a large
extent, aesthetics sets the tone for the rest of the interaction.
In the Web-store context, an approach response would mean, for example, staying in the store site for a longer period of time, greater willingness to buy from the store, and a better chance of actual purchase and loyalty. An avoidance response would mean just
the opposite. Operationally, the literature refers to attitudes as a surrogate to ap-
proach/avoidance reactions (e.g., [83], [40]).
3.2.4 Propositions
Our model proposes a set of relationships between the model variables. These rela-
tionships are depicted in Figure 2 and are further described below. The results of a
small-scale study [84] provided considerable support for the model.
Almost by definition, aesthetics entail pleasure (cf. The American Heritage Dic-
tionary of the English Language). Indeed, [50] found strong positive correlations be-
tween both aesthetic dimensions and pleasure. Based on this, we propose that:
P1: Web-stores perceived as having higher levels of classical aesthetics induce higher
levels of pleasure.
P2: Web-stores perceived as having higher levels of expressive aesthetics induce higher
levels of pleasure.
Expressive aesthetics emphasize novel and creative designs [50], which are ex-
pected to correlate positively with arousal (e.g., [69]). Classical aesthetics, on the
other hand, adheres to familiar and accepted notions of design, and is expected to
have a calming effect on the senses. Hence, not only is it not expected to increase
arousal - it might even reduce it.
P3: Web-stores perceived as having higher levels of expressive aesthetics induce higher
levels of arousal.
P4: Web-stores perceived as having higher levels of classical aesthetics induce lower
levels of arousal.
The HCI literature emphasizes the importance of allowing the user to be in control
of the technological environment (e.g., [80], [81]). Users’ perceptions of control over
the interaction and of the likelihood of achieving their goals influence their affective
states [85]. In addition, smoother interactions facilitated by better usability are likely
to increase pleasure, whereas lower levels of usability increase frustration [85] and
thus reduce pleasure. Hence, we hypothesize that:
P5: Web-stores perceived as being more usable induce higher levels of perceived
dominance.
P6: Web-stores perceived as more usable induce higher levels of pleasure.
Previous studies have found that perceptions of systems’ usability and aesthetics
might be related [86], [48], [87]. This is especially the case concerning the relations
between usability and classical aesthetics [50]. Thus:
P7: Perceptions of Web-store’s classical aesthetics and perceived usability will be
positively correlated.
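Purely as an illustration of how the stated paths might be organized for empirical testing, and not part of the authors' method, propositions P1–P7 can be encoded as signed, directed relations between the model's constructs; the construct labels and sign convention are assumptions.

```python
# Illustrative encoding of propositions P1-P7 as signed, directed relations
# between the model constructs (design perceptions -> emotional states, plus
# one correlational link). "+1" marks a positive relation, "-1" a negative one.
from typing import NamedTuple, List

class Proposition(NamedTuple):
    pid: str
    source: str
    target: str
    sign: int   # expected direction of the relation
    kind: str   # "causal path" or "correlation"

PROPOSITIONS: List[Proposition] = [
    Proposition("P1", "classical aesthetics",  "pleasure",  +1, "causal path"),
    Proposition("P2", "expressive aesthetics", "pleasure",  +1, "causal path"),
    Proposition("P3", "expressive aesthetics", "arousal",   +1, "causal path"),
    Proposition("P4", "classical aesthetics",  "arousal",   -1, "causal path"),
    Proposition("P5", "usability",             "dominance", +1, "causal path"),
    Proposition("P6", "usability",             "pleasure",  +1, "causal path"),
    Proposition("P7", "classical aesthetics",  "usability", +1, "correlation"),
]

# Example: list everything the model expects to raise pleasure.
pleasure_antecedents = [p.source for p in PROPOSITIONS
                        if p.target == "pleasure" and p.sign > 0]
print("Antecedents of pleasure:", pleasure_antecedents)
```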
Consumers who are more pleased and who feel more in control of their environment are likely to exhibit greater approach, rather than avoidance, tendencies. For example, the findings of [88] show that perceived control and shopping enjoyment can increase the intention of new Web customers to return. The relations between arousal
4 Summary
Donovan et al. [34] suggest that one of the most important directions in store envi-
ronment research is to determine what constitutes a pleasant environment and how to
implement such an environment. The retail world has changed considerably since
then. HCI became a central discipline in reshaping much of the retail activity and in
determining the affective consequences of store design. To account for these changes
we have proposed a research model that focuses on the role of emotions in online
shopping. The model suggests nine testable propositions regarding the antecedents and
the consequences of emotions in the specific context of Web-based retail. The model argues that e-store atmospherics, described in terms of three design attributes – the site's usability, classical aesthetics and expressive aesthetics – influences users' emotional states. These states, in turn, influence the approach/avoidance response towards the
site. The model has theoretical and practical implications. Theoretically, it contributes to
our understanding of virtual environments in general and in the e–retail domain in par-
ticular, and of the associations between design, emotion, and attitudes. Practically, it can
help e-retailers and Web site designers in general to focus their efforts on concrete de-
sign objectives based on better understanding of the relations between design attributes
and consumer behavior.
References
1. Barwise, P., Elberse, A., Hammond, K.: Marketing and the Internet. In: Weitz, B., Wensley,
R. (eds.) Handbook of Marketing. Sage Publishing, London (2002)
2. Bloch, P.H.: Seeking the ideal form: product design and consumer response. Journal of Mar-
keting 59, 16–29 (1995)
3. Kotler, P., Rath, A.G.: Design a powerful but neglected strategic tool. Journal of Business
Strategy 5, 16–21 (1984)
4. Whitney, D.E.: Manufacturing by design. In: Harvard Business Review, July–August, pp.
83–91 (1988)
5. Russo, J.E.: The Value of Unit Price Information. Journal of Marketing Research 14, 193–201
(1977)
6. Levy, M., Weitz, B.A.: Retailing Management, 3rd edn. Irwin/McGraw-Hill, Boston (1998)
150 T. Porat and N. Tractinsky
7. Hoyer, W.D., MacInnis, D.J.: Consumer Behavior, 2nd edn. Houghton Mifflin, Boston (2001)
8. Schroeder, J.E.: Visual Consumption. Routledge, London (2002)
9. Norman, D.A.: Emotion and design: attractive things work better. Interactions, 36–42 (July–
August 2002)
10. Tractinsky, N.: Towards the Study of Aesthetics in Information Technology. In: Proceedings
of the 25th Annual International Conference on Information Systems (ICIS), Washington,
DC, December 12–15, pp. 771–780 (2004)
11. Vergo, J., Noronha, S., Kramer, J., Lechner, J., Cofino, T.: E-Commerce Interface Design. In:
Jacko, J.A., Sears, A. (eds.) The Human-Computer Interaction Handbook, Lawrence Erlbaum
Associates, New Jersey (2003)
12. Nah, F., Davis, S.A.: HCI Research Issues in Electronic Commerce. Journal of Electronic
Commerce Research 3(3), 98–113 (2002)
13. Mehrabian, A., Russell, J.A.: An approach to Environmental Psychology. MIT Press, Cam-
bridge (1974)
14. Bagozzi, R.P., Gopinath, M., Nyer, P.U.: The Role of Emotions in Marketing. Journal of the
Academy of Marketing Science 27(2), 184–206 (1999)
15. Isen, A.M.: An influence of positive affect on decision making in complex situations: theo-
retical issues with practical implications. Journal of Consumer Psychology 11(2), 75–85
(2001)
16. Martineau, P.: The personality of the retail store. Harvard Business Review 36, 47–55 (1958)
17. Kotler, P.: Atmosphere as a marketing tool. Journal of Retailing 49(4), 48–64 (1973-1974)
18. Donovan, R.J., Rossiter, J.R.: Store atmosphere: An experimental psychology approach.
Journal of Retailing 58, 34–57 (1982)
19. Nagamachi, M.: Kansei engineering as a powerful consumer-oriented technology for product
development. Applied Ergonomics 33(3), 289–294 (2002)
20. Michon, R., Chebat, J.-C., Turley, L.W.: Mall Atmospherics: the Interaction Effects of the
Mall Environment on Shopping Behavior. Journal of Business Research 58, 576–583 (2005)
21. Guiry, M., Magi, A.W., Lutz, R.J.: Defining and measuring recreational shopper identity.
Journal of the Academy of Marketing Science 34, 74–83 (2006)
22. Baker, J., Grewal, D., Levy, M.: An Experimental Approach to Making Retail Store Envi-
ronmental Decisions. Journal of Retailing 68, 445–460 (1992)
23. Bitner, M.J.: Servicescapes: The impact of the physical surroundings on customers and
employees. Journal of Marketing 56, 57–71 (1992)
24. Darden, W.R., Babin, B.J.: Exploring the concept of affective quality: Expanding the concept
of retail personality. Journal of Business Research 29, 101–109 (1994)
25. Spangenberg, E.R., Grohmann, B., Sprott, D.E.: It’s beginning to smell (and sound) a lot like
Christmas: the interactive effects of ambient scent and music in a retail setting. Journal of
Business Research 58, 1583–1589 (2005)
26. Gardner, M.P.: Does Attitude Toward the Ad Affect Brand Attitude Under a Brand Evalua-
tion Set? Journal of Marketing Research 22, 192–198 (1985)
27. Gardner, M.P., Hill, R.: The Buying Process: Effects Of and On Consumer Mood States. In:
Wallendorf, M., Anderson, P. (eds.) Advances in Consumer Research, vol. 14, pp. 408–410.
Association for Consumer Research, Ann Arbor (1987)
28. Grewal, D., Baker, J.: Do retail store environmental factors affect consumers’ price acceptabil-
ity? An empirical examination. International Journal of Research in Marketing 11, 107–115
(1994)
29. Babin, B.J., Attaway, J.S.: Atmospheric Affect as a Tool for Creating Value and Gaining
Share of Customer. Journal of Business Research 49, 91–99 (2000)
Affect as a Mediator between Web-Store Design and Consumers’ Attitudes 151
30. Dawson, S., Bloch, P.H., Ridgway, N.M.: Shopping Motives, Emotional States and Retail
Outcomes. Journal of Retailing 60, 408–427 (1990)
31. Swinyard, W.R.: The Effects of Mood, Involvement, and Quality of Store Experience on
Shopping Intentions. Journal of Consumer Research 20, 271–280 (1993)
32. Yoo, C., Jonghee, P., MacInnis, D.J.: Effects of Store Characteristics and In-Store Emotional
Experiences on Store Attitude. Journal of Business Research 42, 253–263 (1998)
33. Kluger, A.N., Rafaeli, A.: Affective Reactions to Physical Appearance. In: Ashkanasy, N.,
Hartel, C.E.J., Zerbe, W.J. (eds.) Emotions and organizational life, Greenwood Publishing
Group, Westport (2000)
34. Donovan, R.J., Rossiter, J.R., Marcoolyn, G., Nesdale, A.: Store Atmosphere and Purchas-
ing Behavior. Journal of Retailing 70(3), 283–294 (1994)
35. Levin, A.M., Levin, I.R., Heath, C.E.: Product Category Dependent Consumer Preferences
for Online and Offline Shopping Features and Their Influence on Multi-Channel Retail
Alliances. Journal of Electronic Commerce Research 4(3), 85–93 (2003)
36. Tractinsky, N., Rao, V.S.: Social Dimensions of Internet Shopping: Theory-Based Argu-
ments for Web-Store Design. Human Systems Management 20, 105–121 (2001)
37. Kim, J., Yoo, B.: Toward the optimal link structure of the cyber shopping mall. International
Journal of Human-Computer Studies 52, 531–551 (2000)
38. Kim, J., Lee, J., Choi, D.: Designing Emotionally Evocative Homepages: An Empirical
Study of the Quantitative Relations between Design Factors And Emotional Dimensions. In-
ternational Journal of Human-Computer Studies 59, 899–940 (2003)
39. Menon, S., Kahn, B.: Cross-category effects of induced arousal and pleasure on the Internet
shopping experience. Journal of Retailing 78, 31–40 (2002)
40. Richard, M.-O.: Modeling the impact of internet atmospherics on surfer behavior. Journal of
Business Research 58, 1632–1642 (2005)
41. Fiore, A.N., Jin, H.-J., Kim, J.: For fun and profit: Hedonic value from image interactivity
and responses toward an online store. Psychology and Marketing 22(8), 669–694 (2005)
42. Mummalaneni, V.: An empirical investigation of Web site characteristics, consumer emo-
tional states and on-line shopping behaviors. Journal of Business Research 58, 526–532
(2005)
43. Spool, J.M., Scanlon, T., Schroeder, W., Snyder, C., DeAngelo, T.: Web Site Usability: A
Designer’s Guide. Morgan Kaufmann, San Francisco (1998)
44. Nielsen, J.: Designing Web Usability: The Practice of Simplicity. New Riders Publishing
(2000)
45. Li, N., Zhang, P.: Toward E-Commerce Website Evaluation and Use: An Affective Perspec-
tive. In: Post-ICIS 2005 JAIS Theory Development Workshop, Las Vegas, NV (2005)
46. Schenkman, B.N., Jonsson, F.U.: Aesthetics and preferences of web pages. Behavior and
Information Technology 19(5), 367–377 (2000)
47. van der Heijden, H.: Factors influencing the usage of websites: the case of a generic portal in
the Netherlands. Information and Management 40, 541–549 (2003)
48. Lindgaard, G., Dudek, C.: What Is This Evasive Beast We Call User Satisfaction? Interacting
with Computers 15, 429–452 (2003)
49. Gorn, G.J., Chattopadhyay, A., Sengupta, J., Tripathi, S.: Waiting for the Web: How screen
color affects time perception. Journal of Marketing Research 41, 215–225 (2004)
50. Lavie, T., Tractinsky, N.: Assessing Dimensions of Perceived Visual Aesthetics of Web
Sites. International Journal of Human-Computer Studies 60(3), 269–298 (2004)
51. Song, J., Zahedi, F.: A theoretical approach to Web design in e-commerce: A Belief Rein-
forcement Model. Management Science 51(8), 1219–1235 (2005)
152 T. Porat and N. Tractinsky
52. Palmer, J.W.: Web Site Usability, Design, and Performance Metrics. Information Systems
Research 13(2), 151–167 (2002)
53. Russell, J.A., Mehrabian, A.: Evidence for a three-factor theory of emotions. Journal of Re-
search in Personality 11, 273–294 (1977)
54. Russell, J.A., Pratt, G.: A description of the affective quality attributed to environments.
Journal of Personality and Social Psychology 38(2), 311–322 (1980)
55. Lichtenstein, A., Oehme, A., Kupschick, S., Jürgensohn, T.: Comparing Two Emotion Mod-
els for Deriving Affective States from Physiological Data. In: Peter, C., Beale, R. (eds.)
Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868. Springer, Heidelberg
(2008)
56. Bellizzi, J.A., Hite, R.E.: Environmental Color, Consumer Feelings and Purchase Likelihood.
Psychology and Marketing 9, 347–363 (1992)
57. Chebat, J.C., Gelinas-Chebat, C., Vaninski, A., Filiatrault, P.: The Impact of Mood on Time
Perception, Memorization, and Acceptance of Waiting. Genetic, Social, and General Psy-
chology Monographs 121(4), 411–424 (1995)
58. Sherman, E., Mathur, A., Smith, R.B.: Store Environment and Consumer Purchase Behavior:
Mediating Role of Consumer Emotions. Psychology & Marketing 14(4), 361–378 (1997)
59. Turley, L.W., Milliman, R.E.: Atmospherics effects on shopping behavior: a review of the
experimental evidence. Journal of Business Research 49, 193–211 (2000)
60. Eroglu, S.A., Machleit, K.A., Davis, L.M.: Empirical Testing of a Model of Online Store
Atmospherics and Shopper Responses. Psychology and Marketing 20(2), 139–150 (2003)
61. Huang, M.-H.: Modeling virtual exploratory and shopping dynamics: an environmental psy-
chology approach. Information & Management 41, 39–47 (2003)
62. Lim, Y.-K., Donaldson, J., Jung, H., Kunz, B., Royer, D., Ramalingam, S., Thirumaran, S.,
Stolterman, E.: Emotional Experience and Interaction Design. In: Peter, C., Beale, R. (eds.)
Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868. Springer, Heidelberg
(2008)
63. Butler, K.A.: Usability engineering turns 10. Interactions 3(1), 59–75 (1996)
64. Davis, F.D.: Perceived usefulness, perceived ease-of-use, and user acceptance of information
technology. MIS Quarterly, 319–340 (September 1989)
65. Tractinsky, N., Cokhavi, A., Kirschenbaum, M., Sharfi, T.: Evaluating the Consistency of
Immediate Aesthetic Perceptions of Web Pages. International Journal of Human-Computer
Studies 64(11), 1071–1083 (2006)
66. Desmet, P.M.A., Hekkert, P.P.M.: The basis of product emotions. In: Green, W.S., Jordan,
P.W. (eds.) Pleasure with Products: Beyond Usability, Taylor and Francis, Abington (2002)
67. Rafaeli, A., Vilnai-Yavetz, I.: Discerning organizational boundaries through physical
artifacts. In: Paulsen, N., Hernes, T. (eds.) Managing Boundaries in Organizations: Multiple
Perspectives. Palgrave Macmillan, Basingstoke, Hampshire, UK (2003)
68. Norman, D.A.: Emotional Design: Why We Love (or Hate) Everyday Things. Basic Books,
New York (2004)
69. Berlyne, D.E.: Aesthetics and Psychobiology. Appleton-Century-Crofts, New York (1971)
70. Zajonc, R.B., Markus, H.: Affective and cognitive factors in preferences. Journal of Con-
sumer Research 9(2), 123–131 (1982)
71. Lindgaard, G., Fernandes, G.J., Dudek, C., Brown, J.: Attention web designers: You have 50
milliseconds to make a good first impression! Behaviour and Information Technology 25(2),
115–126 (2006)
72. Russell, J.A.: Core affect and the psychological construction of emotion. Psychological Re-
view 110(1), 145–172 (2003)
Affect as a Mediator between Web-Store Design and Consumers’ Attitudes 153
73. Green, D.P., Goldman, S.L., Salovey, P.: Measurement error masks bipolarity in affect rat-
ings. Journal of Personality and Social Psychology 64, 1029–1041 (1993)
74. Mehrabian, A.: Framework for a comprehensive description and measurement of emotional
states. Genetic, Social, and General Psychology Monographs 121, 339–361 (1995)
75. Reisenzein, R.: Pleasure-activation theory and the intensity of emotions. Journal of Personal-
ity and Social Psychology 67, 525–539 (1994)
76. Feldman Barrett, L., Russell, J.A.: Independence and bipolarity in the structure of affect.
Journal of Personality and Social Psychology 74, 967–984 (1998)
77. Yik, M.S.M., Russell, J.A., Feldman Barrett, L.: Structure of self-reported current affect: In-
tegration and Beyond. Journal of Personality and Social Psychology 77(3), 600–619 (1999)
78. Feldman Barrett, L., Russell, J.A.: Structure of current affect. Current Directions in Psycho-
logical Science 8, 10–14 (1999)
79. Brengman, M., Geuens, M.: The four dimensional impact of color on shoppers’ emotions.
Working Paper. Ghent University, Belgium (2003)
80. Brown, J.S.: From cognitive to social ergonomics and beyond. In: Norman, D.A., Draper,
S.W. (eds.) User Centered System Design, Lawrence Erlbaum, London (1986)
81. Shneiderman, B.: Designing the User Interface: Strategies for Effective Human-Computer
Interaction, 3rd edn. Addison Wesley Longman Inc., Amsterdam (1998)
82. Mehrabian, A., Wihardja, C., Ljunggren, E.: Emotional correlates of preferences for
situation-activity combinations in everyday life. Genetic, Social, and General Psychology
Monographs 123, 461–477 (1997)
83. Yalch, R.F., Spangenberg, E.: An Environmental Psychological Study of Foreground and
Background Music as Retail Atmospheric Factors. In: Walle, A.W. (ed.) AMA Educators’
Conference Proceedings, vol. 54, pp. 106–110. American Marketing Association, Chicago
(1988)
84. Porat, T., Tractinsky, N.: The effects of perceived web-store characteristics on consumers’ af-
fective states and attitudes towards the store. In: Fifth Conference on Design&Emotion 2006,
Gothenburg, Sweden, September 27-29 (2006)
85. Brave, S., Nass, C.: Emotion in human-computer interaction. In: Jacko, J., Sears, A. (eds.)
Handbook of Human-Computer Interaction, Lawrence Erlbaum Associates, Mahwah (2003)
86. Tractinsky, N., Shoval-Katz, A., Ikar, D.: What is beautiful is usable. Interacting with Com-
puters 13, 127–145 (2000)
87. Hartmann, J., Sutcliffe, A., DeAngeli, A.: Investigating attractiveness in web user interfaces.
In: Proceedings of ACM CHI 2007 Conference on Human Factors in Computing Systems
2007, pp. 387–396 (2007)
88. Koufaris, M., Kambil, A., LaBarbera, P.A.: Consumer behavior in Web-based commerce: an
empirical study. International Journal of Electronic Commerce 6(2), 115–138 (2002)
Beyond Task Completion in the Workplace:
Execute, Engage, Evolve, Expand
Abstract. Research on user experience stresses the need to go beyond the mere
accomplishment of behavioral goals, such as providing novelty and change, a
compelling image, etc. Although it can be argued that the consideration of such
non-instrumental aspects at the workplace will lead to joyous experiences,
which in turn may be an important aspect of employee well-being, the offered
link between current models of user experience and the work environment is
rather weak. This paper presents a user experience model more closely linked to behavioral goals, that is, to tasks, their generation, and their completion. We describe the e4-model, consisting of four groups of desired behavioral outcomes – execute, engage, evolve, expand – and its underlying motivation. e4 addresses not only task completion, but also persistence in task execution, modification of existing tasks, and the creation of novel tasks.
1 Introduction
User experience (UX) research aims at extending the view on Human-Computer Inter-
action (HCI) beyond the mere focus on the effective and efficient accomplishment of
behavioral goals, and thus the functionality and usability of interactive products [1].
Most available approaches (e.g., [2, 3]) emphasize the importance of taking all human
needs and emotions into account when designing and evaluating technology use.
Industry got the message, too. Nokia [4], for example, explains: “Before mobility
was about the delivery of products and services. Now it is about the delivery of per-
sonalized experiences”, which add value and pleasure to daily lives. Their model of
“Wow, flow, and show” acknowledges the importance of utility and usability (flow),
but also emphasizes the need for novelty (wow) and self-expression (show). The latter
are clearly non-utilitarian aspects. In other words, utility and usability are regarded as state of the art, a given, but no longer sufficient for an outstanding, positive, even joyous experience with a product [5, 6].
In the context of consumer products, such as mobile phones, the importance of ad-
dressing aspects beyond the mere functional is obvious. But how about the workplace?
From the employer’s view, work is not primarily about satisfying human needs – there
are jobs to be done. In contrast, one may argue that work performance will benefit from
a more holistic consideration of human needs and that workplace technology can be a
medium to facilitate this. However, in current approaches to user experience, effects on
work or task performance are – if discussed – rather indirect (e.g., “attractive things
work better”, [7]). The present paper’s objective is to take a closer look at user experi-
ence from a workplace perspective.
The capacity of the work environment to foster and sustain intrinsic motivation thus seems key to a positive technology experience.
Being intrinsically motivated may affect performance at work in several ways: it may facilitate executing assigned tasks and pursuing the plans to accomplish these tasks, perfecting plans and even exceeding the task goals, and expanding the usually predefined subordinate goals in service of the overall work goal by generating new tasks:
− Task Accomplishment. First of all, motivation helps simply to get the work done. Parts of daily work are recurring, routine tasks, which can be dealt with in a similarly recurring manner. Usually, people develop fixed schemes for these tasks [15, 16], which renders them unchallenging, dull and monotonous at times.
− Modification of Tasks. For non-recurring tasks, fixed schemes can’t be used, but a
concept of the desired outcome (goals) and of the way to achieve this outcome
(task, plan) has to be generated (e.g., [17, 15]). Even if the “main” goal is exter-
nally given, workers have to determine a plan to pursue the goal. The quality of
this plan determines the required effort for its execution and the quality of the re-
sults [16, 18]. Devising a high-quality plan requires motivation and consideration
of possible alternatives.
− Creation of Novel Tasks. Sometimes even the goal itself has to be specified [19].
Goals are considered to be hierarchically organized with a few higher-level goals
unfolding in lower-level goals (e.g., [20, 15]). In the work context, people know
their higher-level goal, i.e., the overall work goal. A marketing department is, for example, expected to look after the brand image and enhance sales figures, but the subordinate goals are not necessarily specified as clearly. Occasionally, it is necessary to
rethink previous methods to achieve the overall goal and create new methods and
subordinate goals (e.g., [16]). This is where motivation is called for once again, as
old routines are to be overcome and additional effort is required to constantly en-
hance the work quality.
To summarize, work does not only consist of the execution of given tasks in a
specified way. The motivation to actually pursue goals, to modify existing goals,
plans and tasks or to even create new ones could be facilitated by technology, too. In
addition to merely providing functionality, interactive products for the workplace may
attempt to motivate their users.
One approach to formulating a model of user experience for work environments based on motivation is to look at desired behavioral outcomes. The advantage is the possible
application to a broad range of workplace products. Desired behaviors, such as play-
ing around with the interactive product given some free time, using the product even
if the goal is already accomplished to make it even better, or finding novel uses not even
intended by the product developers, hold for many software systems and interactive
products. The disadvantage of this approach is that it merely defines desired behaviors,
but does not actually prescribe the product attributes necessary to generate this behavior.
3.1 Execute
3.2 Engage
However, besides the mere execution of tasks, a workplace product must engage its
users. Being motivated facilitates good performance at work in several ways as
pointed out above. Engagement will be a quality in itself and may additionally sup-
port the other aspects of e4, execute, evolve and expand. The users should simply like
to use their tools, and look forward to performing upcoming tasks. Engage – in our sense – summarizes situations where people put additional effort into tasks that would otherwise be avoided or finished with the least possible effort (and correspondingly poor results; see [7]).
Some studies show that "perceived enjoyment" while working with software can lead to more usage, often even in spite of usability problems (e.g., [22, 23]). With engaging products, users will more easily master their tools. In addition, employees might use their time more efficiently because they are harder to distract. They might engage less in alternative activities – a typical avoidance behavior when unmotivated.
Isen, Rosenzweig and Young [24], for example, found that positive affect led students to perform as well as a neutral control group. Nevertheless, they reached their decision earlier and thus achieved their goal more efficiently. If arbitrarily induced
affect can have such effects on task performance, the impact of affect deliberately
induced by a product while being used might lead to similar effects on tasks.
3.3 Evolve
In addition to engagement, users should evolve their work itself; that is, modify their tasks by discovering functions and possibilities they might not have been aware of so far. Putting those functions to use will inevitably alter tasks [25]. As tasks can
be accomplished in several ways, the best way has to be found, in terms of efficiency
and quality.
Interactive products are bloated with functions, and users may have a hard time becoming familiar with all of them. As a result, they will hardly ever use some of
them, although they might be valuable for several tasks. But knowing those functions
and the opportunities they provide is desirable, nonetheless. In a study of Baecker,
Booth, Jovicic, McGrenere and Moore [26], the majority of users correspondingly
expressed their desire for continually discovering new functions instead of entirely
removing unused functions or having them tucked away. Paul [27] compared novice
and expert users and found that the latter stand out due to their habit of exploring new
systems freely in order to become acquainted with them. Facilitating exploration of a
product’s functions will lead users to tap the full potential of their products and there-
fore evolve their work quality. In other words, a workplace product, which supports
evolvement, will suggest ways to restructure and modify tasks. Note that a product
must present unknown functions as opportunities for action, without distracting from
currently active tasks.
3.4 Expand
Knowing a workplace product well and interacting with it in a playful way can lead to
a fourth outcome, namely to expanding.
In Isen et al.'s study [24], the subjects with positive affect were not only more efficient in executing their tasks, but even went beyond their assigned tasks and performed additional tasks as well. Going beyond the assigned task like this will add to
the overall work goal. Users should be supported in defining new superordinate goals
and expanding their scope in general to better accomplish their overall goal (see
Erschließungsplanung, [16])1.
This implies going beyond the product's original scope, using it for purposes not even intended by the developers. Take VisiCalc, the first spreadsheet software, as an example. In an interview [28], Dan Bricklin, one of the inventors of VisiCalc, replies to the observation that early stories about VisiCalc had a hard time describing it: "You can't describe some of these things. Until you're actually immersed in a
certain technology and using it and seeing how the public uses it, you don’t necessarily
understand it. Some people don’t understand why instant messaging has taken off so
much among certain parts of the population. That was true for the spreadsheet, which
seems so obvious now." Indeed, the spreadsheet is now used for a number of things Bricklin hadn't thought about before. Expanding will surely be difficult to realize for
many products, as some are designed to fulfill rather strict requirements, not offering the
necessary “openness” for the invention of novel uses and tasks. Nevertheless, we argue
that users should be able to use a product creatively, even if this means outwitting the
software. This will lead to a feeling of autonomy and competence.
We believe that workplace products should be designed in a way to facilitate be-
haviors summarized by the four aspects of e4. People must be enabled to execute their
daily tasks and must be engaged to show more persistence in executing even less interesting tasks. In addition, a good workplace product will lend itself to modifying tasks or even to creating completely novel tasks. See the following example for the expected effects of a well-designed workplace product on daily work, and Table 1 for a summary of the four aspects of e4:
1 Creation of new scopes of action (translated by the author).
“Mr. Smith got a new spreadsheet program three months ago. Compared to his old
program, the weekly analysis of the sales figures was much quicker (execute). As he
loves to work with this new tool, he even put extra effort into the already completed analysis to make it even more clearly and conveniently arranged (engage). For this purpose he used a lot of functions he hadn't discovered in his former program (although they were available). This led to a general change in the way he approaches
the analysis and what his boss expected from him (evolve). Being well versed, he
even found a way to automatically remind sales representatives to send their figures in
time (expand).”
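As a compact recap of the four aspects described above, and standing in only loosely for the Table 1 summary, the following sketch pairs each e4 aspect with its desired behavioral outcome; the phrasing is paraphrased, not the authors' wording.

```python
# Illustrative recap of the e4 model: each aspect mapped to the desired
# behavioral outcome described in the text (paraphrased, not the original Table 1).
E4_MODEL = {
    "execute": "get assigned tasks done effectively and efficiently",
    "engage":  "persist and invest extra effort, even in uninteresting tasks",
    "evolve":  "modify existing tasks by discovering and using new functions",
    "expand":  "create novel tasks and uses beyond the product's original scope",
}

def describe(aspect: str) -> str:
    """Look up the desired behavioral outcome for one e4 aspect."""
    return f"{aspect}: {E4_MODEL[aspect]}"

for aspect in E4_MODEL:
    print(describe(aspect))
```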
4 Evaluation
Regressing a general product evaluation (as "good") on the e4 scales showed execute to explain 39% of the total variance. The inclusion of engage, evolve and expand added another significant 10%. Given "desirable" as the product evaluation, execute explained only 18% and the remaining scales added another 22%. In other words, although execution of tasks
is the primary source for a good product evaluation in the work context, motivational
aspects as addressed by engage, evolve or expand add substantially to the product’s
appeal. Asked about desirability, task execution and motivational aspects were even
seen as equal sources.
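The hierarchical logic of this analysis, first the variance explained by execute alone, then the increment added by the three motivational scales, can be sketched as follows; the data are randomly generated and the 120-participant setup is an assumption, so the printed figures will not match the reported 39% and 10%.

```python
# Minimal sketch (not the authors' analysis) of the hierarchical-regression logic
# described above: how much variance in a product judgement ("good") is explained
# by execute alone, and how much engage/evolve/expand add on top of it.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 120
# Hypothetical questionnaire scale scores per participant (1-7 range).
execute = rng.uniform(1, 7, n)
engage  = rng.uniform(1, 7, n)
evolve  = rng.uniform(1, 7, n)
expand  = rng.uniform(1, 7, n)
good    = 0.6 * execute + 0.2 * engage + 0.1 * evolve + 0.1 * expand + rng.normal(0, 1, n)

# Step 1: execute only.
m1 = sm.OLS(good, sm.add_constant(np.column_stack([execute]))).fit()
# Step 2: add the three motivational scales.
m2 = sm.OLS(good, sm.add_constant(np.column_stack([execute, engage, evolve, expand]))).fit()

print(f"R^2, execute only:       {m1.rsquared:.2f}")
print(f"R^2, all four e4 scales: {m2.rsquared:.2f}")
print(f"Incremental R^2:         {m2.rsquared - m1.rsquared:.2f}")
```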
5 Conclusion
The formulation of a model of desired behavioral outcomes that go beyond mere task
completion is a necessary first step towards the notion of user experience at the work-
place and the corresponding desired qualities in interactive products. e4 differs from
available usability questionnaires and models by addressing not only the execution of
tasks but also the modification and creation of plans – in short: motivation. We assume that a product addressing all four aspects of e4 will not only lead to a more enjoyable working experience – these aspects comprise behaviors desired in any workplace that takes human capabilities seriously. In this sense, well-designed workplace products
may not only be a source of employee subjective well-being, but will have broader
effects on the quality of work itself.
To design for these qualities is challenging. It requires new approaches and empiri-
cal studies (see [31]). We are sure, however, that it can be done – and that HCI will be
key in promoting more enjoyable, more motivating, more valuable interactive products for the workplace by addressing aspects beyond mere utility and usability.
References
1. Hassenzahl, M., Tractinsky, N.: User Experience - a research agenda. Behaviour & Informa-
tion Technology 25(2), 91–97 (2006)
2. Hassenzahl, M.: The Thing and I: Understanding the Relationship Between User and Prod-
uct. In: Blythe, M.A., Overbeeke, K., Monk, A.F., Wright, P.C. (eds.) Funology: From
Usability to Enjoyment, pp. 31–42. Kluwer Academic Publishers, Dordrecht (2003)
3. McCarthy, J., Wright, P., Wallace, J., Dearden, A.: The Experience of Enchantment in
Human-Computer Interaction. In: Online Proceedings of CHI Fringe (2004), Retrieved 25
April 2006, http://www.shu.ac.uk/schools/cs/cri/adrc/research2/enchantment.pdf
4. Nokia Corporation: Inspired Human Technology (2005), http://www.nokia.com/NOKIA_COM_1/About_Nokia/Press/White_Papers/pdf_files/backgrounder_inspired_human_technology.pdf
5. Hassenzahl, M., Beu, A., Burmester, M.: Engineering Joy. IEEE Software 1&2, 70–76
(2001)
6. Jordan, P.: Designing pleasurable products. Taylor & Francis, London (2000)
7. Norman, D.A.: Emotional Design: Why We Love (or Hate) Everyday Things. Basic Books,
New York (2004)
8. Draper, S.W.: Analysing fun as a candidate software requirement. Personal Technology 3(1),
1–6 (1999)
9. Heckhausen, H.: Entwurf einer Psychologie des Spielens. In: Flitner, A. (ed.) Das Kinder-
spiel. Piper, München, pp. 138–155 (1978)
10. Malone, T.W.: Heuristics for Designing Enjoyable User Interfaces: Lessons from Computer
Games. In: Thomas, J.C., Schneider, M.L. (eds.) Human Factors in Computer Systems,
Ablex, Norwood, NJ, pp. 1–12 (1982)
11. Ryan, R.M., Deci, E.L.: Self-Determination Theory and the Facilitation of Intrinsic Motiva-
tion, Social Development, and Well-Being. American Psychologist 55(1), 68–78 (2000)
12. Hassenzahl, M.: Hedonic, emotional, and experiential perspectives on product quality. In:
Ghaoui, C. (ed.) Encyclopedia of Human Computer Interaction, pp. 266–272. Idea Group
(2006)
13. Baard, P.P., Deci, E.L., Ryan, R.M.: Intrinsic Need Satisfaction: A Motivational Basis of
Performance and Well-Being in Two Work Settings. Journal of Applied Social Psychol-
ogy 34(10), 2045–2068 (2004)
14. Gagné, M., Deci, E.L.: Self-determination theory and work motivation. Journal of Organiza-
tional Behavior 26, 331–362 (2005)
15. Miller, G.A., Galanter, E., Pribram, K.H.: Plans and the structure of behavior, Holt, Rinehart
and Winston, London (1970)
16. Oesterreich, R.: Handlungsregulation und Kontrolle. Urban & Schwarzenberg, München,
Wien (1981)
17. Cropanzano, R., Citera, M., Howes, J.: Goal Hierarchies and Plan Revision. Motivation and
Emotion 19(2), 77–98 (1995)
18. Stock, J., Cervone, D.: Proximal goal-setting and self-regulatory processes. Cognitive Ther-
apy and Research 14(5), 483–498 (1990)
19. Hacker, W.: Arbeitspsychologie. Psychische Regulation von Arbeitstätigkeiten, Huber, Bern,
Stuttgart, Toronto (1986)
20. Carver, C.S., Scheier, M.F.: On the Self-Regulation of Behavior. Cambridge University
Press, Cambridge (1998)
21. International Organization for Standardization ISO 9241: Ergonomic requirements for office
work with visual display terminals (VDTs) – Part 11: Guidance on usability (1998)
22. Davis, F.D., Bagozzi, R.P., Warshaw, P.R.: Extrinsic and Intrinsic Motivation to Use Com-
puters in the Workplace. Journal of Applied Social Psychology 22(14), 1111–1132 (1992)
23. Igbaria, M., Schiffman, S.J., Wieckowski, T.J.: The respective roles of perceived usefulness
and perceived fun in the acceptance of microcomputer technology. Behaviour & Information
Technology 13(6), 349–361 (1994)
24. Isen, A.M., Rosenzweig, A.S., Young, M.J.: The Influence of Positive Affect on Clinical
Problem Solving. Medical Decision Making 11, 221–227 (1991)
25. Carroll, J.M., Kellogg, W.A., Rosson, M.B.: The Task-Artifact Cycle. In: Carroll, J.M. (ed.)
Designing Interaction: Psychology at the Human-Computer Interface, pp. 74–102. Cam-
bridge University Press, Cambridge (1991)
26. Baecker, R., Booth, K., Jovicic, S., McGrenere, J., Moore, G.: Reducing the gap between
what users know and what they need to know. In: Proceedings on the 2000 conference on
Universal Usability, Arlington, Virginia, United States, pp. 17–23. ACM Press, New York
(2000)
27. Paul, H.: Exploratives Agieren. Ein Beitrag zur ergonomischen Gestaltung interaktiver Sys-
teme. Verlag Peter Lang, Frankfurt am Main (1995)
28. Bender, E.: Three Minutes: Godfathers of the Spreadsheet (2004), Retrieved 9/13/2007,
http://www.pcworld.com/article/id,116166-page,1/article.html
29. Harbich, S., Hassenzahl, M.: Messung der Qualität von interaktiven Produkten für den Ar-
beitskontext: Ergebnisse zur Reliabilität und Validität des e4-Fragebogens. In: Prospektive
Gestaltung von Mensch-Technik-Interaktion. 7. Berliner Werkstatt Mensch-Maschine-Systeme,
VDI-Verlag, Düsseldorf (2007)
30. Harbich, S., Hassenzahl, M., Kinzel, K.: e4 – Ein neuer Ansatz zur Messung der Qualität in-
teraktiver Produkte für den Arbeitskontext. In: Gross, T. (ed.) Mensch & Computer 2007, pp.
39–48. Oldenbourg Wissenschaftsverlag, München (2007)
31. Kohler, K., Hassenzahl, M., Niebuhr, S.: Stay on the ball! An interaction pattern approach to
the engineering of motivation. In: Baranauskas, C., Palanque, P., Abascal, J., Barbosa, S.D.J.
(eds.) INTERACT 2007. LNCS, vol. 4663. Springer, Heidelberg (2007)
Simulated Emotion in Affective Embodied Agents
1 Introduction
An important strand of research that is often neglected in the field of affective com-
puting is that of how users respond to simulated displays of emotion. How do we
respond to synthetic displays of happiness, sadness, anger, frustration and fear? Can
we catch emotions from users? How do we respond to computer empathy and sympa-
thy? With interface designers increasingly incorporating emotion into their interfaces
through a variety of modalities (e.g. textual content, speech, video and facial expres-
sions in embodied agents), it is imperative that we understand in detail the impact of
simulated emotion on computer users.
A number of recent studies have investigated user responses to simulated emotion
through the use of embodied agents – screen-based entities that attempt to closely
simulate humans to make interactions with computers more natural and engaging [6].
Recent research into embodied agents has predominantly focused on their use in
situations where human-human relationships are important, such as teaching [5],
exercise and nutritional advisors [2, 7] and simulations [11]. Such agents have also
been used in computer games for a number of years [15] and are now widely utilized
in online virtual environments such as Second Life [20] and There.com [21].
Research into online virtual environments and games that utilize embodied entities
suggests that the social rules and norms that apply in human-human interaction also
apply in human-computer interactions (e.g. [23]). Therefore, our interactions with
embodied entities in virtual environments appear, to some extent, to mirror our social
interactions with others in the physical world. Numerous studies conducted in the last
decade within other computing domains also support this reported effect – that is,
interactions in both physical and virtual worlds are governed by similar social rules
and norms [19]. As emotional expression plays a pivotal role in human-human interactions, the social nature of virtual environments and our interactions with embodied agents strongly suggests that emotion is likely to be of importance in HCI as well.
2 Related Work
While there have been many studies that have focused on the simulation of emotion in artificial intelligence (e.g. developing computational models of emotion), computer
graphics (e.g. real-time rendering of emotions), and HCI (e.g. examining the impact
of affective embodied agents), very few studies have explicitly investigated how we
respond to simulated displays of emotion. There have been a large number of studies
over the last decade that involve an interaction with an affective agent, but emotion is
often not the primary area of investigation of such studies. In this section, we start
with an overview of more recent studies that have primarily focused on the effects of
simulated emotion on users. We then move on to identify and discuss a number of
general issues with research in this space.
One such study compared an agent that expressed empathy toward subjects with one that did not, the empathy being conveyed through the use of speech bubbles. Results from the study found that subjects perceived the agent which was empathetic toward them as more likeable, trustworthy, supportive and caring than the agent which was not empathetic toward them.
Bickmore and Picard [2] found a similar effect when evaluating their embodied
exercise advisor “Laura” which attempted to build and maintain a relationship
with users over the period of a month. Laura used a number of relational strategies
that humans use to establish and maintain relationships, including expressions of
empathy, politeness, humor, appropriate forms of address, and discussion about the
relationship. Results from this study found that subjects generally perceived Laura
more positively when they interacted with the relational version as opposed to the
non-relational version (i.e. the condition where no relational strategies were used).
Fabri [10] examined the impact of simulated emotion by asking subjects to discuss
a moon survival scenario through the use of a “Virtual Messenger.” When using the
virtual messenger, subjects were represented by three dimensional animated avatars
and could see both their partner’s representation on the screen, and a small image of
their own representation. Subjects were represented by either an emotionally expres-
sive or unemotional avatar - results found that subjects who interacted with the emo-
tionally expressive avatar felt more involved in the task than those who interacted
with the unemotional avatar. However, in contrast to the studies above, subjects ap-
peared to enjoy the interaction with the non-expressive avatar more.
Prendinger et al. [18] investigated the effect of simulated empathy on users through
measuring their galvanic skin response and blood pressure whilst playing a mathe-
matical game. One group of subjects interacted with an empathetic agent during the
game while another group interacted with a non-empathetic agent. In a number of the
questions asked during the game, a short delay was experienced by subjects – in
the empathy condition, the agent would be empathetic toward the user when this hap-
pened, while the other agent would do nothing. It was found that the incorporation of
empathy helped reduce galvanic skin response in subjects, but did not significantly
reduce how frustrated subjects felt whilst playing the game.
These studies illustrate that simulated emotion can influence user attitudes and per-
ceptions - however, our understanding of exactly how they influence users and how
simulated emotion can best be utilised to enhance HCI is still relatively superficial.
One of the primary issues with emotion simulation research is the lack of studies that
have explicitly compared an emotional agent against an unemotional one. Numerous
studies have made use of emotionally expressive agents in their experimental systems,
but emotion is often not the main focus of the study (e.g. [17], [16], [22], [2]). At-
tempting to compare and analyse such studies from an emotion simulation perspective
becomes problematic as inferences have to be made and this can lead to uncertainty
about whether or not reported effects were down to the manipulation of emotion. For
example, Van Mulken et al. [22] conducted an experiment with two different condi-
tions – one that made use of an animated agent to present information and one where
an arrow was used instead of the agent to present the information. This is most likely
a fair comparison from the authors’ point of view, but when attempting to evaluate
such a study from an emotion simulation angle it becomes an unfair comparison.
Emotional expression is not the only variable that has been manipulated here – the
functionality of the two conditions is also significantly different. Therefore, while
many studies have utilised affective embodied agents for experimental purposes, it is
problematic when attempting to compare the majority of these studies from an emo-
tion simulation perspective. More studies that explicitly compare an emotional and
unemotional agent are required to further our understanding of the effects of simu-
lated emotion.
Another issue with emotion simulation research is that many of these studies do not
check the emotional expressions of their agent prior to conducting their primary ex-
periment. Failure to take this essential step before testing the effects of simulated emo-
tion can cause problems as it is not clear whether or not subjects perceive the emotional
expressions as expected. For example, consider the difference between a genuine smile
and a social smile – the primary difference between the two is that a genuine smile (i.e.
a Duchenne smile) involves the movement of the orbicularis oculi muscle near the eyes, while a social smile does not [9]. We have found from our own studies (described in the next section) that users put a great deal of emphasis on the eyes when interacting with
affective embodied agents. The application we used to develop our embodied agent
did not allow us to manipulate the muscle around the eye – therefore, when we
wanted the agent to simulate being genuinely pleased to see the user, we found that
subjects often perceived these welcoming smiles more as social ones, and therefore
perceived them as slightly false and patronising. This highlights how small details can
easily alter perceptions of emotion and the amount of attention to detail that research-
ers need to pay when simulating human emotion – it should not be assumed that emo-
tions will be perceived as expected. As Hook argues [13], they should be checked and
tested before conducting primary experiments.
Another issue is the importance of researchers being explicit about what constitutes
emotional expression in their study, and where, when and how it was expressed. Even
in studies where emotion is the main component being examined, researchers often do not
include sufficient detail regarding this. Important information includes detailing ex-
actly which emotions were expressed, at what times they were expressed, which
model of emotion was used, and how the emotions were displayed (i.e. through tex-
tual content, speech, animated bodily and facial expressions, or combinations of
these). The majority of studies where emotion has been incorporated into embodied agents tend to focus predominantly on basic emotions such as happiness, sadness, joy, fear, surprise and anger [8]. Very few studies have investigated the impact of more cognitive emotions such as frustration, humiliation and guilt. The main reason for this is most likely that these emotions are more difficult to simulate – all basic emotions have a unique facial expression associated with them, while cognitive emotions tend not to.
A further issue that has rarely been discussed in the field is that of mismatched emo-
tional expressions. For example, many studies that use embodied agents equip them
with the ability to express emotion through bodily and facial expressions, but also
provide them with a synthetic and monotone voice. As we discuss in the following
section, this can result in a mismatched emotional expression – for example, you may
have a happy facial expression with a monotone synthetic voice. This mismatch has
occurred in a number of studies, yet very few researchers have discussed the impact of
it or the influence it may potentially have on their results. From our own investigations,
we have found that users have very strong reactions to mismatches in emotional ex-
pressions and find them to be particularly tedious, frustrating and annoying.
An important question that is related to this research is whether or not all emotions
are of interest to HCI. For instance, why would a computer ever need to express fear,
frustration, or disappointment? Perhaps frustration or disappointment might be used
by a sports coaching agent as a means for motivating an individual who has not ad-
hered to their strict fitness regime, by attempting to make them feel guilty. An expres-
sion or statement of fear might be used by a security agent to add urgency in getting the user to patch software that contains known vulnerabilities. Emotions which
are typically perceived as being more negative should not be ignored and thought of
as harmful to HCI, but instead, should be utilized by interface designers (along with
other more positive emotions) to produce a desired or beneficial outcome for the user.
How do users respond to simulated emotion in embodied agents over multiple and
extended interactions? This is another area that has been neglected and aside from
Bickmore and Picard’s work [2], there have been no other major longitudinal studies
with affective embodied agents. The majority of experimental studies that utilize
embodied agents often require a single short interaction that typically lasts less than
an hour. These short interactions often result in agents expressing one emotion at a
time – for example, they are either happy or sad at any one time, but never a mixture
of the two. This approach misses the main point about how emotions arise and are
expressed – as Boehner et al. state [3]:
“…emotions are constructed and experienced as individuals act in and through their
culture and social interactions. So what we feel is not simply a pre-existing fact, but
something that develops over the course of conversations and interactions with one
another. We work out what we feel through expressing it and through seeing how
others react. We negotiate our feelings with ourselves and with others, over time
crystallizing meanings for us of what may initially be vague, confusing, and ambigu-
ous sensations.”
Emotions, therefore, are often constructed through interactions and conversations
with others over time and are shaped by culture. This suggests that for users to have
meaningful social and emotional interactions and relationships with agents, they must
have the opportunity to interact with each other over time. There is a real need for
more longitudinal studies in this space to help us understand this further – how should
emotional expression adapt over time? How can agents “co-construct” emotional
experiences with users? How do users respond to simulated emotion over time – does
it remain novel and engaging, or does it become tedious and tiresome for users?
In everyday life, we are more likely to act on the advice of people we like and trust than on that of people we dislike and distrust. Does the same principle apply in HCI? That is, if we perceive
emotionally expressive agents to be more likeable, trustworthy and caring than un-
emotional agents, can they potentially influence user attitudes and habitual behavior
more effectively over extended periods of interaction? To investigate this we have
built an embodied agent (Fig. 1) that simulates the role of a human nutritional coach
and are currently investigating the following questions: (1) are emotionally expressive
agents perceived more positively than unemotional agents? (2) can emotional agents
help motivate people to change unhealthy habits more effectively than unemotional
agents over extended periods of interaction? (3) how do our perceptions of emotional
agents change over multiple and extended interactions?
3.1 Experiment 1
Results suggested that both the facial and the vocal channels influenced subjects’ perceptions. While neither channel (visual nor audio) seemed to dominate when identifying emotional expressions, it appears that with this particular set of animations, subjects rated them higher on all measures when a happy or warm face was in the animation, or when a happy or warm voice was used. Surprisingly, this was also true of the “concern” measure: subjects rated animations with happy or warm dimensions as more concerned than animations with neutral and concern dimensions. There were some strong responses to the mismatched facial expres-
sions with many subjects venting their frustration. Also, subjects put a lot of emphasis
on the eyes when rating the emotional expressions. Some thought that smiles were
false and patronizing as the orbicularis oculi muscle around the eye did not move,
while others stated that they liked how the eyes were used to emphasize emotions
(even though we did not intentionally use the eyes to express emotion).
3.2 Experiment 2
After gathering feedback on the embodied agent and confirming that subjects could
correctly recognize the emotional expressions of the agent, we conducted another web-
based experiment that involved around a ten-minute interaction with the agent. The
experiment had a between-subjects design with two different conditions – emotion and
no-emotion. Emotion was manipulated through facial and vocal expressions – for
example, in the emotion condition, the facial expressions alternated between happi-
ness, warmth, concern, and neutral facial expressions, depending on the context of the
170 C. Creed and R. Beale
conversation, while in the no-emotion condition, the facial expressions always remained
neutral. Also, in the emotion condition, the speech rate, pitch average and pitch range
were faster, higher and wider than in the no-emotion condition.
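To make the manipulation concrete, the kind of condition-dependent delivery described above might be configured roughly as follows. This is a hedged sketch only: the numeric values, the parameter names and the synthesize() call are illustrative assumptions, not the settings or synthesis interface actually used in the study.

```python
# Illustrative prosody presets for the two conditions; the study reports only
# that rate, average pitch and pitch range were faster/higher/wider in the
# emotion condition, not the concrete values used here.
PROSODY = {
    "no_emotion": {"rate": 1.0,  "pitch_shift_pct": 0,  "pitch_range": "narrow"},
    "emotion":    {"rate": 1.15, "pitch_shift_pct": 10, "pitch_range": "wide"},
}


def render_utterance(text: str, condition: str, tts):
    """Render identical dialogue text with condition-dependent delivery.

    `tts` stands in for whatever speech synthesizer is available; the
    synthesize() signature below is an assumption, not a real library API.
    """
    p = PROSODY[condition]
    return tts.synthesize(text,
                          rate=p["rate"],
                          pitch_shift_pct=p["pitch_shift_pct"],
                          pitch_range=p["pitch_range"])
```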
The interaction with the agent attempted to simulate a ‘first session’ with a human
health professional [14] – the interaction started with the agent introducing itself and
attempting to build rapport with the subject through the use of small talk. The agent
then clarified both its own role and the role of the subject during the interaction, and
followed this by asking subjects about their dieting history and current eating habits.
The agent then moved on to discuss the pros and cons of both the subject’s current
diet and in changing their diet, and then talked about options the subject has for im-
proving their dietary habits. The interaction concluded with the agent attempting to
get an initial commitment for change and then terminating the interaction appropri-
ately. Subjects were able to respond to the agent’s utterances by selecting from a list
of pre-scripted responses (Fig. 3). It is important to note that the dialogue between
conditions was exactly the same – it was just the way in which it was presented (i.e.
either in an emotional or unemotional manner) that differed. The measures were taken
from a similar study by Brave et al. [4] and were a mixture of ten-point semantic differentials (adjective pairs) and ten-point Likert scales. These scales were used to
measure caring, likeability, trustworthiness, intelligence, how positive subjects felt
when using the system, and how supported subjects felt. After completing the interac-
tion, participants were given a number of online health-related articles that they could
view for as long as they desired. At the end of the experiment, subjects were asked to
answer eight open-ended questions that focused on things that annoyed them about
the agent, things they liked about the agent, whether the agent was better or worse
than interacting with just a website, and their general thoughts regarding the system.
Fifty subjects completed the experiment (twenty five in each condition) and results
found that subjects perceived the emotional agent as significantly more emotionally
expressive than the unemotional agent. While this was an expected finding, it was
important to test so that any subsequent effects could potentially be attributed to our
manipulation of the emotional expression. Further to this, the emotional agent was also perceived as significantly more likeable and caring than the unemotional agent. All other
measures were not significant. Feedback gathered from the open-ended questions
provided a number of further insights into what subjects thought of both the agent and
system. Subjects in the no-emotion condition frequently complained about the
unemotional nature of the agent – they described its voice as “slow”, “boring”, “mo-
notonous”, and “unenthusiastic.” On the other hand, the comments regarding the
emotional agent were generally more positive – subjects liked the “realism” of the
agent and often cited the agent’s ability to express emotion as something they liked.
Others commented on the “pleasant” and “friendly” voice and stated how they liked
the general appearance of the agent. However, a few subjects commented on how they
did not like the “fake perkiness” of the agent. In both conditions, subjects stated that
they would have liked more options to choose from when answering the agent’s questions
and also would have liked an option to skip answering certain questions. Subjects also
wanted further feedback after the interaction with Rachael had been completed, as
well as more health-based resources to use.
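The between-subjects comparisons reported above can be pictured as independent-samples tests on the aggregated scale scores, one per measure. The sketch below assumes hypothetical column names and uses Welch’s t-test for illustration; the original analysis may well have used a different procedure.

```python
import pandas as pd
from scipy import stats

# Hypothetical file: one row per subject, with condition ("emotion"/"no_emotion")
# and aggregated scale scores (e.g. the mean of each measure's items).
df = pd.read_csv("experiment2_scores.csv")
emo = df[df.condition == "emotion"]
neu = df[df.condition == "no_emotion"]

for scale in ["expressiveness", "likeability", "caring", "trust", "intelligence"]:
    t, p = stats.ttest_ind(emo[scale], neu[scale], equal_var=False)  # Welch's t-test
    print(f"{scale:15s} t = {t:5.2f}  p = {p:.3f}")
```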
We are currently working on a longitudinal study that will investigate whether simu-
lated emotion has any influence on an agent’s ability to help motivate people to
change poor dietary habits over multiple and extended interactions. The experiment
will be eleven weeks in length and will involve subjects logging into the system daily
to record how many portions of fruit and vegetables they have consumed the previous
day. Once they have provided this information, they will be able to access a resource
area where they will be able to view their progress over time (through the use of
graphs) and to make use of a number of other resources such as a BMI calculator, a
calorie counter, a weight monitor, and other tools. At the end of every week, they will
have an interaction with the agent to discuss their progress and to set goals for the
next week. These interactions will last around 5 minutes. The first phase of the ex-
periment will last seven weeks and will involve eight interactions with the agent.
Subjects will then be requested not to use the system for another four weeks – after
these four weeks have elapsed, subjects will be asked to provide information about
their current fruit and vegetable consumption. This data will be used to assess if any
changes that occurred during the interaction with the character have remained.
In conjunction with the longitudinal study, we also plan to run another experiment. This will focus on recording a human actress delivering the dialogue and then replacing the current agent with these recordings. The main motivation for this is related to comments regarding
lip synchronisation from the first two experiments – feedback collected from subjects
suggests that they find inaccurate lip synchronisation to be particularly annoying and
frustrating to interact with. Such responses could potentially have obscured subtle effects that were expected, but not found, in the previous experiment.
Therefore, by recording an actress, the issue of lip synchronisation will be removed.
From this experiment, we expect to see more positive comments from subjects regard-
ing the agent and system in general, and potentially more significant effects than were
found in experiment two.
4 Conclusion
The studies highlighted in this paper have started to investigate how users respond to simulated displays of emotion; however, we still have little understanding of how simulated emotion influences user attitudes and behaviour, and whether or not it can be utilised to produce beneficial and practical enhancements to HCI. Future studies need
to concentrate on a number of important areas.
More research is needed that explicitly compares emotional and unemotional
agents – while this appears obvious, there are very few studies that have actually
performed this test. Without such studies, we have to make comparisons between
studies where emotion is not the main focus of the experiment – as discussed in the
paper, this is problematic and can lead to a number of issues. In addition to this, more
studies need to examine how users respond to simulated emotion over multiple and
extended periods of interaction. The majority of studies conducted in this area to date
require subjects to have a single and short interaction with an affective embodied
agent – as a result, it is unclear how we respond to affective agents over time. How do
users respond to simulated displays of emotion over multiple interactions? Does it
keep them engaged, or do they find it false and patronising? Are emotional agents
more effective than unemotional agents at motivating change in user attitudes and
behaviour over extended lengths of time? Further to this, there also needs to be a
change of focus from that of whether or not emotional agents are better than unemo-
tional agents, to one where we concentrate on which emotions, used at which times,
and expressed in which ways, enhance (or hinder) HCI. Too many studies have fo-
cused purely on the so-called basic emotions of anger, disgust, fear, happiness, sad-
ness and surprise. More focus needs to be given to cognitive emotions – both in how
to effectively simulate them and in how users respond to them. These emotions play
an important role in our lives and should not be neglected.
Understanding user responses to simulated emotion is an important research area
that has received little attention to date. As more and more users become familiar with
using embodied entities as a means for collaboration and interaction with others in
virtual spaces, it is imperative that we understand in detail how simulated emotion
influences user attitudes and behaviour. Early studies in this area have illustrated that
simulated emotion can influence users in a number of ways, but research now needs
to move beyond this, so that we can understand more clearly how affective agents can
build effective social and emotional experiences with users that can be utilised for
beneficial and practical purposes.
References
1. Bickmore, T.: Relational Agents: Effecting Change through Human-Computer Relationships.
PhD Thesis, Department of Media Arts and Sciences, Massachusetts Institute of Technology
(2003)
2. Bickmore, T., Picard, R.: Establishing and Maintaining Long-Term Human-Computer Rela-
tionships. ACM Transactions on Computer-Human Interaction (TOCHI) 12, 293–327 (2005)
3. Boehner, K., DePaula, R., Dourish, P., Sengers, P.: How emotion is made and measured.
International Journal of Human Computer Studies 65, 275–291 (2007)
4. Brave, S., Nass, C., Hutchinson, K.: Computers that care: investigating the effects of orienta-
tion of emotion exhibited by an embodied computer agent. International Journal of Human-
Computer Studies 62, 161–178 (2005)
5. Burleson, W., Picard, R.: Affective agents: sustaining motivation to learn through failure and
a state of stuck. In: Social and Emotional intelligence in learning environments workshop, in
conjunction with the 7th International Conference on Intelligent Tutoring Systems, August 31, 2004. Retrieved 24 August 2007, http://affect.media.mit.edu/pdfs/04.burleson-picard.pdf
6. Cassell, J., Sullivan, J., Prevost, S., Churchill, E. (eds.): Embodied Conversational Agents.
The MIT Press, Cambridge (2000)
7. Creed, C.: Using Computational Agents to Motivate Diet Change. In: IJsselsteijn, W., de
Kort, Y., Midden, C., van den Hoven, E. (eds.) First international conference on Persuasive
Technology for human well-being, Eindhoven University of Technology, the Netherlands,
pp. 100–103. Springer, Heidelberg (2006)
8. Ekman, P.: Emotions Revealed: Recognizing Faces and Feelings to Improve Communication
and Emotional Life. Henry Holt & Co. (2004)
9. Ekman, P., Davidson, R.J., Friesen, W.V.: The Duchenne smile: emotional expression and
brain physiology. Journal of Personality and Social Psychology 58, 342–353 (1990)
10. Fabri, M., Moore, D.J., Hobbs, D.J.: Empathy and Enjoyment in Instant Messaging. In:
McKinnon, L., Bertlesen, O., Bryan-Kinns, N. (eds.) Proceedings of 19th British HCI Group
Annual Conference (HCI 2005), Edinburgh, UK, pp. 4–9 (2005)
11. Gratch, J., Marsella, S.: A domain-independent framework for modeling emotion. Journal of
Cognitive Systems Research 5, 269–306 (2004)
12. Grolleman, J., van Dijk, B., Nijholt, A., van Emst, A.: Break the habit! Designing an e-therapy intervention using a virtual coach in aid of smoking cessation. In: IJsselsteijn, W., de
Kort, Y., Midden, C., Eggen, B., van den Hoven, E. (eds.) PERSUASIVE 2006. LNCS,
vol. 3962, pp. 133–141. Springer, Heidelberg (2006)
13. Hook, K.: User-Centred Design and Evaluation of Affective Interfaces. In: Ruttkay, Z., Pela-
chaud, C. (eds.) From Brows to Trust: Evaluating Embodied Conversational Agents, vol. 7,
pp. 127–160. Kluwer, Dordrecht (2004)
14. Hunt, P., Hillsdon, M.: Changing Eating and Exercise Behaviour: A Handbook for Profes-
sionals. Blackwell Science, London (1996)
15. Isbister, K.: Better Game Characters by Design: A Psychological Approach. Morgan Kauf-
mann, San Francisco (2006)
16. Lester, J., Converse, S., Kahler, S., Barlow, T., Stone, B., Bhogal, R.: The persona effect:
affective impact of animated pedagogical agents. In: Pemberton, S. (ed.) CHI 1997: Proceed-
ings of the SIGCHI conference on Human factors in computing systems, pp. 359–366. ACM
Press, Georgia (1997)
17. Okonkwo, C., Vassileva, J.: Affective Pedagogical Agents and User Persuasion. In: Stepha-
nidis, C. (ed.) Universal Access in Human - Computer Interaction (UAHCI), pp. 5–10 (2001)
18. Prendinger, H., Mayer, S., Mori, J., Ishizuka, M.: Persona Effect Revisited. Using Bio-signals
to Measure and Reflect the Impact of Character-based Interfaces. In: Fourth International
Working Conference On Intelligent Virtual Agents (IVA 2003), pp. 283–291. Springer,
Heidelberg (2003)
19. Reeves, B., Nass, C.: The media equation: How people treat computers, televisions, and new
media like real people and places. Cambridge University Press, New York (1996)
20. Second Life: Second Life: Basic Overview (August 24, 2007),
http://secondlife.com/whatis/
21. There.com: A Basic Overview of There.com (August 24, 2007),
http://www.there.com/whatIsThere.html
22. Van Mulken, S., André, E., Müller, J.: The Persona Effect: How Substantial Is It? In:
Johnson, H., Laurence, N., Roast, C. (eds.) HCI 1998: Proceedings of HCI on People and
Computers XIII, Sheffield, UK, pp. 53–66. Springer, Heidelberg (1998)
23. Yee, N.: The Psychology of MMORPGs: Emotional Investment, Motivations, Relationship
Formation, and Problematic Usage. In: Schroeder, R., Axelsson, A. (eds.) Social Life of Ava-
tars II, pp. 187–207. Springer, London (2006)
Affective Human-Robotic Interaction
1 Introduction
In the 2004 film version of Isaac Asimov’s I, Robot, an android interrupts a row between the hero, Police Detective Del Spooner (played by Will Smith), and the heroine, psychologist Dr. Susan Calvin (played by Bridget Moynahan), to prevent violence. ‘Excuse me. I note there are elevated stress patterns in your speech,’ Sonny (the robot) announces, recognising the anger in their voices. Set in 2035, having consumer robots with autonomous intelligence and the facility for complex communication seems well beyond what researchers can currently achieve; however, recognising stress and emotion in the human voice is becoming not only possible but increasingly commonplace. Voice stress analysis [1] and acoustic emotion recognition [2] remain of interest to the research community, but solutions are already finding their way into products and the lives of
consumers. Call centres are tracking conversations between callers and agents for emotive events to help agents better respond to caller frustration, to aid agent training, and to detect fraudulent insurance claims [3]. In-car systems are not only voice controlled but can recognise driver emotion and react to drowsy or angry drivers to help them drive more safely [4-5]. Computer games can recognise player emotion and adapt game play to maintain the player’s active engagement and interest without stressing the player with overly challenging game play [6-7].
Humanoid robots in the home may be decades away; however, ‘entertainment’ robots have already arrived. Robotic systems are sold as toys which can be controlled remotely.
2 Project Outline
The project aims to develop a consumer-level, emotionally responsive home robot and to evaluate the emotive interaction between the robot and its owner. Emotionally responsive technologies in the home are uncommon. To aid participants in their understanding of how a device could react with emotional intelligence, we have built emotion recognition and response into a robotic dog. This simplifies the interaction for participants, as they expect a dog to react to emotion but not to fully understand the context of what the owner says. In fact, ‘real’ dogs react more to the emotion of the owner than to learned commands: the request ‘Go away’ said in a happy, friendly manner will leave a dog excited, whilst ‘Come here’ said in an angry, aggressive manner will see a dog retreat.
To facilitate natural interaction between the owner and the robotic dog, speech will
be used to detect and recognise emotion in the owner. Rather than recognising the
emotion of the owner by the words spoken and transcribed using speech recognition
software, the project will determine owner emotion from the acoustic cues of emotion in speech, in a similar way to how a ‘real’ dog recognises emotion.
The robotic dog will recognise the emotional state of the owner and then react to
the emotion with an appropriate response or action, e.g. if the owner is angry the ro-
botic dog will whimper and cower in fear. We will assess, with participants, whether and how the emotionally responsive robotic dog can enhance the interaction between the owner and their home robot. We will evaluate whether the new robotic dog is believed by participants to be reacting intelligently to their voice.
3 Implementation
The AIBO’s built-in microphone is used to record the owner’s speech. In this way the owner does not need to be close to and in front of the AIBO, and lighting conditions can be ignored. The speech is processed for emotional cues using the acoustic emotion recognition system described in the article ‘Acoustic Emotion Recognition for Affective Computer Gaming’.
The AIBO has a 64-bit RISC CPU operating at 576 MHz with 64 MB of RAM. However, the acoustic emotion recognition (ER) system is computationally intensive, and on-board processing would incur lengthy delays. Instead, the ER system resides on
a laptop which communicates wirelessly with the AIBO to retrieve the owner’s speech,
process the speech for emotional cues, and return the recognition result to the AIBO,
Fig. 2. On receipt of the emotional classification, the AIBO performs the appropriate
action.
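The division of labour between robot and laptop can be pictured roughly as follows. This is a schematic sketch only: the port number, the wire format and the classify() placeholder are assumptions made for illustration, not the actual AIBO/URBI code used in the project.

```python
import socket
import struct

HOST, PORT = "0.0.0.0", 54000   # hypothetical port the laptop-side ER server listens on


def classify(audio: bytes) -> str:
    """Placeholder for the acoustic ER system; returns one of
    'boredom', 'sadness', 'anger', 'happiness', 'surprise'."""
    raise NotImplementedError    # assumed to be provided by the ER system


def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the connection (or raise on disconnect)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("AIBO disconnected")
        buf += chunk
    return buf


def run_server() -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()                    # the AIBO connects over the wireless link
        with conn:
            while True:
                # Assumed wire format: 4-byte length prefix, then raw audio samples.
                try:
                    (length,) = struct.unpack("!I", recv_exact(conn, 4))
                except ConnectionError:
                    break
                audio = recv_exact(conn, length)
                emotion = classify(audio)         # heavy lifting stays on the laptop
                conn.sendall(emotion.encode())    # AIBO maps the label to an action
```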
The actions performed by the AIBO, together with the code to stream speech and communicate wirelessly, are built around Sony’s MIND 2 software (‘aiboware’). Sony
has a Software Development Environment (SDE) for the AIBO series, known as Open
R SDE [12]. This contains various Software Development Kits (SDKs), which range
Fig. 2. System configuration for AIBO with integrated acoustic emotion recognition
from simple scripting (in R-Code) of actions, to providing control of individual joints or
wireless remote control. Other development languages exist including Tekkotsu [13]
(a C++ based framework to create a level of abstraction above Open R SDE); Univer-
sal Real-time Behavior Interface (URBI) [14] (similar to Tekkotsu’s abstraction from
Open R, but is based on a client/server model and supports other languages like Java
and Matlab as well as C++); and YART (Yet Another RCode Tool) [15] (a small pro-
gram to aid users in creating R-Code scripts for AIBO to perform simple actions such
as dancing). In this project we use the URBI environment to code the actions, data
capture and streaming. We have developed responses for the AIBO to perform, triggered by the owner’s emotional speech (Table 1).
The AIBO constantly listens for owner speech; however, each action is performed fully before the dog can change its emotive response, thereby preventing confusing actions from being exhibited.
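Put together, the response logic amounts to a lookup from recognised emotion to scripted behaviour plus a “finish the current action first” rule. The sketch below illustrates that logic; the action names are hypothetical stand-ins for the entries of Table 1, and the real implementation is written as URBI scripts rather than Python.

```python
# Illustrative stand-ins for the behaviours listed in Table 1 (hypothetical names).
RESPONSES = {
    "anger":     "whimper_and_cower",
    "sadness":   "approach_and_nuzzle",
    "happiness": "wag_tail_and_bark",
    "boredom":   "fetch_ball",
    "surprise":  "sit_up_and_tilt_head",
}


class EmotiveDog:
    """Maps recognised emotions to actions, finishing each action before
    a new classification is allowed to change the dog's behaviour."""

    def __init__(self) -> None:
        self.busy_until = 0.0          # time at which the current action ends

    def on_emotion(self, label: str, now: float, duration: float = 5.0):
        if now < self.busy_until:      # still acting out the previous response
            return None                # ignore new labels to avoid confusing behaviour
        self.busy_until = now + duration
        return RESPONSES.get(label, None)
```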
4 Published Studies
Previous studies have considered the human-robot relationship, and in particular in-
teractions between users and the AIBO robotic dog [16-18]. Surveys of visitors to an
online AIBO forum investigated the relationship between owner and robotic dog in terms of technological essences, life-like essences, mental states, social rapport and moral standing [16]. It was reported that most (75%) of the respondents surveyed made comments regarding AIBO as “a cool piece of technology”, 47% referred to AIBO biologically, and 14% attributed animism to it (in that it has some form of ‘spirit’).
In terms of mental states and emotion itself, 42% said that AIBO appeared to do
things purposely, while 38% reported it having real feelings (e.g. being angry when
the owner’s boyfriend talked to the AIBO). Some 27% of those surveyed also en-
gaged in a reciprocal exchange of information, where the AIBO responded in a par-
ticular way each time it was asked about something. We infer that although owners of
AIBO understand that the dog is mechanical and not alive, they attribute real-life characteristics of personality and intelligence to the AIBO.
The child-robot relationship was studied where preschool children compared the
AIBO with a stuffed toy dog [17]. Children were allowed to play equally with both dogs
and then asked questions relating to categories created in the online forum study [16].
Results showed that the children played/behaved the same way with the AIBO as they
did with the stuffed dog. However, the children interacted more with the AIBO “as if it were a live dog” and engaged in more shared action (e.g. one child noticed AIBO looking for its pink ball, brought the ball to the AIBO and expected AIBO to kick it).
A third study involved children from ages 7 to 15 comparing the AIBO with a real
dog (a female Australian Shepherd called “Canis”) [18]. The results showed that the children generally spent more time with the Australian Shepherd than with AIBO. Interestingly, however, interviews with the children showed that although they knew AIBO was a machine, they would attribute life-like qualities to it and treat AIBO as if it were real. When asked whether the AIBO or Canis “could know how you’re feeling?”, 68% of the children said Canis could and 22% said AIBO could.
We propose that, by integrating emotion recognition capabilities into the AIBO, we can enhance the human-robot interaction and improve the sense that the AIBO is alive.
5 Evaluation Method
The study evaluated whether the addition of acoustic emotion recognition technology
for the AIBO can enhance the human-robotic interaction and relationship and allow
the AIBO to appear more ‘alive’, with the ability to ‘feel’ and respond to the owner in
an emotionally intelligent manner.
16 participants completed the evaluation study (5 female, 11 male, aged between
19 and 34). All were aware of consumer-level robotics and entertainment robotic dogs; however, none of the participants had previously interacted with an AIBO.
The evaluation involved participants interacting with two different AIBOs:
• AIBO 1: a non-emotionally intelligent AIBO. This AIBO listens to the owner’s speech and performs the same set of actions; however, the ER system is disengaged and the actions are performed randomly.
• AIBO 2: an emotionally intelligent AIBO. This AIBO listens to the owner’s speech and passes it to the ER system, which recognises the owner’s emotional state and triggers the AIBO to perform the emotionally appropriate action.
The participants were split into two groups: the first group interacted with AIBO 1 first and then AIBO 2, whereas the second group interacted with AIBO 2 first and then AIBO 1. Participants were not told that the two AIBOs were different. Partici-
pants were informed that they could only interact with the AIBO using speech. They
were also told that the AIBOs could detect boredom, sadness/grief, frustration/
extreme anger, happiness and surprise but not told how the AIBOs would respond.
Participants were not constrained in how they could interact with the AIBO and what
they could say.
Participants spent on average 15 minutes interacting with each AIBO during which
their actions were observed and recorded, together with their speech, the ER output
and the AIBO responses. After interaction with both AIBOs participants were re-
quested to complete a questionnaire in which they scored using a 5-point Likert scale
how well each AIBO recognised each of the 5 emotional states, and how effective
each AIBO was at dealing with the emotional stimulus. The participant was then told
the difference between the two AIBOs and which of the two responded using the
emotion recognition technology. They were then asked additional questions about
how well the ER AIBO had performed and whether the ER system made the AIBO
appear more alive.
6 Results
The AIBO would only appear emotionally-intelligent if the emotion recognition tech-
nology performed accurately and robustly. The observations of interactions and
recorded speech files showed that the ER system correctly recognised the emotional
state with greater than 75% frequency, averaged over the 16 participants. This is in
line with the stated performance of the acoustic emotion recognition system for
speaker independent and utterance independent speech (see article ‘Acoustic Emotion
Recognition for Affective Computer Gaming’).
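The headline figure corresponds to a per-participant recognition rate averaged across the 16 participants; a minimal sketch, assuming the observation logs have been coded into (participant, intended emotion, recognised emotion) triples:

```python
from collections import defaultdict


def mean_recognition_rate(events):
    """events: iterable of (participant_id, intended_emotion, recognised_emotion)
    tuples coded from the observation notes and recorded speech files."""
    hits, totals = defaultdict(int), defaultdict(int)
    for pid, intended, recognised in events:
        totals[pid] += 1
        hits[pid] += int(intended == recognised)
    rates = [hits[p] / totals[p] for p in totals]
    return sum(rates) / len(rates)     # accuracy averaged over participants
```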
Results were also gained from responses to the questionnaires. Firstly, participants were asked to rate how well each dog responded to their (the owner’s) emotion using a 5-point
Likert scale where 1 represented poor and 5 excellent. Participants were not told
which dog was using emotion recognition. Results in Fig. 3 are compiled for non-ER
against ER AIBOs.
Fig. 3. Comparison of average ratings (5-point scale) of emotional responses for the non-ER and ER-enabled AIBOs across surprise, boredom, happiness, sadness and anger
With the non-ER AIBO, most emotions were rated around average, with happiness and anger (on average) considered the most effective responses (although it should be noted that the responses of the non-ER AIBO were performed randomly), Fig. 3. With the ER AIBO, the ratings of the emotional responses are higher for all emotions, most noticeably for anger. This is corroborated by participants reporting that anger was the easiest emotion for them to portray, Fig. 3. The ratings indicate that the ER AIBO performs with more emotional intelligence than the non-ER AIBO. We may also presume that the ER AIBO ratings would be higher still if the ER technology itself performed with 100% accuracy.
Before being informed which AIBO used the ER system, participants were asked
to rate, using a 5-point Likert scale where 1 represented poor and 5 excellent, each
dog for overall emotional intelligence, Fig. 4. The ER-enabled AIBO is rated more highly than the non-ER AIBO; however, the lower-than-expected ER AIBO result may be caused by the error rate of the ER system, noise in the testing environment, and the inability of participants to portray emotions on demand.
Participants were informed that one of the AIBOs was using an automated emotion
recognition system but not told which one. They were then asked to state which of the
two AIBOs they believed to be more ‘alive’, Fig. 5.
Fig. 4. Comparison of average rating of overall emotional intelligence for the non-ER and ER-enabled AIBOs (non-ER: 2.75, ER-enabled: 3.75, on a 5-point scale)
Fig. 5. Voting of which version users considered to be more alive (the ER-enabled AIBO received 11 of the 16 votes)
The results show that although the participants did not rate the ER AIBO’s responses significantly higher for each individual emotion, they could easily recognise which of the two AIBOs was more emotionally intelligent and more ‘alive’, Fig. 5. Comments from
participants echoed the statistical results with one participant saying about the ER en-
abled AIBO ‘better responses, seemed to respond accurately to what I said’, and an-
other participant said ‘able to react how a real dog would and convey such things as
empathy when needed’.
Participants were asked about their feelings towards interacting with robots able to
detect and act upon human emotion. Most participants were enthusiastic about the
future of emotionally-intelligent robotics, indicating that users can see the benefits of
affective computing for practical applications, Fig. 6. However, the technology requires further refinement before it will be widely adopted.
When asked whether the ER-enabled AIBO could be a substitute for a real dog, 60% of participants responded that it could not. Many comments related to the mechanical aesthetics and feel, which prevented full emotional engagement between the owner
Fig. 6. Participants’ feelings towards robots able to detect and act upon human emotion (uncomfortable / unsure / accepting / enthusiastic)
and AIBO, rather than to the ER technology and intelligence itself. Encouragingly, 40% of participants believed the ER-enabled AIBO could provide significant emotional engagement. In particular, participants considered the use of the ER AIBO with patients
in hospital settings where real pets would not be allowed. Additionally, the ER AIBO
could be used to help develop social and emotional interaction skills for children or
given to children with allergies.
7 Conclusion
The research considers the integration of acoustic emotion recognition technologies
into a consumer robot (AIBO) and evaluates the impact on the human-robotic interac-
tion and relationship. By comparing two AIBO configurations (one with random re-
sponses to human emotion, the other with automatic emotion recognition triggering
responses), we have been able to show that users can recognise which of the two ro-
bots is more emotionally intelligent. The addition of emotion recognition capabilities
to the robotic dog provides an intuitive and natural interaction mechanism. Participants were able to talk to the ER robotic dog, and the dog responded to the emotion in their voice. Today’s consumer robots offer considerable technological capabilities (such as
videocam, reading websites, reading RSS news feeds and mp3 music playback).
However, we believe that consumer robotics should offer more affective interactions
rather than technology convergence. In this way robotic dogs can act like real dogs
and offer the owner empathy, compassion and companionship.
8 Future Work
The integration and evaluation of affective technologies with consumer level robotics
is in its infancy. The science fiction of today may become the reality of tomorrow,
when we will have consumer robots which can recognise human emotion and respond
appropriately. Recognising emotion from acoustic cues in speech overcomes limita-
tions in camera resolution and dependencies on lighting required for expression and
gesture recognition. However future affective consumer robots should adopt a multi-
modal sensor input using acoustics, speech recognition, facial expression, gesture,
tactile and bio-physiological recognition. In the immediate future, feature calculation
and classification algorithms may reside on powerful remote PCs connected wirelessly with the consumer robot. However, with improvements in processing power, re-
duced CPU costs and more optimised algorithms, more of the affective intelligence
can be embedded within the robot. In addition, further work must consider how consumer robots should respond to human emotion: should the robot react to human emotion or attempt to improve the emotion of the owner? Should the robot have its own emotional state which can be affected by that of the owner? Should the robot have longer-term moods as well as short-term emotions? What types and ranges of emotions are appropriate, and in which situations and applications? Does emotional intelligence in robotics create a potential risk of human harm? Affective communication with consumer robots will offer the missing natural interaction which can make a mechanical, inanimate robot appear alive.
References
1. Digilog, Voice Analysis Insurance, Banking Fraud Internal Audit,
http://www.digilog.org/
2. Jones, C.M., Jonsson, I.-M.: Detecting Emotions in Conversations Between Driver and In-car
Information Systems. In: Proceedings of the International Conference on Affective Comput-
ing and Intelligent Interaction, Beijing, China (2005)
3. MIT Media Lab: Affective Computing Group, http://affect.media.mit.edu/
4. Fernandez, R., Picard, R.W.: Modeling Driver’s Speech under Stress. Speech Communica-
tion 40, 145–159 (2003)
5. Jones, C.M., Jonsson, I.-M.: Automatic Recognition of Affective Cues in the Speech of Car
Drivers to Allow Appropriate Responses. In: Proceedings of OZCHI Canberra, Australia
(2005)
6. Kaiser, S., Wehrle, T., Edwards, P.: Multi-Modal Emotion Measurement in an Interactive
Computer Game: A Pilot-Study. In: Frijda, N.H. (ed.) Proc Conf International Society of Re-
search on Emotions, pp. 275–279. ISRE Publications, Storrs (1994)
7. Jones, C.M., Sutherland, J.: Creating an Emotionally Reactive Computer Game Responding
to Affective Cues in Speech. In: Proceedings of British HCI Group Annual Conference, Ed-
inburgh, UK (2005)
8. LEGO.com Mindstorms, http://mindstorms.lego.com/
9. Official Robosapien site,
http://www.wowwee.com/robosapien/robo1/robomain.html
10. Sony AIBO Europe, Sony Entertainment,
http://www.sonydigital-link.com/AIBO/
11. Roux, J.: Pursuit Ability of a Robotic Pet, MSc Dissertation. Heriot-Watt University (2005)
12. Sony Corporation: AIBO SDE Official web site, Open R Development Platform for AIBO
robots, http://openr.aibo.com/
13. Carnegie Mellon University: Tekkotsu: Homepage, Open Source Development Environment
for Sony’s AIBO, http://www.cs.cmu.edu/~tekkotsu/
14. Baillie, J.-C./ENSTA/Gostai S.A.S.: URBI – Universal Real-Time Behavior Interface,
Robot Control Using the URBI Scripting Language,
http://www.urbiforge.com
15. AiboPet: AiboHack Main Page - AiboPet and other ’Pets, Information Regarding Program-
ming in AIBO R-Code Scripting Language Including R-Code Tool YART (Yet Another
RCode Tool), http://www.aibohack.com/
16. Friedman, B., Kahn, P.H., Hagman, J.: Hardware Companions?: What On-line AIBO Dis-
cussion Forums Reveal About the Human-Robotic Relationship. In: Proc. SIGCHI Confer-
ence on Human Factors in Computing Systems, Florida, USA, pp. 273–280 (2003)
17. Kahn, P.H., Friedman, B., Freier, N.G., Perez-Granados, D.R.: Robotic Pets in the Lives of
Preschool Children. In: CHI 2004 Extended Abstracts on Human Factors in Computing Sys-
tems, Vienna, Austria, pp. 1449–1452 (2004)
18. Kahn, P.H., Friedman, B., Melson, G.F., Beck, A.M., Roberts, T., Garrett, E.: Robots as
Dogs?: Children’s Interactions with the Robotic Dog AIBO and a Live Australian Shepherd.
In: CHI 2005 Extended Abstracts on Human Factors in Computing Systems, pp. 1649–1652
(2005)
In the Moodie: Using ‘Affective Widgets’ to Help
Contact Centre Advisors Fight Stress
service advisors’ day and investigated how they could record their ‘emotional weather’
as the hours ticked past.
Fig. 2. ‘Moodies’
This action releases a ‘moodie’ (see Figure 2), an animated stick figure that struts
up and down the screen with an exaggerated stride and body inclination. This posture
is designed to express a certain amount of frustration and personality [9].
Advisors may be feeling ‘undesired’ emotions that they cannot express to the cus-
tomer. However, they may feel the need to release these emotions in some way. This
was usually observed in the call centre in the form of (unseen) gestures and (unheard)
comments to the customer during or after the call.
Frustration and anger can be a vicious circle because advisors have to deal both with the source of their frustration (usually either the customer or feelings of helplessness relating to company process and policy) and with the emotional reaction
itself. This is an aversive state that people tend to try and avoid or escape and is posi-
tively linked with emotional exhaustion and job dissatisfaction [8].
Klein et al. [10] found that allowing systems to actively acknowledge and support user frustration and anger helped the user’s ability to manage and recover from negative emotional states. This can act as a symbolic game, allowing a belated emotional mastery over the event. Reynolds and Picard [11] suggest that “user interface widgets”, e.g. a ‘frustrometer’ or thumbs up/down, can be used to actively express user frustration through direct user manipulation. The computer then needs to respond in a socially appropriate manner [9, 12].
The moodie is an example of an ‘affective widget’ that can be unleashed by the
physical action of throwing the customer capsule into the waste bin. This physical ex-
pression of emotion is akin to the kind of reaction that may have occurred in the physi-
cal world (i.e. throwing paper in a bin). It seeks to emulate the essence of that physical
experience in a virtual space whilst tying it to the task (i.e. the call) via the customer capsule. During evaluation of the prototype, users described this as “throwing the customer in the bin” and reported a visceral feeling of “naughtiness mixed with triumph”.
Wensveen et al. [13] have supported the use of physical action to express emotion
rather than the more common use of physiological data. Since people express and
communicate their emotions through behaviour, this behaviour is a source of direct
information about the emotions. It also does not require any direct physical interven-
tion or expensive hardware, as with physiological data capture techniques. The dis-
advantages are that it cannot communicate the severity of the incident and it does
require the user to actively apply effort [11].
The resulting moodie can be used as a non-linguistic, visual indication of state of
mind as well as a humorous and slightly subversive outlet to relieve stress [14].
The ability of humour to build a sense of community in the workplace has also
been demonstrated [21, 22]. Understanding humour presupposes a shared context and
this shared context can be reinforced within many groups by the use of humour [23].
However, the workplace is different from other places in two ways. There is a task to
be done and that task has been assigned to, rather than selected by, those who are
undertaking it. Because of the task focus, there is somewhat less tolerance in a work-
place for “distractions” like humour. However, Venkatesh notes that the concept of
playfulness can be divided into unproductive play (something that is unproductive,
pleasant and involving) and playful work (something that is productive, pleasant and
involving) [24]. The latter can be used to enhance employees’ job experience and, as
a result, positively influence the resulting customer experience that they deliver [25].
Braverman claims that humour at work can add to the bottom line [19]. Goodman and Gorkin write that humour can relieve tension, enhance relationships and motivate people [26]. Feigelson claims that employees who have fun at work are less likely to be late or absent, and that job turnover falls while motivation and productivity climb [27].
Morkes, Kernal and Nass found that humour, where used appropriately on an inter-
face, did not result in task distraction and could add to likeability and acceptance [28].
However, Reeves and Nass warn against adding additional cognitive load to the advisor
by drawing attention to an animation and taking concentration from the task at hand [9].
Moodies can, therefore, be deleted, shared or stored by the advisors at any point.
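To make this interaction pattern concrete, the following is a minimal sketch of how an affective widget along these lines might be modelled. The class names, fields and methods are illustrative assumptions on our part and are not taken from the MUI prototype.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Moodie:
    """Affective widget released when a call capsule is thrown in the waste bin."""
    call_id: str
    created: datetime = field(default_factory=datetime.now)
    shared_with: List[str] = field(default_factory=list)  # buddies or team manager


class MoodieTray:
    """Holds an advisor's moodies so they can be deleted, shared or stored."""

    def __init__(self) -> None:
        self.moodies: List[Moodie] = []

    def bin_call(self, call_id: str) -> Moodie:
        # The physical 'throw into the bin' gesture releases a new moodie.
        moodie = Moodie(call_id)
        self.moodies.append(moodie)
        return moodie

    def share(self, moodie: Moodie, recipient: str) -> None:
        # Disclosure stays under the advisor's control: nothing is sent automatically.
        moodie.shared_with.append(recipient)

    def delete(self, moodie: Moodie) -> None:
        self.moodies.remove(moodie)
```

The key design point captured here is that the widget is created only by the advisor’s own gesture and shared only by the advisor’s explicit choice.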
One point of discussion about the moodie is how long it remains effective. There are opposing views about the effect of novelty on humour. On one hand, Descartes’ theory of surprise in humour would posit that the moodie’s novelty would soon wear off with repeated use [29]. On the other hand, whilst surprise is a ubiquitous quality in jokes (especially in the punch line), it does not seem essential, given the enduring quality of comedy routines [30]. If the former is true, the moodie may need to incorporate some element of unpredictability, e.g. not always having a stick man as a response to throwing the call into the bin.
Moodies can also be used to paint a picture of an advisor’s day. Macdonald and Sir-
ianni point out that the advisor’s daily work experience is “often one of a series of
minor complaints assuming major proportions for the customer” [31]. Suppression of
these feelings can then cause stress problems for the advisor [10]. If the advisor has
had a tough day, they can choose to send moodies to their buddies or to their man-
ager. This is a similar device to the ‘affective awareness GUI widgets’ that have been
used by Garcia et al. to support emotional awareness in computer supported collabo-
rative work [32].
Self-disclosure and privacy could also be an issue here [33], especially since stress at work is still somewhat stigmatised. To address this, the advisor is in control of who sees the moodie. They can choose to reveal their emotional state to
their buddies or to their manager. This allows users to control what sort of emotional
data is collected on them rather than invading their privacy [11] and prevents emo-
tional data from becoming another element of call centre monitoring. They are also
only likely to use this as a channel of communication where they perceive there to be
a benefit to them [12]. Earley found that the greater the extent to which employees have control over the type of performance data collected and presented, the greater the impact on their motivation and performance [34].
In terms of this form of emotional self-report, there is an argument that self-rating
of stress is too subjective to be of use and that self-ratings can be over-inflated and
inaccurate, particularly via electronic communication [35]. The biggest problem is
associated with the basic nature of the question (i.e. “how do I feel?”) and the vaga-
ries of self-insight.
Extreme and moderacy response styles describe a respondent’s tendency to respond consistently at the extremes, or conversely at the midpoint, of a rating scale. Culture, education and age can
also exert an influence on this response bias. Another distortion is response sets, i.e. a
conscious or unconscious attempt on the part of the respondent to create a certain
socially desirable impression [36]. In some circumstances, respondents may be
tempted to give the socially desirable response rather than describe what they actually
think, believe or do. A distinction can be made between socially desirable responding that serves to present oneself in a favourable light to others (impression management) and responding that serves a self-esteem preservation function (self-deception or ego defense) [37]. Fundamental to this is the assumption that the management
culture of the call centre is not one of fear since, as Pfeffer and Sutton say, “fear
causes people to cheat, conceal the truth, focus on the short term and focus on the
individual” [38]. This would not be conducive to moodie usage.
Self-report tools may provide users with the means to say how they are feeling, but
the onus lies with the advisor to both label and disclose the emotion. A large body of
research has detected significant direct and indirect effects of Negative Affectivity
(NA: [39]) on job stress and stress related coping mechanisms. It is a logical extrapo-
lation that those people with high NA rating would be prone to report their current
emotional state in a more negative manner. This suggests that if traditional self-report
measures were used it may be prudent to determine an individual’s ‘base-line’ NA
and determine the current emotional state as a deviation from that base. Conversely,
the issue is circumvented if the assessment method is objective in the sense that there is no readily discernible connection between the items or input and the emotions being measured.
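As a simple illustration of that base-lining idea (the function and the rating scale are hypothetical, not part of the MUI design):

```python
from statistics import mean
from typing import Sequence


def deviation_from_baseline(current_rating: float,
                            prior_ratings: Sequence[float]) -> float:
    """Express a self-reported state as a deviation from the advisor's own
    baseline (here approximated by the mean of earlier self-ratings), so that
    habitually negative raters are not flagged simply for being themselves."""
    return current_rating - mean(prior_ratings)
```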
This is why the data needs to be interpreted by a team manager who knows the in-
dividual and can use the emotional information in constructive ways. Rather than just
using call handling statistics that may not paint a true picture of the advisors’ day, this
provides an “information enriched environment” [40] allowing the use of job and
social resources to manage job demands and reduce stress.
subsequent incarnations of MUIs have investigated other ways of using the user inter-
face to express stress and emotion.
In an attempt to recreate the popularity of the moodie design amongst users, other
emotional widgets were prototyped in these new MUI designs. One such device was a
‘splatty’ - a pump-action ‘splat gun’ (see Figure 3). This involved using the mouse as a
frustration release device to shoot paint balls at the screen. Again, the option was
given to send these splats to colleagues or managers, who could, if necessary, provide
support, sympathy or advice.
However, users seemed to find this less of an emotional release than the moodie.
The manager’s feedback mechanism was received well, with the proviso that “man-
agement don’t use it to penalise me for getting stressed”. In terms of the splat gun, the
users liked the emotional release of shooting the paint. However, they felt somewhat disconnected from what they were shooting at. One explanation for this was that users
found it easier to anthropomorphise the moodie and empathise with it [41].
References
1. Adams, D., Nelson, P., Todd, P.: Perceived usefulness, ease of use, and usage of information
technology: a replication. MIS Quarterly, 227–247 (June 1992)
2. Gignac, A., Appelbaum, S.: The impact of stress on customer service representatives: a com-
parative study. Journal of Workplace Learning 9(1), 20–33 (1997)
3. Feinberg, R., Kim, I., Hokama, L., De Ruyter, K., Keen, C.: Operational determinants of call
satisfaction in the call center. International Journal of Service Industry Management 11(2),
131–141 (2000)
4. Millard, N., Hole, L., Crowle, S.: From Command to Control: Interface Design for Customer
Handling Systems. In: Howard, S., Hammond, J., Lindgaard, G. (eds.) Human-Computer In-
teraction: INTERACT 1997, pp. 294–300. Chapman & Hall/ IFIP (1997)
5. Millard, N., Hole, L., Crowle, S.: Smiling Through: Motivation At The User Interface. In:
Bullinger, H., Ziegler, J. (eds.) Human-Computer Interaction: Ergonomics and User Inter-
faces, pp. 824–828. LEA, London (1999)
6. Hole, L., Crowle, S., Millard, N.: The Motivational User Interface. In: May, J., Siddiqi, J.,
Wilkinson, J. (eds.) HCI 1998 Conference Companion, pp. 68–69. BCS (1998)
7. Millard, N.J.: Designing Motivational User Interfaces: Can A Balance Between Effective
And Affective User Interface Design Be Used To Motivate Call Centre Advisors? PhD The-
sis, Lancaster University, UK (2005)
8. Abraham, R.: Emotional Dissonance in Organisations: Antecedents, Consequences and
Moderators. Genetic, Social and General Psychology Monographs 124, 229–246 (1998)
9. Reeves, B., Nass, C.I.: The Media Equation: How People Treat Computers, Television and
New Media Like Real People and Places. Cambridge University Press, Cambridge (1996)
10. Klein, J., Moon, Y., Picard, R.W.: This Computer Responds to User Frustration. Interacting
with Computers 14, 119–140 (2002)
11. Reynolds, C., Picard, R.W.: Designing for Affective Interactions. MIT Media Lab (2001)
12. Picard, R.W., Klein, J.: Computers that Recognise and Respond to User Emotion: Theoreti-
cal and Practical Implications. MIT Media Lab Tech Report No 538 (2001)
13. Wensveen, S., Overbeeke, K., Djajadiningrat, T.: Touch Me, Hit Me and I Know How You
Feel: A Design Approach to Emotionally Rich Interaction. ID Studio Lab, Delft University
of Technology, Netherlands (2000)
14. Taylor, P., Bain, P.: Subterranean Worksick Blues: Humour as Subversion in Two Call Cen-
tres. Organization Studies (November 2003)
15. Abramis, D.J.: Play at Work: Childish Hedonism or Adult Enthusiasm? American Behav-
ioural Scientist 33(3), 353–373 (1990)
16. Koestler, A.: The Act of Creation: A Study of the Conscious and Unconscious Processes of
Humour. Scientific Discovery and Art. Hutchison Press, London (1964)
17. Minsky, M.: Jokes and their Relation to the Cognitive Unconscious. In: Vaina, L., Hintikka,
J. (eds.) Cognitive Constraints on Communication: Representations and Processes, Reidel,
Hingham (1984)
18. Spencer, H.: The Physiology of Laughter. Macmillan’s Magazine 1, 395–402 (1860)
19. Braverman, T.: Enhance Your Sense of Self Mirth: Fun in the Workplace: Philosophies Keep
Firms Laughing All the Way to the Bank. Training and Development 47(7), 9–11 (1993)
20. Burns, C.A.: Comparative Analysis of Humour Versus Relaxation Training for the Enhance-
ment of Immunocompetence. Dissertation Abstracts International 57(08-B), 5319 (1996)
21. Hayworth, D.: The Social Origin and Function of Laughter. Psychological Review 35, 367–385
(1928)
22. Meyer, J.C.: Humour in Member Narratives: Uniting and Dividing at Work. Western Journal
of Communication 61, 188–208 (1997)
23. Zilberg, N.: In-group Humour of Immigrants from the Former Soviet Union to Israel. Israel
Social Science Research 10(1), 1–22 (1995)
24. Venkatesh, V.: Creation of Favorable User Perceptions: Exploring the Role of Intrinsic Mo-
tivation. MIS Quarterly 23(2), 239–260 (1999)
25. Heskett, J.L., Sasser, W.E., Schlesinger, J.L.: The Value Profit Chain: Treat Employees Like
Customers and Customers Like Employees. Free Press (2003)
26. Goodman, J.B.: Laughing matters: Taking your job seriously and yourself lightly. Journal of the American Medical Association 267(13) (1992)
27. Feigelson, S.: Energize Your Meetings With Laughter. Association for Supervision and Cur-
riculum Development (1998)
28. Morkes, J., Kernal, H., Nass, C.: Effects of Humour in Computer-Mediated Communication
and Human-Computer Interaction. In: Proceedings of the Conference of Human Factors in
Computer Systems (CHI 1998). ACM Press, Los Angeles (1998)
29. Descartes, R.: Les Passions de L’Ame, Paris (1649)
30. van Thriel, C., Ruch, W.: The Role of Surprise in Humour Appreciation. In: Eleventh Inter-
national Conference on Humour, Grand-Duche de Luxembourg (September 1993)
31. Macdonald, C.L., Sirianni, C.: Working in the Service Society. Temple University Press,
Philadelphia (1996)
32. Garcia, O., Favela, J., Licea, G., Machorro, R.: Extending a Collaborative Architecture to
Support Emotional Awareness. CICESE, Mexico (1999)
33. Howard, S., Vetere, F., Gibbs, M., Kjeldskov, J., Pedell, S., Mecoles, K., Bunyan, M., Mur-
phy, J.: Mediating Intimacy: Digital Kisses and Cut and Paste Hugs. In: Proceedings of HCI
2004: Design for Life, 2, Leeds. British Computer Society (2004)
34. Earley, P.C.: Computer Generated Performance Feedback in the Magazine Subscription In-
dustry. Organisational Behaviour and Human Decision Processes 41, 50–64 (1988)
35. Weisband, S., Atwater, L.: Evaluating Self and Others in Electronic and Face-to-face Groups.
Journal of Applied Psychology 84, 632–639 (1999)
36. Knapp, H., Kirk, S.: Using pencil and paper, Internet and touch-tone phones for self-
administered surveys: does methodology matter? Computers in Human Behaviour 19, 117–134
(2003)
37. Fox, S., Schwartz, D.: Social desirability and controllability in computerized and paper-and-
pencil personality questionnaires. Computers in Human Behaviour 18, 389–410 (2002)
38. Pfeffer, J., Sutton, R.: The Knowing-Doing Gap. Harvard Business School Press (November
1999)
39. Watson, D., Clark, L.: Negative Affectivity: The disposition to experience unpleasant emo-
tional states. Psychological Bulletin 95, 465–490 (1984)
40. Amick, B.C., Smith, M.J.: Stress, Computer Based Work Monitoring and Measurement Sys-
tems: A Conceptual Overview. Applied Ergonomics 23, 6–16 (1992)
41. Nowak, K.L., Biocca, F.: The Effect of the Agency and Anthropomorphism on Users’ Sense
of Telepresence and Social Presence in Virtual Environments. Presence: Teleoperators and
Virtual Environments 12(5) (2003)
Feasibility of Personalized Affective Video Summaries
1 Introduction
Video summaries are sequences of still or moving images, optionally with accompa-
nying audio, which present the content of a video stream in such a way that the
respective target user group is rapidly provided with concise information about the
content, while the essential message of the original stream is preserved [1]. The demand
for video summaries originates primarily from users’ viewing time constraints [2], typi-
cally when users need to assess a set of candidate video streams, e.g. when presented
with a set of search results. Video streams often have a lengthy duration; hence there
is a need to reduce user effort by providing a concise version of each stream. Tradi-
tionally, video summaries are produced by analyzing the underlying semantic content
of the original video stream to determine what segments of the stream should be in-
cluded and excluded in the summary; however, recent approaches have sought to look
outside of the video stream, particularly to context and user as potential sources for
determining content significance. Video frequently invokes certain emotions, attitudes
or moods in the viewer and such responses are likely to be heightened in proportion to
the significance of the content to the viewer at the time. Consequently, if these re-
sponses could be captured and analyzed sufficiently, could they be used to form the
basis of a video summarization technique that personalized the video summary for an
individual user? To answer this question, we have undertaken a feasibility study that
collected and analyzed data representing a user’s physiological responses to video
content. The organization of this article is as follows. Section 2 provides an overview
of video summarization techniques. Section 3 discusses the relevance of affective data
for video summarization. Sections 4–6 present the design, analysis method and results
of the feasibility study. Section 7 discusses the research implications. Section 8 makes
some concluding remarks.
no research work has been undertaken to produce affective video summaries from
sources external to the video stream; namely from users’ physiological responses.
Consequently, we consider whether users’ physiological responses may serve as a suit-
able external source of information for producing individually-personalized affective
video summaries (via an external or hybrid technique). User responses are likely to be
most significant during the segments of a video stream that have most relevance to
that user, since these will tend to be the segments that have the most impact and are
the most memorable; hence, it is these segments that are the foremost candidates for
inclusion within a summarized version of the video stream.
Fig. 2 shows how physiological responses may be used to achieve personalized
affective video summaries. Initially, the user views the full video stream while physiological responses are captured and measured and the most significant responses are identified. After viewing, the temporal locations of these significant responses are then associated with the viewed video content and the video summary is
created. The result is personalized affective video summaries; that is, video summaries
that incorporate the video segments that elicited the most significant physiological re-
sponses in the user during viewing. This approach is specifically formulated to be appli-
cable within the context of real-time viewing of video content, where analysis of user
physiological responses is taken to directly reflect the user’s personal experience of the
video content at the time of viewing. When considering one individual user viewing the
same video content on more than one occasion, it is likely that their physiological re-
sponses to the same content may be different on each viewing occasion and this would
be reflected in any further video summaries produced. Therefore, considerations of a
user’s overall mood and long-term affective state at the time of viewing are inherently
taken into account as they are reflected in the user’s physiological responses to the video
content at the time of viewing, and thus the video content selected for inclusion in a
video summary also reflects this. Since users generally recall video content based on
their overall mood and long-term affective state at the time of viewing, this approach is naturally in step with a user’s holistic experience of the video content.
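A compact sketch of this workflow is given below, under the assumption that functions for scoring responses and cutting the stream are supplied elsewhere; all names are illustrative rather than part of the study’s implementation.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[float, float]  # (start_s, end_s) within the original stream


def personalized_summary(responses: Sequence[float],
                         windows: Sequence[Window],
                         significance: Callable[[Sequence[float], Window], float],
                         top_n: int) -> List[Window]:
    """Return the temporal windows whose physiological responses were most
    significant for this viewer; concatenating them yields the summary."""
    ranked = sorted(windows, key=lambda w: significance(responses, w), reverse=True)
    return sorted(ranked[:top_n])  # restore chronological order for playback
```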
4 Experiment Design
To investigate the use of physiological response data for determining candidate video
segments for inclusion in a video summary, we undertook a feasibility study in the
form of a laboratory experiment with ten users (6 males, 4 females). All users re-
ported good eyesight and hearing, were non-smokers, and considered themselves to
be experienced viewers of film and TV. To ensure a wide range of video content was
utilized, popular films and TV shows were chosen from five different genres: ac-
tion/sci-fi, horror/thriller, comedy, drama/action, and drama/comedy. This was desir-
able since it was possible that specific video genres would not be suited to eliciting
physiological responses in the user sufficiently for discerning candidate video seg-
ments for inclusion in a video summary. This could not be known in advance and,
consequently, the use of a range of genres provided the potential for gaining some
indicative understanding. To this end, three films were chosen from the IMDb Top
250 Films together with two award-winning TV shows (see Table 1).
Due to the large amount of data that would otherwise be generated, one 15–18 m
video segment (VS) was chosen as being representative of the content of each of the
five full length video streams. Three video sub-segments (VSSs) from each of the
VSs were manually selected as representing the most pertinent segments of each VS,
and were either 15 or 30 s in length to best match the most pertinent content found
within the video content. The manual selection process was carried out independently
by five individuals and a final set of VSS selections was chosen through a process of
comparison and discussion. It was considered important to involve a number of indi-
viduals in the process to ensure that the VSS selections represented, to a degree, a
generalized view of the most pertinent segments of each VS so that they stood a good
chance of also being pertinent to the subjects taking part, thereby increasing the like-
lihood of obtaining usable data. However, the intention of this particular study was
not to determine if the segments correlated with each user’s own personal choices,
since that would rely on the assumption that physiological responses could already be
used to determine the significance of segments, for which there is no precedent in
the research literature. The focus of this study was instead on whether physiological
responses may be suitable for video summarization at all, that is, whether they are
even sensitive enough to distinguish between video segments, so that this could be
established as a valid assumption for future research.
To control for order effects, VSs were allocated randomly to each user in a way
that ensured each of the five VSs had a reasonable number of viewings once all ten
experimental sessions had been completed, i.e. a minimum of five and maximum of
six viewings per VSS. Thus, each user viewed three of the five VSs in their entirety.
Each VS was preceded by a 4 m interval during which relaxing music was played to
control for the possible effects of pre-experimental nerves and previously viewed
video content. Users were given 40 s to read a textual synopsis of the story so far
before the VS commenced, to ensure understanding of the VS storyline by all.
A video viewing room and observation room were set up (see Fig. 3), separated by
a one-way mirror which allowed unobtrusive observation of the user during the ses-
sion. All VSs were played back via a laptop connected to a projector which displayed
the image on a 42” screen. Clear viewing was ensured by switching off all lights, with
the exception of a small, low-intensity lamp near the user, which provided sufficient
illumination for observation throughout the session.
Physiological data were captured using the ProComp Infiniti system and BioGraph
Infiniti software from Thought Technology. Users’ BVP and HR were recorded by
measuring levels of light absorption of blood through capillary beds in the finger,
using the HR/BVP-Flex/Pro sensor (SA9308M) connected to the peripheral phalanx
of the middle finger. EDR was measured with the SC-Flex/Pro Sensor (SA9309M)
connected to the middle phalanxes of the index and ring fingers. The sensor passes a
small electrical charge between electrodes, measuring the level of electrical conduc-
tivity in the skin. RR and RA were measured with the Resp-Flex/Pro Sensor (SA9311M),
which is an elasticized band connected around the thorax. The band expands and
contracts as the user breathes in and out. Physiological measures were captured at the
standard sampling rates for each sensor, which were 256 Hz for EDR and 2048 Hz for
the respiration and BVP/HR sensors. Fig. 4 shows how the respective sensors were
attached to the user.
1. Dataset preparation: Each set of experiment session data was temporally seg-
mented into VS datasets using the ‘set event marker’ option within the BioGraph
Infiniti application, so that each user’s viewing of each VS was represented by a
physiological response dataset. These were then interpolated and converted into an 8 Hz format to reduce processing overhead, and exported in CSV format to allow for analysis outside of the Infiniti application. Even at the considerably reduced 8 Hz sampling rate, each physiological response measure produced a substantial amount
of data. For example, a 15 m experimental session would produce 7,200 observa-
tions per physiological measure; therefore with five physiological measures a total
of 36,000 observations per 15 m session were collected. Given the significant
number of observations, it was necessary to automate the analysis process as much as possible. In this instance, customized scripts were developed to carry out the remaining stages (a simplified sketch of Stages 2–4 and the significance thresholds is given below).
2. User response values: For each VSS within each VS, the user’s average responses
for HR, BVP, RA and RR were calculated. This was achieved by calculating, for
each respective physiological response measure, the average of all observations
corresponding with the temporal location of each respective VSS. The VSS tempo-
ral locations were also used for EDR; however, the signal was detrended using an
approach similar to that of van Reekum et al. [27]. The reading at the start of the
time period was subtracted from the highest value within the time period, which
then represented the maximum level of deflection the stimuli elicited within the
specified time period. This constitutes a method for evaluating local fluctuations in
the EDR signal regardless of unpredictable baseline variations. The output of this
stage was a set of user response values (U values).
3. Signal values: 15 and 30 s moving average windows for HR, BVP, RA and RR and
detrended moving time windows for EDR were then constructed. This was achieved
in a similar fashion to Stage 2, but applied to all observations of each individual
physiological measure within the dataset. This procedure produces additional signal values (S values) for each user’s physiological response measure for each VS dataset, which serve as a baseline against which the significance of user responses to individual VSSs could be calculated.
4. Percentile rank user response values: Percentile rank U values (PRU values) were
then derived by calculating percentile rank values for the corresponding U values
as a function of the constructed S values. Therefore, S values served as the popula-
tion against which U values could be ranked, thereby facilitating the calculation of
the significance of user responses to respective VSSs as a function of the users’
overall responses to respective VSs.
While any proportion of the sample could be used, for the purposes of this study,
we used 30% of the sample since this was found to provide a range of responses suf-
ficient for discerning significant candidate video segments. Thus, we considered sig-
nificant responses to VSSs as those equal to or greater than the 85th percentile or
equal to or less than the 15th percentile for RA, RR, BVP and HR, and those equal to
or greater than the 70th percentile for EDR. This reflects the fact that EDR percentile
rank values represent the level of positive deflection of the signal, and hence values
less than or equal to the 15th percentile do not necessarily represent significant re-
sponses, but rather reflect little or no significant response.
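For concreteness, the following is a condensed sketch of Stages 2–4 and the thresholds above, assuming each measure is available as an 8 Hz NumPy array; the function names and array handling are ours, not the study’s customized scripts.

```python
import numpy as np

FS = 8  # Hz, after the Stage 1 downsampling


def window_mean(signal: np.ndarray, start_s: float, length_s: float) -> float:
    # Stage 2 for HR, BVP, RA and RR: average response over one VSS window.
    i = int(start_s * FS)
    return float(signal[i:i + int(length_s * FS)].mean())


def window_edr_deflection(signal: np.ndarray, start_s: float, length_s: float) -> float:
    # Stage 2 for EDR: maximum deflection above the window's starting level,
    # which disregards unpredictable baseline drift (cf. van Reekum et al. [27]).
    i = int(start_s * FS)
    w = signal[i:i + int(length_s * FS)]
    return float(w.max() - w[0])


def signal_values(signal: np.ndarray, length_s: float, stat) -> np.ndarray:
    # Stage 3: the same statistic over sliding windows spanning the whole VS,
    # giving the S values that act as the ranking population.
    n = int(length_s * FS)
    return np.array([stat(signal, t / FS, length_s)
                     for t in range(len(signal) - n + 1)])


def percentile_rank(u_value: float, s_values: np.ndarray) -> float:
    # Stage 4: rank the U value against all S values (0.0 to 1.0).
    return float((s_values <= u_value).mean())


def is_significant(pru: float, measure: str) -> bool:
    # Thresholds used in this study: top or bottom 15% for RA, RR, BVP and HR;
    # top 30% for EDR, whose rank reflects positive deflection only.
    if measure == "EDR":
        return pru >= 0.70
    return pru >= 0.85 or pru <= 0.15
```

As a purely illustrative usage, a 15 s VSS starting at, say, 312 s into a VS would be scored as `percentile_rank(window_mean(hr, 312, 15), signal_values(hr, 15, window_mean))`.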
6 Results
This section presents the individual and aggregate results of applying the data analysis
method from the previous section to the subject data. At an individual user level,
physiological responses to VSSs varied widely. For example, VS1:VSS3 (action/sci-fi
content) elicited significant RA, BVP and HR responses in User #2, but only elicited
a significant HR response in User #6. Similarly, VS2:VSS1 (horror/thriller content)
elicited significant EDR, RR and HR responses in User #4 and significant EDR and
RA responses in Users #2 and #10, but elicited no significant responses in Users #5
and #9. VS3:VSS1 (comedy content) elicited no significant responses in User #1, but
elicited significant EDR, RA, RR and BVP responses in User #10. VS4:VSS2
(drama/action content) elicited a significant EDR response only in User #7, but elic-
ited significant EDR, RA, RR and HR responses in User #2 and significant EDR, RA
and RR responses in Users #4 and #9. VS5:VSS1 (drama/comedy content) elicited
significant EDR and RR responses in User #8, but did not elicit any significant re-
sponses in Users #9 and #10. Given this wide range of responses, we now review in
more detail the VSS from each content genre that has the highest overall percentage
of significant responses:
− Action/sci-fi (VS1:VSS1): All VSSs in The Matrix elicited similar overall percent-
ages of significant response (36%), thus VSS1 is selected for review here. EDR
produced the most marked response, which was significant in all five users: the lowest response was 0.801 and the remaining four were 0.958 and above, well in excess of 0.85. RA was not significant for any of the users, while RR was significant for two of the five users. User #1 took significantly fewer breaths during this VSS, with a PRU value of 0.104. The other user, #7, took significantly more breaths in this period, with a PRU response value of 0.939. BVP was significant for
one out of five cases, with a reading of 0.990, which indicates very significant in-
creased blood volume to the body’s peripherals. HR was only significant for one
user with a response of 0.915. However, all five users appeared to have raised HR
in this VSS.
− Horror/thriller (VS2:VSS2): EDR was significantly raised for four of the five us-
ers. User #10 had the highest significant rise in EDR with a value of 0.991. RA was reduced in all five cases; however, User #2 gave the only significant RA response, with a value of 0.004, indicating significantly shallower breaths during this VSS. RR was significantly high in two of the five cases: User #2 with a percentile rank of 0.876 and User #9 with 0.860. BVP was
significant in four of the five cases. The general trend was a reduced BVP, with
three of the four significant cases showing lowered BVP, indicating a constriction
of blood to the peripherals. Interestingly, User #4 experienced significantly in-
creased blood flow to the peripherals, with a percentile rank of 0.884. Users #5 and
#9 showed significant increases in HR with 0.929 and 0.940 respectively. 52% of
all the physiological responses elicited by this VSS were found to be significant.
This was the third highest percentage of all VSSs evaluated in the experiment.
− Comedy (VS3:VSS3): Although VSS3 elicited several significant responses, no
EDR values were significant. User #3 took significantly shorter breaths during this
VSS with a PRU value of 0.099. User #6 showed the most significant increase in
RA with a value of 0.943 and User #8 showed the next most significant value of
0.932. RR was varied with three significant response values, two showing signifi-
cantly faster breaths: User #3 had the highest significance with a value of 0.949,
User #1 had the next highest value with 0.911, and User #6 took significantly
slower breaths during this VSS with a value of 0.120. BVP tended to be signifi-
cantly reduced with four of the six users experiencing significantly constricted
blood volume in their peripherals. User #3 showed the most significant constriction
with a value of 0.035. HR tended to increase during this VSS with five of the six
users showing some increase in HR. User #8 showed the most significant increase
with 0.991. User #10 was the only user to show a significantly reduced HR with a
value of 0.091. A total of 53.34% of all the physiological responses elicited by this
VSS were found to be significant. This was the second highest proportion of sig-
nificant responses of all VSSs evaluated in this experiment.
− Drama/action (VS4:VSS2): All six users experienced significantly raised EDR.
User #9 had the lowest significant response with a value of 0.848, and the highest
response was elicited in User #2 with a response value of 0.999. RA increased for
four of the six users, of which User #9 showed the most significant increase with
0.899. User #2 appeared to take shorter breaths with a significant response in the
lower 50th percentile of 0.035. Users #9 and #4 showed a marked decrease in RR
with response values of 0.40 and 0.122 respectively. User #2 showed a marked in-
crease in RR with a value of 0.993. BVP was not significant for any user, while
HR was significant for three users; User #6 had the most significant increase in HR
with 0.984 and User #8 had the most significant decrease in HR with 0.056.
− Drama/comedy (VS5:VSS3): EDR was significantly elevated for four of the six
users. User #8 had the most significant response with 0.931 and User #10 was next
with 0.871. RA was lowered for four of the six users, three of which experienced
significantly lowered amplitude. User #8’s breaths were most significantly reduced
with a value of 0.102. RR was elevated in four of the six cases, but none were sig-
nificant rises. User #4 had the only significant response indicating reduced RR at
0.15. Four of the six responses for BVP were in the lower 50th percentile, two of
which were significant reductions in peripheral blood volume: these were User #10
with 0.125 and User #8 with 0.133. User #5 responded with increased blood vol-
ume to the peripherals with a value of 0.923. HR was in the lower 50th percentile
for five of the six cases, four of which were significantly low. User #10 was the
lowest of these with 0.043.
From the above, it seems that user physiological responses to video content varied
significantly for individual users and that this may reflect individual users’ personal
experiences of video content. In turn, this may suggest that physiological responses
may serve as a usable source of external (user-based) information and show potential
as an aid for personalized affective video summaries.
To examine the overall user physiological responses to the VSSs, EDR, RA, RR,
BVP and HR response measures were aggregated for each VSS and compared. To see
how each physiological response measure relates to each content genre, the numbers of significant user response values observed in the three VSSs within each content genre were added together and presented as the percentage of significant responses out of the total responses observed for each VS. As discussed in Section 4,
since this study was concerned with the feasibility of using physiological responses to
video content as a means for developing video summaries, the primary purpose of
analyzing the data in this way was to establish whether user physiological response measures were sufficiently sensitive to the content of specific VSSs for responses to be deemed significant. It is therefore important to note that aggregated response
values were not calculated with the primary aim of ascertaining general population
trends. Nevertheless, since the VSS selections may be taken to be an approximation of
the most pertinent video content, as agreed by the group of five individuals that se-
lected them, the results may provide an approximate indication of whether user
physiological responses match pertinent video content.
Our results indicated that, in general, users responded significantly to VSSs, there-
fore indicating that physiological response measures seem to be sensitive enough to
identify significant VSSs based on the user responses. In particular, horror/thriller
content seemed to elicit relatively high numbers of significant EDR, RA, RR and
BVP responses. Fig. 6 illustrates all physiological responses grouped by video genre.
Since significant responses are considered to be those that are in the 15th percentile
and below or in the 85th percentile and above for RA, RR, BVP and HR (totalling
30% of the whole sample), and those that are in the 70th percentile and above for
EDR (totalling 30% of the sample), the proportion of significant responses that would be expected by chance alone is 30%. In other words, regardless of the elicitation effects of a given
VSS, 30% of RA, RR, BVP and HR user responses would be expected to fall either
within the 15th percentile or below or within the 85th percentile and above; and like-
wise 30% of significant responses for EDR would be expected to fall within the 70th
percentile and above. Therefore, a proportion higher than 30% of physiological responses falling within these limits indicates that users responded in excess of what would normally be expected, i.e. that these VSSs elicited higher than normal levels of physiological response in the user. To give an indication of the extent
to which the proportion of physiological responses exceeded this 30% limit, Fig. 6
includes a normal response level line.
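As a simple illustration of this comparison (the counts below are hypothetical, chosen only to reproduce an 80% observed proportion):

```python
def excess_over_chance(significant: int, total: int, chance: float = 0.30) -> float:
    """How far the observed proportion of significant responses for one genre and
    measure exceeds the 30% expected by chance under the thresholds above."""
    return significant / total - chance


# Hypothetical example: 12 of 15 EDR responses significant -> 0.80 observed,
# i.e. 0.50 above the normal response level.
print(excess_over_chance(12, 15))
```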
The results show that the percentages of significant responses for EDR, RA and
RR to horror/thriller content are higher than any other video genre with 80.00%,
46.67% and 46.67% significant responses respectively. A relatively high percentage
of significant response (46.67%) was also observed for BVP; this was the second
highest percentage observed for this measure. Comedy content elicited a relatively
low number of significant EDR and RA responses (only 38.89% and 33.33% of all
responses were significant, respectively) and a low proportion of significant RR re-
sponses (27.78%, which is below the normal response level). However, comedy did
elicit a higher percentage of significant BVP (55.56%) and HR (50.00%) responses
than any other genre. In comparison, action/sci-fi and drama/action content elicited
generally lower levels of response, with less significant differences being observed
between the respective physiological measures. The highest percentage of significant
response observed in either of these genres was the EDR response for action/sci-fi
with a response of 66.67%, which was the second highest EDR response out of the
five genres. Finally, the drama/comedy content elicited the lowest percentage of sig-
nificant responses overall at 33.33%. However, this video genre elicited the second highest percentage of significant HR responses (44.44%) and the second lowest percentage of significant EDR responses (38.89%). Interestingly, this indicates that both video
genres with a comedy element elicited relatively high levels of significant HR re-
sponse and relatively low levels of significant EDR response.
From these results, we may therefore reasonably conclude that there is good poten-
tial for developing personalized affective video summaries based on user physiologi-
cal responses to video content for all genres, although certain video genres appear to
elicit significant physiological responses more consistently than others, which is dis-
cussed in the next section.
8 Concluding Remarks
This article has examined whether affective data can be used as a potential information
source that is external to the video stream for producing personalized affective video
summaries. Through a feasibility study that collected and analyzed data representing a
user’s physiological responses to video content, it has been shown that physiological
responses are potentially a valuable source of external user-based information. Future
research will formally develop a framework so that video sub-segments can be identi-
fied automatically from physiological response data. In addition, mapping users’
physiological response data onto valence and arousal is likely to further assist in identi-
fying candidate video sub-segments for inclusion in the video summary. This will also
offer the user the option of viewing video summaries that consist of specific affective
qualities, an approach that has already proven useful for music media [28]. In the longer
term, further research into other content genres, such as sports, news and reality TV
shows, will help develop a robust and in-depth understanding of how users respond to
all video content genres.
References
1. Zhu, X., Wu, X., Fan, J., Elmagarmid, A.K., Aref, W.G.: Exploring Video Content Structure
for Hierarchical Summarization. Multimedia Systems 10(2), 98–115 (2004)
2. Li, Z., Schuster, G.M., Katsaggelos, A.K.: MINMAX Optimal Video Summarization. IEEE
Transactions on Circuits and Systems for Video Technology 15(10), 1245–1256 (2005)
3. Money, A.G., Agius, H.: Video Summarisation: A Conceptual Framework and Survey of the
State of the Art. Journal of Visual Communication and Image Representation 19(2), 121–143
(2008)
4. Everingham, M., Sivic, J., Zisserman, A.: Hello! My name is.. Buffy. In: 17th British Ma-
chine Vision Conference, vol. III, pp. 899–908 (2006), http://www.macs.hw.ac.uk/
bmvc2006/BMVC06%20Proceedings%20III.pdf
5. Sivic, J., Schaffalitzky, F., Zisserman, A.: Object Level Grouping for Video Shots. Interna-
tional Journal of Computer Vision 67(2), 189–210 (2006)
6. Wang, Y., Liu, Z., Huang, J.: Multimedia Content Analysis: Using Both Audio and Visual
Clues. IEEE Signal Processing Magazine 17(6), 12–36 (2000)
7. Shao, X., Xu, C., Maddage, N.C., Tian, Q., Kankanhalli, M.S., Jin, J.S.: Automatic Summa-
rization of Music Videos. ACM Transactions on Multimedia Computing, Communications,
and Applications 2(2), 127–148 (2006)
8. Hanjalic, A.: Adaptive Extraction of Highlights from a Sport Video Based on Excitement
Modeling. IEEE Transactions on Multimedia 7(6), 1114–1122 (2005)
9. Hanjalic, A., Xu, L.: User-Oriented Affective Video Content Analysis. In: IEEE Workshop
on Content-Based Access of Image and Video Libraries, pp. 50–57. IEEE Computer Society,
Los Alamitos (2001)
10. de Silva, G., Yamasaki, T., Aizawa, K.: Evaluation of Video Summarization for a Large
Number of Cameras in Ubiquitous Home. In: 13th Annual ACM International Conference on
Multimedia, pp. 820–828. ACM Press, New York (2005)
11. Jaimes, A., Echigo, T., Teraguchi, M., Satoh, F.: Learning Personalized Video Highlights
from Detailed MPEG-7 Metadata. In: IEEE International Conference on Image Processing,
vol. I, pp. 133–136. IEEE, Piscataway (2002)
12. Takahashi, Y., Nitta, N., Babaguchi, N.: Video Summarization for Large Sports Video Ar-
chives. In: IEEE International Conference on Multimedia and Expo., pp. 1170–1173. IEEE,
Piscataway (2005)
13. Aizawa, K., Tancharoen, D., Kawasaki, S., Yamasaki, T.: Efficient Retrieval of Life Log
Based on Context to Content. In: 1st ACM Workshop on Continuous Archival and Retrieval
of Personal Experiences, pp. 22–31. ACM Press, New York (2004)
14. Rui, Y., Zhou, S.X., Huang, T.S.: Efficient Access to Video Content in a Unified Frame-
work. In: IEEE International Conference on Multimedia Computing and Systems, vol. 2, pp.
735–740. IEEE Computer Society, Los Alamitos (1999)
15. Zimmerman, J., Dimitrova, N., Agnihotri, L., Janevski, A., Nikolovska, L.: Interface Design
for MyInfo: A Personal News Demonstrator Combining Web and TV Content. In: IFIP
TC13 International Conference on Human-Computer Interaction (INTERACT), pp. 41–48.
IOS Press, Amsterdam (2003)
16. Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-Based Image Re-
trieval at the End of the Early Years. IEEE Transactions on Pattern Analysis and Machine
Intelligence 22(12), 1349–1380 (2000)
17. Simon, H.A.: Comments. In: Sydnor Clark, M., Fiske, S.T. (eds.) Affect and Cognition, pp.
333–342. Erlbaum, Hillsdale (1982)
18. Gomez, P., Danuser, B.: Affective and Physiological Responses to Environmental Noises
and Music. International Journal of Psychophysiology 53(2), 91–103 (2004)
19. Cacioppo, J.T., Berntson, G.G., Klein, D.J., Poehlmann, K.M.: The Psychophysiology of
Emotion Across the Lifespan. Annual Review of Gerontology and Geriatrics 17, 27–74 (1997)
20. Carlson, N.R.: Psychology of Behaviour. Allyn and Bacon, Boston (2001)
21. Philippot, P., Chapelle, C., Blairy, S.: Respiratory Feedback in the Generation of Emotion.
Cognition & Emotion 16(5), 605–627 (2002)
22. Brown, W.A., Corriveau, D.P., Monti, P.M.: Anger Arousal by a Motion Picture: A Meth-
odological Note. American Journal of Psychiatry 134, 930–931 (1977)
23. Detenber, B.H., Simons, R.F., Bennett, G.: Roll ’Em!: The Effects of Picture Motion on
Emotional Responses. Journal of Broadcasting & Electronic Media 42(1), 113–127 (1998)
24. Lang, A., Bolls, P., Potter, R., Kawahara, K.: The effects of Production Pacing and Arousing
Content on the Information Processing of Television Messages. Journal of Broadcasting and
Electronic Media 43(4), 451–476 (1999)
25. Agius, H., Crockford, C., Money, A.G.: Emotion and Multimedia Content. In: Furht, B. (ed.)
Encyclopedia of Multimedia, pp. 222–223. Springer, New York (2006)
26. McIntyre, G., Göcke, R.: The Composite Sensing of Affect. In: Peter, C., Beale, R. (eds.) Af-
fect and Emotion in Human-Computer Interaction. LNCS, vol. 4868. Springer, Heidelberg
(2008)
27. van Reekum, C., Johnstone, T., Banse, R., Etter, A., Wehrle, T., Scherer, K.R.: Psychophysi-
ological Responses to Appraisal Dimensions in a Computer Game. Cognition and Emotion
18(5), 663–688 (2004)
28. Loviscach, J., Oswald, D.: In the Mood: Tagging Music with Affects. In: Peter, C., Beale, R.
(eds.) Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868. Springer, Hei-
delberg (2008)
Acoustic Emotion Recognition for
Affective Computer Gaming
Abstract. Computer games are becoming visually realistic, and coupled with
more violent storylines and characters there is concern among parents and parts
of the media that violent games could encourage aggressive behaviour in those
who play them. There have been links made between a number of murders and
the violent games played by the murderers. To stimulate other emotions in
game players we have developed an emotionally responsive computer game.
Based on a traditionally aggressive role-playing environment, we have removed
the violent elements and replaced with a goal to navigate obstacles using both
positive and negative emotions. Affective cues in the speech of the player are
analysed automatically using an emotion recognition system and these affect
the physical and behavioural attributes of the gaming character. The character is
better able to overcome obstacles based on the emotional state of the player.
1 Introduction
Violence in computer games has been criticised by parents and lobby groups who believe
that violent games can invoke violent behaviour. The Grand Theft Auto series developed by Rockstar North has been criticised in the US, where the games have been blamed for killings by youngsters shooting at passing cars. id Software’s Doom was linked to the Columbine high school shootings. The prosecution of 17-year-old Warren Leblanc for the murder of 14-year-old Stefan Pakerrah in February 2004 reinvigorated the debate about a possible association between violent behaviour and violence in computer games.
The depiction of violence in games has become more extreme over recent years. Games developers can now create highly realistic and believable visuals using lighting and sound effects comparable to those seen in the film industry. It can appear that developers are turning to more violent scenarios and game play in order to differentiate their game from others. However, there are more ways to engage game players than simply more extreme violence, and there are many more emotions that games could stimulate than feelings of anger, rage and aggression [1].
This paper reports on a feasibility study that invokes different emotions (other than anger) in game players and automatically recognises these emotions using affective cues in the speech of the player. Characters in the game can then respond to the game player’s emotional state.
categories including subtle and extreme emotion of hot anger (extreme anger), cold
anger (frustration), interest (upbeat), despair, disgust, happiness, boredom, surprise,
sadness, grief, troubled, excitement, fear, love, sexy and natural.
studio using 16kHz, 16bit mono and performance dynamic directional microphones.
The scripted every day conversations and commercial recordings were recorded on
voice servers at 8kHz, 16bit from speakers on mobile phones. Our research shows that
higher quality recordings provide more acoustic resolution, allowing better discrimination between emotions; however, they are not practical for commercial settings. Additionally, voice actors tend to over-emphasise the presentation of emotion, which can improve acoustic recognition in the studio but creates emotion models which are not representative of the general population and everyday speech. Instead, the emotion recognition
systems are trained on emotional speech compiled from the scripted mobile phone
conversations and tested on the real-life commercial recordings.
All recordings are manually segmented by the research team into separate speech
files and labeled with an utterance and emotion file tag. These speech files are analysed
by human evaluators in a blind listener study to determine those samples representative
of each emotion. The human evaluators are researchers in the team with at least 3 years’ experience of working in acoustic affective computing. They are presented with unla-
belled emotional speech samples in a randomized order and must rate each recording for
emotional category and confidence of classification, which are then averaged across all
listeners. We then create sub-corpora of emotional speech from each database containing i) samples which were recognised correctly and with a 100% confidence level, ii) samples which were recognised correctly and with a 75% confidence level, and iii) samples which were recognised correctly but with a 50% confidence level. Including more, or fewer, samples with lower confidence levels increases the amount of data on which to train the classifiers and also provides a wider variation of emotional portrayal to support greater speaker variability.
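A minimal sketch of that selection step follows, assuming each segmented file carries its intended emotion together with the listener panel’s label and averaged confidence; the record structure is an assumption on our part, not the project’s actual data format.

```python
from typing import List, NamedTuple


class RatedSample(NamedTuple):
    path: str            # segmented speech file
    intended: str        # emotion the speaker was asked to portray
    listener_label: str  # emotion category assigned in the blind listener study
    confidence: float    # classification confidence averaged across listeners, 0.0-1.0


def sub_corpus(samples: List[RatedSample], min_confidence: float) -> List[RatedSample]:
    # Keep only samples the listeners recognised correctly at the given
    # confidence level; lower thresholds give larger but more variable corpora.
    return [s for s in samples
            if s.listener_label == s.intended and s.confidence >= min_confidence]


# corpus_100 = sub_corpus(all_samples, 1.00)
# corpus_75  = sub_corpus(all_samples, 0.75)
# corpus_50  = sub_corpus(all_samples, 0.50)
```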
Table 1. Emotion tracking using acoustic emotion recognition. Output is numerical and results
returned every 2 seconds with a one second overlapping window for example.wav.
Fig. 1. Graphical output of acoustic emotion recognition showing speech waveform, pitch
track, volume, and representation of emotional classification for example.wav
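One plausible reading of the windowing described in Table 1 is sketched below: overlapping analysis windows passed to the recogniser, with the window and hop lengths left as parameters since the caption admits more than one interpretation. The `classify` callable stands in for the acoustic emotion recognition system itself, which is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple


def emotion_track(samples: Sequence[float], sample_rate: int,
                  classify: Callable[[Sequence[float]], str],
                  window_s: float = 2.0,
                  overlap_s: float = 1.0) -> List[Tuple[float, str]]:
    """Run the recogniser over overlapping windows of a speech signal and
    return (time in seconds, emotion label) pairs for the whole recording."""
    win = int(window_s * sample_rate)
    hop = int((window_s - overlap_s) * sample_rate)
    return [(start / sample_rate, classify(samples[start:start + win]))
            for start in range(0, len(samples) - win + 1, hop)]
```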
Fig. 2. Level map of obstacles and escape route for emotional responsive computer game
Table 2. Link between player emotional cues and physical and behaviour attributes
The first-person view height is adjusted depending on emotional state. The viewing height is the height above the ground at which the view camera sits. This is altered to give the impression that the character’s height has changed, Fig. 4. When moving from upbeat to downbeat, the game player has the sensation that they are getting smaller because the first-person view height is lowered.
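The mapping summarised in Table 2 might be expressed as follows; the attribute names and numeric values are illustrative assumptions, not the values used in the modified game.

```python
# Illustrative only: upbeat speech makes the character larger, faster and able to
# jump higher; downbeat speech shrinks and slows it and lowers the view height.
CHARACTER_ATTRIBUTES = {
    "upbeat":   {"scale": 1.0, "move_speed": 1.5, "jump_height": 1.5, "view_height": 64},
    "downbeat": {"scale": 0.5, "move_speed": 0.5, "jump_height": 0.5, "view_height": 32},
}


def apply_emotion(character: object, emotion: str) -> None:
    # Called whenever the recogniser reports a new player state.
    for attribute, value in CHARACTER_ATTRIBUTES[emotion].items():
        setattr(character, attribute, value)
```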
Fig. 3. Large and small in-game physical stages. Note the hole in the wall is the same size in
the left and right screenshots however the player character size is large (left) and small (right).
Fig. 4. First person view height for small character (left) and large character (right)
The game commences with the player realising that they are locked in a jail cell. The
player must portray an upbeat emotion in their voice in order that Gordon Freeman be-
comes upbeat, more energetic and able to jump high into the ventilation system, Fig. 5.
The second obstacle is a narrow walk along a winding pipe, Fig. 6. The player must
make Gordon Freeman downbeat by portraying a downbeat emotion so that the character
moves more slowly and is more easily controllable along the pipe.
The third obstacle is a drawbridge over a fast-flowing river, for which the player must place the character into an upbeat emotional state so that he can jump further and clear the gap, Fig. 7. The fourth obstacle is a minefield where Dr. Freeman
must be in a downbeat state to be able to walk slowly through the maze of mines
without detonating them, Fig. 8. The level continues through a series of obstacles
each requiring an emotional response by the player in order to select the emotional
behaviour in the game character.
recognition appears not to frustrate the player or hinder the game play; however, users suggest that more frequent errors would be unacceptable when attempting to complete a full-length computer game.
Results from user questionnaires show that the game players believe the emotion-
ally interactive computer game is novel and interesting and can help to engage the
player in the story and characters. The mappings of emotion to character behaviour
and physical attributes were learned quickly and were easily remembered by the gamers, with the player having greater speed and size in the happy state and reduced size and speed in the sad state. However, there was some concern regarding potential repetition in the game play if the player has only two emotional states and repeatedly has to switch from one to the other to overcome obstacles. In response, we wish to increase the
number and range of emotions which are detected from the player and the range of
physical and behavioural attributes of the gaming character. More control and more
subtle control using different emotions will eliminate repetition and allow players to
control the character more naturally using emotion.
Furthermore, the players suggest that the ability to scale the character physics and
behaviour proportionally to the degree of the emotion expressed would provide more
subtle game play. Also modifying the level such that selecting the ‘wrong’ state only
hinders the ability to overcome the obstacle rather than completely blocking the path
would provide more freedom and flexibility in how to complete the game.
Although feedback shows that the game players enjoyed the emotional interaction
with the game and agreed that emotion recognition technologies can enhance gaming,
half of those who took part in the study reported that they felt uncomfortable talking
to the game simply in order to change the character’s emotional state. They suggested
that emotional interaction could be used more naturally in multiplayer gaming where
the technology could recognise the reactions of the player to events such as being shot
at, or collecting a good item in the game. In addition, the players suggest that they would feel more comfortable interacting with other characters in the game, both computer-controlled avatars and other player characters.
All participants reported that they would like to see anger used in the game as an
individual physical and behavioural characteristic. Furthermore, half of the players
wanted a fear element included, and a quarter would like frustration and extreme
anger to be separated, with the game play responding differently to each. A consideration of this initial research was to develop an emotional game without the more
traditional aggressive and violent game play. However our findings suggest that game
players want to be able to interact with their games using their full range of emotional
states including anger and aggression.
6 Future Research
The project will address the desire of game players for an extended range of emotional
states to be recognised by the game and develop physical and behavioural responses of
the game characters to these new emotions such that the game does not become repeti-
tive. We wish to extend the study to consider additional uses of emotion recognition for
computer games. These include automatically controlling the expressions of game char-
acters in multiplayer role-playing environments, thereby allowing players to know the
emotional state of each other; allowing players to interact with game characters emo-
tionally so that game characters respond with different information depending on the
way the player talks to them; and creating acting games where the player must use emo-
tion to act out scripts with a director awarding points on how well they perform. Build-
ing emotional interaction into computer games may provide a new genre of game play
which is not wholly reliant on aggression and violence but instead rewards positive feel-
ings and emotional interaction. However, given the current appetite for violence and
aggression within the computer game community, emotion recognition technologies
may become integrated into the next generation of games and be used mainly to
recognise anger and frustration.
References
1. Freeman, D.: Creating Emotion in Games. New Riders Publishing (2004)
2. Cowie, R., et al.: Emotion Recognition in Human-Computer Interaction. IEEE Signal Process-
ing Magazine, 32–80 (January 2001)
3. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
4. The Humaine Portal: Research on Emotion and Human-Machine Interaction (2004), http://www.emotion-research.net/
5. Banse, R., Scherer, K.R.: Acoustic Profiles in Vocal Emotion Expression. Journal of Personal-
ity and Social Psychology 70, 614–636 (1996)
6. Liberman, M., et al.: Emotional Prosody Speech and Transcripts, Linguistic Data Consortium,
Philadelphia (2002)
In the Mood:
Tagging Music with Affects
1 Introduction
This system combines and extends features that have been presented in iso-
lation. Already in 1998, Healey, Picard and Dabek [1] used skin conductance in
their “Affective DJ.” The XPod of Dornbush et al. [2] employs the BodyMedia
SenseWear armband plus other data such as time of day, genre, and musical
tempo as input to a neural network that estimates whether the user will skip
the current song. Chung’s “Affective Remixer” [3] records the galvanic skin re-
sponse and foot tapping. It plays back prepared music clips, targeting a
prescribed mood. Meyer’s “mySoundTrack” [4] guesses the mood from a text
snippet and then generates a playlist. In June 2007, Sony Ericsson introduced
mobile phones with a “SenseMe” function that displays songs as dots in a
2D space spanned by the poles happy, sad, fast and slow, based on an automated analysis.
Fig. 1. The central engine of our system helps the user to tag the music files; it accepts queries concerning certain moods and controls a music player appropriately
Measuring physiological and motion data, Wijnalda et al. [5] employ music as
feedback for physical exercising, an idea that has also been picked up by Oliver
and Flores-Mangas [6]. Reddy and Mascia [7] present a context engine that reacts
to location, time, environment, and the user’s motion; Corthaut et al. [8] employ
a huge body of metadata to select music for a given atmosphere. Livingstone and
Brown [9] propose rules to control music from game-state information. They
map two-dimensional emotion descriptions to musical terms such as staccato or
minor. On the side of emotion detection in music, Yang et al. [10] introduce fuzzy
classifiers; Mandel et al. [11] employ active learning on a support vector machine.
On top of that, novel ways to sort and select music have been described: Knees
et al. [12] create playlists by placing similar songs next to each other; Andric
and Xech [13] organize music in an imaginary 2D space according to rhythm,
harmonics, and loudness.
2 Data Model
Processing moods on a computer requires a data model. After researching
existing models, we decided to develop a proprietary mood model for our needs.
A good model balances expressivity and ease of use: It offers enough complexity
to describe and tag music appropriately; at the same time it is comprehensible.
The concept of “basic emotions” is broadly discussed in psychology, see e.g.
Ortony and Turner [14]. However, we found it to be wanting for music. For
instance, Kalinnen [15] shows that in most listeners' ears music cannot depict
surprise or disgust. Both are, however, typical candidates for basic emotions.
Many works use a two-dimensional model of emotion with axes similar to neg-
ative/positive and passive/active, see for instance Schubert [16]. Whereas this
has been applied to music, in our view such a model quickly collapses into the
axes happy/sad and fast/slow and thus loses many nuances.
Models with an additional third axis can be found as well, offering the required
complexity while still being simple enough to be understood by the user. The
conventional choice for the third axis is “power/control,” see e. g. Mehrabian [17].
But we found that this axis is hard to map to music. Thus, we developed and
tested three different models. The first two employ a three-dimensional approach;
both turned out not to be intuitive enough for tagging music. The third uses a
higher-dimensional approach, which makes it easier to tag music but harder
to visualize “mood space” as a geometric entity.
Model 1 is based on the common two dimensions negative/positive and pas-
sive/active; they are extended by a third axis specifying a time orientation:
backward such as triumph or forward such as fear, see Fig. 2. This is indepen-
dent of the first two axes, but employs a rather abstract concept. The basic idea
seemed to be promising, but we could not find appropriate terms for a number of
xyz triples in this 3D space. If there is no commonly agreed term for a position,
it probably does not map to a meaningful mood.
Model 2 tries to overcome these shortcomings with four different z-axes. It com-
prises the same first two dimensions as the first model, but augments them by
an additional dimension that depends on the first two dimensions. For instance,
Fig. 2. [Mood space with axes Neg/Pos, Pass/Act and Backward/Forward; labelled example moods include Passionate, Aggressive, Monotonous, Longing, Sad, Bright, Relaxed and Frustrated]
3 Tagging
To include a music file in the automatic selection process, the system requires a
description of the mood that this file represents. Such a “tag” can be attached
manually by setting seven sliders, see Fig. 3. After the user clicks on a slider
or drags it, a diagram displays the overall distribution in the music repository’s
corresponding dimension. Every song is represented by a dot, with more data
displayed on demand as a tooltip. So that they never overlap, the dots are stacked in the
vertical direction.
Tagging every song by hand would be impractical. Thus, we implemented a
system for automatic tagging based on feature-extraction methods from content-
based music information retrieval. This automatic tagging serves as a starting
point for the user’s own adjustments: As previous studies [18] have shown, high
granularity—as required by our seven-dimensional model—leads to only vague
automatic detection results.
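The chapter does not detail the feature-extraction pipeline; purely as an illustration of how such content-based pre-tagging could be approached, the sketch below uses librosa features and a scikit-learn nearest-neighbour regressor as stand-ins for whatever extractor and learner the system actually employs.

import numpy as np
import librosa                                     # audio feature extraction
from sklearn.neighbors import KNeighborsRegressor  # stand-in for the actual learner

def mood_features(path):
    """Summarise a song by a handful of coarse content-based features."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    rms = librosa.feature.rms(y=y).mean()
    return np.concatenate((mfcc, [centroid, rms]))

def propose_tags(tagged, untagged):
    """tagged: {path: seven slider values}; returns proposed sliders for untagged paths."""
    X = np.stack([mood_features(p) for p in tagged])
    y = np.stack(list(tagged.values()))
    model = KNeighborsRegressor(n_neighbors=3).fit(X, y)
    return {p: model.predict(mood_features(p)[None, :])[0] for p in untagged}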
Fig. 3. To facilitate manual tagging, the data of other files are displayed as a reference
4 Music Selection
The system offers four playback modes that use mood data in the player software:
sensor, game, proximity, and chain.
Fig. 4. The “game mode” is based on rules that translate the statistics of Activision
Call to Power II into mood coordinates
Sensor mode. This mode selects music that corresponds to the user’s current
mood. To this end, we employ inexpensive, off-the-shelf hardware, a system that
cannot match the sophistication of that of Peter et al. [20]. Five physiological
signals are measured:
– The galvanic skin response is recorded as the electrical conductance between
the user’s index and middle finger.
– The temperature of the skin is measured through a temperature-dependent
resistor such as used in digital thermometers.
– The heart rate is captured by placing an infrared light-emitting diode and
a photometer chip on either side of a finger to detect the oscillation in light
absorption. Signal processing adapts to the strong temporal drift.
– The breath rate and volume are determined from the noise pattern the ex-
halation causes in a microphone mounted below the user’s nose.
A microphone connected to the computer’s audio input records the breath
signal. The other sensors are attached to a standard computer mouse; their
signals are fed into a low-cost USB-based analog-to-digital converter. The raw
sensor data are converted to mood data by a neural network, which the user
has to train in advance. At any time, the user may override the result of the
automatic conversion of sensor data to mood data. The mood data will be taken
into account to select the upcoming song; the user may skip the current song.
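The architecture of the user-trained network is not specified; as an illustrative sketch only, the per-second sensor readings could be mapped to mood coordinates with a small off-the-shelf regressor, trained on readings for which the user supplied the mood by hand. The feature layout and library choice below are assumptions.

import numpy as np
from sklearn.neural_network import MLPRegressor  # stand-in for the user-trained network

# Hypothetical per-second readings: [skin conductance, skin temperature,
# heart rate, breath rate, breath volume]; targets are the mood coordinates
# the user provided during the training phase.
def train_mood_mapper(sensor_samples, mood_targets):
    """Fit a small neural network that converts raw sensor data to mood data."""
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    net.fit(sensor_samples, mood_targets)
    return net

def current_mood(net, reading):
    """Estimate the mood for one new reading; the user may still override the result."""
    return net.predict(np.asarray(reading).reshape(1, -1))[0]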
Game mode. A game on the same or another computer sends requests via the
Internet Protocol that control the mood of the music to be played. These are
taken into account when the current song ends, or they may be enforced immediately.
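The wire format of these requests is not described in the chapter; a hypothetical game-side sender could look like the sketch below, in which the use of JSON over UDP, the port number and the mood fields are invented for illustration.

import json
import socket

def send_mood_request(mood, host="127.0.0.1", port=50007, immediate=False):
    """Send the desired mood of the next (or current) song to the music player."""
    message = {"mood": mood, "enforce_immediately": immediate}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(json.dumps(message).encode("utf-8"), (host, port))

# Example: the game asks for calmer music as soon as possible.
send_mood_request({"tension": 0.2, "happiness": 0.6}, immediate=True)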
Fig. 5. The main user interface displays standard playback controls, mood controls
(here in chain mode), the generated playlist, and the music repository
References
1. Healey, J., Picard, R., Dabek, F.: A new affect-perceiving interface and its appli-
cation to personalized music selection. In: Proc. of PUI 1998 (1998)
2. Dornbush, S., Fisher, K., McKay, K., Prikhodko, A., Segall, Z.: Xpod – a human
activity and emotion aware music player. In: Proc. of the International Conference
on Mobile Technology, Applications and Systems, pp. 1–6 (2005)
3. Chung, J.-W., Vercoe, G.S.: The affective remixer: personalized music arranging.
In: CHI 2006 Extended Abstracts, pp. 393–398 (2006)
4. Meyers, O.: mySoundTrack: A commonsense playlist generator (2005),
http://web.media.mit.edu/~meyers/mysoundtrack.html
5. Wijnalda, G., Pauws, S., Vignoli, F., Stuckenschmidt, H.: A personalized music
system for motivation in sport performance. IEEE Pervasive Computing 04(3),
26–32 (2005)
6. Oliver, N., Flores-Mangas, F.: MPTrain: a mobile, music and physiology-based
personal trainer. In: Proc. of MobileHCI 2006, pp. 21–28 (2006)
7. Reddy, S., Mascia, J.: Lifetrak: Music in tune with your life. In: Proc. of HCM
2006, pp. 25–34 (2006)
8. Corthaut, N., Govaerts, S., Duval, E.: Moody tunes: The Rockanango project. In:
Proc. of ISMIR 2006, pp. 308–313 (2006)
9. Livingstone, S.R., Brown, A.R.: Dynamic response: real-time adaptation for music
emotion. In: Proc. of IE 2005, pp. 105–111 (2005)
10. Yang, Y.-H., Liu, C.-C., Chen, H.H.: Music emotion classification: a fuzzy ap-
proach. In: Proc. of MULTIMEDIA 2006, pp. 81–84 (2006)
11. Mandel, M.I., Poliner, G.E., Ellis, D.P.W.: Support vector machine active learning
for music retrieval. Multimedia Systems 12(1), 3–13 (2006)
12. Knees, P., Pohle, T., Schedl, M., Widmer, G.: Combining audio-based similarity
with web-based data to accelerate automatic music playlist generation. In: Proc.
of MIR 2006, pp. 147–154 (2006)
13. Andric, A., Xech, P.L., Fantasia, A.: Music mood wheel: Improving browsing expe-
rience on digital content through an audio interface. In: Proc. of AXMEDIS 2006,
pp. 251–257 (2006)
14. Ortony, A., Turner, T.J.: What’s basic about basic emotions? Psychological Re-
view 97, 315–331 (1990)
15. Kalinnen, K.: Emotional ratings of music excerpts in the Western art music reper-
toire and their self-organization in the Kohonen neural network. Psychology of
Music 33(4), 373–379 (2005)
16. Schubert, E.: Measuring emotion continuously: validity and reliability of the two-
dimensional emotion-space. Australian J. of Psychology 51(3), 154–156 (1999)
17. Mehrabian, A.: Pleasure–arousal–dominance: A general framework for describing
and measuring individual differences in temperament. Current Psychology: Devel-
opmental, Learning, Personality, Social 14, 261–292 (1996)
18. Li, T., Ogihara, M.: Detecting emotion in music. In: Proc. of ISMIR 2003, pp.
239–240 (2003)
19. Pampalk, E., Dixon, S., Widmer, G.: On the evaluation of perceptual similarity
measures for music. In: Proc. of DAFx 2003, pp. 7–12 (2003)
20. Peter, C., Ebert, E., Beikirch, H.: A wearable multi-sensor system for mobile ac-
quisition of emotion-related physiological data. In: Tao, J., Tan, T., Picard, R.W.
(eds.) ACII 2005. LNCS, vol. 3784, pp. 691–698. Springer, Heidelberg (2005)
21. Lichtenstein, A., Oehme, A., Kupschick, S., Jürgensohn, T.: Comparing Two Emo-
tion Models for Deriving Affective States from Physiological Data. In: Peter, C.,
Beale, R. (eds.) Affect and Emotion in Human-Computer Interaction. LNCS,
vol. 4868. Springer, Heidelberg (2008)
22. Money, A.G., Agius, H.: Automating the extraction of emotion-related multimedia
semantics. In: Workshop on The Role of Emotion in Human-Computer Interaction
(2005)
23. Nagel, F., Grewe, O., Kopiez, R., Altenmller, E.: The relationship of psycho-
physiological responses and self-reported emotions while listening to music. In:
Proc. of the 30th Göttingen Neurobiology Conference (2005)
24. Livingstone, S.R., Brown, A.R., Muhlberger, R.: Influencing the perceived emotions
of music with intent. In: Proc. of the 3rd International Conference on Generative
Systems (2005)
Using Paralinguistic Cues in Speech to Recognise
Emotions in Older Car Drivers
Abstract. Interactive speech based systems are moving into the car since
speech interactions are considered less detrimental to the driver than interac-
tions with a display. The introduction of in-car speech-based interactions
highlights the potential influence of linguistic and paralinguistic cues such as
emotion. Emotions direct and focus people’s attention on objects and situations,
and affect performance, judgment and risk-taking. All of these properties are
crucial for driving where the smallest slip-up can have grave repercussions.
Emotional cues in a car-voice, paired with the emotional state of the driver,
have been found to influence driving performance. This initiated the design of
an in-car driver emotion detection and response system. Results show that the
in-car system can recognise and track changes in the emotional state of the
driver. This study considers older drivers who often feel both unsafe and inse-
cure due to concerns about declining abilities and in particular vision.
1 Introduction
Interactive information systems are rapidly finding their way into the car. Current re-
search and attention theory both suggest that speech-based interactions would be less
detrimental to the driver than would interactions with a visual display [1]. Introducing
speech-based interaction and conversation into the car highlights the potential influence
of linguistic and paralinguistic cues. These cues play a critical role in human–human
interactions, manifesting, among other things, personality and emotion [2]. The research
literature offers a number of definitions of “Emotion”, where two generally agreed-
upon aspects of emotion stand out [3]: 1) Emotion is a reaction to events deemed rele-
vant to the needs, goals, or concerns of an individual; and, 2) Emotion encompasses
physiological, affective, behavioural, and cognitive components. Emotions can be rela-
tively short lived, and when they are sustained they are called moods. Just as humans are
built to process and produce speech [4], humans are built to process and produce emo-
tion/mood [2].
Emotions direct and focus people’s attention on objects and situations that have
been appraised as important to current needs and goals. In a voice interface, this
attention function can be used to alert the user, as by a navigation system’s “turn left
right now”, or it can be distracting, as when users are frustrated by poor voice recog-
nition. Just as emotions can direct users to an interface, emotions can also drive
attention away from the stimulus eliciting the emotion [5]. For example, if a person
becomes angry with a voice recognition system, the user may turn off or actively
avoid parts of an interface that rely on voice input. Emotions have been found to af-
fect cognitive style and performance, where even mildly positive feelings can have a
profound effect on the flexibility and efficiency of thinking and problem solving [6].
People in a good mood are significantly more successful at solving problems [2].
Emotion also influences judgment and decision making. This suggests, for example,
that users in a good mood would likely judge both a voice interface itself, as well as
what the interface says, more positively than if they were in a negative or neutral
mood. It has also been shown that people in a positive emotional state accept
recommendations and take fewer risks than people in a negative emotional state [7].
Driving presents a context where emotion can have enormous consequences. At-
tention, performance, and judgment are of paramount importance in automobile op-
eration, with even the smallest disturbance potentially having grave repercussions.
There is overwhelming evidence suggesting that older adult drivers may have more
difficulty in attending to the driving task than younger drivers, especially when re-
quired to make complex decisions [8]. The driving task places significant perceptual
and cognitive demands on the driver and the normal aging process negatively affects
many of the perceptual, cognitive and motor skills necessary for safe driving [9].
Older adults are also more easily distracted by irrelevant information than young
adults, and thus may direct their attention to the wrong place and miss cues indicating
potentially hazardous situations [10].
Older drivers often feel both unsafe and insecure as drivers. This is to a large extent
due to observations and concerns about declining abilities and in particular vision [11].
There is also a strong link between age, visual task load, stimulus location and reaction
time to unexpected stimuli. Reaction times are longer for older drivers, and reaction
times in general are longer for roadside stimuli than for stimuli in the middle of the
road [12]. Common causes and dangerous risk factors for older adult drivers include:
• Failure to maintain proper speed.
• Improper left turns.
• Failure to yield right-of-way.
• Confusion in heavy traffic.
• Hesitation in responding to new traffic signs, signals, road markings.
• Hesitation in responding to different traffic patterns and roadway designs.
Previous studies show that alerting young drivers to hazards on the road results in
more cautious and safer driving [13]. An in-car voice system was used to give the
drivers relevant and timely road information, thereby providing extra time and
distance for them to evaluate the driving situation. A similar in-car voice system
designed for older adults, with car-voice prompts that compensate for memory loss
by suggesting actions that have not been remembered (e.g. speed limits), prompts that
provide contextually relevant advice (e.g. road conditions), and prompts that give
warnings in safety-critical situations, showed the same benefit as for younger drivers,
that is, improved driving performance [13]. The emotion of the car-voice also affects
driving performance: a study using a car-voice system in which the emotion of the
car-voice and the emotion of the driver were matched or mismatched showed that
matched emotions positively impacted driving performance [14]. These results make
it interesting to investigate the feasibility of designing the emotionally responsive car.
Driver emotion could be recognised by measuring physiological signals such as Blood Volume Pulse, Respiration and Electromyograms taken from the steering wheel
and seat/seat-belt, and by analysing the acoustic cues contained in speech. Although it
may only be a matter of time before video cameras and biometric sensors are fitted as
standard in cars, speech-controlled systems are already commonplace. Voice-controlled
satellite navigation, voice-dial mobile phones and voice-controlled multimedia systems
exist, and drivers are increasingly educated about and comfortable with their use. Therefore,
the project can incorporate voice-based emotion recognition without any requirement for
additional hardware or changes to the driver's environment.
The acoustic emotion recognition system used in this project has been trained to
recognise a range of emotions including boredom, sadness, grief, frustration, extreme
anger, happiness and surprise, from United Kingdom and North American native Eng-
lish speakers. The emotion recognition can track changes in emotional state over time,
presenting emotion decisions numerically to represent the degree of emotional cues
present in the speech. Details of the acoustic emotion recognition system and the emo-
tive speech data on which it was trained and tested appear in the article ‘Acoustic
Emotion Recognition for Affective Computer Games’.
For the purpose of diagrammatic clarity in the in-car emotion project, the acoustic
emotion recognition system outputs a range of emotional faces, including ‘not
sure’ when there is the possibility that the speech exhibits multiple emotions (e.g.
boredom and sadness, or happiness and surprise), and ‘no decision’ when no emotion
has been detected (e.g. a neutral/natural state), Table 1.
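The decision rule behind the ‘not sure’ and ‘no decision’ outputs is not given; one plausible sketch, with purely illustrative thresholds, compares the two strongest per-second emotion scores.

def face_for_scores(scores, decision_floor=0.4, margin=0.15):
    """Map per-second emotion scores to a display face.

    Returns 'no decision' when no emotion is clearly present, and 'not sure'
    when the two strongest candidates are too close to separate (e.g. boredom
    vs. sadness, or happiness vs. surprise). Thresholds are illustrative only.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, runner_up) = ranked[0], ranked[1]
    if best_score < decision_floor:
        return "no decision"
    if best_score - runner_up < margin:
        return "not sure"
    return best

print(face_for_scores({"boredom": 0.48, "sadness": 0.45, "happiness": 0.05}))  # 'not sure'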
Car environment sound effects were played through stereo speakers. These sound
effects included engine noise, brake screech, indicators, sirens etc. Verbal information
from the car was also played through the speakers. The information presented included:
• There is thick fog ahead.
• You are approaching an intersection.
• Warning there is a fallen tree in the road ahead.
• Beware of cyclists ahead.
• The current speed limit is 60 miles an hour.
• There are crosswinds in this area.
• Stop sign ahead.
• The police use radar here, you might need to slow down.
• There is heavy traffic ahead, turn left to avoid it.
• There is an accident ahead, turn right to avoid it.
Participants were not forced to converse with the car but instead told that they
could talk back to the car if they wished.
Speech from the participants was recorded using multiple microphone and sam-
pling configurations. These included an Andrea superbeam array microphone placed
40cm to the left of the steering wheel and directed towards the mouth of the driver
and recorded directly onto a laptop; an AKG microphone co-located with the Andrea
superbeam and recording onto Minidisk (MD); and an Andrea directional beam with 4
microphones placed in front and about 1.5 meters away from the driver and recorded
on a Sony Handicam DVD201. All audio recordings were synchronized using an au-
dio marker at the start of the drive. These microphones are typical of those used in the
cars of today. Without including noise reduction and cancellation systems, the first
two configurations were degraded by significant engine acceleration, brake, indicator,
and other car noise. The final configuration of a directional beam Andrea to DVD
Handicam provided the cleanest acoustic recordings without overly sampling the car
noise. Thus this final configuration was used by the acoustic emotion recognition sys-
tem to detect and recognise the emotion of the drivers.
The driving sessions were also videotaped from the front left of the driver to show
driver hands, arms, upper body, head and eye motion, and facial expressions. Although
this study does not consider image processing as a means to recognise driver
emotions, the video is used to correlate results from the acoustic emotion recognition
with the emotions displayed in the faces of the drivers.
Using the recorded speech the project aims to answer three questions:
• Do older drivers converse with the car when the car is providing information
and instructions?
• Can acoustic emotion recognition detect and recognise the emotions of older
drivers?
• Can acoustic emotion recognition track older drivers' emotions so as to respond emotionally and with empathy?
5 Results
Of the 28 participants, we consider a subset of 18 older drivers for the emotional
older-driver study. These drivers are both male and female and range in age
from 57 to 73.
5.1 Do Older Drivers Converse with the Car When the Car Is Providing
Information and Instructions?
The participants were not encouraged to converse with the car but were told that they
could talk if they wished. The result was that around half of the participants spoke
back to the car and the other half remained silent through the drive. Of those that
spoke, some responded only when the car spoke to them and then only to say ‘Thank
you’. However, 4 of the older participants conversed freely and extensively with the
car, and it is these 4 drivers who are considered in greater detail.
The in-car system was instructional rather than conversational, providing informa-
tion rather than requesting it from the driver. Consequently the car does not provide
significant opportunity for the driver to establish dialog. By considering alternative
discursive messages from the car we will be able to provide more opportunities for the
driver to respond, and respond more freely.
5.2 Can Acoustic Emotion Recognition Detect and Recognise the Emotions of
Older Drivers?
To ascertain whether the emotion recognition system can detect and recognise older
driver emotions, this paper considers the 4 older drivers who conversed freely with
the car. The participants exhibit a range of emotions including boredom, sadness,
anger, happiness and surprise; however, for most of the drive they remain in a
neutral/natural emotional state. When challenged in the drive by obstacles in the road,
other drivers, difficult road conditions and pedestrians, we observe strong emotions
from the drivers. During these emotional events the acoustic emotion recognition sys-
tem must detect and recognise the emotional state of the driver.
By listening to the speech recording only, a human labeler transcribes the drive, in-
cluding not only the words of the conversation but also the emotional state of the
driver. The human labeler is not shown the video recording of the drive and is there-
fore not aware of the driving situations associated with the driver speech. Also the
human labeler is unaware of the output classifications of the automatic emotion rec-
ognition system. The same driver speech recording was processed by the acoustic
emotion recognition system and its output classification represented as emotive faces
for each second of the drive. The performance of the automatic emotion recognition
system was determined by comparing the human emotion transcript against the output
from the recognition system. The comparative study is presented for each emotive
segment from the 4 talkative older drivers. Only parts of the complete transcript and
emotional classifications are included for conciseness.
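For emotionally distinct events, the comparison reduces to counting the seconds on which the human labeler and the system assign the same label. A minimal sketch of such a second-by-second agreement measure, with made-up labels, is shown below.

def agreement(human, system):
    """Fraction of seconds on which the human labeller and the recognition
    system assign the same emotion label (both lists hold one label per second)."""
    assert len(human) == len(system)
    matches = sum(1 for h, s in zip(human, system) if h == s)
    return matches / len(human)

human  = ["neutral", "frustration", "frustration", "happiness", "neutral"]
system = ["neutral", "frustration", "anger",       "happiness", "neutral"]
print(f"{agreement(human, system):.0%}")  # 80%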
Driver: Female, age 73
Speech recording: 29 minutes of speech, during which there is one crash. The driver
is talkative, offering opinions and a commentary throughout the drive.
Emotive event 1: The driver is warned by the car of heavy traffic ahead and advised to
turn left to avoid it. While carrying out this manoeuvre she has a problem with the
steering wheel and becomes emotive. Whilst the passenger explains what has happened,
her emotion is detected as frustration. There is also laughter during the
conversation, which is detected as happiness, Fig. 3.
T: Help the steering wheel work loose my left turn that crashed tree <laugh>
Fig. 3. Transcription (T), human emotion recognition (H) and automated emotion recognition (A); the H and A rows appear as emotion faces in the original figure
Emotive event 2: The driver has been warned of slow moving vehicles and is stuck
behind one which is in the outside lane. She observes a near-miss and comments on
the other cars undertaking the slow traffic, Fig. 4.
T: Something’s coming the other way Ooh everybody else is going on the inside
Fig. 4. Transcription (T), human emotion recognition (H) and automated emotion recognition (A)
T: I can’t see I could have if I learnt how to work the machine looking back
Fig. 5. Transcription (T), human emotion recognition (H) and automated emotion recognition (A)
Fig. 6. Transcription (T), human emotion recognition (H) and automated emotion recognition (A)
Emotive event 2: The driver starts to whistle. This is an ‘upbeat’ song and is detected
as surprise. He then becomes genuinely surprised by something and brakes, Fig. 7.
Fig. 7. Transcription (T), human emotion recognition (H) and automated emotion recognition (A)
Emotive event 3: The driver was warned by the system that the road is very slippery.
He agrees and encourages himself to keep calm, Fig. 8.
Fig. 8. Transcription (T), human emotion recognition (H) and automated emotion recognition (A)
Emotive event 4: The driver is warned of a stop sign ahead, and while looking for it
starts humming ‘Strangers in the night’. This is detected by the acoustic emotion rec-
ognition system as sadness. The human listeners also considered the driver to be in a
sad/negative mood whilst singing. The driver then notices the actions of another driver
and becomes frustrated by some ‘bloody kids!’, Fig. 9.
T: <hum> dear that’s a stop sign oh I oh look at that idiot look oh bloody kids
Fig. 9. Transcription (T), human emotion recognition (H) and automated emotion recognition (A)
There is a strong correlation between the emotional transcript created by the human
listener and the emotion output returned automatically by the acoustic emotion recognition
system. It is difficult to measure the performance of the emotion recognition
system objectively with driver speech, as there are occasions where the speech is
masked by car noise (such as engine and brakes) and times when the driver could be
in one of two emotions, such as bored or sad, or happy or surprised. In these cases it is
inappropriate to say that the acoustic emotion recognition system is erroneous; rather,
the human listener could also be confused in classifying the emotion. Considering
these limitations on the ability to measure performance objectively, we made a
second-by-second comparison between the human listener and the automated emotion
recognition system for emotionally distinct events, which provides a performance
accuracy of greater than 70% for the seven emotion categories.
The current range of emotions detected (boredom, sadness, grief, frustration, anger,
happiness, and surprise) may not be the optimal range required for the emotionally
responsive car. If the car does not need to recognise the difference between
boredom and sadness and can instead detect ‘downbeat’ (human listeners also have
problems deciding between boredom and sadness), then the performance accuracy of
the emotion recognition system can be further improved.
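Such a coarsening is a simple many-to-one mapping over the recognised categories. In the sketch below only the boredom/sadness grouping into ‘downbeat’ comes from the text; the remaining groupings are assumptions added for illustration.

# Only the boredom/sadness -> 'downbeat' grouping comes from the text;
# the other groupings are illustrative assumptions.
COARSE_STATE = {
    "boredom": "downbeat", "sadness": "downbeat", "grief": "downbeat",
    "happiness": "upbeat", "surprise": "upbeat",
    "frustration": "agitated", "anger": "agitated",
}

def coarse_label(fine_label):
    """Collapse a fine-grained emotion label into a broader state for the car."""
    return COARSE_STATE.get(fine_label, "neutral")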
5.3 Can Acoustic Emotion Recognition Track the Older Driver Emotions So as
to Respond Emotionally and with Empathy?
The automatic recognition system can recognise and classify emotion for every sec-
ond of speech (seen in the emotion plots). This creates an emotion track showing the
mood of the driver throughout their drive. Additionally, the emotion recognition sys-
tem can output the emotion classification numerically showing the level of presence
of emotion every second. In this way the granularity of recognition is appropriate for
the car to be able to track and respond to the driver’s emotion. In a driver-car conver-
sation, the car would be able to follow the conversation, continuously monitoring the
changing emotion of the driver and thus respond reflectively to the driver’s mood.
The car could modify the synthesized words of its response, together with the
placement of stresses and emotional cues.
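One simple way to turn the per-second decisions into a track the car can act on is a sliding window over the most recent labels; the sketch below illustrates the idea and is not the system's actual implementation.

from collections import Counter, deque

class EmotionTrack:
    """Collect per-second emotion decisions and expose the driver's recent mood."""

    def __init__(self, window_seconds=60):
        self.recent = deque(maxlen=window_seconds)

    def add(self, label):
        """Append the emotion label recognised for the latest second of speech."""
        self.recent.append(label)

    def dominant(self):
        """Most frequent emotion over the window; what the car would respond to."""
        if not self.recent:
            return "no decision"
        return Counter(self.recent).most_common(1)[0][0]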
6 Discussion
Considerable amounts of time and money are spent by car manufacturers on improv-
ing car safety. These include the installation of anti-lock brakes, adaptive steering
systems, air bags and parking sensors, as well as many in-car warning indicators. All of these
systems have required comprehensive, lengthy and costly design stages.
Although research into voice-based car interfacing is relatively immature by com-
parison, it has been shown that simple, inexpensive and fully controllable aspects of a
car interface can also have a significant impact on driver safety. It is well known that
upset drivers perform much worse on the road than happier drivers [7]. Changing the
paralinguistic characteristics of the in-car voice is sufficient to have a dramatic impact
on driving performance [14]. Furthermore, the number of accidents and the drivers’
attention to the road can be strongly influenced by the voice of the car. However it has
been found that there is not one effective voice for happy and sad drivers. Upset driv-
ers benefit from a subdued in-car voice while happy drivers benefit from an energetic
in-car voice. This suggests that the car must adapt its voice to the driver.
In order for the car to modify its voice so as to respond to the emotion of the driver
and thus improve their driving, the car must first be able to assess the driver’s mood.
Emotion recognition could be achieved from tracking facial expressions using in-car
cameras and image processing, monitoring driver physiological changes from sensors
in the steering wheel, and voice analysis from driver-car conversations. As the car in-
dustry already uses voice-controlled systems in its cars, it is sensible to consider
voice emotion recognition as an inexpensive addition to safety.
The project has reported on the application of an in-house acoustic emotion recog-
nition system for the recognition and tracking of driver emotion for older participants
using a driving simulator. We have been able to conclude that there is sufficient accu-
racy and granularity of emotion recognition to enable an emotionally intelligent in-car
voice to adapt to the mood of the driver.
Future work will extend the in-car system to not only provide information but also
request information from the driver as to their desired course of action. This will
encourage conversation and enable the acoustic emotion recognition system to gain
insight into the mood of the driver.
Once the car knows the emotional state of the driver, how should it adapt? Previous
studies have considered varying the paralinguistic cues only [7]; however, should the
content of the response also change, and how? Should the car become less or more
talkative depending on the mood of the driver? Should the car alter the telematics,
climate or music in the car in response to the mood of the driver?
Further research should consider the effect of adapting the car response and car
environment to driver emotion. One strategy is to exhibit empathy by changing the emo-
tion of the car-voice to match the user. Empathy fosters relationship development, as
it communicates support, caring, and concern for the welfare of another. A voice
which expresses happiness in situations where the user is happy and sounds subdued
or sad in situations where the user is upset would strongly increase the connection be-
tween the user and the voice [16].
How fast should the emotion of the voice change? Although rapid response to the pre-
dicted emotion of the user can be effective, there are a number of dangers in this
approach. Emotions can change in seconds in the human brain and body [17]. A sad
person may momentarily be happy if someone tells a joke, but will fall back into their
sad state relatively quickly. Conversely, happy drivers may become frustrated as they
must slam on the brakes for a yellow light, but their emotion may quickly switch back
to feeling positive. If the voice in the car immediately adapted to the user’s emo-
tions, drivers would experience occurrences such as the car-voice changing its emo-
tion in mid-sentence. This would dramatically increase cognitive load [18], constantly
activate new emotions in the driver and be perceived as psychotic. This might be en-
tertaining when performed by manic comedians like Robin Williams, but an in-car
voice would quickly be marked as manic-depressive instead of empathetic!
Mood must be taken into account to make the car-voice an effective interaction part-
ner. Moods tend to bias feelings and cognition over longer terms, and while moods can
be influenced by emotions, they are more stable and effectively filter events. A person
in a good mood tends to view everything in a positive light, while a person in a bad
mood does the opposite. Drivers that are in a good mood when entering a car are more
likely to experience positive emotion during an interaction with a car-voice than drivers
in a bad mood. It would therefore seem that emotion in technology-based voices must
balance responsiveness and inertia by orienting to both emotion and mood.
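One way to encode this balance of responsiveness and inertia is to let the car-voice follow a slowly updated mood estimate rather than the raw per-second emotion. The sketch below uses exponential smoothing with an assumed inertia factor, purely as an illustration of the principle.

def update_mood(mood, emotion_sample, inertia=0.95):
    """Exponentially smoothed mood estimate in [-1, 1] (negative .. positive).

    A momentary emotion (a joke, a slammed brake) nudges the estimate only
    slightly; the car-voice follows this slow value rather than the raw
    per-second emotion, so it never flips its tone mid-sentence.
    """
    return inertia * mood + (1.0 - inertia) * emotion_sample

mood = 0.0
for sample in [0.8, 0.7, -0.9, 0.6]:   # per-utterance valence estimates
    mood = update_mood(mood, sample)
print(round(mood, 3))                  # 0.053, still close to neutral despite the swings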
Humans are what they feel: Performance, knowledge, beliefs, and feelings are to a
large extent determined by emotions. People are also influenced by voice interactions
with people and with interfaces. This makes it important for designers of speech-based
systems to work with linguistic and paralinguistic cues, especially emotional cues, to
create the desired effect when people interact with the system.
References
1. Lunenfeld, H.: Human Factor Considerations of Motorist Navigation and Information Sys-
tems. In: Proc. of Vehicle Navigation and Information Systems, pp. 35–42 (1989)
2. Strayer, D., Johnston, W.: Driven to Distraction: Dual-Task Studies of Simulated Driving
and Conversing on a Cellular Telephone. Psychological Science 12, 462–466 (2001)
3. Brave, S., Nass, C.: Emotion in Human-Computer Interaction. In: Jacko, J., Sears, A. (eds.)
Handbook of Human-Computer Interaction, pp. 251–271. Lawrence Erlbaum Associates,
Mahwah (2002)
4. Pinker, S.: The Language Instinct. W. Morrow and Company, New York (1994)
5. Gross, J.J.: Antecedent- and Response-Focused Emotion Regulation: Divergent Conse-
quences for Experience, Expression, and Physiology. Journal of Personality and Social Psy-
chology 74, 224–237 (1998)
6. Hirt, E.R., Melton, R.J., McDonald, H.E., Harackiewicz, J.M.: Processing Goals, Task Inter-
est, and the Mood-Performance Relationship: A Mediational Analysis. Journal of Personality
and Social Psychology 71, 245–261 (1996)
7. Isen, A.M.: Positive Affect and Decision Making. In: Lewis, M., Haviland-Jones, J.M. (eds.)
Handbook of Emotions, pp. 417–435. The Guilford Press, New York (2000)
8. McGehee, D., Lee, J., Rizzo, M., Bateman, K.: Examination of Older Driver Steering Adap-
tation on a High Performance Driving Simulator. In: Proceedings of International Driving
Symposium on Human Factors in Driver Assessment, Training, and Vehicle Design (2001)
9. Nemoto, H., Yanagishima, T., Taguchi, M.: Effect of Physical Changes in Aging on Driving
Performance, Nissan Research Center, Vehicle Research Laboratory. In: Proceeding of First
International Driving Symposium on Human Factors in Driver Assessment, Training and
Vehicle Design, pp. 131–136 (2001)
10. Ponds, R., Brouwer, W., van Wolffelaar, P.: Age Differences in Divided Attention in a Simu-
lated Driving Task. Journal of Gerontology 43, 151–156 (1988)
11. Johansson, K.: Older Automobile Drivers: Medical Aspects, Doctoral Dissertation, Karolin-
ska Institutet, Stockholm (1997)
12. Owens, J., Lehman, R.: The Effects of Age and Distraction on Reaction Time in a Driving
Simulator. Journal of Vision 2(7), 632a (2002)
13. Nass, C., Jonsson, I.-M., Harris, H., Reaves, B., Endo, J., Brave, S., Takayama, L.: Improv-
ing Automotive Safety by Pairing Driver Emotion and Car Voice Emotion. In: CHI 2005 Ex-
tended Abstracts on Human factors in Computing Systems, Portland, USA, pp. 1973–1976
(2005)
14. Jones, C.M., Jonsson, I.-M.: Performance Analysis of Acoustic Emotion Recognition for In-
Car Conversational Interfaces. In: Proceedings of HCI International, Beijing, China (2007)
15. Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor,
J.: Emotion Recognition in Human-Computer Interaction. IEEE Signal Processing Magazine,
32–80 (January 2001)
16. Brave, S.: Agents that Care: Investigating the Effects of Orientation of Emotion Exhibited by
an Embodied Computer Agent. Doctoral Dissertation. Communication. Stanford University,
Stanford, CA (2003)
17. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
18. Mullennix, J.W., Bihon, T., Bricklemyer, J., Gaston, J., Keener, J.M.: Effects of Variation in
Emotional Tone of Voice on Speech Perception. Language and Speech 45, 228–255 (2002)
Author Index