Comparing UX Measurements: A Case Study
In: Proc. of the Intern. Workshop on Meaningful Measures: Valid Useful User Experience Measurement (VUUM), Reykjavik, Iceland, 18 June 2008, E.L-C. Law, N. Bevan, G. Christou, M. Springett, and M. Lárusdóttir (pp. 72-78)
3 For more information on the uLog software see: http://www.noldus.com/site/doc200603005
4 For more information see: http://www.attrakdiff.de/
4.2.3 Expert review
In the expert review, five experts were provided with the Tribler software to review it. They were asked various questions assessing their opinions on the software. They were also asked to estimate what problems users would have with the software, and to fill in the Attrakdiff questionnaire imagining what the users would answer. Communication with the experts was solely through email.

4.3 Measurements in the different studies
The various types of measurements taken are discussed using the UX framework explained in section 2, focusing on compositional, meaning and aesthetics UX aspects. It is assumed here that, in the case of software for voluntary use, its attractiveness largely determines whether or not and to what extent the software will be used. The attractiveness of using the software is closely related to the emotional response to (the exposure to and interaction with) the software. In addition, factors like the user's context, a user's pre-dispositions and constraints play an important role (e.g., familiarity and availability of other software with similar functionality, previous knowledge of P2P file sharing software, technicalities and compatibility of the user's system).
The sense-making process of the UX framework (depicted in the outer circle) implies that there are temporal issues involved. One has to take into account the user's experience in relation to the product prior to or at the very start of the actual interaction (e.g., anticipation and connecting), as well as during and after the interaction (i.e., the other elements of the sense-making process).
Data was gathered on usage and on the attractiveness of the software, in addition to data that related to the different UX aspects. Temporal issues were addressed in different ways in the three types of studies. In the field trial, data gathering on usage was done automatically over the five-week study period. In the case of the laboratory study, users were asked UX and usage related questions after their initial experiences with Tribler, as well as during the whole session and afterwards, imagining future usage. In the expert review, experts were asked to provide their expectations on the UX after their first confrontation with the software, as well as after further inspection and trial.

4.3.1 Usage
In the field study, TUMCAT's automated logging facilities made it possible to monitor actual usage of Tribler at the level of UI events (e.g., shifts in input focus, key strokes) and the abstract interaction level (e.g., providing values in input fields) [10], providing insight into how usage developed over the five-week time period. In the laboratory study, test participants were asked to imagine whether and how they would use Tribler at home in three different ways: 1) after a short exploration of the software: what kind of things do you think you would use Tribler for at home; 2) after each task: would this be something you can picture yourself doing at home? (scale 1-5); and 3) at the end of the session: Would you consider starting to use Tribler at home? How frequently? What would you use it for? Under what circumstances? In the expert review, the experts were asked similar questions: 1) after a quick exploration of Tribler: Do you think that the target group may indeed consider downloading, or actually download the software once they are aware of its existence? What do you think users would want to use Tribler for in particular? 2) in relation to Tribler's friends and recommendations facilities: Do you think that (in the long term) Tribler users will actively engage in using this facility?

4.3.2 Attractiveness
One of the ways of measuring attractiveness was by having the test participants fill in the Attrakdiff questionnaire. This questionnaire is based on a theoretical model in which pragmatic and hedonic aspects of experiences are thought to affect the attractiveness of a product. A number of questions assess the product's attractiveness for the user. In the field test, participants who had actually used Tribler were asked to fill in the questionnaire in the fourth week of their use. In the laboratory study, participants filled in the questionnaire after their task performance. In the expert review, the experts were asked to fill in what they thought users would fill in.
In addition to this questionnaire, participants were also asked about their appreciation of specified Tribler features on a 5-point scale. In the field study, these questions were asked as experience sample questions in response to the use of the specified function (e.g., after the 1st and every 5th time of using the function). In the laboratory study, all participants were asked to answer the questions after having performed a task related to that function. In the expert review, experts were instructed to answer the questions imagining how they thought users would appreciate the specified functionality.
Finally, insight into the users' emotions in reaction to the product was received through spontaneous (written) feedback by the participants (field test), as well as through the retrospective interviews and observed (verbal and non-verbal) reactions during task performance (laboratory study).

4.3.3 Compositional aspects, meaning and aesthetics
Compositional aspects relate to the pragmatic aspects of interactions, including usability problems, effectiveness, etc. In all three studies, measurements included those questions of the Attrakdiff questionnaire that related to the software's pragmatic quality. In the field study, the logged and sensed data in combination with spontaneous user feedback shed light on pragmatic issues. In the laboratory study, such data were gathered by observing task performance and through retrospective interviews. In the expert review, the experts were given the tasks used in the laboratory study as suggestions to structure their search for usability problems in the software, and were also asked to provide some reasons on why they thought a problem would occur.
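As an aside on how ratings like the Attrakdiff data are typically aggregated: each scale score is the mean of a set of 7-point semantic-differential items, with some items reverse-coded. A minimal Python sketch of that kind of aggregation (the item names, the scale mapping, and the reverse-coded item are invented for illustration; they are not the actual AttrakDiff item set):

```python
from statistics import mean

# Hypothetical item-to-scale mapping; the real AttrakDiff instrument
# defines its own items per scale (PQ, HQI, HQS, ATT).
SCALES = {
    "PQ":  ["confusing_structured", "impractical_practical"],
    "ATT": ["ugly_attractive", "bad_good"],
}
# Items whose poles are flipped relative to the scale direction (assumed).
REVERSED = {"impractical_practical"}

def scale_scores(answers, scale_map=SCALES, reversed_items=REVERSED):
    """Average 7-point semantic-differential items per scale.

    answers: dict mapping item name -> rating in 1..7.
    Reverse-coded items are mirrored (1 <-> 7) before averaging.
    """
    scores = {}
    for scale, items in scale_map.items():
        vals = [8 - answers[i] if i in reversed_items else answers[i]
                for i in items]
        scores[scale] = mean(vals)
    return scores

one_participant = {"confusing_structured": 3, "impractical_practical": 5,
                   "ugly_attractive": 4, "bad_good": 2}
print(scale_scores(one_participant))  # -> {'PQ': 3.0, 'ATT': 3.0}
```

Per-study results such as those reported in figure 3 would then be means of these per-participant scale scores.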
In addition, the experts were explicitly asked whether they thought users would understand the logic behind specified functions. Aspects of meaning were measured by the questions in the Attrakdiff questionnaire that related to the software's hedonic qualities (all three studies). It was also expected that, in all three studies, spontaneous feedback from users and explanations on expected future usage would provide some insights on these aspects. Aesthetic aspects were not explicitly included in any of the measurements of the three studies. However, spontaneous remarks or feedback could provide some data on such aspects.

[Figure 3: bar chart of mean Attrakdiff scores (scale 1-7) per study (Lab test, Expert review, Field test) for PQ, HQI, HQS and ATT.]
Figure 3. Attrakdiff results (PQ: pragmatic quality, HQI: hedonic quality (identification), HQS: hedonic quality (stimulation), ATT: attractiveness. 1 indicates low score, 4 indicates neutral. Lab test (n=11), Expert review (n=5), Field test (n=6)).

5 PRELIMINARY RESULTS AND ANALYSIS
A preliminary analysis of the data was conducted, which allows drawing some tentative conclusions on how the findings from the three studies relate to each other in terms of UX framework elements.
From the field study it became clear that 15 of the 39 test participants had started using Tribler during the five-week period; these were given the questionnaire. Only 6 of them filled it in, being active users in the sense that on occasional days they spent some hours using the software. From questions asked to the 15 'real' users and through informal communication, it became clear that reasons for not using Tribler had to do with software packages crashing, the combination of software making computers too slow, as well as them being used to competing software. Statements made by participants in the laboratory study predicted that utility and usability problems (compositional aspects) would prevent some people from using the software. In the expert review, similar views were expressed. A superficial analysis of the logging and sensing data gathered in the field study could not provide a clear view on compositional aspects like usability problems; it was too difficult to trace back what users were trying to do from mere log data. The data from the laboratory study, as well as from the expert review, provided a rich view on problems and their possible causes. This was the kind of data that designers in the Tribler development team could most readily use in their attempts to redesign Tribler. Aspects of meaning were found mainly in the laboratory study through spontaneous think-aloud utterances, as well as in retrospective interviews. This related to issues like terminology being too dull for them, not wanting to use social software, not valuing recommendations of files based on popularity, issues in relation to (appreciating or not appreciating) illegal downloads, and (laughing about or feeling offended by) adult content. In the expert review, two experts commented on issues of meaning, only in relation to not appreciating illegal content and feeling offended by adult content. In the field study, only two users commented on issues of meaning. They did so in the same way as the experts in the expert review.
As to the aesthetic aspects, three experts mentioned issues of graphical design and layout in their comments. Generally they indicated they liked the graphical design, although 'bad layout' was also mentioned once, as well as being disturbed by the software not showing thumbnails in the files view. In the field test, only two participants commented on aesthetic aspects, mentioning they disliked the library and files view. In the laboratory study, a rich mix of comments was given (by 7 participants) on aesthetic issues. Opinions here were more mixed, in the sense of valuing the design or not, but also in the level of design detail they commented on, ranging from comments on specific icons to an opinion on the general looks of the software. Many of the comments were spontaneous exclamations when confronted with a new screen.
From the Attrakdiff questionnaire we found that pragmatic aspects in the field trial and the laboratory test were more or less the same and scored more negatively than in the expert review. Hedonic identification with, and stimulation by, Tribler scored lowest in the field test, followed by the laboratory test and the experts' opinions. Attractiveness scored similarly over the different studies; see figure 3 for an overview.

6 DISCUSSION AND CONCLUSIONS
Analysis of these studies is still preliminary and ongoing. In the following we describe our first findings. We can only gain insight into the actual usage of Tribler and its functions by using tools such as logging and sensing over a longer period of time. Laboratory tests and expert reviews give weak predictions about usage which do not agree with the results from logging and sensing. Logging and sensing do not provide any explanation for the actual usage. We found lower scores in the field test for both meaning and hedonics (from the Attrakdiff questionnaire) than from the laboratory study or the expert reviews. Actual usage makes people aware of the match between the product and their higher order goals (or meaning). This may account for this result.
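The difficulty of tracing user intentions from mere log data is what transformation to higher abstraction levels is meant to address: predefined task-level event sequences are matched against the recorded event stream, in the spirit of [10]. A minimal sketch (the event names and task patterns below are invented for illustration, not taken from TUMCAT):

```python
# Hypothetical task patterns: each task is a fixed sequence of
# abstract-interaction-level events.
TASK_PATTERNS = {
    "enter_address": ["focus:street", "input:street", "focus:city", "input:city"],
    "start_download": ["click:search", "select:result", "click:download"],
}

def detect_tasks(log, patterns=TASK_PATTERNS):
    """Scan a flat event log and return (position, task) for each
    occurrence of a predefined task-level sequence."""
    hits = []
    for i in range(len(log)):
        for task, pat in patterns.items():
            if log[i:i + len(pat)] == pat:
                hits.append((i, task))
    return hits

log = ["click:search", "select:result", "click:download", "focus:street",
       "input:street", "focus:city", "input:city"]
print(detect_tasks(log))  # -> [(0, 'start_download'), (3, 'enter_address')]
```

A real analysis would of course need more robust matching (interleaved events, partial or abandoned sequences), which is exactly the tooling gap discussed here.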
Laboratory tests and expert reviews give a detailed and rich insight into compositional, pragmatic issues such as usability. Logging and sensing do not provide this detailed insight. The logging tools used monitored user activity at the level of UI events (e.g., shifts in input focus, key events) and the abstract interaction level (e.g., providing values in input fields). To generate information about usability issues, the data needs to be transformed to higher levels of abstraction, such as domain or task related levels (e.g., providing address information) or goal and problem related levels (e.g., placing an order), and compared to predefined or automatically identified sequences of user activity within these levels [10].
Attractiveness scores were slightly negative on the Attrakdiff questionnaire in all three studies. Though the score was slightly negative, we doubt if this is a valuable predictor for the (non-)usage as measured during the field test (it seems too bold that a slightly negative score can result in such overwhelming non-usage). The influence of attractiveness on usage might depend on people's motivation to use a product (externally motivated or internally motivated product interaction). Aesthetic aspects are difficult to measure for interactive products. We found that attractiveness as well as aesthetics are best measured through direct interaction with participants. For experts it is difficult to formulate an opinion about attractiveness and aesthetics from a user's viewpoint.

7 FUTURE WORK
Based on the conclusions above, we identified several issues for future work on improving UX measurement in long-term field studies:
1. Tools or methods to raise the level of logging data from the UI events and the abstract interaction level to higher levels, such as domain and task or even goal and problem levels, and means to analyze the data on these higher levels (detecting sequences and interpreting these sequences);
2. Approaches for automated gathering of data that provide (a) insight into reasons of (non-)usage at the level of products or product features, (b) insight into why or how the product succeeds (or not) in making the user attribute meaning to it; (c) rich and detailed data on usability issues, especially those that relate to longer term usage and are highly affected by the personal situation of the user. Especially for topic 2a, more detailed theoretical knowledge in the area of UX would help to relate the various aspects in the framework to each other and to a product's attractiveness and (non-)usage in real life.
3. A practical approach for assessing the aesthetic aspects of a product.

ACKNOWLEDGEMENT
We would like to especially thank the following people for their contributions to these studies: Gilbert Cockton, Effie Law, Jens Gerken, Hans-Christian Jetter and Alan Woolrych from the COST294 network. Ashish Krishna and Anneris Tiete for their contribution in executing and analysing the study and the preliminary results. The test participants for their active contribution and feedback. Leon Roos van Raadshoven, Paul Brandt, Jan Sipke van der Veen and Armin van der Togt for their technical contributions.

REFERENCES
1. Vermeeren, A.P.O.S. and J. Kort, Developing a testbed for automated user experience measurement of context aware mobile applications, in User eXperience, Towards a unified view, E. Law, E.T. Hvannberg, and M. Hassenzahl, Editors. 2006, COST294-MAUSE: Oslo. p. 161.
2. Fokker, J.E., A.P.O.S. Vermeeren, and H. de Ridder, Remote User Experience Testing of Peer-to-Peer Television Systems: a Pilot Study of Tribler, in EuroITV'07, A. Lugmayr and P. Golebiowsky, Editors. 2007, TICSP, Tampere: Amsterdam, The Netherlands. p. 196-200.
3. Kort, J., A.P.O.S. Vermeeren, and J.E. Fokker, Conceptualizing and Measuring UX, in Towards a UX Manifesto, COST294-MAUSE affiliated workshop, E. Law, et al., Editors. 2007, COST294-MAUSE: Lancaster. p. 83.
4. McCarthy, J. and P. Wright, Technology as Experience. 2007: The MIT Press. 224.
5. Pals, N., et al., Three approaches to take the user perspective into account during new product design. International Journal of Innovation Management, 2008. In press.
6. Desmet, P. and P. Hekkert, Framework of Product Experience. International Journal of Design, 2007. 1(1): p. 10.
7. Hekkert, P., Design Aesthetics: Principles of Pleasure in Design. Delft University of Technology, Department of Industrial Design: Delft. p. 14.
8. Pouwelse, J.A., et al., Tribler: A social-based peer-to-peer system. Concurrency and Computation: Practice and Experience, 2008. 20(2): p. 127-138.
9. Boren, T.M. and J. Ramey, Thinking aloud: reconciling theory and practice. IEEE Transactions on Professional Communication, 2000. 43(3): p. 261-277.
10. Hilbert, D.M. and D.F. Redmiles, Extracting usability information from user interface events. ACM Computing Surveys (CSUR), 2000. 32(4): p. 384-421.