12 Usability Testing
Usability Testing
Human Computer Interaction
Fulvio Corno, Luigi De Russis
Academic Year 2019/2020
Evaluation Goal (recap)
§ «Evaluation tests the usability, functionality, and acceptability of an
interactive system»
o According to the design stage (sketch, prototype, … final)
o According to the initial goals
o Along different dimensions
o Using a range of different techniques
§ Very wide (and a little bit vague) definition
§ The idea is to identify and correct problems as soon as possible
Evaluation Approaches (recap)
Involving Users: Experimental Methods
§ observation-driven
§ scientific
§ hypothesis-driven
Usability Testing
§ Usability testing speeds up many projects and produces cost savings in
system development
§ Participants should represent the intended user communities, with attention
to:
o background in computing and experience with the task
o motivation, education, and ability with the natural language used in the
interface
§ The movement towards usability testing stimulated the building of
purpose-built usability labs
Usability Testing Labs
§ The usability lab usually consists of two areas
o the testing room
o the observation room
§ The testing room is typically smaller
and accommodates a small number of people
§ From the observation room, observers can see into the testing room,
typically via a one-way mirror
o it is larger and can hold the facilitators with ample room to bring in others,
such as the developers of the product being tested
Usability Testing: 3 Steps
1. Plan
o who are your participants? what are you going to test, where, and how?
2. Run
o one participant at a time, multiple sessions
o collect data about the interactive system/interface
3. Analyze
o extract information from the collected data, both qualitative and
quantitative
Plan
Usability Testing
Usability Testing: Plan
§ Choose who you will involve in the test
o who are your (target) users?
§ How many participants do you need?
o 5!
o https://www.nngroup.com/articles/how-many-test-users/
§ Decide who will play which role during the session
o you need at least one facilitator
o 1-2 other people may serve as note-takers and observers
o N.B. developers, designers, creators, … of the interactive system in
evaluation must not serve as facilitators!
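The "5 participants" rule of thumb comes from the Nielsen Norman Group article linked above, which models the share of usability problems found by n participants as 1 - (1 - L)^n, where L is the proportion of problems a single participant reveals (about 31% on average across the projects they studied). The following is a minimal sketch of that curve; the function name and the default value are illustrative assumptions, and L can differ substantially for your own system.

```python
def share_of_problems_found(n_participants: int, discovery_rate: float = 0.31) -> float:
    """Expected share of usability problems found by n participants.

    discovery_rate is the assumed probability that a single participant
    reveals any given problem (0.31 is the average reported in the article).
    """
    if n_participants < 0:
        raise ValueError("n_participants must be non-negative")
    if not 0.0 <= discovery_rate <= 1.0:
        raise ValueError("discovery_rate must be in [0, 1]")
    return 1.0 - (1.0 - discovery_rate) ** n_participants


if __name__ == "__main__":
    # Print the curve for a few group sizes to see the diminishing returns.
    for n in (1, 3, 5, 10, 15):
        print(f"{n:2d} participants -> {share_of_problems_found(n):.0%} of problems")
```

Under these assumptions, five participants already uncover roughly 84% of the problems, which is why several small test rounds are usually preferred to one large round.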
Usability Testing: Plan
§ Choose which task(s) you are going to ask your participants to perform
o tasks may be introduced with a scenario
o they must be concrete and with a clear goal
o between 5 and 10 tasks
Usability Testing: Plan
§ Decide whether you need or want to ask for any additional information
o before and/or after the test
o before and/or after each task
o before and/or after a meaningful group of tasks
Usability Testing: Plan
§ Decide whether to have a debriefing session at the end of the test
o for each participant
o observers and note-takers can ask general and specific questions, to better
understand some pathways or comments
§ Develop a written test protocol ("script") for consistency among sessions
o step-by-step instructions with all the needed questions and forms
o often down to the exact words that the facilitator will say
o the appendix may contain a table with all tasks and their metrics
§ Practice your script with friends or colleagues
o to fix obvious bugs so that you do not waste your time or the users’
Informed Consent Form
§ Professional ethics practice is to ask all participants to read, understand, and
sign a statement which says:
o I have freely volunteered to participate in this experiment
o I have been informed in advance what my task(s) will be and what
procedures will be followed
o I have been given the opportunity to ask questions and have had my
questions answered to my satisfaction
o I am aware that I have the right to withdraw consent and to discontinue
participation at any time, without prejudice to my future treatment
o My signature below may be taken as affirmation of all the above
statements; it was given prior to my participation in this study
Metrics
§ For success/failure criteria and additional information
§ Subjective metrics, i.e., questions you ask participants:
o prior to the session, e.g., background info
o after each task scenario is completed, such as ease and satisfaction
questions about the task
o overall ease of use, satisfaction, and likelihood to use/recommend at the
end
§ Quantitative metrics
o what you will be measuring in your test, e.g., successful completion rates,
error rates, time on task
Metrics
§ Successful Task Completion
o A task is successfully completed when the participant indicates they have
found the answer or completed the task goal.
o Measured as: Boolean value, 0-100 scale, …
§ Critical Errors
o Deviations at completion from the targets of the task, so that the
participant cannot finish the task. The participant may or may not be aware
that the task goal is incorrect or incomplete.
o Measured as: absolute or relative number
§ Non-Critical Errors
o Errors that the participant recovers from and that do not prevent the
participant from successfully completing the task. These errors result in
the task being completed less efficiently.
o Measured as: absolute or relative number, or they may affect the "successful
task completion"
§ Error-Free Rate
o The percentage of participants who complete the task without any errors.
o Measured as: relative number
Metrics
§ Time On Task
o The amount of time it takes the participant to complete the task.
o Measured as: time
§ Subjective Measures
o Self-reported participant ratings for satisfaction, ease of use, ease of
finding information, etc.
o Measured as: Likert scale
§ Likes, Dislikes and Recommendations
o What participants liked the most about the system, what they liked least,
any recommendations for improving it, etc. Typically at the end of the
session or a meaningful part of it.
o Measured as: free text
Reliable and validated questionnaires exist for subjective measures and open questions
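To make the quantitative metrics above concrete, here is a small sketch that computes the completion rate, the error-free rate, and the mean time on task for one task. The record layout (`TaskResult`), the function name, and the sample data are hypothetical and only serve the illustration. Averaging time over all attempts is also an assumption: some teams average over successful attempts only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """One participant's outcome on one task (hypothetical record layout)."""
    participant: str
    task: str
    completed: bool           # successful task completion (Boolean variant)
    critical_errors: int
    non_critical_errors: int
    seconds: float            # time on task


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Summarize the quantitative metrics for a single task."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    # A participant is error-free only with zero errors of both kinds.
    error_free = [r for r in results
                  if r.critical_errors == 0 and r.non_critical_errors == 0]
    return {
        "completion_rate": sum(r.completed for r in results) / n,
        "error_free_rate": len(error_free) / n,
        "mean_time_on_task": mean(r.seconds for r in results),
    }


# Made-up data for illustration only.
sample = [
    TaskResult("P1", "T1", True, 0, 0, 42.0),
    TaskResult("P2", "T1", True, 0, 2, 75.5),
    TaskResult("P3", "T1", False, 1, 0, 120.0),
    TaskResult("P4", "T1", True, 0, 0, 38.5),
]
summary = summarize(sample)
```

With this sample, three of four participants complete the task, two of four are error-free, and the mean time on task is 69 seconds.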
Methodology: Think-Aloud
§ While the participant performs a task, she is asked to describe what she is
doing and why, what she thinks is happening, etc.
§ Advantages
o simple, it requires little expertise
o can provide useful insight
o can show how the system is actually used
§ Disadvantages
o subjective
o selective
o the act of describing may alter task performance (e.g., time-on-task metric)
Methodology: Cooperative Evaluation
§ Variation of the think-aloud
§ The participant and the facilitator collaborate during the evaluation
o both can ask each other questions throughout
§ Additional advantages
o less constrained and easier to use
o user is encouraged to criticize system
o clarification possible
Equipment
§ Any of these can work for an effective usability test:
o Laboratory with two or three connected rooms outfitted with audio-visual
equipment
o Room with portable recording equipment
o Room with no recording equipment, as long as someone is observing the
participant and taking notes
o Remotely, with the participant in a different location (either moderated or
unmoderated)
Equipment: Some Material
§ Paper and pencil
o cheap, limited to writing speed
§ Audio
o good for think-aloud
§ Video
o accurate and realistic
o needs special equipment
o may be obtrusive
§ Computer logging
o automatic and unobtrusive
o large amounts of data may be difficult to analyze
§ Eye-tracking
o to track and record eye movements
Post-Test Questionnaire: SUS
§ System Usability Scale (SUS)
o a "quick and dirty" (but reliable) usability scale
o invented by John Brooke in 1986
§ It measures the perceived usability of a system
§ A 10-item Likert-scale questionnaire
o each question has 5 response options
§ It produces a score from 0 to 100
o not equivalent to a percentage score!
§ A SUS score above 68 is considered above average
SUS: Questions
1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.
SUS: Scoring
To calculate the SUS score of your system:
1. Each answer is 1-5 (X)
2. For every odd-numbered question, subtract 1 from the score (X-1)
o e.g., the answer for question 1 is 4, so its score is 4-1 = 3
3. For every even-numbered question, subtract the score from 5 (5-X)
o e.g., the answer for question 2 is 4, so its score is 5-4 = 1
4. Sum the scores from even and odd-numbered questions
5. Multiply the total by 2.5
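The five scoring steps above translate directly into a few lines of code. This is a minimal sketch; the function name and the input format (a list of the ten answers in question order) are assumptions made for the example.

```python
def sus_score(answers: list[int]) -> float:
    """Compute the SUS score from the 10 answers (each 1-5), in question order."""
    if len(answers) != 10:
        raise ValueError("SUS needs exactly 10 answers")
    if any(a not in (1, 2, 3, 4, 5) for a in answers):
        raise ValueError("each answer must be an integer from 1 to 5")
    total = 0
    for number, x in enumerate(answers, start=1):
        # Odd-numbered questions are positively worded: X - 1.
        # Even-numbered questions are negatively worded: 5 - X.
        total += (x - 1) if number % 2 == 1 else (5 - x)
    # The sum ranges from 0 to 40; multiplying by 2.5 rescales it to 0-100.
    return total * 2.5


# Made-up answers from one participant, for illustration only.
example = sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 3])
```

For these answers the odd-numbered questions contribute 16 and the even-numbered ones 15, so the score is 31 × 2.5 = 77.5. To score a whole study, compute the score per participant and then average.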
SUS: Advantages and Disadvantages
§ Advantages
o Score reliability has been evaluated over the decades and it is on par with
more complex and costly methods
o Free, quick, and simple
o Widely used in industry
o Applicable to a wide range of technologies, systems, and products
§ Disadvantages
o It is a subjective measure of perceived usability
• it should not be your only method
o It gives no clues about how to improve the score
• it is not diagnostic
o It is not possible to make systematic comparisons between two systems and
their functionality using SUS
Post-Test Questionnaire: NASA-TLX
§ NASA Task Load indeX (NASA-TLX)
o emerged in the 1980s
o the result of NASA efforts to develop an instrument
for measuring the perceived workload required by
the complex, highly technical tasks of aerospace
crew members
§ Useful for studying complex products and tasks in high-
consequence environments
o e.g., healthcare, aerospace, military, etc.
NASA-TLX: Questions
§ 6 questions on an unlabeled 21-point scale
o ranging from Very Low to Very High
§ Each question addresses one dimension of the perceived workload:
o mental demand
o physical demand
o time pressure
o perceived success with the task
o overall effort level
o frustration level
§ Respondents also weight the six dimensions, to indicate which mattered most
to what they were doing
NASA-TLX: Score
§ A complex instrument to score
§ NASA shares a paper-and-pencil version
o with instructions
o https://humansystems.arc.nasa.gov/groups/tlx/tlxpaperpencil.php
§ NASA also provides a free iOS app to compute the score
o https://itunes.apple.com/us/app/nasa-tlx/id1168110608
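As a rough illustration of why the instrument is more complex to score than SUS, the sketch below follows the weighted procedure as it is commonly described: each dimension is rated on the 21-point scale (assumed here to map to 0-100 in steps of 5), each dimension's weight is the number of times the respondent picked it across the 15 pairwise comparisons among the six dimensions, and the overall workload is the weighted average. The function name, the dictionary keys, and the sample data are assumptions; the official instructions linked above remain the reference.

```python
DIMENSIONS = (
    "mental demand", "physical demand", "time pressure",
    "perceived success", "effort", "frustration",
)


def tlx_weighted_score(ratings: dict[str, int], weights: dict[str, int]) -> float:
    """Overall workload (0-100) from per-dimension ratings and weights.

    ratings: position on the 21-point scale as an index 0..20 (assumed to
             map to 0..100 in steps of 5).
    weights: how many of the 15 pairwise comparisons each dimension won
             (0..5 each, summing to 15).
    """
    if set(ratings) != set(DIMENSIONS) or set(weights) != set(DIMENSIONS):
        raise ValueError("ratings and weights must cover all six dimensions")
    if any(not 0 <= r <= 20 for r in ratings.values()):
        raise ValueError("ratings must be scale indices from 0 to 20")
    if any(not 0 <= w <= 5 for w in weights.values()) or sum(weights.values()) != 15:
        raise ValueError("weights must be 0..5 and sum to 15")
    # Weighted average of the rescaled ratings; the weights always sum to 15.
    return sum(ratings[d] * 5 * weights[d] for d in DIMENSIONS) / 15


# Made-up answers from one participant, for illustration only.
ratings = {"mental demand": 14, "physical demand": 2, "time pressure": 10,
           "perceived success": 6, "effort": 12, "frustration": 8}
weights = {"mental demand": 5, "physical demand": 0, "time pressure": 3,
           "perceived success": 1, "effort": 4, "frustration": 2}
workload = tlx_weighted_score(ratings, weights)
```

Some studies skip the weighting step and simply average the six ratings; state which variant you used when reporting results.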
Sample Scripts and Some Tips
§ Sample usability testing scripts (mostly without specific tasks):
o https://www.sensible.com/downloads/test-script.pdf
o http://www.lse.ac.uk/intranet/staff/webSupport/guides/archivedWebeditorsHandbook/pdf/script.pdf
§ How to create good tasks?
o https://www.nngroup.com/articles/task-scenarios-usability-testing/
Run and Analyze
Usability Testing
Usability Testing: Run
§ Get informed consent
o better in written format
§ One person acts as the facilitator and the rest of the team are observers
o at least one of the observers must take notes
§ Tell each participant:
o "we are testing our app, not you! Any mistakes are the app’s fault, not yours."
o IMPORTANT!
Usability Testing: Run
§ The facilitator should always follow the script, remain neutral, not help the
participants, and provide clear instructions
o tasks can be given in written form, one at a time, … or orally
§ The facilitator must explain the chosen methodologies and encourage
participants to adopt them, at the right moment
o e.g., how the think-aloud works and for which tasks to use it
§ Note-takers take notes of the participant’s behavior, comments, errors and
completion (success or failure) of each task
§ The system must be ready to measure all the defined criteria
Usability Testing: Analyze
§ Analyze collected data to find UI failures and ways to improve
o e.g., written notes, audio, video, usage logs, …
§ Do not forget to consider the collected metrics
o per task and overall
§ Quantitative data can be summarized in, e.g., success rates, task time, error
rates, satisfaction questionnaire ratings
§ Look for trends and keep a count of problems that occurred across
participants
o e.g., observations about pathways participants took,
comments/recommendations, answers to open-ended questions
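Keeping a count of problems across participants can be as simple as tallying the note-takers' observations. The sketch below assumes that each observation has already been coded with a short problem label; the labels, the participant ids, and the function name are hypothetical.

```python
from collections import Counter


def problem_frequency(notes: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Count how many participants hit each problem, most frequent first.

    notes maps a participant id to the problem labels the note-takers
    recorded for that participant (labels are a hypothetical coding scheme).
    """
    counts = Counter()
    for problems in notes.values():
        # A set ensures each participant counts at most once per problem.
        counts.update(set(problems))
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up observations for illustration only.
observations = {
    "P1": ["missed search icon", "unclear error message"],
    "P2": ["missed search icon"],
    "P3": ["missed search icon", "unclear error message", "missed search icon"],
    "P4": [],
    "P5": ["confusing checkout label"],
}
ranking = problem_frequency(observations)
```

Here "missed search icon" affects three of five participants and would be a natural first candidate for a fix. Frequency is only one input: a rare problem that blocks task completion may still deserve higher priority.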
References
§ Alan Dix, Janet Finlay, Gregory Abowd, Russell Beale, Human Computer
Interaction, 3rd Edition
o Chapter 9: Evaluation Techniques
§ Ben Shneiderman, Catherine Plaisant, Maxine S. Cohen, Steven M. Jacobs, and
Niklas Elmqvist, Designing the User Interface: Strategies for Effective Human-
Computer Interaction
o Chapter 5: Evaluating Interface Design
§ usability.gov - Improving the User Experience
o https://www.usability.gov
References
§ Beyond the NPS: Measuring Perceived Usability with the SUS, NASA-TLX, and
the Single Ease Question After Tasks and Usability Tests
o https://www.nngroup.com/articles/measuring-perceived-usability/
§ John Brooke, SUS - A quick and dirty usability scale, 1986
o https://hell.meiert.org/core/pdf/sus.pdf
§ The Pros and Cons of the System Usability Scale (SUS)
o https://research-collective.com/blog/sus/
License
§ These slides are distributed under a Creative Commons license “Attribution-NonCommercial-ShareAlike 4.0
International (CC BY-NC-SA 4.0)”
§ You are free to:
o Share — copy and redistribute the material in any medium or format
o Adapt — remix, transform, and build upon the material
o The licensor cannot revoke these freedoms as long as you follow the license terms.
§ Under the following terms:
o Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were
made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses
you or your use.
o NonCommercial — You may not use the material for commercial purposes.
o ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions
under the same license as the original.
o No additional restrictions — You may not apply legal terms or technological measures that legally restrict
others from doing anything the license permits.
§ https://creativecommons.org/licenses/by-nc-sa/4.0/