
chapter 9

evaluation techniques
Evaluation?

• Evaluation
  – tests usability and functionality of system
  – occurs in laboratory, field and/or in collaboration with users
  – evaluates both design and implementation
  – should be considered at all stages in the design life cycle
Background

• Ideally, evaluation should occur throughout the design life cycle.
• Clearly, it is not usually possible to perform extensive experimental testing continuously throughout the design.
• But analytic and informal techniques can and should be used.
• In this respect, there is a close link between evaluation and the principles and prototyping techniques we have already discussed.
Background

• Such techniques help to ensure that the design is assessed continually.
• Advantages:
  – problems can be detected before considerable effort and resources have been expended on the implementation
  – it is much easier to change a design in the early stages of development than in the later stages
Background

• We can make a broad distinction between:
  – evaluation by the designer or a usability expert, without direct involvement by users, AND
  – evaluation that studies actual use of the system

• The former is particularly useful for assessing early designs and prototypes.
• The latter normally requires a working prototype or implementation.
Background

• Evaluation techniques fall under two broad headings:
  – expert analysis
  – user participation
• Let’s discuss the goals of evaluation first.
Goals of Evaluation

• To assess the extent of the system’s functionality

• To assess users’ experience of the interaction

• To identify specific problems with the system
Goal 1:
extent of system functionality
• The system’s functionality must accord with the user’s requirements.

• This includes not only making the appropriate functionality available within the system, but also making it clearly reachable by the user in terms of the actions required, and matching the use of the system to the user’s expectations of the task.
Goal 2:
assess users’ experience of the
interaction
• This includes considering aspects such as:
  – how easy the system is to learn
  – usability and the user’s satisfaction with the system
  – enjoyment and emotional response of the user
  – identifying areas of the design that overload the user in some way (e.g. information to be remembered)
Goal 3: identify specific
problems with the system
• concerned with identifying trouble-spots which can then be rectified
• trouble-spots cause unexpected results, or confusion amongst users
• related to both:
  – functionality
  – usability of the design
Evaluation
through expert analysis
• It can be expensive to carry out user testing at regular intervals during the design process.

• It can be difficult to get an accurate assessment of the experience of interaction from incomplete designs and prototypes.

• Expert analysis is best suited to the early design stage and the laboratory:
  – Cognitive Walkthrough
  – Heuristic Evaluation
  – Review-based evaluation
Cognitive Walkthrough

• The origin of the cognitive walkthrough is the code walkthrough.
• Walkthroughs require a detailed review of a sequence of actions.
• In a code walkthrough, the sequence represents a segment of the program code that is stepped through by the reviewers to check certain characteristics.
Cognitive Walkthrough

• Usually, the main focus of the cognitive walkthrough is to establish how easy a system is to learn.
• More specifically, the focus is on learning through exploration.
• Experience shows that many users prefer to learn how to use a system by exploring its functionality hands on, rather than after sufficient training or examination of a user’s manual.
• So the checks that are made during the walkthrough ask questions that address this exploratory learning.
Cognitive Walkthrough

• To do this, the evaluators go through each step in the task and provide a ‘story’ about why that step is or is not good for a new user.
Cognitive Walkthrough

• To do a walkthrough you need four things:

• 1. A specification or prototype of the system. It doesn’t have to be complete, but it should be fairly detailed. Details such as the location and wording of a menu can make a big difference.
• 2. A description of the task the user is to perform on the system. This should be a representative task that most users will want to do.
Cognitive Walkthrough

• 3. A complete, written list of the actions needed to complete the task with the proposed system.

• 4. An indication of who the users are and what kind of experience and knowledge the evaluators can assume about them.
Cognitive Walkthrough

• Given this information, the evaluators step through the action sequence (as in point 3) to critique the system and tell a believable story about its usability.
• To do this, the evaluators try to answer the following four questions for each step in the action sequence:
  – Is the effect of the action the same as the user’s goal at that point?
  – Will users see that the action is available?
  – Once users have found the correct action, will they know it is the one they need?
  – After the action is taken, will users understand the feedback they get?
Cognitive Walkthrough

• It is vital to document the cognitive walkthrough to keep a record of what is good and what needs improvement in the design.
• It is therefore a good idea to produce some standard evaluation forms for the walkthrough.
Cognitive Walkthrough

• A negative answer to any of the questions for any particular action should be documented on a separate usability problem report sheet.
• It is also useful to indicate the severity of the problem, that is, whether the evaluators think this problem will occur often, and how serious it will be for the users.
• This information will help the designers to decide priorities for correcting the design, since it is not always possible to fix every problem.
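As an illustration, a minimal Python sketch of how such a problem report sheet might be represented; the record type, field names and example entry are assumptions for illustration, not a prescribed format:

# A minimal sketch of a usability problem report record for a cognitive
# walkthrough. The dataclass, fields and example data are illustrative.
from dataclasses import dataclass

@dataclass
class ProblemReport:
    step: int             # position in the action sequence
    action: str           # the action being critiqued
    failed_question: str  # which of the four questions got a negative answer
    story: str            # why the step fails for a new user
    severity: str         # 'low', 'medium' or 'high'

reports = [ProblemReport(
    step=3,
    action="Select 'Timed record' from the menu",
    failed_question="Will users see that the action is available?",
    story="The option is hidden in a submenu; a new user has no cue it exists.",
    severity="high",
)]

# List the most severe problems first so designers can set priorities.
order = {"high": 0, "medium": 1, "low": 2}
for r in sorted(reports, key=lambda r: order[r.severity]):
    print(f"Step {r.step}: {r.failed_question} ({r.severity})")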
Heuristic Evaluation

• A heuristic is a guideline, general principle or rule of thumb that can guide a design decision or be used to critique a decision that has already been made.
• Heuristic evaluation can be performed on a design specification, so it is useful for evaluating early design.
• But it can also be used on prototypes, storyboards and fully functioning systems.
• It is therefore a flexible, relatively cheap approach.
• Hence it is often considered a discount usability technique.
Heuristic Evaluation

• The general idea behind heuristic evaluation is that several evaluators independently critique a system to come up with potential usability problems.
• It is important that the evaluations be done independently.
• Between three and five evaluators are sufficient, with five usually resulting in about 75% of the overall usability problems being discovered.
Heuristic Evaluation

• To aid the evaluators in discovering usability problems, a set of 10 heuristics is provided. The heuristics are related to principles and guidelines (as in Chapter 7):
  – Visibility of system status
  – Match between system and the real world
  – User control and freedom
  – Consistency and standards
  – Error prevention
  – Recognition rather than recall
  – Flexibility and efficiency of use
  – Aesthetic and minimalist design
  – Help users recognize, diagnose and recover from errors
  – Help and documentation
Heuristic Evaluation

• Each evaluator assesses the system and notes violations of any of these heuristics that would indicate a potential usability problem.
• The evaluator also assesses the severity of each usability problem, based on four factors:
  – how common is the problem?
  – how easy is it for the user to overcome?
  – will it be a persistent one?
  – how seriously will the problem be perceived?
• These can be combined into an overall severity rating on a scale of 0–4:
Heuristic Evaluation

• 0 = I don’t agree that this is a usability problem at all
• 1 = Cosmetic problem only: need not be fixed unless extra time is available on the project
• 2 = Minor usability problem: fixing this should be given low priority
• 3 = Major usability problem: important to fix, so should be given high priority
• 4 = Usability catastrophe: imperative to fix this before the product can be released (Nielsen)
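Since several evaluators rate independently, one common practice is to aggregate their 0–4 ratings per problem. A minimal sketch, assuming ratings have been collected per problem (the problems and scores here are made up):

# Aggregate independent 0-4 severity ratings from several evaluators
# (the problem names and ratings are made-up examples).
from statistics import mean

ratings = {
    "No feedback after 'Save' is pressed":     [3, 4, 3],
    "Inconsistent wording: 'Exit' vs 'Quit'":  [2, 1, 2],
    "Error message shows internal error codes": [3, 3, 4],
}

# Rank problems by mean severity so the worst are fixed first.
for problem, scores in sorted(ratings.items(),
                              key=lambda kv: mean(kv[1]), reverse=True):
    print(f"{mean(scores):.1f}  {problem}")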
Review-based evaluation

• Results from the literature are used to support or refute parts of the design.

• Care is needed to ensure that results are transferable to the new design.

• Model-based evaluation:
  – cognitive models are used to filter design options,
    e.g. GOMS prediction of user performance

• Design rationale can also provide useful evaluation information.
Model-based evaluation

• Cognitive and design models provide a means of combining design specification and evaluation.
• The GOMS (goals, operators, methods and selection) model predicts user performance with a particular interface and can be used to filter particular design options.
Model-based evaluation

• Keystroke-level model
  – a lower-level modeling technique that provides predictions of the time users will take to perform low-level physical tasks
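As a worked illustration, the keystroke-level model predicts task time by summing standard operator times. A minimal sketch using the commonly quoted textbook averages from Card, Moran and Newell; the exact figures vary with user skill and device, and the example task is an assumption:

# A minimal keystroke-level model (KLM) sketch. Operator times are the
# commonly quoted textbook averages; real values vary with user skill.
OPERATORS = {
    "K": 0.20,   # keystroke (good typist)
    "P": 1.10,   # point with mouse at a target
    "B": 0.10,   # mouse button press or release
    "H": 0.40,   # home hands between keyboard and mouse
    "M": 1.35,   # mental preparation
}

def klm_time(sequence: str) -> float:
    """Predicted time (seconds) for a sequence of KLM operators."""
    return sum(OPERATORS[op] for op in sequence)

# Example: save a file via a menu -- think, move hand to mouse,
# point at the menu, click, point at 'Save', click.
print(f"{klm_time('MHPBBPBB'):.2f} s")   # about 4.35 s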
Design rationale

• Design rationale (as in Chapter 6) provides a framework in which design options can be evaluated.
• By examining the criteria that are associated with each option in the design, and the evidence provided to support those criteria, informed judgments can be made about the design.
Evaluating through user
Participation
Expert Analysis (recap)

• The techniques considered so far concentrate on evaluating a design or system through analysis by the designer, or an expert evaluator, rather than testing with actual users.
• Useful as these techniques are for filtering and refining the design, they are not a replacement for actual usability testing with the people for whom the system is intended: the users.
Styles of Evaluating through
user Participation
• Two styles
– Laboratory studies
– Field Studies
Laboratory studies

• Advantages:
– specialist equipment available
– uninterrupted environment

• Disadvantages:
– lack of context
– difficult to observe several users cooperating

• Appropriate:
  – if system location is dangerous or impractical
  – for constrained single-user systems
  – to allow controlled manipulation of use
Field Studies

• Advantages:
– natural environment
– context retained (though observation may alter it)
– longitudinal studies possible

• Disadvantages:
– distractions
– Noise
– movement

• Appropriate:
  – where context is crucial
  – for longitudinal studies
Evaluating Implementations

Requires an artefact: simulation, prototype, or full implementation.
Experimental evaluation

• controlled evaluation of specific aspects of interactive behaviour
• evaluator chooses a hypothesis to be tested
• a number of experimental conditions are considered which differ only in the value of some controlled variable
Experimental factors

• Subjects
– who – representative, sufficient sample
• Variables
– things to modify and measure
• Hypothesis
– what you’d like to show
• Experimental design
– how you are going to do it
Subjects or participants

• The choice of participants is vital to the success of any experiment.
• Participants should be chosen to match the expected user population as closely as possible.
• Ideally, actual users should be involved, but this is not always possible.
• If participants are not actual users, they should be chosen to be of a similar age and level of education as the intended user group.
Subjects or participants

• A second issue is the sample size chosen.

• Often this is determined by the availability of participants or resources.

• However, the sample size must be large enough to be considered representative of the population, taking into account the design of the experiment and the statistical methods chosen.
Variables

• independent variable (IV)
  – characteristic changed to produce different conditions
  – e.g. interface style, number of menu items

• dependent variable (DV)
  – characteristic measured in the experiment
  – e.g. time taken, number of errors
Hypothesis

• prediction of outcome
– framed in terms of IV and DV

e.g. “error rate will increase as font size decreases”

• null hypothesis:
– states no difference between conditions
– aim is to disprove this

e.g. null hyp. = “no change with font size”


Experimental design

• within groups design
  – each subject performs the experiment under each condition
  – transfer of learning possible (often mitigated by counterbalancing the order of conditions; see the sketch after this list)
  – less costly and less likely to suffer from user variation

• between groups design
  – each subject performs under only one condition (experimental or control group)
  – no transfer of learning
  – more users required
  – variation can bias results
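A minimal sketch of counterbalancing for a within-groups design, using a simple cyclic rotation of condition order across subjects (a balanced Latin square would control carry-over more thoroughly); the condition names are illustrative:

# Rotate the order of conditions across subjects so that learning
# effects are spread evenly over conditions. Names are illustrative.
conditions = ["menu UI", "command-line UI", "touch UI"]

def rotated_order(subject: int) -> list[str]:
    """Assign each subject a cyclic rotation of the condition order."""
    k = subject % len(conditions)
    return conditions[k:] + conditions[:k]

for s in range(6):
    print(f"subject {s}: {rotated_order(s)}")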
Analysis of data

• Before you start to do any statistics:
  – look at the data
  – save the original data
• Choice of statistical technique depends on:
  – type of data
  – information required
• Type of data:
  – discrete – finite number of values (e.g. screen color is red/green/blue)
  – continuous – any value (e.g. time taken to complete a task)
• A continuous variable can be rendered discrete by grouping it into classes.
Analysis - types of test

• parametric
– assume normal distribution
– robust
– powerful

• non-parametric
– do not assume normal distribution
– less powerful
– more reliable
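A minimal sketch of both kinds of test on made-up task completion times for two interface conditions, using SciPy: ttest_ind is parametric (assumes normality), mannwhitneyu is its non-parametric counterpart. The data are invented for illustration:

# Compare two conditions with a parametric and a non-parametric test.
from scipy.stats import ttest_ind, mannwhitneyu

old_ui = [12.1, 10.4, 13.0, 11.8, 12.6, 10.9]   # task times, seconds
new_ui = [ 9.8,  8.7, 10.5,  9.1, 10.0,  9.4]

t, p_t = ttest_ind(old_ui, new_ui)       # parametric: assumes normality
u, p_u = mannwhitneyu(old_ui, new_ui)    # non-parametric counterpart

print(f"t-test:       p = {p_t:.4f}")
print(f"Mann-Whitney: p = {p_u:.4f}")
# A small p (conventionally < 0.05) lets us reject the null hypothesis
# of no difference between the conditions.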
Analysis of data (cont.)

• What information is required?
  – is there a difference?
  – how big is the difference?
  – how accurate is the estimate?
Experimental studies on groups

More difficult than single-user experiments

Problems with:
– subject groups
– choice of task
– data gathering
– analysis
Subject groups

larger number of subjects ⇒ more expensive

longer time to ‘settle down’ … even more variation!

difficult to timetable

so … often only three or four groups
The task

must encourage cooperation

perhaps involve multiple channels

options:
– creative task
– decision games (e.g. the desert survival task)
– control task
Data gathering

several video cameras + direct logging of application

problems:
– synchronisation
– sheer volume!

one solution:
– record from each perspective
Analysis

solutions:
– within groups experiments
– look at interactions between group and media

controlled experiments may ‘waste’ resources!
Field studies

Experiments dominated by group formation

Field studies are more realistic:
– distributed cognition ⇒ work studied in context
– real action is situated action
– physical and social environment both crucial

Contrast:
– psychology – controlled experiment
– sociology and anthropology – open study and rich data
Observational Methods

Think Aloud
Cooperative evaluation
Protocol analysis
Automated analysis
Post-task walkthroughs
Think Aloud

• user observed performing task
• user asked to describe what he is doing and why, what he thinks is happening, etc.

• Advantages
  – simplicity – requires little expertise
  – can provide useful insight
  – can show how the system is actually used
• Disadvantages
  – subjective
  – selective
  – act of describing may alter task performance
Cooperative evaluation

• variation on think aloud
• user collaborates in evaluation
• both user and evaluator can ask each other questions throughout

• Additional advantages
  – less constrained and easier to use
  – user is encouraged to criticize the system
  – clarification possible
Protocol analysis
• paper and pencil – cheap, limited to writing speed
• audio – good for think aloud, difficult to match with other protocols
• video – accurate and realistic, needs special equipment, obtrusive
• computer logging – automatic and unobtrusive, but large amounts of data are difficult to analyze (see the sketch below)
• user notebooks – coarse and subjective, useful insights, good for longitudinal studies

• Mixed use in practice.
• Audio/video transcription is difficult and requires skill; a typist transcribing may miss context, so expertise is required.
• Some automatic support tools are available.
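As an illustration of computer logging, a minimal sketch of a timestamped interaction logger; the event names and log format are assumptions, not a standard:

# Automatic, unobtrusive logging: append each user action with a timestamp.
import time

LOG = open("session.log", "a")

def log_event(event: str, detail: str = "") -> None:
    """Append one timestamped interaction event to the session log."""
    LOG.write(f"{time.time():.3f}\t{event}\t{detail}\n")
    LOG.flush()

log_event("menu_open", "File")
log_event("menu_select", "Save As...")
log_event("dialog_confirm", "report.txt")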
automated analysis – EVA

• Workplace project
• Post-task walkthrough
  – user reflects on actions after the event
  – used to fill in intention
• Advantages
  – analyst has time to focus on relevant incidents
  – avoids excessive interruption of task
• Disadvantages
  – lack of freshness
  – may be post-hoc interpretation of events
post-task walkthroughs

• transcript played back to participant for comment
  – immediately ⇒ fresh in mind
  – delayed ⇒ evaluator has time to identify questions
• useful to identify reasons for actions and alternatives considered
• necessary in cases where think aloud is not possible
Query Techniques

Interviews
Questionnaires
Interviews

• analyst questions user on a one-to-one basis, usually based on prepared questions
• informal, subjective and relatively cheap

• Advantages
  – can be varied to suit context
  – issues can be explored more fully
  – can elicit user views and identify unanticipated problems
• Disadvantages
  – very subjective
  – time consuming
Questionnaires

• Set of fixed questions given to users

• Advantages
– quick and reaches large user group
– can be analyzed more rigorously
• Disadvantages
– less flexible
– less probing
Questionnaires (ctd)

• Need careful design:
  – what information is required?
  – how are answers to be analyzed?

• Styles of question (illustrative examples below):
  – general
  – open-ended
  – scalar
  – multi-choice
  – ranked
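For example (illustrative items, not from a standard instrument):
  – general: “How old are you?”
  – open-ended: “Can you suggest any improvements to the interface?”
  – scalar: “It is easy to recover from mistakes: disagree 1 2 3 4 5 agree”
  – multi-choice: “Which input device do you use most? mouse / keyboard / touchscreen”
  – ranked: “Rank these ways of issuing a command in order of preference: menu, command line, keyboard shortcut”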
Physiological methods

Eye tracking
Physiological measurement
eye tracking

• head- or desk-mounted equipment tracks the position of the eye
• eye movement reflects the amount of cognitive processing a display requires
• measurements include:
  – fixations: eye maintains a stable position; number and duration indicate level of difficulty with display
  – saccades: rapid eye movements from one point of interest to another
  – scan paths: moving straight to a target with a short fixation at the target is optimal
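A minimal sketch of separating fixations from saccades with a velocity-threshold (I-VT) classifier, assuming gaze samples as (x, y) positions in degrees of visual angle at a fixed sampling rate; the rate, threshold and sample data are illustrative assumptions:

# Label each gaze sample 'fixation' or 'saccade' by point-to-point velocity.
import math

def classify(samples, hz=60, threshold_deg_per_s=100.0):
    """I-VT: velocities above the threshold are saccades, the rest fixations."""
    labels = ["fixation"]                    # first sample has no velocity
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        velocity = math.hypot(x1 - x0, y1 - y0) * hz   # degrees per second
        labels.append("saccade" if velocity > threshold_deg_per_s else "fixation")
    return labels

gaze = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1), (5.0, 4.0), (9.8, 8.1), (9.9, 8.1)]
print(classify(gaze))
# Runs of 'fixation' labels can then be grouped and their durations measured.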
physiological measurements
• emotional response is linked to physical changes
• these may help determine a user’s reaction to an interface
• measurements include:
  – heart activity, including blood pressure, volume and pulse
  – activity of sweat glands: galvanic skin response (GSR)
  – electrical activity in muscle: electromyogram (EMG)
  – electrical activity in brain: electroencephalogram (EEG)
• some difficulty in interpreting these physiological responses – more research is needed
Choosing an Evaluation Method

when in process:        design vs. implementation
style of evaluation:    laboratory vs. field
how objective:          subjective vs. objective
type of measures:       qualitative vs. quantitative
level of information:   high level vs. low level
level of interference:  obtrusive vs. unobtrusive
resources available:    time, subjects, equipment, expertise
