43 A Reference to Usability Inspection Methods
43.1 Introduction
Usability evaluation is a form of user context analysis that draws mainly from direct observations of users' task performance while they interact with an application. The evaluation can be a costly quantitative experiment or a formative qualitative study, conducted with either a large or a small sample size [1, 2]. Usability evaluation methods are usually costly constructs that need a considerable amount of resources to administer [3]. The cost referred to here is the total cost spent on an evaluation cycle, defined in terms of man-hours (time) and the money consumed by the evaluation activities. According to Nielsen, usability work on average consumes an additional 8–13 % of a project's total budget [4]. Although the cost of usability testing does not increase linearly with project size, Nielsen advocated devoting an additional 10 % of a project's total budget to usability.
From the sample price list in Fig. 43.1, the five evaluation services offered can be broadly classified into two distinct categories: formative and summative methods. These terms originated from assessment design in education, where they classify the approaches used to evaluate students' performance [9]. In education, formative methods use interim results to inform one's learning with immediate feedback for self-improvement, whereas summative methods use a standardized test to summarize the performance of a population. The nature of assessment design in education is similar to usability evaluation; the only difference is that the test subject in usability is an application rather than the students or users. All expenditure in usability evaluation is essentially a floating operational cost that scales with the type of evaluation method selected. By observation, the cost of usability evaluation is largely subject to the mechanics of the method itself. For instance, based on the sample price list in [10] (see Fig. 43.1), the median costs of the summative and formative UEMs sit at the distinct price marks of USD 60,000 and USD 40,000, respectively. Summative testing is more costly than formative testing, as it requires a large number of samples to conclude its findings through discrete statistical distributions [10]. A typical summative usability test is a one-to-one session that involves two (2) hired individuals: a moderator and a recruited tester (user). Each test session involves a different tester, with moderators drawn from the same pool, and the test runs for at least 30–50 sessions. In its formal procedure, the usability test continues to be run even if the findings remain about the same after several rounds of initial testing. As such, this mode of recurring testing amounts to a total cost of $30,000 if each test session costs $1,000 and is run thirty (30) times.
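As a simple worked example of this recurring-session cost (the 50-session figure is an illustrative extrapolation from the 30–50 session range above, not a number from the source):

$$ \text{Total cost} = c_{\text{session}} \times n_{\text{sessions}}: \quad \$1{,}000 \times 30 = \$30{,}000, \qquad \$1{,}000 \times 50 = \$50{,}000. $$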
The formative UEMs, on the other hand, have a much lower overhead cost than the summative UEMs, because the findings of a standard formative evaluation can be concluded among three to five experienced evaluators. Hence, the formative UEMs are also known as discounted inspection methods [11]. Although formative UEMs appear more cost-efficient than summative test methods, the challenge lies in hiring usability experts, as there is no standard way to qualify such expertise and engaging an expert is not cheap [7, 12, 13]. The objective of usability evaluation is to discover what is usable and what is not in an application. The gathered insights aim to help developers improve their designs with better-informed decisions, which in turn benefits their end users. However, usability evaluation remains costly, be it summative or formative.
The inspection approach was first conceived from the informal auditing procedures used in software development to debug a program's source code during the mid-1980s [19, 20]. The practice was then brought over into the development of
graphical user interfaces (GUIs) and subsequently became an alternative to usability testing, as real users are often expensive and difficult to recruit in sufficient numbers to test all aspects of an evolving design [21]. For instance, a typical usability inspection process requires only three to five (3–5) evaluators, whereas a usability testing procedure requires at least 30 real users and a number of moderators across the test sessions. Prior to 1990, most usability inspections relied on the general skill and experience of the evaluator, without formalized procedures or guided inspection criteria [22]. Such a mode of inspection is questionable and plagued with validity issues, as each new evaluator who evaluates the same interface delivers a different scope of findings from the previous one. Over time, discussions on formalizing the usability inspection approach gained attention, and both academic and industrial researchers found that traditional scientific inquiry methods could be fruitfully employed to inspect the usability of an interactive system with much better validity [23]. Hence, from 1990 to 1993, there was an explosion of interest in developing new evaluation methods built on existing ones [24]. For instance, UEMs like heuristic evaluation (HE) extended the previously informal inspection technique with standardized usability heuristics and checklists. The following subsections look into the inspection techniques of task analysis, cognitive walk-through, and heuristic evaluation, all of which have stood the test of time relatively well.
Task Analysis
Task analysis (TA) models an observed task as a sequence of laborious actions with no internal control structure [14]. The focus of TA as a usability inspection technique is to identify and analyze, in detail, users' mental and physical efforts while they interact with a user interface. TA was originally developed to improve the occupational task performance of the labor force [26], and it was only later adapted into the field of human-computer interaction as the industrial landscape changed from analogue to digital technology. TA is used to comprehend users' task requirements by breaking an observed task down into its lowest-level acts and then re-clustering those acts into plausible scenarios that users would perform in an actual course of use. For instance, considering the task of buying an e-book via an online store [27], the digital task was observed and deconstructed into five sequential acts as follows:
1. Locate the book.
2. Add a book to shopping cart.
3. Enter payment details.
4. Complete billing and mailing address.
5. Confirm order.
Based on this preliminary task flow, the online book-buying action can be organized into two plausible scenarios according to predefined user experience profiles (see Fig. 43.3). Drawing from these two scenarios, an inspection process would then follow suit to assess the usability of the e-commerce website: whether the website's interface can effectively and efficiently support the course of book-buying actions for both new and experienced users.
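To make the decomposition-and-reclustering idea concrete, the following is a minimal sketch (not taken from the chapter) of how the five observed acts might be represented and re-clustered into the two user-profile scenarios; the class names and the exact act orderings per scenario are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Act:
    """A lowest-level act observed during the task (illustrative)."""
    name: str

@dataclass
class Scenario:
    """A plausible re-clustering of acts for one user profile (illustrative)."""
    profile: str
    acts: List[Act] = field(default_factory=list)

# The five sequential acts observed for buying an e-book via an online store.
locate  = Act("Locate the book")
add     = Act("Add the book to the shopping cart")
pay     = Act("Enter payment details")
address = Act("Complete billing and mailing address")
confirm = Act("Confirm order")

# Hypothetical scenarios: a new user must supply payment and address details,
# whereas an experienced user with stored details skips straight to confirmation.
new_user         = Scenario("new user", [locate, add, pay, address, confirm])
experienced_user = Scenario("experienced user", [locate, add, confirm])

for scenario in (new_user, experienced_user):
    print(scenario.profile, "->", [act.name for act in scenario.acts])
```

An inspector would then walk each scenario against the interface to judge whether every act is supported effectively and efficiently.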
The process of TA can be extended with questionnaires or open-ended user interviews to gather more detailed information about how people actually perform a specific type of task [28]. The questionnaires can be administered as face-to-face interviews or as an online survey to identify users' task requirements [29]. The gathered requirements give application designers deeper insight into users' task needs before actual development [25]. Overall, TA is a user-centered inspection technique in which only one experienced practitioner is needed to do the inspection. However, such a user-centered inspection technique has its disadvantages: it is time consuming and skill dependent. In addition, TA has limited applicability, as it cannot model complex user tasks [25]. This may explain why task analysis showed a fifty-eight percent (58 %) decline in popularity in the 2009 UPA survey; a time-consuming method is not favorable in a tight development environment.
Cognitive Walk-Through
In a cognitive walk-through (CW), evaluators step through the actions of a task from the user's point of view, questioning whether the interface supports each step. However, the noninvolvement of real users or any other independent evaluators creates a danger of inherent bias during the walk-through [14]. For instance, application designers who act as evaluators tend to become defensive when potential usability problems with their design are highlighted, which often provokes a long argument to justify the design at the expense of any necessary fixes [31]. Like TA, CW is also found to be too time consuming and tedious, as the method requires querying and answering at every level of action. The effectiveness of CW is tied to its deep, slow data-gathering process; hence, the walk-through needs to be executed at a slower pace.
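As an illustration of why the per-action questioning makes CW slow, the following is a minimal sketch of the walkthrough loop, using the walkthrough questions commonly associated with the cognitive walkthrough literature [30]; the data structures and the placeholder recording format are illustrative assumptions, not the chapter's procedure.

```python
# The four questions commonly asked of every user action in a cognitive
# walk-through; every action must be examined against all of them, which is
# why the effort grows quickly with the length of the task.
WALKTHROUGH_QUESTIONS = [
    "Will the user try to achieve the right effect?",
    "Will the user notice that the correct action is available?",
    "Will the user associate the correct action with the intended effect?",
    "If the correct action is performed, will the user see that progress is being made?",
]

def walk_through(task_actions):
    """Collect an (action, question, notes) record for every action in the task."""
    records = []
    for action in task_actions:
        for question in WALKTHROUGH_QUESTIONS:
            # In a real session the evaluators discuss and record a judgement;
            # here a placeholder note simply shows the structure of the output.
            records.append({"action": action, "question": question, "notes": "TBD"})
    return records

if __name__ == "__main__":
    actions = ["Locate the book", "Add the book to the shopping cart", "Confirm order"]
    for record in walk_through(actions):
        print(record["action"], "|", record["question"])
```

Even this three-action task produces twelve question-and-answer records, which hints at why a full walk-through of a realistic task proceeds slowly.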
Heuristic Evaluation
Heuristic evaluation (HE) is a popular and widely used inspection technique pioneered by Jakob Nielsen and Rolf Molich during the early 1990s [32]. The inspection process of HE has usability specialists evaluate a user interface against a set of usability principles known as heuristics [14, 21]. HE is also known as expert review, as the evaluators who review or inspect the user interface are usually product domain experts who know the usability requirements [7]. By far, HE is the only inspection technique that uses usability experts to audit an interface's navigation structure, dialogue boxes, menus, and so on against a set of empirical or validated heuristics. The original mechanic of HE was to have a single expert evaluator perform the inspection alone. However, the method was later revised by Nielsen [33] to include a few more evaluators and thereby widen the scope of the inspection. The rationale behind the refinement came from another of Nielsen's experiments, this time inspecting a voice response system. In the experiment, Nielsen asked nineteen (19) expert evaluators to identify sixteen (16) usability problems that had purportedly been seeded into the voice response system before the actual experiment. Interestingly, the discoveries returned by the nineteen evaluators were more varied and more diverse than the predicted findings. The question then arose of how many expert evaluators are needed for a reliable HE. The common assumption at the time was that the more expert evaluators are on the job, the higher the usability problem discovery rate would be. But including more evaluators would defeat HE's purpose as a cost-effective inspection technique, as it would then be no different from the summative UEMs. In real development, a usability study should use as few evaluators as possible when resources are tight.
As a result, Nielsen and Landauer came up with a predictive model and set the optimum cost-benefit ratio of HE at no more than five evaluators per evaluation [34]. This is because the findings after the fifth evaluator are largely repetitive, as 85 % of the usability problems will already have been identified by the first five evaluators. To explain this finding, Nielsen and Landauer used the following nomograph to illustrate the cost-effectiveness of having only five evaluators (Fig. 43.5).
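The curve behind the nomograph is usually written as Found(i) = N(1 − (1 − λ)^i), where N is the total number of usability problems and λ is the proportion found by a single evaluator (about 0.31 in Nielsen and Landauer's data [34]). The following is a minimal sketch of that model; the parameter value is the commonly cited one rather than a figure taken from this chapter.

```python
def proportion_found(evaluators: int, lam: float = 0.31) -> float:
    """Nielsen-Landauer model: share of problems found by `evaluators`
    independent evaluators, each finding a fraction `lam` on their own."""
    return 1.0 - (1.0 - lam) ** evaluators

if __name__ == "__main__":
    for n in (0, 1, 2, 3, 5, 8, 15):
        print(f"{n:2d} evaluators -> {proportion_found(n):5.1%} of problems found")
    # With lam = 0.31 the model yields roughly 50 % at two evaluators and
    # about 85 % at five, after which each extra evaluator adds little.
```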
For instance, the nomograph shows that zero usability problems are found when no evaluators are in action. As soon as two evaluators enter the evaluation, the discovery rate of usability problems jumps to around fifty percent (50 %). Subsequently, fewer and fewer new usability problems are uncovered after the fifth evaluator, as most of the obvious problems will already have been pointed out by the first five evaluators. This influential finding by Nielsen and Landauer [34] also indirectly corroborated an earlier claim by Virzi [35] that "observing additional participants will reveal fewer and fewer new usability problems." These results were further confirmed by Lewis [36], who stated that the law of diminishing returns applies to HE, as the discovery rate of new usability problems stagnates and diminishes incrementally after the eighth evaluator. In other words, it is not economical to repeat the inspection process on the same interface more than eight times, as the subsequent findings would be largely redundant. In [36], Lewis also pointed out that it would be highly unrealistic for most usability evaluation studies to uncover ninety-nine percent (99 %) of the usability problems within an application, as the evaluation would require sample inputs from four hundred and eighteen (418) expert evaluators.
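Inverting the same discovery model gives the number of evaluators needed to reach a target proportion P of the problems; the rearrangement below is a standard consequence of the model rather than a formula taken from [36]:

$$ n \;\ge\; \frac{\ln(1-P)}{\ln(1-\lambda)}. $$

With λ ≈ 0.31 this gives n ≈ 13 for P = 0.99; only for problems with a very small per-evaluator detection rate (λ on the order of 0.01) does the required n climb into the hundreds, which is the regime behind estimates such as the 418 evaluators cited above.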
The cornerstone of a successful HE is the selection of usability heuristics that are contextually relevant.
Usability heuristics are general rules of thumb that guide the design of an interface. In their founding formulation of HE, Nielsen and Molich proposed a small set of heuristics made up of nine empirical usability principles [32]. These were empirically tested principles derived from the analysis of 249 usability problems [31]. Later, Nielsen revised the set into ten heuristics with better explanatory power [37]. Besides Nielsen's 10 Usability Heuristics, there are other usability guidelines, such as the Gerhardt-Powals heuristics [38], which are based on cognitive design principles. By far, the Gerhardt-Powals heuristics are the only set of heuristics that has been validated by human factors research [39].
Fig. 43.6 Nielsen's 10 Usability Heuristics and the Gerhardt-Powals Heuristics

Nielsen's 10 Usability Heuristics:
1. Visibility of system status
2. Match between system and the real world
3. User control and freedom
4. Consistency and standards
5. Error prevention
6. Recognition rather than recall
7. Flexibility and efficiency of use
8. Aesthetic and minimalist design
9. Help users recognize, diagnose, and recover from errors
10. Help and documentation

Gerhardt-Powals Heuristics:
1. Automate unwanted workload
2. Reduce uncertainty
3. Fuse data
4. Present new information with meaningful aids to interpretation
5. Use names that are conceptually related to function
6. Limit data-driven tasks
7. Include in the displays only that information needed by the user at a given time
8. Provide multiple coding of data when appropriate
9. Practice judicious redundancy
However, Nielsen’s 10 Usability Heuristics are the most widely adopted guidelines
compared to others, as it is more concise and easily understood (Fig. 43.6).
Most usability heuristics were developed well before the emergence of the apps culture. Even so, Nielsen's heuristics remain relevant for evaluating mobile technologies [40]. For instance, Wright et al. applied the heuristics in [37] to evaluate a mobile fax application known as MoFax [41]. MoFax was created to support industry representatives who often send faxes of plans to conventional fax machines while out in the construction field. Wright et al. had initially planned to perform field testing with real users; prior to the summative testing, however, they discovered that the MoFax interface was so unusable that they decided to conduct an HE instead of a costly user test when the problems were already obvious. With three (3) expert evaluators, fifty-six (56) usability problems were identified in MoFax, and the developers mitigated all of them by redesigning the application. The advantage of HE lies in its effectiveness at identifying all major and minor usability problems of an application at any given stage [14]. With input from three to five experienced evaluators, all identified problems can be prioritized and analyzed along with proposed solutions to improve usability. The benefit of having experienced evaluators for usability inspection is that time is saved during problem analysis, as the evaluators can act as analysts themselves and provide sample solutions; testing with end users would require moderators and external analysts to analyze the identified usability problems before any solutions could be generated. HE can also be performed by less-experienced people, as they are guided by the heuristics [14]. For instance, in a comparative study of web usability for blind users, Mankoff et al. [42] discovered that developers who were not usability experts were able to find 50 % of the known usability problems with HE. Mankoff et al. concluded that, in their experiment, HE as an inspection method was more cost-effective than performing user testing with blind users.
However, HE is not perfect. Several studies have reported that HE is not always reliable in identifying usability problems, even when the evaluators are guided by a set of heuristics [15, 43–47]. The issues stem from inconsistent reporting among different evaluators who evaluate the same interface, as there is no standard way of documenting the findings in a common vocabulary [48]. Two notable revised HE methods, HE-Plus and HE++, were developed by Chattratichart and Lindgaard to address these reliability issues [49–51]. HE-Plus takes the catalogue of node-based problem descriptors from [48] and overlaps it with Nielsen's heuristics. However, this approach is quite time consuming and complex, as it requires evaluators to keep checking against an exhaustive catalogue, and in some situations the problem-description nodes are too rigid to describe a usability problem. To validate the effectiveness of these new methods, a comparative study was conducted between HE, HE-Plus, and HE++ [12]. In the study, both HE++ and HE-Plus demonstrated better reliability and effectiveness than HE. However, the findings in [12] are difficult to ascertain, as the profile of the recruited expert evaluators was questionable: part of the sample population were students.
43.4 Overview
References
4. Nielsen, J. (2003). Return on investment for usability. Nielsen Norman Group. Retrieved
September 16, 2013 from http://www.nngroup.com/articles/return-on-investment-for-usability/
5. Nielsen, J. (1998). Cost of user testing a website. Jakob Nielsen’s Alertbox. Retrieved June 16,
2013 from http://www.nngroup.com/articles/cost-of-user-testing-a-website/
6. Nielsen, J. (2013). Salary trends for usability professionals. Jakob Nielsen’s Alertbox: 8
May 2012. Retrieved June 16, 2013 from http://www.nngroup.com/articles/
salary-trends-usability-professionals/
7. Barnum, C. M. (2011). Usability testing essentials: ready, set…test! Burlington: Morgan Kaufmann.
8. Nielsen Norman Group. Usability evaluation, (n.d.). Retrieved June 16, 2013 from http://www.
nngroup.com/consulting/usability-evaluations/
9. Scriven, M. (1967). The methodology of evaluation. In R. E. Stake (Ed.), Curriculum evalua-
tion. Chicago: Rand McNally. American Educational Research Association.
10. Hofman, R. (2011). Range statistics and the exact modeling of discrete non-gaussian distribu-
tions on learnability data. In A. Marcus (Ed.), Design, user experience, and usability. Theory,
methods, tools and practice (Vol. 6670, pp. 421–430). Berlin/Heidelberg: Springer.
11. Nielsen, J., & Mack, R. L. (Eds.). (1994). Usability inspection methods. New York: Wiley.
12. Chattratichart, J., & Lindgaard, G. (2008). A comparative evaluation of heuristic-based usabil-
ity inspection methods. In CHI’08 Extended abstracts on human factors in computing systems
(pp. 2213–2220). New York: ACM.
13. Nielsen, J. (2007). High-cost usability sometimes makes sense. Jakob Nielsen’s Alertbox. Retrieved
June 16, 2013 http://www.nngroup.com/articles/when-high-cost-usability-makes-sense/
14. Holzinger, A. (2005). Usability engineering methods for software developers. Communication
of the ACM, 48(1), 71–74.
15. Hornbæk, K. (2006). Current practice in measuring usability: Challenges to usability studies and research. International Journal of Human-Computer Studies, 64(2), 79–102.
16. Desurvire, H., Kondziela, J., & Atwood, M. (1992). What is gained and lost when using evalu-
ation methods other than empirical testing. In Proceedings of human-computer interaction
(HCI’92), University of York, Heslington, York, UK.
17. Coursaris, C. K., & Kim, D. J. (2011). A meta-analytical review of empirical mobile usability
studies. Journal Of Usability Studies, 6(13), 117–171.
18. Usability Professionals’ Association (UPA). (2009). UPA 2009 salary survey (public version).
Bloomingdale: UPA.
19. Ackerman, A. F., Buchwald, L. S., & Lewski, F. H. (1989). Software inspections: an effective
verification process. IEEE Software, 6(3), 31–36.
20. Fagan, M. E. (1986). Advances in software inspections. IEEE Transactions on Software Engineering, 12(7), 744–751.
21. Nielsen, J. (1994). Usability inspection methods. In Conference companion on human factors
in computing systems (CHI’94), Catherine Plaisant (Ed.) (pp. 413–414). New York: ACM.
22. Nielsen, J. (1994). Enhancing the explanatory power of usability heuristics. In Proceedings of
CHI’94 (pp. 152–158). New York: ACM.
23. Shneiderman, B. (1987). Designing the user interface: strategies for effective human computer
interaction. Reading: Addison-Wesley.
24. Dumas, J. (2007). The great leap forward: The birth of the usability profession (1988–1993).
Journal of Usability Studies, 2(2), 54–60.
25. Rogers, Y., Sharp, H., & Preece, J. (2011). Interaction design: beyond human-computer inter-
action (3rd edition). West Sussex: Wiley.
26. Annett, J., & Duncan, K. D. (1967). Task analysis and training design. Occupational
Psychology, 41, 211–221.
27. Hornsby, P. (2012). Hierarchical task analysis. UX Matters. Retrieved November 16, 2012,
from http://www.uxmatters.com/mt/archives/2010/02/hierarchical-task-analysis.php
28. Cooper, A., Reimann, R., & Cronin, D. (2007). About face 3: the essentials of interaction
design. Indianapolis: Wiley.
29. Jonassen, D. H., Tessmer, M., & Hannum, W. H. (1999). Task analysis methods for instruc-
tional design. Mahwah: Lawrence Erlbaum Associate.
43 A Reference to Usability Inspection Methods 419
30. Lewis, C., & Wharton, C. (1997). Cognitive walkthroughs. In M. G. Helander, T. K. Landauer,
& P. V. Prabhu (Eds.), Handbook of human-computer interaction (pp. 717–732). Amsterdam:
North-Holland.
31. Spencer, R. (2000). The streamlined cognitive walkthrough method, working around social con-
straints encountered in a software development company. In Proceedings of the SIGCHI confer-
ence on Human Factors in Computing Systems (CHI’00) (pp. 353–359). New York: ACM.
32. Nielsen, J., & Molich, R. (1990). Heuristic evaluation of user interfaces. In Proceedings of
ACM CHI’90 conference, (pp. 249–256). Seattle, 1–5 April 1990.
33. Nielsen, J. (1992). Finding usability problems through heuristic evaluation. In Proceedings of
CHI’92. New York: ACM.
34. Nielsen, J., & Landauer, T. K. (1993). A mathematical model of the finding of usability prob-
lems. In Proceedings of ACM/IFIP INTERCHI’93 conference, Amsterdam, Netherlands
(pp. 206–213).
35. Virzi, R. A. (1992). Refining the test phase of usability evaluation: how many subjects is
enough? Human Factors, 34(4), 457–468.
36. Lewis, J. R. (1994). Sample sizes for usability studies: additional considerations. Human
Factors, 36, 368–378.
37. Nielsen, J. (1995). 10 usability heuristics for user interface design. Jakob Nielsen’s Alertbox, 1995.
Retrieved November 16, 2012 from http://www.nngroup.com/articles/ten-usability-heuristics/
38. Gerhardt-Powals, J. (1996). Cognitive engineering principles for enhancing human – computer
performance. International Journal of Human-Computer Interaction, 8(2), 189–211.
39. U.S. Department of Health and Human Services. (2012). The research-based web design &
usability guidelines. Washington: U.S. Government Printing Office. Retrieved September 16,
2012, from http://www.usability.gov/basics/index.html
40. Brewster, S. A., & Dunlop, M. D. (2004). Mobile human-computer interaction – mobile HCI
2004. Lecture notes in computer science (Vol. 3160). Berlin: Springer.
41. Wright, T., Yoong, P., Noble, J., Cliffe, R., Hoda, R., Gordon, D., & Andreae. C. (2005).
Usability methods and mobile devices: an evaluation of MoFax. In Proceedings of the 4th
international conference on mobile and ubiquitous multimedia (pp. 29–33), New York: ACM.
42. Mankoff, J., Fait, H., & Tran, T. (2005). Is your web page accessible?: A comparative study of
methods for assessing web page accessibility for the blind. In Proceedings of the SIGCHI
conference on human factors in computing systems, Portland, Oregon, USA (pp. 41–50).
43. Jacobsen, N. E., Hertzum, M., & John, B. E. (1998). The evaluator effect in usability tests. In
CHI’98 conference summary (pp. 255–256). Reading: Addison-Wesley.
44. Hertzum, M., & Jacobsen, N. E. (2001). The evaluator effect: A chilling fact about usability
evaluation methods. International Journal of Human-Computer Interaction, 13(4), 421–443.
45. Kjeldskov, J., & Graham, C. (2003). A review of mobile HCI research methods. In Proceedings
of 5th international mobile HCI 2003 conference. Italy: Udine.
46. Cockton, G., & Woolrych, A. (2002). Sale must end: Should discount methods be cleared off
HCI’s shelves? Interactions 9, 5, (pp. 13–18). September 2002.
47. Kock, E. D., Biljon, J. V., & Pretorius, M. (2009). Usability evaluation methods: Mind the
gaps. In SAICSIT conference 2009, Vaal River, South Africa (pp. 122–131).
48. Andre, T. S., Hartson, H. R., Belz, S. M., & McCreary, F. A. (2001). The user action frame-
work: A reliable foundation for usability engineering support tools. International Journal of
Human – Computer Studies, 54(1), 107–136.
49. Chattratichart, J., & Brodie, J. (2002). Extending the heuristic evaluation method through con-
textualisation. In Proceedings of 46th annual meeting of the human factors & ergonomics
society, Baltimore, Maryland, USA. (pp. 641–645).
50. Chattratichart, J., & Brodie, J. (2003). HE-plus – towards usage-centered expert review for
website design. In Proceedings of forUSE 2003 (pp. 155–169). MA: Ampersand Press.
51. Chattratichart, J. & Brodie, J. (2004). Applying user testing data to UEM performance metrics.
In Proceedings of CHI 2004 (pp. 1119–1122). Vienna: ACM Press.