Structured software usability evaluation: an experiment in evaluation design
Abstract
This research compared the effectiveness of a detailed usability evaluation
method and instrument with that of a traditional overview method. Usability is a broad term
that encompasses, for example, ease of use, learnability, support for users at multiple
skill levels, error prevention, feedback, and recovery. Evaluating a product against a set of usability
criteria is a technique known in the usability field as a “heuristic evaluation,” during
which a usability professional can identify problems before they reach the end user.
Heuristic evaluations performed early in a product lifecycle can promote fixes during
pre-production phases rather than during the more costly testing and implementation phases. In the
study, 63 usability professionals performed heuristic evaluations in a 2 × 2 research
design that crossed two heuristic sets (traditional versus experimental contemporary)
with two evaluation methods (an unstructured all-at-once method versus an experimental
structured method) to determine their relative effectiveness in revealing usability
problems. In the structured method, evaluators applied a limited subset of the heuristics
during each session, with breaks between sessions. The work extends research by Masaaki
Kurosu, who developed the Structured Heuristic Evaluation Method (sHEM) for use in
Japan and tested it in a single-factor design. In the present study, the heuristic-set and
evaluation-method variables were separated into a two-factor design; the advantage
Kurosu found for the combined contemporary/structured condition did not replicate as an
interaction effect. Participants using
traditional general heuristics found more usability problems than those using
contemporary detailed heuristics. The study’s strongest finding was that the structured
approach made participants more effective at identifying usability problems,
even when individual participants found the approach “disconcerting” or “distracting.”
Results are congruent with the psychological phenomenon of retroactive interference: by
interrupting the evaluation at intervals, participants were able to “forget” previous
sections, freeing up working memory to find more usability problems in subsequent
sections. Practical implications of these results for the usability field are discussed.