The Myth of Usability Testing
by Robert Hoekman Jr.

In 1998, usability expert Rolf Molich (co-inventor with Jakob Nielsen of the heuristic evaluation method) gave nine teams three weeks to evaluate the webmail application www.hotmail.com. The experiment was part of his series of Comparative Usability Evaluations (CUEs), through which he began to identify a set of standards and best practices for usability tests. In each segment of the series, Molich asked several usability teams to evaluate a single design using the method of their choice.

From the documented results of the second test, called CUE-2, a surprising trend emerged. Contrary to claims that usability professionals operate scientifically to identify problems in an interface, the results showed that usability evaluations are, at best, less than scientific.

In an interview with Christine Perfetti published in User Interface Engineering, Molich said:

The CUE-2 teams reported 310 different usability problems. The most frequently reported problem was reported by seven of the nine teams. Only six problems were reported by more than half of the teams, while 232 problems (75 percent) were reported only once. Many of the problems that were classified as “serious” were only reported by a single team. Even the tasks used by most or all teams produced very different results—around 70 percent of the findings for each of these common tasks were unique.

In CUE-4, run in 2003, 17 teams evaluated the Hotel Penn website, which featured a Flash-based reservation system developed by iHotelier. Of the 17 teams, nine ran usability tests, and the remaining eight performed expert reviews.

Collectively, the teams reported 340 usability problems. However, only nine of these problems were reported by more than half of the teams, and a total of 205 problems (60 percent of all the findings) were identified only once. Of the 340 usability problems identified, 61 were classified as “serious” or “critical.”

Think about that for a moment.

For the Hotmail team to have identified all of the “serious” usability problems discovered in the evaluation process, it would have had to hire all nine usability teams. In CUE-4, to spot all 61 serious problems, the Hotel Penn team would have had to hire all 17 usability teams. Seventeen!

Asked how development teams could be confident they are addressing the right problems on their websites, Molich concluded, “It’s very simple: They can’t be sure!”

Why usability evaluation is unreliable

Usability evaluations are good for a lot of things, but determining what a team’s priorities should be is not one of them. Fortunately, there is an explanation for these counterintuitive outcomes that can help us choose a more appropriate evaluation course.

Right questions, wrong people, and vice versa

First, different teams get different results because tests and research are often performed poorly: teams either ask the right questions of the wrong people or ask the wrong questions of the right people.

In one recent case, the project goal was to improve usability for a site’s new users. A card-sorting session (a perfectly appropriate discovery method for planning information architecture changes) indicated that the existing, less-than-ideal terminology used throughout the site should be retained. This happened because the team ran the card sort with existing site users instead of the new users it aimed to entice.

In another case, a team charged with improving the usability of a web application clearly in need of an overhaul ran usability tests to identify major problems. In the end, the team concluded that the rather poorly designed existing task flows should not only be kept but featured. This team, too, ran its tests with existing users, who had, as one might guess, become quite proficient at navigating the inadequate interaction model.

Usability teams also have wildly differing experience levels, skill sets, degrees of talent, and knowledge. Although some research and testing methods have been homogenized to the point that anyone should be able to perform them proficiently, a team’s savvy (or lack thereof) can still affect the results it gets. That almost anyone can perform a heuristic evaluation doesn’t mean the outcome will always be useful or even accurate. Heuristics are not a checklist; they are guidelines a usability evaluator can use as a baseline from which to apply her expertise. They are a beginning, not an end.

...

 
