By Gerd Waloszek, SAP User Experience, SAP AG, and Ulrich Kreichgauer, oCTO, SAP AG – October 13, 2009 • original article (story)
On this Website, we have published quite a few articles discussing performance and responsiveness issues in software applications over the past two years (see the Human Performance at the Computer highlight topic for a compilation of these articles). These more general articles are the by-products of a project that was initiated by SAP User Experience at the beginning of 2008 – the Perceived Performance project, or "PeP" project for short. This article is devoted to the PeP project itself; it reports briefly on the project's goals and methodological approach, work and cooperation with other groups within SAP, and possible future directions.
In his editorial What Matters Most? one of the authors expressed his belief that performance issues are the number one usability issue. Even if you find this statement too strong, there is general agreement that solving performance issues, or better, responsiveness issues, is of utmost importance for software companies (see the appendix or Human Performance at the Computer – Part 1: Introduction for the difference between performance and responsiveness): Poor performance degrades user efficiency and thus the efficiency of the business processes that depend on the software. The usual step is, of course, to approach these issues from a technical perspective. At SAP AG, for example, there are dedicated technical teams that measure the responsiveness of SAP applications in clearly defined test environments. For this purpose, they have created standardized step-by-step scenarios that allow them to compare different software versions and thus evaluate the effects of technical fine-tuning to improve the system's responsiveness.
The problem with a purely technical approach, however, is that these measurements tell us little about how users experience an application's responsiveness, which areas require a greater investment of effort from a user's perspective, and where the system is already responsive enough. In order to gain a better overall understanding of these issues, SAP's User Experience team initiated the Perceived Performance (PeP) project at the beginning of 2008. The primary goal was to devise a user-centered evaluation method that could be applied to the scenario-based measurements made by the technical teams. Further goals were to apply this methodology to dedicated SAP applications, roll out the insights gained within the company to increase awareness of responsiveness issues, and to publish them externally via channels such as the SAP Design Guild Website and conference appearances.
When the technical performance teams at SAP evaluate the responsiveness of applications, they monitor a large number of parameters – one of which is the overall response time for user-initiated user interface (UI) events. For this parameter, the teams apply a one-second threshold as a criterion for whether an application achieves SAP's performance goals. However, this rule does not reflect the full variety of user expectations and behaviors: Some actions should take much less than a second, while others may take longer without annoying users. Thus, the PeP team's challenge was to develop an evaluation method that provides better insight into the actual user experience and helps identify areas that need improvement. The concept of human time ranges, which originates from Allen Newell's time scales of human action, looked promising as a starting point for developing such a methodology, because these time scales refer to the psychological dimensions of perception, operations, and cognition (thinking, attention, motivation) (see Table 1 below; for details see Human Performance at the Computer – Part 2: Making Applications More Responsive and Waloszek and Kreichgauer, 2009). In their most basic and most frequently cited form, the time ranges are defined as follows:
The PeP team integrated two further categories into its adoption of time ranges:
This leads to the following table of time ranges (see Table 1; a more extended version of the time range table can be found in the appendix):
| Time Range | Human Aspect | Application / User Interface (UI): Acceptable Response | User: Response When Feedback Does Not Meet Time Range |
| --- | --- | --- | --- |
| 0.1 sec. (0.0-0.2) | Perception | Acknowledges user input | Perception of smooth animations and cause-and-effect relationship breaks down |
| 1.0 sec. (0.2-2.0) | Dialog, action | Presents result of simple task | Engaged user-system dialog breaks down |
| 3 sec. (2.0-5.0) | Cognition, attention, motivation | Presents result of common task | User has time to think – the system is perceived as slow, the user's focus starts to wander, and the user may turn to other tasks |
| 10 sec. (5.0-15) | | Presents result of complex tasks | User loses focus on task and may turn to other tasks |
| >15 sec. | | Presents result of very complex task | User becomes annoyed – the system is detrimental to productivity and motivation |
Table 1: PeP adaptation of human time ranges table, including variations in parentheses
The next question was how the time ranges could be utilized for a user-centered evaluation of response times. The PeP team's answer to this question was to classify observed response times according to time ranges, and thus according to the psychological effects that waiting has on users. This required switching from discrete times to ranges by extending and connecting the time ranges from 0 to beyond 15 seconds, without leaving any gaps (see the graphic in the appendix). To define the ranges, the PeP team adopted Shneiderman and Plaisant's (2004) values for the variation of the time ranges wherever possible, but a few decisions could not be backed up with data from the literature. We therefore initially set fairly conservative upper limits for the time ranges (see the first column in Table 1 or the graphic in the appendix).
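The classification of an observed response time into one of the connected ranges can be sketched in a few lines of Python. This is only an illustration, not the PeP team's tooling: the limits are taken from the first column of Table 1, while the names `UPPER_LIMITS`, `RANGE_LABELS`, and `classify` are made up here, and treating each upper limit as inclusive is an assumption, because the article does not say which range a boundary value belongs to.

```python
from bisect import bisect_left

# Upper limits (in seconds) of the connected time ranges from Table 1.
UPPER_LIMITS = [0.2, 2.0, 5.0, 15.0]
RANGE_LABELS = ["0.0-0.2 s", "0.2-2.0 s", "2.0-5.0 s", "5.0-15 s", "> 15 s"]


def classify(seconds: float) -> str:
    """Return the label of the time range that a response time falls into.

    Assumption: a value exactly on a limit counts toward the faster range
    (for example, 2.0 s is reported as "0.2-2.0 s").
    """
    if seconds < 0:
        raise ValueError("response time must not be negative")
    # bisect_left finds the first limit that is >= seconds; values above
    # the last limit get index 4, which is the open-ended "> 15 s" range.
    return RANGE_LABELS[bisect_left(UPPER_LIMITS, seconds)]


if __name__ == "__main__":
    for t in (0.05, 2.0, 3.2, 20.0):
        print(f"{t:5.2f} s falls into {classify(t)}")
```

Because the ranges are contiguous, a sorted list of upper limits is enough to describe them, and changing a limit later only means editing one number.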
Measuring response times and classifying them according to the time ranges does not, however, provide the complete picture. As already mentioned, some UI events need to be blazingly fast, while others may take longer without annoying users. Thus, to derive guidance from the evaluations, it is also necessary to know which response (or waiting) time users expect (and tolerate) for certain types of UI events. Assigning UI events to time ranges makes it possible to compare and evaluate observed and expected response times and to identify which events conform to users' expectations and which do not (and thus require improvement). As there was very little guidance in the literature, the PeP team drew up the following list for practical use in its evaluations:
Finally, we put together the ingredients of an evaluation method for response time. In short, the PeP methodology is based on three steps:
The time ranges have distinct implications (directness, appropriateness, slowness, waning or lost focus, annoyance) for users' perceptions and reactions. Therefore, the PeP evaluation matrix provides a more refined picture of how users perceive the performance of a software application than checking response times against one fixed time limit. The PeP evaluation is particularly valuable if an application is considerably slower than expected or exhibits wide response-time variations.
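To make the contrast with a single fixed limit concrete, the following Python sketch evaluates the same set of response times in both ways. The response times are invented for illustration, the function names are hypothetical, and counting a boundary value toward the faster range is an assumption; only the one-second threshold and the range limits come from this article.

```python
from bisect import bisect_left
from collections import Counter

# Range limits in seconds, as in the first column of Table 1.
UPPER_LIMITS = [0.2, 2.0, 5.0, 15.0]
RANGES = ["0.0-0.2 s", "0.2-2.0 s", "2.0-5.0 s", "5.0-15 s", "> 15 s"]


def fixed_limit_pass_rate(times, limit=1.0):
    """Share of response times at or below one fixed limit (0.0 to 1.0)."""
    return sum(t <= limit for t in times) / len(times)


def range_profile(times):
    """Count how many response times fall into each time range."""
    counts = Counter(RANGES[bisect_left(UPPER_LIMITS, t)] for t in times)
    # Report every range, including empty ones, in a fixed order.
    return {label: counts.get(label, 0) for label in RANGES}


# Hypothetical response times in seconds, invented for this example.
times = [0.1, 0.8, 1.4, 1.9, 3.5, 18.0]

if __name__ == "__main__":
    print(f"pass rate at 1 s: {fixed_limit_pass_rate(times):.0%}")
    for label, n in range_profile(times).items():
        print(f"{label:>10}: {n}")
```

With these made-up numbers, the fixed limit only reports that four of six measurements fail. The range profile shows that two of the failures still lie in the 0.2-2.0 s range, one is in the 2.0-5.0 s range, and one exceeds 15 seconds, which are very different situations from a user's point of view.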
The PeP team measured many standardized scenarios, the data for which was provided by the technical performance teams. The (fictional) example in Table 2 below shows a scenario with a fulfillment rate of 30.1% for simple tasks; this is assumed to have a strong negative impact on user satisfaction.
| Type of Interaction (Tolerable Range) | Observed 0.2-2.0 s (Times Measured) | Observed 2.0-5.0 s (Times Measured) | Observed 5.0-15 s (Times Measured) | Observed > 15 s (Times Measured) | Total | Fulfillment Rate (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Simple Tasks (0.2-2.0 s) | 22 | 26 | 20 | 5 | 73 | 30.1 |
| Common Tasks (2.0-5.0 s) | 3 | 13 | 9 | 9 | 34 | 47.1 |
| Complex Tasks (5.0-15.0 s) | 0 | 1 | 2 | 1 | 4 | 75.0 |
| Overall | 25 | 40 | 31 | 15 | 111 | 36.9 |
Table 2: Example of a PeP evaluation matrix (fictional data)
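The fulfillment rates in Table 2 can be reproduced with a short calculation. The article does not spell out the formula, so the sketch below rests on an inference: a measurement counts as fulfilled if its observed range is the tolerable range or a faster one. This reading yields exactly the percentages shown in the table. The names `COLUMNS`, `MATRIX`, `fulfillment_rate`, and `overall_rate` are hypothetical.

```python
# Observed-range columns of Table 2, ordered from fastest to slowest.
COLUMNS = ["0.2-2.0 s", "2.0-5.0 s", "5.0-15 s", "> 15 s"]

# Task type -> (tolerable range, number of measurements per observed range).
# The counts are the fictional data from Table 2.
MATRIX = {
    "Simple Tasks": ("0.2-2.0 s", [22, 26, 20, 5]),
    "Common Tasks": ("2.0-5.0 s", [3, 13, 9, 9]),
    "Complex Tasks": ("5.0-15 s", [0, 1, 2, 1]),
}


def fulfilled(tolerable, counts):
    """Measurements observed in the tolerable range or a faster one."""
    last = COLUMNS.index(tolerable)
    return sum(counts[: last + 1])


def fulfillment_rate(tolerable, counts):
    """Fulfillment rate in percent for one row of the matrix."""
    return 100 * fulfilled(tolerable, counts) / sum(counts)


def overall_rate(matrix):
    """Fulfillment rate in percent across all task types."""
    good = sum(fulfilled(t, c) for t, c in matrix.values())
    total = sum(sum(c) for _, c in matrix.values())
    return 100 * good / total


if __name__ == "__main__":
    for task, (tolerable, counts) in MATRIX.items():
        print(f"{task}: {fulfillment_rate(tolerable, counts):.1f} %")
    print(f"Overall: {overall_rate(MATRIX):.1f} %")
```

For simple tasks, for instance, 22 of 73 measurements fall within the tolerable range, which gives 30.1%. For common tasks, the 3 measurements that were faster than expected count as fulfilled as well, so the rate is (3 + 13) / 34, or 47.1%.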
One of the PeP project's major tasks was, of course, to learn about and gain an understanding of responsiveness issues from a user's perspective. As shown, this was essential for developing an evaluation method, and it was also the basis for consulting other teams at SAP. But the PeP team also had to gain a basic understanding of the technical constraints underlying application responsiveness issues. For this purpose, the team attended the internal SAP Performance Focus Days, for example. In addition, cooperating closely with the technical performance teams at SAP was mandatory for the PeP team; we already mentioned that the technical teams provided the data for most of the PeP evaluations.
After the PeP team had developed a user-centered evaluation approach and performed a number of evaluations based on data provided by the technical teams – about 10 evaluation reports were delivered by the PeP team in 2008 – the team was also able to roll out information within SAP: PeP members gave a number of presentations to other teams that were interested in the topic, prepared a presentation for an internal SAP Developers conference, and took part in numerous discussions and several work groups, consulting the teams from a user-centered perspective. Issues that arose during these discussions were, for example: When should feedback be given and what form should it take? Should pages load incrementally or completely? Further topics included application startup time, speed of autocomplete, and the influence of server roundtrips and WANs. It turned out that the PeP team's time ranges provided a good heuristic for answering such questions, thus extending their usefulness beyond their sole application to the PeP evaluation method.
At the beginning of this article, we mentioned that the PeP team also had the goal of making information available outside of SAP. This was accomplished by publishing articles on the SAP Design Guild Website. These are additionally compiled in the Human Performance at the Computer highlight topic for easier access. They contain general information and are largely independent of the PeP project. However, Ulrich Kreichgauer and Gerd Waloszek presented the PeP team's user-centered evaluation method at the INTERACT 2009 conference in Uppsala, Sweden. This method will also form part of a keynote that Dan Rosenberg will give at the 20th FQS-Forschungstagung (research congress of the German Quality Research Community) in October 2009 in Frankfurt am Main, Germany. Last but not least, this article delivers some details of the PeP team's work to the public.
Because it was initiated as a project, the PeP project has a limited time scope. Many questions were addressed and answered during the project time span, while others need further research and clarification and may be beyond the current project scope. First of all, the assumptions underlying the PeP evaluations need to be validated further. UI events are currently assigned to time ranges on a heuristic basis and call for more thorough investigation. In addition, the transition points between the time ranges rely on data from the literature and on heuristic assumptions. Systematic experiments involving users who rate the timeliness of selected UI events could help to define the points more reliably.
The PeP team's research could also be the starting point for addressing responsiveness issues through the UI design itself: (1) Performance-oriented guidelines, that is, high-level rules on top of UI guidelines for specific applications, could make UI designers aware of human performance issues and provide guidance (see also Have You Ever Heard of Performance-Oriented (UI) Design? and Human Performance at the Computer – Part 4: On the Way to Performance-Oriented UI Guidelines); (2) Measuring the time costs of UI controls and suggesting alternative designs would make it possible to reduce screen rendering times at the design stage.