By Gerd Waloszek, SAP AG, SAP User Experience – Updated: November 5, 2009
Imagine that you enter a booth at a trade show and are welcomed by a synthetic character on a computer screen and offered assistance. The character does not simply talk to you in a one-way fashion but has a real conversation with you. It directs you to indicate your interests in a book placed in front of the screen and shows videos on the requested topics. Finally, it asks you to hand over your business card so that you can receive further information. When you leave the booth, accompanied by a farewell greeting, you may wonder what on earth you have just encountered. Well, you encountered a glimpse into the future of user interfaces, a future called "Narrative User Interfaces." (This scenario is based on the IZA prototype from Fraunhofer IGD, see below.)
Narrative user interfaces are based on the storytelling paradigm and set out to revolutionize the way people interact with computers. They promise ultimately to make computers accessible to everyone. Today's graphical user interfaces, although they opened up computing to the masses, have reached their limits. Many people have problems using them and exploiting their full capabilities. As software becomes ever more complex and powerful, the situation is getting even worse; this is why Prof. Thome of the University of Würzburg, Germany, speaks of "complexware" instead of software. The fact that the functionality of software applications is largely underutilized is not a minor issue; it has an enormous economic impact. For example, business processes that do not run as efficiently as they could waste a lot of money. Narrative interfaces, the research field of Prof. Encarnação from Fraunhofer IGD, combined with Prof. Thome's demand for integrating explanation and guidance capabilities into computers, may help to put the immense power of future computers to real use: computers that everyone can use and understand without recourse to manuals, and that explain what they do, why they do it, and what they can do for us.
This article provides an introduction to narrative user interfaces. It starts with the concepts behind these interfaces and then gives an overview of existing approaches to the new paradigm. The article concludes by listing available resources, such as a glossary and references to books, journals, conferences, and people working in the field. These should allow readers to delve deeper into the promising new field of narrative interfaces.
See also the editorial From GUIs to Narrative Interfaces – From Point-and-Click to Computers Telling Stories and the stories article Future Scope - Tangible Information.
See also the Narrative User Interfaces Glossary and Links and References for Narrative User Interfaces (both taken from this article).
This introductory section describes the ideas behind narrative user interfaces and related concepts. For definitions of these and further concepts, see the glossary.
We tell and surround ourselves with stories from the early days of our childhood and all through our adult lives. By telling stories, we make sense of the world: We order its events and find meaning in them by assimilating them to more or less familiar narratives (Mateas & Sengers, 1998). The psychologist Bruner (1991), for example, argues that narrative is fundamental to human understanding of intentional behavior.
Research under the storytelling paradigm comprises support for human storytelling, especially by children; databases of stories that describe how people handled commonly occurring problem situations; the design of story-understanding systems; and autonomous "intelligent" agents that engage in conversations with users. The last of these is discussed below in the context of narrative user interfaces.
Blair and Meyer (1997) call the human ability to organize experiences into narrative form, that is, the ability to tell stories that follow a certain dramaturgy, logic, or pattern, "narrative intelligence." This term is also used as the name of a new interdisciplinary research field with contributions from artificial intelligence (AI), computer science, human-computer interaction (HCI), and many humanistic fields of study. As is the case for any new research direction, the field does not yet have a clear definition. Of the many threads being pursued, the design of narrative intelligent agents and of narrative user interfaces are among the most promising. Both approaches are based on the argument that systems can be made more understandable if they communicate in ways that are easy to assimilate to narrative.
Narrative user interfaces attempt to mimic the communication behavior of humans. Computers talk to people, listen to them, and even take the situational context into account. While the conversational aspect of narrative interfaces is based on the storytelling paradigm, these interfaces typically do not stop here. One additional ingredient of these interfaces is goal-driven behavior: The computer drives the conversation with users in the pursuit of goals, which are given by the software application. Thus, the computer behaves proactively, following a strategy, and no longer waits passively for user commands. This approach is based on the argument that humans use narrative for understanding intentional behavior. It is typically implemented using intelligent agents that frame their behavior into a narrative structure, which, for example, enables them to make behavioral choices.
To achieve more human-like communication, narrative interfaces may utilize humanoid representations, such as avatars, which have realistic facial expressions and lip-synchronized speech, and can express emotions. That is, narrative interfaces may also include ingredients taken from social and affective computing, which are presented below. They may also include "natural" physical devices for people to interact with the system, as known from the tangible media research field. All in all, users interact with the computer in a "mixed reality" of virtual and physical objects, in which the computer's behavior comes close to human behavior and is therefore easy for humans to understand.
While the idea and reasoning behind narrative interfaces are simple and easy to grasp, these interfaces are hard to realize. Their implementation draws heavily on techniques from artificial intelligence, such as speech recognition and production, detection and creation of emotions, planning, learning, autonomous agents, and pattern recognition. They also require state-of-the-art graphical rendering techniques and sensors that mimic the abilities of human senses.
Authors of conversational applications have to take into account how humans tell stories and conduct conversations. The dialog between human users and the computer has to follow a certain dramaturgy that ultimately leads to the fulfillment of the computer's goal. Therefore, techniques from the film industry, theater, and literature have to be integrated into the design of narrative user interfaces.
The terms "social computing" and "affective computing" refer to certain aspects of narrative interfaces but are also research directions in their own right.
As Reeves and Nass (1996) demonstrated in their famous book The Media Equation, people tend to interact with computers as if they were human beings. Social Computing aims at supporting this human tendency for human-like communication. There are two directions in this research field: narrative user interfaces based on storytelling, and synthetic characters, which can exhibit emotions that depend on the current situation. In real-world systems both directions are often combined, as the examples below and our introductory scenario demonstrate.
The term "affective computing" was probably coined by Rosalind Picard from the MIT Media Lab, who wrote an influential book with this title in 1997. Affective computing is concerned with the means to recognize and synthesize "emotional intelligence." Whereas emotional intelligence includes both bodily and mental events, affective computing presently focuses mainly on the apparent characteristics of verbal and nonverbal communication in relatively simple settings (Duric et al., 2002). As an example, research areas in affective computing at the MIT Media Lab comprise, among others:
Thus, there is considerable overlap between research in social and affective computing. Both directions, like narrative interfaces, require the application of diverse disciplines ranging from agent systems, models of emotion, graphics, and interface design to sociology and psychology, and even art, drama, and animation techniques.
For an overview of the affective computing field, see the Affective Computing Portal, which links to many resources. You can also find information on emotions and computers in CHI 2002 — Changing the World, Changing Ourselves on the SAP Design Guild Website.
Below are selected examples of prototype systems that study the capabilities of narrative user interfaces. This selection is by no means complete but should suffice for a first impression of the work going on in the field. There is also theoretically oriented work, such as the research by Chrystopher Nehaniv.
The IZA project, led by Prof. Encarnação at Fraunhofer IGD, Germany, explores narrative user interfaces with the prototype of a digital trade show booth. This includes affective behavior, goal-directed interaction, and a multimodal user interface in a mixed-reality environment. The introductory scenario shows how the prototype system works: The plot for the trade show booth provides the structure for a conversation between the booth's intelligent agent and a visitor approaching it. The story includes a greeting and introduction phase, which ideally leads into a conversation. In the course of the conversation, the system offers information and tries to collect the visitor's business card; this is the ultimate goal the system pursues. The story closes with a farewell scenario.
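To make the idea of a plot-driven, goal-directed conversation more concrete, the following minimal sketch models the booth story as a simple state machine in Python. It is an illustration only, not the IZA implementation: the phase names follow the plot described above, but the class BoothStory, the event strings, the agent's utterances, and the transition rules are all hypothetical assumptions.

```python
from enum import Enum, auto


class Phase(Enum):
    GREETING = auto()
    INTRODUCTION = auto()
    CONVERSATION = auto()
    FAREWELL = auto()
    DONE = auto()


class BoothStory:
    """Hypothetical plot controller for a trade show booth agent.

    Phases follow the plot in the text; events, utterances, and
    transition rules are illustrative assumptions, not the IZA design.
    """

    def __init__(self):
        self.phase = Phase.GREETING
        self.card_collected = False  # the agent's ultimate goal
        self.transcript = []

    def say(self, text):
        self.transcript.append(text)

    def handle(self, event):
        if self.phase is Phase.GREETING:
            if event == "visitor_approaches":
                self.say("Welcome to our booth! May I help you?")
                self.phase = Phase.INTRODUCTION
        elif self.phase is Phase.INTRODUCTION:
            if event == "visitor_responds":
                self.say("Please point to a topic in the book in front of you.")
                self.phase = Phase.CONVERSATION
            elif event == "visitor_leaves":
                self.phase = Phase.FAREWELL
        elif self.phase is Phase.CONVERSATION:
            if event.startswith("topic:"):
                topic = event.split(":", 1)[1]
                self.say(f"Here is a video about {topic}.")
                # Proactive behavior: pursue the goal instead of waiting for commands.
                if not self.card_collected:
                    self.say("May I have your business card for further information?")
            elif event == "card_handed_over":
                self.card_collected = True
                self.say("Thank you!")
            elif event == "visitor_leaves":
                self.phase = Phase.FAREWELL
        # Any path that reaches the farewell phase closes the story.
        if self.phase is Phase.FAREWELL:
            self.say("Goodbye, and enjoy the show!")
            self.phase = Phase.DONE
        return self.phase


story = BoothStory()
for event in ["visitor_approaches", "visitor_responds", "topic:logistics",
              "card_handed_over", "visitor_leaves"]:
    story.handle(event)
print(story.card_collected, story.phase)
```

The point of the design is that the plot, not the user, determines what happens next: the agent moves the visitor through the phases and asks for the business card on its own initiative. A real system would replace the string events with multimodal input and would need far richer dramaturgy.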
The research team's vision for humane applications is that the computer engages in a natural conversation that takes context-relevant information into account, and that interaction is goal-oriented and makes use of human senses beyond vision and hearing. The natural interaction style is supported by integrating physical objects and the environment into the user interface.
This vision transcends the current desktop metaphor by utilizing multimodal interaction and narrative environments. According to the research team, it attempts to combine technology and the best qualities of people to create the perfect environment for success ("clicks and mortar").
In 2003 the research group started the new project "Virtual Human," which builds on the experiences gained in the IZA project.
For more information see www.inigraphics.net/publications/topics/2002/issue2/Topics%202_2002.pdf (PDF document) and the "Virtual Human" Website: www.virtual-human.org/start_en.html (site under construction).
Figure 1: Digital trade show booth (left; from Fraunhofer IGD) and demo of it at SAP (right)
Helmut Prendinger and Mitsuru Ishizuka developed SCREAM (SCRipting Emotion-based Agent Minds), a system that allows users to script a character's affect-related capabilities. The system is intended as a plug-in to content- and task-specific agent systems, such as interactive tutoring or entertainment systems, that provide possible verbal utterances for a character. It is built on top of the Microsoft Agent player technology and has been tested in three different scenarios, each of which focuses on a particular aspect of the agent architecture:
Several groups at the MIT Media Lab do research in the field of narrative user interfaces and related areas. This article focuses on the work of the Gesture and Narrative Language Group (GNL), led by Justine Cassell, which explores several systems based on narrative interfaces. The work of Rosalind Picard and her Affective Computing group is not covered here.
MACK is a life-sized, on-screen, animated robot that explains the Lab's people, projects, and groups to Media Lab visitors and gives directions on how to find them within the Lab. The agent shares a model of the Lab with the visitor, around which the two participants can center their discussion. MACK is an Embodied Conversational Agent (ECA), which uses a combination of speech, gesture, and references to an ordinary paper map that users place on a table between themselves and MACK.
For more information see www.media.mit.edu/gnl/projects/kiosk/.
Figure 2: MACK is a life-sized, on-screen, animated robot
Many networked virtual communities, such as Multi-User Domains (MUDs) or chat rooms, where people meet in a fictitious place, offer graphical representations of the places and the people who inhabit them. Visitors who come to the environment choose a character, called an avatar, that represents them in this world. They can then explore the environment by moving the avatar around. The avatars of other users currently logged on to the system can also be seen and approached to initiate a conversation. Even though these systems are graphically rich, communication is still mostly based on text messages or digitized speech streams sent between users. The Gesture and Narrative Language Group is looking at ways to make avatar-mediated communication more lifelike and natural through appropriate and meaningful animation of the avatar's body and face.
For more information see avatars.www.media.mit.edu/avatars/.
Figure 3: Three different avatars created by the Media Lab's GNL group (from MIT Media Lab)
BodyChat is an early prototype of a graphical chat system that allows users to communicate using text while their avatars automatically animate attention, salutations, turn taking, back-channel feedback, and facial expression, as well as simple body functions such as the blinking of the eyes.
For more information see www.media.mit.edu/groups/gn/projects/bodychat/
Situated Chat builds on the experiences gained with BodyChat and automatically animates the visual representations (avatars) of the participants of an online chat. BodyChat concentrated on the use of a social model to animate appropriate social behavior, such as greetings and farewells. Situated Chat also builds a model of the discourse context, taking into account the shared visual environment, and then uses it to generate appropriate nonverbal behavior, such as referring gestures.
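The idea of using a model of the shared visual environment to generate referring gestures can be illustrated with a deliberately simple sketch. This is not how Situated Chat works internally; the SceneObject type, the function referring_gestures, the "point_at" gesture label, and the word-matching rule are all hypothetical assumptions chosen for brevity.

```python
import re
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str        # single-word name, e.g. "fountain" (assumption)
    position: tuple  # (x, y) location in the shared virtual room


def referring_gestures(message, scene, visible_to_both):
    """Hypothetical rule: point at every object that the chat message names
    and that both participants can currently see."""
    # Match whole words so that "door" does not fire on "doorway".
    words = set(re.findall(r"[a-z']+", message.lower()))
    gestures = []
    for obj in scene:
        if obj.name in visible_to_both and obj.name.lower() in words:
            gestures.append(("point_at", obj.name, obj.position))
    return gestures


scene = [SceneObject("fountain", (3, 4)), SceneObject("door", (0, 1))]
print(referring_gestures("Meet me by the fountain", scene, {"fountain", "door"}))
```

The check against objects visible to both participants is what makes the gesture "situated": a pointing gesture only makes sense if the other person can see what the avatar points at.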
For more information, see www.media.mit.edu/groups/gn/projects/situchat/
The Behavior Expression Animation Toolkit (BEAT) allows animators to type text that they wish to be spoken by an animated human figure, and to obtain as output appropriate and synchronized nonverbal behaviors and synthesized speech. The nonverbal behaviors are assigned on the basis of actual linguistic and contextual analysis of the typed text, relying on rules derived from extensive research into human conversational behavior.
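The general principle of assigning nonverbal behaviors to typed text by rules can be sketched as follows. This toy version is not BEAT's actual rule set or API: the function annotate, the stopword list, and the behavior labels are hypothetical, and the only "linguistic analysis" is a check of whether a word is new to the conversation. The rules loosely mirror common observations about conversation: speakers tend to look away when taking the turn, emphasize new information, and look at the listener when handing the turn back.

```python
import re

# Hypothetical stopword list; a real system would use proper linguistic analysis.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on",
             "it", "this", "that"}


def annotate(utterance, discourse_history):
    """Return (word, behaviors) pairs for one utterance.

    discourse_history is a set of lowercase words mentioned earlier in the
    conversation; it is updated in place so later utterances treat them as given.
    """
    words = re.findall(r"[A-Za-z']+", utterance)
    annotated = []
    for i, word in enumerate(words):
        key = word.lower()
        behaviors = []
        if i == 0:
            behaviors.append("gaze_away")  # speaker takes the turn
        if key not in STOPWORDS and key not in discourse_history:
            # New information: accompany it with a beat gesture and raised eyebrows.
            behaviors += ["beat_gesture", "eyebrows_raise"]
        if i == len(words) - 1:
            behaviors.append("gaze_at_listener")  # offer the turn back
        annotated.append((word, behaviors))
        discourse_history.add(key)
    return annotated


history = set()
for word, behaviors in annotate("The lab builds robots", history):
    print(word, behaviors)
```

Because the discourse history persists across calls, a word like "robots" triggers emphasis only the first time it is mentioned; in a real toolkit, the annotated words would then be synchronized with synthesized speech and animation.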
For more information, see gn.www.media.mit.edu/groups/gn/projects/beat/
Bringing in knowledge from human discourse analysis and social cognition, the research team is developing autonomous agents that are capable of having a real-time face-to-face conversation with a human. These agents are human in form and communicate using both verbal and nonverbal modalities.
Figure 4: Rea greeting screen shot (from MIT Media Lab)
For more information, see gn.www.media.mit.edu/groups/gn/projects/humanoid/
SAGE stands for Storytelling Agent Generation Environment. It is an environment that supports children in storytelling activities. Children can create artificial storytellers as projections of their fears, feelings, interests, and role models. The storytellers thus allow them to explore their own inner life as well as to present themselves to others. SAGE was tested, for example, with children suffering from cardiac illnesses in a Boston hospital. This project belongs to the class of systems that support human storytelling, as opposed to autonomous agents that themselves tell stories and conduct conversations.
Figure 5: The SAGE storytellers are embedded in a soft interface: a programmable interactive stuffed rabbit (from MIT Media Lab)
For more information, see xenia.media.mit.edu/~marinau/Sage/