Designing Agents as if People Mattered
One of Apple Computer's buildings used to have an advanced energy management system. Among its many features was the ability to make sure lights were not left on when no one was around. It did this by automatically turning the lights off after a certain interval during times when people weren't expected to be around. I overheard the following dialog between a father and his six-year-old daughter one Saturday evening at Apple. The energy management system had just noticed that the lights were on during 'off hours,' and so it turned them off.
Figure 1. The first phase of DowQuest interaction: the user types in a 'natural language' query, and the system searches the database using the words in the query that are not 'noise words,' then returns a list of the titles of the 'most relevant' articles.
In phase 2 of the process (figure 2), the user tells the system which articles are good examples of what is wanted. The user may specify an entire article, or may open an article and specify particular paragraphs within it. The system takes the full text of the selections, drops the high-frequency noise words, and uses a limited number of the most informative remaining words to build a new query. It then returns a new list of the 16 'most relevant' items. This second, relevance feedback phase may be repeated as many times as desired.
Figure 2. The second phase of DowQuest interaction: the user instructs the database to find more articles like 2, 3, and 4, and the system returns a new set of relevant articles. (Note that the first three 'most relevant' articles are the ones that were fed back, since an article is most 'like' itself; the fourth article is new.)
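The two phases can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not DowQuest's actual algorithm, which the text describes only as 'statistical': the noise-word list, the tf-idf weighting, the cosine-similarity ranking, and the names `phase1`, `phase2`, and `max_terms` are all hypothetical choices. Only the 16-item cutoff comes from the description above.

```python
import math
import re
from collections import Counter

# Hypothetical noise-word list; DowQuest's real list is not given in the text.
NOISE_WORDS = {"a", "about", "an", "and", "are", "for", "in", "is", "me",
               "of", "on", "tell", "the", "to", "what", "with"}


def terms(text):
    """Lowercase the text, split it into words, and drop noise words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in NOISE_WORDS]


def idf_table(corpus):
    """Smoothed inverse document frequency: rarer words count as more informative."""
    n = len(corpus)
    df = Counter(w for doc in corpus.values() for w in set(terms(doc)))
    return {w: math.log((n + 1) / (df[w] + 1)) + 1.0 for w in df}


def vector(words, idf):
    """tf-idf weights for a bag of words; words unseen in the corpus get weight 0."""
    return {w: c * idf.get(w, 0.0) for w, c in Counter(words).items()}


def cosine(u, v):
    """Cosine similarity between two sparse weight dictionaries."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, corpus, idf, k=16):
    """Return up to k article ids, 'most relevant' first."""
    scores = {aid: cosine(query, vector(terms(doc), idf)) for aid, doc in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def phase1(question, corpus, idf, k=16):
    """Phase 1: no attempt to understand the question, just its non-noise words."""
    return rank(vector(terms(question), idf), corpus, idf, k)


def phase2(selected_ids, corpus, idf, max_terms=20, k=16):
    """Phase 2 (relevance feedback): pool the full text of the selected articles
    and keep only the max_terms most informative words as the new query."""
    pooled = vector([w for aid in selected_ids for w in terms(corpus[aid])], idf)
    query = dict(sorted(pooled.items(), key=lambda kv: kv[1], reverse=True)[:max_terms])
    return rank(query, corpus, idf, k)


corpus = {  # toy stand-in for the news database
    "a1": "Chip makers report record profits as semiconductor demand grows",
    "a2": "Semiconductor shortage hits car makers",
    "a3": "Local bakery wins bread award",
    "a4": "Semiconductor prices rise as chip demand grows",
}
idf = idf_table(corpus)
print(phase1("Tell me about the semiconductor shortage", corpus, idf, k=3))
print(phase2(["a2"], corpus, idf, k=3))
```

In the second call, `a2` comes back at the top of its own feedback list, because any weight vector is maximally similar to itself. That ordering is unsurprising to the system's designers, but it is exactly what some users misread as "nothing new was found."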
New users generally had high expectations of DowQuest: it seemed quite intelligent. However, their understanding of what the system was doing was quite different from what it was actually doing. The system appeared to understand plain English, but in reality it made no effort to understand the question that was typed in; it just used a statistical algorithm. Similarly, the system appeared to be able to 'find more items like this one,' but again it had no understanding of what an item was like; it just used statistics. These differences were important because they led to expectations that could not be met.
Users' expectations were usually dashed when, in response to the first phase of the first query, DowQuest returned a list containing many obviously irrelevant articles. When this happened, some users concluded that the system was 'no good' and never tried it again. While reactions like this may seem hasty and extreme, they are not uncharacteristic of busy people who do not love technology for its own sake. Furthermore, such a reaction is perfectly appropriate in the case of a conventional application: a spreadsheet that adds incorrectly should be rejected. Users who had expected DowQuest to be intelligent could plainly see that it was not. They did not see it as a semi-intelligent system that they had control over, and that would do better as they worked with it. This was quite ironic, as the second stage of the process, relevance feedback, was the most powerful and helpful aspect of the system.
Only a few users gave up after the first phase. However, their efforts to understand what was going on, and to predict what would happen, continued to influence their behavior. In the second phase of a DowQuest query, when users asked the system to retrieve more articles 'like that one,' the resulting list of articles was ordered by 'relevance.' While no computer scientist would be surprised to find that an article is most relevant to itself, some ordinary users lacked this insight: when they looked at the new list of articles and discovered that the first, most relevant article was the one they had used as an example, they assumed that there was nothing else relevant available and did not inspect the rest of the list. As they saw it, a system with any intelligence at all would not show them articles they had already seen if it had anything new.
DowQuest is a very compelling system. It holds out the promise of freeing users from having to grapple with arcane query languages. But, as is usually the case with adaptive functionality, it doesn't work perfectly. Here we've seen how users have tried to understand how the system works (it's smart!), and how their expectations have shaped their use of the system.
How can designers address these problems? One approach is to provide users with a more accurate model of what is going on. Malone, Grant, and Lai (this volume) advocate this sort of approach with their dictum of 'glass boxes, not black boxes,' suggesting that agents' rules be made visible to and modifiable by users. This is certainly a valid approach, but it is unlikely to work in every case. After all, the statistical algorithm which computes the 'relevance' of stories is sufficiently complex that describing it would probably be futile, if not counterproductive, and allowing users to tinker with its parameters would probably lead to disaster. In the case of DowQuest, perhaps the aim should not be to give users an accurate picture of what is going on. One approach might be to encourage users to accept results that seem to be of low quality, so that they will use the system long enough to benefit from its sophistication. Another might be to construct a 'fictional' model of what the system is doing, one that sets up the right expectations without exposing users to the full complexity of the system's behavior. See Erickson and Salomon (1991) and Erickson (1995) for a discussion of other issues in this task domain, and a glimpse of one type of design solution.
Understanding how to portray a system which exhibits partially intelligent behavior is a general problem. Few will dispute that, for the foreseeable future, intelligent systems will fall short of the breadth and flexibility which characterize human-level intelligence. But how can the semi-intelligence of computer systems be portrayed? People have little if any experience with systems which are extremely (or even just somewhat) intelligent in one narrow domain, and utterly stupid in another, so appropriate metaphors or analogies are not easy to find. Excellent performance in one domain or instance is likely to lead to expectations of similar performance everywhere. How can these expectations be controlled?
The Agent Metaphor: Reactions and Expectations
In this section we turn to the agent metaphor and the expectations it raises. Why should adaptive functionality be portrayed as an agent? What is gained by having a character appear on the screen, whether it be a bow-tied human visage, an animated animal character, or just a provocatively named dialog box? Is it somehow easier or more natural to have a back-and-forth dialog with an agent than to fill in a form that elicits the same information? Most discussions that advance the cause of agents focus on the adaptive functionality that they promise; however, as we've already argued, adaptive functionality need not be embodied in the agent metaphor. So let's turn to the question of how useful agents are as a way of portraying functionality. When designers decide to invoke the agent metaphor, what benefits and costs does it bring with it?
First, it must be acknowledged that, in spite of the popularity of the agent metaphor, there is remarkably little research on how people react to agents. The vast bulk of work has focused either on the development of adaptive functionality itself, or on making agents appear more lifelike: how to animate them, how to make them better conversants, and so on. In this section, we'll look at three strands of research that shed some light on the experience of interacting with agents.
The Guides project involved the design of an interface to a CD-ROM-based encyclopedia (Salomon, Oren, and Kreitman 1989; Oren, et al. 1990). The intent of the design was to encourage students to explore the contents of the encyclopedia. The designers wanted to create a halfway point between directed searching and random browsing by providing a set of travel guides, each of which was biased towards a particular type of information.
The interface used stereotypic characters such as a settler woman, an Indian, and an inventor (the CD-ROM subset of the encyclopedia covered early American history). The guides were represented by icons that depicted each guide's role; no attempt was made to reify the guide, either by giving it a realistic-looking picture or by providing information such as a name or personal history. As users browsed through stories in the encyclopedia, each guide would create a list of articles that were related to the article being looked at and were in line with its interests. When clicked on, the guide would display its 'suggestions.' Thus, if the user were reading an article about the gold rush, the Indian guide might suggest articles about treaty violations, whereas the inventor guide might suggest an article about machines for extracting gold.
The system was implemented and then tested with high school students, who had a variety of reactions. They tended to assume that the guides, which were presented as stock characters, embodied particular individuals. For example, since many of the articles in the encyclopedia were biographies, users would assume that the first biography suggested by a guide was its own. If the inventor guide first suggested an article on Samuel Morse, users often assumed that Morse was now their guide. Students also wondered if they were seeing the article from the guide's point of view (they weren't). And they sometimes assumed that guides had specific reasons for suggesting each story and wanted to know what they were (in line with users' general wish to understand what adaptive functionality is actually doing).
In some cases the students also became emotionally engaged with the guides. Oren, et al. (1990) report some interesting examples of this: "the preacher guide brought one student to the Illinois history article and she could not figure out why. The student actually got angry and did not want to continue with the guide. She felt the guide had betrayed her." While anecdotes of users getting angry with their machines are common, stories about users getting angry with one interface component are much less so. In another case, a bug in the software caused the guide to disappear. Oren, et al., write: "One student interpreted this as 'the guide got mad, he disappeared.' He wanted to know 'if I go back and take his next choice, will he come back and stay with me?'" Here the tables are turned: the user infers that the guide is angry. While no controlled experiment is available, it is hard to believe that the user would have made such an inference if the suggested articles had been presented in a floating window that had vanished.
While this evidence is anecdotal, it is nevertheless interesting and relevant. Here we again see users engaged in the effort to understand, control, and predict the consequences of adaptive functionality. What is particularly interesting is how these efforts are shaped by the agent metaphor. The students are trying to understand the guides by particularizing them, and thinking about their points of view. One student wants to control his guide (the one that 'got mad and disappeared') by being more agreeable, suspecting that the guide will come back if his recommendations are followed. All of this happens in spite of the rudimentary level of the guides' portrayals.
Computers as Social Actors
Nass and his colleagues at Stanford have carried out an extensive research program on the tendency of people to use their knowledge of people and social rules to make judgments about computers. Two aspects of their results are interesting in relation to the agent metaphor. First, they show that very small cues can trigger people's readiness to apply social rules to computers. For example, simply having a computer use a human voice is sufficient to cause people to apply social rules to the computer (Nass and Steuer 1993). This suggests that the agent metaphor may be invoked very easily: human visages with animated facial expressions, and so forth, are not necessary. This is in accord with the finding from the Guides study, in which stereotypic pictures and role labels triggered attributions of individual points of view and emotional behavior. The second aspect of interest is the finding that people do, indeed, apply social rules when making judgments about machines.
Let's look at an example. One social rule is that if person B praises person A, a third person will perceive the praise as more meaningful and valid than if person A praises himself. Nass, Steuer, and Tauber (1994) designed an experiment to show that this social rule holds when A and B are replaced with computers. The experiment went something like this (it has been considerably simplified for expository purposes):
in part 1, a person went through a computer-based tutorial on a topic;
in part 2, the person was given a computer-based test on the material covered;
in part 3, the computer critiqued the effectiveness of the tutorial in part 1.
The experimental manipulation was that in one condition, parts 1, 2, and 3 were all done on computer A (i.e. computer A praised itself), whereas in the second condition computer A was used for giving the tutorial and computer B was used to give the test and critique the tutorial (i.e., B praised A). Afterwards, the human participants in the study were asked to critique the tutorial themselves. The result was that their ratings were much more favorable when computer B had praised A's tutorial, than when computer A had praised itself. That is, they were more influenced by B's praise of A than by A's praise of itself.
The finding that people are willing to apply their social heuristics to computers is surprising, particularly since the cues that trigger the application of the social rules are so minimal. In the above experiment, the only cue was voice. There was no attempt to portray the tutorial as an agent or personal learning assistant. No animation, no picture, no verbal invocation of a teacher role, just a voice that read out a fact each time the user clicked a button. This finding appears to be quite general. Nass and colleagues are engaged in showing that a wide variety of social rules are applied to computers given the presence of certain cues: to date, these range from rules about politeness, to gender biases, to attributions about expertise (Nass, Steuer, and Tauber 1994; Nass and Steuer 1993).
While this research is important and interesting, there is a tendency to take it a bit too far. The finding that people apply social rules to interpret the behavior of computers is sometimes generalized to the claim that individuals' interactions with computers are fundamentally social (e.g., Nass, Steuer, and Tauber 1994; Ball, et al., this volume). I think that this is incorrect. It is one thing for people to apply social heuristics to machines; it is quite another to assume that this amounts to social interaction, or to suggest that the ability to support social interaction between humans and machines is now within reach. Interaction is a two-way street: just as people act on and respond to computers, so computers act on and respond to people. Interaction is a partnership. But social interaction relies on deep knowledge, complex chains of inferences, and subtle patterns of actions and responses on the part of all participants (see, for example, Goffman 1967). Computers lack the knowledge, the inferential ability, and the subtlety of perception and response necessary to be even marginally competent social partners. Does this mean that this research should be disregarded? Certainly not. If anything, the willingness of people to apply social rules to entities that can't hold up their end of an anticipated social interaction raises more problems for designers.
Thus far we have looked at cases where rather minimal portrayals of agents have evoked surprising reactions. For an interesting contrast, let's move to the other end of the spectrum and examine work on extremely realistic portrayals of agents.
One of the more famous examples of a highly realistic agent is "Phil," an agent played by a human actor in the Knowledge Navigator videotape (Apple Computer 1987). During the video, Phil interacts via natural language, and uses vocal inflection, direction of gaze, and facial expressions to support the interaction. While, as noted in the previous section, the intelligence and subtlety necessary to support such interaction is far beyond the capacities of today's software and hardware, it is possible to create portrayals of agents which synchronize lip movements with their speech and make limited use of gaze and facial expression (e.g., Walker, Sproull, and Subramani 1994; Takeuchi and Taketo 1995).
Walker, Sproull, and Subramani (1994) report on a controlled study of human responses to two versions of a synthesized talking face that was used to administer a questionnaire. One group simply filled in a textual questionnaire presented on the computer. Two other groups listened while synthesized talking faces (a different one for each group) read a question, and then typed their answer on the computer. Compared to people who simply filled out the questionnaire, those who answered the questions delivered by the synthesized faces spent more time, wrote more comments, and made fewer errors. People who interacted with the faces seemed more engaged by the experience.
Of particular interest was the difference between people's responses to the two synthesized faces. The faces differed only in their expression: one face was stern, the other was more neutral. Although the difference in expression was extremely subtle (the only difference was that the inner portions of the eyebrows were pulled inward and downward), it did make a difference. People who answered questions delivered by the stern face spent more time, wrote more comments, and made fewer errors. Interestingly enough, they also liked the experience and the face less.
Is the Agent Metaphor Worth the Trouble?
So far it looks like the agent metaphor is more trouble than it's worth. Designers who use the agent metaphor have to worry about new issues like emotion, point of view, politeness and other social rules, and, if they put a realistic face on the screen, whether people like the face's expression! Perhaps the agent metaphor should be avoided.
I think there are several reasons not to give up on agents. First, it is too soon to abandon the agent metaphor. The difficulties noted above are problems for designers, not necessarily for users, and they may very well be solvable. We simply don't know enough about how people react to agents; far more research is needed on how people experience them. Second, the research by Nass and his colleagues suggests that we may not have much of a choice. Very simple cues like voice may be sufficient to invoke the agent metaphor. Perhaps our only choice is to try to control expectations, to modulate the degree to which the agent metaphor is manifested. It's not clear. The third reason is that I believe the agent metaphor brings some clear advantages with it.
The Agent Conceptual Model
We've discussed the two meanings of agent--adaptive functionality and the agent metaphor--and some of the new problems they raise. In this section I want to look below the surface of the agent metaphor at its most fundamental characteristics. The agent metaphor brings with it a new conceptual model, one that is quite different from that which underlies today's graphic user interfaces. It is at this level that the agent metaphor has the most to offer. To begin with, let's look at the conceptual model that underlies today's interfaces, and then we'll consider the agent conceptual model in relation to it.
The Object-Action Conceptual Model
Today's graphic interfaces use a variety of different metaphors. The canonical example is the desktop metaphor, in which interface components such as folders, documents, and the trash can are laid out on the computer screen in a manner analogous to laying items out on a desktop. However, I don't think the details of the metaphors (folders, trash cans, etc.) are what is most important. Rather, it is the conceptual model that underlies them.
The underlying conceptual model of today's graphical user interfaces has to do with objects and actions. That is, graphic user interface elements are portrayed as objects on which particular actions may be done. The power of this object-action conceptual model is rooted in the fact that users know many things about objects. Some of the general knowledge most relevant to the objects found in graphic user interfaces includes the following:
objects are visible
objects are passive
objects have locations
objects may contain things
This knowledge translates into general expectations. An object has a particular appearance. Objects may be moved from one location to another. Because objects are passive, if users wish to move them, they must do so themselves. Objects that contain things may be opened, their contents inspected or changed, and then closed again.
Graphic user interfaces succeed in being easy to use because these expectations are met by almost every component of the interface. When users encounter an object, even if they have absolutely no idea what it is, they know that they can probably move it, open it, and close it. Furthermore, they know that clicking and dragging will move or stretch the object, and that double clicking will open it. They know that if they open it up and find text or graphics inside it, they will be able to edit the contents in familiar ways, and close it in the usual way. Because this general knowledge is applicable to anything users see in the interface, they will always be able to experiment with any new object they encounter, regardless of whether they recognize it.
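The payoff of the object-action model is that a small set of generic operations applies to every object. The following minimal Python sketch illustrates this; the class and function names (`InterfaceObject`, `UnknownThing`, `drag`, `double_click`) are hypothetical stand-ins for a real toolkit. The gestures never check what kind of object they are given, so the user can move and open something never seen before, and because the objects are passive, nothing changes unless one of these operations is invoked.

```python
from dataclasses import dataclass, field


@dataclass
class InterfaceObject:
    """A passive interface object: it changes only when the user acts on it."""
    name: str
    location: tuple = (0, 0)
    contents: list = field(default_factory=list)
    is_open: bool = False

    def move(self, x, y):
        self.location = (x, y)

    def open(self):
        self.is_open = True
        return list(self.contents)

    def close(self):
        self.is_open = False


class Folder(InterfaceObject):
    """A familiar kind of object."""


class UnknownThing(InterfaceObject):
    """Something the user has never seen before."""


# Generic gestures: they mean the same thing for every object.
def drag(obj, x, y):
    obj.move(x, y)


def double_click(obj):
    return obj.open()


for obj in [Folder("Projects", contents=["plan.doc"]),
            UnknownThing("mystery", contents=["notes.txt"])]:
    drag(obj, 40, 80)
    print(obj.name, obj.location, double_click(obj))
    obj.close()
```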
The Agent Conceptual Model
The agent metaphor is based on a conceptual model that is different from the object-action conceptual model. Rather than passive objects that are acted upon, the agent metaphor's basic components (agents, of course) have a degree of animacy and thus can respond to events. We'll call this the responsive agent conceptual model.
Consider some of the general knowledge people have about agents:
agents can notice things
agents can carry out actions
agents can know things
agents can go places
This knowledge translates into expectations for agents that differ from those for objects. Since agents can notice things and carry out actions, in contrast to inanimate objects where these attributes don't apply, the responsive agent conceptual model is well suited to representing aspects of a system which respond to events. The sorts of things an agent might notice, and the ways in which it might respond, are a function of its particular portrayal.
Another basic difference is that while objects can contain things, agents know things, and, as a corollary, can learn things. Thus, the agent conceptual model is suitable for representing systems which acquire, contain, and manage knowledge. What sort of things are agents expected to learn or know? That depends on the way in which the agent is portrayed. To paraphrase Laurel (1990), one might expect an agent portrayed as a dog to fetch the electronic newspaper, but one would not expect it to have a point of view on its contents. A 'stupid' agent might only know a few simple things that it is taught, and might be unable to offer explanations for its actions beyond citing its rules; a more intelligent agent might be able to learn by example, and construct rationales for its actions. Note that more intelligence or knowledge is not necessarily better: what is important is the match between the agent's abilities and the user's expectations. Ironically, the agent metaphor may be particularly useful not because agents can represent intelligence, but because agents can represent very low levels of intelligence.
Another difference between the object-action and agent conceptual models is that agents can go places. Users expect objects to stay where they're put; agents, on the other hand, are capable of moving about. Where can agents go? That depends both on the particular portrayal of the agent and on the spatial metaphor of the interface. At the very least, an agent is well suited for representing a process that can log onto a remote computer, retrieve information, and download it to its user's machine. Another consequence of an agent's ability to go places is that it need not be visible to be useful or active. The agent may be present 'off stage,' able to be summoned by the user when interaction is required, but able to carry out its instructions in the background.
Objects and Agents
These arguments about the differences between the object and agent conceptual models could be ignored. After all, interface components ignore many properties of the real things on which they are based. For example, 'folder objects' in graphic user interfaces can be deeply nested, one inside another inside another inside another, unlike their real-world counterparts. Yet in spite of this departure from our knowledge of real-world objects, it works well. Perhaps we could simply integrate adaptive functionality into what were formerly passive, unintelligent objects. It's easy to conceive of an interface folder that is 'smart,' or that can 'notice' particular kinds of documents and 'grab' them, or that can 'migrate' from a desktop machine to a portable when it is time to go home. However, the drawback of such a design tack is that it undermines the object-action conceptual model. If that tack were pursued, users wouldn't know as much about what they see on the screen. If they encounter a new object, what will it do? Perhaps it will just sit there, or perhaps it will wake up and do something. Perhaps double clicking will open it, or perhaps double clicking will start it running around, doing things.
I believe that there is much to be said for maintaining the separation between the object and agent conceptual models. It becomes a nice way of dividing up the computational world. That is, objects and agents can be used in the same interface, but they are clearly distinguished from one another. Objects stay what they are: nice, safe, predictable things that just sit there and hold things. Agents become the repositories for adaptive functionality. They can notice things, use rules to interpret them, and take actions based on their interpretations. Ideally, a few consistent methods can be defined to provide the users with the knowledge and control they need. That is, just as there are consistent ways of moving, opening, and closing objects, so can there be consistent ways of finding out what an agent will notice, what actions it will carry out, what it knows, and where it is. Such methods get us a good deal of the way to providing users with the understanding, control, and prediction they need when interacting with adaptive systems.
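The separation argued for above can be sketched as a second, equally uniform interface, this time for agents. In this minimal Python sketch, every name (`Agent`, `Rule`, `notices`, `actions`, `knows`, `where`, `handle`, `explain`) and the mail-filing example are hypothetical illustrations, not an existing system. Every agent answers the same four questions (what it notices, what it does, what it knows, where it is), and, like the 'stupid' agent described earlier, it can justify its behavior only by citing the rule that fired.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    """'When I notice <event_type> and <condition> holds, I <action>.'"""
    event_type: str
    condition: Callable[[dict], bool]
    action: str
    reason: str


@dataclass
class Agent:
    name: str
    rules: list = field(default_factory=list)
    knowledge: dict = field(default_factory=dict)
    location: str = "off stage"  # agents can work without being visible
    history: list = field(default_factory=list)

    # The same four questions can be asked of any agent.
    def notices(self):
        return sorted({r.event_type for r in self.rules})

    def actions(self):
        return sorted({r.action for r in self.rules})

    def knows(self):
        return dict(self.knowledge)

    def where(self):
        return self.location

    def handle(self, event_type, event):
        """Notice an event, interpret it with the rules, and record and return
        the actions called for by every matching rule."""
        taken = []
        for rule in self.rules:
            if rule.event_type == event_type and rule.condition(event):
                self.history.append((rule.action, rule.reason))
                taken.append(rule.action)
        return taken

    def explain(self):
        """A simple agent can justify itself only by citing its rules."""
        return [f"I did '{action}' because {reason}" for action, reason in self.history]


filer = Agent("mail filer", rules=[
    Rule("new_mail", lambda msg: "invoice" in msg["subject"].lower(),
         "file in Accounts", "the subject mentions an invoice"),
])
filer.handle("new_mail", {"subject": "Invoice #12"})
print(filer.explain())
```

Keeping these questions identical across agents plays the same role that move, open, and close play for objects: a user who meets an unfamiliar agent still knows how to find out what it will do.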
There is a risk of overemphasizing the importance of metaphors and conceptual models. Normally, people are not aware of the conceptual model, the metaphor, or even individual components of the interface. Rather, they are absorbed in their work, accomplishing their actions with the kind of unreflective flow that characterizes expert performance. It is only when there are problems (the lights go out, the search agent brings back worthless material, the encyclopedia guide vanishes) that we begin to reflect, analyze, and diagnose.
But this is why metaphors and conceptual models are particularly important for adaptive functionality. For the foreseeable future, it will fall short of perfection. After all, even humans make errors doing these sorts of tasks, and adaptive functionality is immeasurably distant from human competence. As a consequence, systems will adapt imperfectly, initiate actions when they ought not, and act in ways that seem far from intelligent.
In this chapter we've explored a number of problems that are important to consider when designing agents. First we noted that there are two distinct senses of agent: the metaphor that is presented to the user, and the adaptive functionality that underlies it. Each gives rise to particular problems. The agent metaphor brings a number of expectations that are new to user interface design. And adaptive functionality raises a number of other issues that are independent of how the functionality is portrayed.
The chief challenge in designing agents, or any other portrayal of adaptive systems, is to minimize the impact of errors and to enable people to step in and set things right as easily and naturally as possible. We've discussed two approaches to this. One is to make sure that adaptive systems are designed to enable users to understand what they're doing, and predict and control what they may do in the future. Here we've suggested that the agent conceptual model may provide a good starting point, providing general mechanisms for accessing and controlling agents. Second, since the agent metaphor can create a wide variety of expectations, we need to learn more about how portrayals of agents shape users' expectations and then use that knowledge to adjust (which usually means lower) people's expectations. Research which focuses on the portrayal of adaptive functionality, rather than on the functionality itself, is a crucial need if we wish to design agents that interact gracefully with their users.
Gitta Salomon contributed to the analysis of the DowQuest system. A number of the findings about the use of DowQuest are from an unpublished manuscript by Meier, et al. (1990), carried out as a project for a Cognitive Engineering class under the supervision of Don Norman, with Salomon and myself as outside advisors. The paper benefited from the comments of Stephanie Houde, Gitta Salomon, and three anonymous reviewers.
Apple Computer. 1987. The Knowledge Navigator. (Videotape)
Belew, R. K. 1989. Adaptive Information Retrieval: Using a Connectionist Representation to Retrieve and Learn about Documents. In Proceedings of SIGIR. Cambridge, MA: ACM Press, pp. 11-20.
Cypher, A. 1991. EAGER: Programming Repetitive Tasks by Example. Human Factors in Computing Systems: The Proceedings of CHI '91, pp. 33-39. New York: ACM Press.
Dow Jones and Company, Inc. 1989. Dow Jones News/Retrieval User's Guide.
Erickson, T. 1996. Feedback and Portrayal in Human Computer Interface Design. In Dialogue and Instruction, eds. R. J. Beun, M. Baker, and M. Reiner. Heidelberg: Springer-Verlag. In press.
Erickson, T., and Salomon, G. 1991. Designing a Desktop Information System: Observations and Issues. Human Factors in Computing Systems: The Proceedings of CHI '91. New York: ACM Press.
Goffman, E. 1967. Interaction Ritual. New York: Anchor Books.
Greenberg, S., and Witten, I. 1985. Adaptive Personalized Interfaces: A Question of Viability. Behavior and Information Technology, 4(1): 31-45.
Laurel, B. 1990. Interface Agents: Metaphors with Character. In The Art of Human-Computer Interface Design, ed. B. Laurel. Addison-Wesley, pp. 355-365.
Meier, E.; Minjarez, F.; Page, P.; Robertson, M.; and Roggenstroh, E. Personal communication, 1990.
Mitchell, T.; Caruana, R.; Freitag, D.; McDermott, J.; and Zabowski, D. 1995. Experience with a Learning Personal Assistant. Communications of the ACM, 37(7): 1-91.
Nass, C., and Steuer, J. 1993. Anthropomorphism, Agency, and Ethopoeia: Computers as Social Actors. Human Communication Research, 19(4): 504-527.
Nass, C.; Steuer, J.; and Tauber, E. R. 1994. Computers are Social Actors. Human Factors in Computing Systems: CHI '94 Conference Proceedings. New York: ACM Press.
Oren, T.; Salomon, G.; Kreitman, K.; and Don, A. 1990. Guides: Characterizing the Interface. In The Art of Human-Computer Interface Design, ed. B. Laurel. Addison-Wesley, pp. 367-381.
Salomon, G.; Oren, T.; and Kreitman, K. 1989. Using Guides to Explore Multimedia Databases. The Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences.
Stanfill, C., and Kahle, B. 1986. Parallel Free-text Search on the Connection Machine System. Communications of the ACM, 29(12): 1229-1239.
Takeuchi, A., and Taketo, N. 1995. Situated Facial Displays: Towards Social Interaction. Human Factors in Computing Systems: CHI '95 Conference Proceedings. New York: ACM Press.
Walker, J.; Sproull, L.; and Subramani, R. 1994. Using a Human Face in an Interface. Human Factors in Computing Systems: CHI '94 Conference Proceedings. New York: ACM Press.