Designing a Desktop Information System:
Observations and Issues

Thomas Erickson and Gitta Salomon

(now at) snowfall@acm.org and gitta@swimstudio.com



Published in Human Factors in Computing Systems: CHI '91 Proceedings. ACM: 1991.



ABSTRACT


This paper describes the first phase of a project to create a desktop information system for general users. The approach was to observe the problems, needs, and practices of several groups of information users, and to use these observations to drive the interface design of a prototype. In the first section of the paper, we describe problems which arise in the use of a relevance feedback system for information retrieval. In the second and third sections, we look at the needs and practices of users of both electronic and paper-based information systems. In the final section, we briefly describe the resulting design.

KEYWORDS: information retrieval, human interface, user interface, interactive systems, design process, design methodology, relevance feedback

INTRODUCTION


Today there are hundreds of on-line databases available to anyone with a personal computer and a modem. But it isn't very easy to access them. Each data source has its own interface; the computer often serves only as a terminal emulator. In most cases, while accessing information, users temporarily move into a world which is isolated from the rest of their computer environment. When they return, there are few facilities for working with the retrieved data.

In the future, users will want to move fluidly between numerous remote databases and effectively use the information they collect. Personal computers will need to be part of an integrated information environment.

In the Fall of 1989 we began a research project to explore interface issues related to the creation of just such an environment. Our focus was on problems that arise when general users are given access to a number of large, remote databases through their personal computers. (By "general user," we mean users who are not specialists in information retrieval; rather they need to obtain information to do their jobs.) One goal of the project, which is still underway, is the creation of a working prototype which will be installed in a real world environment, and the observation of its use. This prototype will give a group of accountants access to outside news sources and internal company data.

In this paper we discuss some of the interface issues which arose during the initial investigation phase and provide an illustration of how these issues drove an early prototype design. The investigation phase involved studying an existing commercial full-text information retrieval system, called DowQuest [5], which permits users to create powerful queries using natural language and relevance feedback [11] rather than a sophisticated query language. This phase also involved observation of information users. We interviewed and observed three groups of users: professional on-line searchers; day to day users of on-line information sources who were not information professionals; and a group of accountants. While the accountants made little or no use of on-line information sources, they nevertheless accessed and managed large amounts of paper-based information, and are the target group for the interactive prototype.

The remainder of this paper is divided into four sections. After a brief overview of the DowQuest system, we discuss issues concerning its query style. In the second and third sections, we look at the needs and practices of users of both electronic and paper-based information systems. Finally, we discuss a prototype that addresses some of these issues.

DOWQUEST AND RELEVANCE FEEDBACK


Early in the project, we were presented with the opportunity to use the DowQuest retrieval engine in our working prototype. This engine seemed well suited to our target audience of accountants, who generally lacked experience with sophisticated query languages. Before we set out to design an interface to the engine, we examined the already functioning DowQuest implementation.

How DowQuest Works


DowQuest, offered by Dow Jones & Company as part of their Dow Jones News Service, gives users access to over 350 news sources covering approximately the previous six months [5]. The system offers a full-text retrieval mechanism based on relevance feedback [12] which is purported to enable ordinary users to conduct powerful searches of large databases. Rather than using a sophisticated query language, DowQuest allows users to first type in a few words, get a list of potential hits, and then say in essence 'get more like that one.'

Figures 1 and 2 depict two phases of the process of constructing a query in DowQuest. In Figure 1, the user has entered a sentence describing the desired information. While DowQuest does not do actual natural language understanding, the user is encouraged to enter text as if it did. In the example shown, the system will drop out the words "tell," "me," "about," "the," and "of," and use the other, lower-frequency words to search the database. After the user has entered the initial query, the system returns the titles of the 16 most 'relevant' articles, where 'relevant' is defined algorithmically and is based on a variety of features over which the user has no control (and often no knowledge). While this list frequently contains articles relevant to the user's query, it also usually contains items which appear to the user to be irrelevant. At this point, the user has the option of reading the articles retrieved or continuing to the second phase of the query process.


tell me about the erruption of the alaskan volcano

DOWQUEST STARTER LIST HEADLINE PAGE 1 OF 4

1 OCS: BILL SEEKS TO IMPOSE BROAD LIMITS ON INTERIOR . . .
INSIDE ENERGY, 11/27/89 (935 words)

2 Alaska Volcano Spews Ash, Causes Tremors
DOW JONES NEWS SERVICE , 01/09/90 (241)

3 Air Transport: Volcanic Ash Cloud Shuts Down All Four . . .
AVIATION WEEK & SPACE TECHNOLOGY, 01/01/90 (742)

4 Volcanic Explosions Stall Air Traffic in Anchorage
WASHINGTON POST: A SECTION, 01/04/90 (679)

* * * * *

Figure 1. First phase of DowQuest interaction: the user types in a 'natural language' query; the system searches the database using the non-noise words in the query and returns a list of titles of the 'most relevant' articles.
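
To make the first phase concrete, the following is a minimal sketch (in Python, our own illustration rather than DowQuest's actual code) of reducing a query to its low-frequency content words. The noise-word list and the toy document frequencies are assumptions made for the example.

    # Minimal sketch of phase one: drop common 'noise words' and keep the
    # remaining low-frequency terms for the search.  The noise-word list and
    # the toy corpus frequencies are illustrative assumptions, not DowQuest's.

    NOISE_WORDS = {"tell", "me", "about", "the", "of", "a", "an", "and"}

    def query_terms(query, corpus_frequency):
        """Return the query's content words, rarest (most informative) first."""
        words = [w.strip(".,?!").lower() for w in query.split()]
        content = [w for w in words if w and w not in NOISE_WORDS]
        return sorted(set(content), key=lambda w: corpus_frequency.get(w, 0))

    # The query from Figure 1, with made-up document frequencies:
    frequency = {"erruption": 2, "alaskan": 40, "volcano": 120}
    print(query_terms("tell me about the erruption of the alaskan volcano",
                      frequency))
    # -> ['erruption', 'alaskan', 'volcano']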


In the second phase of the process (Figure 2) the user tells the system which articles are relevant to the query. The user may either specify an entire article or particular paragraphs within it. The system takes the full text of the selections, drops out the noise words, and takes a limited number of the most 'informative' words for use in a revised query. It then returns a new list of the 16 most relevant items. This second phase may be repeated as many times as the user wishes, though, in our observations, it was rare for users to iterate more than two or three times.


search 2 4 3


DOWQUEST SECOND SEARCH HEADLINE PAGE 1 OF 4

1 Air Transport: Volcanic Ash Cloud Shuts Down All Four . . .
AVIATION WEEK & SPACE TECHNOLOGY, 01/01/90 (742 words)

2 Alaska Volcano Spews Ash, Causes Tremors
DOW JONES NEWS SERVICE , 01/09/90 (241)

3 Volcanic Explosions Stall Air Traffic in Anchorage
WASHINGTON POST: A SECTION, 01/04/90 (679)

4 Alaska's Redoubt Volcano Gushes Ash, Possibly Lava
DOW JONES NEWS SERVICE , 01/03/90 (364)

* * * * *


Figure 2. Second phase of DowQuest interaction: the user instructs the database to find more articles like 2, 3, and 4, and the system returns a new set of relevant articles. (Note that the first three, 'most relevant' articles are the ones that were fed back, since an article is most 'like' itself; the fourth article is a new hit.)
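
The second phase can be sketched in the same spirit. The fragment below is a rough illustration, not DowQuest's implementation: the term limit and the frequency-based notion of 'informative' are assumptions made for the example.

    # Rough sketch of phase two: pool the text of the articles the user
    # marked as relevant, drop the noise words, and keep a limited number of
    # the most frequent remaining terms as the revised query.

    from collections import Counter

    NOISE_WORDS = {"the", "of", "a", "an", "and", "in", "to", "all", "down"}

    def expand_query(relevant_texts, term_limit=25):
        counts = Counter()
        for text in relevant_texts:
            for word in text.lower().split():
                word = word.strip(".,:;!?")
                if word and word not in NOISE_WORDS:
                    counts[word] += 1
        return [term for term, _ in counts.most_common(term_limit)]

    # Feeding back articles 2, 3, and 4 from Figure 1:
    revised = expand_query([
        "Alaska Volcano Spews Ash, Causes Tremors",
        "Volcanic Ash Cloud Shuts Down All Four Engines",
        "Volcanic Explosions Stall Air Traffic in Anchorage"])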

Interface Issues


Through observation of users, as well as our own experiences with the system, we uncovered a number of interface issues related to DowQuest's method of query specification and use of relevance feedback. A variety of lower-level interface problems, such as the arbitrary 16-article result set size or the limitations of the teletype-style interaction, are discussed in [14]. Here we discuss two higher-level problems which seem of general interest and importance.

Inappropriate Expectations of Intelligence

New users of DowQuest generally had high expectations of the system's intelligence. There are a variety of possible reasons for this, ranging from the seeming use of natural language, to the system's apparent ability to 'find more like this,' to the general belief in the intelligence of computers. In any event, these expectations were usually dashed when, in response to the first phase of the first query, DowQuest would return a result set containing many irrelevant articles. Consequently, many users assumed the system was no good, or that no relevant articles existed, and abandoned the query before even trying relevance feedback [9].

Another negative effect due to the assumption of intelligence occurred in the second phase of the query, when users requested the system to retrieve more articles 'like that one.' The new list of articles returned was ordered by 'relevance,' and, of course, no computer scientist would be surprised to find that an article is most similar to itself. General users, however, lacked this insight, and so when they looked at the new list and discovered that the first, most relevant article was the one they had told the system to find more like, they assumed there was nothing else relevant available and did not inspect the rest of the list [9]. While this assumption was incorrect here, it is a natural one: in human-human conversation it is conventional to assume that a provider of information will offer new information if it exists [8].
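
The behavior itself is easy to reproduce. Under a simple bag-of-words cosine model, which only approximates DowQuest's actual ranking, the fed-back text can never be outscored by another article:

    # Why the fed-back article tops the list: under a bag-of-words cosine
    # model (an approximation, not DowQuest's actual ranking), no article
    # scores higher against the fed-back text than that text does against
    # itself.

    import math
    from collections import Counter

    def cosine(a, b):
        va, vb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(va[t] * vb[t] for t in va)
        norm = (math.sqrt(sum(v * v for v in va.values()))
                * math.sqrt(sum(v * v for v in vb.values())))
        return dot / norm if norm else 0.0

    fed_back = "alaska volcano spews ash causes tremors"
    other = "volcanic explosions stall air traffic in anchorage"
    assert cosine(fed_back, fed_back) >= cosine(fed_back, other)  # always holds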

Ease of Use versus Control

Another problem, observed primarily in our own use of DowQuest, was one of undesired generalization. An example of this occurred for the query: 'tell me why Apple Computer stock prices have dropped.' The initial query produced some relevant articles, but after a couple of rounds of feedback, the articles found veered away from Apple stock prices and began to emphasize the fluctuations in high technology stock prices. This occurred because articles discussing Apple's stock price tended to put it in a more general context, and repeated feedback of relevant articles reinforced this context. It is perhaps inaccurate to refer to such generalization as a problem, since it may often be a desired result. Nevertheless, it aptly illustrates the loss of control that results from shielding the user from the complexity of query languages.

While both problems discussed in this section arise in the context of DowQuest, analogs of them seem likely to occur in any system which attempts to use built-in intelligence to shield the user from underlying complexity.

NEEDS OF INFORMATION USERS


Through interviewing and observing users of both electronic and traditional information sources, we uncovered a number of issues that need to be addressed in the creation of an integrated desktop information environment. These are discussed below.

The Need for Metaknowledge


Before users can create queries they need metaknowledge about the information in which they're interested. For example, they need to know 1) where to look for the answer to their question, and 2) what constitutes a reasonable question. This knowledge is not typically in the hands of the general user.

Choosing from 10,000 databases

There are many databases available on-line. How do users decide where to start looking for desired information? In observing expert on-line searchers at their weekly status meeting, we noted that a remarkable amount of time was spent sharing information about databases: topics included newly available databases, information quality, frequency and timeliness of updates, costs, and situations in which a particular database should be consulted. Some of this information was gathered from experience, some gleaned from newsletters written by the database publishers. It became apparent that learning and memorizing database characteristics is a recognized part of the professional searcher's job.

Yet a casual information user cannot be expected to stay abreast of database attributes in the same way. On the other hand, casual users often hold strong opinions about the quality of various data sources (whether well founded or not), and would likely be opposed to any system that automatically selected 'appropriate' databases. The information access system should therefore be designed to offer easy access to descriptive information about the available databases and to offer aid in making decisions when desired.

Asking a useful question

A related problem is that general users often lack familiarity with the amount or scope of knowledge associated with the information they are seeking. The on-line searchers indicated that it is not uncommon for a client to request, for example, all information about "artificial intelligence." In such situations, the searcher explains the difficulty and, through conversation, narrows the query's breadth. However, if the user addressed the same query to an on-line service, an enormous amount of material would be retrieved, unaccompanied by explanation. In such instances, the information system needs to help users make headway in their search. Various research systems have addressed this problem, and solutions range from providing the user with an example of a retrieved record to assist in query reformulation [15], to providing mechanisms for guiding the user through the information [10].

Additional information about these, and a variety of related issues, can be found in [2] and [3].

Working with Dynamic Information


Many databases contain frequently changing information. Bibliographic sources acquire new citations; news databases receive the latest reports. Over time, previously available information may no longer be accessible. For example, due to the large volume of news items and storage limitations, DowQuest offers approximately the last six months of news at any one time. Several interface issues arise because of this dynamic nature of information sources, some of which are discussed in [1].

From our interviews we expect users will issue two types of queries: ad hoc queries, where they want an answer to a specific question and nothing more; and on-going queries, where they want to be kept up to date on a particular topic. The following examples illustrate problems that can occur in both of these cases.

One day in November of 1989, we issued the ad hoc query "earthquake volcano ashes seismic activity" on the DowQuest database. This query was successful and returned the desired articles about the October 1989 California earthquake. However, when we executed the same query at a later date with the intent of quickly re-finding this information, we obtained articles about a newly erupting Alaskan volcano. Because DowQuest returns only 16 results for any query, the new information had taken precedence and the "California Earthquake" articles had slipped below the retrieval threshold. Even if DowQuest had displayed the entire result set, we might not have easily found the desired articles, because their location had changed. Users may find it disconcerting that the same query, issued on a different day, may not return the same set of results.

Similarly, a once useful on-going query may eventually become inadequate. For example, an on-going query established ten years ago to track news on portable computers might have performed well for quite some time. Today, the same query would return unmanageable numbers of articles. Furthermore, because terminology has changed, some relevant information might not be returned: machines that were called portable ten years ago might not be called portable today and many subclassifications now exist. In order to be useful again, the old query would have to be refined and narrowed to meet particular interests, in light of new developments. Possibly, several new, specific queries would be required to effectively deal with the information.

These problems are basically the result of a mismatch: a static query cannot remain effective when it is directed at a dynamic database. Therefore, the query interface will need to provide a means of explaining why and how changes have occurred and offer ways for the user to easily alter the query as the available information changes.

PRACTICES OF INFORMATION USERS


In our observations of general information users, we noted a number of practices that seemed important to their use of information. It seems likely that any successful desktop information system will have to support such practices.

Skimming

In our study of accountants, we found that whether they were dealing with newspapers, technical papers, or memos, no one ever used the verb "read." These users began by skimming all information they received, often relying on the layout of the information to give them a quick overview. Only rarely did they decide to read the material thoroughly. One accountant subscribed to approximately 20 magazines and journals, but infrequently ventured beyond the table of contents. Similar usage patterns have been noted in other domains [4].

It is difficult to skim electronically-based information in the same way. One accountant, who had personally implemented part of an electronic database of a standard accounting reference, confessed that he preferred using the hard copy version because it was easier to skim.

One way to facilitate skimming is to provide article summaries. However, it is often not possible to summarize a document (either automatically or manually) in a way that suits everyone, because different people look for different types of information. The accountants we interviewed noted that they often search for information that is implicit or even deliberately concealed (such as bad financial indicators), and such information is even less likely to be included in an abstract.

A different tactic is to rely on structure in the document itself. Various designers (e.g., [7]) have argued that document usability can be enhanced by incorporating the structure of traditional documents into on-line information. Paper-based documents such as magazines employ a variety of visual design techniques which could be used to facilitate skimming in on-line documents. The design challenge here is to support skimming in ways that go beyond adaptation of traditional printed media design and take advantage of the properties of electronic media (e.g., [6]). For example, one accountant suggested that the system could display the first few sentences of every paragraph, letting him choose where to expand to full text.
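
That suggestion is straightforward to prototype. The sketch below is hypothetical (our own illustration, not a feature of the system described here): it collapses each paragraph to its first few sentences, leaving the reader to decide which paragraphs to expand.

    # Hypothetical sketch of the suggested skimming aid: show only the first
    # few sentences of each paragraph; the reader chooses where to expand.

    import re

    def skim_view(document, sentences_per_paragraph=2):
        collapsed = []
        for paragraph in document.split("\n\n"):
            sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
            lead = " ".join(sentences[:sentences_per_paragraph])
            if len(sentences) > sentences_per_paragraph:
                lead += " [...]"
            collapsed.append(lead)
        return "\n\n".join(collapsed)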

Annotation

Most of the accountants annotated (i.e., added comments to or marked up) the paper-based information they saved. Annotation served as a memory cue about which aspects of the information were important. It also added value: for example, it facilitated skimming by other people with whom the document was shared, and it was used to indicate relationships between the document and other information.

Currently, it's difficult to annotate an electronic document casually. One accountant who maintained information on-line went to great lengths to annotate it. He would import the ASCII text into a word processor and mark it up by changing text styles to bold or underline. More typically, users printed the information they'd found, marked it up by hand, and filed it, thus losing any capacity for electronically managing the retrieved documents. A complete information environment needs to provide users with annotation tools, the means to view documents in both pristine and annotated form, and the ability to search for elements in both the original data and the annotations.
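
One way to meet these requirements is to keep annotations as a layer separate from the retrieved text, so that a document can be viewed pristine or marked up and both layers can be searched. The sketch below is our own illustration; the field names are assumptions, not those of the prototype.

    # Annotations kept separate from the retrieved text, so the document can
    # be shown pristine or annotated, and both the original data and the
    # annotations can be searched.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Annotation:
        start: int        # character offset into the original text
        end: int
        kind: str         # e.g. "note", "highlight", "audio"
        body: str = ""    # comment text, highlight color, etc.

    @dataclass
    class AnnotatedDocument:
        text: str
        annotations: List[Annotation] = field(default_factory=list)

        def search(self, term):
            """True if the term occurs in the text or in any annotation."""
            term = term.lower()
            return (term in self.text.lower()
                    or any(term in a.body.lower() for a in self.annotations))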

Our interviews with accountants also revealed a way in which annotation may be more important in an electronic environment than in a paper-based one. The accountants themselves are audited by corporate-level quality control people who want to make sure that they're performing to the company's standards. Among other things, quality control people look at clipping files to ensure that the accountant is keeping up on the industry and clients. Future systems which automatically retrieve information on particular topics would eliminate this as a source of evidence. In such a case, the existence of annotations would provide proof that the information had been 'touched by human hands,' evidence that might be welcomed by clients as well as quality controllers.

Organization

The accountants discarded all but the most important information; space constraints, as well as the difficulty of deciding which file folder was most appropriate, deterred them from saving more. There was a general feeling that the fewer items saved, the easier it was to re-locate them. One of the few users who maintained information in electronic form saved items into a "scrapbook" file, but rarely revisited anything because this required a sequential scan through the file. These cases indicate that an information management system needs to supply users with tools to organize and reorganize their data, once retrieved.

Such tools need to support full text search on saved items, as well as the ability to search on other criteria. For example, users often remember the approximate date on which the data was found, or the source it came from. Tools provided by the system should allow the use of combinations of such attributes for searching and reorganizing, thus permitting users to create their own idiosyncratic databases with items retrieved from external databases.
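
A minimal sketch of such attribute-based search over saved items follows; the item fields and the date handling are illustrative assumptions, not the prototype's design.

    # Searching a personal collection of saved items by any combination of
    # full text, source, and approximate date of retrieval.

    from datetime import date

    def find_items(items, text=None, source=None, after=None, before=None):
        """items: iterable of dicts with 'text', 'source', and 'retrieved' keys."""
        hits = []
        for item in items:
            if text and text.lower() not in item["text"].lower():
                continue
            if source and item["source"] != source:
                continue
            if after and item["retrieved"] < after:
                continue
            if before and item["retrieved"] > before:
                continue
            hits.append(item)
        return hits

    # "That Washington Post piece from early January":
    # find_items(saved, source="WASHINGTON POST",
    #            after=date(1990, 1, 1), before=date(1990, 1, 15))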

FROM OBSERVATIONS TO DESIGN


In this section, we briefly describe some of the design elements which resulted from consideration of the issues previously identified. Note that the design does not address all of the issues we have discussed in this paper. Furthermore, we must emphasize that because the system is still being implemented and has yet to be tested on the intended users, we cannot say whether the features we describe will be successful. Readers may wish to look at related systems, such as SuperBook [6] and Concordia [13], which have already progressed through implementation and testing phases and which address similar issues.

The Prototype

Our prototype interface design has three components: reporters, newspapers, and notebooks.

Reporters are what users interact with to define the type of information they wish to retrieve. Through a form-based dialogue, a user can give a reporter specifications, examine items it retrieves, and use relevance feedback to refine those specifications. Any reporter can be automated so that it will access desired databases on a regular basis.
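
As a rough sketch, a reporter can be thought of as a query specification that carries its own feedback and schedule. The names and fields below are our own illustration, not the prototype's implementation.

    # A 'reporter' as a data object: a specification that can be refined with
    # relevance feedback and run automatically on a schedule.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Reporter:
        assignment: str                  # the user's specification of the topic
        sources: List[str]               # databases this reporter consults
        feedback: List[str] = field(default_factory=list)  # 'more like this' texts
        schedule: str = "daily"          # how often an automated reporter runs

        def refine(self, relevant_text):
            """Relevance feedback: remember text the user marked as relevant."""
            self.feedback.append(relevant_text)

        def run(self, search):
            """Execute the assignment; 'search' stands in for the retrieval engine."""
            return search(self.assignment, self.feedback, self.sources)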

By using a reporter metaphor, we hope to provide users with a way to understand and contend with a less-than-predictable query mechanism and the dynamic nature of databases. This metaphor allows us to examine an interesting conjecture: anthropomorphism may be useful for representing ignorance, as well as intelligence. Users were often disturbed when initial queries to DowQuest would result in the retrieval of irrelevant articles, and sometimes concluded that "the system" didn't work. Would they be more forgiving of a reporter and expect it to improve with feedback? In addition, real-world reporters embody many of the characteristics of the retrieval mechanism: the ability to use fuzzy information as feedback ('find more like that one'), and the ability to function in a world of changing information (a reporter is not expected to come back with the same information next week).

Typically, a user might create several automated reporters. Because users will want a quick way to determine what's new without having to access each independent reporter, we designed the newspaper component to allow users to skim through all new information. Each reporter is allocated a 'column' in the newspaper. If new information has been retrieved by the reporter since the last edition of the newspaper, the associated column appears in the current newspaper, and contains the titles and brief excerpts of each item found. Reporters that find large amounts of relevant information appear on the front page; progressively less active reporters appear on subsequent pages. A listing of the columns published in the current issue is always available to the user and serves as a navigation device. From the newspaper, the user can either access the full text of an item of interest or call up the reporter. Consequently, if a reporter's column starts to stray from the desired information, the user can easily revise the reporter's assignment.
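
The layout rule just described can be sketched simply; the page size and the 'busiest reporters first' ordering below are assumptions made for illustration.

    # Assembling a 'newspaper' edition: each reporter with new items gets a
    # column, and the most productive reporters land on the earliest pages.

    def build_newspaper(reporters_with_items, columns_per_page=3):
        """reporters_with_items: list of (reporter_name, new_items) pairs."""
        active = [(name, items) for name, items in reporters_with_items if items]
        active.sort(key=lambda pair: len(pair[1]), reverse=True)  # busiest first
        pages = [active[i:i + columns_per_page]
                 for i in range(0, len(active), columns_per_page)]
        return pages  # pages[0] is the front page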

Whether users are interacting with a reporter or a newspaper, if they encounter an article they wish to keep, they may save it into a notebook. Notebooks allow users to create their own customized databases. Figure 3 describes features of a preliminary design which support practices such as browsing, annotation, and organization.




Figure 3. Prototype design for information "notebook." This screen dump depicts a notebook in which a user can skim, search, organize, and annotate information. More specifically:

Annotation is supported through the vertical palette of tools along the left. The user is given access to (from top to bottom) "Posted" notes that can hold text data, a special type of Posted note that can store audio annotations, and a number of colored highlight pens. At the bottom of the vertical palette, the "Find" button and "next" and "previous" arrows allow the user to look for data based on a number of characteristics. The user can search for particular text strings. In addition, the user can choose to search for earlier or later instances of particular highlight colors, "Posted" notes, or audio annotations.

Immediately to the right of the vertical tool palette, the central portion contains the "content" of the notebook -- i.e., the actual data that was retrieved by the user.

To the right are two overviews: the "bird's eye view" and the hierarchical outline view.

The "bird's eye view" of the notebook allows the user to see a visual map of items in the vicinity of the current location. The large arrow marks the current location; the sizes of annotations are exaggerated. The user can quickly see that two images are immediately 'above' the current location, a highlighted passage is located farther 'above,' and a "Posted" note is located 'below.' This view can also be used as a navigational device: clicking on a desired location makes the notebook content jump to that location.

The hierarchical outline allows the user, in this case, to view the contents in chronological order. The user can expand the outline (e.g. 'open' a year into its months) or use it as a navigational device to jump to a particular section of the notebook. The user can also change the notebook's organization by selecting a new attribute from the "Organize by" menu at the top of the column.

SUMMARY


In this paper we've described the investigation phase of a project aimed at creating a desktop information system for general users. We began by describing problems due to inappropriate expectations of intelligence that arise when users employ natural language and relevance feedback to retrieve information. Similar problems may arise in other domains as interfaces grow more intelligent and adaptable. In our prototype, we use a "reporter," an anthropomorphic metaphor that may be better suited to the fuzziness and inevitable 'mistakes' that occur in information retrieval.

Our investigation also included observations and interviews of professional searchers, general users of on-line systems, and accountants, which revealed a number of needs and practices that a desktop information system should support. The system should address the need for metaknowledge and offer support for dealing with dynamic information. The current interface prototype addresses these issues only slightly, because the initial implementation will provide its users with access to familiar information sources. In addition, the system should support current practices such as skimming, annotation, and organization. The newspaper and notebook components of the interface prototype illustrate some ways of providing this support.

The next phase of this project includes the implementation of the interface, its installation in an accounting office, and the observation of its use. At a later date, we hope to report on the nature and efficacy of the implemented interface and use our findings to drive the next design phase.

ACKNOWLEDGEMENTS


Special thanks to Ruth Ritter for graphic design assistance and to Kevin Tiene for influence throughout. The project discussed is part of a joint effort between Apple Computer, Dow Jones & Co., KPMG Peat Marwick and Thinking Machines Corp. We'd like to thank the following project leaders from each company for their assistance: Charlie Bedard, Clare Hart, Robin Palmer and Brewster Kahle.

BIBLIOGRAPHY


1. Allen, R. B. User Models: theory, method, and practice. International Journal of Man-Machine Studies 32, (1990), 511-543.

2. Belkin, N. J. and Vickery, A. Interaction in information systems: a review of research from document retrieval to knowledge-based systems. LIR Report no. 35. London, The British Library, 1985.

3. Daniels, P. J. Developing the User Modelling Function of an Intelligent Interface for Document Retrieval Systems. Ph.D. Thesis, The City University, London, 1987.

4. Dillon, A., Richardson, J. and McKnight, C. Human factors of journal usage and design of electronic texts. Interacting with Computers. 1, 2, (1989), 183-189.

5. Dow Jones & Company, Inc. Dow Jones News/Retrieval User's Guide. 1989.

6. Egan, D.E., Remde, J.R., Gomez L.M., Landauer, T.K., Eberhardt, J., Lochbaum, C.C. Formative Design-Evaluation of SuperBook. ACM Transactions on Information Systems, 7, 1, (January 1989), 30-57.

7. Glushko, R. J. Design Issues for Multi-Document Hypertexts. In Proceedings of Hypertext 1989. ACM Press, November, 1989, pp. 51-60.

8. Grice, H. P. Logic and Conversation. In P. Cole & J.L. Morgan (Eds.), Syntax and Semantics, Volume 3: Speech Acts. New York: Seminar Press, 1975.

9. Meier, E., Minjarez, F., Page, P., Robertson, M. & Roggenstroh, E. Personal communication, 1990.

10. Salomon, G., Oren T. and Kreitman K. Using Guides to Explore Multimedia Databases. In Proceedings of the Twenty-Second Annual Hawaii International Conference on System Science. (Kailua-Kona, Hawaii, Jan. 3-6, 1989), IEEE Computer Society Press, vol. 4, pp. 3-11.

11. Salton, G. and McGill, M. Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.

12. Stanfill, C. and Kahle, B. Parallel Free-text Search on the Connection Machine System. Communications of the ACM. 29, 12, (Dec. 1986), 1229-1239.

13. Walker, J. Supporting Document Development with Concordia. IEEE Computer. (Jan. 1988), 48-59.

14. Weyer, S. Questing for the "Dao": DowQuest and Intelligent Text Retrieval. Online. 13, 5, (Sept. 1989), 39-48.

15. Williams, M. D. What makes RABBIT run? International Journal of Man-Machine Studies 21, (1984), 333-352.