The Open Group Research Institute, 11 Cambridge Center, Cambridge MA 02142, USA
An acknowledged problem with a rich hypertext system such as the World Wide Web is the tendency for users to become lost when following a number of links between different Web pages, or when using browser navigation controls. A visual representation of a tree depicting the paths the user has taken makes it easier to return to previously visited pages.
Our approach allows users to create their own structures overlaid upon the World Wide Web. We allow users to rearrange and prune the tree (thereby creating their own representations of how they believe Web pages should be organized), to easily create named sets of pages from this tree, and to perform operations upon those sets, such as determining which pages have changed. We also allow users to add pages to the tree automatically (for example, by running a robot), without having to visit each page with the browser. Together, these capabilities allow users to quickly build customized views of the pages in the Web that interest them.
The ability to easily create customized linked representations of pages in the Web, without requiring the ability to modify the original documents, puts considerable power into the hands of users. The ability to save and share these representations means that the effort of organizing Web information can benefit others as well as the individuals who create the representations.
Keywords: Bookmark Organization; Browsing Aids; WWW Navigation; Hypertext; Visualization; History
Those who cannot remember the past are condemned to repeat it.
The World Wide Web has popularized hypertext publishing and made it accessible to large numbers of people. One well-known problem with the Web is that it is possible to "get lost in the Web" while following links, using the browser "Back" button, and trying to remember the URLs (Uniform Resource Locators) of previously visited pages [Conklin86], [Neuss94], [Ayers95]. Web browsers allow users to mark pages they wish to remember by saving their URLs as bookmarks, but large numbers of bookmarks become difficult to retrieve and manipulate. In addition, the user must consciously choose to save a page as a bookmark rather than relying on the browser to record it automatically. A graphical history of browsing activity [Meeks87], [Dömel94], [Ayers95], built automatically as a user navigates, is a powerful way of producing an organized view of information from the Web. This visual history can then serve as the basis for creating and presenting additional organized views.
Such a history records the pages visited by the user and initially organizes them into a tree showing the paths the user has followed in the Web. The user may graphically reorganize and prune the tree to restructure the information. The user may also select a set of pages from the tree and run a robot on these pages, automatically expanding the tree without needing to browse. In addition, the user may perform other operations on sets of pages from the tree that report on attributes of the pages, such as determining which pages have changed. The ability to create, name, and save sets of pages is useful for managing views of the Web.
The structure of a customized view of the Web does not necessarily mirror the inherent structure of the Web, since the links in the view are the personalized relationships the user has created in the view. These are independent of, but may have been derived from, the links embedded in documents, which represent the view of the document authors. The ability to link documents without modifying document source is a powerful concept, since it allows users to create trails through the Web without needing to write or modify HTML documents. Such views may be saved across sessions, and shared with others.
The rest of this paper is organized as follows. We present a brief scenario to illustrate how Web views can support using the Web to do research. We then describe the HistoryGraph, our Web visualization prototype, and two tools (WhatsNew and LinkTree) which can be used in conjunction with it. Finally, we discuss how this visualization environment may be extended by incorporating concepts and work of other researchers.
The following scenario both illuminates and motivates our initial work. Imagine that it is 1998. Greg is a graduate student in economics working on his dissertation. He believes that another major depression like that of the 1930's is imminent and wants to support his hypothesis by examining the parallels between the 1920's and the 1990's. He has already collected various economic, historical, and anthropological documents, both printed and electronic. He will continue to explore the Web, finding and noting other connections and selections which support or contradict his hypothesis. He will then write his thesis, forming a new hypertext document that links together the documents that best support his hypothesis and refute counter-arguments against it.
Greg's material is collected into one master HistoryGraph visualization tree containing two major subtrees, one for the 1920's and one for the 1990's. He also has other branches documenting various lines of thought that he has been pursuing. Today he wants to continue a line of research he started yesterday on the Protectionism branch. He finds the tree node representing a site he visited that describes protectionist legislation passed during the 1990's, and displays that page by activating the node. From a section on sanctions aimed at Japan, he follows a link to a document on Japanese imports and exports, and from there to a document on the balance of trade. Nodes are automatically added to the tree for these two documents.
Greg knows that the balance of trade has always been a key issue in protectionist arguments. He wants to know about all the documents that are the destinations of links in the protectionism document, without reading them right now. He selects the tree node for the protectionism document and creates a new set that he labels "Protectionism documents". He then runs the LinkTree tool on this set. LinkTree finds all the documents linked from the protectionism document and adds nodes for them to the tree. These include several articles from economics and politics journals arguing for and against the validity of the trade balance as an economic indicator.
Greg wants to explore some of these articles and guesses that the most important ones will be those with the most links to other documents. So he creates a new set containing the nodes for these articles and invokes LinkTree on this new set to import nodes for all the links from the articles. Three of the economics articles and two of the politics articles have a large number of links. Since, as Greg reminds himself with a glance at his tree, he started out examining a political situation (protectionist legislation), he decides to examine the articles in the politics journals. He browses to one of the articles by activating its node.
As he reads the article, he notices a similarity between the arguments in one of its sections and those in an article he read on the politics of the 1920's. He doesn't remember exactly which article it was, so he searches the 1920's branch of the tree for the keyword "politics" in the title, finding two matches. He views one by activating its node, quickly decides it is the wrong document, and then views the other, which is the one he wanted. He finds a paragraph in the document that he wants to remember and creates a position marker: a virtual named anchor in the document, together with a special node in the tree that refers to that anchor. Just in case, he also annotates the link between the two nodes with a description of the connection he thinks he sees.
He saves this representation of his dissertation research to discuss with his advisor tomorrow and begins reading one of the hard-copy texts.
This scenario demonstrates how visualization technology makes it easier for individuals to access and work with information on the Web. In order to easily and naturally track, organize, and build upon browsing activity, such systems must provide the following capabilities:
The HistoryGraph visualizer provides a graphical user interface that traces the user's browsing activity as a tree structure (see Figure 1). Each node in the tree represents a page which was visited; each link represents the user's shift of focus (whether by following a link, using a browser control such as "Back", or typing in a URL directly). The HistoryGraph tracks the current URL in the user's browser. Whenever the URL changes, it checks whether it already has a node for the new URL, and if not, adds a new node with a link to the previously visited URL. If the URL already exists in the graph, HistoryGraph marks the corresponding node as the current node; if the user then navigates to a new URL, a branch is created in the graph. HistoryGraph acts as a peer with the browser: activating a node in the graph requests the user's browser to display the corresponding page.
Figure 1: Sample HistoryGraph Screen.
An elided title or URL is displayed next to each node icon. The standard icon, a simple file folder, indicates no additional information about the page; other icons mark nodes that carry additional information. For example, the document icon indicates that the document is stored in our document management system, and the stack-of-documents icon indicates that the document is an index of managed documents. Managed documents are pages which have owners and controlled access. The management service provides an index of documents and delivers the original documents to authorized users on request.
Our current interface uses user-configurable colors to display the current node and the members of the current set. (Fonts and pop-up lists could convey the same information in a color-independent manner.) The node ("Detailed|ormation"), colored blue, represents the document currently displayed in the browser. The red nodes ("WAIBA *" and "Transpor| College") are the members of the currently displayed set. The highlighted node ("Mediator|web-team") is the currently selected node; its complete URL, complete title, and visit count are displayed at the bottom of the window. The box in the lower right corner is colored green when the browser connection is successful; otherwise it is red with a diagonal line through it.
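The node-update rule described at the start of this section can be sketched as follows. This is a minimal illustration in Python rather than the Tcl/Tk of the prototype; the `HistoryTree` class, its `on_url_changed` callback, and the per-node visit counter (standing in for the visit count shown at the bottom of the window) are hypothetical names, not HistoryGraph's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    url: str
    parent: Optional[str] = None        # URL of the node this page was first reached from
    children: List[str] = field(default_factory=list)
    visit_count: int = 0


class HistoryTree:
    """Tree of visited URLs, updated each time the browser's URL changes."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.current: Optional[str] = None

    def on_url_changed(self, url: str) -> None:
        node = self.nodes.get(url)
        if node is None:
            # New page: add a node linked to the previously current page.
            node = Node(url, parent=self.current)
            self.nodes[url] = node
            if self.current is not None:
                self.nodes[self.current].children.append(url)
        # Known or new, this page becomes the current node, so moving on
        # from an earlier node starts a new branch under it.
        node.visit_count += 1
        self.current = url
```

Visiting a, then b, going back to a, and then visiting c leaves a with two children, b and c: the branch through b survives the use of "Back".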
HistoryGraph requires several facilities from a Web browser. The browser must allow HistoryGraph to register for events when pages are retrieved, and must pass both the URL and the page to HistoryGraph. The browser must also accept requests to fetch pages. Browsers which do not support both means of communication cannot be used with HistoryGraph, and other browsers may require modifications to HistoryGraph before they can be used, depending on the mechanisms they provide.
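The two browser facilities listed above (event registration and fetch requests) amount to a small peer interface. The sketch below assumes hypothetical method names `register` and `request_fetch`; a real binding would sit on DDE or the Common Client Interface, neither of which is modeled here. `FakeBrowser` is an in-memory stand-in so the example runs on its own.

```python
from typing import Callable, Dict, List, Protocol

PageListener = Callable[[str, str], None]   # called with (url, page_html)


class BrowserPeer(Protocol):
    """The two facilities HistoryGraph needs from a browser (hypothetical names)."""

    def register(self, listener: PageListener) -> None: ...

    def request_fetch(self, url: str) -> None: ...


class FakeBrowser:
    """In-memory stand-in for a DDE- or CCI-connected browser."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages
        self.listeners: List[PageListener] = []

    def register(self, listener: PageListener) -> None:
        self.listeners.append(listener)

    def request_fetch(self, url: str) -> None:
        # A real browser would retrieve and display the page; here we look it
        # up and pass URL and page to every registered listener.
        page = self.pages.get(url, "")
        for listener in self.listeners:
            listener(url, page)


def activate_node(browser: BrowserPeer, url: str) -> None:
    """Activating a node in the graph asks the browser to display that page."""
    browser.request_fetch(url)
```

Because the listener receives every retrieved page, the same event path serves both user navigation and node activation, which is what keeps HistoryGraph a peer rather than a controller of the browser.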
Our first HistoryGraph implementation is written in Tcl/Tk [Ousterhout94] [Welch95]. It communicates with the Netscape browser on Windows NT using DDE [SDI] [NetscapeDDE95], or with NCSA Mosaic using the Common Client Interface [CCI95] on various Unix systems. The tree display is generated using the Tk Tree Widget [Brighton].
In addition to a history mechanism, the HistoryGraph visualizer provides a means for using and manipulating the visualization, and for interacting with the browser and other tools. The visualizer displays a tree of nodes which represent pages visited and links representing an ordering relationship between nodes. Both the pages and links may have properties associated with them.
Properties are characteristics associated with a page which are directly useful to a user or to software tools. Some properties are assigned automatically, while others may be user-defined. A user may define properties of a page in order to remember its content without having to re-fetch it. Properties may be used for searching or pruning the tree and for generating sorted lists of pages extracted from the tree. They may also be used to compare pages with similar or different characteristics. Some example properties include
Links in the visualization may also have properties, such as the number of times traversed, which may be represented by the thickness of the line in the tree.
Properties can be extracted from the document itself, added manually in the visualization environment, or added by an external tool such as the LinkTree tool. Properties which are extracted from the document itself include the URL, title, and embedded HTML META information. Properties which may be added by the user include notes and annotations. Properties which may be added by tools ("property generators") include information added by a notification mechanism (such as a notification that document ownership has been changed).
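Extracting the document-derived properties (URL, title, and META information) could look like the following sketch, which uses Python's standard `html.parser` module. The function name and the shape of the returned property dictionary are assumptions; user notes or tool-generated properties could be merged into the same dictionary under their own keys.

```python
from html.parser import HTMLParser
from typing import Dict


class PropertyExtractor(HTMLParser):
    """Collects the title and META name/content pairs of an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: Dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            # Only name/content pairs become properties; http-equiv etc. are ignored.
            if a.get("name") and a.get("content") is not None:
                self.meta[a["name"].lower()] = a["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_properties(url: str, html: str) -> Dict[str, object]:
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title.strip(), "meta": parser.meta}
```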
In addition to individual pages, the visualizer allows the user to work with groups of pages, or sets. A set is a group of pages that is treated as a single entity, and may be used by another tool or displayed to a user. Sets are useful for identifying and manipulating pages which are related in some way, and are an important tool for helping users personalize their work. A set can be created in several ways:
Multiple sets can be created and saved, and a set is displayed by highlighting its nodes in the tree. Sets may be manipulated by performing operations on them. Such operations include:
Sets are important because they provide a means to categorize and work with pages in the tree that does not depend on the current tree organization, but rather on the attributes of the individual pages themselves. By providing multiple sets, the user may have multiple ways of categorizing the same pages (e.g., the set of pages I want to print, the set of pages which I must update, etc.).
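The following is a minimal sketch of named sets whose membership depends only on page properties, not on tree position. It does not reproduce the operation list above; it simply assumes ordinary set algebra (union, intersection, difference) plus creation by a property predicate, such as the title-keyword search in the scenario. All names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set

Props = Dict[str, str]


class SetRegistry:
    """Named sets of node URLs, independent of where the nodes sit in the tree."""

    def __init__(self) -> None:
        self.sets: Dict[str, Set[str]] = {}

    def create(self, name: str, urls: Iterable[str]) -> Set[str]:
        self.sets[name] = set(urls)
        return self.sets[name]

    def create_by_property(self, name: str, nodes: Dict[str, Props],
                           predicate: Callable[[Props], bool]) -> Set[str]:
        # Membership is decided by each page's own properties only.
        return self.create(name, (u for u, p in nodes.items() if predicate(p)))

    def union(self, a: str, b: str) -> Set[str]:
        return self.sets[a] | self.sets[b]

    def intersection(self, a: str, b: str) -> Set[str]:
        return self.sets[a] & self.sets[b]

    def difference(self, a: str, b: str) -> Set[str]:
        return self.sets[a] - self.sets[b]
```

For example, a "politics" set built from titles and a hand-picked "to print" set can overlap freely; their intersection is the politics pages the user still intends to print.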
We designed our visualization environment as a group of cooperating browsing associates, extending a theme of modular, browser-independent "agents". A browsing associate is a relatively small and simple application which is not coupled to a particular HTTP stream and which can independently and asynchronously access the Web on the user's behalf [Brooks95]. A browsing associate is designed to enhance a user's browsing experience by adding capabilities to the browser, but through a separate user interface. The associate may be loosely coupled to the browser through one of a number of mechanisms, such as the Common Client Interface [CCI95], but this is not a requirement.
We integrate individual browsing associates into the visualization environment by allowing each associate to maintain its own control window, which is used for setting operational parameters. URLs and requests for execution, however, are passed from the HistoryGraph to the associate, and the associate returns results by creating nodes in the HistoryGraph tree, creating new sets, or modifying properties (including set membership) of existing nodes.
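The division of labor described above (URLs in; nodes, sets, or property changes out) can be sketched with a result record and one example associate. The example performs a WhatsNew-style change check by comparing an HTTP `Last-Modified` header with a cutoff date; that test, the class names, and the "Changed pages" set name are assumptions for illustration, not a description of how WhatsNew actually decides. Headers come from an injected function so the sketch needs no network access.

```python
from dataclasses import dataclass, field
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class AssociateResult:
    """What an associate hands back to the visualizer (hypothetical shape)."""
    new_nodes: List[Tuple[str, str]] = field(default_factory=list)   # (parent, child)
    new_sets: Dict[str, Set[str]] = field(default_factory=dict)
    property_updates: Dict[str, Dict[str, object]] = field(default_factory=dict)


class ChangeCheckAssociate:
    """Reports which pages changed since `since` (must be timezone-aware).

    `headers_for` maps a URL to its HTTP response headers, e.g. from a HEAD
    request. Pages without a Last-Modified header are reported as unchanged
    here; a real tool might compare page contents instead.
    """

    def __init__(self, since: datetime, headers_for: Callable[[str], Dict[str, str]]):
        self.since = since
        self.headers_for = headers_for

    def run(self, urls: Set[str]) -> AssociateResult:
        result = AssociateResult()
        changed: Set[str] = set()
        for url in urls:
            stamp = self.headers_for(url).get("Last-Modified")
            is_changed = stamp is not None and parsedate_to_datetime(stamp) > self.since
            result.property_updates[url] = {"changed": is_changed}
            if is_changed:
                changed.add(url)
        result.new_sets["Changed pages"] = changed
        return result


def apply_result(props: Dict[str, Dict[str, object]], sets: Dict[str, Set[str]],
                 children: Dict[str, List[str]], result: AssociateResult) -> None:
    """Merge an associate's result into the visualizer's state."""
    for parent, child in result.new_nodes:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])
    sets.update(result.new_sets)
    for url, updates in result.property_updates.items():
        props.setdefault(url, {}).update(updates)
```

Keeping the associate's output as plain data means the visualizer alone decides how new nodes, sets, and properties are displayed, which matches the separation between the associate's own control window and the shared tree.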
We have integrated the HistoryGraph visualizer with the WhatsNew associate. When run as an independent application, WhatsNew allows a user to determine whether or not pages of interest have changed since a specified date. The WhatsNew controls allow the user to specify
The LinkTree associate was originally designed to function as a World Wide Web robot [Koster94]: namely, a program that (when given a starting URL) explores the hypertext graph starting from that node by examining the links contained in each document. The search is limited both by the depth of the resulting tree and by an expression that each URL must match (for example, requiring that all URLs contain the same hostname). LinkTree is informed by the client's browser when each new page is displayed, and the search can be invoked automatically or only when instructed by the user. LinkTree initially generated an HTML page representing the hierarchy of hyperlinks starting at the given page; if so requested, LinkTree would automatically cause this page to be redisplayed in the browser.
LinkTree has also been integrated with HistoryGraph to let the user conveniently and automatically expand the tree from any set of nodes. The new nodes generated by LinkTree are not displayed in the browser but simply added to the tree, where the user can choose to view them, save them for future reference, prune them, or perform additional operations on them.
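A breadth-first robot in the spirit of LinkTree, limited by depth and by a URL expression, might look like the sketch below. It returns (parent, child) edges that a visualizer could add as new tree nodes without displaying them. The function name, the use of a regular expression for the URL restriction, and the injected `fetch` function are assumptions; a real robot would also need politeness measures such as honoring robots.txt and pacing its requests.

```python
import re
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Tuple
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def link_tree(start: str, fetch: Callable[[str], str], max_depth: int,
              url_pattern: str) -> List[Tuple[str, str]]:
    """Breadth-first link expansion; returns (parent, child) edges for new nodes."""
    allowed = re.compile(url_pattern)
    seen = {start}
    edges: List[Tuple[str, str]] = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue                                  # depth limit: do not expand further
        parser = LinkExtractor()
        parser.feed(fetch(url))
        for href in parser.hrefs:
            child, _ = urldefrag(urljoin(url, href))  # absolute URL without #fragment
            if child in seen or not allowed.match(child):
                continue
            seen.add(child)
            edges.append((url, child))
            queue.append((child, depth + 1))
    return edges
```

Running it once per member of a set, and adding each returned edge as a node under its parent, reproduces the "expand the tree from any set of nodes" behavior without sending a single page to the browser.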
Most browsers support some kind of history recording mechanism. NCSA Mosaic provides a "Window History" item under the "Navigate" menu which shows a linear list of the titles of visited URLs. Netscape Navigator provides a similar list under the "Go" menu. Both lists suffer from the limitation that they are linear and thus lose information whenever the "Back" control is used. Microsoft's Internet Explorer records all the pages visited in the History Folder, which may be examined later. URLs in this folder may be sorted by title, URL, visited date, expiration date, or update date, and clicking on one causes the browser to revisit the page. After a while the list becomes large and hard to make sense of, but it does record one's activity automatically, although without its structure.
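Why a linear list loses information can be seen from a back/forward stack of the kind commonly used for browser history. This is an assumed model for illustration, not a description of Mosaic's or Navigator's internals: following a link after going "Back" discards the entries ahead of the current position.

```python
from typing import List


class LinearHistory:
    """Back/forward stack behind a linear history list (assumed model)."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self.pos = -1                      # index of the page being displayed

    def visit(self, url: str) -> None:
        # Everything ahead of the current position is dropped: the branch
        # the user backed out of is lost.
        del self.entries[self.pos + 1:]
        self.entries.append(url)
        self.pos += 1

    def back(self) -> str:
        if self.pos > 0:
            self.pos -= 1
        return self.entries[self.pos]
```

After visiting a and b, going back, and visiting c, the list holds only a and c; a history tree would keep b as a sibling branch of c.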
Browsers offer some facilities for recording, organizing, and managing URLs. NCSA Mosaic originally offered a Hotlist, a linear list of bookmarks. Navigator 3.0 offers a hierarchical list of bookmarks, with the ability to drag and drop bookmarks between folders, associate text with bookmarks, and determine which have changed since they were recorded. Each bookmark includes both the URL and a title (originally derived from the document). Internet Explorer 3.0 offers favorites, which, like bookmarks, may be organized into hierarchical folders. Unlike Navigator bookmarks, favorites cannot have arbitrary text associated with them, although they may be dragged from the menu into Internet-aware applications such as Word 7 and then have text or other material associated with them there. Clicking on them within such an application causes Internet Explorer to display the page.
The HistoryTree [SmartBrowser] product from SmartBrowser is similar to our HistoryGraph. HistoryTree provides a pair of standalone applications for Windows which communicate with Netscape to build an interactive tree of the user's Web explorations. The nodes are displayed as featureless rectangles which, when pointed to, either display the page's title in the status line or pop up a window containing the page's URL, title, and last visited time. Clicking a node navigates the browser to the corresponding page. Rudimentary display configuration, tree editing, and saving and restoring are supported. HistoryTree does not have the concept of "sets" of nodes, nor a means for easily working with the nodes in the tree to perform tasks.
MosaicG [Ayers95] generates a graphical history of a user's browsing activity. It provides many of the same features as HistoryGraph, but does not provide for tree rearrangement by the user, configuration of display options by the user, sets, or integration of the tree into a framework allowing manipulation of the nodes. It is implemented as part of the Mosaic browser (which limits its generality), but is an excellent source of ideas.
The Webmap system [Dömel94] pioneered the graphical display of history for Web browsers and included the concept of associating attributes with links (such as whether or not a link goes between servers), and the idea of employing tree traversals to perform actions such as fetching or printing pages. It introduced the notion of sets, but did not use them in a browsing associate environment to accomplish user tasks, and it used a non-standard mechanism to communicate with Mosaic.
Several other non-Web hypertext systems supported labeled or graphical history lists.
Apple's HyperCard provides a "Recent" card which records in miniature the 42 cards most recently visited, in the order in which they were visited. The most recent card is highlighted and selecting a miniature transports the user to the corresponding card. This early form of "thumbnailing" suffers from two problems. First, the miniatures are ordered according to their time of first visit; eventually the order may bear little relation to the actual ordering of most recently visited cards. Second, the miniatures have no additional labeling and thus similar looking cards are indistinguishable.
The TextNet [Trigg86] system was organized around links between text chunks and table of contents nodes. The system kept track of the user's current position in the overall table of contents and could display this position at any time. In addition, the system had paths, ordered lists of nodes for viewing hyperlinked text in a specific linear arrangement. Paths could be saved to be reused later or given to other readers. However, each path existed only as a linear list, with no mechanism for creating links or branching structures.
The Electronic Document System (EDS) at Brown University provided for special timeline pages:
Miniatures of the pages are drawn, each nested in its parent chapter's band, starting with the oldest on the left, with the time at which each was accessed appearing below it. The miniature pages are made into buttons that when touched transport the reader back to the selected page. Buttons permit the reader to move forward and backward along the timeline to examine miniatures of pages viewed earlier. [Feiner82]
This presentation of history is constrained by the hierarchical chapter/page structure of the system and doesn't capture any information about non-hierarchical movement.
In order to more fully implement the functionality described in our initial scenario, we anticipate that additional features will be added to HistoryGraph. Additional associates may be integrated with it, and the whole environment implemented inside a browser or desktop to make it more transparent.
In addition to recording visited pages, such as HTML and FTP pages, we would also like to allow the user to name sections of text within an HTML document without modifying the document, and then create nodes which scroll the browser to that section of the document. This could be useful, for example, to someone using the history path to mark relevant passages of a document that will need to be referred to later [Meeks87]. We have been working on a system to support inline annotations in a document without modifying the document [Schickler96]. This would extend that work to allow hidden annotations to serve as named marks and be recorded in the visualizer.
A generalized Search associate could present users with a standard search user interface and then query various configured search engines to find relevant pages. The responses of the search engines would be parsed, and the results added to the visualizer as new sets. The user could then prune the sets, save the work for later, or reorganize the information. This is similar to work reported on user agents [Eichmann96].
A system described in [Maarek96] allows the automatic reorganization of Netscape bookmarks, under user control. This system could be applied to subtrees or sets in the HistoryGraph to automatically reorganize the visualization according to the similarity between pages. The results could be combined with the results of the Search associate or LinkTree without requiring the manual addition of numerous bookmarks. The HistoryGraph pruning mechanism could then be used to remove nodes which do not meet specified criteria (such as being within a similarity tolerance of a specified starting node).
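The similarity-based pruning step could be sketched as below. This substitutes plain cosine similarity over word counts for whatever measure [Maarek96] actually uses; the function names and the `tolerance` parameter are hypothetical. The function only reports which nodes fall below the tolerance, leaving the actual removal to the visualizer.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def term_vector(text: str) -> Counter:
    """Word-count vector of a page's text (lowercased alphabetic tokens)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def prune_by_similarity(texts: Dict[str, str], start: str, tolerance: float) -> List[str]:
    """Return the URLs whose similarity to `start` is below `tolerance` (candidates to prune)."""
    ref = term_vector(texts[start])
    return [u for u, t in texts.items()
            if u != start and cosine(ref, term_vector(t)) < tolerance]
```

With a starting page about "trade balance tariffs", a page about "trade balance deficit" scores about 0.67 and is kept at a tolerance of 0.5, while an unrelated page scores 0 and is proposed for pruning.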
A graphical visualization of browsing history is essential for managing activities on the Web, but it is not enough. One should treat the result of browsing not merely as a past to be repeated, but as a first-class hypertext entity in itself. This entity, the HistoryGraph visualization, may be customized through reorganization, selective pruning, and the addition of property information to the nodes in the tree. The links are first-class in this representation and are manipulated directly, unlike HTML, where the text is first-class and the links are secondary.
By using browsing associates to perform tasks on sets drawn from the representation, it is possible to work meaningfully with large numbers of pages which meet certain criteria, for example determining which have changed, or printing or validating them. When a browsing associate takes a set of nodes as input and generates new nodes in the tree automatically, it is possible to greatly increase the rate of Web learning, since it is not necessary to "browse" each page manually. By developing custom associates and integrating them with HistoryGraph, it should be possible to learn from one's personal browsing history.
In constructing and using HistoryGraph we have learned a number of lessons:
We are exploring alternatives to running HistoryGraph as a browsing associate, including plugins such as the Spynergy [Eòlas] Tcl/Tk plugin and component-based architectures such as OpenDoc or ActiveX.
[Ayers95] E. Z. Ayers, J. T. Stasko, Using Graphic History in Browsing the World Wide Web, Fourth International World Wide Web Conference Proceedings, World Wide Web Journal, Issue 1, 1995.
[Brighton] Tree-4.0.1 - A Tree Widget for Tk4.0 based on C++ and [incr Tcl].
[Brooks95] C. Brooks, W. S. Meeks, M. S. Mazer, An Architecture for Supporting Quasi-agent Entities in the WWW, Intelligent Agents Workshop Proceedings, Conference on Information and Knowledge Management, December 1995.
[CCI95] NCSA Mosaic Common Client Interface, March 31, 1995.
[Conklin86] A Survey of Hypertext, MCC Technical Report No. STP-356-86, October 1986.
[Dömel94] Webmap - A Graphical Hypertext Navigation Tool, The Second International WWW Conference, October 1994.
[Eichmann96] D. Eichmann, J. Wu, Sulla, A User Agent for the Web, Fifth International World Wide Web Conference: Posters.
[Eòlas] Eòlas Technologies Inc.
[Feiner82] S. Feiner, S. Nagy, A. van Dam, An Experimental System for Creating and Presenting Interactive Graphical Documents, ACM Transactions on Graphics, Vol. 1, pp. 59-77, January 1982.
[Koster94] Robots in the Web: threat or treat?
[Maarek96] Y. S. Maarek, I. Z. Ben-Shaul, Automatically Organizing Bookmarks Per Contents, Computer Networks and ISDN Systems 28 (1996) 1321-1333, Fifth International World Wide Web Conference.
[Meeks87] W. S. Meeks, A Design and Partial Implementation of Intermedia, Master's Thesis, Brown University, 1987.
[NetscapeDDE95] Netscape's DDE Implementation, March 22, 1995.
[Neuss94] C. Neuss, S. Höfling, Lost in Hyperspace? Free Text Searches in the Web, First International Conference on the World-Wide Web, May 1994.
[Ousterhout94] J. K. Ousterhout, Tcl and the Tk Toolkit, Addison-Wesley, 1994.
[SDI] Spyglass Software Development Interface.
[Schickler96] M. Schickler, M. Mazer, C. Brooks, Pan-Browser Support for Annotations and Other Meta-Information on the World Wide Web, Computer Networks and ISDN Systems 28 (1996) 1063-1074, Fifth International World Wide Web Conference.
[Smartmarks] Netscape Smartmarks 2.0.
[Trigg86] R. Trigg, M. Weiser, TEXTNET: A Network-Based Approach to Text Handling, ACM Transactions on Office Information Systems, Vol. 4, No. 1, pp. 1-23, January 1986.
[Welch95] B. B. Welch, Practical Programming in Tcl and Tk, Prentice Hall, 1995.
W. Scott Meeks is a Senior Research Engineer at the Open Group Research Institute. He is currently working on the Prism project which is extending and enhancing the application of the OSF Distributed Computing Environment (DCE) to the Web. Previously he worked on projects investigating ways to transform the Web into a platform that facilitates the use of Web-based information and supports group-related activities. Before joining the RI, Mr. Meeks worked in the OSF Motif group and for Bell Communications Research, where he helped develop the Rendezvous platform for building interactive applications with multiple distributed cooperating users. His Master's thesis at Brown University involved designing and implementing a browsing history mechanism for the Intermedia system. He also has BS degrees in Computer Science and Cognitive Science from MIT.
Charles L. Brooks is a Principal Research Engineer at the Open Group Research Institute, an international shared research facility supported by government and industry. He is currently working on the Distributed Clients project in support of ongoing, adaptive mobile access to Web-based information resources under conditions of variable or intermittent connectivity. He previously worked on extending Web services for collaboration via application-specific proxy servers and desktop browsing assistants. Prior to joining the Research Institute, Mr. Brooks was a principal engineer on the OSF DCE 1.1 core technologies project, where he worked on the RPC and DTS components, and he worked for BBN, Inc., and Dataware Technologies, where he developed systems and software for management information, network management, and CD-ROM based document retrieval. His current research interests are in the areas of mobile computing and human-computer interaction. Mr. Brooks holds an MS in Computer Information Systems from Boston University, and an MA and BA in English from Clark University.