The purpose of this dissertation is to introduce Tab, an example of a new style of user interface system which presents the user with a dynamically zoomable space. This introductory chapter describes Tab and discusses why it is interesting, particularly in relationship to the leading user interface style of the past twenty years, the desktop model. The second chapter considers the historical underpinnings of each of the major elements of the system. The third chapter presents a description of the design of the low level mechanisms of an implementation of the system called Tabula Rasa (or henceforth Tab), while the fourth chapter demonstrates how those low level mechanisms are used to implement some conventional user interface objects and some basic multi-scale applications. Finally, a concluding chapter summarizes and suggests some future directions for zoomable user interface (ZUI) systems.
The most common user interface style used in today's personal computers is the Desktop Metaphor. Systems which adopt this style are known as desktop systems because of the resemblance of the overlapping windows that appear on the user's screen to overlapping sheets of paper on a desk. The desktop interface achieves its power by bringing together several elements which, in combination, create a new and compelling experience for the user. The WIMP (Window/Icon/Menu/Pointer) system at once enables the user to perform multiple tasks (by using multiple windows and icons), relieves the user of much of the burden of learning and remembering how to use applications (by providing a menu-driven interface), and provides a more dynamic, tactile and intuitive type of interaction (by using a mouse). All three of these elements were necessary to create a successful interface - each complements the other two in a variety of ways. Windows multiply the number of applications that can be running, increasing the burden of memorization that menus help to relieve. Without a mouse the manipulation of windows could become an arduous task. The mouse makes it much easier to access menus specific to a particular window, and so on.
In this dissertation a new style of user interface system is presented which embodies a new collection of user interface elements which complement one another in a way similar to the way windows, menus and pointers do. This system has three design elements and an underlying programming language methodology. These are the design elements:
It has been pointed out by some authors, e.g. Ted Nelson [31], that the similarity of a desktop style user interface to an actual desktop breaks down very quickly (real paper doesn't pop to the top of a stack when you click on it), while others have noted that slavishly attempting to duplicate the features of a real desktop would defeat the whole purpose of putting a system on a computer. As Gentner and Nielsen note in [16], if your text editor functions exactly like a typewriter, why not just use a typewriter?
This engenders two observations. First, the word ``metaphor'' is ill chosen. Metaphor is a poetic device, drawing an absolute identity between two concepts. ``The moon is a ghostly galleon'', rather than ``the moon resembles a ghostly galleon.'' Brenda Laurel [24] points out that simile is a more accurate description of the relationship between a desktop and a desktop-style user interface system. The second observation is that the only interesting things about any computer system are the ways in which it departs from the real-world system on which it is based. We incorporate a simile into a system in order for the user to build a mental model of that system. The user then uses that model to reason about the system and deduce its untried capabilities. The desktop model tells us that ``this system is like a desktop, but without the physical limitations of a real desktop, and with the capabilities (and many limitations) familiar from older command-line oriented systems.'' Thus, a reasonable user will not be surprised if the screen representation of a folder fails to fatten as we add documents, even though a real file folder would. A user experienced with older computer systems would correctly expect that a desktop system might run out of storage space suddenly, while a real desktop has a more gradual storage exhaustion failure mode.
Indeed, the passage of time allows greater and greater departure from the desktop model as incremental improvements are discovered and added, and users become accustomed to the departures already incorporated. Nelson rejects the whole notion of metaphor in system design, preferring systems that are designed with a consistency from which the user can build a mental model, but with no real-world referent. He refers to this as the principle of virtuality [30], but this appears to be more a rejection of slavishly metaphorical design than a serious proposal. Linguistics and cognitive science argue against the ability of any human to think even a single thought without reference to simile [23].
The first element of the Tab system is resolution independence. While the historical underpinnings of each of the elements are discussed in detail in chapter two, the notion of incorporating resolution independence into a user interface system is inherited directly from the Pad system. Pad is a name coined by Ken Perlin in 1989. It has been used to refer to a series of systems developed at New York University, and one system developed jointly with the University of New Mexico, each with a different feature set. A paper by Ken Perlin and myself presented at the 1993 SIGGRAPH conference [33] described the notion and the system as it was at that time.
In his book ``Designing the User Interface'' [36] Ben Shneiderman writes that there are two user interface techniques that are widely reported to engender not only acceptance, but often an enthusiastic response. The first is direct manipulation ([36], p. 202), which is a central component of the desktop system. The second technique is to design a system which presents the user with a large navigable space ([36], p. 222). The Pad system is intended to capitalize on both these techniques. The central element of the Pad model is a virtual surface capable of nearly limitless detail. This surface is called virtual because such a surface cannot be adequately represented by any foreseeable display equipment. Even if it could, the human eye cannot apprehend the detail such a display would present. Instead, the system creates the impression of limitless detail by allowing the user to navigate while looking down at the surface. The user can move laterally to see different parts of the surface, or can change altitude, up to see a larger portion of the surface, or down to see a smaller portion in greater detail.
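The navigation model just described can be sketched in a few lines. This is only an illustration under stated assumptions: the name `View`, its fields and its methods are hypothetical and are not taken from Pad or Tab, and altitude is modeled simply as a magnification factor.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class View:
    """Hypothetical viewpoint looking straight down at the virtual surface."""
    cx: float = 0.0     # surface x-coordinate shown at the screen center
    cy: float = 0.0     # surface y-coordinate shown at the screen center
    scale: float = 1.0  # screen units per surface unit; larger means lower altitude

    def pan(self, dx: float, dy: float) -> None:
        """Move laterally by (dx, dy) surface units."""
        self.cx += dx
        self.cy += dy

    def zoom(self, factor: float) -> None:
        """factor > 1 descends (more detail); factor < 1 ascends (more area)."""
        if factor <= 0:
            raise ValueError("zoom factor must be positive")
        self.scale *= factor

    def to_screen(self, x: float, y: float) -> Tuple[float, float]:
        """Map a surface point to screen coordinates relative to the screen center."""
        return ((x - self.cx) * self.scale, (y - self.cy) * self.scale)
```

Because the coordinates are floating point, the same three numbers describe a view of the whole surface or of a tiny detail; only the value of `scale` differs.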
Using a low resolution device (such as a computer monitor) to create a
high resolution experience of a surface is analogous to the way humans
use their eyes to create a high resolution experience of their
environment. The eye has a very small area of high resolution in the
center of the retina surrounded by an area of low
resolution. It is also able to move very
quickly (about 500 degrees per second), and these two features
combine to create the experience of being immersed in a world of great
visual detail. This suggests that an important element of a system
based on a high-resolution virtual surface is that its navigation
system be extremely responsive.
We claim that this model has departed from conventional systems to the point where the desktop simile becomes irrelevant. The amount of space on our virtual surface is so vast that it could be used to represent anything, or perhaps everything. Instead of the few icons and windows we find on the screen of a desktop system, our virtual surface could hold all of the user's files. The user zooms in towards the folder that contains the file of interest, and at a certain point the individual files appear and the user zooms in on one particular file. When it becomes large enough to work with, the user can read or edit the file.
Note that this arrangement implies a geography for the user's data objects that is persistent. Desktop systems tend to use their surface for relatively transient objects, just as a real desktop would - a document being edited, a window onto an on-line service and perhaps one or two others. The amount of space available on a virtual surface makes this economy unnecessary: the user travels to the document of interest rather than bringing it onto the desk from someplace ``outside''. Furthermore, the size of the virtual surface makes a persistent geography natural to the user trying to locate objects on it. Humans have developed a great ability for navigating in a familiar geography, and a Pad style system capitalizes on this ability.
The Pad surface could also be used to hold other types of hierarchies. A shared collaborative version could partition the surface among a large number of users. Individual users would never run out of space because they could always just shrink everything down by half. A distributed version of Pad could represent the Internet as a set of nested domains. Zooming in on a particular machine might reveal areas for that machine's users, analogous to today's World Wide Web sites, and if the viewer is privileged, system administration tools and information might become available, and so on.
Pad was originally conceived to implement a virtual surface rather than a three dimensional space, and this places it somewhere between desktop systems and virtual reality systems in terms of the ``Looking at'' vs. ``Being in'' dichotomy discussed by Shneiderman ([36], p. 223). Pad space is sometimes referred to as a 2.5 dimensional space, which can be thought of as a three dimensional space where the viewer is always looking straight down, and the objects in the space are all flat and face straight up. Of course, these flat objects could be displaying perspective drawings of three dimensional objects. The distance of the objects from the viewer has no effect on their size, just as windows in a conventional desktop interface stay the same size even if they are pushed to the bottom of the stack. This is to say, the system uses an orthogonal projection rather than a perspective projection. As in a desktop system, there is a total ordering on the objects in the space which is called the stacking order. An object's size and position are not determined by the viewer's distance; instead, each object contains a bounding rectangle that defines its size and position independently of its altitude, and a transformation matrix that converts the coordinate system of the space the object exists in to the object's private coordinate system.
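As a rough illustration of this object model, the sketch below gives each object a bounding rectangle in the containing space and a mapping into its private coordinate system, and resolves overlaps by stacking order alone. The names `SurfaceObject` and `topmost_at` are hypothetical, and the transformation is simplified to a uniform scale plus an offset, whereas the text describes a full transformation matrix.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SurfaceObject:
    name: str
    x: float             # left edge of the bounding rectangle (containing space)
    y: float             # bottom edge of the bounding rectangle (containing space)
    width: float
    height: float
    scale: float = 1.0   # private units per containing-space unit (assumption)

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)

    def to_private(self, px: float, py: float) -> Tuple[float, float]:
        """Convert a containing-space point to the object's private coordinates."""
        return ((px - self.x) * self.scale, (py - self.y) * self.scale)


def topmost_at(stack: List[SurfaceObject], px: float, py: float) -> Optional[SurfaceObject]:
    """Return the highest object covering the point; `stack` is ordered bottom first.

    Only the stacking order decides which object wins. An object's size comes
    from its bounding rectangle, never from its altitude.
    """
    for obj in reversed(stack):
        if obj.contains(px, py):
            return obj
    return None
```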
We believe that this type of space, where the size of an object is decoupled from its distance, has advantages over a true three dimensional space as an environment for displaying and storing information. It means the visual size of an object can be controlled by the user; it becomes a design element of the user's graphical computing environment. While we might add the same feature to a fully three dimensional space, it would be that much more confusing because the depth cues in such a space are that much stronger than in our 2.5 dimensional space. This 2.5 dimensional space also reduces the tendency one has of becoming disoriented or lost in a fully three dimensional space, while still freeing the user from some of the restrictions of the two dimensional space embodied in conventional desktop systems. Finally, we decided that the additional computer power required to implement a fully three dimensional system would be better devoted to improving the responsiveness of this simpler imaging model.
Tab is an attempt to carefully consider what is necessary to make a successful user interface system based on Pad's virtual surface; that is, a system that might successfully compete with the commercial systems. This goal brings up a number of new considerations that would not apply to a system that used the Pad virtual surface to implement a particular application. The main task is to devise an application programmer interface (API) which allows programmers to author interactive software to inhabit this virtual surface. The API must address the questions of how to express both the appearance and the interactive behavior of Pad applications.
One of the features of the desktop system is the ability to juxtapose
various objects so they can be compared, or so one can be referred to
while working on the other. The Tab system must match this feature
within its own paradigm. In a desktop system this is accomplished by
loading the objects into windows controlled by applications and
dragging windows around. Because we wish to maintain a permanent
geography and avoid moving objects, a mechanism called a portal
has been adopted from the Pad system to allow users to relate objects
distant in position or scale or both. In the Tab system, a portal can
be linked to any object, and it displays whatever is below that object
(Figure 1.1). Portals are usually linked to a simple
invisible object type called a camera. Not only can a user see
through a portal, but the user can also interact with the objects there just as one
interacts with objects directly, that is, if the user's input events
hit the monitor they are ``transported'' to the camera and pass to the
objects below it.
[Figure 1.1]
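The way input events are ``transported'' through a portal can be sketched as a change of coordinates from the portal's rectangle to the region seen by its camera. The classes and the `transport` method below are hypothetical names chosen for illustration, and the camera is assumed to view an axis-aligned rectangle of the surface.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Camera:
    """Invisible object marking the region of the surface that a portal shows."""
    x: float
    y: float
    width: float
    height: float


@dataclass
class Portal:
    x: float
    y: float
    width: float
    height: float
    camera: Camera

    def transport(self, px: float, py: float) -> Optional[Tuple[float, float]]:
        """Map an event position inside the portal to the surface below the camera.

        Returns None when the event misses the portal. The camera may be far
        away or at a very different scale, so the result can be distant in
        position, scale, or both.
        """
        if not (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height):
            return None
        u = (px - self.x) / self.width    # fractional position across the portal
        v = (py - self.y) / self.height
        return (self.camera.x + u * self.camera.width,
                self.camera.y + v * self.camera.height)
```

A portal ten units wide linked to a camera one unit wide acts as a tenfold magnifier: a click in the middle of the portal lands in the middle of the camera's small region.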
We also need to decide how the notion of applications translates into
the Pad model. By this we mean tools that can be applied to data
such as text, images or a spreadsheet. Most desktop interfaces
associate a primary tool with each data object. For example, under
Microsoft Windows every document whose filename ends with .doc is
usually assigned Microsoft Word as a default application that is
invoked when the icon for that document is double clicked. However,
there may be a number of tools, each of which operates on a particular
data object in a different way. There may be one tool that analyses
the document's grammar and computes word count statistics, another
which could be used to edit figures the document contains, and so
on. There seems to be a tendency for a desktop model application to
try to do all of these tasks in a single package, while the older
command-line model encouraged separating these tasks out into smaller
packages each of which (as the slogan goes) ``do one thing and do it
well.''
It is as if in order to escape the deficiencies of the desktop
interface all the functionality of the system has drained down into
each individual application. One reason for this tendency may be
that there is no intuitive idiom in the desktop model for applying one
of several tools to an object, let alone an idiom for applying several
tools in ``sequence'' as in a Unix pipeline.
In order to bring this capability to the user interface it was decided
to allow the data to remain stationary on the surface and bring the
tool to the data object using a variant of the portal called a
portal filter or simply filter. This is a portal which
``understands'' a certain type of object and causes objects of that
type to have a different appearance when viewed through it. A filter
may also ``mediate'' the input events that pass through it on the way
to that object, changing them into commands the object understands. By
convention, filters in Pad usually directly display the area they
cover, rather than some remote area as portals often do. (This is
enforced using the constraint system, another Tab subsystem.)
As an example, a user might place a word processing filter on top of a
text file. When this happens, a cursor appears in the text. Now when
the user's mouse is positioned over both the filter and the text file
(that is, inside their intersection), text is inserted into the data
file as the user types. If more than one text file appears beneath
the filter, each has a cursor, and editing commands go to whichever
one contains the mouse cursor.
In order to achieve the most versatile model for filter behavior it is important to carefully consider what information belongs to which object. In this example, the text belongs to the data object, while the cursor and its position should be a part of the filter. We don't want the data object to have to carry any information which is specific to the filter's application. There might be another filter which when placed over the text file highlights spelling errors or displays a grammatical analysis of the document - such a filter would have no use for a cursor, and it would be a waste for this information to be stored in the document. This also means that the filter needs to store a set of cursors, one per text file that appears below it. A general mechanism for managing per-filtered-object information is discussed in section 1.6.
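The division of information between the data object and the filter can be illustrated with a small sketch. It assumes hypothetical classes `TextDocument` and `EditorFilter`; the choice to start each cursor at position 0 is also an assumption made only for the example.

```python
from typing import Dict


class TextDocument:
    """Data object: owns the text and nothing specific to any filter."""

    def __init__(self, text: str = "") -> None:
        self.text = text


class EditorFilter:
    """Word processing filter: owns one cursor per document seen beneath it."""

    def __init__(self) -> None:
        # Keyed by document identity (plain objects hash by identity).
        self._cursors: Dict[TextDocument, int] = {}

    def cursor_for(self, doc: TextDocument) -> int:
        """Return the cursor for `doc`, creating it at position 0 if needed."""
        return self._cursors.setdefault(doc, 0)

    def type_text(self, doc: TextDocument, inserted: str) -> None:
        """Insert text at this filter's cursor for `doc` and advance the cursor."""
        pos = self.cursor_for(doc)
        doc.text = doc.text[:pos] + inserted + doc.text[pos:]
        self._cursors[doc] = pos + len(inserted)
```

A spelling filter placed over the same document would keep its own per-document table and would never encounter a cursor, because the document itself stores none.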
If we are editing a document and we move the editor filter off that document and then back on, we would probably like the cursor to re-appear in the same position we left it. This means we need to retain the cursor information associated with the document even if the filter moves away from it. One might worry that a filter will accumulate thousands of cursor positions, but this is less likely to become a problem if we use semantic zooming to ensure that no cursor is generated for documents that occupy too little of the screen to be comfortably edited. The cursor is a small piece of data, but other filters might generate much larger amounts of information associated with each data object - for example, an image processing filter might generate modified versions of the image it is covering. This issue is domain dependent, so it is important that the API provide tools to allow the Pad application programmer to implement various policies regarding when to retain or discard this data. This includes notification of when an object enters or leaves the filter, timers which notify the filter when an object has been absent for a certain period, and a least recently used queue to decide which data can most safely be discarded.
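One of the policies mentioned here, the least recently used queue, might look like the following sketch. The class name `FilterDataCache` and its `on_enter` method are hypothetical; the sketch covers only the enter notification and a fixed capacity, and omits the timers.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class FilterDataCache:
    """Per-object filter data, discarded in least recently used order."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()  # oldest first

    def on_enter(self, key: Hashable, make_default: Callable[[], Any]) -> Any:
        """Called when an object enters the filter; returns its retained data."""
        if key in self._data:
            self._data.move_to_end(key)          # mark as most recently used
        else:
            self._data[key] = make_default()
            while len(self._data) > self.capacity:
                self._data.popitem(last=False)   # discard the least recently used
        return self._data[key]

    def __contains__(self, key: Hashable) -> bool:
        return key in self._data
```

Nothing is discarded when an object merely leaves the filter, so a cursor survives the filter being moved away and back, as long as fewer than `capacity` other objects have been visited in between.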
It should be noted that portals and filters, like data objects, are persistent objects in the Pad system. In fact, there is no reason to introduce the notion of a ``transient'' object. Such an object is an artifact associated with the idea of turning off the computer, and with the distinction between temporary and permanent memory. We do not believe that this distinction has any value to the user, and therefore its existence should be made transparent. This decision means that filter based applications have no save command - the document should always be in a ``safe'' state, and older versions are made available through use of undo commands and the ability to revert to check-pointed versions of the document.
In designing a Pad API, our first observation is that rendering commands must be performed, and event positions expressed, in a floating point space. This brings up some difficulties with regard to rendering operations which have generally been regarded as pixel-oriented - for example, images are usually stored as a grid of pixels, and thus have a ``natural size'' with relation to the screen's pixel grid. Also, text rendering is usually tuned carefully to the exact size in pixels that the character will appear - even if the characters are stored as filled shapes constructed from polygons and splines, the font will contain hints about how to make sure that artifacts don't show up when converting to a pixel representation. A Pad system needs to take on some tasks that are usually handled by the application in systems where the rendering model gives access to the screen's pixels.
Furthermore, applications still need to know how large they appear on the user's screen in order to avoid wasting time and screen space rendering detail that is too small to be seen, or too large to be interesting. For example, a Pad representation of a text file might show just the file's name while the user was at a distance where the file is less than an inch across. When the file is a couple of inches across it might display some details about the file such as size, permissions and ownership. Only when it became five or six inches across would it display the textual content of the file. Finally, it would fade out entirely if it got too big, perhaps when the characters exceeded an inch in height.
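The text file example can be written down as a small decision function. The function name and the exact thresholds are assumptions made for illustration: the text fixes only the approximate sizes (under an inch, a couple of inches, five or six inches, characters taller than an inch), so the boundaries between them are chosen arbitrarily here.

```python
def choose_representation(width_in: float, char_height_in: float) -> str:
    """Pick what a text file shows, given its apparent size on screen in inches.

    width_in:       apparent width of the whole file object
    char_height_in: apparent height of one character of its content
    """
    if char_height_in > 1.0:
        return "hidden"    # too large to be interesting: fade out entirely
    if width_in < 1.0:
        return "name"      # just the file's name
    if width_in < 5.0:
        return "details"   # size, permissions and ownership
    return "content"       # the text itself
```

The object needs only one piece of information from the system to make this decision, its apparent size, which is why the API has to expose it.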
We have coined the term ``semantic zooming'' to describe this behavior. It is intended to try to optimize the information carrying capacity of the user's screen in a multi-scale system, just as anti-aliasing optimizes a screen's image displaying ability. The API must include a mechanism by which the application can determine its screen size. Beyond that, however, this topic is mostly one concerning the design of Pad applications, rather than how to solve technical issues. A couple of examples of applications that illustrate some issues regarding semantic zooming are:
In Pad this hierarchy might be represented differently, in a single document. The rows of the top level spreadsheet would resolve into multiple rows as you approached, allowing an unlimited depth. You would want separate control of the horizontal and vertical zooming so you could adjust the temporal resolution separately from the conceptual hierarchy represented in the rows. Portals could be used to keep the row and column labels visible even when you were deep within the table - one portal would display the cell entries while the constraint system would cause the other two to track the row and column labels.
This example points up the importance of transitions in semantic zooming - one of the most important things to be conveyed by the spreadsheet is the relationship between the summed rows and the sum. Both must be clearly visible at the same time for the viewer to relate the two.
We have already encountered one situation where we need to maintain a constraint - the convention that the cameras should remain directly below their associated portal if that portal is a portal filter. A constraint system makes it easy to tie various components into a cooperating system, and helps keep system components from falling prey to the syndrome of increasing complexity described in section 1.3.
Actually, the resolution independence of the Tab system makes the satisfaction of constraints much simpler than in many pixel-based systems. This is because it is never necessary to use complicated algorithms to position objects in a pleasing way - things can always be fit into the allotted space, and the user can move closer if necessary. Therefore, the Tab constraint system has so far remained quite rudimentary, implemented using a simple call-back keyed to three important types of events in the system: an object being inserted, an object being removed, or an object whose geometry changes. These are the principal types of changes whereby one object might affect another. It is also possible for an object to control another object of which it has direct knowledge.
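A call-back mechanism of this kind can be sketched as follows. The names `ConstraintSystem`, `on`, `notify` and `Box` are hypothetical and do not come from the Tab API; the three event kinds are the ones listed in the text.

```python
from typing import Callable, Dict, List

EVENT_KINDS = ("inserted", "removed", "geometry-changed")


class Box:
    """Stand-in for any surface object that has a position."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


class ConstraintSystem:
    """Rudimentary constraints: call-backs keyed to three kinds of events."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, List[Callable[[object], None]]] = {
            kind: [] for kind in EVENT_KINDS
        }

    def on(self, kind: str, callback: Callable[[object], None]) -> None:
        if kind not in self._callbacks:
            raise ValueError("unknown event kind: " + kind)
        self._callbacks[kind].append(callback)

    def notify(self, kind: str, obj: object) -> None:
        """Run every call-back registered for `kind`, passing the changed object."""
        for callback in list(self._callbacks[kind]):
            callback(obj)
```

The portal filter convention then becomes a single call-back: whenever the filter's geometry changes, its camera is moved to the same position.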
Multiple inheritance is another feature of the object-oriented programming model which allows classes to inherit data and methods from more than one superclass. This mechanism is well suited to composing various types of interactive behaviors in a single object, particularly in combination with multi-methods. For example, in Tab the interactive behavior of an object is determined by a component of the class <handler> (as in ``event handler''). There are many subclasses derived from <handler> to describe different types of interactive behavior. For example, the <drag-bindings> class implements the ability to drag and scale an object using the mouse, while we might have a class <text-editor-bindings> which implements keyboard commands to edit text, such as backspace, delete, cursor motion and so on. If we want an object to have both these behaviors we can simply create a new class with both <drag-bindings> and <text-editor-bindings> as superclasses and allow the multi-method mechanism to locate the correct method for each event.
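The composition of behaviors can be approximated outside of STklos. The sketch below uses Python, which supports multiple inheritance but has no built-in multi-methods, so cooperative `super()` calls stand in for the multi-method dispatch described in the text. The class names mirror <handler>, <drag-bindings> and <text-editor-bindings>; the event format (a dictionary with a "type" entry) is an assumption.

```python
class Handler:
    """Base event handler; reports that the event was not handled."""

    def handle(self, event: dict) -> bool:
        return False


class DragBindings(Handler):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.position = (0.0, 0.0)

    def handle(self, event: dict) -> bool:
        if event.get("type") == "drag":
            x, y = self.position
            self.position = (x + event["dx"], y + event["dy"])
            return True
        return super().handle(event)   # pass on to the next class in line


class TextEditorBindings(Handler):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.text = ""

    def handle(self, event: dict) -> bool:
        if event.get("type") == "key":
            if event["key"] == "backspace":
                self.text = self.text[:-1]
            else:
                self.text += event["key"]
            return True
        return super().handle(event)


class DraggableEditor(DragBindings, TextEditorBindings):
    """Has both behaviors simply by naming both superclasses."""
```

The combined class contains no code of its own; each event is offered to the drag bindings, then to the text editor bindings, and finally to the base handler.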
The third and most unusual object-oriented feature used in the Tab implementation is delegation. This is a form of inheritance which is per-instance rather than per-class. Few object oriented languages implement delegation - there is actually no built-in support for delegation in CLOS or STklos, our implementation language. However, we will see in section 3.15.1 how delegation can be implemented in STklos. It allows us to implement a macro similar to define-class named define-filtered, in honor of its central role in implementing the Tab filtering mechanism. An instance of a class created using define-filtered is constructed from an existing instance of the parent class, along with the usual constructor arguments. References to the slots the new instance inherited from the parent class are actually references to the slots of the parent instance. This corresponds to the relationship between an object and that object as seen through a filter.
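The flavor of per-instance delegation can be conveyed with a short sketch. It is written in Python rather than STklos and is not the define-filtered macro itself: the class `Filtered` and its attribute-forwarding approach are assumptions chosen to show the idea that inherited slots are really the slots of the parent instance.

```python
class Document:
    def __init__(self, text: str) -> None:
        self.text = text


class Filtered:
    """An object as seen through a filter.

    Slots named at construction belong to this instance; every other slot
    reference is forwarded to the parent instance.
    """

    def __init__(self, parent: object, **own_slots) -> None:
        object.__setattr__(self, "_parent", parent)
        object.__setattr__(self, "_own", set(own_slots))
        for name, value in own_slots.items():
            object.__setattr__(self, name, value)

    def __getattr__(self, name: str):
        # Reached only when normal lookup fails, i.e. for inherited slots.
        if name in ("_parent", "_own"):
            raise AttributeError(name)
        return getattr(self._parent, name)

    def __setattr__(self, name: str, value) -> None:
        if name in self._own:
            object.__setattr__(self, name, value)
        else:
            setattr(self._parent, name, value)   # writes reach the parent instance
```

Editing the text through the filtered view changes the underlying document, while the cursor added by the view never appears on the document.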
As mentioned above, the filtering mechanism seems to hold some promise of restoring the ability to compose tools lost in the move from the Unix command line environment to the desktop environment. This mechanism should enable us, for example, to place a ``find the adjectives'' filter on top of a document, and a ``count the words'' filter on top of that to display the number of adjectives in our document. The same ``count the words'' filter should, if placed directly on the document, display the total word count. This can be accomplished by having the adjective filter derive a filtered class from the text object which adds slots to and overrides methods of the original class so that it behaves as a document which contains only the adjectives of the original document.
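The stacking of the two filters can be sketched as follows. Everything here is hypothetical: the class and function names, the `words` method, and above all the toy lexicon, which stands in for whatever grammatical analysis a real ``find the adjectives'' filter would perform.

```python
from typing import List

# Toy lexicon, an assumption for this example only.
ADJECTIVES = frozenset({"quick", "brown", "lazy"})


class Document:
    def __init__(self, text: str) -> None:
        self.text = text

    def words(self) -> List[str]:
        return self.text.split()


class AdjectiveView:
    """A document as seen through the adjective filter.

    It overrides `words` so that it behaves like a document containing only
    the adjectives of the document beneath it.
    """

    def __init__(self, source) -> None:
        self._source = source

    def words(self) -> List[str]:
        return [w for w in self._source.words()
                if w.lower().strip(".,;:!?") in ADJECTIVES]


def count_words(document_like) -> int:
    """The ``count the words'' tool: works on whatever lies beneath it."""
    return len(document_like.words())
```

The counting tool is unaware of the adjective filter. It sees something that behaves like a document, much as a Unix command sees only the output of the previous stage in a pipeline.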
This introductory chapter has introduced Tab in the context of existing user interface metaphors and similes. The basic elements of the Tab system have been discussed at some length, including the virtual surface, portals and filters, semantic zooming, and constraints. It is important to stress that the value comes from combining all these techniques in a single system, just as the combination of windows, icons, pointers and menus led to the success of the desktop interface style.