polaris

A System for Query,Analysis,and Visualization of Multidimensional Relational Databases
展开查看详情

1.IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 8, NO. 1, JANUARY-MARCH 2002 1 Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases Chris Stolte, Diane Tang, and Pat Hanrahan AbstractÐIn the last several years, large multidimensional databases have become common in a variety of applications such as data warehousing and scientific computing. Analysis and exploration tasks place significant demands on the interfaces to these databases. Because of the size of the data sets, dense graphical representations are more effective for exploration than spreadsheets and charts. Furthermore, because of the exploratory nature of the analysis, it must be possible for the analysts to change visualizations rapidly as they pursue a cycle involving first hypothesis and then experimentation. In this paper, we present Polaris, an interface for exploring large multidimensional databases that extends the well-known Pivot Table interface. The novel features of Polaris include an interface for constructing visual specifications of table-based graphical displays and the ability to generate a precise set of relational queries from the visual specifications. The visual specifications can be rapidly and incrementally developed, giving the analyst visual feedback as they construct complex queries and visualizations. Index TermsÐDatabase visualization, database analysis, visualization formalism, multidimensional databases. æ 1 INTRODUCTION I Nthe last several years, large databases have become common in a variety of applications. Corporations are creating large data warehouses of historical data on key generated from the resulting tables. Visual Insights recently released a new interface for visually exploring projections of data cubes using linked views of bar charts, scatterplots, aspects of their operations. International research pro- and parallel coordinate displays [14]. jects such as the Human Genome Project [20] and In this paper, we present Polaris, an interface for the Digital Sky Survey [31] are generating massive data- exploration of multidimensional databases that extends the bases of scientific data. Pivot Table interface to directly generate a rich, expressive A major challenge with these databases is to extract set of graphical displays. Polaris builds tables using an meaning from the data they contain: to discover structure, algebraic formalism involving the fields of the database. find patterns, and derive causal relationships. The analysis Each table consists of layers and panes and each pane may and exploration necessary to uncover this hidden informa- contain a different graphic. The use of tables to organize tion places significant demands on the human-computer multiple graphs on a display is a technique often used by interfaces to these databases. The exploratory analysis statisticians in their analysis of data [5], [11], [38]. process is one of hypothesis, experiment, and discovery. The Polaris interface is simple and expressive because The path of exploration is unpredictable and the analysts it is built upon a formalism for constructing graphs and need to be able to rapidly change both what data they are building data transformations. We interpret the state of viewing and how they are viewing that data. the interface as a visual specification of the analysis task The current trend is to treat multidimensional databases and automatically compile it into data and graphical as n-dimensional data cubes [16]. Each dimension in these transformations. This allows us to combine statistical data cubes corresponds to one dimension in the relational analysis and visualization. Furthermore, all intermediate schema. Perhaps the most popular interface to multi- specifications that can be created in the visual language dimensional databases is the Pivot Table [15]. Pivot Tables are valid and can be interpreted to create visualizations. allow the data cube to be rotated, or pivoted, so that Therefore, analysts can incrementally construct complex different dimensions of the dataset may be encoded as rows queries, receiving visual feedback as they assemble and or columns of the table. The remaining dimensions are alter the specifications. aggregated and displayed as numbers in the cells of the table. Cross-tabulations and summaries are then added to the resulting table of numbers. Finally, graphs may be 2 RELATED WORK The related work to Polaris can be divided into three categories: formal graphical specifications, table-based data . The authors are with the Computer Science Department, Stanford University, Stanford, CA 94305. displays, and database exploration tools. E-mail: {cstolte, hanrahan}@graphics.stanford.edu, dtang@cs.stanford.edu. 2.1 Formal Graphical Specifications Manuscript received 17 Apr. 2001; accepted 10 July 2001. For information on obtaining reprints of this article, please send e-mail to: Bertin's Semiology of Graphics [6] is one of the earliest tvcg@computer.org, and reference IEEECS Log Number 114503. attempts at formalizing graphing techniques. Bertin 1077-2626/02/$17.00 ß 2002 IEEE

2.2 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 8, NO. 1, JANUARY-MARCH 2002 developed a vocabulary for describing data and the techni- a set of predefined visualizations, such as scatterplots and ques for encoding data in a graphic. One of his important parallel coordinates, for exploring multidimensional data contributions is the identification of the retinal variables sets. These views are augmented with extensive interaction (position, color, size, etc.) in which data can be encoded. techniques, such as brushing and zooming, that can be used Cleveland [11], [12] used theoretical and experimental results to refine the queries. We feel that this approach is much to determine how well people can use these different retinal more limiting than providing the user with a set of building properties to compare quantitative variations. blocks that can be used to interactively construct and refine Mackinlay's APT system [26] is one of the first applica- a wide range of displays to suit an analysis task. tions of formal graphical specifications to computer- Another significant database visualization system is generated displays. APT uses a set of graphical languages VisDB [22], which focuses on displaying as many data and composition rules to automatically generate 2D dis- items as possible to provide feedback as users refine their plays of relational data. The Sage system [29] extends the queries. This system even displays tuples that do not meet concepts of APT, providing a richer set of data character- the query and indicates their ªdistanceº from the query izations and generating a wider range of displays. criteria using spatial encodings and color. This approach Livny et al. [25] describe a visualization model that helps the user avoid missing important data points just provides a foundation for database-style processing of outside of the selected query parameters. In contrast, visual queries. Within this model, the relational queries Polaris provides extensive ability to drill-down and roll- and graphical mappings necessary to generate visualiza- up data, allowing the analyst to get a complete overview of tions are defined by a set of relational operators. The Rivet the data set before focusing on detailed portions of the visualization environment [9] applies similar concepts to database. provide a flexible database visualization tool. Wilkinson [41] recently developed a comprehensive language for describing traditional statistical graphs and 3 OVERVIEW proposed a simple interface for generating a subset of the Polaris has been designed to support the interactive specifications expressible within his language. We have exploration of large multidimensional relational databases. extended Wilkinson's ideas to develop a specification that Relational databases organize data into tables where each can be directly mapped to an interactive interface and that row in a table corresponds to a basic entity or fact and each is tightly integrated with the relational data model. The column represents a property of that entity [36]. We refer to differences between our work and Wilkinson's will be a row in a relational table as a tuple or record and a column further discussed in Section 8. in the table as a field. A single relational database will 2.2 Table-Based Displays contain many heterogeneous but interrelated tables. Another area of related work is visualization systems that We can characterize fields in a database as nominal, use table-based displays. Static table displays, such as ordinal, quantitative, or interval [6], [34]. Polaris reduces scatterplot matrices [18] and Trellis [3] displays, have been this categorization to ordinal and quantitative by treating used extensively in statistical data analysis. Recently, intervals as quantitative and assigning an ordering to the several interactive table displays have been developed. nominal fields and subsequently treating them as ordinal. Pivot Tables [15] allow analysts to explore different The fields within a relational table can also be partitioned projections of large multidimensional datasets by interac- into two types: dimensions and measures. Dimensions and tively specifying assignments of fields to the table axes, but measures are similar to independent and dependent are limited to text-based displays. Systems such as the Table variables in traditional analysis. For example, a product Lens [27] and FOCUS [32] visualization systems provide name or type would be a dimension of a product and the table displays that present data in a relational table view, product price or size would be a measure. The current using simple graphics in the cells to communicate quanti- implementation of Polaris treats all ordinal fields as tative values. dimensions and all quantitative and interval fields as measures. 2.3 Database Exploration Tools In many important business and scientific databases, The final area of related work is visual query and database there are often many dimensions identifying a single entity. exploration tools. Projects such as VQE [13], Visage [30], For example, a transaction within a store may be identified DEVise [25], and Tioga-2 [2] have focused on developing by the time of the sale, the location of the store, the type of visualization environments that directly support interactive product, and the customer. In most data warehouses, these database exploration through visual queries. Users can multidimensional databases are structured as n-dimen- construct queries and visualizations directly through their sional data cubes [36]. Each dimension in the data cube interactions with the visualization system interface. These corresponds to one dimension in the relational schema. systems have flexible mechanisms for mapping query To effectively support the analysis process in large results to graphs and all of the systems support mapping multidimensional databases, an analysis tool must meet database records to retinal properties of the marks in the several demands: graphs. However, none of these systems leverages table- based organizations of their visualizations. . Data-dense displays: The databases typically con- Other existing systems, such as XmdvTool [40], Spotfire tain a large number of records and dimensions. [33], and XGobi [10], have taken the approach of providing Analysts need to be able to create visualizations that

3.STOLTE ET AL.: POLARIS: A SYSTEM FOR QUERY, ANALYSIS, AND VISUALIZATION OF MULTIDIMENSIONAL RELATIONAL DATABASES 3 Fig. 1. The Polaris user interface. Analysts construct table-based displays of relational data by dragging fields from the database schema onto shelves throughout the display. A given configuration of fields on shelves is called a visual specification. The specification unambiguously defines the analysis and visualization operations to be performed by the system to generate the display. will simultaneously display many dimensions of . Familiar: Table-based displays have an extensive large subsets of the data. history. Statisticians are accustomed to using tabular . Multiple display types: Analysis consists of many displays of graphs, such as scatterplot matrices and different tasks such as discovering correlations Trellis displays, for analysis. Pivot Tables are a between variables, finding patterns in the data, common interface to large data warehouses. locating outliers, and uncovering structure. An Fig. 1 shows the user interface presented by Polaris. In analysis tool must be able to generate displays this example, the analyst has constructed a matrix of suited to each of these tasks. scatterplots showing sales versus profit for different . Exploratory interface: The analysis process is often product types in different quarters. The primary interaction an unpredictable exploration of the data. Analysts technique is to drag-and-drop fields from the database must be able to rapidly change what data they are schema onto shelves throughout the display. We call a viewing and how they are viewing that data. given configuration of fields on shelves a visual specification. Polaris addresses these demands by providing an inter- The specification determines the analysis and visualization face for rapidly and incrementally generating table-based operations to be performed by the system, defining: displays. In Polaris, a table consists of a number of rows, columns, and layers. Each table axis may contain multiple . The mapping of data sources to layers. Multiple data nested dimensions. Each table entry, or pane, contains a set sources may be combined in a single Polaris of records that are visually encoded as a set of marks to visualization. Each data source maps to a separate create a graphic. layer or set of layers. Several characteristics of tables make them particularly . The number of rows, columns, and layers in the effective for displaying multidimensional data: table and their relative orders (left to right as well as back to front). The database dimensions assigned to . Multivariate: Multiple dimensions of the data can be rows are specified by the fields on the y shelf, explicitly encoded in the structure of the table, columns by fields on the x shelf, and layers by fields enabling the display of high-dimensional data. on the layer (z) shelf. Multiple fields may be dragged . Comparative: Tables generate small-multiple displays onto each shelf to show categorical relationships. of information, which, as Tufte [38] explains, are . The selection of records from the database and the easily compared, exposing patterns and trends partitioning of records into different layers and across dimensions of the data. panes.

4.4 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 8, NO. 1, JANUARY-MARCH 2002 . The grouping of data within a pane and the into rows and columns, whereas quantitative fields will be computation of statistical properties, aggregates, spatially encoded as axes within the panes. and other derived fields. Records may also be sorted A valid expression in our algebra is an ordered sequence into a given drawing order. of one or more symbols with operators between each pair of . The type of graphic displayed in each pane of the adjacent symbols and with parentheses used to alter the table. Each graphic consists of a set of marks, one precedence of the operators. The operators in the algebra mark per record in that pane. are cross (Â), nest (/), and concatenation (+), listed in order . The mapping of data fields to retinal properties of of precedence. The precise semantics of each operator is the marks in the graphics. The mappings used for defined in terms of its effects on the operand sets. any given visualization are shown in a set of automatically generated legends. 4.1.1 Concatenation Analysts can interact with the resulting visualizations in The concatenation operator performs an ordered union of several ways. Selecting a single mark in a graphic by the sets of the two symbols: clicking on it pops up a detail window that displays user- A ‡ B ˆ fa1 ; . . . ; an g ‡ fb1 ; . . . ; bm g specified field values for the tuples corresponding to that mark. Analysts can draw rubberbands around a set of ˆ fa1 ; . . . ; an ; b1 ; . . . ; bm g marks to brush records. Brushing, discussed in more detail A ‡ P ˆ fa1 ; . . . ; an g ‡ fPg in Section 5.3, can be performed within a single table or ˆ fa1 ; . . . ; an ; Pg between multiple Polaris displays. P ‡ Q ˆ fPg ‡ fQg In the next section, we describe how the visual ˆ fP; Qg: specification is used to generate graphics. In Section 5, we describe the support Polaris provides for visually querying and transforming the data through filters, sorting, and data 4.1.2 Cross transformations. Then, in Section 6, we discuss how the The cross operator performs a Cartesian product of the sets visual specification is used to generate the database queries of the two symbols: and statistical analysis. A  B ˆ fa1 ; . . . ; an g  fb1 ; . . . ; bm g 4 GENERATING GRAPHICS ˆ fa1 b1 ; . . . ; a1 ; bm ; a2 b1 ; . . . ; a2 bn ; . . . ; The visual specification consists of three components: 1) the specification of the different table configurations, 2) the type an b1 ; . . . ; an bm g of graphic inside each pane, and 3) the details of the visual A  P ˆ fa1 ; . . . ; an g  P encodings. We discuss each of these in turn. ˆ fa1 ; P; . . . ; an Pg: 4.1 Table Algebra We need a formal mechanism to specify table configura- 4.1.3 Nest tions and we have defined an algebra for this purpose. The nest operator is similar to the cross operator, but it only When the analysts place fields on the axis shelves, as shown creates set entries for which there exist records with those in Fig. 1, they are implicitly creating expressions in this domain values. If we define R to be the dataset being algebra. analyzed, r to be a record, and A(r) to be the value of the A complete table configuration consists of three separate field A for the record r, then we can define the nest operator expressions in this table algebra. Two of the expressions as follows: define the configuration of the x and y axes of the table, partitioning the table into rows and columns. The third A=B ˆ fai bj j Wr P R st expression defines the z axis of the table, which partitions A…r† ˆ ai & B…r† ˆ bi g: the display into layers. The operands in this table algebra are the names of the The intuitive interpretation of the nest operator is ordinal and quantitative fields of the database. We will use ªB within A.º For example, given the fields quarter and A, B, and C to represent ordinal fields and P, Q, and R to month, the expression quarter/month would be interpreted as represent quantitative fields. We assign ordered sets to each those months within each quarter, resulting in three entries field symbol in the following manner: To ordinal fields, we for each quarter. In contrast, quarter  month would result in assign the members of the ordered domain of the field and, 12 entries for each quarter. to quantitative fields, we assign the single element set Using the above set semantics for each operator, every containing the field name. expression in the algebra can be reduced to a single set, with each entry in the set being an ordered concatenation of A ˆ domain…A† ˆ fa1 ; . . . ; an g zero or more ordinal values with zero or more quantitative P ˆ fPg: field names. We call this set evaluation of an expression the normalized set form. The normalized set form of an This assignment of sets to symbols reflects the difference expression determines one axis of the table: The table axis in how the two types of fields will be encoded in the is partitioned into columns (or rows or layers) so that there structure of the tables. Ordinal fields will partition the table is a one-to-one correspondence between set entries in the

5.STOLTE ET AL.: POLARIS: A SYSTEM FOR QUERY, ANALYSIS, AND VISUALIZATION OF MULTIDIMENSIONAL RELATIONAL DATABASES 5 Fig. 2. The graphical interpretation of several expressions in the table algebra. Each expression in the table algebra can be reduced to a single set of terms and that set can then be directly mapped into a configuration for an axis of the table. normalized set and columns. Fig. 2 illustrates the config- omy of graphics that results in an intuitive and concise urations resulting from a number of expressions. specification of graphic types. Analysts can also combine multiple data sources in a When specifying the table configuration, the user also single Polaris visualization. When multiple data sources are implicitly specifies the axes associated with each pane. We imported, each data source is mapped to a distinct layer (or have structured the space of graphics into three families by set of layers). While all data sources and all layers share the the type of fields assigned to their axes: same configuration for the x and y axes of the table, each . ordinal-ordinal, data source can have a different expression for partitioning . ordinal-quantitative, its data into layers. Fig. 3b and Fig. 3c illustrate the use of . quantitative-quantitative. layers to compose multiple distinct data sources into a Each family contains a number of variants depending on single visualization. how records are mapped to marks. For example, selecting a bar in an ordinal-quantitative pane will result in a bar chart, 4.2 Types of Graphics whereas selecting a line mark results in a line chart. The After the table configuration is specified, the next step is to mark set currently supported in Polaris includes the specify the type of graphic in each pane. One option, typical rectangle, circle, glyph, text, Gantt bar, line, polygon, and of most charting and reporting tools, is to have the user image. select a chart type from a predefined set of charts. Polaris Following Cleveland [12], we further structure the space allows analysts to flexibly construct graphics by specifying of graphics by the number of independent and dependent the individual components of the graphics. However, for variables. For example, a graphic where both axes encode this approach to be effective, the specification must balance independent variables is different than a graphic where one flexibility with succinctness. We have developed a taxon- axis encodes an independent variable and the other encodes

6.6 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 8, NO. 1, JANUARY-MARCH 2002 Fig. 3. (a) A standard Pivot Table except with graphical marks rather than textual numbers in each cell, showing the total sales in each state organized by product type and month. By using a graphical mark with the total sales encoded as the size of the circle, trends may be easier to spot than by scanning a table of numbers. (b) Gantt charts showing the correspondence between wars and major scientists, organized by country. Note that this display is using two different data sources, one for the wars and one for the scientists, each corresponding to a different layer. Because both data sources have the country and time dimensions, Polaris can be used to visually join the two data sets into a single display with common x and y table axes. (c) Maps showing flights across the United States. This display has multiple data sources corresponding to different layers. The ªUSAº data source contains data corresponding to the underlying state outlines. The ªAirportº data source contains the coordinates of numerous airports in the US. Finally, the ªFlightsº data source describes flights in the continental United States, including data about the region in which the flight originated. When a field that only appears in a single data set (flights) is used to partition a layered display, the data from the other data sources (airports and states) is replicated into each pane formed by the partitioning. Thus, each pane displays all states and airports, but only a subset of the flights. (d) A small-multiple display of line charts overlaying dot plots. Each chart displays the profit and sales over time for a hypothetical coffee chain, orgainized by state. The display is constructed using two layers, where each layer contains a copy of the same data set. The layers share the same table structure, but use different marks and data transformations. As a result, the line chart and dot plot can be at different levels of detail. In this case, each line chart shows average profit across all products per month, whereas the dot plot has an additional grouping transform specified that results in separate dots displaying average profit per product per month. a dependent variable (y ˆ f…x†). By default, dimensions of 4.2.1 Ordinal-Ordinal Graphics the database are interpreted as independent variables and The characteristic member of this family is the table, either measures as dependent variables. of numbers or of marks encoding attributes of the source Finally, the precise form of the data transformations, in records. particular, how records are grouped and whether aggre- In ordinal-ordinal graphics, the axis variables are gates are formed, can affect the type of graphic. Some typically independent of each other and the task is focused graphic types best encode a single record, whereas others on understanding patterns and trends in some function can encode an arbitrary number of records. f…Ox ; Oy † 3 R, where R represents the fields encoded in the We briefly discuss the defining characteristics of the retinal properties of the marks. This can be seen in Fig. 3a, three families within our categorization. where the analyst is studying sales and margin as a function

7.STOLTE ET AL.: POLARIS: A SYSTEM FOR QUERY, ANALYSIS, AND VISUALIZATION OF MULTIDIMENSIONAL RELATIONAL DATABASES 7 Fig. 4. The different retinal properties that can be used to encode fields of the data and examples of the default mappings that are generated when a given type of data field is encoded in each of the retinal properties. of product type, month, and state for the items sold by a 4.2.3 Quantitative-Quantitative Graphics hypothetical coffee chain. Fig. 7a presents another example Graphics of this type are used to understand the distribu- of an ordinal-ordinal graphic. In this figure, the analyst is tion of data as a function of one or both quantitative investigating the performance of a graphics rendering variables and to discover causal relationships between the library. The number of cache misses attributable to each two quantitative variables. Fig. 6a illustrates a matrix of line of source code has been encoded in the color of the line. scatterplot graphics used to understand the relationships The cardinality of the record set in each pane has little between a number of attributes of different products sold effect on the overall structure of the table. When there is by a coffee chain. more than one record per pane, multiple marks are Fig. 3c illustrates another example of a quantitative- shown in each display, with a one-to-one correspondence quantitative graphic: the map. In this figure, the analyst is of mark to record. The marks are stacked in a specified studying how flight scheduling varies with the region of the drawing order and the spatial placement of the marks country in which the flight originated. Data about a number within the pane conveys no additional information about of flights between major airports has been plotted as a the record's data. function of latitude and longitude; this data has been composed with two other layers that plot the location of 4.2.2 Ordinal-Quantitative Graphics airports and display the geography of each state as a Well-known representatives of this family of graphics are polygon. It is extremely rare to use this type of graph with a the bar chart, possibly clustered or stacked, the dot plot, and cardinality of one, not because it is not meaningful, but the Gantt chart. because the density of information in such a graphic is In an ordinal-quantitative graphic, the quantitative very low. variable is often dependent on the ordinal variable and the analyst is trying to understand or compare the proper- 4.3 Visual Mappings ties of some set of functions f…O† 3 Q. Fig. 6c illustrates a Each record in a pane is mapped to a mark. There are two case where a matrix of bar charts is used to study several components to the visual mapping. The first component, functions of the independent variables product and month. described in the previous section, determines the type of The cardinality of the record set does affect the structure of graphic and mark. The second component encodes fields of the graphics in this family. When the cardinality of the the records into visual or retinal properties of the selected record set is one, the graphics are simple bar charts or dot mark. The visual properties in Polaris are based on Bertin's plots. When the cardinality is greater than one, additional retinal variables [6]: shape, size, orientation, color (value structure may be introduced to accommodate the additional and hue), and texture (not supported in the current version records (e.g., a stacked bar chart). of Polaris). The ordinal and quantitative values may be independent Allowing analysts to explicitly encode different fields of variables, such as in a Gantt chart. Here, each pane the data to retinal properties of the display greatly enhances represents all the events in a category; each event has a the data density and the variety of displays that can be generated. However, in order to keep the specification type as well as a begin and end time. In Fig. 3b, major wars succinct, analysts should not be required to construct the over the last 500 years are displayed as Gantt charts, mappings. Instead, they should be able to simply specify categorized by country. An additional layer in that figure that a field be encoded as a visual property. The system displays pictures of major scientists plotted as a function of should then generate an effective mapping from the domain the independent variables country of birth and date of birth. of the field to the range of the visual property. These Fig. 7c shows a table of Gantt charts, with each chart mappings are generated in a manner similar to other displaying the thread scheduling and locking activity on a visualization systems. We discuss how this is done for the CPU within a multiprocessor computer. To support Gantt purpose of completeness. The default mappings are illu- charts, we need to support intervals as an additional type of strated in Fig. 4. field that exists in the meta-data only, allowing one field Shape: Polaris uses the set of shapes recommended by name to map to a pair of columns in the database. Cleveland for encoding ordinal data [11]. We have extended

8.8 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 8, NO. 1, JANUARY-MARCH 2002 this set of shapes to include several additional shapes to default aggregation operation is applied to a quantitative allow a larger domain of values to be encoded. measure when it is dragged to one of the x or y-axis (or Size: Analysts can use size to encode either an ordinal or layer); shelves and aggregation is enabled. The user can quantitative field. When encoding a quantitative domain as change which aggregation function is applied by right- size, a linear map from the field domain to the area of the clicking on the field and choosing a different aggregation mark is created. The minimum size is chosen so that all function from the popup context menu. Polaris can be easily visual properties of a mark with the minimum size can be extended to provide any statistical aggregate that can be perceived [23]. If an ordinal field is encoded as size, the generated from relational data. domain needs to be small, at most four or five values, so Counting of Ordinal Dimensions refers to the counting that the analyst can discriminate between different cate- of distinct values for an ordinal field within the data set. gories [6]. This aggregation function can be applied to an ordinal field Orientation: A key principle in generating mappings of by right-clicking on the field and choosing the CNT (count) ordinal fields to orientation is that the orientation needs to aggregation function. Unlike simple aggregation, applying vary by at least 30 between categories [23], thus constrain- the count operator will change the field type (to quantita- ing the automatically generated mapping to a domain of at tive) and thus change the table configuration and graph most six categories. For quantitative fields, the orientation type in each pane. varies linearly with the domain of the field. Discrete Partitioning is used to discretize a continuous Color: When encoding an ordinal domain, we use a domain. Polaris provides two discretization methods: predefined palette to select the color for each domain entry. binning and partitioning. Binning allows the analyst to The colors in the palette are well separated in the color specify a regular bin size in which to aggregate the data; spectrum, predominantly on hue [37]. We have ordered the binning will not change the graph type since the resulting colors to avoid adjacent colors with different brightness or derived field is also quantitative, just at the specified substantially different wavelengths in an attempt to include granularity. Partitioning allows the user to individually harmonious sets of colors in each palette [6], [23], [37]. We specify the size and name of each bin. Partitioning of a additionally reserve a saturated red for highlighting items quantitative field will result in an ordinal field, thus that have been selected or brushed. changing the graph types and table configuration. Binning When encoding a quantitative variable, it is important to is useful for creating graphs, such as histograms, in which vary only one psychophysical variable, such as hue or there are many regularly sized bins, while partitioning is value. The default palette we use for encoding quantitative useful for encoding additional categorizations into the data, data is the isomorphic colormap developed by Rogowitz either ad hoc or derived from known domain information. and Treinish [28]. Both can be applied by right-clicking on the field name and choosing either the ªBin by...º or ªPartition...º option. 5 DATA TRANSFORMATIONS AND VISUAL QUERIES Ad hoc Grouping is the ordinal version of quantitative partitioning, where the user can choose to group together The ability to rapidly change the table configuration, type of different ordinal values for the purpose of display and data graphic, and visual encodings used to visualize a data set is transformations. For example, a user may choose to group necessary for interactive exploration. However, it is not California and Florida together into an ªOrange providerº sufficient; additional interactivity is needed. The resulting partition. This type of arbitrary grouping and aggregation is display must be manipulable. The analysts must be able to powerful since it allows the analyst to add his own domain sort, filter, and transform the data to uncover useful knowledge to the analysis and to change the groupings as relationships and information and then they must be able to form ad hoc groupings and partitions that reflect this the exploration uncovers additional patterns. The user can newly uncovered information [5]. apply ad hoc grouping by right-clicking on the field name In this section, we describe four interaction techniques and choosing the ªPartition...º option. This transformation Polaris provides to support analysis within the resulting derives an ordinal field from an ordinal field and, thus, the visualizations: deriving additional fields, sorting and graph type does not change. filtering, brushing and tooltips, and undo and redo. Threshold Aggregation is the last type of derived field we support and it differs from the rest since it is derived 5.1 Deriving Additional Fields from two source fields: an ordinal field and a quantitative While analyzing data, one of the most important interac- field. If the quantitative field is less than a certain tions needed is the ability to derive additional fields from threshold value for any values of an ordinal field, those the original data. Typically, these generated fields are values are aggregated together to form an ªOtherº aggregates or statistical summaries. Polaris currently category. This allows the user to specify threshold values provides five methods for deriving additional fields: simple below which the data is considered uninteresting. One aggregation of quantitative measures, counting of distinct challenge in supporting threshold aggregation is that it values in ordinal dimensions, discrete partitioning of can require two aggregation passes if the quantitative quantitative measures, ad hoc grouping within ordinal field desired is itself a derived field (e.g., the average of dimensions, and threshold aggregation. the quantitative profit field). Simple Aggregation refers to basic aggregation opera- Adding derived fields on the fly is necessary as part of tions, such as summation, average, minimum, and max- the exploration and analysis process. As the analyst imum, that are applied to a single quantitative field. A explores the data, relationships within the data are

9.STOLTE ET AL.: POLARIS: A SYSTEM FOR QUERY, ANALYSIS, AND VISUALIZATION OF MULTIDIMENSIONAL RELATIONAL DATABASES 9 discovered and groupings can be used to encode and reflect 5.3 Brushing and Tooltips this discovered information. Furthermore, aggregations and Many times, when exploring the data, analysts want to statistics can be used to hide uninteresting data and to directly interact with the data, visually querying the data to reduce large data sets in size so that they are tractable for highlight correlated marks or getting more details on understanding and analysis. demand. Polaris provides both brushing and tooltips for this purpose. 5.2 Sorting and Filtering Brushing allows the user to choose a set of interesting Sorting and filtering data play a key role in analysis. It is data points by drawing a rubberband around them. The often only by reordering rows and columns and filtering user selects a single field whose values are then used to out unimportant information that patterns and trends in the identify related marks and tuples. All marks corresponding data emerge. Polaris provides extensive support for both of to tuples sharing selected field values with the selected these analysis techniques. tuples are subsequently highlighted in all other panes or Filtering allows the user to choose which values to linked Polaris views. Brushing allows the analyst to choose display so that he can focus on and devote more screen data in one display and highlight it in other displays, space and attention to the areas of interest. For all fields, the allowing correlation between different projections of the user can right-click on the field name to bring up a menu same data set or relationships between distinct data sets. and choose the ªFilter...º option. Tooltips allow the user to get more details on demand. If For ordinal fields, a listbox with all possible values is the user hovers over a data point or pane, additional details, such as specific field values for the tuple corresponding to shown and the user can check or uncheck each value to the selected mark, are shown. Analysts can use tooltips to display it or not. For quantitative fields, a dynamic query understand the relationship between the graphical marks slider [1] allows the user to choose a new domain. and the underlying data. Additionally, there are textboxes showing the chosen minimum and maximum values that the user can use to 5.4 Undo and Redo directly enter a new domain. In using Polaris, we The final interaction technique we provide in Polaris is discovered that we needed to provide a slightly larger unlimited undo and redo within an analysis session. Users domain than the actual data domain for the user to select can use the ªBackº and ªForwardº buttons on the top from since the user may want some buffer space in the toolbar to either return to a previous visual specification or graphs. to move forward again. This functionality is critical for Note that applying a filter changes the interpretation of interactive exploration since it allows the user to quickly the field in the algebra. For ordinal fields, it reduces the back out of changes (or redo them if he goes too far back) domain to just the filtered values, rather than all values. It and try a different exploration path without having to does not change how a quantitative field is interpreted in rebuild the desired starting point from scratch. Support for the table algebra. undo also promotes more experimentation and exploration Sorting allows the user to to uncover hidden patterns as there is no fear of losing work done thus far. If the user and trends and to group together interesting values by does want a clean canvas, Polaris also provides a ªClearº button. changing the order of values within a field's domain or the By using the formalism and visual specification, im- ordering of tuples in the data. Changes to the ordering of plementing Undo and Redo is trivial. We simply need to values within a field's domain can affect the ordering of the save the visual specification at each stage and moving panes within a table, the ordering of values along an axis backward or forward is done by simply updating the (such as in a bar chart), and the composite ordering of display to reflect the saved visual specification. layers. The ordering of tuples affects the drawing order of marks within a pane. The drawing order is most relevant in graphs where a single primitive encodes multiple tuples, 6 GENERATING DATABASE QUERIES such as a line or polygon primitive, or where marks overlap In addition to determining the appearance of the visualiza- and the drawing order thus determines the front-to-back tion, the visual specification also generates queries to the ordering and occlusion of marks. database that 1) select subsets of the data for analysis, then Polaris provides three ways for a user to sort the domain. 2) filter, sort, and group the results into panes, and then, First, the user can bring up the filter window and drag-and- finally, 3) group, sort, and aggregate the data before passing drop the values within that window to reorder the domain. it to the graphics encoding process. Second, if the field has been used to partition the table into Fig. 5 shows the overall data flow in Polaris. We can rows or columns, the user can drag-and-drop the table row precisely describe the transformations in each of the three phases using SQL queries. or column headers to reorder the domain values. Finally, Polaris provides programmatic sorting, allowing the user to 6.1 Step 1: Selecting the Records sort one field based on the value in another field. For The first phase of the data flow retrieves records from the example, the user may want to sort the State field by the database, applying user-defined filters to select subsets of Profit field. The ordering of tuples within a pane is the database. determined by which fields the analyst has placed in the For an ordinal field A, the user may specify a subset of the Sort shelf. domain of the field as valid. If filter(A) is the user-selected

10.10 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 8, NO. 1, JANUARY-MARCH 2002 Fig. 5. The transformations and data flow within Polaris. The visual specification generates queries to the database to select subsets of the data for analysis, then to filter, sort, and group the results into panes, and then, finally, to group, sort and aggregate the data within panes. subset, then a relational predicate expressing the filter for Given these definitions, the records to be partitioned into A is: the pane at the intersection of the ith row, the jth column, and the kth layer can be retrieved with the following query: A in filter …A†: SELECT à For a quantitative field P, the user may define a subset of WHERE fRow…i† and Column…j† and the field's domain as valid. If min(P) and max(P) are the user-defined extents of this subset, then a relational Layer…k†g: predicate expressing the filter for P is: To generate the groups of records corresponding to each of the panes, we must iterate over the table, executing this …P ! min …P† and P max …P††: SELECT statement for each pane. There is no standard SQL We can define the relational predicate filters as the statement that enables us to perform this partitioning in a conjunction of all of the individual field filters. Then, the single query. We note that this same problem motivated the first stage of the data transformation network is equivalent CUBE [16] operator; we will revisit this issue in the to the SQL statement: discussion section. SELECT à 6.3 Step 3: Transforming Records within the Panes WHERE ffiltersg: The last phase of the data flow is the transformation of the records in each pane. If the visual specification includes aggregation, then each measure in the database schema 6.2 Step 2: Partitioning the Records into Panes must be assigned an aggregation operator. If the user has The second phase of the data flow partitions the retrieved not selected an aggregation operator for a measure, that records into groups corresponding to each pane in the table. measure is assigned the default aggregation operator As we discussed in Section 4.1, the normalized set form of the (SUM). We define the term aggregates as the list of the table axis expressions determines the table configuration. aggregations that need to be computed. For example, if the The table is partitioned into rows, columns, and layers database contains the quantitative fields Profit, Sales, and corresponding to the entries in these sets. Payroll and the user has explicitly specified that the average The ordinal values in each set entry define the criteria by of Sales should be computed, then aggregates is defined as: which records will be sorted into each row, column, and aggregates à layer. Let Row(i) be the predicate that represents the SUM…Profit†; AVG…Sales†; SUM…Payroll†: selection criteria for the ith row, Column(j) be the predicate for the jth column, and Layer(k) the predicate for the kth Aggregate field filters (for example, SUM(Profit) > 500) layer. For example, if the y-axis of the table is defined by the could not be evaluated in Step 1 with all of the other filters normalized set: because the aggregates had not yet been computed. Thus, those filters must be applied in this phase. We define the fa1 b1 P; a1 b2 P; a2 b1 P; a2 b2 Pg; relational predicate filters as in Step 1 for unaggregated then there are four rows in the table, each defined by an fields. Additionally, we define the following lists: entry in this set, and Row would be defined as: G: the field names in the grouping shelf, Row…1† ˆ …A ˆ a1 and B ˆ b1 † S: the field names in the sorting shelf, and Row…2† ˆ …A ˆ a1 and B ˆ b2 † dim: the dimensions in the database. Row…3† ˆ …A ˆ a2 and B ˆ b1 † The necessary transformation can then be expressed by the Row…4† ˆ …A ˆ a2 and B ˆ b2 †: SQL statement:

11.STOLTE ET AL.: POLARIS: A SYSTEM FOR QUERY, ANALYSIS, AND VISUALIZATION OF MULTIDIMENSIONAL RELATIONAL DATABASES 11 SELECT fdimg; faggregatesg GROUP BY fGg HAVING ffiltersg ORDER BY fSg: If no aggregate fields are included in the visual specifica- tion, then the remaining transformation simply sorts the records into drawing order: SELECT à ORDER BY fSg: 7 RESULTS Polaris is useful for performing the type of exploratory data analysis advocated by statisticians such as Bertin [5] and Cleveland [12]. We demonstrate the capabilities of Polaris as an exploratory interface to multidimensional databases by considering the following two scenarios, performed using a Polaris prototype implemented in the Rivet visualization environment [9]. 7.1 Scenario 1: Commercial Database Analysis The chief financial officer (CFO) of a national coffee store chain has just been told to cut expenses. To get an initial understanding of the situation, the CFO creates a table of scatterplots showing the relationship between marketing costs and profit categorized by product type and market (Fig. 6a). After studying the graphics, the CFO notices an interesting trend: Certain products have high marketing costs with little or no return in the form of profit. To further investigate, the CFO creates two linked displays: a table of scatterplots showing profit and market- ing costs for each state and a text table itemizing profit by product and state (Fig. 6b). The two views are linked by the state field: If records are selected in either display, then all records with the same state value as the selected records are highlighted. The CFO is able to use these linked views to determine that, in New York, several products are offering very little return despite high expenditures. The CFO then creates a third display (Fig. 6c): a set of bar charts showing profit, sales, and marketing for each product sold in New York, broken down by month. In this view, the CFO can clearly see that Caffe Mocha's profit Fig. 6. An example scenario demonstrating the capabilities of Polaris for margin does not justify its marketing expenses. With this exploratory analysis of multidimensional databases. The data displayed data, the CFO can change the company's marketing and is for a hypothetical coffee store chain and the analyst is searching for sales strategies in this state. ways to reduce the company's expenses. 7.2 Scenario 2: Computer Systems Analysis visualization constructed to display this data. The visua- At Stanford, researchers developing Argus [21], a parallel lization is composed of two linked Polaris instances. One graphics library, found that its performance had linear displays a histogram of cache misses by virtual address, speedup when using up to 31 processors, after which its the other displays source-code, with each line's hue performance diminished rapidly. Using Polaris, we encoding the number of cache misses suffered by that recreate the analysis they performed using a custom-built line. Upon seeing these displays, they could tell that visualization tool [8]. memory was not in fact the problem. Initially, the developers hypothesized that the dimin- The developers next hypothesized that lock contention ishing performance was a result of too many remote might be a problem, so they reran Argus and collected memory accesses, a common performance problem in detailed lock and scheduling information. The data is parallel programs. They collected and visualized detailed shown in Fig. 7b using two instances of Polaris to create a memory statistics to test this hypothesis. Fig. 7a shows a composite visualization with two linked projections of the

12.12 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 8, NO. 1, JANUARY-MARCH 2002 lock requests correspond to descheduled periods for most processes. One process, however, has a descheduled period corresponding to a period during which the lock was held. This behavior, which was due to a bug in the operating system, was the source of the performance issues. 7.3 Summary These two examples illustrate several important points about the exploratory process. Throughout the analyses, both what the data users want to see and how they want to see it change continually. Analysts first form hypoth- eses about the data and then create new views to perform tests and experiments to validate or disprove those hypotheses. Certain displays enable an understanding of overall trends, whereas others show causal relationships. As the analysts better understand the data, they may want to drill-down in the visible dimensions or display entirely different dimensions. Polaris supports this exploratory process through its visual interface. By formally categorizing the types of graphics, Polaris is able to provide a simple interface for rapidly generating a wide range of displays. This allows analysts to focus on the analysis task rather than the steps needed to retrieve and display the data. 8 DISCUSSION In this section, we focus on three points of discussion. First, we discuss how our work compares to that of Wilkinson [41], second, the interpretation of our visual specifications as database queries, and, finally, the interactivity and performance of Polaris. Several of the ideas in our specification are extensions of Wilkinson's [41] efforts to develop a grammar for statistical graphics. His grammar encapsulates both the statistical transformation of datasets and their mapping to graphic representations. The primary distinctions between Wilkinson's system and ours arise because of differences in the data models. We chose to focus on developing a tool for multidimensional relational databases and we decided to build as much of the system as possible using relational algebra. All of the data transformations required by our visual specifications can be precisely interpreted as standard SQL queries to OLAP Fig. 7. A scenario demonstrating how researchers use Polaris to analyze servers. Wilkinson instead intentionally uses a data model the performance problems of a parallel graphics library. that is not relational, citing shortcomings in the relational model's support for statistical analysis. Consequently, his same data. One projection shows a scatterplot of the start specification defines operations and function in terms of his own data model, consisting of variable sets and indexed cycle versus cycle duration for the lock events (requests and variables. holds). The second shows a histogram over time of initiated The differences in design are most apparent in the table lock events. The scatterplot shows that toward the end of algebra. As in our system, Wilkinson's table algebra the run, the duration of lock events (both holds and performs two functions: it provides database services such requests) were taking an unexpectedly long time. That as set operations and it specifies the layout of the tables and observation correlated with the histogram showing that the graphs. Since we use relational algebra for all our database number of lock requests peaked and then tailed off toward services, our algebra is different. For example, his blend the end of the run, indicating that this might be a fruitful operator performs both set union and may partition the area for further investigation. axes of a table; our concatenation operation is different since A third visualization, shown in Fig. 7c, shows the same it just performs partitioning. Another difference is in his data using Gantt charts to display both lock events and cross and nest operators: cross generates a 2D graphic and process scheduling events. This display shows that the long nest only a 1D graphic. We use a different mechanism

13.STOLTE ET AL.: POLARIS: A SYSTEM FOR QUERY, ANALYSIS, AND VISUALIZATION OF MULTIDIMENSIONAL RELATIONAL DATABASES 13 (shelves) to specify axes of the graphic. Overall, whether performance of the queries, including existing techniques our system is better than Wilkinson's is hard to judge such as materialized views [39], progressively refined completely, and will require more experience using the queries that provide intermediate feedback [19], and systems to solve practical problems. One major advantage sampled queries [17], [24]. We also think that substantial of our approach is that it leverages existing database performance benefits can be gained by leveraging the systems and as a result was very easy to implement. coherence between successive queries generated by visua- Another interesting issue is the interpretation of our lization systems using both caching and prefetching. visual specifications as database queries. When database queries are generated from the visual specifications in 9 CONCLUSIONS AND FUTURE WORK Polaris, it is necessary to generate a SQL query per table pane. This problem is similar to the one that motivated Gray We have presented Polaris, an interface for the exploration et al. to develop the CUBE operator [16]. The CUBE and analysis of large multidimensional databases. Polaris operator generalizes the queries necessary to develop extends the well-known Pivot Table interface to display cross-tab and Pivot Table displays of relational data into a relational query results using a rich, expressive set of single, more efficient operator. However, the CUBE graphical displays. A second contribution of this system is a operator cannot be applied in our situation because it succinct visual specification for describing table-based assumes that the sets of relations partitioned into each table graphical displays of relational data and the interpretation pane do not overlap. In several possible Polaris table of these visual specifications as a precise sequence of configurations, such as scatterplot matrices, there can be relational database operations. considerable overlap between the relations partitioned into We have many plans for future work in extending this each pane. One can imagine generalizing the CUBE system. As stated above, one area of future work is operator to handle these overlapping partitions. exploring database performance issues. A related area is Another major limitation of the CUBE operator is its expanding Polaris to expose the hierarchical structure of method for computing aggregates. Usually, only aggregates data cubes. Each level of the hierarchy can be thought of as based on sums are allowed. More complex aggregation a different level of detail. One idea, a la Pad++ [4], is to operators requiring ranking, such as computation of change the visual representation as we change the level of medians and modes, are not part of the current specifica- detail; the visual representation will be determined in part tion, although they are available in some commercial by the available screen space. In order to make this type of systems. These operators are very useful for data mining system interactive, especially given the large quantities of applications. data involved, we are also exploring prefetching and A final point of discussion is the interactivity and caching strategies to achieve real-time interactivity. performance of Polaris. Although Polaris is designed to be Another area of future work is to leverage the direct an interactive and exploratory interface to large data ware- correspondance of graphical marks in Polaris to tuples in houses, our research has focused on the techniques, seman- the relational databases in order to generate database tables tics, and formalism needed to provide an effective from a selected set of graphical marks. This technique can exploratory interface rather than on attaining interactive be used to develop lenses, similar to the Magic Lens [7], that query times. While we would like the system to be reasonably can perform much more complex transformations because responsive as the user modifies the visual specification, our they operate in data space rather than image space. This experience has been that the query response time does not technique can also be used to compose Polaris displays, need to be real-time in order to maintain a feeling of using a selected mark set in one display as the data input to exploration: The query can even take several tens of seconds. another. We are exploring these techniques and believe it is Within this constraint, Polaris can currently be used with possible to develop a relational spreadsheet by composing many large databases, especially if a large subset of the views Polaris displays in this manner. Extending the x, y, and layer shelves to include an can be materialized a priori [39]. Furthermore, it is important animation shelf would enable analysts to partition the to note that many queries on data warehouses, such as those data on those fields and create animated displays that generated with existing Pivot Table tools, will be returning a sequence through the data. For example, in the coffee small number of tuples and, thus, the most relevant constraint chain data set, dropping the Month field on the on performance is server-side query time and not client-side animation shelf would create an animation showing drawing or data manipulation. how the data changes over time. We have used Polaris with two reasonably large data sets: 1) a subset of a packet trace of a mobile network over a ACKNOWLEDGMENTS 13 week period [35] that has over 6 million tuples and 2) a subset of the data collected from the Sloan Digital Sky The authors especially thank Robert Bosch for his contribu- Survey (approx. 650 MB), both stored in Microsoft's tion to the design of Polaris, his review of manuscript drafts, and for many useful discussions. The authors also SQLServer. With both data sets, we are able to get thank Maneesh Agrawala for his reviews and discussions reasonably responsive performance and maintain a sense on early drafts of this paper. The air traffic data is courtesy of exploration for the queries we ran. We intend to pursue of William F. Eddy and Shingo Oue. The coffee chain data is database performance issues in order to scale to much courtesy of Stephen Eick and Visual Insights. This work was larger data sets as part of our future research. There are supported by the US Department of Energy through the many techniques that can be used to improve the ASCI Level 1 Alliance with Stanford University.

14.14 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 8, NO. 1, JANUARY-MARCH 2002 REFERENCES [29] S.F. Roth, J. Kolojejchick, J. Mattis, and J. Goldstein, ªInteractive Graphic Design Using Automatic Presentation Knowledge,º Proc. [1] C. Ahlberg, C. Williamson, and B. Shneiderman, ªDynamic SIGCHI '94. pp. 112-117, Apr. 1994. Queries for Information Exploration: An Implementation and [30] S.F. Roth, P. Lucas, J.A. Senn, C.C. Gomberg, M.B. Burks, P.J. Evaluation,º Proc. SIGCHI 1992, p. 619, 1992. Stroffolino, J. Kolojejchick, and C. Dunmire, ªVisage: A User [2] A. Aiken, J. Chen, M. Stonebraker, and A. Woodruff, ªTioga-2: A Interface Environment for Exploring Information,º Proc. Informa- Direct Manipulation Database Visualization Environment,º Proc. tion Visualization, pp. 3-12, Oct. 1996. 12th Int'l Conf. Data Eng., pp. 208-217, Feb. 1996. [31] ªSloan Digital Sky Survey,ºavailable: http://www.sdss.org/, [3] R. Becker, W.S. Cleveland, and R.D. Martin, ªTrellis Graphics Sept. 2001. Displays: A Multi-Dimensional Data Visualization Tool for Data [32] M. Spenke, C. Beilken, and T. Berlage, ªFOCUS: The Interactive Mining,º Proc. Third Ann. Conf. Knowledge Discovery in Databases, Table for Product Comparison and Selection,º Proc. ACM Symp. Aug. 1997. User Interface Software and Technology, Nov. 1996. [4] B.B. Bederson, L. Stead, and J.D. Hollan, ªPad++: Advances in [33] Spotfire Inc., available: http://www.spotfire.com, Sept. 2001. Multiscale Interfaces,º Proc. SIGCHI, 1994. [34] S.S. Stevens, ªOn the Theory of Scales of Measurement,º Science, [5] J. Bertin, Graphics and Graphic Information Processing. Berlin: Walter vol. 103, pp. 677-680, year? de Gruyter, 1980. [35] D. Tang and M. Baker, ªAnalysis of a Local-Area Wireless [6] J. Bertin, Semiology of Graphics. Madison, Wis.: Univ. of Wisconsin Network,º Mobicom, 2000. Press, 1983. Translated by W.J. Berg. [36] E. Thomsen, OLAP Solutions: Building Multidimensional Information [7] E.A. Bier, M. Stone, K. Pier, W. Buxton, and T. DeRose, ªToolglass Systems. New York: Wiley Computer Publishing, 1997. and Magic Lenses: The See-Through Interface,º SIGGRAPH '93 [37] D. Travis, Effective Color Displays: Theory and Practice. London: Proc., pp. 73-80, Aug. 1993. Academic Press, 1991. [8] R. Bosch, C. Stolte, G. Stoll, M. Rosenblum, and P. Hanrahan, [38] E.R. Tufte, The Visual Display of Quantitative Information. Chesire, ªPerformance Analysis and Visualization of Parallel Systems Conn.: Graphics Press, 1983. Using SimOS and Rivet: A Case Study,º Proc. Sixth Ann. Symp. [39] J. Ullman, ªEfficient Implementation of Data Cubes via Materi- High-Performance Computer Architecture, p. 360-371, Jan. 2000. alized Views,º Proc. Knowledge Discovery in Databases, 1996. [9] R. Bosch, C. Stolte, D. Tang, J. Gerth, M. Rosenblum, and P. [40] M. Ward, ªXmdvTool: Integrating Multiple Methods for Visualiz- Hanrahan, ªRivet: A Flexible Environment for Computer Systems ing Multivariate Data,º Proc. Visualization, pp. 326-331, Oct. 1994. Visualization,º Computer Graphics, pp. 68-73, Feb. 2000. [41] L. Wilkinson, The Grammar of Graphics. New York: Springer, 1999. [10] A. Buja, D. Cook, and D.F. Swayne, ªInteractive High-Dimen- sional Data Visualization,º J. Computational and Graphical Statistics, Chris Stolte received the BSc degree in vol. 5, no. 1, pp. 78-99, 1996. computer science in 1996 from Simon Fraser [11] W.S. Cleveland, The Elements of Graphing Data. Pacific Grove, University. He is currently a PhD student in the Calif.: Wadsworth Advanced Books and Software, 1985. Department of Computer Science at Stanford [12] W.S. Cleveland, Visualizing Data. Hobart Press, 1993. University. His research interests include infor- [13] M. Derthick, J. Kolojejchick, and S.F. Roth, ªAn Interactive mation visualization and computer graphics. His Visualization Environment for Data Exploration,º Proc. Knowledge current research projects include the Rivet Discovery in Databases, pp. 2-9, Aug. 1997. project on visualizations for studying computer [14] S. Eick, ªVisualizing Multi-Dimensional Data,º Computer Graphics, systems and the Polaris project on visual pp. 61-67, Feb. 2000. interfaces for the exploration and analysis of [15] Microsoft ExcelÐUser's Guide. Redmond, Wash.: Microsoft, 1995. large relational databases. [16] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, H. Pirahesh, and F. Pellow, ªData Cube: A Relational Diane Tang received the AB degree in compu- Aggregation Operator Generalizing Group-By, Cross-Tab, and ter science in 1995 from Harvard/Radcliffe Sub-Totals,º Proc. 12th Int'l Conf. Data Eng., pp. 152-159, Feb. 1996. University and the MS and PhD degrees in [17] P.B. Gibbons, Y. Matias, and V. Poosala, ªAqua Project White computer science in 2000 from Stanford Uni- Paper,º technical report, Bell Laboratories, Murray Hill, N.J., Dec. versity. She is a research associate in the 1997. Department of Computer Science at Stanford [18] J.A. Hartigan, ªPrinter Graphics for Clustering,º J. Statistical University. Her interests include information Computation and Simulation, vol. 4, pp. 187-213, year? visualization, especially with regards to level- [19] J.M. Hellerstein, P.J. Haas, and H.J. Wang, ªOnline Aggregation,º of-detail issues, mobile networking, and distrib- Proc. ACM SIGMOD, pp. 171-82, June 1997. uted systems. She is currently working on the [20] ªHuman Genome Project,ºavailable at http://www.ornl.gov/ Rivet and Polaris projects on interactive visual exploration of large hgmis/project/about.html, Sept. 2001. datasets. She is a recipient of the NPSC Fellowship. [21] H. Igehy, G. Stoll, and P. Hanrahan, ªThe Design of a Parallel Graphics Interface,º Proc. SIGGRAPH 1998, pp. 141-150, 1998. Pat Hanrahan is the CANON USA Professor of [22] D. Keim and H.-P. Kriegel, ªVisDB: Database Exploration Using Computer Science and Electrical Engineering at Multidimensional Visualization,º IEEE Computer Graphics and Stanford University, where he teaches computer Applications, vol. 14, no. 5, pp. 40-49, 1994. graphics. His current research involves visuali- [23] S.M. Kosslyn, Elements of Graph Design. New York: W.H. Freeman zation, image synthesis, and graphics systems and Co., 1994. and architectures. Before joining Stanford, he [24] R.J. Lipton, J.F. Naughton, D.A. Schneider, and S. Seshadri, was a faculty member at Princeton University. ªEfficient Sampling Strategies for Relational Database Opera- He has also worked at Pixar, where he devel- tions,º Theoretical Computer Science, vol. 116, nos. 1-2, pp. 195-226, oped volume rendering software and was the chief architect of the RenderMan(TM) Interfa- 1993. ceÐa protocol that allows modeling programs to describe scenes to [25] M. Livny, R. Ramakrishnan, K. Beyer, G. Chen, D. Donjerkovic, S. high quality rendering programs. Prior to working for Pixar, he directed Lawande, J. Myllymaki, and K. Wenger, ªDEVise: Integrated the 3D computer graphics group in the Computer Graphics Laboratory at Querying and Visual Exploration of Large Datasets,º Proc. ACM the New York Institute of Technology. Professor Hanrahan has received SIGMOD, May 1997. three university teaching awards. In 1993, he received an Academy [26] J.D. Mackinlay, ªAutomating the Design of Graphical Presenta- Award for Science and Technology, the Spirit of America Creativity tions of Relational Information,º ACM Trans. Graphics, pp. 110-141, Award, the SIGGRAPH Computer Graphics Achievement Award, and Apr. 1986. he was recently elected to the National Academy of Engineering. [27] R. Rao and S. Card, ªThe Table Lens: Merging Graphical and Symbolic Representations in an Interactive Focus+Context Visua- lization for Tabular Information,º Proc. SIGCHI '94, pp. 318-322, 1994. . For more information on this or any computing topic, please visit [28] B. Rogowitz and L. Treinish, ªHow NOT to Lie with Visualiza- our Digital Library at http://computer.org/publications/dlib. tion,º Computers in Physics, pp. 268-274, May/June 1996.