Voyager


Voyager: Exploratory Analysis via Faceted Browsing of Visualization Recommendations

Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer

• Kanit Wongsuphasawat, Dominik Moritz, Bill Howe, and Jeffrey Heer are with University of Washington. E-mail: {kanitw,domoritz,billhowe,jheer}@cs.washington.edu.
• Anushka Anand and Jock Mackinlay are with Tableau Research. E-mail: {aanand, jmackinlay}@tableau.com.

Manuscript received 31 Mar. 2015; accepted 1 Aug. 2015; date of publication xx Aug. 2015; date of current version 25 Oct. 2015. For information on obtaining reprints of this article, please send e-mail to: tvcg@computer.org.

Fig. 1. Voyager: a recommendation-powered visualization browser. The schema panel (left) lists data variables selectable by users. The main gallery (right) presents suggested visualizations of different variable subsets and transformations.

Abstract: General visualization tools typically require manual specification of views: analysts must select data variables and then choose which transformations and visual encodings to apply. These decisions often involve both domain and visualization design expertise, and may impose a tedious specification process that impedes exploration. In this paper, we seek to complement manual chart construction with interactive navigation of a gallery of automatically generated visualizations. We contribute Voyager, a mixed-initiative system that supports faceted browsing of recommended charts chosen according to statistical and perceptual measures. We describe Voyager’s architecture, motivating design principles, and methods for generating and interacting with visualization recommendations. In a study comparing Voyager to a manual visualization specification tool, we find that Voyager facilitates exploration of previously unseen data and leads to increased data variable coverage. We then distill design implications for visualization tools, in particular the need to balance rapid exploration and targeted question-answering.

Index Terms: User interfaces, information visualization, exploratory analysis, visualization recommendation, mixed-initiative systems

1 INTRODUCTION

Exploratory visual analysis is highly iterative, involving both open-ended exploration and targeted question answering [16, 37]. Yet making visual encoding decisions while exploring unfamiliar data is non-trivial. Analysts may lack exposure to the shape and structure of their data, or begin with vague analysis goals. While analysts should typically examine each variable before investigating relationships between them [28], in practice they may fail to do so due to premature fixation on specific questions or the tedium of manual specification.

The primary interaction model of many popular visualization tools (e.g., [35, 44, 45]) is manual view specification. First, an analyst must select variables to examine. The analyst then may apply data transformations, for example binning or aggregation to summarize the data. Finally, she must design visual encodings for each resulting variable set. These actions may be expressed via code in a high-level language [44] or a graphical interface [35]. While existing tools are well suited to depth-first exploration strategies, the design of tools for breadth-oriented exploration remains an open problem. Here we focus on tools to assist breadth-oriented exploration, with the specific goal of promoting increased coverage of a data set.

To encourage broad exploration, visualization tools might automatically generate a diverse set of visualizations and have the user select
among them. However, for any given data table the choice of variables, transformations and visual encodings leads to a combinatorial explosion. Appropriate filtering and recommendation strategies are needed to prune the space and promote relevant views. Further, automation is unlikely to succeed on its own: as exploration proceeds, users will inevitably wish to focus on specific aspects of the data, requiring a browser that enables interactive steering of recommendations.

We present Voyager, a mixed-initiative system that couples faceted browsing with visualization recommendation to support exploration of multivariate, tabular data. Voyager exchanges specification for browsing, providing an organized display of recommended visualizations and enabling user input for both chart refinement and recommendation steering. To enable breadth-oriented exploration, Voyager privileges data variation (different variable selections and transformations) over design variation (different visual encodings of the same data). Underlying Voyager is the Compass recommendation engine, which enumerates, clusters and ranks visualizations according to both data properties and perceptual principles.

Voyager and Compass describe visualizations using Vega-lite, a new high-level specification language. Following in the footsteps of the Grammar of Graphics [44, 45] and Tableau’s VizQL [35], Vega-lite provides a convenient formalism for enumerating and reasoning about visualization designs. It also enables hand-offs between different visualization tools (e.g., for breadth- or depth-oriented exploration).

In this paper we describe Voyager’s motivating design principles, interface design, and system architecture. We also present a controlled user study focused on exploratory analysis of previously unseen data. We compare Voyager with PoleStar, a state-of-the-art view specification tool modeled on Tableau. Through analysis of both user performance and preference ratings, we find that Voyager better facilitates initial exploration and leads to increased data variable coverage, while PoleStar is preferable for answering more specific questions. We discuss resulting implications for visualization tools, in particular the need to integrate rapid exploration and targeted question-answering.

The systems described in this paper are all available as open-source software. In addition to the contributions of the present work, we hope these components will provide a shared platform for continued research on visual analysis and visualization recommendation tools.

2 RELATED WORK

Voyager draws on and extends prior research on exploratory search interfaces, visualization tools, and automated visualization design.

2.1 Exploratory Search

Voyager is partly inspired by work on exploratory search [26, 43], which shares a number of characteristics with exploratory data analysis (EDA) [15, 37]. Both involve activities of browsing (gaining an overview and engaging in serendipitous discovery) and searching (finding answers to specific questions). Users must clarify vague information needs, learn from exposure to information, and iteratively investigate solutions. In either exploratory search or EDA, people may be unfamiliar with the resources at hand (e.g., specific datasets), in the midst of forming goals, or unsure about how to best achieve their goals.

Exploratory search is typically supported through browser interfaces. Faceted browsing [47] is a popular approach for exploring collections in which users specify filters using metadata to find subsets of items sharing desired properties. Interactive query refinement, by up-voting or down-voting metadata or items [21, 22], can further facilitate exploration. In addition, recommender systems (sometimes in the form of collaborative filtering [18]) can be used to populate a browser with ostensibly relevant items, particularly when the number of items renders manual inspection intractable.

Here we seek to adapt these approaches to the domain of exploratory visual analysis. We contribute a browser interface for statistical graphics of a single relational table, and support navigation using facets such as the data schema, applicable data transformations, and valid visual encodings. As the set of possible charts is typically too large to manually inspect, we also contribute a visualization recommender system that attempts to produce relevant and perceptually effective views based on a user’s current exploration state.

2.2 Tools for Visualization Construction

Visualization tools offer various levels of expressivity for view construction. Chart typologies, such as the templates provided by spreadsheet programs, are a common form of specification. While easy to use, they typically support a limited range of charts and provide little support for iterative view refinement, a crucial component of EDA.

Visualization toolkits (e.g., [5, 6]) and design tools (e.g., [30, 32]) enable intricate designs but require detailed specification, hindering rapid exploration. Higher-level grammars, such as Wilkinson’s Grammar of Graphics [44, 45], can generate a wide range of statistical graphics, but still require textual specification.

On the other hand, Tableau (formerly Polaris) [35] enables similar specification of visualizations using a graphical interface. Users drag and drop data variables onto visual encoding “shelves”; the system then translates these actions into a high-level grammar (VizQL), enabling rapid view creation for targeted exploration of multidimensional databases. Voyager adopts a similar grammar-based approach to represent visualizations; however, it automatically generates views and allows users to browse a gallery of recommended views.

2.3 Visualization Recommendation

Much existing research on visualization recommendation focuses on suggesting visual encodings for an ordered set of user-specified data variables. Mackinlay’s APT [24] proposes a compositional algebra to enumerate the space of encodings. It then applies a set of expressiveness and effectiveness criteria based on the work of Bertin [4] and Cleveland [8] to prune and rank the set of visualizations. Sage [31] extends APT with a taxonomy of data properties for recommending visualizations. Tableau’s Show Me [25] introduces a set of heuristics to aid in the construction of small multiples and recommend chart types. Voyager draws on this line of work, for example using expressiveness and effectiveness criteria to evaluate visual encoding options. Voyager extends this prior research by contributing methods for also recommending data variables and transformations, and enabling interactive browsing and refinement of multiple recommendations.

After creating valid views, some tools [33, 46] rank views based on statistical properties to recommend interesting relationships between variables in the dataset. Other tools like SemViz [10] and VISO [41] recommend data to visualize using knowledge ontologies from the semantic web. They rely on data having extensive semantic labels, which may not always be available. Other systems [7, 11, 48] recommend visualizations based on analytical tasks and handle a small number of predefined tasks by design. Inferring the user’s task or asking the user to select one may preempt the iterative examination process at the heart of EDA. In the absence of perfect knowledge about the user’s task, Voyager presents visualizations of appropriate yet diverse view types that cover a variety of data variables for analysts to examine.

Multiple visualizations are often presented in a gallery to facilitate data exploration. The classic Design Galleries work [27] shows alternatives of user-generated views by varying the choice of encoding parameters. Van den Elzen [38] similarly allows users to browse a small number of parameter variants using small multiples of alternative views. Both allow users to explore a small neighborhood of the visualization specification space. In contrast, Voyager presents both data variations and design variations to facilitate broader data exploration.

VizDeck [29] presents a gallery of recommended charts based on statistical properties of interest. The system includes a voting mechanism by which users can adjust the ranking and supports keyword queries to search for charts. Voyager is instead designed to support browsing, which is more suitable for exploratory tasks [43]. Voyager seeks to promote broader coverage of the search space and navigation by including or omitting selected data variables.

3 USAGE SCENARIO

We first motivate the design of Voyager with a usage scenario. We illustrate how an analyst can use the system to examine data about cars [17]. The dataset contains 406 rows (cars) and 9 columns (variables).

Upon loading the data, the analyst examines the list of variables in the schema panel and their univariate summaries in the main gallery (Figure 2). Starting from the top left, she observes that most of the cars have 4, 6, or 8 cylinders (Figure 2a). Using the toggle button to sort the name histogram by number of records, she notices that Ford Pinto has the highest frequency, with 6 records (Figure 2b). The majority of the cars are from origin A (coded information, Figure 2c) and the years 1970-1982 (Figure 2d). Most of the quantitative variables appear to have log-normal distributions except for acceleration, which looks normally distributed (Figure 2e).

Fig. 2. The main gallery shows univariate summaries upon loading.

Intrigued by horsepower, the analyst clicks that variable in the schema panel. The system in turn updates the gallery with relevant visualizations (Figure 3). The exact match section (Figure 3a) lists charts with varied transformations of horsepower. The analyst inspects the dot plot of horsepower and hovers over the maximum (Figure 3e) to discover that the car with the highest horsepower is a Pontiac Grand Prix. She then glances at the suggestion section (Figure 3b), which shows charts with additional variables. She notices a correlation between horsepower and cylinder, and bookmarks the view so she can revisit it for targeted question answering after she completes her initial exploration.

Fig. 3. Selecting horsepower updates the main gallery. (a) The exact match section shows different transformations for horsepower. (b) The suggestion section shows charts with suggested variables in addition to horsepower. (c,d) Each section’s header bar describes its member views. (e) Hovering over a point reveals a tooltip with more information.

The analyst wonders if other variables might be correlated with both horsepower and cylinder, so she selects cylinder in the schema panel. The display updates as shown in Figure 1. Looking at the first view in the suggestion section (Figure 1, leftmost view in the bottom section), she sees that acceleration is correlated with both variables. The analyst would like to see other ways to visualize these three variables, so she clicks the view’s expand button. This action opens the expanded gallery (Figure 4), which shows different encodings of the same data. She selects a small multiple view grouped by cylinder (Figure 4b), so she can easily spot outliers in each group (Figure 5).

Fig. 4. The expanded gallery for cylinder, horsepower, and acceleration. (a) The main panel presents the selected chart in an enlarged view. (b) The sidebar shows alternative encodings for the expanded data.

At this point, the analyst wants to explore other parts of the data. She clicks the reset button to clear the selection and starts selecting new variables of interest to look at relevant visualizations. As her exploration proceeds, she bookmarks interesting views for future investigation in the bookmark gallery (Figure 6).

4 THE DESIGN OF VOYAGER

In this section we present our motivating design considerations and describe the design of the Voyager user interface. We defer discussion of technical implementation details to the next section.

4.1 Design Considerations

While creating Voyager we faced many design decisions. The interface should not overwhelm users, yet must enable them to rapidly browse collections of visualizations with minimal cognitive load. To guide our process, we developed a set of considerations to inform visualization recommendation and browsing. These considerations were informed by existing principles for visualization design [24], exploratory search [14, 43], and mixed-initiative systems [19], then refined through our experiences across multiple design iterations.

C1. Show data variation, not design variation. We adapt this well-known maxim from Tufte [36] to the context of visualization galleries. To encourage breadth-oriented exploration [28], Voyager prioritizes showing data variation (different variables and transformations) over design variation (different encodings of the same data). To discourage premature fixation and avoid the problem of “empty results” [14], Voyager shows univariate summaries of all variables prior to user interaction. Once users make selections, it suggests additional variables beyond those explicitly selected. To help users stay oriented, avoid combinatorial explosion, and reduce the risk of irrelevant displays, Voyager currently “looks ahead” by only one variable at a time.

C2. Allow interactive steering to drive recommendations. Analysts’ interests will evolve as they browse their data, and so the gallery must be adaptable to more focused explorations. To steer the recommendation engine, Voyager provides facet controls with which analysts can indicate those variables and transformations they wish to include.

C3. Use expressive and effective visual encodings. Inspired by prior work on automatic visualization design [24, 25], Voyager prevents misleading encodings by using a set of expressiveness criteria and ranks encodings based on perceptual effectiveness metrics [4, 8].

Fig. 6. A bookmark gallery of visualizations saved by an analyst.

C4. Promote reading of multiple charts in context. Browsing multiple visualizations is a complex cognitive process, arguably more so than image or product search. We must consider not only the comprehension of charts in isolation, but also in aggregate. When possible, Voyager consistently orders related charts such that effort spent interpreting one chart can aid interpretation of the next. For example, Voyager aligns axis positions and uses consistent colors for variables (Figure 1). Voyager organizes suggested charts by clustering encoding variations of the same data and showing a single top-ranked exemplar of each cluster. If desired, users can drill down to browse varied encodings of the data. Voyager also partitions the main gallery into a section that involves only user-selected variables and a section that includes additional (non-selected) variables recommended by the system.

C5. Prefer fine-tuning to exhaustive enumeration. Even a simple chart might have a number of important variations, including the choice of sort order, aspect ratio, or scale transform (e.g., linear vs. log). Rather than using up space in the gallery with highly similar designs, Voyager collapses this space of options to a single chart with default parameters, but supports simple interactions to enable fine-tuning.

C6. Enable revisitation and follow-up analysis. Successful explorations may result in a number of insights worthy of further study. Exploratory tools should assist the transition to other stages of analysis. Voyager provides a bookmarking mechanism to allow analysts to revisit interesting views or to share them with collaborators. By representing all visualizations in a high-level grammar (Vega-lite), Voyager can easily export visualizations for publishing or sharing with other tools.

4.2 The Voyager User Interface

Voyager’s interface (Figure 1) consists of a schema panel (left) and a visualization gallery (right). Analysts can select variables and desired transformations in the schema panel; these selections become input for the recommendation algorithm. The main gallery presents recommended visualizations. Each chart supports interactive refinement, bookmarks, and expansion to increase the chart size and see related views. Undo buttons are provided in the top panel (Figure 1, top).

Fig. 5. Scatter plots of horsepower vs. acceleration, partitioned by cylinder. An analyst hovers the mouse over an outlier to view details-on-demand.

4.2.1 The Schema Panel

The schema panel (Figure 1, left) presents a list of all variables in the data table. By default the list is ordered by data type and then alphabetically. For each variable, the schema panel shows the following items from left to right: (1) a checkbox representing inclusion of the variable in the recommendation, (2) a caret button for showing a popup panel for selecting transformations, (3) a data type icon, (4) the variable name and function, and (5) a basic information button, which upon hover shows descriptive statistics and samples in a tooltip.

To steer the recommendations (C2), users can click a variable to toggle its inclusion or can select transformation functions in the popup panel revealed by clicking the caret. Selected variables are also highlighted with a surrounding capsule. Similar capsules are used in the gallery to facilitate comparison (C4). Data transformation functions are indicated using bold capitalized text (e.g., MEAN).

4.2.2 The Main Gallery: Browsing Recommendations

The main gallery presents views that represent different data subsets relevant to the selected variables. To prioritize data variation over design variation (C1), each view in the main gallery shows the top-ranked encoding for each unique set of variables and transformations. To help provide meaningful groups (C4), the gallery is divided into two sections: exact match and suggestion. The top of each section (Figure 3c-d) contains a header bar that provides a description of its member views. The exact match section (Figure 3a) presents views that include only selected variables. In contrast, the suggestion section (Figure 3b) includes suggested variables in addition to selected variables. If the user has not selected any variables (as in Figure 2), only the suggestion section is shown, populated with univariate summaries (C1).

Each entry in the gallery contains an interactive visualization. The top of each view lists its member variables in capsules. The capsules for user-selected variables (solid border, darker background, Figure 7a) are visually differentiated from capsules for suggested variables (dashed border, lighter background, Figure 7b). The top right of each view (Figure 7c) contains bookmark and expand view buttons. During exploration, analysts can bookmark views they wish to share or revisit (C6); bookmarked visualizations can be viewed in the bookmark gallery (Figure 6). Analysts can also hover over data points to view details-on-demand (Figure 5).

Voyager attempts to parameterize and lay out charts such that reading one chart facilitates reading of subsequent related charts (C4). To do so, Voyager places charts with shared axes in close proximity to each other. Moreover, Voyager suggests the same visual encoding (axis position, sorting, spacing, palettes, etc.) for the same variable to aid scanning and reduce visual clutter. For example, it uses a consistent color palette for cylinders and aligns y-axes for cylinders and horsepower (Figure 1). To further aid comparison, all capsules for the same variable are highlighted when the user hovers over a capsule.

Fig. 7. The top of each view shows user-selected variables (a), suggested variables (b), and buttons to bookmark or expand the view (c).

4.2.3 The Expanded Gallery: Inspecting Alternative Encodings

An analyst can click a chart’s expand view button to invoke the expanded gallery (Figure 4). This mode allows analysts to interact with a larger visualization and examine alternative visual encodings of the same data. The top-right corner of the main panel (Figure 4c) includes controls for interactive refinement (C5): transposing axes, sorting nominal or ordinal dimensions, and adjusting scales (e.g., between linear and log). Thumbnails of alternative encodings are presented in a sidebar. Analysts can click a thumbnail to load the chart in the main panel.

5 THE VOYAGER SYSTEM

We now describe Voyager’s system architecture. Figure 8 depicts the relationships between the major system components. Voyager’s browser interface displays visualizations and supports user navigation and interaction. Visualizations are specified using Vega-lite, a declarative grammar that compiles to detailed Vega [40] visualization specifications. The Compass recommendation engine takes user selections, the data schema and statistical properties as input, and produces recommendations in the form of Vega-lite specifications. The recommendations are clustered by data and visual similarity, and ranked by perceptual effectiveness heuristics. Each of these components is implemented in JavaScript, and is individually available as an open-source project.

Fig. 8. Voyager’s system architecture. Voyager uses Compass to generate clustered and ranked Vega-lite specifications. These specifications are translated to Vega and rendered in the Voyager interface.

5.1 Vega-lite: A Formal Model For Visualization

We developed the Vega-lite specification language to provide a formal model for representing visualizations in Voyager. Vega-lite is modeled after existing tools and grammars such as Tableau’s VizQL [35], ggplot2 [44], and Wilkinson’s Grammar of Graphics [45]. Vega-lite specifications consist of a set of mappings between visual encoding channels and (potentially transformed) data variables. Like other high-level grammars, these specifications are incomplete, in the sense that they may omit details ranging from the type of scales used to visual elements such as fonts, line widths and so on. The Vega-lite compiler uses a rule-based system to resolve these ambiguities and translate a Vega-lite specification into a detailed specification in the lower-level Vega visualization grammar [40]. Though initially developed for Voyager, Vega-lite can serve as a model for other tools. For example, we built the PoleStar visualization specification tool (§6.1) using Vega-lite.

A Vega-lite specification is a JSON object (see Listing 1) that describes a single data source (data), a mark type (marktype), key-value visual encodings of data variables (encoding), and data transformations including filters (filter) and aggregate functions. Vega-lite assumes a tabular data model: each data source is a set of records, where each record has values for the same set of variables.

Vega-lite currently supports Cartesian plots (with mark types points, bars, lines or areas), and pivot tables (with mark type text). Available encoding channels include position (x, y), color, shape, size, level of detail (detail) for specifying additional group-by values, and facets (row, column) for creating trellis plots [3, 36]. A specification of each encoding channel (encoding) includes the assigned variable’s name, data type, scale and axis properties, and transformations. Vega-lite supports nominal, ordinal, quantitative, and temporal data types [34]. Supported transformations include aggregation (summarize), binning (bin), sorting (sort), and unit conversion for temporal variables (timeUnit). For example, year, month, and other time abstraction values can be derived from temporal variables.

    {
      "data": {"url": "data/cars.json"},
      "marktype": "point",
      "encoding": {
        "x": {
          "name": "Miles_per_Gallon",
          "type": "Q",
          "summarize": "mean"
        },
        "y": {
          "name": "Horsepower",
          "type": "Q",
          "summarize": "mean"
        },
        "row": {
          "name": "Origin",
          "type": "N",
          "sort": [{"name": "Horsepower", "summarize": "mean", "reverse": true}]
        },
        "color": {"name": "Cylinders", "type": "N"}
      }
    }

Listing 1. A Vega-lite specification of the visualization shown in Figure 10. The JSON object specifies a trellis of scatter plots for a dataset about cars. Each plot shows for one origin (row) the mean miles per gallon (x) and mean horsepower (y), broken down by the number of cylinders (color). Origin and number of cylinders are nominal while miles per gallon and horsepower are quantitative. The scatter plots are sorted by the mean horsepower per origin.

Vega-lite makes default assignments for parameters such as axis scales, colors, stacking, mark sizes, and bin count. These parameters can be explicitly specified to override default values. Nominal variables are mapped to ordinal scales by alphabetical order unless an explicit order is provided. When assigned to color, nominal variables are mapped to hue using Tableau’s categorical color palette, while other variables are mapped to luminance. When a color channel is used with a bar or area mark type, Vega-lite creates a stacked chart. The band size of an ordinal scale is automatically adjusted based on the assigned variable’s cardinality. Vega-lite determines properties such as bin count based on the encoding channel: the default max bin count is 7 for color and 20 for positional encodings.

In the future, we plan to extend Vega-lite with additional features such as cartographic mapping, polar coordinates, and layering multiple variables (including dual axis charts). The goal of this work, however, is to investigate different modes of visual exploration, and the current implementation of Vega-lite is sufficiently expressive for this purpose.

5.2 The Compass Recommendation Engine

The goal of the Compass recommendation engine is to support rapid, open-ended exploration in Voyager. Compass generates an expressive set of visualization designs (C3) represented using Vega-lite specifications. Compass also prunes the space of recommendations based on user selection (C2) and clusters results into meaningful groups (C4).

Compass takes the following input: (1) the data schema, which contains a set of variables (D); (2) descriptive statistics for each variable including cardinality, min, max, standard deviation, and skew; (3) the user selection, which consists of a set of selected variables (U ⊂ D), preferred transformations for each variable, and a set of excluded variables.
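
To make the three inputs concrete, the sketch below assembles them for a toy column of values. Everything here is an illustrative assumption: the field names (`schema`, `stats`, `selection`), the helper names `describe` and `validateSelection`, the use of population (rather than sample) standard deviation and skewness, and the four made-up cylinder values. None of it is taken from Compass's actual API or from the cars dataset.

```javascript
// Descriptive statistics for one numeric column (input 2).
// Assumption: population standard deviation and skewness are used.
function describe(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const variance = values.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  const stdev = Math.sqrt(variance);
  // Skewness as the third standardized moment; 0 when all values are equal.
  const skew =
    stdev === 0
      ? 0
      : values.reduce((a, b) => a + ((b - mean) / stdev) ** 3, 0) / n;
  return {
    cardinality: new Set(values).size,
    min: Math.min(...values),
    max: Math.max(...values),
    stdev,
    skew,
  };
}

// Check the user selection (input 3) against the schema D (input 1):
// selected and excluded variables must exist, and must not overlap.
function validateSelection(schema, selection) {
  const names = new Set(schema.map((v) => v.name));
  const mentioned = [...selection.selected, ...selection.excluded];
  const allKnown = mentioned.every((name) => names.has(name));
  const overlap = selection.selected.some((name) =>
    selection.excluded.includes(name)
  );
  return allKnown && !overlap;
}

// Hypothetical input object; the type codes follow Listing 1 (Q, N).
const compassInput = {
  schema: [
    { name: "Horsepower", type: "Q" },
    { name: "Cylinders", type: "N" },
    { name: "Origin", type: "N" },
  ],
  stats: { Cylinders: describe([4, 4, 6, 8]) }, // made-up sample values
  selection: {
    selected: ["Horsepower"],
    transformations: { Horsepower: "bin" },
    excluded: ["Origin"],
  },
};
console.log(validateSelection(compassInput.schema, compassInput.selection));
```

Computing the statistics once, up front, is a natural fit for the design described above: later phases can consult cardinality or skew without rescanning the data table.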

Fig. 9. Compass’s 3-phase recommendation engine. (a) Variable selection takes user-selected variable sets and suggests additional variables. (b) Data transformation applies functions including aggregation and binning to produce data tables for each variable set. (c) Encoding design generates visual encodings for each data table, ranks results by perceptual effectiveness score, and prunes visually similar results.

Compass enumerates, ranks and prunes recommendations in three phases, taking output from each phase as input to the next phase. The process is depicted in Figure 9. First, Compass selects variables by taking user-selected variable sets and suggesting additional variables. It then applies data transformations, including aggregation and binning, to produce a set of derived data tables. For each data table, it designs encodings based on expressiveness and effectiveness criteria (C3) and prunes visually similar results to avoid exhaustive enumeration (C5). The multiple phases of pruning and ranking allow Compass to constrain the search space early in the generation process, and to produce clusters of visualizations that are grouped by their underlying data tables.

Our initial Compass design is admittedly modest. Though more advanced recommender systems are possible, the primary goal of this paper is to develop and evaluate an overall approach to breadth-oriented data exploration. In lieu of more sophisticated methods, we intentionally limit ourselves to “single-step” variable additions and interpretable, deterministic heuristics for pruning and ranking. We view the design and evaluation of improved recommenders (likely expressible within the current Compass architecture) as important future research.

5.2.1 Selecting Variables

Compass first suggests variables beyond what the user has explicitly selected, producing new variable sets for encoding. The primary goals of this phase are to recommend additional variables that the analyst might otherwise overlook (C1) and to avoid the “empty results” problem [14].

Prior to user selection, Compass suggests a univariate summary of each variable. When a user selects a variable set U (of size |U| = k), Compass returns a sequence of variable sets Σ = [U, V_1, V_2, ..., V_n], where U is the original user selection, n = |D − U|, and each V_i contains k + 1 variables: the k user-selected variables U along with exactly one additional (non-selected) variable v_i ∈ D − U, such that V_i = U ∪ {v_i}. For example, if a user selects {horsepower}, Compass may return the variable sets U = {horsepower}, V_1 = {horsepower, cylinder}, V_2 = {horsepower, year}, and so on.

Compass recommends all non-selected variables (all v_i ∈ D − U) by default, but analysts can interactively exclude variables from the suggestions to focus on a particular variable set of interest (C2).

The generated variable sets are returned in a sorted order. The user’s selected variable set is always ranked first, as it is the most relevant to the user’s specified intention. The remaining variable sets [V_1, V_2, ..., V_n] are ordered by the type and name of the recommended variable v_i, consistent with the display order in Voyager’s schema panel. This approach provides a predictable and consistent ordering, which works well for the common case of datasets with a sufficiently bounded number of variables to allow users to scroll through the recommendations. For datasets with a large number of variables, we plan to extend Compass to support multiple relevancy rankings based on statistical measures (e.g., [33]) and allow analysts to select measures that fit their interests.

5.2.2 Applying Data Transformations

For each suggested variable set V ∈ Σ from the first phase, Compass enumerates applicable transformations for each variable v ∈ V to recommend an ordered set of data tables Γ.

Compass produces both raw tables without aggregation to provide details and aggregate tables to provide summaries. For raw tables, each variable is untransformed by default, but users can perform binning if desired. For aggregate tables, variables either serve as measures (values amenable to aggregation) or dimensions (values to group by). By default, each quantitative variable is treated as a measure, while each ordinal, nominal, or temporal variable is treated as a dimension. Compass averages (MEAN) quantitative measures by default; users can choose to apply other aggregation functions such as SUM, MIN, MAX. Averages may be affected by outliers or mix effects [1], but are also more likely to be generally useful. In our experience, defaulting to sums results in plots that are not always meaningful and that are skewed when the number of records varies across dimensions. Users can also choose to treat quantitative variables as dimensions by either using the untransformed values or binning. Compass determines the largest units of time within the extents of temporal variables. For example, if a variable spans within one year, MONTH is applied. Ordinal variables are untransformed by default.

Derived tables are first ordered by the rank of their corresponding variable set. (Tables derived from U come before tables from V_1, which in turn come before tables from V_2, and so on.) For tables from the same variable set, Compass then orders raw tables before aggregate tables to provide a consistent ordering (C4).

5.2.3 Designing Encodings

For each data table T ∈ Γ, Compass applies visualization design best practices drawn from prior research [4, 9, 24, 25, 42] to generate and rank a set of encodings E_T (C3).

Generation. Compass first enumerates candidates for the encoding set E_T by composing permutations of data variables, visual encoding channels, and mark types. It first assigns each variable v ∈ T to all permitted visual encoding channels (Table 1) to generate a set of mappings M_T. Then it generates each encoding candidate by combining each mapping m ∈ M_T with each valid mark type.

Compass considers multiple criteria to determine whether a mark type is appropriate for a given mapping m. For a given mark type, some encoding channels are required, while some are disallowed (see Table 2). For example, Compass requires a mapping to have both x and y encodings if used with a line or area mark. Such constraints ensure the production of appropriate visualizations (here, a proper line or area chart). After determining the set of supported mark types, Compass assigns the mark type that best respects expressiveness criteria according to the rankings listed in Table 3, indexed by the data types of the x and/or y encodings.
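The first two phases (Sections 5.2.1 and 5.2.2) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the Compass implementation: the function names are hypothetical, the schema is reduced to a name-to-type dictionary, and the relative order of data types in TYPE_ORDER is a guess, since the text says only that sets are ordered "by the type and name" of the added variable.

```python
from typing import AbstractSet, Dict, List, Set

# Assumed sort order of data types (not specified in the text).
TYPE_ORDER = {"nominal": 0, "ordinal": 1, "temporal": 2, "quantitative": 3}


def suggest_variable_sets(
    schema: Dict[str, str],
    selected: AbstractSet[str],
    excluded: AbstractSet[str] = frozenset(),
) -> List[Set[str]]:
    """Phase 1: return [U, V_1, ..., V_n] where V_i = U | {v_i}.

    Each V_i adds exactly one non-selected, non-excluded variable to U.
    """
    candidates = [v for v in schema if v not in selected and v not in excluded]
    candidates.sort(key=lambda v: (TYPE_ORDER[schema[v]], v))
    suggestions = [set(selected) | {v} for v in candidates]
    # Before any selection, only the univariate sets are suggested.
    if not selected:
        return suggestions
    return [set(selected)] + suggestions


def default_aggregate_roles(
    schema: Dict[str, str], variables: AbstractSet[str]
) -> Dict[str, str]:
    """Phase 2 defaults for an aggregate table: quantitative variables are
    averaged (MEAN); ordinal, nominal and temporal ones are dimensions."""
    return {
        v: "MEAN" if schema[v] == "quantitative" else "dimension"
        for v in sorted(variables)
    }
```

With a schema of horsepower (quantitative), cylinder (ordinal), name (nominal) and year (temporal), selecting {horsepower} yields four sets: the selection itself followed by one two-variable set per remaining variable. Binning, alternative aggregates, and the choice of time unit are omitted from this sketch.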

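The mark-type assignment at the end of the generation step can be sketched as a lookup over the data types of the positional encodings, filtered by the required channels. The rankings and required channels below are transcribed from Tables 3 and 2; the function names are hypothetical, and combinations that Table 3 does not list simply return no mark in this sketch.

```python
from typing import List, Optional, Sequence, Set


def ranked_marks(position_types: Sequence[str]) -> List[str]:
    """Mark types ranked per Table 3 for the data types on x and/or y.

    Types are "Q", "O", "N" or "T". Unlisted combinations return [].
    """
    kinds = sorted(position_types)
    if kinds == ["Q"]:
        return ["tick", "point", "text"]
    if len(kinds) != 2:
        return []
    if kinds == ["Q", "Q"]:
        return ["point", "text"]
    if "Q" in kinds:
        other = kinds[0] if kinds[1] == "Q" else kinds[1]
        if other == "N":
            return ["bar", "point", "text"]
        return ["line", "bar", "point", "text"]  # other is T or O
    if set(kinds) <= {"O", "N"}:
        return ["point", "text"]
    return []


def has_required_channels(mark: str, channels: Set[str]) -> bool:
    """Required channels per mark type, per Table 2."""
    if mark in ("line", "area"):
        return {"x", "y"} <= channels
    if mark == "text":
        return "text" in channels and bool({"row", "column"} & channels)
    # point, tick and bar require x or y
    return bool({"x", "y"} & channels)


def choose_mark(position_types: Sequence[str], channels: Set[str]) -> Optional[str]:
    """Return the highest-ranked mark whose required channels are present."""
    for mark in ranked_marks(position_types):
        if has_required_channels(mark, channels):
            return mark
    return None
```

For instance, a quantitative variable against a temporal one on x and y selects a line mark, while a single quantitative variable on x selects a tick mark.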
Table 1. Permitted encoding channels for each data type in Compass, ordered by perceptual effectiveness rankings.
  Data Types                Encoding Channels
  quantitative, temporal    x, y > size > color > text
  ordinal                   x, y > column, row > color > size
  nominal                   x, y > column, row > color > shape

Table 2. Required and permitted encoding channels by mark type.
Supported-channel columns: X, Y; Column, Row; Color; Shape; Size; Detail; Text.
  Mark Types    Required Channels
  point         x or y
  tick          x or y
  bar           x or y
  line, area    x and y
  text          text and (row or column)

Table 3. Permitted mark types based on the data types of the x and y channels. N, O, T, Q denote nominal, ordinal, temporal and quantitative types, respectively.
  Data Types             Mark Types
  Q                      tick > point > text
  (O or N) × (O or N)    point > text
  Q × N                  bar > point > text
  Q × (T or O)           line > bar > point > text
  Q × Q                  point > text

Table 4. Encoding channel groups used to perform clustering.
  Positions          x, y
  Facets             column, row
  Level of detail    color (hue), shape, detail
  Retinal measures   color (luminance), size

In addition to Tables 1-3, Compass considers interactions among visual variables (e.g., the perceptual separability of visual channels [42]) and avoids creating ineffective charts. It produces mappings that encode color, size, or shape only when the mappings also contain both x and y. In other words, Compass omits dot plots that use these encodings, as they would likely suffer from occlusion and visual clutter.

Ranking. Compass ranks the generated encodings using perceptual effectiveness metrics. Compass applies prior work by Cleveland [8] and Mackinlay [24] to rank the effectiveness of each visual channel based on a variable’s data type (Table 1). Compass also considers the cardinality, or number of unique values, of a data variable. Encoding high cardinality variables with color, shape, row, or column can lead to poor color or shape discrimination or massive, sparse trellis plots. Moreover, the effectiveness of each visual channel is not measured in isolation. Since over-encoding can impede interpretation [42], Compass penalizes encodings that use multiple retinal encodings (e.g., both color and shape, or both color and size).

Compass takes into account that the produced charts will be presented in a gallery, with the goal of promoting charts that are easier to read (C4). Vertical bar charts and histograms are preferred if their dimensions are binned quantitative or temporal variables. Otherwise, horizontal bar charts are preferred as their axis labels are easier to read. Horizontal line and area charts are favored over vertical ones. Compass also privileges encodings that use less screen space, and hence are more easily browsed in the gallery. For example, colored scatter plots (Figure 4a) are ranked higher than small multiple plots (Figure 5).

Compass maps all of the above features to scalar values and calculates a weighted sum to derive the effectiveness score s(e) for each encoding candidate e. We have manually tuned the current weights and scores for each feature through a series of refinements and tests. Automatic determination of these parameters remains as future work.

Clustering. To prevent exhaustive enumeration (C5), Compass groups encoding candidates that map the same variables to similar encoding channels listed in Table 4, and suggests only the most effective view in each group. This includes variants caused by swapping variables in the positional or facet encodings (producing transposed charts as depicted in Figure 9d), or by selecting alternative retinal encodings (e.g., shape instead of color). All the suggested views are also sorted by the effectiveness score s. Therefore, for each T ∈ Γ, Compass produces an ordered set of visually different visualizations E_T, ranked by their perceptual effectiveness.

As a result, Compass recommends clusters of visualizations grouped by their corresponding data tables. To privilege data variation over design variation (C1), the main gallery presents the top ranked visualization for each table T ∈ Γ. When the user expands a view that shows data table T, the expanded gallery displays a set of visually different views E_T and provides an interface for refining presented views (Figure 4c).

5.3 Implementation Notes

We implemented Voyager as a web application using the AngularJS framework. When the user selects a dataset, the application asynchronously loads the data, determines the variable types, and calculates descriptive statistics. The type inference procedure determines whether a variable is nominal, ordinal, quantitative or temporal based on primitive types (e.g., integer, string, float), special formats (for dates), and statistics (e.g., low cardinality integers are treated as ordinal).

In the main gallery (§4.2.2), views are laid out using HTML5’s flex display. Each view has the same height and has a maximum width. A view that displays a visualization larger than the view size includes a local scroll bar, which is activated by hovering for 500ms in order to disentangle local and global scrolling. By default, Voyager loads up to a fixed number of views to safeguard application performance, but users can load more visualizations as they scroll down the page.

6 EVALUATION: VOYAGER VS. POLESTAR

We conducted a user study to contrast recommendation browsing with manual chart construction, focusing on exploratory analysis of previously unseen data. We compared Voyager with PoleStar, our own implementation of a visualization specification interface (Figure 10).

We hypothesized that Voyager would encourage breadth-first consideration of the data, leading to higher coverage of unique variable combinations. Given its direct control over visual encodings, we expected PoleStar to be better for targeted depth-first question answering.

6.1 Study Design

Our study followed a 2 (visualization tool) × 2 (dataset) mixed design. Each participant conducted two exploratory analysis sessions, each with a different visualization tool and dataset. We counterbalanced the presentation order of tools and datasets across subjects.

Visualization Tools. Participants interacted with two visualization tools: Voyager and PoleStar, a manual specification tool. Rather than use an existing tool such as Tableau, we implemented PoleStar (named in honor of Polaris [35]) to serve as a baseline interface, allowing us to control for external factors that might affect the study. Like Voyager, PoleStar models visualizations using Vega-lite. In fact, any visualization suggested by Voyager can also be constructed in PoleStar, ensuring comparable expressivity. PoleStar also features similar UI elements, including field capsules, bookmarks, and an undo mechanism.

Figure 10 illustrates PoleStar’s interface. The left-hand panel presents the data schema, listing all variables in the dataset. Next to the data schema are the encoding shelves, which represent each encoding channel supported by Vega-lite. Users can drag and drop a variable onto a shelf to establish a visual encoding. Users can also modify properties of the data (e.g., data types, data transformations) or the visual encoding variable (e.g., color palette or sort order) via popup menus. The mark type can be changed via a drop-down menu. Upon user interaction, PoleStar generates a new Vega-lite specification and immediately updates the display.

Datasets. We provided two datasets for participants to explore. One is a dataset of motion pictures (“movies”) comprising title, director, genre, sales figures, and ratings from IMDB and Rotten Tomatoes. The table has 3,201 records and 15 variables (7 nominal, 1 temporal, 8 quantitative). The other dataset is a redacted version of FAA wildlife airplane strike records (“birdstrikes”). The table has 10,000 records and 14 variables (8 nominal, 1 geographic, 1 temporal, 4 quantitative).

Fig. 10. PoleStar, a visualization specification tool inspired by Tableau. Listing 1 shows the generated Vega-lite specification.

We removed some variables from the birdstrikes data to enforce parity among datasets. We chose these datasets because they are of real-world interest, are of similar complexity, and concern phenomena accessible to a general audience.

Participants. We recruited 16 participants (6 female, 10 male), all students (14 graduate, 2 undergraduate) with prior data analysis experience. All subjects had used visualization tools including Tableau, Python/matplotlib, R/ggplot, or Excel.¹ No subject had analyzed the study datasets before, nor had they used Voyager or PoleStar (though many found PoleStar familiar due to its similarity to Tableau). Each study session lasted approximately 2 hours. We compensated participants with a $15 gift certificate.

¹ All participants had used Excel. Among other tools, 9 had used Tableau, 13 had used Python/matplotlib and 9 had used R/ggplot.

Study Protocol. Each analysis session began with a 10-minute tutorial, using a dataset distinct from those used for actual analysis. We then briefly introduced subjects to the test dataset. We asked participants to explore the data, and specifically to “get a comprehensive sense of what the dataset contains and use the bookmark features to collect interesting patterns, trends or other insights worth sharing with colleagues.” To encourage participants to take the analysis task seriously, we asked them to verbally summarize their findings after each session using the visualizations they bookmarked. During the session, participants verbalized their thought process in a think-aloud protocol. We did not ask them to formulate any questions before the session, as doing so might bias them toward premature fixation on those questions. We gave subjects 30 minutes to explore the dataset. Subjects were allowed to end the session early if they were satisfied with their exploration.

All sessions were held in a lab setting, using Google Chrome on a Macbook Pro with a 15-inch retina display set at 2,880 by 1,980 pixels. After completing two analysis sessions, participants completed an exit questionnaire and short interview in which we reviewed subjects’ choice of bookmarks as an elicitation prompt.

Collected Data. An experimenter (either the first or second author) observed each analysis session and took notes. Audio was recorded to capture subjects’ verbalizations for later review. Each visualization tool recorded interaction logs, capturing all input device and application events. Finally, we collected data from the exit survey and interview, including Likert scale ratings and participant quotes.

6.2 Analysis & Results

We now present a selected subset of the study results, focusing on data variable coverage, bookmarking activity, user survey responses, and qualitative feedback. To perform hypothesis testing over user performance data, we fit linear mixed-effects models [2]. We include visualization tool and session order as fixed effects, and dataset and participant as random effects. These models allow us to estimate the effect of visualization tool while taking into account variance due to both the choice of dataset and individual performance. We include an intercept term for each random effect (representing per-dataset and per-participant bias), and additionally include a per-participant slope term for visualization tool (representing varying sensitivities to the tool used). Following common practice, we assess significance using likelihood-ratio tests that compare a full model to a reduced model in which the fixed effect in question has been removed.

6.2.1 Voyager Promotes Increased Data Variable Coverage

To assess the degree to which Voyager promotes broader data exploration, we analyze the number of unique variable sets (ignoring data transformations and visual encodings) that users are exposed to. While users may view a large number of visualizations with either tool, these might be minor encoding variations of a data subset. Focusing on unique variable sets provides a measure of overall dataset coverage.

While Voyager automatically displays a number of visualizations, this does not ensure that participants are attending to each of these views. Though we lack eye-tracking data, prior work indicates that the mouse cursor is often a valuable proxy [12, 20]. As a result, we analyze both the number of variable sets shown on the screen and the number of variable sets a user interacts with. We include interactions such as bookmarking, view expansion, and mouse-hover of a half-second or more (the same duration required to activate view scrolling). Analyzing interactions provides a conservative estimate, as viewers may examine views without manipulating them. For PoleStar, in both cases we simply include all visualizations constructed by the user.

We find significant effects of visualization tool in terms of both the number of unique variable sets shown (χ²(1, N = 32) = 38.056, p < 0.001) and interacted with (χ²(1, N = 32) = 19.968, p < 0.001). With Voyager, subjects were on average exposed to 69.0 additional variable sets (over a baseline of 30.6) and interacted with 13.4 more variable sets (over a baseline of 27.2). In other words, participants were exposed to over 3 times more variable sets and interacted with 1.5 times more when using Voyager.

In the case of interaction, we also find an effect due to the presentation order of the tools (χ²(1, N = 32) = 5.811, p < 0.05). Subjects engaged with an average of 6.8 more variable sets (over the 27.2 baseline) in their second session.

6.2.2 Bookmark Rate Unaffected by Visualization Tool

We next analyze the effect of visualization tool on the number of bookmarked views. Here we find no effect due to tool (χ²(1, N = 32) = 0.060, p = 0.807), suggesting that both tools enable users to uncover interesting views at a similar rate. We do observe a significant effect due to the presentation order of the tools (χ²(1, N = 32) = 9.306, p < 0.01). On average, participants bookmarked 2.8 additional views (over a baseline of 9.7 per session) during their second session. This suggests that participants learned to perform the task better in the latter session.

6.2.3 Most Bookmarks in Voyager include Added Variables

Of the 179 total visualizations bookmarked in Voyager, 124 (69%) include a data variable automatically added by the recommendation engine. Drilling down, such views constituted the majority of bookmarks for 12/16 (75%) subjects. This result suggests that the recommendation engine played a useful role in surfacing visualizations of interest.

6.2.4 User Tool Preferences Depend on Task

In the exit survey we asked subjects to reflect on their experiences with both tools. When asked to rate their confidence in the comprehensiveness of their analysis on a 7-point scale, subjects responded similarly for both tools (Voyager: µ = 4.88, σ = 1.36; PoleStar: µ = 4.56, σ = 1.63; W = 136.5, p = 0.754). Subjects rated both tools comparably with respect to ease of use (Voyager: µ = 5.50, σ = 1.41; PoleStar: µ = 5.69, σ = 0.95; W = 126, p = 0.952).

Participants indicated which tool they would prefer for the tasks of exploration vs. targeted analysis. Subjects roundly preferred Voyager for exploration (15/16, 94%) and PoleStar for question answering (15/16, 94%), a significant difference (χ²(1) = 21.125, p < 0.001).

Finally, we asked subjects to rate various aspects of Voyager. All but one (15/16, 94%) rated Voyager’s recommendations as “Helpful” or “Very Helpful”. When asked if Voyager’s inclusion of additional (non-selected) variables was helpful, 14/16 (88%) responded “Helpful” or “Very Helpful”, with 2 responding “Neutral”. We also asked participants to agree or disagree with the statement “The recommendations made by Voyager need improvement.” Here, 8/16 subjects (50%) agreed with the statement, 5/16 (31%) were neutral and 3/16 (19%) disagreed. This last result surprised us, as we expected all subjects would request refined relevance rankings. In aggregate, these results suggest that though there remains room for improvement, the current Voyager system already provides a valuable adjunct to exploratory analysis.

6.2.5 Participant Feedback: Balancing Breadth & Depth

Participants’ comments reinforce the quantitative results. Subjects appreciated Voyager’s support for broad-based exploration. One said that “Voyager gave me a lot of options I wouldn’t have thought about on my own, it encouraged me to look more deeply at data, even data I didn’t know a lot about”. Another “found Voyager substantially more helpful in helping me learn and understand the dataset,” while a third felt Voyager “prompted me to explore new questions in a way that didn’t derail me from answering follow-up questions.”

Subjects also reflected on the complementary nature of Voyager and PoleStar for the tasks of breadth- vs. depth-oriented exploration. One user noted that “with Voyager, I felt like I was scanning the generated visualizations for trends, while with PoleStar, I had to think first about what questions I wanted to answer, then make the visualizations for them.” Another wrote that Voyager “is really good for exploration but cumbersome for specific tasks.” All but one subject wished to use a hybrid of both tools in the future. For example, one participant said that “if I have to just get an overview of the data I would use Voyager to generate visualizations, and then dive in deep using PoleStar,” while another envisioned that “I would start with Voyager but want to go and switch to PoleStar to dive into my question. Once that question was answered, I would like to switch back to Voyager.”

7 DISCUSSION AND FUTURE WORK

We presented Voyager, a mixed-initiative system to facilitate breadth-oriented data exploration in the early stages of data analysis. Voyager contributes a visualization recommender system (Compass) to power a novel browsing interface that exchanges manual chart specification for interactive browsing of suggested views. In a user study comparing Voyager with a visualization tool modeled after Tableau (PoleStar), we find that Voyager encourages broader exploration, leading to significantly greater coverage of unique variable combinations. The vast majority of participants (15/16) expressed a preference for using Voyager in future exploration tasks. This result is encouraging and indicates the value of improved support for early-stage exploration: PoleStar is based on a popular interaction model backed by over a decade of research and industrial use, whereas Voyager is relatively new and untested.

That said, we view Voyager as a first step towards improved systems that balance automation and manual specification. First, multiple avenues for future work lie in perfecting the Voyager interface. Further iterations could support more navigational facets (e.g., based on chart types, statistical features or detected anomalies [23]) as well as additional visualizations (e.g., dot plots annotated with summary statistics, violin plots, cartographic maps) and data types (e.g., networks). While we have taken initial steps to support reading of multiple charts in context (C4), more work is needed to formalize and evaluate this goal.

Additional interaction techniques might further aid analysis. An obvious candidate is to support brushing & linking within Voyager’s visualization gallery. Other possibilities may require additional research. For example, trellis plots are a valuable method for multidimensional visualization, but often require substantial screen real estate at odds with a rapidly scannable gallery. An open question (cf. [13]) is how to best fit such large plots in small spaces, perhaps via paging and “scrubbing” interactions common to videos or image collections.

An important avenue for continued research is the design and evaluation of more sophisticated (and more scalable [39]) visualization recommenders. Our current approach is intentionally conservative, seeking to provide useful recommendations while helping users stay oriented. The current system is deterministic, based on best practices formalized as heuristics in a rule-based system. In future work, we hope to explore probabilistic recommendation models that can learn improved ranking functions over time (e.g., by updating parameters in response to user input, including bookmarks and explicit ratings). Such work also opens up possibilities for studying personalization or domain adaptation. For example, might different data domains benefit from differing recommendation strategies?

A clear next step is to better integrate breadth-first and depth-first visual analysis tools. How might Voyager and PoleStar be most fruitfully combined? One straightforward idea is that Voyager users could further drill down into plots to enable PoleStar-like refinement. However, it is not immediately clear if refinements should backpropagate into broad exploration. How might view refinements inform subsequent recommendations in a consistent, understandable fashion?

Finally, further empirical work is needed to better understand the analysis process, including breadth- and depth-oriented strategies. Our user study, for instance, resulted in rich event logs and think-aloud transcripts that are ripe for further analysis beyond the scope of this paper. By recording and modeling analysis activities, we might better characterize analysts’ strategies and inform tool development.

To support these and other future research questions, the system components described in this paper (Voyager, Compass, PoleStar, and Vega-lite) are all freely available as open-source software at http://vega.github.io. We hope these systems will provide valuable building blocks and shared reference points for visualization research and development.

ACKNOWLEDGMENTS

We thank the anonymous reviewers, Magda Balazinska, Daniel Halperin, Hanchuan Li, Matthew Kay, and members of the Interactive Data Lab and Tableau Research for their comments in improving this paper. This work was supported in part by the Intel Big Data ISTC, DARPA XDATA, the Gordon & Betty Moore Foundation, and the University of Washington eScience Institute. Part of this work was developed during the first author’s internship at Tableau Research in 2013. We also thank the Noun Project and Dmitry Baranovskiy for the “person” icon used in Figure 8.

REFERENCES

[1] Z. Armstrong and M. Wattenberg. Visualizing statistical mix effects and Simpson’s paradox. IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis), 20(12):2132–2141, 2014.
[2] D. J. Barr, R. Levy, C. Scheepers, and H. J. Tily. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3):255–278, 2013.
[3] R. A. Becker, W. S. Cleveland, and M.-J. Shyu. The visual design and control of trellis display. Journal of Computational and Graphical Statistics, 5(2):123–155, 1996.
[4] J. Bertin. Semiology of graphics: diagrams, networks, maps. University of Wisconsin Press, 1983.
[5] M. Bostock and J. Heer. Protovis: A graphical toolkit for visualization. IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis), 15(6):1121–1128, 2009.
[6] M. Bostock, V. Ogievetsky, and J. Heer. D3: Data-driven documents. IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis), 17(12):2301–2309, 2011.
[7] S. M. Casner. Task-analytic approach to the automated design of graphic presentations. ACM Transactions on Graphics (TOG), 10(2):111–151, 1991.
[8] W. S. Cleveland and R. McGill. Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387):531–554, 1984.

[9] S. Few. Now you see it: simple visualization techniques for quantitative analysis. Analytics Press, 2009.
[10] O. Gilson, N. Silva, P. W. Grant, and M. Chen. From web data to visualization via ontology mapping. Computer Graphics Forum, 27(3):959–966, 2008.
[11] D. Gotz and Z. Wen. Behavior-driven visualization recommendation. In Proceedings of the 14th international conference on Intelligent user interfaces, pages 315–324, 2009.
[12] S. Green, J. Heer, and C. D. Manning. The efficacy of human post-editing for language translation. In Proc. ACM Human Factors in Computing Systems (CHI), 2013.
[13] R. Hafen, L. Gosink, J. McDermott, K. Rodland, K.-V. Dam, and W. Cleveland. Trelliscope: A system for detailed visualization in the deep analysis of large complex data. In Proc. IEEE Large-Scale Data Analysis and Visualization (LDAV), pages 105–112, Oct 2013.
[14] M. Hearst. Search user interfaces. Cambridge University Press, 2009.
[15] J. Heer and B. Shneiderman. Interactive dynamics for visual analysis. Commun. ACM, 55(4):45–54, Apr. 2012.
[16] J. Heer, F. Van Ham, S. Carpendale, C. Weaver, and P. Isenberg. Creation and collaboration: Engaging new audiences for information visualization. In Information Visualization, pages 92–133. Springer, 2008.
[17] H. V. Henderson and P. F. Velleman. Building multiple regression models interactively. Biometrics, pages 391–411, 1981.
[18] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst., 22(1):5–53, Jan. 2004.
[19] E. Horvitz. Principles of mixed-initiative user interfaces. In Proc. ACM Human Factors in Computing Systems (CHI), pages 159–166, 1999.
[20] J. Huang, R. White, and G. Buscher. User see, user point: gaze and cursor alignment in web search. In Proc. ACM Human Factors in Computing Systems (CHI), 2012.
[21] S. Kairam, N. H. Riche, S. Drucker, R. Fernandez, and J. Heer. Refinery: Visual exploration of large, heterogeneous networks through associative browsing. Computer Graphics Forum (Proc. EuroVis), 34(3), 2015.
[22] Y. Kammerer, R. Nairn, P. Pirolli, and E. H. Chi. Signpost from the Masses: Learning Effects in an Exploratory Social Tag Search Browser. In Proc. ACM Human Factors in Computing Systems (CHI), pages 625–634, 2009.
[23] S. Kandel, R. Parikh, A. Paepcke, J. M. Hellerstein, and J. Heer. Profiler: Integrated statistical analysis and visualization for data quality assessment. In Proc. Advanced Visual Interfaces (AVI), pages 547–554. ACM, 2012.
[24] J. Mackinlay. Automating the design of graphical presentations of relational information. ACM Transactions on Graphics, 5(2):110–141, 1986.
[25] J. Mackinlay, P. Hanrahan, and C. Stolte. Show me: Automatic presentation for visual analysis. IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis), 13(6):1137–1144, 2007.
[26] G. Marchionini. Exploratory search: from finding to understanding. Communications of the ACM, 49(4):41–46, 2006.
[27] J. Marks, B. Andalman, P. A. Beardsley, W. Freeman, S. Gibson, J. Hodgins, T. Kang, B. Mirtich, H. Pfister, W. Ruml, et al. Design galleries: A general approach to setting parameters for computer graphics and animation. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages 389–400. ACM Press/Addison-Wesley Publishing Co., 1997.
[28] D. S. Moore and G. P. McCabe. Introduction to the Practice of Statistics. WH Freeman/Times Books/Henry Holt & Co, 1989.
[29] D. B. Perry, B. Howe, A. M. F. Key, and C. Aragon. VizDeck: Streamlining exploratory visual analytics of scientific data. In Proc. iSchool Conference, 2013.
[30] D. Ren, T. Hollerer, and X. Yuan. ivisdesigner: Expressive interactive design of information visualizations. IEEE Transactions on Visualization and Computer Graphics, 20(12):2092–2101, 2014.
[31] S. F. Roth, J. Kolojejchick, J. Mattis, and J. Goldstein. Interactive graphic design using automatic presentation knowledge. In Proc. ACM Human Factors in Computing Systems (CHI), pages 112–117. ACM, 1994.
[32] A. Satyanarayan and J. Heer. Lyra: An interactive visualization design environment. In Computer Graphics Forum, volume 33, pages 351–360. Wiley Online Library, 2014.
[33] J. Seo and B. Shneiderman. A rank-by-feature framework for interactive exploration of multidimensional data. Information Visualization, 4(2):96–113, 2005.
[34] S. S. Stevens. On the theory of scales of measurement, 1946.
[35] C. Stolte, D. Tang, and P. Hanrahan. Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases. IEEE Transactions on Visualization and Computer Graphics, 8(1):52–65, 2002.
[36] E. R. Tufte. The visual display of quantitative information, volume 2. Graphics press Cheshire, CT, 1983.
[37] J. W. Tukey. Exploratory data analysis. Reading, Ma, 231:32, 1977.
[38] S. van den Elzen and J. J. van Wijk. Small multiples, large singles: A new approach for visual data exploration. Computer Graphics Forum, 32(3pt2):191–200, 2013.
[39] M. Vartak, S. Madden, A. Parameswaran, and N. Polyzotis. SeeDB: Automatically generating query visualizations. Proceedings of the VLDB Endowment, 7(13):1581–1584, 2014.
[40] Vega: A visualization grammar. https://github.com/trifacta/vega.
[41] M. Voigt, S. Pietschmann, L. Grammel, and K. Meissner. Context-aware recommendation of visualization components. In Proceedings of the 4th International Conference on Information, Process, and Knowledge Management, pages 101–109, 2012.
[42] C. Ware. Information visualization: perception for design. Elsevier, 2012.
[43] R. W. White and R. A. Roth. Exploratory search: Beyond the query-response paradigm. Synthesis Lectures on Information Concepts, Retrieval, and Services, 1(1):1–98, 2009.
[44] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer, 2009.
[45] L. Wilkinson. The Grammar of Graphics. Springer, 2005.
[46] G. Wills and L. Wilkinson. Autovis: automatic visualization. Information Visualization, 9(1):47–69, 2010.
[47] K.-P. Yee, K. Swearingen, K. Li, and M. Hearst. Faceted metadata for image search and browsing. In Proc. ACM Human Factors in Computing Systems (CHI), pages 401–408, 2003.
[48] M. X. Zhou and M. Chen. Automated generation of graphic sketches by example. In IJCAI, volume 3, pages 65–71, 2003.