ImMens: Real-time Visual Querying of Big Data

1.imMens: Real-time Visual Querying of Big Data. Zhicheng Liu⋆, Biye Jiang‡ and Jeffrey Heer⋆. Presenter: Tao mo

2.Content 1 Introduction 2 Related Work 3 Techniques Being Used 4 Demo 5 Future Work

3.Introduction Data Visualization - Big Data Traditional data visualization tools are often inadequate

4.Introduction Data Visualization - Big Data Two challenges: (1) Perceptual scalability: limited by the resolution of conventional displays (~1-3 million pixels). (2) Interactive scalability: querying large data stores can incur high latency, disrupting fluent interaction. Solution? Data reduction methods

5.Related Work Filtering, Sampling, Aggregation. Perceptual and interactive scalability should be limited by the chosen resolution of the visualized data, not the number of records.

6.Related Work Filtering & Sampling: Simple random sampling; Systematic sampling; Stratified sampling; requiring prior knowledge
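The three sampling strategies named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from imMens; the function names and the toy `records` list are hypothetical. The stratified variant shows where prior knowledge comes in: the caller has to supply the attribute that defines the strata.

```python
import random
from collections import defaultdict


def simple_random_sample(records, k, seed=0):
    """Pick k records uniformly at random, without replacement."""
    return random.Random(seed).sample(records, k)


def systematic_sample(records, step, start=0):
    """Take every step-th record, beginning at index start."""
    return records[start::step]


def stratified_sample(records, key, per_stratum, seed=0):
    """Sample up to per_stratum records from each group defined by key.

    Requires prior knowledge: the caller must already know which
    attribute (key) splits the data into meaningful strata.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


records = list(range(100))  # hypothetical toy data set
print(simple_random_sample(records, 10))
print(systematic_sample(records, 25))
print(stratified_sample(records, key=lambda r: r % 4, per_stratum=2))
```

All three reduce the number of records to draw, but none guarantees that outliers or sparse regions survive the reduction.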

7.Related Work Binned Aggregation Binning aggregates data and visualizes density by counting the number of data points falling within each predefined bin.
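As a rough sketch of the counting step described above, the following Python maps each 2D point to a predefined bin and increments that bin's count. The helper names, the clamping of out-of-range values into the edge bins, and the sample points are assumptions made for illustration.

```python
def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to a bin index in 0..n_bins-1."""
    i = int((value - lo) / (hi - lo) * n_bins)
    # Clamp so that value == hi (and any out-of-range value) lands in an edge bin.
    return min(max(i, 0), n_bins - 1)


def binned_counts_2d(points, x_range, y_range, nx, ny):
    """Count how many (x, y) points fall within each cell of an nx-by-ny grid."""
    counts = [[0] * ny for _ in range(nx)]
    for x, y in points:
        counts[bin_index(x, *x_range, nx)][bin_index(y, *y_range, ny)] += 1
    return counts


points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.0, 0.0)]  # hypothetical data
grid = binned_counts_2d(points, (0.0, 1.0), (0.0, 1.0), nx=2, ny=2)
print(grid)  # [[2, 0], [1, 1]]
```

The size of `grid` depends only on `nx` and `ny`, not on the number of points, which is the property the previous slides ask for.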

8.Techniques Being Used 1 Designing Binned Plots 2 Enabling Interaction in Binned Plots 3 Some Details

9.Techniques Being Used 1 Designing Binned Plots Why? Binning conveys both global patterns (e.g., densities) and local features (e.g., outliers), while enabling multiple levels of resolution via the choice of bin size. Example: four million user checkins

10.Techniques Being Used 1 Designing Binned Plots How? Design Space

11.Techniques Being Used 2 Enabling Interaction in Binned Plots Latency! Panning and zooming -> finer-grained bins. Brushing & linking -> computing aggregates filtered by an initial data selection. How? Multivariate Data Tiles

12.Techniques Being Used 2 Enabling Interaction in Binned Plots Multivariate Data Tiles: Data Cube -> Multivariate Data Tiles. Apply binned aggregation to get a data cube (too big to store in memory). Decompose the full cube into sub-cubes (at most four dimensions). Aggregating data tiles -> brushing & linking behavior
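To see why the full cube is too big while sub-cubes are manageable, the arithmetic can be worked through with hypothetical numbers. The sketch below assumes five dimensions with 50 bins each and, as a generous upper bound, one 3D tile for every triple of dimensions; neither figure comes from the slides.

```python
from itertools import combinations
from math import prod

# Hypothetical schema: five binned dimensions, 50 bins each.
bins = {"d1": 50, "d2": 50, "d3": 50, "d4": 50, "d5": 50}

# The full data cube needs one cell per combination of bins across all dimensions.
full_cube_cells = prod(bins.values())


def tile_cells(dims):
    """Number of cells in a sub-cube over the given dimensions."""
    return prod(bins[d] for d in dims)


# Assumed decomposition: one 3D tile per triple of dimensions.
tiles = list(combinations(bins, 3))
total_tile_cells = sum(tile_cells(t) for t in tiles)

print(full_cube_cells)                      # 312500000
print(len(tiles), total_tile_cells)         # 10 1250000
print(full_cube_cells // total_tile_cells)  # 250
```

Under these assumptions the tiles together hold 250 times fewer cells than the full cube, and the gap widens with every added dimension because the full cube grows multiplicatively.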

13.Techniques Being Used

14.Techniques Being Used

15.Techniques Being Used

16.Techniques Being Used 2 Enabling Interaction in Binned Plots Query? For each output bin (summed value), we can use a simple loop that accesses only the bins needed for that summation. Consider a 3D data tile T with dimensions (d1, d2, d3), with respective bin counts (c1, c2, c3). If users brush a 2D binned plot of d2 and d3 to select ranges R2 and R3, we can compute the summed value v at index i of the d1 projection using Algorithm 1. With this simple roll-up procedure, we can run the algorithm in parallel for all c1 indices.
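Algorithm 1 itself is not reproduced on the slide, so the following Python is only a sketch of the roll-up as described in the text. It assumes the tile is a nested list indexed as `tile[i][j][k]` and that the brushed ranges R2 and R3 are half-open index pairs `(start, stop)`; both conventions are choices made here for illustration.

```python
def rollup(tile, i, r2, r3):
    """Summed value v at index i of the d1 projection.

    Visits only the bins inside the brushed ranges r2 (over d2) and
    r3 (over d3), each given as a half-open (start, stop) index pair.
    """
    v = 0
    for j in range(*r2):
        for k in range(*r3):
            v += tile[i][j][k]
    return v


def project_d1(tile, r2, r3):
    """Roll up every d1 index; each index is independent of the others."""
    return [rollup(tile, i, r2, r3) for i in range(len(tile))]


# Hypothetical tile with bin counts (c1, c2, c3) = (2, 3, 3).
tile = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
]
print(project_d1(tile, r2=(0, 2), r3=(1, 3)))  # [16, 1]
```

Because `rollup` for one index never reads or writes state used by another index, the c1 calls can be run in parallel, which is the property the slide relies on. The list comprehension here evaluates them one after another.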

17.Techniques Being Used 3 Some Details: Storing Data Tiles as Image Files; Parallel Query and Render via Shader Programs
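The slide does not say how bin counts are laid out inside an image, so the following is one plausible encoding rather than the format imMens uses: each count is treated as an unsigned 32-bit integer and split across the four 8-bit channels of an RGBA pixel. The function names and the row layout are hypothetical.

```python
def pack_rgba(count):
    """Split an unsigned 32-bit count into four 8-bit channels (R, G, B, A)."""
    if not 0 <= count < 2**32:
        raise ValueError("count does not fit in 32 bits")
    return (
        (count >> 24) & 0xFF,
        (count >> 16) & 0xFF,
        (count >> 8) & 0xFF,
        count & 0xFF,
    )


def unpack_rgba(rgba):
    """Rebuild the count from its four channels."""
    r, g, b, a = rgba
    return (r << 24) | (g << 16) | (b << 8) | a


def tile_to_pixels(flat_counts, width):
    """Lay a flat list of bin counts out as rows of RGBA pixels."""
    pixels = [pack_rgba(c) for c in flat_counts]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


rows = tile_to_pixels([0, 1000, 70000, 4], width=2)
print(rows[0][1])               # (0, 0, 3, 232)
print(unpack_rgba(rows[1][0]))  # 70000
```

A layout like this lets a tile be loaded as a texture, so a shader program can read one bin per pixel lookup and reassemble the count from the channels.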

18.Demo 4.5 million user checkins on Brightkite; 35.6 million flight delays in the U.S. from 1989 to 2008; 10K to 1B synthetic data points visualized as scatterplot matrices (SPLOM). https:// / uwdata / imMens https:// / uwdata / imMens /wiki

19.Future Work Could data reduction methods other than binned aggregation apply here? Brushing of more than four dimensions? Visualization construction UI?

20.Q&A Questions?