SIGMOD2018-iQCAR

Unpredictability in query runtimes can arise in a shared cluster as a result of resource contentions caused by inter-query interac- tions. iQCAR - inter Query Contention AnalyzeR is a system that formally models these interferences between concurrent queries and provides a framework to attribute blame for contentions.
展开查看详情

1. iQCAR-demo: A demonstration of an Inter-Query Contention Analyzer for Cluster Computing Frameworks Prajakta Kalmegh, Harrison Lundberg, Frederick Xu, Shivnath Babu, Sudeepa Roy Duke University {pkalmegh,hgl2,hx26,shivnath,sudeepa}@cs.duke.edu ABSTRACT available slots1 among the contending tenants. As a result, there Unpredictability in query runtimes can arise in a shared cluster are no guarantees on the usages of other resources like memory, as a result of resource contentions caused by inter-query interac- disk IO, or network bandwidth for competing queries leading to tions. iQCAR - inter Query Contention AnalyzeR is a system that inter-query resource interferences. This is a major concern in to- formally models these interferences between concurrent queries day’s clusters as performance issues due to resource contentions and provides a framework to attribute blame for contentions. iQCAR are often wrongly diagnosed, or are left unresolved due to lack leverages a multi-level directed acyclic graph called iQC-Graph to of appropriate tools. It is, thus, important to analyze the victims diagnose the aberrations in query schedules that lead to these re- (that we call a target query) and sources of these contentions (that source contentions. The demonstration will enable users to perform we call source queries), especially identifying why and where a a step-wise deep exploration of such resource contentions faced by target query faces contentions from a source query. This can help a a query at various stages of its execution. The interface will allow cluster administrator diagnose aberrations in resource allocations users to identify top-k victims and sources of contentions, diagnose among tenants or devise alternative query placement strategies. high-contention nodes and resources in the cluster, and rank their For example, ranking the tenants based on their contention impact impacts on the performance of a query. Users will also be able to towards a target query can prove particularly useful to revisit the navigate through a set of rules recommended by iQCAR to compare resource shares of these tenants. how application of each rule by the cluster scheduler resolves the contentions in subsequent executions. ACM Reference Format: Prajakta Kalmegh, Harrison Lundberg, Frederick Xu, Shivnath Babu, Sudeepa Roy. 2018. iQCAR-demo: A demonstration of an Inter-Query Contention Analyzer for Cluster Computing Frameworks. In Proceedings of ACM Con- ference (Conference’17). ACM, New York, NY, USA, 4 pages. https://doi.org/ 10.475/123_4 1 INTRODUCTION Large scale data analytics frameworks like Hadoop [10] and Spark [14] process a mix of short-running interactive BI (Business Intelligence) queries along with long-running ETL or batch analytics queries. Recurring queries co-exist with adhoc unplanned queries. Ana- lytical SQL queries with varying resource utilizations over time Figure 1: iQC-Graph(V,E): The sub-graph in Explanations share the cluster with machine learning, graph analytics, and data Layer is unique for each stage in Level-1 mining queries. In such shared clusters, resources are allocated to multiple tenants executing mixed workloads based on their prior- In this demonstration, we will present iQCAR - inter Query ities, SLAs (Service-Level Agreements), minimum share, etc. As Contention AnalyzeR, a system to explore contentions faced by such, resource allocations are controlled by the cluster scheduler queries due to inter-query interactions on a cluster computing using sophisticated arbitration techniques like capped capacities, framework. iQCAR interface allows users to interact with a multi- reservations or use of scheduling policies like FAIR [13] or First-In- level directed acyclic graph (DAG) to progressively unravel three First-Out (FIFO). Most of these techniques rely on partitioning of levels of explanations; namely (i) Immediate Explanations (IE): identify disproportionate waiting times spent by a query blocked for a particular resource, (ii) Deep Explanations (DE): inspect this Permission to make digital or hard copies of part or all of this work for personal or waiting time for every resource used by the query on all hosts where classroom use is granted without fee provided that copies are not made or distributed it executed, and finally (iii) Blame Explanations (BE): quantify for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. the impact from concurrent queries towards the slowdown for a For all other uses, contact the owner/author(s). target query. Figure 1 presents the different levels of explanations Conference’17, July 2017, Washington, DC, USA in iQC-Graph. Additionally, users will be able to filter and navigate © 2018 Copyright held by the owner/author(s). ACM ISBN 123-4567-24-567/08/06. 1 We refer to a slot as the smallest unit of resource allocation. For example, its a CPU https://doi.org/10.475/123_4 core in Spark and a combination of CPU and Memory in Hadoop.

2.Conference’17, July 2017, Washington, DC, USA Prajakta Kalmegh, Harrison Lundberg, Frederick Xu, Shivnath Babu, Sudeepa Roy through cluster-level summary visual aids on: (a) high contention wlSEL and wlSUB: The workload selection module (wlSEL) al- resources and nodes, (b) high impact causing source queries, and lows users to select a pre-executed workload for analysis. To let (c) high impact receiving target queries. Finally, users will be able users simulate a workload on Spark, the workload submission in- to browse through a list of alternate schedule rules recommended terface (wlSUB) allows them to select a list of benchmark TPCDS by iQCAR and compare the results of applying them in recurring queries, the order of these queries and finally their arrival schedule execution of the workloads. (fixed-delay or poisson or user-input start times). The users can also 2 IQCAR specify the interval (in seconds) for collecting the task execution metrics. By default, we collect metrics after the completion of each iQCAR system automates the process of (i) collecting, parsing and persisting the query execution and cluster performance logs, (ii) task, stage, job or query in Spark2 . construction and persistence of iQC-Graph, (iii) quantifying con- tention impact and blame attribution at various levels, and (iv) generation and application of rules for cluster scheduler to apply in subsequent execution of the workload. We present the multi- layered iQC-Graph in Section 2.1, and discuss how the architecture shown in Figure 2 facilitates each of these tasks in Section 2.2. Due to space constraints, we present the complete derivation of our blame attribution metric and the detailed steps of the methodology to construct the graph, computing the node-weights, edge weights, and other metrics in [6]. 2.1 Multi-layered iQC-Graph Level-0 of iQC-Graph consists of the target queries to be analyzed and Level-1 contains the stages of each target query. Level-5 and Level-6 constitute the concurrently running source stages and source queries respectively. The explanations layer consisting of Levels 2, 3 and 4 provides explanations of different forms and gran- Figure 2: iQCAR Architecture ularity. It enables us to connect the two ends of iQC-Graph with appropriate assessment of contention impact among all intermedi- iQCAR CORE: The offAGGR module collects and aggregates clus- ate nodes and edges. ter logs for queries executed on Spark offline, and parses them. For 2.1.1 Immediate Explanations (IE):. Level-2 vertices provide an an online execution analysis, the onAGGR module collects the exe- explanation of the form ‘how much time was spent by a stage waiting cution metrics using the REST API interface of Spark 3 and streams for a particular resource per unit of data processed’. For every stage them through Apache Kafka [1] to a MySQL [2] database. The data S tq at Level-1 of target query Q t q , we add an IE vertex at Level- model builder module (dmBUILD) uses this input to build the data 2 for every resource (scheduling queue, CPU, Memory, Network, model for iQCAR. Users can specify in their configuration whether IO) used by stage S t q , and store the value of its wait-time for this to persist this data model in a CSV format or MySQL store. By resource per unit data processed as its Vertex Contribution (VC). default, we persist in a CSV format for later easy integration with 2.1.2 Level-3: Deep Explanations (DE):. Level-3 captures the hosts our iQCARViz API. iQCAR also provides an easy export from our responsible towards the corresponding disproportionality in the MySQL store to the iQCARViz data frames and series objects. Users wait time components for every resource. That is, DEs keep track of use hints from the iQCARViz interface (described shortly) to select a the wait-time distributions per unit of data processed by stage S tq list of target queries for deep exploration. The graph builder module for a specific resource r on each host h of execution. That is, for (grBUILD) uses our parallel graph construction algorithm (see [6]) each IE node in Level 2, we add h DE nodes in Level 3 corresponding to build iQC-Graph using the Neo4j graph API [7]. By default, we to all hosts on which the tasks of S t q executed. persist a Neo4j graph instance for every workload, and reload the 2.1.3 Level-4: Blame Explanations (BE): . Blame Explanations graph when user requests for a deep exploration of the selected is a novel contribution of iQCAR. For each vertex v in Level-3 (DE) target queries through the iQCARViz interface. corresponding to S t q , host h, and type of resource request r , if tasks blameAnalyzer: A graph-based model enables us to consolidate of S t q were concurrent with tasks of P stages of other queries on the contention and blame values for a systematic deep exploration. host h, we add P nodes u in Level 4 and connect them to v. To As such, to enable a comparison of contention impacts at various compute the blame to be assigned to a concurrent stage, we first levels of iQC-Graph, the blameAnalyzer consists of three modules compute the blame from a source task st (task of concurrent stage) that calculate the following contention measures: The Vertex Con- to a target task tt executing concurrently on host h while using tribution (VC) values are computed using the VC-calc that measures resource r and use it to compute the VC of each BE node. the standalone impact of any vertex towards the contention faced by a target query. The VC values of different vertices depend on 2.2 Architecture 2 iQCAR provides a cluster interface to analyze existing Spark work- Please refer to [6] for a complete terminology and background information 3 We extended the Spark metrics accumulator API to publish our custom wait-time load execution logs, submit new Spark applications, or simulate an metrics for all resources (scheduling delay, CPU, Network, Memory, and IO blocked existing workload using TPCDS [3] benchmark queries. times) to the REST interface.

3.iQCAR-demo: A demonstration of an Inter-Query Contention Analyzer for Cluster Computing Conference’17, Frameworks July 2017, Washington, DC, USA Figure 3: iQCAR visual aids for answering cluster-level contention analysis questions. the level to which the vertex belongs to, and are computed dur- and contrast the results of applying the iQCAR rules on a benchmark ing the graph construction process as described in Section 2.1. We workload’s performance. To achieve these goals, we will divide the then perform a top-down pass on this graph to update the edge demo in three segments, namely (a) manual analysis of pre-executed weights using the IF-calc, and next do a bottom-up pass to update TPCDS benchmark execution, (b) deep exploration of contentions, the responsibility measure of each vertex using the DOR-calc. The and (c) analyze the output of iQCAR. Impact Factor (IF) of an edge gives the impact a source vertex of the edge has towards the end vertex. Degree of Responsibility (DOR) of a vertex is defined as the cumulative impact this vertex has towards a target query. rlGEN and rlAPP: The rule generator (rlGEN) module uses the consolidated contention measures from blameAnalyzer and out- puts two types of rules: (i) Alternate Query Placement: rlGEN out- puts the top-k aggressive queries (high-impact towards all selected target queries) and generates k rules that recommend placing each of these queries in a new pool with new recommended shares for each of them. (ii) QueryPriority Readjustment: rlGEN produces k priority rectification rules for each of the top-k affected target (a) (b) queries that suffered the highest impact, and top-k impacting source queries. The rlAPP module is an extension to the Spark standalone Figure 4: (a) Screen to select a workload for contention analysis. (b) scheduler that parses the rules and applies any active and applicable Screen to perform a time-series analysis of iQC-Graph. rules during a recurring execution of the workload. Setup: Users will be given two options to analyze a workload: (i) iQCAR VIZ : The iQCAR visualization module is a web based select a pre-executed microbenchmark workload, or (ii) submit front-end that enables users to (i) select a workload for analysis us- a workload by selecting a set of TPCDS queries along with their ing our wlSEL module and visualize its key characteristics using the arrival pattern. A sample screen for this workload selection is shown wlVIZ interface, (ii) explore summary of query execution and clus- in Figure 4a. The workloads will be executed on a 10-node cluster ter performance using the perfVIZ interface that provides tips on setup with Apache Spark 2.2 [14] and Hadoop 2.7 [10]. selecting a single target query or a set of target queries for further exploration, (iii) delve into the task-level execution and wait-time 3.1 Segment 1: Manual Analysis distribution details for each query using the dmVIZ interface, (iv) Users will be asked to answer one randomly selected multi-choice perform a systematic deep exploration of the Neo4j iQC-Graph question on the contention faced by a query using the existing that lets users unfold explanations at various levels and analyze the monitoring tools like Spark UI [9] and Ganglia [5]. This activity will relative impacts from concurrent queries using the grVIZ visual- demonstrate the tedious process of performing a manual contention ization aids, and (v) finally, use the rlVIZ module that lets users analysis even on a small-size cluster. compare the results of applying a selected set of rules on the re- curring execution of the workload. The iQCARViz interface also 3.2 Segment 2: Explore iQCAR allows users to compare the impact of different moitoring intervals of data collection on our contention analysis metrics. Each of the The next step in our demo is to explore the interface of iQCAR that visualization module uses the iQCARViz dataframes API to render will enable users answer questions like below divided broadly into plots dynamically based on user-input online using Plotly [8] tool. the following three categories based on the level of contention details they provide: Summary Questions: Users can choose to browse through a series 3 DEMONSTRATION of cluster summary visual aids for our sample workloadas illustrated The purpose of this demonstration is to (i) showcase the users the in Figure 3. tedious process of contention analysis in the absence of iQCAR, ● Q1: Which hosts in the cluster had the highest CPU contention? (ii) enable users with a hands-on experience of using iQCAR for ● Q2: On which resource were all queries bottlenecked the most? insightful analysis, and (iii) present users an opportunity to compare ● Q3: Which queries are the victim of highest contention?

4.Conference’17, July 2017, Washington, DC, USA Prajakta Kalmegh, Harrison Lundberg, Frederick Xu, Shivnath Babu, Sudeepa Roy (a) (b) (c) Figure 5: (a) Screen to aid users identify queries for further exploration. It shows the runtime hit of all queries compared to their uncon- strained (no contention) execution time. (b) Screen to aid users identify stages of a selected query for further exploration. (c) iQCAR visualization to select combinations of a host, resource and source queries to analyze impact on a single selected target query. ● Q4: Which queries are the cause of highest contention? frame. Figure 4b shows the iQC-Graph for our example workload Target Query Performance Analysis Questions: Figure 5a shows for the last scenario where user inputs the start and end times to how iQCAR provides a visual aid for users to select a target query input a time frame for impact analysis. for further deep exploration from a set of completed queries in a 3.3 Segment 3: Analyze iQCAR results workload. Next, Figure 5b shows how users can choose whether The final segment in the demo will enable users to examine a set they want to analyze contentions on (i) a single stage, (ii) all stages of rules output by the rlGEN module of iQCAR. Users will be able in the critical path (execution time dominating path), or (iii) all to compare the performance of the workloads (new runtime of stages of a query using iQCAR. For performing a single-query anal- each query, wait-times on all resources and/or hosts, ) after ap- ysis, users can drill-down to various levels of details by filtering plication of the top-3 rules of each type. Users can also use the on hosts, resources or specific stages of target and source queries recommended priority to choose and apply the rules for a more as shown in Figure 5c to explore the impact on a particular target real-time experience. query Q t . This interface allows users to answer questions like: Related Work: The field of explanations has been studied in ● Q5: On which user-selected combination of host h and resource many contexts like analyzing job performance [11]. In [12], the au- r , has a target query Q t (or its selected stages) spent maximum thors present a general framework to analyze data-analytical work- time waiting? loads using blocked-time metric. We use this pedestal to present ● Q6: Which queries are responsible for causing highest contention iQCAR as a first systematic attempt towards exploration of different for a target query Q t (or its selected stages) on a user-selected levels of explanations for resource contentions on cluster frame- combination of host h and resource r ? works. See [6] for a detailed discussion on related work. Source Query Performance Analysis Questions: Finally, users can also perform a top-down analysis on iQC-Graph to draw in- REFERENCES sights on how a query impacts or causes contentions to others using [1] [n. d.]. Apache Kafka: A Distributed Streaming Platform . https://kafka.apache. various filters on its stages, hosts, and resource types. Due to space org. ([n. d.]). [2] [n. d.]. MySQL: The world’s most popular open source database . https://www. constraints, the screenshots for these visualizations are not shown. mysql.com. ([n. d.]). ● Q7: Which queries were affected most by the contention caused [3] [n. d.]. TPC Benchmark™DS . http://www.tpc.org/tpcds/. ([n. d.]). [4] 2018. D3: Data Driven Documents. https://d3js.org. (2018). by a source query Q s (or its selected stages) on a user-selected [5] 2018. Ganglia Monitoring System. http://ganglia.info. (2018). combination of host h and resource r ? [6] 2018. iQCAR: An Inter-Query Contention Analyzer for Cluster Computing Frameworks. https://users.cs.duke.edu/~pkalmegh/iQCAR.pdf. (2018). The above visual aids help users to get answers to questions Q1 [7] 2018. Neo4j: A graph database. https://neo4j.com. (2018). [8] 2018. Plotly: Modern Visualization for the Data Era. https://plot.ly. (2018). to Q7 rapidly. For users who want to diagnose each contention in [9] 2018. Spark Monitoring and Instrumentation. http://spark.apache.org/docs/ detail, the iQC-Graph visualization powered by d3.js [4] provides latest/monitoring.html. (2018). a step-wise exploration opportunity. For instance, users can click [10] Doug Cutting. 2018. Apache Hadoop. http://hadoop.apache.org. (2018). [11] Nodira Khoussainova, Magdalena Balazinska, and Dan Suciu. 2012. Perfxplain: on each node to progressively expand further levels of sub-graph. debugging mapreduce job performance. PVLDB 5, 7 (2012), 598–609. Users can also click on each vertex and edge to view the values of [12] Kay Ousterhout, Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun. 2015. Making Sense of Performance in Data Analytics Frameworks. In our contention analysis metrics (VC, IF, and DOR). Other interface NSDI. 293–307. https://www.usenix.org/conference/nsdi15/technical-sessions/ features will let the users (i) highlight the path from a single source presentation/ousterhout query to a target query with highest path weight (useful for pro- [13] Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Scott Shenker, and Ion Stoica. 2010. Delay scheduling: a simple technique for achieving viding explanations for highest impact between any two queries), locality and fairness in cluster scheduling. In Proceedings of the 5th European (ii) display all paths that add up more than a certain threshold of conference on Computer systems. ACM, 265–278. user-input impact value (useful to discover contention conditions [14] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster Computing with Working Sets (HotCloud). 10–10. beyond an acceptable threshold), and (iii) load source queries that http://dl.acm.org/citation.cfm?id=1863103.1863113 impact the selected target queries only within an user-input time