How to get your jobs tuned while sleeping
1 .TuneIn: How to get your jobs tuned while sleeping Manoj Kumar, LinkedIn Arpan Agrawal, LinkedIn #Res2SAIS
2 . OUR VISION Create economic opportunity for every member of the global workforce
3 . OUR MISSION Connect the world’s professionals to make them more productive and successful
4 .Agenda
• Why TuneIn?
• How does TuneIn work?
• Architecture and framework features
• Road ahead
5 .Grid Scale at LinkedIn
2008: 1 cluster; 20 nodes; 5 users; MapReduce; few workflows
2018: 10+ clusters; 1000s of nodes; 1000s of active users; Pig, Hive, Spark, etc.; 10000s of workflows
6 .Typical Conversations
Manager: Hey, this Spark job is running slowly.
Developer: I will tune it to improve the run time.
7 .Typical Conversations
Hadoop Admin: We have found some jobs which are consuming high resources on the cluster.
Manager: I will ask my team to tune those jobs to reduce the resource usage.
8 .Typical Conversations
Client: Is there a way we can get this daily report 30 minutes early?
Developer: I will try to tune it to reduce the run time.
9 .Why Tuning?
• Optimal parameter configuration:
– leads to better cluster utilization and thus savings
– reduces the execution time
• Default configuration is not always optimal
10 .Manual Tuning
Manual Tuning Process (repeating cycle):
• Phase 1: Execute job
• Phase 2: Observe the execution metrics
• Phase 3: Come up with next parameter set
11 .Dr. Elephant: Heuristic based tuning
• Suggests tuning recommendations based on pre-defined heuristics
• No need to worry about the hundreds of counters and parameters
• Relies on user’s initiative to use the recommendations
• Expects some user expertise
Heuristics Based Manual Tuning (repeating cycle):
• Phase 1: Execute job
• Phase 2: Look at the Dr. Elephant recommendations
• Phase 3: Come up with next parameter set
13 .Why Auto Tuning?
• 10000s of jobs to tune
• Increases developer productivity
• Tunes without any extra effort
• No expertise is expected
• Option of which objective function to tune for
– resource usage
– execution time, etc.
14 .Let’s auto tune!
15 .TuneIn
• Framework to automatically tune recurring Hadoop and Spark jobs
• Iteratively tries to reach the optimal configuration
• Results: 20-35% reduction in resource usage
16 .Particle Swarm Optimization (PSO) [1]
• Mimics the behavior of a swarm of birds searching for food
• Introduces a population of candidate solution particles in the search space
[Figure source: Wikipedia]
[1] J. Kennedy et al., “Particle Swarm Optimization,” https://ieeexplore.ieee.org/document/488968/
17 .PSO (contd.)
• Points of attraction: personal and swarm’s best known positions
• Particles converge to the region with the minimum cost function value
[Figure source: Wikipedia]
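The pull toward the personal and swarm best positions can be illustrated with a minimal PSO sketch. The function name `pso_minimize`, the inertia weight `w`, the acceleration coefficients `c1` and `c2`, and the clamping of positions to the bounds are assumptions chosen for illustration; the slides do not say which PSO variant or constants TuneIn uses.

```python
import random


def pso_minimize(cost, bounds, swarm_size=3, iterations=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `cost` over box `bounds` with a basic PSO (illustrative only).

    w, c1, c2 are assumed textbook-style constants, not values from the slides.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iterations):
        for i in range(swarm_size):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the allowed range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost
```

In a tuning setting, each call to `cost` would correspond to one job execution with the candidate parameters, which is why a small swarm and few iterations matter.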
18 .Why PSO?
• Cost function is noisy
– PSO is gradient free and robust to noise [3]
• Spark and Hadoop are complex systems
– PSO is a metaheuristic black box optimization algorithm
• Fastest convergence
[3] K. E. Parsopoulos et al., “Particle Swarm Optimizer in Noisy and Continuously Changing Environments,” in Artificial Intelligence and Soft Computing
19 .PSO Details [2]
• Swarm size of 3 gives the best result
– large enough to cover the search space
– small enough to avoid spending many first-iteration runs on random searches
• Good starting point is important to guide the swarm
[2] Mukhtaj Khan et al., “Optimizing Hadoop parameter settings with gene expression programming guided PSO”
20 .Cost function
• Resource usage per unit input:
$$\frac{\sum_{\text{containers}} \text{Container Memory} \times \text{Container Uptime}}{\text{Total Input Size}}$$
• Approximately input size invariant
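As a worked illustration of resource usage per unit input, the sketch below sums memory times uptime over containers and divides by the input size. The function name `resource_usage_per_input`, the tuple layout, and the units are hypothetical; the slides do not specify them.

```python
def resource_usage_per_input(containers, total_input_size):
    """Resource usage per unit input (illustrative sketch).

    containers: iterable of (memory, uptime) pairs, one per container.
    total_input_size: total input size of the job, in any consistent unit.
    Units here are assumptions; the slides do not state them.
    """
    if total_input_size <= 0:
        raise ValueError("total_input_size must be positive")
    usage = sum(memory * uptime for memory, uptime in containers)
    return usage / total_input_size


# Two containers: 2048 * 100 + 1024 * 50 = 256000, divided by input size 1000.
small_run = resource_usage_per_input([(2048, 100), (1024, 50)], 1000)

# Doubling both the work and the input leaves the cost unchanged,
# which is the sense in which the metric is input size invariant.
large_run = resource_usage_per_input([(2048, 100), (1024, 50)] * 2, 2000)
```

The invariance is only approximate in practice, since real jobs do not scale their container count and uptime perfectly linearly with input.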
21 .Search Space
• The parameters being tuned constitute the search space
• The parameters to tune depend on the cost function metric
[Figure: search space with axes Param 1, Param 2, Param 3]
22 .Search Space
Cost function: Resource Usage
Pig | Spark
mapreduce.map.memory.mb | spark.executor.memory
mapreduce.reduce.memory.mb | spark.executor.cores
mapreduce.task.io.sort.mb | spark.memory.fraction
mapreduce.task.io.sort.factor | spark.yarn.executor.memoryOverhead
23 .Search Space Optimization
• Important to prevent failures
• Speeds up convergence
• Boundary parameter values
– e.g. spark.executor.cores ∈ [1, 10]
• Parameter interdependent constraints
– Capture the interdependence among the parameters
– e.g. mapreduce.task.io.sort.mb < 0.60 ∗ mapreduce.map.memory.mb
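The two example constraints can be checked before a candidate parameter set is ever submitted. The sketch below is a hypothetical validator (the name `constraint_violations` and the dictionary layout are assumptions); only the two rules themselves come from the slide.

```python
def constraint_violations(params):
    """Return a list of violated constraints for a candidate parameter set.

    params: dict mapping parameter names to numeric values.
    Only the two example rules from the slide are checked; a parameter
    that is absent from `params` is simply skipped.
    """
    violations = []

    # Boundary constraint: spark.executor.cores must lie in [1, 10].
    cores = params.get("spark.executor.cores")
    if cores is not None and not (1 <= cores <= 10):
        violations.append("spark.executor.cores must be in [1, 10]")

    # Interdependence constraint: sort buffer below 60% of mapper memory.
    sort_mb = params.get("mapreduce.task.io.sort.mb")
    map_mb = params.get("mapreduce.map.memory.mb")
    if sort_mb is not None and map_mb is not None and not (sort_mb < 0.60 * map_mb):
        violations.append(
            "mapreduce.task.io.sort.mb must be < 0.60 * mapreduce.map.memory.mb"
        )

    return violations
```

Rejecting or repairing such candidates up front avoids spending a job execution on a configuration that is likely to fail.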
24 . Avoiding over optimization
• Undesirable to squeeze memory so much that execution time shoots up significantly
• Updated cost function:
$$\frac{\sum_{\text{containers}} \text{Container Memory} \times \text{Container Uptime} + \text{Penalty}}{\text{Total Input Size}}$$
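The slide does not define how the penalty is computed. As one possible reading, the sketch below assumes a penalty that grows linearly with the amount by which execution time exceeds a limit, and adds it to the resource usage before dividing by input size. The names `penalized_cost`, `max_execution_time`, and `penalty_weight` are hypothetical.

```python
def penalized_cost(containers, total_input_size,
                   execution_time, max_execution_time, penalty_weight):
    """Resource usage per unit input with an execution-time penalty.

    The linear penalty below is an assumption for illustration; the slides
    only say that a penalty term is added to the cost function.
    """
    if total_input_size <= 0:
        raise ValueError("total_input_size must be positive")
    usage = sum(memory * uptime for memory, uptime in containers)
    # No penalty while the job stays within the allowed execution time.
    overshoot = max(0, execution_time - max_execution_time)
    penalty = penalty_weight * overshoot
    return (usage + penalty) / total_input_size
```

With this shape, a configuration that saves memory but pushes the run time past the limit scores worse than one that stays within it, which steers the search away from over-squeezed settings.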
25 .Convergence
• No theoretical bound on the number of steps to converge
• In practice, converges in 20 job executions
• TuneIn gets turned off for the job automatically on convergence
26 .Results
Job type | Metric | Average reduction
Spark | Resource Usage | 30-40% per job
Pig | Resource Usage | 20-35% per job
27 .Architecture
Components: Dr. Elephant (REST API, TuneIn Framework, Fetchers); MapReduce/Spark
1. Get Parameters
2. Mapper memory: 2048, Sort Buffer: 200
3. Submit Job
4. Fetch Metrics
28 .Framework Features
Generic Framework
• Resource Usage, Execution Time
• Pig, Hive, Spark
Tuning During Regular Scheduled Runs
• Easy Integration
Failure Avoidance
• Constraints on parameters
• Automatic Failure Handling
Auto Switch Off
29 .Road Ahead
• Tuning for execution time
• Faster convergence using Intelligent Parameter Space Optimization (IPSO)
• Smarter tuning switch on/off