Metrics-Driven Tuning of Apache Spark at Scale

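Tuning Apache Spark can be complex and difficult because there are many different configuration parameters and metrics. The Spark applications running on LinkedIn's clusters keep growing in number and diversity, so a small team of Spark experts can no longer help individual users debug and tune their applications. Users need to get advice quickly so they can iterate on their development, and problems need to be caught promptly to keep the clusters healthy.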

1. Metrics-Driven Tuning of Apache Spark at Scale (Edwina Lu and Ye Zhou, #Exp7SAIS)

2. Hadoop Infra @ LinkedIn
• 10+ clusters
• 10,000+ nodes
• 1,000+ users

3. Spark @ LinkedIn
[Charts: Average Daily Resource Usage (Spark vs. Non-Spark) and Number of Applications per Day, by month]
• Number of daily Spark apps for one cluster: close to 3K, a 2.4x increase in the last 3 quarters
• Spark applications consume 25% of resources; average daily Spark resource consumption: 1.6 PBHr

4. What We Discovered About Spark Usage
Executor memory: only ~34% of allocated memory was actually used (peak used JVM memory 34%, unused executor memory 61%, reserved memory 5%).
Example application:
• 200 executors
• spark.driver.memory: 16GB
• spark.executor.memory: 16GB
• Max executor JVM used memory: 6.6GB
• Max driver JVM used memory: 5.4GB
• Total wasted memory: 1.8TB
• Time: 1h
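The 1.8TB figure can be roughly reproduced from the example numbers. This is a minimal sketch that assumes "wasted" means configured spark.executor.memory minus the max peak JVM used memory, summed over every executor, with the driver left out. The helper name is hypothetical. Adding the driver's ~10.6 GB would not change the rounded total.

```python
def wasted_executor_memory_gb(num_executors: int,
                              executor_memory_gb: float,
                              peak_jvm_used_gb: float) -> float:
    """Unused executor heap in GB, summed over all executors.

    Hypothetical helper: assumes "wasted" means configured
    spark.executor.memory minus the max peak JVM used memory,
    and leaves the driver out.
    """
    per_executor = max(executor_memory_gb - peak_jvm_used_gb, 0.0)
    return num_executors * per_executor


# Example application: 200 executors, 16 GB each, 6.6 GB max peak JVM used memory.
wasted_gb = wasted_executor_memory_gb(200, 16.0, 6.6)
wasted_tib = wasted_gb / 1024
print(f"wasted: {wasted_gb:.0f} GB (~{wasted_tib:.2f} TiB) for about 1 hour")
```

The result is 1880 GB, about 1.84 TiB. That memory sits idle for the whole one-hour run.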

5. Memory Tuning: Motivation
• Memory and CPUs cost money
• These are limited resources, so they must be used efficiently
• Only 34% of allocated memory is used; with more efficient memory usage, we could run 2-3 times as many Spark applications on the same hardware

6. Memory Tuning: What and How to Tune?
• Spark tuning can be complicated, with many metrics and configuration parameters
• Many users have limited knowledge about how to tune Spark applications

7. Memory Tuning: Scaling
• Data scientist and engineer time costs even more money
• Analyzing applications and giving tuning advice in person does not scale, either for the Spark team or for users who must wait for help
• Infrastructure efficiency vs. developer productivity
  – Do we have to choose between these two?

8. Dr. Elephant
• Performance monitoring and tuning service
• Identify badly tuned applications and their causes
• Provide actionable advice for fixing issues
• Compare performance changes over time

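9. Dr. Elephant: How does it Work?
[Diagram: fetchers pull application metrics from the Resource Manager and the History Server; Dr. Elephant runs a set of rules (Rule 1, Rule 2, Rule 3) over the metrics, stores the results in a database, and shows them in the Dr. Elephant UI.]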
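The fetch, rules, and database flow can be sketched as follows. This is a minimal, hypothetical model rather than Dr. Elephant's real classes: each rule is a function that takes one application's fetched metrics and returns a result with a severity, and results are stored per application for a UI to read. The five severity levels match the ones Dr. Elephant reports. The GC threshold in the example rule is made up.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, Dict, List


class Severity(IntEnum):
    NONE = 0
    LOW = 1
    MODERATE = 2
    SEVERE = 3
    CRITICAL = 4


@dataclass
class HeuristicResult:
    name: str
    severity: Severity
    advice: str = ""


# A rule maps the metrics fetched for one application to a result.
Rule = Callable[[Dict[str, float]], HeuristicResult]


@dataclass
class AnalysisPipeline:
    """Hypothetical fetch -> rules -> database flow (not Dr. Elephant's real API)."""
    rules: List[Rule]
    database: Dict[str, List[HeuristicResult]] = field(default_factory=dict)

    def analyze(self, app_id: str, metrics: Dict[str, float]) -> List[HeuristicResult]:
        # Run every rule on the fetched metrics, then persist the results
        # where a UI could read them back.
        results = [rule(metrics) for rule in self.rules]
        self.database[app_id] = results
        return results


def gc_rule(metrics: Dict[str, float]) -> HeuristicResult:
    """Example rule with a made-up threshold: flag a high GC/run-time ratio."""
    ratio = metrics["gc_time_s"] / metrics["run_time_s"]
    if ratio > 0.15:
        return HeuristicResult("Executor GC", Severity.MODERATE,
                               "Please increase spark.executor.memory.")
    return HeuristicResult("Executor GC", Severity.NONE)


pipeline = AnalysisPipeline(rules=[gc_rule])
results = pipeline.analyze("application_1", {"gc_time_s": 720.0, "run_time_s": 4500.0})
worst = max(r.severity for r in results)
print(worst.name)  # MODERATE
```

Keeping the rules as plain functions over a metrics dictionary makes it easy to add a new heuristic without touching the fetch or storage steps.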

10. Challenges for Dr. Elephant to Support Spark
• Spark tuning heuristics
  – What are the necessary metrics to enable effective tuning?
• Fetch Spark history
  – Spark components are not equally scalable

11. Spark Memory Overview
Executor container:
• Executor memory (spark.executor.memory), the JVM heap in which JVM used memory is measured
  – Reserved memory: 300 MB
  – Unified memory: spark.memory.fraction = 0.6 of (executor memory minus reserved memory), shared between execution memory and storage memory (split by spark.memory.storageFraction)
  – User memory: 1 – spark.memory.fraction = 0.4
• Overhead (off-heap memory): spark.yarn.executor.memoryOverhead = max(executorMemory * 0.1, 384MB)
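The region sizes follow from a few lines of arithmetic. This sketch assumes Spark's defaults (spark.memory.fraction = 0.6 and spark.memory.storageFraction = 0.5). The execution/storage split is only the initial boundary: at runtime, storage can borrow free execution memory, and execution can evict cached blocks back down to the storageFraction share. The dataclass and function names are hypothetical.

```python
from dataclasses import dataclass

RESERVED_MB = 300  # reserved memory carved out of the executor heap


@dataclass
class ExecutorMemoryLayout:
    unified_mb: float    # shared by execution and storage
    execution_mb: float  # initial execution share of unified memory
    storage_mb: float    # initial storage share of unified memory
    user_mb: float       # heap left for user data structures
    overhead_mb: float   # off-heap overhead requested on top of the heap


def executor_memory_layout(executor_memory_mb: int,
                           memory_fraction: float = 0.6,
                           storage_fraction: float = 0.5) -> ExecutorMemoryLayout:
    usable_mb = executor_memory_mb - RESERVED_MB
    unified_mb = usable_mb * memory_fraction
    storage_mb = unified_mb * storage_fraction
    overhead_mb = max(executor_memory_mb * 0.10, 384)
    return ExecutorMemoryLayout(
        unified_mb=unified_mb,
        execution_mb=unified_mb - storage_mb,
        storage_mb=storage_mb,
        user_mb=usable_mb - unified_mb,
        overhead_mb=overhead_mb,
    )


layout = executor_memory_layout(16 * 1024)
print(layout)
```

For a 16 GB executor, this gives about 9,650 MB of unified memory, 6,434 MB of user memory, and 1,638 MB of overhead. The container requested from YARN is the heap plus that overhead.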

12. Executor JVM Used Memory Heuristic
Executor JVM Used Memory. Severity: Severe
The configured executor memory is much higher than the maximum amount of JVM memory used by executors. Please set spark.executor.memory to a lower value.
• spark.executor.memory: 16 GB
• Max executor peak JVM used memory: 6.6 GB
• Suggested spark.executor.memory: 7 GB
[Chart: executor memory (16 GB) vs. peak JVM used memory and reserved memory (300 MB), with the gap marked as wasted memory]
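A heuristic like this reduces to a ratio check plus a rounding rule. This is a hypothetical sketch, not Dr. Elephant's actual formula. It assumes that severity comes from configured memory divided by peak JVM used memory, with made-up cut-offs, and that the suggestion is peak JVM used memory plus the 300 MB reserved region, rounded up to a whole GB. With the slide's numbers, it reproduces "Severe" and 7 GB.

```python
import math

RESERVED_GB = 300 / 1024  # the 300 MB reserved region, in GB

# Hypothetical severity cut-offs on configured / peak JVM used memory.
THRESHOLDS = [(1.25, "NONE"), (1.5, "LOW"), (2.0, "MODERATE"), (3.0, "SEVERE")]


def jvm_used_memory_severity(configured_gb: float, peak_used_gb: float) -> str:
    ratio = configured_gb / peak_used_gb
    for limit, label in THRESHOLDS:
        if ratio < limit:
            return label
    return "CRITICAL"


def suggest_executor_memory_gb(peak_used_gb: float) -> int:
    """Peak JVM used memory plus reserved memory, rounded up to a whole GB."""
    return math.ceil(peak_used_gb + RESERVED_GB)


print(jvm_used_memory_severity(16, 6.6), suggest_executor_memory_gb(6.6))
# SEVERE 7
```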

13. Executor Unified Memory Heuristic
Executor Peak Unified Memory. Severity: Critical
The allocated unified memory is much higher than the maximum amount of unified memory used by executors. Please lower spark.memory.fraction.
• spark.executor.memory: 10 GB
• spark.memory.fraction: 0.6
• Allocated unified memory: 6 GB
• Max peak JVM used memory: 7.2 GB
• Max peak unified memory: 1.2 GB
• Suggested spark.memory.fraction: 0.2
[Chart: allocated unified memory vs. peak unified memory, with the gap marked as wasted memory]
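The suggested fraction of 0.2 is consistent with a simple rule: pick the smallest multiple of 0.1 whose share of executor memory still covers the peak unified memory. That rule is an assumption, not the documented heuristic. Like the slide's 6 GB figure, it ignores the 300 MB reserved region. It uses integer ceiling division so that floating-point rounding cannot push the result up a step.

```python
def suggest_memory_fraction(executor_memory_mb: int,
                            peak_unified_mb: int,
                            current_fraction: float) -> float:
    """Smallest multiple of 0.1 whose share of executor memory covers the peak.

    Hypothetical rule; like the slide's 6 GB figure, it ignores the 300 MB
    reserved region.
    """
    # Ceiling of peak / executor memory, in tenths, using integer arithmetic.
    tenths = -((-peak_unified_mb * 10) // executor_memory_mb)
    tenths = max(tenths, 1)  # never suggest 0
    return min(tenths / 10, current_fraction)  # only ever suggest lowering


executor_mb, peak_unified_mb, fraction = 10 * 1024, 1229, 0.6  # ~1.2 GB peak
allocated_mb = executor_mb * fraction
print(f"allocated {allocated_mb / 1024:.1f} GB, "
      f"suggested fraction {suggest_memory_fraction(executor_mb, peak_unified_mb, fraction)}")
# allocated 6.0 GB, suggested fraction 0.2
```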

14. Execution Memory Spill Heuristic
Execution Memory Spill. Severity: Severe
Execution memory spill has been detected in stage 3. Shuffle read bytes and spill are evenly distributed. There are 200 tasks for this stage. Please increase spark.sql.shuffle.partitions, modify the code to use more partitions, or reduce the number of executor cores.
• spark.executor.memory: 10 GB
• spark.executor.cores: 3
• spark.executor.instances: 300
Stage 3:
• Median shuffle read bytes: 954 MB
• Max shuffle read bytes: 955 MB
• Median shuffle write bytes: 359 MB
• Max shuffle write bytes: 388 MB
• Median memoryBytesSpilled: 1.2 GB
• Max memoryBytesSpilled: 1.2 GB
• Num tasks: 200
[Diagram: execution data overflowing from unified memory within executor memory to disk]
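The advice depends on whether spill is spread evenly across tasks. Even spill means every task has too much data. More partitions shrink each task's input. Fewer executor cores also help, because Spark lets each of N concurrently running tasks use at most 1/N of the execution pool, so fewer concurrent tasks get a larger share each. This is a minimal sketch that compares max and median per-task spill. The 1.5 skew ratio and the "skewed" branch are hypothetical additions for contrast.

```python
import statistics
from typing import List

GB = 1024 ** 3


def spill_advice(spilled_bytes: List[int], skew_ratio: float = 1.5) -> str:
    """Classify a stage's per-task memoryBytesSpilled (hypothetical rule)."""
    if max(spilled_bytes, default=0) == 0:
        return "no spill"
    median = statistics.median(spilled_bytes)
    if median > 0 and max(spilled_bytes) / median < skew_ratio:
        # Spill is spread evenly: each task has too much data for its memory.
        return ("even spill: increase spark.sql.shuffle.partitions, use more "
                "partitions, or reduce spark.executor.cores")
    # A few tasks spill far more than the median: partitioning is skewed.
    return "skewed spill: fix data skew"


# Stage 3 from the slide: 200 tasks, each spilling about 1.2 GB.
print(spill_advice([int(1.2 * GB)] * 200))
```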

15. Executor GC Heuristic
Executor GC. Severity: Moderate
Executors are spending too much time in GC. Please increase spark.executor.memory.
• spark.executor.memory: 4 GB
• GC time to executor run time ratio: 0.164
• Total executor run time: 1 Hour 15 Minutes
• Total GC time: 12 Minutes
[Chart: executor run time vs. GC time]
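The GC heuristic is a ratio of total GC time to total executor run time. This sketch uses hypothetical cut-offs chosen so that the slide's ratio falls in the "Moderate" band. With the rounded totals shown (12 minutes over 1 hour 15 minutes), the ratio is 0.16; the slide's 0.164 presumably comes from unrounded totals.

```python
from typing import Tuple

# Hypothetical cut-offs on total GC time / total executor run time.
GC_THRESHOLDS = [(0.08, "NONE"), (0.1, "LOW"), (0.2, "MODERATE"), (0.3, "SEVERE")]


def gc_severity(total_gc_s: float, total_run_s: float) -> Tuple[float, str]:
    ratio = total_gc_s / total_run_s
    for limit, label in GC_THRESHOLDS:
        if ratio < limit:
            return ratio, label
    return ratio, "CRITICAL"


ratio, severity = gc_severity(12 * 60, 75 * 60)  # 12 min GC over 1 h 15 min
print(f"{ratio:.2f} {severity}")  # 0.16 MODERATE
```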

16. Automating Spark Tuning with Dr. Elephant @ LinkedIn
[Flow: Development → Well tuned? Yes: Ship it! (Production). No: Tune it! (back to Development)]

17. Architecture
[Diagram: executors run tasks, keep a cache, and send heartbeats to the driver. The driver (DAG Scheduler, Task Scheduler, Listener Bus) serves the Web UI and REST API and uses EventLogging to write Spark history logs to HDFS. The Spark History Server reads these logs through a listener that rebuilds the application state (AppState).]

18. Upstream Ticket SPARK-23206: Additional Memory Tuning Metrics
• New executor-level memory metrics:
  – JVM used memory
  – Execution memory
  – Storage memory
  – Unified memory
• Metrics sent from executors to the driver via heartbeat
• Peak values for executor metrics logged at stage end
• Metrics exposed via the web UI and REST API
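The "peak values logged at stage end" idea can be modeled as a running maximum per stage and executor, updated on every heartbeat. This is a hypothetical sketch, not Spark's internal classes. Metric names such as jvm_used and unified are illustrative, and values are in MB. Each heartbeat is folded into every stage that is active at that moment. At stage end, the peaks are returned and discarded.

```python
from collections import defaultdict
from typing import Dict, Iterable


class StagePeakTracker:
    """Hypothetical driver-side tracker of per-stage peak executor metrics."""

    def __init__(self) -> None:
        # stage id -> executor id -> metric name -> peak value
        self._peaks: Dict[int, Dict[str, Dict[str, int]]] = defaultdict(
            lambda: defaultdict(dict))

    def on_heartbeat(self, executor_id: str, active_stages: Iterable[int],
                     metrics: Dict[str, int]) -> None:
        # Fold the latest executor metrics into the peak of every active stage.
        for stage_id in active_stages:
            peaks = self._peaks[stage_id][executor_id]
            for name, value in metrics.items():
                peaks[name] = max(peaks.get(name, 0), value)

    def on_stage_end(self, stage_id: int) -> Dict[str, Dict[str, int]]:
        # Return (and forget) the peaks that would be logged at stage end.
        finished = self._peaks.pop(stage_id, {})
        return {executor: dict(values) for executor, values in finished.items()}


tracker = StagePeakTracker()
tracker.on_heartbeat("1", [3], {"jvm_used": 6_000, "unified": 900})
tracker.on_heartbeat("1", [3], {"jvm_used": 5_200, "unified": 1_200})
print(tracker.on_stage_end(3))  # {'1': {'jvm_used': 6000, 'unified': 1200}}
```

Logging only the per-stage peaks keeps the event log small compared with recording every heartbeat, while still capturing the maxima that the memory heuristics rely on.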

19. Overview of our Solution
Spark History Server (SHS) use case, enhancement on SHS, and benefit brought by the enhanced SHS:
• Debug: scalable application history provider, enabling easy investigation of past applications
• Dr. Elephant: scalable application metrics provider, enabling performance analysis at scale

20. Spark History Server (SHS) at LinkedIn
[Diagram: log parsing feeding the Web UI and REST APIs]

21. How does SHS work?
[Diagram: requests are served by Jetty handlers backed by a queued thread pool; a separate thread pool updates the Listing DB and creates the per-application DBs (Apps DBs).] SPARK-18085

22. Not Happy
[Diagram: log parsing, REST APIs, and Web UI, as on slide 20]

23. SHS Issues
• Missing applications
  – Users cannot find their applications on the home page
• Extended loading time
  – The application details page takes a very long time (up to 0.5 hour) to load
• Handling large history files
  – SHS gets completely stalled
• Handling high-volume concurrent requests
  – SHS doesn’t return the expected JSON response

24. Missing Applications
[Diagram: 1. Submit job → 2. Start running → 3. Job failed → 4. Check it out on SHS]

25. Extended Loading Time
[Diagram: 5. Wait for SHS to catch up → 6. Finally it shows up → 7. Check out the details → 8. Keep loading… no response]

26. Extended Listing Delay
[Diagram: history files are replayed to update the Listing DB]
Causes:
1. The same file is replayed multiple times
2. Limited threads for the replay
3. Processing time proportional to file size

27. How to Decrease the Listing Delay
• Use HDFS Extended Attributes
  1. The Spark driver writes the log file's extended attributes (key/value) to the NameNode, and SHS reads them into the Listing DB
  2. The Spark driver writes the log file content; SHS reads from the log content when it fails to read from the extended attributes
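The listing path with its fallback can be sketched in memory. In real HDFS, this would go through Hadoop's `FileSystem.setXAttr` and `FileSystem.getXAttr`. Here a hypothetical `FakeLogFile` stands in for the log file, and the attribute key `user.spark.appInfo` is made up, although HDFS does use the `user.` namespace for application attributes. The fallback scans for the `SparkListenerApplicationStart` event, which does appear in Spark event logs. The JSON fields are simplified.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional

XATTR_KEY = "user.spark.appInfo"  # hypothetical key in HDFS's "user." namespace


@dataclass
class FakeLogFile:
    """In-memory stand-in for an HDFS event log file and its extended attributes."""
    lines: List[str] = field(default_factory=list)
    xattrs: Dict[str, bytes] = field(default_factory=dict)


def driver_write(log: FakeLogFile, app_info: Dict[str, str]) -> None:
    # Step 1: write the listing information as a key/value extended attribute.
    log.xattrs[XATTR_KEY] = json.dumps(app_info).encode("utf-8")
    # Step 2: write the event log content itself.
    log.lines.append(json.dumps({"Event": "SparkListenerApplicationStart", **app_info}))


def read_listing_entry(log: FakeLogFile) -> Optional[Dict[str, str]]:
    raw = log.xattrs.get(XATTR_KEY)
    if raw is not None:
        return json.loads(raw)  # fast path: no need to parse the log
    # Fallback: scan the log content for the application start event.
    for line in log.lines:
        event = json.loads(line)
        if event.get("Event") == "SparkListenerApplicationStart":
            return {k: v for k, v in event.items() if k != "Event"}
    return None
```

The fast path reads a few bytes of metadata, whatever the log size. That is why it avoids the "processing time proportional to file size" problem during listing.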

28. Extended Loading Delay
[Diagram: a request to SHS triggers a replay of the event log into the Apps DBs before a response is returned]
Replaying all the events takes a long time for a large log file.

29. How to Decrease the Loading Delay
• DB creation time is unavoidable
• Start DB creation before the user’s request, for every application log file
[Diagram: SHS replays logs into the Apps DBs ahead of time, so requests are answered from DBs that already exist]
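Starting DB creation ahead of the request maps naturally onto futures. This is a hypothetical sketch, not SHS code. When the listing scan sees a new log, it submits a background replay once. A later request waits only for whatever part of that replay is still running. An application that was never listed falls back to on-demand creation. `build_db` stands in for the expensive replay.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class EagerAppStore:
    """Hypothetical: build each application's DB when its log is listed,
    so a later user request usually finds it ready."""

    def __init__(self, build_db: Callable[[str], Any], workers: int = 4) -> None:
        self._build_db = build_db
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._dbs: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def on_new_log(self, app_id: str) -> None:
        # Called by the listing scan: start replaying in the background once.
        with self._lock:
            if app_id not in self._dbs:
                self._dbs[app_id] = self._pool.submit(self._build_db, app_id)

    def get(self, app_id: str) -> Any:
        # Called on a user request: blocks only if the replay is still running.
        self.on_new_log(app_id)  # unlisted apps fall back to on-demand creation
        return self._dbs[app_id].result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

The trade-off is that some DBs get built for applications nobody opens. That extra background work buys predictable response times for the ones users do open.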