Apache Spark on K8S Best Practice and Performance in the Cloud

As of Spark 2.3, Spark can run on clusters managed by Kubernetes. This talk describes best practices for running Spark SQL on Kubernetes on Tencent Cloud, including how to deploy Kubernetes on a public cloud platform to maximize resource utilization and how to tune Spark configurations to take advantage of the Kubernetes resource manager for the best performance. To evaluate performance, the TPC-DS benchmark is used to analyze the performance impact of queries across configuration sets.

1.Jimmy Chen, Junping Du Tencent Cloud

2.Agenda
• Spark service in the cloud – our practice
• Spark on Kubernetes – architecture, best practice and performance

3.About us
• Jimmy Chen: Sr. Software Engineer at Tencent Cloud; Apache Hadoop, Spark, Parquet contributor; previously worked at Intel
• Junping Du: Principal Engineer & Director at Tencent Cloud; Apache Hadoop PMC & ASF Member; previously worked at Hortonworks and VMware

4.About Tencent and Tencent Cloud
• The 331st largest company in the 2018 Fortune Global 500
• Top 100 Most Valuable Global Brands 2018: ranked 1st in China, 5th globally
• World's 50 Most Innovative Companies 2015: ranked 1st in China

5.Case study for Spark in the Cloud - Sparkling

6.Sparkling Architecture (diagram): the Sparkling Data Warehouse runs on Tencent Cloud infrastructure. Workspace layer: SQL IDE, notebook, workflow, monitor, data sync, BI, and ML tooling. Data application layer: JDBC and Livy-Thrift endpoints. Query engine: SQL optimizer, Spark-based execution and analysis engine, and columnar query storage. Metadata: Atlas store and data catalog. Data lake sources: RDBMS, COS, NoSQL, ES, Kafka, etc., under a data catalog and management layer.

7.Motivations towards dockerized Spark
• Serverless
– Better isolation
• On-premises deployment
– Bare-metal infrastructure, where cloud control protocols do not work
– DevOps scope
– Cost

8.Our Practice
• Sparkling on Kubernetes
– Cloud API server -> Kubernetes API server
– CVM resource -> container resource
– Cloud load balancer -> Kubernetes load balancer
– Cloud DNS -> Kubernetes ingress/coredns
– CVM image -> docker image
– etc.

9.Kubernetes
• New open-source cluster manager (github.com/kubernetes/kubernetes)
• Runs programs in Linux containers
• 2,000+ contributors and 75,000+ commits
(diagram: applications and their libs packaged in containers, sharing the node's kernel)

10.Kubernetes Architecture
Pod, a unit of scheduling and isolation:
● runs a user program in a primary container
● holds isolation layers, like a virtual IP, in an infra container
(diagram: pods 1 and 2 replicated across nodes A/B/C with virtual IPs 172.17.0.5-7)
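To make the pod model concrete, here is a minimal pod manifest, a hypothetical sketch (the name, image, and command are placeholders, not from the talk). The infra (pause) container that holds the pod's virtual IP is added by the container runtime and does not appear in the manifest:

  apiVersion: v1
  kind: Pod
  metadata:
    name: example-pod              # hypothetical name
  spec:
    containers:
    - name: main                   # primary container running the user program
      image: busybox:1.30
      command: ["sh", "-c", "echo hello from the pod; sleep 3600"]

Creating it with kubectl create -f example-pod.yaml schedules it onto one node, where it receives its own virtual IP, like the 172.17.0.x addresses in the diagram above.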

11.Kubernetes Advantages

12.Spark on Kubernetes Workflow (diagram: an application reaches the Kubernetes API server either directly via spark-submit or through the Kubernetes spark operator)

13.Spark on Kubernetes Status
• Spark side
– Cluster/client mode
– Volume mounts (see the sketch after this list)
– Customized Spark pod, e.g. mounting configMap and volume, setting affinity/anti-affinity
– Static executor allocation
• Spark side on-going work
– Dynamic executor allocation
– Job queues and resource management
– Spark application management
– Priority queues
• Kubernetes side (spark operator)
– Support sparkctl CLI
– Native cron support for scheduled apps
– Multiple-language support, such as Python and R
– Monitoring through Prometheus
• Kubernetes side on-going work
– Support GPU resource scheduling
– Spark operator HA
– Multiple Spark version support
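As an illustration of the volume-mount support listed above, a hedged spark-submit fragment using the Spark 2.4 volume-mount configuration; the volume name (data) and both paths are hypothetical:

  # Mount a hostPath volume from the node into every executor pod
  --conf spark.kubernetes.executor.volumes.hostPath.data.mount.path=/data \
  --conf spark.kubernetes.executor.volumes.hostPath.data.mount.readOnly=false \
  --conf spark.kubernetes.executor.volumes.hostPath.data.options.path=/mnt/data

These flags are appended to a full spark-submit command like the one on slide 15 below.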

14.Sparkling on Kubernetes Architecture (diagram): a Kubernetes cluster exposing services through an Ingress; service pods include Notebook, Zeppelin, Airflow, Grafana, History, Metastore, Livy server, Thrift server, and Datasync, backed by persistent volumes and a DaemonSet, with Spark driver and executor pods scheduled across the nodes.

15.Submit via spark-submit
Step 1: check cluster master
# kubectl cluster-info
Kubernetes master is running at https://169.254.128.15:60002
Step 2: submit application via spark-submit
# bin/spark-submit \
  --master k8s://https://169.254.128.7:60002 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=sherrytima/spark2.4.1:k8s \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.1.jar

16.Submit via kubectl/sparkctl
• Install the spark operator on your kubernetes cluster
# helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
# helm install incubator/sparkoperator --namespace spark-operator
• Declare the application yaml (see the sketch below)
• Run the application
– kubectl create -f spark-pi.yaml
– sparkctl create spark-pi.yaml
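A minimal sketch of such an application yaml, assuming the spark-on-k8s-operator CRD schema of that era (the apiVersion, service account, and resource sizes may differ by operator version):

  apiVersion: sparkoperator.k8s.io/v1beta1
  kind: SparkApplication
  metadata:
    name: spark-pi
    namespace: default
  spec:
    type: Scala
    mode: cluster                        # same cluster mode as the spark-submit path
    image: sherrytima/spark2.4.1:k8s     # image reused from the spark-submit example
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.4.1.jar
    sparkVersion: "2.4.1"
    driver:
      cores: 1
      memory: 1g
      serviceAccount: spark              # assumed service account with pod-create rights
    executor:
      instances: 2
      cores: 1
      memory: 1g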

17.Benchmark Environment
• Software
– Kubernetes: 1.10.1
– Hadoop/YARN: 2.7.3
– Spark: 2.4.1
– OS: CentOS 7.2
• Hardware
– Master (x1): 8 cores + 64G mem, 200G SSD
– Slaves (x6): 32 cores + 128G mem, 4 HDDs

18.Terasort
• Data size 100G (chart: time in seconds, lower is better; spark on yarn vs spark on k8s)
– spark.executor.instances=5
– spark.executor.cores=2
– spark.executor.memory=10g
– spark.executor.memoryOverhead=2g
– spark.driver.memory=8g
• Data size 500G
– Spark on Kubernetes failed
– Spark on YARN finished (~35 minutes)

19.Root Cause Analysis
• Pods are always evicted due to disk pressure: spark.local.dir is mounted as an emptyDir volume
• emptyDir is ephemeral storage on the node, placed under the kubelet work directory
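For context, the executor pods generated by Spark mount the scratch space roughly like the simplified fragment below (the mount path is illustrative). Because emptyDir lives under the kubelet work directory on the node, heavy shuffle spills fill that disk and trigger eviction:

  spec:
    volumes:
    - name: spark-local-dir-1
      emptyDir: {}                       # ephemeral; backed by the kubelet work directory
    containers:
    - name: executor
      image: sherrytima/spark2.4.1:k8s
      volumeMounts:
      - name: spark-local-dir-1
        mountPath: /var/data/spark-local # illustrative mount path for spark.local.dir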

20.Possible Solutions
• Mount emptyDir as a RAM-backed volume: spark.kubernetes.local.dirs.tmpfs=true
• Extend the disk capacity of the kubelet workspace
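A hedged example of the first option, reusing the SparkPi submission from slide 15. Setting spark.kubernetes.local.dirs.tmpfs=true makes Spark request RAM-backed emptyDirs, so the memory overhead must be sized to hold the shuffle data (the 4g value here is an assumption, not from the talk):

  # bin/spark-submit \
    --master k8s://https://169.254.128.7:60002 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.container.image=sherrytima/spark2.4.1:k8s \
    --conf spark.kubernetes.local.dirs.tmpfs=true \
    --conf spark.executor.memoryOverhead=4g \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.1.jar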

21.Result (chart: Terasort 100G, time in seconds, lower is better; bars for spark on yarn, spark on k8s (default), and spark on k8s with tmpfs)

22.Spark-sql-perf
• Use the spark-sql-perf repo from Databricks
– 10 shuffle-bound queries
– a Spark application based on spark-sql-perf
• Run in cluster mode for Kubernetes
bin/spark-submit \
  --name tpcds-1000 \
  --jars hdfs://172.17.0.8:32001/run-lib/* \
  --class perf.tpcds.TpcDsPerf \
  --conf spark.kubernetes.container.image=sherrytima/spark-2.4.1:k8sv8 \
  --conf spark.driver.args="hdfs://172.17.0.8:32001/run-tpcds 1000 orc true" \
  hdfs://172.17.0.8:32001/tpcds/tpcds_2.11-1.1.jar

23.Benchmark Result
• Configuration
– spark.executor.instances=24
– spark.executor.cores=6
– spark.executor.memory=24g
– spark.executor.memoryOverhead=4g
– spark.driver.memory=8g
• Result (chart: tpcds@1000, time in ms, lower is better, queries q4, q11, q17, q25, q29, q64, q74, q78, q80, q93): Spark on Kubernetes is much slower than Spark on YARN!

24.Root Cause
• The kubelet work directory can only be mounted on one disk, so the Spark scratch space uses only ONE disk.
• When running on YARN, the Spark scratch space uses as many disks as yarn.nodemanager.local-dirs configures.

25.Possible Workaround
• Use RAM-backed media as the Spark scratch space

26.Our Solution
• Use specified volume mounts as the Spark local directory (SPARK-27499)
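A sketch of how this looks once SPARK-27499-style local-directory volumes are available: volumes whose names start with spark-local-dir- are used as Spark local directories, so scratch space can be spread across several disks. The hostPath locations /data1/spark and /data2/spark are hypothetical:

  # Appended to the spark-submit command; one hostPath volume per physical disk
  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path=/spark-local-1 \
  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path=/data1/spark \
  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/spark-local-2 \
  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/data2/spark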

27.Result (chart: TPCDS@1000, time in ms, lower is better, queries q4, q11, q17, q25, q29, q64, q74, q78, q80, q93; spark on k8s orig vs spark on k8s optimized)

28.Summary
• Spark on Kubernetes practices, for serverless and cloud-native scenarios
• But we hit problems in production: local storage issues, etc.
• Working with the community toward better solutions and best practices

29.Future Work
• Close the performance gap with YARN mode
• Data locality awareness for the Kubernetes scheduler
• Prometheus/Grafana integration