Infrastructure for Deep Learning in Apache Spark

2.Infrastructure for Deep Learning in Apache Spark Kaarthik Sivashanmugam, Wee Hyong Tok Microsoft #UnifiedAnalytics #SparkAISummit

3.Agenda
• Evolution of data infrastructure
• ML workflow: Data prep & DNN training
• Intro to deep learning and computing needs
• Distributed deep learning and challenges
• Unified platform using Spark
  – Infra considerations, challenges
• ML Pipelines

4.Organization’s Data: Database / Data Warehouse, Web logs, Call logs, Products, Images, Video feeds, ……

5.Machine Learning: Typical E2E Process. Prepare, Experiment, Deploy, …; Orchestrate

6. + Machine Learning and Deep Learning workloads

7.How long does it take to train ResNet-50 on ImageNet? Before 2017: 14 days on an NVIDIA M40 GPU

8.Training ResNet-50 on ImageNet
Date | Organization | Framework | Hardware | Time
Apr 2017 | Facebook | Caffe2 | Tesla P100 x 256 | 1 hour
Sept 2017 | UC Berkeley, TACC, UC Davis | TensorFlow | 1,600 CPUs | 31 mins
Nov 2017 | Preferred Network | ChainerMN | Tesla P100 x 1,024 | 15 mins
July 2018 | Tencent | TensorFlow | Tesla P40 x 2,048 | 6.6 mins
Nov 2018 | Sony | Neural Network Library (NNL) | Tesla V100 x 3,456 | 2.0 mins
Apr 2019 | Fujitsu | MXNet | Tesla V100 x 2,048 | 1.2 mins
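
To put the training times above in perspective, here is a small arithmetic sketch that compares each one with the 14-day single-GPU figure from the previous slide. The names `BASELINE_MINUTES`, `RUNS`, and `speedup` are illustrative only. The ratios are wall-clock speedups across very different hardware, not per-device efficiency.

```python
MINUTES_PER_DAY = 24 * 60
BASELINE_MINUTES = 14 * MINUTES_PER_DAY  # 14 days on a single NVIDIA M40

# (hardware, training time in minutes), as listed on the slide
RUNS = [
    ("Tesla P100 x 256", 60.0),
    ("1,600 CPUs", 31.0),
    ("Tesla P100 x 1,024", 15.0),
    ("Tesla P40 x 2,048", 6.6),
    ("Tesla V100 x 3,456", 2.0),
    ("Tesla V100 x 2,048", 1.2),
]


def speedup(minutes):
    """Wall-clock speedup relative to the 14-day baseline."""
    return BASELINE_MINUTES / minutes


for hardware, minutes in RUNS:
    print(f"{hardware}: about {speedup(minutes):,.0f}x faster than the baseline")
```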

9.Considerations for Deep Learning @ Scale
• CPU vs. GPU
• Single vs. multi-GPU
• MPI vs. non-MPI
• InfiniBand vs. Ethernet
Credits: Mathew Salvaris, https://azure.microsoft.com/en-us/blog/gpus-vs-cpus-for-deployment-of-deep-learning-models/

10.“Things” you need to deal with when training machine learning/deep learning models
• Dependencies and containers
• Handling failures
• Schedule jobs
• Secure access
• Distribute data
• Gather results
• Scale resources
• Provision VM clusters

11.Machine Learning: Typical E2E Process. Prepare, Experiment, Deploy, …; Orchestrate

12.Machine Learning and Deep Learning [Figures: ML, DL; bottom figure from NVIDIA]

13.Lots of ML Frameworks: TensorFlow, PyTorch, MXNet, Chainer, Keras, Scikit-Learn, ….

14.Design Choices for Big Data and Machine Learning/Deep Learning: Laptop; Spark; Spark + separate infrastructure for ML/DL training/inference; Cloud

15.Execution Models for Spark and Deep Learning
Spark (Task 1, Task 2, Task 3)
• Independent Tasks
• Embarrassingly Parallel and Massively Scalable
Distributed Learning
• Non-Independent Tasks
• Some parallel processing
• Optimizing communication between nodes
Data Parallelism / Model Parallelism
Credits: Reynold Xin, Project Hydrogen – State of Art Deep Learning on Apache Spark
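
As a rough illustration of the data-parallelism idea named on this slide, the sketch below trains a one-parameter model with two simulated workers. Each worker holds a full copy of the model and its own shard of the data, and the workers average their gradients every step. The model, the data, and the function names are all assumptions made for illustration; none of them come from the talk.

```python
# Data parallelism sketch: every worker has the same weight `w`
# but sees a different shard of the data (all names are illustrative).

def gradient(w, shard):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr=0.01):
    """One synchronous SGD step: local gradients, then an average."""
    local_grads = [gradient(w, shard) for shard in shards]  # parallel in practice
    avg_grad = sum(local_grads) / len(local_grads)          # the communication step
    return w - lr * avg_grad


data = [(x, 3.0 * x) for x in range(1, 9)]  # generated with true weight 3.0
shards = [data[0:4], data[4:8]]             # two equal-sized "workers"

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
print(round(w, 3))
```

With equal-sized shards, the averaged gradient equals the gradient over the full dataset, so the result matches single-node training; the price is one round of communication per step, which is the "optimizing communication between nodes" concern on the slide.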

16.Execution Models for Spark and Deep Learning
Spark (Task 1, Task 2, Task 3)
• Independent Tasks
• Embarrassingly Parallel and Massively Scalable
Distributed Learning (Task 1, Task 2, Task 3)
• Non-Independent Tasks
• Some parallel processing
• Optimizing communication between nodes
Credits: Reynold Xin, Project Hydrogen – State of Art Deep Learning on Apache Spark

17.Execution Models for Spark and Deep Learning
Spark (Task 1, Task 2, Task 3)
• Independent Tasks
• Embarrassingly Parallel and Massively Scalable
• Re-run crashed task
Distributed Learning (Task 1, Task 2, Task 3)
• Non-Independent Tasks
• Some parallel processing
• Optimizing communication between nodes
• Re-run all tasks
Credits: Reynold Xin, Project Hydrogen – State of Art Deep Learning on Apache Spark
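
The difference in failure handling between the two execution models can be sketched as a tiny simulation. The function `rerun_set` and its arguments are hypothetical; this is not Spark's scheduler, only a model of the recovery rule each side of the slide describes.

```python
def rerun_set(num_tasks, failed, independent):
    """Return the task ids that must run again after task `failed` crashes.

    independent=True  models Spark-style tasks: only the crashed task re-runs.
    independent=False models tightly coupled training: every task restarts,
    because the surviving tasks were exchanging state with the failed one.
    """
    if failed not in range(num_tasks):
        raise ValueError("unknown task id")
    return [failed] if independent else list(range(num_tasks))


print(rerun_set(3, failed=1, independent=True))
print(rerun_set(3, failed=1, independent=False))
```

The wasted work therefore grows with the number of tasks in the distributed-learning case, which is one reason checkpointing and all-or-nothing scheduling matter for training jobs.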

18.Spark + ML/DL: HorovodRunner, Sparkflow, Project Hydrogen, TensorFlowOnSpark (www.aka.ms/spark)

19.Microsoft Machine Learning for Apache Spark v0.16: Microsoft’s Open Source Contributions to Apache Spark
• Cognitive Services
• Spark Serving
• Model Interpretability
• LightGBM Gradient Boosting
• Deep Networks with CNTK
• HTTP on Spark
www.aka.ms/spark, Azure/mmlspark

20.Demo – Azure Databricks and Deep Learning

21.Demo – Distributed Deep Learning using TensorFlow with HorovodRunner

22.Physics of Machine Learning and Deep Learning. What do you need for training / distributed training? GPU, CPU, Memory, Storage, Network, Deep Learning Framework

23.GPU Device Interconnect
• NVLink
• GPUDirect P2P
• GPUDirect RDMA
• Standard network stack
[Figure: interconnect topology sample]
Credits: CUDA-MPI Blog (https://bit.ly/2KnmN58)

24.From CUDA to NCCL1 to NCCL2: Multi-GPU Communication Library. Multi-core CPU; CUDA: GPU; NCCL 1: multi-GPU; NCCL 2: multi-GPU, multi-node. Credits: NCCL Tutorial (https://bit.ly/2KpPP44)

25.NCCL 2.x (multi-node) Credits: NCCL Tutorial (https://bit.ly/2KpPP44)

26.NCCL 2.x (multi-node) Credits: NCCL Tutorial (https://bit.ly/2KpPP44)
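
Collective communication libraries such as NCCL provide operations like all-reduce, which data-parallel training uses to sum gradients across devices. The sketch below simulates a ring all-reduce in plain Python to show the communication pattern. It is an illustration under stated assumptions: it does not call NCCL, the function name `ring_allreduce` is made up here, and the slides do not say which algorithm NCCL selects in a given setup.

```python
def ring_allreduce(vectors):
    """Simulate a ring all-reduce (sum) over equal-length lists.

    vectors[r] is the buffer held by worker r. Returns one buffer per
    worker, each holding the element-wise sum of all inputs.
    """
    n = len(vectors)
    size = len(vectors[0])
    if size % n != 0:
        raise ValueError("buffer length must be divisible by worker count")
    chunk = size // n
    bufs = [list(v) for v in vectors]

    def sl(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) % n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, bufs[r][sl((r - s) % n)]) for r in range(n)]
        for r, c, payload in sends:
            dst = (r + 1) % n
            bufs[dst][sl(c)] = [a + b for a, b in zip(bufs[dst][sl(c)], payload)]

    # Now worker r holds the complete sum for chunk (r + 1) % n.
    # Phase 2, all-gather: completed chunks travel around the ring.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, bufs[r][sl((r + 1 - s) % n)]) for r in range(n)]
        for r, c, payload in sends:
            bufs[(r + 1) % n][sl(c)] = payload
    return bufs


result = ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result[0])
```

Each worker only ever talks to its neighbour, and the data sent per worker stays roughly constant as workers are added, which is why the interconnect between neighbouring GPUs and nodes matters so much.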

27.Spark & GPU
• Options for using GPUs with Spark:
  1. Native support (cluster manager, GPU tasks): SPARK-24615
  2. Use cores/memory as proxy for GPU resources and allow GPU-enabled code execution
  3. Code implementation/generation for GPU offload
• Considerations
  – Flexibility
  – Data management
  – Multi-GPU execution
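
Option 2 above can be approximated with ordinary Spark settings: choose `spark.task.cpus` so that the number of concurrent tasks per executor never exceeds the number of GPUs on that executor. The helper below is a hypothetical sketch that only computes the settings. `spark.executor.cores` and `spark.task.cpus` are real Spark configuration keys, while the function name and the example numbers are assumptions.

```python
import math


def gpu_proxy_conf(executor_cores, gpus_per_executor):
    """Pick spark.task.cpus so that task slots per executor <= GPUs.

    Spark runs executor_cores // task_cpus tasks at once on an executor,
    so rounding the ratio up keeps the slot count at or below the GPU count.
    """
    if gpus_per_executor < 1 or executor_cores < gpus_per_executor:
        raise ValueError("need at least one core per GPU")
    task_cpus = math.ceil(executor_cores / gpus_per_executor)
    conf = {
        "spark.executor.cores": str(executor_cores),
        "spark.task.cpus": str(task_cpus),
    }
    slots = executor_cores // task_cpus
    return conf, slots


conf, slots = gpu_proxy_conf(executor_cores=12, gpus_per_executor=4)
print(conf, slots)
```

This approach does not tell a task which GPU to use; the task code still has to pick a device itself, which is part of the flexibility and multi-GPU considerations listed on the slide.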

28.Infrastructure Considerations
• Data format, storage and reuse
  – Co-locate Data Engineering storage infrastructure (cluster-local)
  – DL Framework support for HDFS (reading from HDFS does not mean data-locality-aware computation)
  – Sharing data between Spark and Deep Learning (HDFS, Spark-TF connector, Parquet/Petastorm)
• Job execution
  – Gang scheduling: refer to SPARK-24374
  – Support for GPU (and other accelerators): refer to SPARK-24615
  – Cluster sharing with other types of jobs (CPU-only cluster vs. CPU+GPU cluster)
  – Quota management
  – Support for Docker containers
  – MPI vs. non-MPI
  – Different GPU generations
• Node, GPU connectivity
  – InfiniBand, RDMA
  – GPU Interconnect options
  – Interconnect-aware scheduling, minimize distribution, repacking
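
The "minimize distribution" point can be illustrated with a simple placement rule: put a job's GPUs on as few nodes as possible so that more traffic stays on fast intra-node links. The function below is a hypothetical greedy sketch, not the policy of any particular scheduler; the node names and GPU counts are made up.

```python
def place_job(free_gpus, requested):
    """Place `requested` GPUs on as few nodes as possible.

    free_gpus maps node name -> number of free GPUs.
    Returns {node: gpus_taken}, or None if the cluster cannot fit the job.
    """
    if requested <= 0:
        raise ValueError("requested must be positive")
    if sum(free_gpus.values()) < requested:
        return None

    # Best fit on one node: the smallest node that still holds the whole job,
    # which leaves larger nodes free for larger jobs.
    fitting = [n for n, g in free_gpus.items() if g >= requested]
    if fitting:
        node = min(fitting, key=lambda n: (free_gpus[n], n))
        return {node: requested}

    # Otherwise take nodes with the most free GPUs first, which uses
    # the fewest nodes overall.
    placement = {}
    remaining = requested
    for node in sorted(free_gpus, key=lambda n: (-free_gpus[n], n)):
        if remaining == 0:
            break
        take = min(free_gpus[node], remaining)
        if take:
            placement[node] = take
            remaining -= take
    return placement


cluster = {"node-a": 4, "node-b": 2, "node-c": 8}
print(place_job(cluster, 2))
print(place_job(cluster, 10))
```

A real scheduler would also weigh interconnect topology within a node and might repack running jobs, as the slide notes; this sketch covers only the node-count part of the problem.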

29.ML Pipelines
• Using machine learning pipelines, data scientists, data engineers, and IT professionals can collaborate on different steps/phases
• Pipelines enable use of the best technology for each phase of the ML/DL workflow