Interactive Deep Learning in the Cloud

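In this demo, we show an interactive environment for training deep learning models on real-world data. The environment consists of a GPU cluster and one or more Spark clusters whose VMs are connected through an Azure virtual network, and it can be set up easily with MMLSpark (Microsoft Machine Learning for Apache Spark), an open-source library for machine learning workflows.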

1. Interactive Deep Learning in Cloud via MMLSpark
Tong Wen, Microsoft
#DL3SAIS

2. Overview
• Toward a single environment for fast experimentation with big data and big compute
• Spark + Accelerators (GPU, FPGA, TPU, …) + MPI
• High performance with:
  – Cost effectiveness
  – Ease of use
  – Extensibility and openness

3. MMLSpark
https://github.com/Azure/mmlspark/
• Tong Wen @microsoft.com
• Eduardo de Leon
• Akshaya Annavajhala
• Ilya Matiach
• Roope Astala
• Miruna Oprescu
• Eli Barzilay
• Young Park
• Maureen Busch
• Sudarshan Raghunathan
• Mark Hamilton
• Ratan Sur
• Danil Kirsanov

4. Key Advantages
• Fast experimentation with Deep Learning
  – GPU vs CPU: ~40x speedup
  – Single interactive environment with easy setup
• Trained an accurate model on NIH chest X-ray dataset in days
  – Data size: 45 GB compressed on disk; O(1) TB in memory
  – Model size: 46 million parameters
• Cost to train the above model: < $9.54
  – Spark cluster (10 nodes): $2.48/hour
  – 4 GPUs: $2.29/hour
  – Training time: 54 mins
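The cost bound on this slide can be reproduced from the quoted hourly rates. The sketch below is only a back-of-the-envelope check: the rates and training time come from the slide, while the `cost` helper and the reading of "< $9.54" as two full hours at the combined rate are assumptions, not something the slide states.

```python
# Hourly rates and training time as quoted on the slide.
SPARK_CLUSTER_RATE = 2.48   # $/hour, 10-node Spark cluster
GPU_RATE = 2.29             # $/hour, 4 GPUs
TRAINING_MINUTES = 54

def cost(hours: float) -> float:
    """Cost in dollars of running the Spark cluster and the GPUs together."""
    return round((SPARK_CLUSTER_RATE + GPU_RATE) * hours, 2)

prorated = cost(TRAINING_MINUTES / 60)  # pay only for the minutes used
two_hours = cost(2)                     # assumption: two full hours budgeted

print(f"Combined rate: ${cost(1):.2f}/hour")
print(f"Prorated for {TRAINING_MINUTES} minutes: ${prorated:.2f}")
print(f"Two full hours: ${two_hours:.2f}")
```

Under these assumptions the prorated cost is about $4.29, and two full hours come to $9.54, which matches the slide's upper bound.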

5. Implementation

6. Set Up the System
https://github.com/Azure/mmlspark/blob/master/docs/gpu-setup.md

7. Attach a New VM
Set up passwordless SSH login to the GPU VM
GPU Type | Peak FLOPS (FP32) | Price
Tesla K80 | 8.7 teraflops | $0.574/hour
Tesla P40 | 12 teraflops | $1.319/hour
Tesla P100 | 10.6 teraflops | $1.319/hour
Tesla V100 | 15.7 teraflops | $1.95/hour
Earth Simulator | 41 teraflops | >> $832/hour (2003)
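One way to read the GPU figures on this slide is as peak throughput per dollar. The following sketch ranks the four cloud GPUs on that ratio. It is illustrative only: the `GPUS` table and `teraflops_per_dollar` helper are hypothetical names, the P40 and P100 prices are assumed to be in dollars per hour like the others, and peak FLOPS says nothing about the throughput a real training job achieves. The Earth Simulator row is left out because its price is only a lower bound.

```python
# (peak FP32 teraflops, price in $/hour), copied from the slide.
# Assumption: the P40 and P100 prices are dollars per hour like the others.
GPUS = {
    "Tesla K80": (8.7, 0.574),
    "Tesla P40": (12.0, 1.319),
    "Tesla P100": (10.6, 1.319),
    "Tesla V100": (15.7, 1.95),
}

def teraflops_per_dollar(name: str) -> float:
    """Peak FP32 teraflops obtained per dollar of hourly price."""
    teraflops, price = GPUS[name]
    return teraflops / price

# Highest peak throughput per dollar first.
ranking = sorted(GPUS, key=teraflops_per_dollar, reverse=True)

for name in ranking:
    print(f"{name}: {teraflops_per_dollar(name):.1f} peak teraflops per $/hour")
```

On these numbers the Tesla K80, the GPU used in the test system, comes out ahead on peak throughput per dollar.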

8. Programming API
GPU | Epochs | Minibatch size | Wall clock time
Yes | 30 | 32 | 1m53s
No | 30 | 32 | 73m8s
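The ~40x GPU vs CPU speedup quoted on the Key Advantages slide can be checked against these two wall-clock times. The sketch below assumes the times are written as minutes and seconds in the form shown on the slide; `to_seconds` is a hypothetical helper written for this check.

```python
import re

def to_seconds(wall_clock: str) -> int:
    """Convert a wall-clock string such as '73m8s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", wall_clock)
    if match is None:
        raise ValueError(f"unrecognized wall-clock format: {wall_clock!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds

gpu_seconds = to_seconds("1m53s")   # 30 epochs, minibatch size 32, with GPU
cpu_seconds = to_seconds("73m8s")   # same settings, without GPU

speedup = cpu_seconds / gpu_seconds
print(f"GPU vs CPU speedup: {speedup:.1f}x")
```

The ratio works out to roughly 38.8, consistent with the rounded ~40x figure.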

9. Test Case: NIH Chest X-ray Dataset
• 112,120 X-ray images (1024 by 1024)
• AlexNet with 46 million parameters
• 14 pathology labels
• Half of the dataset for training
• 30,805 unique patients
• Downsized to 224 by 224
• Binary model
• Data Parallel 1-Bit SGD
Configuration | Epochs | Minibatch size | Wall clock time
4 GPUs, 2 VMs | 55 | 512 | 55m47s
4 GPUs, 4 VMs | 55 | 512 | 53m40s
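The slide names data-parallel 1-bit SGD without further detail. As a rough illustration of the general idea (quantize each gradient value to one bit before workers exchange it, and carry the quantization error into the next minibatch), here is a minimal pure-Python sketch. The function name, the zero threshold, and the mean-based reconstruction values are assumptions chosen for illustration; this is not the implementation used in the talk.

```python
def one_bit_quantize(grad, residual):
    """Quantize a gradient vector to one bit per element with error feedback.

    Returns the bits, the decoded (reconstructed) gradient, and the new
    residual to add to the next minibatch's gradient.
    """
    # Add the quantization error carried over from the previous minibatch.
    corrected = [g + r for g, r in zip(grad, residual)]

    # One bit per element: True for non-negative values, False for negative.
    bits = [v >= 0 for v in corrected]

    # Reconstruct each bit as the mean of the values that fell on its side.
    pos = [v for v in corrected if v >= 0]
    neg = [v for v in corrected if v < 0]
    pos_mean = sum(pos) / len(pos) if pos else 0.0
    neg_mean = sum(neg) / len(neg) if neg else 0.0
    decoded = [pos_mean if b else neg_mean for b in bits]

    # Whatever the one-bit encoding lost is kept for the next step.
    new_residual = [v - d for v, d in zip(corrected, decoded)]
    return bits, decoded, new_residual

bits, decoded, residual = one_bit_quantize(
    [0.5, -0.25, 1.5, -0.75], [0.0, 0.0, 0.0, 0.0]
)
print(bits, decoded, residual)
```

Only the bits and the two reconstruction values need to be sent between workers, which is where the communication saving comes from; the residual stays local to each worker.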

10. Conclusion & Future Work
• A dynamically configurable hybrid architecture to support more big data + big compute scenarios with cost effectiveness
  – Data exchange (Parquet adaptor)
  – Model exchange (ONNX)
  – Single environment (Resource management)
  – Openness (More frameworks)

11. Thank You!

12. Test System Configuration
Node Type | Number | Size | Price
Spark Cluster Node | 10 | 2.4 GHz Intel Xeon® E5-2673 v3 processor; 8 cores; 28 GiB | $0.248/hour
GPU VM | 2 | 1 NVIDIA Tesla K80 GPU; 6 cores; 56 GiB | $0.574/hour
GPU VM | 2 | 2 NVIDIA Tesla K80 GPUs; 12 cores; 112 GiB | $1.147/hour