Interactive Deep Learning in the Cloud
1. Interactive Deep Learning in Cloud via MMLSpark
Tong Wen, Microsoft
#DL3SAIS
2. Overview
• Toward a single environment for fast experimentation with big data and big compute
• Spark + Accelerators (GPU, FPGA, TPU, …) + MPI
• High performance with:
  – Cost effectiveness
  – Ease of use
  – Extensibility and openness
3. MMLSpark
https://github.com/Azure/mmlspark/
• Tong Wen @microsoft.com
• Eduardo de Leon
• Akshaya Annavajhala
• Ilya Matiach
• Roope Astala
• Miruna Oprescu
• Eli Barzilay
• Young Park
• Maureen Busch
• Sudarshan Raghunathan
• Mark Hamilton
• Ratan Sur
• Danil Kirsanov
4. Key Advantages
• Fast experimentation with deep learning
  – GPU vs. CPU: ~40x speedup
  – Single interactive environment with easy setup
• Trained an accurate model on the NIH chest X-ray dataset in days
  – Data size: 45 GB compressed on disk; O(1) TB in memory
  – Model size: 46 million parameters
• Cost to train the above model: < $9.54
  – Spark cluster (10 nodes): $2.48/hour
  – 4 GPUs: $2.29/hour
  – Training time: 54 minutes
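The cost bound can be checked with a few lines of arithmetic. This is a minimal sketch that assumes (the slide does not say so) that the Spark cluster and the GPU VMs are both billed for the whole run at the listed hourly rates and that nothing else is charged. Under that assumption, the 54-minute run costs about $4.29. The $9.54 bound equals exactly two hours at the combined $4.77/hour rate.

```python
# Hourly rates quoted on the "Key Advantages" slide (USD).
SPARK_CLUSTER_PER_HOUR = 2.48   # 10 Spark nodes
GPUS_PER_HOUR = 2.29            # 4 GPUs
TRAINING_MINUTES = 54

# Assumption: both resources are billed for the full run, and nothing else is.
combined_hourly = SPARK_CLUSTER_PER_HOUR + GPUS_PER_HOUR

def training_cost(minutes: float) -> float:
    """Cost of keeping the Spark cluster and the GPU VMs up for `minutes`."""
    return combined_hourly * minutes / 60

print(f"combined rate: ${combined_hourly:.2f}/hour")
print(f"{TRAINING_MINUTES}-minute run: ${training_cost(TRAINING_MINUTES):.2f}")
print(f"two full hours: ${training_cost(120):.2f}")
```

The same function prices any other run length, such as a shorter experiment during interactive tuning.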
5. Implementation
6. Set Up the System
https://github.com/Azure/mmlspark/blob/master/docs/gpu-setup.md
7. Attach a New VM
• Set up passwordless SSH login to the GPU VM

GPU type         Peak FLOPS (FP32)  Price
Tesla K80        8.7 teraflops      $0.574/hour
Tesla P40        12 teraflops       $1.319/hour
Tesla P100       10.6 teraflops     $1.319/hour
Tesla V100       15.7 teraflops     $1.95/hour
Earth Simulator  41 teraflops       >> $832/hour (2003)
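One rough way to compare cost effectiveness is hourly price divided by peak FP32 throughput. Treat this as a sketch only. Peak FLOPS ignores memory bandwidth, interconnect, and how well a given model actually uses the hardware. The Earth Simulator row is left out because its price is only a lower bound. By this measure, the K80 (the GPU used in the test system) is the cheapest per peak teraflop.

```python
# (GPU type, peak FP32 teraflops, USD per hour), copied from the table above.
GPUS = [
    ("Tesla K80", 8.7, 0.574),
    ("Tesla P40", 12.0, 1.319),
    ("Tesla P100", 10.6, 1.319),
    ("Tesla V100", 15.7, 1.95),
]

def dollars_per_peak_teraflop_hour(teraflops: float, price_per_hour: float) -> float:
    """Hourly price divided by peak FP32 throughput (lower is cheaper)."""
    return price_per_hour / teraflops

# Sort from cheapest to most expensive per peak teraflop-hour.
ranked = sorted(GPUS, key=lambda g: dollars_per_peak_teraflop_hour(g[1], g[2]))
for name, tflops, price in ranked:
    print(f"{name:<11} ${dollars_per_peak_teraflop_hour(tflops, price):.3f} per peak teraflop-hour")
```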
8. Programming API

GPU  Epochs  Minibatch size  Wall-clock time
Yes  30      32              1m53s
No   30      32              73m8s
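The "~40x" figure on the Key Advantages slide follows from this table. Both runs use 30 epochs with minibatch size 32, so the ratio of the wall-clock times is the GPU speedup. The helper name `to_seconds` is illustrative.

```python
def to_seconds(wall_clock: str) -> int:
    """Parse a wall-clock string such as '73m8s' or '1m53s' into seconds."""
    minutes, seconds = wall_clock.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)

gpu_time = to_seconds("1m53s")   # 113 s
cpu_time = to_seconds("73m8s")   # 4388 s
speedup = cpu_time / gpu_time
print(f"GPU speedup: {speedup:.1f}x")  # prints "GPU speedup: 38.8x"
```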
9. Test Case: NIH Chest X-ray Dataset
• 112,120 X-ray images (1024 × 1024)
• AlexNet with 46 million parameters
• 14 pathology labels
• Half of the dataset used for training
• 30,805 unique patients
• Images downsized to 224 × 224
• Binary model
• Data-parallel 1-bit SGD

Configuration  Epochs  Minibatch size  Wall-clock time
4 GPUs, 2 VMs  55      512             55m47s
4 GPUs, 4 VMs  55      512             53m40s
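1-bit SGD reduces the communication cost of data-parallel training in two ways. Each worker sends about one bit per gradient element instead of 32, and the quantization error is carried forward into the next minibatch. What follows is a simplified, single-process NumPy illustration of that idea, not CNTK's implementation. It quantizes each gradient as a whole, whereas the original scheme works column by column. The function names are hypothetical, and the averaging loop stands in for the exchange between GPU workers.

```python
import numpy as np

def one_bit_quantize(grad, residual):
    """Quantize grad to one sign bit per element, with error feedback.

    Returns the bits, the two reconstruction values, and the new residual
    (quantization error) to be added to the next minibatch's gradient.
    """
    compensated = grad + residual                 # add back last step's error
    bits = compensated >= 0                       # sign bit per element (bool array)
    # Reconstruction values: mean of the entries on each side of zero.
    pos = compensated[bits].mean() if bits.any() else 0.0
    neg = compensated[~bits].mean() if (~bits).any() else 0.0
    new_residual = compensated - np.where(bits, pos, neg)
    return bits, pos, neg, new_residual

def dequantize(bits, pos, neg):
    """Rebuild a dense gradient from the bits and the two reconstruction values."""
    return np.where(bits, pos, neg)

# Simulate one data-parallel step with 4 workers.
rng = np.random.default_rng(0)
num_workers, dim = 4, 8
residuals = [np.zeros(dim) for _ in range(num_workers)]
grads = [rng.normal(size=dim) for _ in range(num_workers)]

decoded = []
for w in range(num_workers):
    bits, pos, neg, residuals[w] = one_bit_quantize(grads[w], residuals[w])
    decoded.append(dequantize(bits, pos, neg))   # reconstruction on the receiving side

averaged = np.mean(decoded, axis=0)               # update each worker would apply
```

The residual stores exactly what quantization dropped, so the decoded gradient plus the residual always equals the compensated gradient. The error is delayed to a later step rather than lost.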
10. Conclusion & Future Work
• A dynamically configurable hybrid architecture to support more big data + big compute scenarios cost-effectively
  – Data exchange (Parquet adaptor)
  – Model exchange (ONNX)
  – Single environment (resource management)
  – Openness (more frameworks)
11. Thank You!
12. Test System Configuration

Node type           Number  Price        Size
Spark cluster node  10      $0.248/hour  2.4 GHz Intel Xeon® E5-2673 v3 processor; 8 cores; 28 GiB
GPU VM              2       $0.574/hour  1 NVIDIA Tesla K80 GPU; 6 cores; 56 GiB
GPU VM              2       $1.147/hour  2 NVIDIA Tesla K80 GPUs; 12 cores; 112 GiB
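The two GPU VM sizes are priced almost exactly in proportion to their hardware. Each offers 6 cores and 56 GiB per GPU at about $0.574 per GPU-hour. The sketch also checks that these rates reproduce the hourly figures on the Key Advantages slide. It is our assumption, not stated in the deck, that the $2.29/hour figure for 4 GPUs comes from two 2-GPU VMs. Four 1-GPU VMs would round to $2.30/hour instead.

```python
# Hourly prices and sizes from the test system table (USD per node-hour).
SPARK_NODE_PRICE = 0.248
SPARK_NODES = 10
GPU_VMS = {
    # name: (GPUs per VM, cores, memory in GiB, USD per hour)
    "1x K80 VM": (1, 6, 56, 0.574),
    "2x K80 VM": (2, 12, 112, 1.147),
}

for name, (gpus, cores, mem_gib, price) in GPU_VMS.items():
    print(f"{name}: ${price / gpus:.4f} per GPU-hour, "
          f"{cores // gpus} cores and {mem_gib // gpus} GiB per GPU")

spark_total = SPARK_NODES * SPARK_NODE_PRICE    # $2.48/hour, as on the Key Advantages slide
# Assumption: the 4-GPU rate comes from two 2x K80 VMs.
four_gpu_total = 2 * GPU_VMS["2x K80 VM"][3]    # $2.294/hour, shown rounded as $2.29
print(f"Spark cluster: ${spark_total:.2f}/hour, 4 GPUs: ${four_gpu_total:.2f}/hour")
```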