Seamless cloud connectivity and scalable hardware offloads
1. Accelerated Spark on Azure: Seamless and Scalable Hardware Offloads in the Cloud
Yuval Degani, Mellanox Technologies
Evan Burness, Microsoft Azure
#HWCSAIS18
2.
• End-to-end designer and supplier of interconnect solutions: network adapters, switches, system-on-a-chip, cables, silicon and software
• 10-400 Gb/s Ethernet and InfiniBand
[Diagram: Virtual Protocol Interconnect. Server / Compute, Switch / Gateway, and Storage Front / Backend, connected over 56/100/200G InfiniBand and 10/25/40/50/100/200/400GbE]
3.
• RDMA-capable network, powered by Mellanox
• H-series (Intel CPUs with FDR InfiniBand)
• NC-series (NVIDIA GPUs with FDR InfiniBand)
• Only major cloud provider with RDMA
• Run simulation and AI workloads at large scale
• Dozens of RDMA clusters around the world
4. Why are we here?
• Azure hardware-accelerated networks will soon support general-purpose RDMA (on top of SR-IOV)
• The SparkRDMA Shuffle Plugin (presented at Spark Summit Europe 2017) can now be used in the cloud, providing instant speedups for Spark jobs
5. What's RDMA?
• Remote Direct Memory Access: read/write from/to remote memory locations
• Zero-copy
• Direct hardware interface: bypasses the OS kernel and TCP/IP in the I/O path
• Flow control and reliability are offloaded in hardware
• Sub-microsecond latency
• Supported on almost all mid-range/high-end network adapters
[Diagram: a Java app buffer reaches the Network Adapter by two paths. Socket path: context switch, Sockets, TCP/IP, Driver. RDMA path: directly to the Network Adapter.]
6. RDMA on Azure
• No need to buy expensive hardware
• Lowest latency in the cloud (~2.5 µs)
• Pre-built OS images for easy deployment
• K80, P100, and V100 GPUs with InfiniBand
• Other use cases for RDMA on Azure:
7. RDMA on Azure
Azure accelerated networking is built on top of SR-IOV (Single Root Input/Output Virtualization) hardware support provided by Mellanox ConnectX network cards
8. Under the hood: Spark's Shuffle Internals
9-18. Spark's Shuffle Basics
[Diagram, built up over several slides: Input feeds the Map tasks; each Map task writes its map output to a File; the Driver sits between the Map and Reduce stages; each Reduce task fetches blocks from the map output files.]
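The map/reduce flow in the diagram can be approximated with a small in-memory simulation. This is only a sketch under stated assumptions: the names `partition_for`, `run_map_task`, `run_reduce_task`, `word_count`, and `NUM_REDUCERS` are hypothetical and are not Spark APIs, the "files" are plain Python lists rather than data on disk, and the partitioner is a simplified stand-in for a hash partitioner.

```python
from collections import defaultdict

NUM_REDUCERS = 3  # hypothetical number of reduce partitions


def partition_for(key, num_reducers=NUM_REDUCERS):
    # Deterministic stand-in for a hash partitioner (illustration only).
    return sum(key.encode("utf-8")) % num_reducers


def run_map_task(records, num_reducers=NUM_REDUCERS):
    """Return one map task's "file": one block of (key, value) pairs per reducer."""
    blocks = [[] for _ in range(num_reducers)]
    for key, value in records:
        blocks[partition_for(key, num_reducers)].append((key, value))
    return blocks


def run_reduce_task(reduce_id, driver_registry, map_files):
    """Fetch block `reduce_id` from every known map output and sum values by key."""
    totals = defaultdict(int)
    for map_id in driver_registry:  # map outputs the driver knows about
        for key, value in map_files[map_id][reduce_id]:  # "fetch blocks"
            totals[key] += value
    return dict(totals)


def word_count(input_splits, num_reducers=NUM_REDUCERS):
    map_files = {}        # map_id -> per-reducer blocks (stands in for files)
    driver_registry = []  # map outputs reported to the driver (assumed bookkeeping)
    for map_id, split in enumerate(input_splits):
        map_files[map_id] = run_map_task([(w, 1) for w in split], num_reducers)
        driver_registry.append(map_id)
    result = {}
    for reduce_id in range(num_reducers):
        # Every key lands in exactly one partition, so the results never collide.
        result.update(run_reduce_task(reduce_id, driver_registry, map_files))
    return result
```

Calling `word_count([["a", "b", "a"], ["b", "c"]])` counts each word across both input splits. Note that every reduce task touches every map output, which is why the number of fetched blocks grows with the product of map and reduce task counts.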
19-28. Spark's Shuffle Read Protocol
Participants: Driver, Reader, Writer
(1) Reader: Request Map Statuses
(2) Driver: Send back Map Statuses
(3) Reader: Group block locations by writer
(4) Reader: Request blocks from writers
(5) Writer: Locate blocks, and set up as stream
(6) Reader: Request blocks from stream, one by one
(7) Writer: Locate block, send back
(8) Reader: Block data is now ready
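The eight protocol steps can be traced with a toy model. This is a minimal sketch, not Spark's implementation: the classes `Driver` and `Writer`, the function `shuffle_read`, and the `round_trips` counter are all hypothetical names introduced for illustration, and network messages are replaced by direct method calls.

```python
from collections import defaultdict


class Driver:
    """Holds map statuses: which writer holds which shuffle block."""

    def __init__(self):
        self.map_statuses = {}  # block_id -> writer name

    def register(self, block_id, writer_name):
        self.map_statuses[block_id] = writer_name

    def get_map_statuses(self, block_ids):
        # Step 2: send back map statuses for the requested blocks.
        return {b: self.map_statuses[b] for b in block_ids}


class Writer:
    def __init__(self, name, blocks):
        self.name = name
        self.blocks = blocks  # block_id -> bytes
        self.streams = {}     # stream_id -> ordered list of block ids

    def open_stream(self, block_ids):
        # Step 5: locate the blocks and set them up as a stream.
        missing = [b for b in block_ids if b not in self.blocks]
        if missing:
            raise KeyError(f"unknown blocks: {missing}")
        stream_id = len(self.streams)
        self.streams[stream_id] = list(block_ids)
        return stream_id

    def fetch_chunk(self, stream_id, index):
        # Step 7: locate one block in the stream and send it back.
        return self.blocks[self.streams[stream_id][index]]


def shuffle_read(block_ids, driver, writers):
    """Run steps 1-8 and return (fetched blocks, number of request/response pairs)."""
    statuses = driver.get_map_statuses(block_ids)  # steps 1-2
    round_trips = 1
    by_writer = defaultdict(list)
    for block_id, writer_name in statuses.items():  # step 3
        by_writer[writer_name].append(block_id)
    fetched = {}
    for writer_name, ids in by_writer.items():
        writer = writers[writer_name]
        stream_id = writer.open_stream(ids)  # steps 4-5
        round_trips += 1
        for index, block_id in enumerate(ids):  # steps 6-7, one block at a time
            fetched[block_id] = writer.fetch_chunk(stream_id, index)
            round_trips += 1
    return fetched, round_trips  # step 8: block data is now ready
```

In this model a read of B blocks spread over W writers costs 1 + W + B request/response pairs, which gives a rough sense of why the per-block fetch in steps 6 and 7 dominates when there are many small blocks.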
29. The Cost of Shuffling
• Shuffling is very expensive in terms of CPU, RAM, disk and network I/O
• Spark users try to avoid shuffles as much as they can
• Speedy shuffles can relieve developers of such concerns, and simplify applications