Apache Arrow-Based Unified Data Sharing and Transferring Format Among CPU and Ac

CPU technologies have scaled well in past years through more complex architecture designs, wider execution pipelines, more cores per processor, and higher frequencies. However, accelerators offer more computational power and higher throughput at lower cost in their dedicated domains, which has led to growing use in Spark. But when we integrate accelerators into Spark, a common outcome is that micro-benchmarks promise huge gains while the performance boost we actually get is small. One reason is the cost of data transfer between the JVM and the accelerator. The other is that the accelerator lacks information about how it is used in Spark. In this research, we investigate using an Apache Arrow-based dataframe as the unified way to share and transfer data between the CPU and accelerators, and we make the hardware and software stack dataframe-aware by design. In this way we integrate Spark and accelerator design seamlessly and get close to the promised performance.

1.WIFI SSID:SparkAISummit | Password: UnifiedAnalytics

2.Apache Arrow* Based Unified Data Exchange • Binwei Yang, Intel • Carson Wang, Intel • #UnifiedAnalytics #SparkAISummit

3.Me • 13 years of experience in performance analysis • Software -> CPU simulator -> Spark • Joined the Intel Spark team in Aug. 2018 • A “layman” of Apache Spark

4.Pursuit of Performance Is Endless • Intel® 2nd Gen Xeon® Scalable Processors • Intel® Optane™ DC persistent memory • Intel® FPGA • Software optimization

5. Without Offload • Diagram labels: Internal Row (input), Tungsten Engine on the CPU, Internal Row (output)

6. FPGA Offload • Spark already has off-heap unsafe-row • Diagram labels: CPU side: Internal Row, FPGA Batch (each shown for input and output); FPGA side: FPGA DMA RX, FPGA Engine, FPGA DMA TX

7.Offloading Performance • Timeline diagram (Time axis) with bars labeled CPU, To-FPGA, Offload, From-FPGA, and a check mark (√)

8.Offloading Performance • Timeline diagram (Time axis) with bars labeled CPU, To-FPGA (twice), Offload (twice), From-FPGA (twice), and a check mark (√)
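The timelines on the two slides above compare a CPU-only run with a To-FPGA, Offload, From-FPGA sequence. One way to read them is as a break-even condition, sketched below. The function names and all timings are hypothetical and chosen only for illustration; they are not measurements from the talk.

```python
def offload_wins(cpu_s, to_fpga_s, offload_s, from_fpga_s):
    """True when transfer-in + kernel + transfer-out is faster than the CPU path."""
    return to_fpga_s + offload_s + from_fpga_s < cpu_s


def effective_speedup(cpu_s, to_fpga_s, offload_s, from_fpga_s):
    """End-to-end speedup of the offloaded path over the CPU path."""
    return cpu_s / (to_fpga_s + offload_s + from_fpga_s)


# Hypothetical timings in seconds: the kernel alone is 5x faster than the CPU.
cpu_s, kernel_s = 10.0, 2.0

# Cheap transfers: most of the kernel speedup survives.
print(effective_speedup(cpu_s, 1.0, kernel_s, 1.0))  # 2.5

# Expensive transfers: the 5x kernel speedup disappears entirely.
print(effective_speedup(cpu_s, 4.0, kernel_s, 4.0))  # 1.0
print(offload_wins(cpu_s, 4.0, kernel_s, 4.0))       # False
```

This is the gap the abstract describes between micro-benchmark promises and the boost seen inside Spark: the kernel time alone says little until the two transfer terms are included.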

9. Overhead of Offload • Diagram labels: CPU side: Internal Row, Convert, FPGA Batch (for input and output); Data Move between CPU and FPGA; FPGA side: FPGA DMA RX, FPGA Engine, FPGA DMA TX
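The "Convert" step above can be pictured as a row-to-column pivot that touches every value before any data moves to the device. The sketch below uses only the Python standard library; `rows_to_columns` and the sample rows are hypothetical, and the real conversion in Spark operates on internal rows rather than Python tuples.

```python
from array import array


def rows_to_columns(rows, typecodes):
    """Pivot row tuples into one contiguous typed buffer per column."""
    columns = [array(tc) for tc in typecodes]
    for row in rows:                       # every row is visited ...
        for col, value in zip(columns, row):
            col.append(value)              # ... and every value is copied
    return columns


# Hypothetical rows: (int64 id, float64 amount)
rows = [(1, 1.5), (2, 2.5), (3, 3.5)]
ids, amounts = rows_to_columns(rows, "qd")

print(ids.tolist())            # [1, 2, 3]
print(amounts.tolist())        # [1.5, 2.5, 3.5]
print(len(amounts.tobytes()))  # 24 bytes, contiguous and ready to hand to a copy
```

The copy work grows with rows times columns, which is why keeping data in one columnar format end to end removes this step rather than merely speeding it up.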

10. Optimize – Unified Format • Diagram labels: CPU side: Unified Format, FPGA Batch (for input and output); FPGA side: FPGA DMA RX, FPGA Engine, FPGA DMA TX • With a unified format, the FPGA can be debugged easily • The FPGA library can be shared with all other projects

11. Optimize – Double Buffer • Diagram labels: CPU side: Unified Format, FPGA Batch (for input and output); FPGA side: FPGA DMA RX1 and FPGA DMA RX2 (each shown twice), FPGA Engine

12. Optimize – Double Buffer • Columnar data format is friendly to most accelerators • Diagram (Time axis) labels: Col1, Col2, Col3, Col… and Eng 1, Eng 2, Eng 3, Eng …
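Double buffering, as named on the two slides above, lets the transfer of the next batch overlap with processing of the current one. A minimal sketch with two staging buffers and a transfer thread follows; `run_double_buffered` is a hypothetical name, the byte copy stands in for a DMA transfer, and `process` stands in for the FPGA engine.

```python
import queue
import threading


def run_double_buffered(batches, process):
    """Overlap 'transfer' and 'process' using exactly two reusable buffers."""
    free = queue.Queue()     # buffers ready to be filled
    filled = queue.Queue()   # buffers ready to be processed
    for _ in range(2):
        free.put(bytearray())

    def transfer():
        for batch in batches:
            buf = free.get()   # blocks while both buffers are still in use
            buf[:] = batch     # stand-in for a DMA copy to the device
            filled.put(buf)
        filled.put(None)       # end-of-stream marker

    worker = threading.Thread(target=transfer)
    worker.start()

    results = []
    while True:
        buf = filled.get()
        if buf is None:
            break
        results.append(process(bytes(buf)))  # stand-in for the engine
        free.put(buf)                        # hand the buffer back for refill
    worker.join()
    return results


print(run_double_buffered([b"ab", b"cde", b"f"], len))  # [2, 3, 1]
```

With a single buffer the transfer thread would stall until each batch had been processed; the second buffer is what allows the two stages to run at the same time.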

13.Do We Fully Utilize CPU? • df.agg(F.sum('a_float')).show() • perf stat -e fp_arith_inst_retired.128b_packed_single -A -a sleep 1 • CPU0 0 fp_arith_inst_retired.128b_packed_single • CPU1 0 fp_arith_inst_retired.128b_packed_single • CPU2 0 fp_arith_inst_retired.128b_packed_single • …

14.Add AVX Support • We need – A columnar data format – Native LLVM SQL Engine • Make use of other highly optimized libraries

15.Recap • A standard columnar data format – Easy to debug – Shared by all projects • Implement a series of Tungsten backends

16. Apache Arrow* Is the Answer • Apache Arrow* is the best choice • A standard data frame format – For Native Tungsten backend – For all accelerators offloading Spark SQL engine • *Other names and brands may be claimed as the property of others.
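To make "a standard data frame format" concrete: an Arrow fixed-width column is stored as a validity bitmap (one bit per slot, least-significant bit first, 1 meaning non-null) plus a contiguous values buffer. The sketch below imitates that layout with the Python standard library only. It is a simplification: it omits the buffer padding and alignment the Arrow specification calls for, and `build_float_column` and `is_valid` are hypothetical helper names, not Arrow APIs.

```python
from array import array


def build_float_column(values):
    """Pack values (None = null) into a validity bitmap and a float32 buffer."""
    validity = bytearray((len(values) + 7) // 8)
    data = array("f")
    for i, v in enumerate(values):
        if v is None:
            data.append(0.0)                  # null slots still occupy space
        else:
            validity[i // 8] |= 1 << (i % 8)  # set bit i, LSB-first
            data.append(v)
    return bytes(validity), data


def is_valid(validity, i):
    """Read bit i of the validity bitmap."""
    return bool((validity[i // 8] >> (i % 8)) & 1)


validity, data = build_float_column([1.0, None, 3.0])
print(validity)               # b'\x05' (bits 0 and 2 set)
print(data.tolist())          # [1.0, 0.0, 3.0]
print(is_valid(validity, 1))  # False
```

Because both buffers are plain contiguous memory with a fixed element width, a JVM, a native engine, or a device can all read them at computed offsets without a per-consumer conversion.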

17.Plug and Play Backend • Diagram labels: Data Frame, Physical Plan (op1, op2, op3, op4, Python UDF), Tungsten Backend; backends: JVM, LLVM/AVX, ACC1, ACC2, Intel Python, Python; Off-Heap
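One way to picture the plug-and-play idea is a per-operator lookup: each operator in the physical plan goes to the first registered backend that supports it and otherwise stays on the JVM. The sketch below is illustrative only. The `Backend` class, `assign_backends`, the operator names, and the capability sets are all hypothetical; the slide does not say which backend handles which operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    supported_ops: frozenset


def assign_backends(plan, backends, fallback="JVM"):
    """Map each operator to the first backend supporting it, else the fallback."""
    assignment = []
    for op in plan:
        chosen = next((b.name for b in backends if op in b.supported_ops), fallback)
        assignment.append((op, chosen))
    return assignment


# Hypothetical capabilities, listed in priority order.
backends = [
    Backend("ACC1", frozenset({"filter"})),
    Backend("LLVM/AVX", frozenset({"filter", "project", "aggregate"})),
]
plan = ["scan", "filter", "project", "aggregate"]

for op, backend in assign_backends(plan, backends):
    print(op, "->", backend)
# scan -> JVM
# filter -> ACC1
# project -> LLVM/AVX
# aggregate -> LLVM/AVX
```

Switching backends between adjacent operators is only cheap when they exchange batches in one shared off-heap format, which is the role the deck assigns to Arrow.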

18.Make Use of Intel Optane DC Persistent Memory • Diagram labels: Data Frame, Physical Plan (op1, op2, op3, op4), Tungsten Backend; backends: JVM, LLVM/AVX, ACC, ACC; Off-Heap

19.Make Use of Intel Optane DC Persistent Memory • Diagram labels: Data Frame, Physical Plan (op2, op3, op4), Tungsten Backend; backends: LLVM/AVX, ACC, ACC; Off-Heap; Shuffle; Input

20.JSON, CSV, Unzip Offload • Diagram labels: Data Frame, Physical Plan (op2, op3, op4), Tungsten Backend; backends: LLVM/AVX, ACC1, ACC2; offloaded stages: JSON, CSV, Unzip; Off-Heap

21.Filter, Project Pushdown • Diagram labels: Data Frame, Physical Plan (op2, op3, op4), Tungsten Backend; backends: LLVM/AVX, ACC1, ACC2; pushed-down stages: Filter, Project; Off-Heap
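Filter and project pushdown on columnar data can be sketched as follows: evaluate the predicate on a single column, then materialize only the requested columns for the surviving rows. The `scan` function, the column names, and the data are hypothetical, and plain Python lists stand in for columnar buffers.

```python
def scan(columns, project, predicate_col, predicate):
    """Apply a filter on one column, then return only the projected columns."""
    # Filter: only the predicate column is read to decide which rows survive.
    keep = [i for i, v in enumerate(columns[predicate_col]) if predicate(v)]
    # Project: columns that were not requested are never touched.
    return {name: [columns[name][i] for i in keep] for name in project}


# Hypothetical columnar batch.
batch = {
    "id": [1, 2, 3, 4],
    "price": [5.0, 20.0, 7.5, 30.0],
    "note": ["a", "b", "c", "d"],
}

print(scan(batch, ["id"], "price", lambda p: p > 10))  # {'id': [2, 4]}
```

Running this step close to the data source, or on an accelerator, means downstream operators receive fewer rows and fewer columns to move.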

22.Connect to Other ML/AI Frameworks • The proposal in JIRA 24579 • No extra data format conversion

23.Call to Action • Share your comments on JIRA 27396 created by Robert • Follow our work on https://github.com/Intel-bigdata • Let’s bring Spark’s performance to a higher level
