- 微博 QQ QQ空间 贴吧
Apache Arrow-Based Unified Data Sharing and Transferring Format Among CPU and Ac
收藏 6下载 8
CPU technologies have scaled well in past years, by more complex architecture design, more wide execution pipelines, more cores in same processor, and higher frequency. However accelerators show more computational power and higher throughput with lower cost in dedicated area, which leads to more usages in Spark. But when we integrate accelerators in Spark a common case is huge performance promises through micro test with little performance boost actually we get. One reason is the cost of data transfer between JVM and accelerator. The other reason is the accelerator lack the information how it’s used in Spark. In this research, we investigate the usage of apache arrow based dataframe as the unified data sharing and transferring way between CPU and accelerators, and make it dataframe aware when we design hardware and software stack. In this way we seamlessly integrate Spark and Accelerators design and get close to promised performance.