Accelerating Machine Learning Workloads and Apache Spark Applications via CUDA
Data science workflows can benefit tremendously from acceleration, enabling data scientists to explore more and larger datasets and to drive toward their business goals faster and more reliably. Accelerating Apache Spark with GPUs is the next step for data science. In this talk, we will share our work in accelerating Spark applications via CUDA and NCCL. We have identified several bottlenecks in Spark 2.4 in the areas of data serialization and data scalability. To address these, we accelerated Spark-based data analytics with enhancements that allow large columnar datasets to be analyzed directly in CUDA with Python. The GPU dataframe library cuDF (github.com/rapidsai/cudf) can be used to express advanced analytics easily. By applying Apache Arrow and cuDF, we have achieved over a 20x speedup over regular RDDs.

For distributed machine learning, Spark 2.4 introduced a barrier execution mode to support MPI allreduce-style algorithms. We will demonstrate how the latest NVIDIA NCCL library, NCCL2, can further scale out distributed learning algorithms such as XGBoost.

Finally, we will introduce an enhancement to the Spark Kubernetes scheduler so that GPU resources can be scheduled from a Kubernetes cluster for Spark applications. We will share our experience deploying Spark on NVIDIA Tesla T4 server clusters. Based on the new NVIDIA Turing architecture, the T4 is an energy-efficient, 70-watt, small-PCIe-form-factor GPU optimized for scale-out computing environments, featuring multi-precision Turing Tensor Cores and new RT Cores.
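To give a flavor of the cuDF workflow mentioned above: cuDF deliberately mirrors the pandas API, so the same columnar analytics code can run on the GPU when cuDF is available and fall back to pandas on the CPU otherwise. The dataset and column names below are illustrative, not from the talk:

```python
# Hedged sketch: cuDF (github.com/rapidsai/cudf) mirrors the pandas API,
# so this groupby aggregation runs on the GPU via cuDF when available,
# and on the CPU via pandas otherwise. Column names are made up.
try:
    import cudf as xdf  # GPU dataframe library
except ImportError:
    import pandas as xdf  # CPU fallback with the same API surface

# A small columnar dataset; in practice this would be a large
# Arrow-backed columnar batch handed off from Spark.
df = xdf.DataFrame({
    "sensor": ["a", "a", "b", "b"],
    "reading": [1.0, 3.0, 2.0, 6.0],
})

# Columnar groupby aggregation -- the style of analytics the talk
# accelerates with Apache Arrow and cuDF.
means = df.groupby("sensor").agg({"reading": "mean"})
print(means)
```

Because the two libraries share an API, the same script serves as both a CPU baseline and a GPU-accelerated version, which is one reason cuDF makes this migration path attractive.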
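The barrier execution mode exists so that all tasks in a stage launch together and can exchange data in the MPI allreduce style; NCCL2 then performs that allreduce efficiently across GPUs. As a purely conceptual CPU sketch (not the NCCL or Spark API), the semantics of an allreduce sum are: every worker contributes a gradient vector, and every worker receives the elementwise sum of all contributions:

```python
# Conceptual sketch of allreduce-sum semantics, the collective that
# NCCL2 accelerates across GPUs for algorithms such as XGBoost.
# This simulates the result on the CPU; it is not the NCCL API.

def allreduce_sum(per_rank_vectors):
    """Elementwise-sum every rank's vector; every rank gets the result."""
    total = [sum(vals) for vals in zip(*per_rank_vectors)]
    return [list(total) for _ in per_rank_vectors]

# Hypothetical per-worker gradient vectors (3 ranks, 2 elements each).
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reduced = allreduce_sum(grads)
print(reduced[0])  # every rank now holds [9.0, 12.0]
```

Spark's barrier mode guarantees the gang scheduling such a collective needs: if any participating task cannot start, none do, avoiding the deadlock a partial launch would cause.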