Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data

Spark Open Source Community

Project Hydrogen is a major Apache Spark initiative to bring state-of-the-art AI and Big Data solutions together. It consists of three major projects: 1) barrier execution mode, 2) optimized data exchange, and 3) accelerator-aware scheduling. A basic implementation of barrier execution mode was merged into Apache Spark 2.4.0, and the community is working on the latter two. In this talk, we will present progress updates on Project Hydrogen and discuss the next steps. First, we will review the barrier execution mode implementation in Spark 2.4.0. It lets developers embed distributed training jobs properly on a Spark cluster by launching all tasks of a stage together and restarting all of them if any one fails. We will demonstrate distributed AI integrations built on top of it, e.g., Horovod and Distributed TensorFlow. We will also discuss the technical challenges of implementing those integrations, as well as future work. Second, we will outline ongoing work on optimized data exchange, whose target scenario is distributed model inference. We will present how we do performance testing and profiling, where the bottlenecks are, and how to improve overall throughput on Spark. If time allows, we might also give updates on accelerator-aware scheduling.
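Barrier execution mode changes two scheduling rules that distributed training depends on: all tasks in a barrier stage are launched together, and if any task fails, the whole stage is retried rather than just that task. In Spark 2.4 this is exposed through `RDD.barrier()` and `BarrierTaskContext` (with its `barrier()` and `getTaskInfos()` methods). The sketch below does not use Spark at all. It is a minimal pure-Python simulation, assuming threads stand in for tasks and `threading.Barrier` stands in for `BarrierTaskContext.barrier()`. The names `run_barrier_stage`, `train_task`, and the `host-N:500N` addresses are hypothetical and exist only for illustration.

```python
import threading
from typing import Callable, List

# Hypothetical task signature for this simulation: (rank, barrier, addresses, attempt) -> result.
TaskFn = Callable[[int, threading.Barrier, List[str], int], str]


def run_barrier_stage(num_tasks: int, task_fn: TaskFn, max_attempts: int = 3) -> List[str]:
    """Simulate a barrier stage: launch all tasks together, retry all of them on any failure."""
    for attempt in range(max_attempts):
        # Fresh barrier per attempt; every task must reach it before any task proceeds.
        barrier = threading.Barrier(num_tasks, timeout=30)
        # Hypothetical peer addresses, standing in for BarrierTaskContext.getTaskInfos().
        addresses = [f"host-{i}:{5000 + i}" for i in range(num_tasks)]
        results: List[str] = [""] * num_tasks
        errors: List[BaseException] = []
        lock = threading.Lock()

        def worker(rank: int) -> None:
            try:
                results[rank] = task_fn(rank, barrier, addresses, attempt)
            except Exception as exc:
                # Break the barrier so peers blocked in wait() fail fast instead of hanging.
                barrier.abort()
                with lock:
                    errors.append(exc)

        # Gang scheduling: start every task of the stage at the same time.
        threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_tasks)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        if not errors:
            return results
        # All-or-nothing: any failure discards this attempt and relaunches every task.
    raise RuntimeError(f"barrier stage failed after {max_attempts} attempts")


def train_task(rank: int, barrier: threading.Barrier, addresses: List[str], attempt: int) -> str:
    """Toy stand-in for one worker of a distributed training job (hypothetical)."""
    if attempt == 0 and rank == 2:
        raise RuntimeError("simulated worker crash")  # injected failure on the first attempt
    barrier.wait()  # analogous to BarrierTaskContext.barrier(): all ranks sync here
    coordinator = addresses[0]  # e.g. rank 0 could act as the MPI root or coordinator
    return f"rank {rank} joined job at {coordinator} on attempt {attempt}"


if __name__ == "__main__":
    for line in run_barrier_stage(4, train_task):
        print(line)
```

Running it prints one line per rank, all reporting attempt 1, because the injected crash on rank 2 discards the entire first attempt, including the ranks that were already waiting at the barrier. Retrying the whole stage instead of one task is the key design choice: MPI-style training jobs such as Horovod generally cannot recover if a single worker is restarted on its own while its peers keep running.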
0点赞
0收藏
4下载
确认
3秒后跳转登录页面
去登陆