Updates from Project Hydrogen - Unifying State-of-the-Art AI and Big Data in Apache Spark

Spark开源社区 (Spark open-source community)

Project Hydrogen is a major Apache Spark initiative to bring state-of-the-art AI and Big Data solutions together.

It comprises three major projects:
1) barrier execution mode,
2) optimized data exchange, and
3) accelerator-aware scheduling.

A basic implementation of barrier execution mode was merged into Apache Spark 2.4.0, and the community is working on the latter two. In this talk, we will present progress updates on Project Hydrogen and discuss the next steps.

First, we will review the barrier execution mode implementation from Spark 2.4.0. It enables developers to embed distributed training jobs properly on a Spark cluster. We will demonstrate distributed AI integrations built on top of it, e.g., Horovod and Distributed TensorFlow. We will also discuss the technical challenges of implementing those integrations and future work.
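
As a rough illustration of what barrier execution mode looks like from PySpark, here is a minimal sketch, not the integration code presented in the talk. It relies on `RDD.barrier()` and `BarrierTaskContext`, which ship with Spark 2.4.0. The helper `peer_hosts`, the functions `train_partition` and `run`, and the `num_workers` parameter are hypothetical names introduced only for this example, and the point where a real integration (such as Horovod or Distributed TensorFlow) would start its training worker is left as a placeholder.

```python
def peer_hosts(addresses):
    """Strip ports from 'host:port' task addresses, keeping task order.

    Hypothetical helper for this sketch, not part of the Spark API.
    """
    return [addr.rsplit(":", 1)[0] for addr in addresses]


def train_partition(iterator):
    # Imported inside the function so the helper above stays usable
    # on machines where PySpark is not installed.
    from pyspark import BarrierTaskContext

    ctx = BarrierTaskContext.get()
    # Every task in a barrier stage can see the addresses of all its peers.
    hosts = peer_hosts([info.address for info in ctx.getTaskInfos()])
    rank = ctx.partitionId()

    # Global synchronization point: blocks until every task in the
    # stage has reached this call.
    ctx.barrier()

    # Placeholder (assumption): a real integration would launch its
    # distributed training worker here, using rank and hosts.
    yield (rank, hosts)


def run(sc, num_workers=4):
    """Launch num_workers tasks together as a single barrier stage."""
    rdd = sc.parallelize(range(num_workers), num_workers)
    return rdd.barrier().mapPartitions(train_partition).collect()
```

Calling `run(sc)` with an existing `SparkContext` should return one `(rank, hosts)` pair per task. A barrier stage launches all of its tasks at the same time, so the cluster needs at least `num_workers` free task slots (for a local test, a master such as `local[4]`).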

Second, we will give updates on accelerator-aware scheduling and how it will help accelerate your Spark training jobs. We will also outline ongoing work on optimized data exchange.
