1. The History and Outlook of Ray - Zhe Zhang


Published by 白玉兰开源 (Baiyulan Open Source) 3 years ago. Tag: #Information Technology


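This talk covers three points: 1) what Ray is (a brief history of the project, plus its basic architecture and API); 2) why we believe Ray is the compute architecture for next-generation cloud computing (flexible and powerful distributed development, an active ecosystem, and support for the serverless model); 3) what we will do next (including an introduction to the newly developed Dataset and Workflow modules).

Zhe Zhang currently leads the open source engineering team at Anyscale. Previously, at LinkedIn, he was responsible for the big data and AI compute team (providing Hadoop/Spark/TensorFlow services). Since 2014 his work has been closely tied to open source: he is an Apache Hadoop committer and PMC member, and a member of the Apache Software Foundation.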


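1. Ray: A Cloud-Native, Next-Generation Computing Infrastructure. Zhe Zhang / Anyscale, Head of Ray Engineering. zhz@anyscale.com. 07.31.2021

2. Agenda: What is Ray? Why is Ray the next-generation computing infrastructure? What are we working on next?

3. What is Ray? 1. Project history 2. API and architecture

4. What is Ray? 1. Project history 2. API and architecture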

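5. History of Ray (in a nutshell). Starting point: real-time / reinforcement learning (which demands a highly flexible system) - dynamic computation graphs - millisecond-level distributed scheduling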

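6. Timeline, 2017 to 2021: Second Ray Summit + Ray 1.4; First Ray Summit + Ray 1.0; Ownership: a distributed futures system; Lineage Stash: fault tolerance off the critical path; Ray: a distributed framework for next-generation AI applications; RLlib: abstractions for distributed reinforcement learning; Real-Time Machine Learning: The Missing Pieces

7. Timeline, 2017 to 2021 (same milestones, now labeled "general-purpose distributed computing"): Second Ray Summit + Ray 1.4; First Ray Summit + Ray 1.0; Ownership: a distributed futures system; Lineage Stash: fault tolerance off the critical path; Ray: a distributed framework for next-generation AI applications; RLlib: abstractions for distributed reinforcement learning; Real-Time Machine Learning: The Missing Pieces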

8. What is Ray? 1. Project history 2. API and architecture

9. What is Ray? API overview: Function -> Task; Class -> Actor; Object -> (Distributed) Object

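10. What is Ray? Hello World (serial version)

    # With @ray.remote applied, f(i) must become f.remote(i) (slide 11),
    # so the Ray lines are commented out in this serial version.
    # import ray
    # ray.init()

    # @ray.remote
    def f(x):
        return x * x

    results = [f(i) for i in range(4)]
    print(results)  # [0, 1, 4, 9]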

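11. What is Ray? Hello World (Ray version). Diagram: Driver / main() fans out to four f() tasks.

    import ray

    ray.init()

    @ray.remote
    def f(x):
        return x * x

    # f.remote() returns immediately with an object reference;
    # ray.get() blocks until the results are available.
    results = [f.remote(i) for i in range(4)]
    print(ray.get(results))  # [0, 1, 4, 9]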

12. What is Ray? Hello World (actors). Diagram: one Supervisor managing three Worker() actors.

    import ray

    @ray.remote(num_cpus=1)
    class Worker:
        def __init__(self):
            self.value = 0

        def work(self):
            self.value += 1
            return "done"

    @ray.remote(num_cpus=1)
    class Supervisor:
        def __init__(self):
            # Actors can create and own other actors.
            self.workers = [Worker.remote() for _ in range(3)]

        def work(self):
            return ray.get([w.work.remote() for w in self.workers])

    ray.init()
    sup = Supervisor.remote()
    print(ray.get(sup.work.remote()))  # outputs ['done', 'done', 'done']

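13. What is Ray? Architecture overview. Diagram annotations: analogous to the Spark Driver; analogous to the YARN NM daemon, or a Spark Executor; analogous to the YARN RM; modules/features unique to Ray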

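14. What is Ray? How it works

15. Agenda: What is Ray? Why is Ray the next-generation computing infrastructure? What are we working on next?

16. Why Use Ray for ML? 💡 Makes distributed application development simple 💡 Flexible and powerful APIs/abstractions 💡 A rich data/machine learning ecosystem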

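17. Existing options, on an axis from "simple development" to "flexible/powerful". Serverless: ● tied to a specific cloud vendor ● stateless compute ● no GPU/TPU support ● run-time limits. Assembling different frameworks: ● high learning cost ● complex deployment and maintenance ● infrastructure "silos". Container-level abstraction/development: ● complex code ● requires an ops team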

18. Ray ecosystem: ML and Data. Categories: data processing; model training; serving; hyperparameter search; business logic + simulation; others + native. Foundation: a universal framework for distributed computing

19. Case Study: XGBoost on Ray. Every training worker naturally maps to a Ray Actor: - Automatic elasticity - Native GPU support. First-class tuning support: - Each training is a distributed job - Nested parallelism with Ray

20. Case Study: AutoML (Ludwig)

21. Case Study: AutoML (Ludwig)

22. Get hands-on! Ray 101: - pip install ray - python hello.py (on 💻). Ray 201: - Join Ray Slack - Try a Ray cluster - Send questions to discuss.ray.io

23. Agenda: What is Ray? Why is Ray the next-generation computing infrastructure? What are we working on next?

24. Ray's Roadmap. Core: reliable and stable at large scale. Libraries: easy-to-use, high-level libraries for production workloads. Deployment: simple and clear paths for deploying clusters and code

25. Core: Reliable and Stable at Large Scale - Object lifetime management (WiP) - PB-level data processing (WiP) - Pluggable GCS backend (WiP) - 1000+ nodes (WiP)

26. Libraries: Easy / Production. 2.0. Function -> Task -> Workflow (convenience, durability); Class -> Actor -> Serve (HA, convenience, performance); Object -> Distributed Object -> Dataset (convenience, interoperability)

27. Deployment: Simple and Clear

    import ray

    ray.client().env({"pip": "./requirements.txt"}).connect()
    ...

    Other env options: working_dir (Path); conda (dict | str); Docker (WiP)
