Fluid-Alluxio Day in China

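Fluid: An Efficient Support Platform for Data-Intensive Applications in Cloud-Native Environments
Rong Gu
Ph.D., associate research fellow in the Department of Computer Science at Nanjing University. His research focuses on big data processing systems. He has published more than 30 papers in leading journals and conferences including TPDS, ICDE, JPDC, IPDPS, and ICPP, and has led multiple projects, including National Natural Science Foundation of China General Program and Young Scientists Fund projects, a China Postdoctoral Science Foundation Special Funding project, and enterprise innovation research fund projects. His research results have been applied at Alibaba, Baidu, ByteDance, Sinopec, and Huatai Securities, and in the open source projects Apache Spark and Alluxio. He received the 2018 Jiangsu Province Science and Technology First Prize and the 2019 Jiangsu Computer Society Young Scientist Award. He serves as a member of the China Computer Federation (CCF) Technical Committee on System Software, a corresponding member of the CCF Technical Committee on Big Data, secretary-general of the Big Data Technical Committee of the Jiangsu Computer Society, co-founder of the Fluid open source project, and PMC member of the Alluxio open source project.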

1 . Fluid: An Efficient Support Platform for Data-Intensive Applications in Cloud-Native Environments
Rong Gu (Fluid co-founder, Alluxio PMC Member), PASALab, Nanjing University · gurong@nju.edu.cn

2 . Outline
1 Project background
2 Fluid core ideas
3 Fluid architecture and features
4 Fluid system demo

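3 . Technology background
Over the past decade, cloud computing, big data, and artificial intelligence have all developed rapidly.
• Cloud computing (platforms): Docker, Kubernetes
• Big data (processing): Hadoop, Spark, Alluxio
• Artificial intelligence (frameworks): TensorFlow, PyTorch, Caffe
1. Big data and AI applications:
• Target large-scale data computation and analysis
• Are typical data-intensive applications
2. Cloud computing platforms:
• Low computing cost and easy scaling
• Efficient containerized deployment
The convergence of the three is becoming the next major trend:
• Gartner predicts that by 2023, 70% of AI workloads will run in application containers or be built with a serverless programming model*
• Spark 3.0.1 supports the Kubernetes scheduler, embracing the cloud-native environment*
* Sources:
Gartner report: https://www.gartner.com/en/conferences/emea/data-analytics-switzerland/featured-topics/topic-ai-machine-learning
Spark 3.0.1 runs on k8s: https://spark.apache.org/docs/latest/running-on-kubernetes.html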

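4 . The problem
Existing cloud-native orchestration frameworks do not support data-intensive applications well: runtime efficiency is low and data management is complex.
ResNet50 model training speed (images/second):
• ESSD cloud disk (Cloud Storage, PL2): 3189.6
• Synthetic (local memory): 9993.6
Complex management:
• Data versions keep changing
• Data interfaces are diverse
• Data types are abstract
• Data storage is heterogeneous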

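5 . Root cause analysis
The design philosophies of cloud-native environments and data-intensive processing frameworks diverge at a fundamental level.
• Architecture: compute-storage separation vs. data locality. Separating compute from storage is the prevailing architecture in cloud-native environments, whereas big data and AI frameworks are designed with data locality much more in mind.
• Service model: stateless services vs. stateful computation. Cloud-native applications are deployed as stateless microservices and chained together in a FaaS style, whereas data-intensive frameworks are centered on data abstractions and assign and execute tasks around them.
The CNCF cloud-native landscape is missing an important piece of the puzzle: a component for efficient data support.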

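6 . Data support challenges in cloud-native environments
01 The compute-storage separation architecture of cloud platforms leads to high data access latency
02 Data security governance and multi-dimensional management in cloud environments are complex
03 Joint analysis across storage systems in hybrid cloud scenarios is difficult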

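7 . The missing abstraction in the Kubernetes ecosystem
01 Existing Kubernetes abstractions:
• Compute is abstracted as Pod
• Storage is abstracted as PVC
• Networking is abstracted as Service
02 Other cloud-native storage abstractions:
• Rook: lifecycle management for Ceph
• ChubaoFS: persistent data storage, providing both object and file storage
What is missing is an application-centric data abstraction together with its lifecycle management.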

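8 . An analogy: the evolution of shopping
Goods, supermarket, and customer are analogous to data, storage, and application:
• Goods <--> Data (function: to be consumed)
• Supermarket <--> Storage (function: storing and supplying)
• Customer <--> Application (function: consuming)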

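9 . An analogy: the evolution of shopping
Online shopping model: customer-centric. Goods are stored in warehouses, customers pick goods online, and modern logistics delivers the goods to the customer. The result is efficient and convenient, with larger transaction volume.
Warehouse stores goods -> goods are selected online -> logistics actively delivers the goods to the customer
In modern cloud architectures, data is stored in cloud storage systems and applications access it on demand. Because there is no counterpart to the "logistics system", data-intensive applications consume and access data inefficiently.
Data stored in cloud storage systems -> ??? (efficient data delivery is missing) -> data-intensive applications in the cloud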

10 . Fluid acts as the data logistics system for cloud native
• Hadoop: tightly coupled (Data Fetch)
• Alluxio: statically separated (Data Access)
• Fluid: dynamic and elastic (Data Delivery)

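11 . Fluid acts as the data logistics system for cloud native
01 A shift in perspective: examine data abstraction and data access support in cloud-native scenarios from the combined viewpoint of cloud-native resource scheduling and data-intensive processing
02 A shift in approach: because container orchestration lacks data awareness and data orchestration lacks architecture awareness, propose collaborative orchestration, multi-dimensional management, and intelligent awareness as new methods
03 A shift in philosophy: let data be accessed, transformed, and managed flexibly and efficiently, like a fluid, between the resource orchestration layer and the compute processing layer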

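12 . Fluid core ideas
01 Native support for a dataset abstraction on cloud platforms: turn the basic support capabilities that data-intensive applications need into features, achieving efficient data access while reducing multi-dimensional costs
02 Dataset orchestration based on container scheduling and management: the dataset cache engine works together with Kubernetes container scheduling and scaling capabilities to make datasets migratable
03 Application scheduling for data locality in the cloud: the Kubernetes scheduler interacts with the cache engine to obtain each node's data cache information, and transparently schedules applications that use the data onto nodes holding the cached data, maximizing the benefit of cache locality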

13 . Fluid system architecture

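14 . Fluid features and concepts
Fluid does not accelerate and manage storage as a whole; it accelerates and manages the datasets that applications use.
• Concepts
01 Dataset: a collection of logically related data with consistent file characteristics that is used by the same compute engine
02 Runtime: the interface of the execution engine that implements capabilities such as dataset security, version management, and data acceleration; it defines a set of lifecycle methods
03 AlluxioRuntime: comes from the Alluxio community; an efficient implementation of the execution engine that supports Dataset data management and caching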
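
The Runtime concept on this slide, an interface with lifecycle methods that concrete engines implement, can be illustrated with a minimal Python sketch. The method names (`setup`, `cached_bytes`, `shutdown`) and the `InMemoryRuntime` class are hypothetical and chosen only for illustration; the slides do not show Fluid's actual interface, and this is not it.

```python
from abc import ABC, abstractmethod


class Runtime(ABC):
    """Hypothetical lifecycle interface for a dataset execution engine."""

    @abstractmethod
    def setup(self, dataset_name: str) -> None:
        """Bind the engine to a dataset and make it ready to serve data."""

    @abstractmethod
    def cached_bytes(self) -> int:
        """Report how many bytes of the dataset are currently cached."""

    @abstractmethod
    def shutdown(self) -> None:
        """Release the cache and stop serving the dataset."""


class InMemoryRuntime(Runtime):
    """Toy engine that only tracks state; it stands in for a real cache engine."""

    def __init__(self) -> None:
        self.dataset = None
        self.phase = "NotReady"
        self._cache = {}  # file path -> size in bytes

    def setup(self, dataset_name: str) -> None:
        self.dataset = dataset_name
        self.phase = "Ready"

    def cache_file(self, path: str, size: int) -> None:
        # Caching is only allowed once the engine has been set up.
        if self.phase != "Ready":
            raise RuntimeError("runtime is not ready")
        self._cache[path] = size

    def cached_bytes(self) -> int:
        return sum(self._cache.values())

    def shutdown(self) -> None:
        self._cache.clear()
        self.phase = "NotReady"
```

Keeping the lifecycle in an abstract base class means a scheduler or controller can work against `Runtime` without knowing which engine sits behind it, which is the role the slide assigns to AlluxioRuntime as one concrete implementation.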

15 . Fluid features and concepts
Fluid does not accelerate and manage storage as a whole; it accelerates and manages the datasets that applications use.
• Key Features
01 Acceleration:
• Portable and Scalable: increase the cache capacity on demand
• Observation: know the cache capacity easily
• Co-locality: bring the data close to compute, and bring the compute close to data
02 Data volume interface, unified access to different storage: Miniature Data Lake. Data from different storage systems can be consumed together
03 Isolation: access control at the Dataset level for the data scientist

16 . Fluid features and concepts
How to Use Fluid
1. Create Dataset
    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: imagenet
    spec:
      mounts:
      - mountPoint: oss://imagenet/train
        name: train
        options:
          fs.oss.accessKeyId: xxx
          fs.oss.accessKeySecret: yyy
          fs.oss.endpoint: oss-cn-huhehaote.aliyuncs.com
      - mountPoint: pvc://ceph-pvc
        name: validation
2. Provision PV/PVC: unified data access, exposed as a data volume
3. Create Pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: resnet50
    spec:
      containers:
      - name: train
        image: resnet50
        volumeMounts:
        - mountPath: /data
          name: imagenet
      volumes:
      - name: imagenet
        persistentVolumeClaim:
          claimName: imagenet
Diagram: the Pod mounts the ImageNet dataset at /data, which exposes train (OSS) and validation (PVC).
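
The two manifests on this slide can also be generated programmatically. The sketch below builds them as plain Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. The helper names (`dataset_manifest`, `pod_manifest`, `undeclared_volume_mounts`) are hypothetical; the field values come from the slide, except that the Pod uses the core `v1` API version, which is where Kubernetes defines the Pod kind.

```python
import json


def dataset_manifest(name, mounts):
    """Build a Dataset manifest.

    mounts: list of (mount_point, mount_name, options_or_None) tuples.
    """
    mount_specs = []
    for mount_point, mount_name, options in mounts:
        spec = {"mountPoint": mount_point, "name": mount_name}
        if options:
            spec["options"] = dict(options)
        mount_specs.append(spec)
    return {
        "apiVersion": "data.fluid.io/v1alpha1",
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {"mounts": mount_specs},
    }


def pod_manifest(pod_name, image, dataset_name, mount_path="/data"):
    """Build a Pod that mounts the PVC named after the dataset."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [{
                "name": "train",
                "image": image,
                "volumeMounts": [{"mountPath": mount_path, "name": dataset_name}],
            }],
            "volumes": [{
                "name": dataset_name,
                "persistentVolumeClaim": {"claimName": dataset_name},
            }],
        },
    }


def undeclared_volume_mounts(pod):
    """Return volumeMount names that have no matching entry under volumes."""
    declared = {v["name"] for v in pod["spec"].get("volumes", [])}
    return [
        m["name"]
        for c in pod["spec"]["containers"]
        for m in c.get("volumeMounts", [])
        if m["name"] not in declared
    ]


if __name__ == "__main__":
    dataset = dataset_manifest("imagenet", [
        ("oss://imagenet/train", "train", {
            "fs.oss.accessKeyId": "xxx",       # placeholder from the slide
            "fs.oss.accessKeySecret": "yyy",   # placeholder from the slide
            "fs.oss.endpoint": "oss-cn-huhehaote.aliyuncs.com",
        }),
        ("pvc://ceph-pvc", "validation", None),
    ])
    pod = pod_manifest("resnet50", "resnet50", "imagenet")
    assert undeclared_volume_mounts(pod) == []
    print(json.dumps(dataset, indent=2))
    print(json.dumps(pod, indent=2))
```

The `undeclared_volume_mounts` check is a small guard against a common mistake in hand-written manifests: a `volumeMounts` entry whose name does not match any entry under `volumes`, which Kubernetes rejects when the Pod is created.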

17 . Fluid features and concepts
How to check the dataset
Summary view:
• Understand the current cache capabilities
• Determine whether you need to scale out
    Status:
      Cache States:
        Cache Capacity:     600GiB
        Cached:             76.08GiB
        Cached Percentage:  90.3%
      Conditions:  # more conditions
Dataset (slide annotations: "Current Cache Capabilities" on Cache States, "Data requires to cache" on Ufs Total):
    Status:
      Cache States:
        Cache Capacity:     200.00GiB
        Cached:             0B
        Cached Percentage:  0%
      Conditions:
        Message:  The ddc runtime is ready.
        Reason:   DatasetReady
        Status:   True
        Type:     Ready
      Phase:  Bound
      Runtimes:
        Category:   Accelerate
        Name:       imagenet
        Namespace:  default
        Type:       alluxio
      Ufs Total:  84.29GiB
    Events:  <none>
Runtime (slide annotation: "Current Number of workers"):
    Status:
      Current Fuse Number Scheduled:    4
      Current Master Number Scheduled:  1
      Current Worker Number Scheduled:  4
      Desired Fuse Number Scheduled:    4
      Desired Master Number Scheduled:  1
      Desired Worker Number Scheduled:  4
      Fuse Number Available:            4
      ...er Ready:                      4
      Fuse Phase:                       Ready
      Master Number Ready:              1
      Master Phase:                     Ready
      Value File:                       imagenet-alluxio-values
      Worker Number Available:          4
      Worker Number Ready:              4
      Worker Phase:                     Ready
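
The two questions this slide asks of the status output (how much is cached, and whether to scale out) come down to simple arithmetic on the reported sizes. Below is a minimal sketch, assuming sizes are formatted as on the slide (for example `76.08GiB` or `0B`). The function names are hypothetical, and the scale-out rule, which compares cache capacity with the dataset size under `Ufs Total`, is an assumption made for illustration rather than Fluid's own policy.

```python
import re

# Binary unit multipliers for the size strings shown in the status output.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text):
    """Convert strings such as '76.08GiB' or '0B' to a number of bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(B|KiB|MiB|GiB|TiB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def cached_percentage(cached, ufs_total):
    """Share of the dataset (Ufs Total) that is already cached, in percent."""
    total = parse_size(ufs_total)
    if total == 0:
        return 0.0
    return 100.0 * parse_size(cached) / total


def needs_scale_out(cache_capacity, ufs_total):
    """Assumed rule: scale out when the cache cannot hold the whole dataset."""
    return parse_size(cache_capacity) < parse_size(ufs_total)


if __name__ == "__main__":
    print(round(cached_percentage("76.08GiB", "84.29GiB"), 1))
    print(needs_scale_out("200.00GiB", "84.29GiB"))
```

With the slide's values, 76.08GiB cached out of an 84.29GiB dataset gives the 90.3% shown in the summary view, and a 200.00GiB cache is large enough that the assumed rule reports no need to scale out.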

18 . Fluid features and concepts
Schedule job with the dataset locality
Components: client, Kubernetes Scheduler, Fluid Scheduler, Fluid Runtime Service
1. Create Pod
2. Find the cacheable node
3. Query the cache capabilities of the nodes
4. Start pod in N1
Nodes: N1 (Alluxio, 10G cached), N2 (Alluxio, 5G cached), N3
    apiVersion: v1
    kind: Pod
    metadata:
      name: resnet50
    spec:
      containers:
      - name: train
        image: resnet50
        volumeMounts:
        - mountPath: /data
          name: imagenet
      volumes:
      - name: imagenet
        persistentVolumeClaim:
          claimName: imagenet
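
The four steps on this slide amount to ranking candidate nodes by how much of the dataset they already cache. The following is a minimal sketch of that idea, not Fluid's actual scheduler logic: the `pick_node` function, its inputs, and the tie-breaking rule are all hypothetical, and the node names and cache sizes are taken from the slide's diagram.

```python
GIB = 1024**3


def pick_node(cached_bytes_by_node, schedulable_nodes):
    """Return the schedulable node holding the most cached data.

    cached_bytes_by_node: node name -> bytes of the dataset cached there
                          (nodes without an entry are treated as 0).
    schedulable_nodes:    nodes that can run the Pod at all.
    Ties are broken by node name so the result is deterministic.
    Returns None when there is no schedulable node.
    """
    candidates = sorted(schedulable_nodes)
    if not candidates:
        return None
    # max() keeps the first maximal element, so sorted order breaks ties.
    return max(candidates, key=lambda node: cached_bytes_by_node.get(node, 0))


if __name__ == "__main__":
    cache = {"N1": 10 * GIB, "N2": 5 * GIB}  # N3 has nothing cached
    print(pick_node(cache, ["N1", "N2", "N3"]))
```

With the slide's cluster state the function selects N1, matching step 4. A production scheduler would weigh cache locality against other constraints such as CPU, memory, and GPU availability; this sketch isolates the locality term only.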

19 . Fluid system demo
• Demo 1: Accelerate Remote File Accessing with Fluid
• Demo 2: Machine Learning with Fluid
• Demo 3: Accelerate PVC with Fluid
For more demos, see: https://github.com/fluid-cloudnative/fluid

20 . Fluid performance evaluation
Workload: ResNet50 network with the ImageNet dataset. End-to-end performance roughly doubles.
Fluid vs OSSFS (20Gb/s), images/second:
    GPUs               8          32         64         128
    ossfs (cache on)   8630.56    16556.2    27529.7    32215
    Fluid              9248.64    21422.3    40817.2    58200.1
Fluid vs OSSFS, minutes:
    GPUs               8          32         64         128
    ossfs (cache)      220        110        78         55
    Fluid              214        91         49         31
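
The "roughly doubles" claim can be checked against the chart values with a few lines of Python. This is a sketch under a stated assumption: the source chart is flattened, so the assignment of each number to a series and GPU count below is a reconstruction (ossfs taken as the lower value at each GPU count), not something the slide text states explicitly.

```python
GPUS = [8, 32, 64, 128]

# Assumed pairing of the chart's data labels with series and GPU counts.
OSSFS_IMAGES_PER_SEC = [8630.56, 16556.2, 27529.7, 32215.0]
FLUID_IMAGES_PER_SEC = [9248.64, 21422.3, 40817.2, 58200.1]


def speedups(baseline, improved):
    """Element-wise ratio improved / baseline."""
    return [new / old for old, new in zip(baseline, improved)]


if __name__ == "__main__":
    ratios = speedups(OSSFS_IMAGES_PER_SEC, FLUID_IMAGES_PER_SEC)
    for gpus, ratio in zip(GPUS, ratios):
        print(f"{gpus:>3} GPUs: {ratio:.2f}x")
```

Under this pairing the gap widens as GPUs are added, from about 1.07x at 8 GPUs to about 1.81x at 128 GPUs, which is consistent with the slide's summary for the largest configuration.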

21 . Learn more about Fluid
GitHub Repo: https://github.com/fluid-cloudnative/fluid
Project Homepage: http://pasa-bigdata.nju.edu.cn/fluid/index.html
DingTalk Group:

22 .Thank You!
