七牛云 Atlab 实验室专注于机器视觉方向的深度学习研究。我们团队针对深度学习训练打造了高效的训练平台 AVA,深度整合了存储和计算资源。为了提高算法工程师训练模型的效率,我们使用 Alluxio 加速训练任务读取海量视频、图片等样本文件,并针对我们的应用场景,从元数据管理和读写分离等方向入手对 Alluxio 进行改造,实现了文件读写性能 50% 以上的提升,同时显著降低了存储系统的成本和容量风险。

Alluxio发布于2018/10/29 18:07

注脚

1.Alluxio & AVA - Deep learning in cloud Qiniu AI Technical Director lichaoguang@qiniu.com (李李朝光)

2.Agenda ✦ Brief introduction to AVA deep learning platform ✦ Evolution of AVA storage sub-system ✦ Time for Alluxio ✦ Performance boost with Alluxio in cloud ✦ Roadmap of Alluxio in AVA ✦ Contribute to Alluxio

3.AVA Image, Video service platform Classify Face Text Detection Image Search Track Query Dataset Label Model Training Schedule Inference

4.AVA architecture Label Dataset Training Model AVA deep learning platform Distributed Storage Distributed Compute Distributed caching Distributed Schedule Object Block GPU mgt Training Virtualization, orchestration, docker Hardware, switch

5.Agenda ✦ Brief introduction to AVA deep learning platform ✦ Evolution of AVA storage sub-system ✦ Time for Alluxio ✦ Performance boost with Alluxio in cloud ✦ Running Alluxio ✦ Contribute to Alluxio ✦ Roadmap of Alluxio in AVA

6.Data at present CEPH Storage 200T+ Alluxio RAM/SSD tier 3T/100T+ Training data 50B+ Bucket files 10M+ Image file size 100KB Video file size 500MB Images /day/training 300K+ Image files for model 20M+ KODO new added images /day 100M+

7.AVA Storage sub-system Qiniu Object Qiniu Object Qiniu Object Poor performance Not shareable Performance + + bad scale Op hell / scale scale + ease

8.Agenda ✦ Brief introduction to AVA deep learning platform ✦ Evolution of AVA storage sub-system ✦ Time for Alluxio ✦ Performance boost with Alluxio in cloud ✦ Running Alluxio ✦ Contribute to Alluxio ✦ Roadmap of Alluxio in AVA

9.Alluxio in AVA Application Application Application /a1 /a2 /b1 /b2 /c1 /c2 Bucket Bucket Bucket Bucket Bucket Bucket FUSE FUSE FUSE 3 1 FUSE Adapter (on host) KODO KODO 1 RAM RAM 2 2 3 Worker Worker Worker Worker Worker Worker

10.Deployment ✦ Cluster: ✦ Master ✦ N+ ✦ 128G (major) RAM or 64G ✦ Zookeeper: 3 node for HA ✦ 3 replicas ✦ Dockerized service ✦ Journal on CEPH FS ✦ One major cluster ✦ G1GC ✦ Other: video, classification, etc ✦ Supported inode: 40M (128G) *** ✦ Ethernet: 10000M * 2 ✦ Inode cache feature ** ✦ Read Worker ✦ Client ✦ 2 tiers: 1.8T SSD * 4, 64G RAM ✦ Write tier: SSD ✦ JVM mem: 16G ✦ Async through ✦ Cache poliicy ✦ Fuse cache *** ✦ Write worker ✦ Separated read/write host *** ✦ 10T CEPH RBD (FS first) ✦ Flush control *** ✦ Async through ✦ Short circuit *** ✦ Persist rate control ***

11.Why Alluxio on AVA? ✦ Qiniu: cloud vender with object storage ✦ + 2Billions images/videos every day ✦ Multiple vendor support (gateway) ✦ CEPH is not satisfied ✦Shared, scalable, FS inode, performance ✦ Keep training code unchanged ✦ Memory speed ✦ Beautiful code ✦ Good community support ✦ Cache, Cache, Cache !!!

12.Agenda ✦ Brief introduction to AVA deep learning platform ✦ Evolution of AVA storage sub-system ✦ Time for Alluxio ✦ Performance boost with Alluxio in cloud ✦ Roadmap of Alluxio in AVA ✦ Contribute to Alluxio

13. Performance boost with Alluxio ✦ Benefit: ✦ Memory cluster: 10GB for short circuit Object storage (KODO) ✦ Short term: pre-warm Master ✦ Long term: LRU evict Network: 1-10M/S ✦ImageNet, 50K images (proper pre-warm) Training POD Tier0: 1GB/S ✦20% faster than CEPH ✦5 times as fast as Object Storage Short Circuit: 8-10GB/S Tier1: 500MB/S ✦Same as CEPH for compute-heavy training Worker (local) Worker (remote) ✦4x fast with short-circuit for IO-heavy training

14.Caching (1) - Objects/Files cache Object Storage Billions of objects 50% faster than CEPH (pre-warm) Training 5x faster than from cloud Ethernet: 1-10MB/S Alluxio Worker Millions of objects Caching/Persist RAM Tier: 1G-10GB/S Promote/evict SSD Tier: 200-500MB/S Promote/evict SATA Tier: 1-100MB/S Discard

15.Caching (2) - FUSE meta cache LRU getStatus() Meta Cache open() Client read() Master Client First Access Master …() About N round-trip per file access Every access takes 2-3ms 3x performance boost for small file

16.Caching (3) - Master safety guarantee Or performance down 10x due to heavy GC! Possibily OOM Inode/Block Evict Persist Master Persist/Evict Worker Worker Monitor Meta Server

17.Short circuit & Cache Policy GPU 8-10G/S, Short Circuit POD Master Worker 1G/S, Ethernet Possible issues: • Compete with training program Worker • RAM rushing Worker CACHE POLICY • Tier eviction stuck 10M/S, Ethernet 10x storage access / 2x+ samples/sec

18.Agenda ✦ Brief introduction to AVA deep learning platform ✦ Evolution of AVA storage sub-system ✦ Time for Alluxio ✦ Performance boost with Alluxio in cloud ✦ Roadmap of Alluxio in AVA ✦ Contribute to Alluxio

19.Plan ✦ Cache Policy ✦ POD schedule ✦ Short Circuit / temp worker ✦ Master inode improvement ✦ Quota/large bucket ✦ off-heap/KV ✦ Reduce lock contention ✦ Write enhancement (retry, etc) ✦ Evict manager ✦ Standalone service

20.AVA & Alluxio Mobility Qiniu KODO object Vendor object store AVA platform AVA platform Qiniu infrastructure Vendor infrastructure

21.Edge storage & Edge compute & Alluxio & AVA Edge Object Store Object Store Edge Storage Edge Storage Alluxio Alluxio Edge Compute AVA Alluxio Edge Compute AVA AVA Customer IDC Cloud, etc, Qiniu

22.Agenda ✦ Brief introduction to AVA deep learning platform ✦ Evolution of AVA storage sub-system ✦ Time for Alluxio ✦ Performance boost with Alluxio in cloud ✦ Roadmap of Alluxio in AVA ✦ Contribute to Alluxio

23.Community contribution ✦ More to come ✦ Large file upload ✦ FUSE cache ✦ Bucket mount conflict ✦ Selective read/write worker ✦ Lock improvement ✦ Master Inode mem shrink ✦ Quota ✦……

24.Thanks!

user picture
  • Alluxio
  • Alluxio,世界上第一个将分离的异构存储整合到统一平台,并提供近乎内存访问速度的中间件,广泛用于企业和混合云的商业数据分析加速。

相关Slides

  • 本PPT解释了作为支持交易型分布式数据库系统的TiDB核心产品架构及其主要组件,包括TiDB,TiKV,Placement Driver,TiSpark,TheFlash,Tool,TiDB-operator for k8s等,对其基本作用进行阐述,并对其中的核心组件TiKV重点分析,解释了基本数据组织方式,执行方式,数据管理,水平扩展和负载均衡,以及分布式一致性等基本问题。最好对其分析引擎TiSpark也进行了简要功能说明。

  • 介绍了ES的基本结构,功能和原理,重点分析了在实际生产环境中各种运维和监控的指标,以及各种调优经验和配置参数,还有运维自动化的方法论探讨,可以作为ES在实际生产环境中的最佳实践部署和运维监控案例,也可以帮助ES平台维护者理解并思考如何提供更好的ES服务及运维保障。

  • Adaptive Execution @ Spark + AI Summit Europe 2018 Video @ https://databricks.com/session/spark-sql-adaptive-execution-unleashes-the-power-of-cluster-in-large-scale-2

  • Apache Spark作为分布式内存计算引擎,内存使用的优化对于性能提升至关重要,Intel的Optane(傲腾)技术,让内存和SSD之间架设了个新的数据缓存/存储层,并通过PMDK等特殊的API绕过文件系统,系统调用,内存拷贝等一系列额外操作,让性能有极大的提升。Intel开源的OAP(Optimized Analytics Package)for Apache Spark项目,也是基于这个前体,构建即席查询引擎,以及在机器学习算法诸如KMeans算法上也获得了不错的性能回报。