Bring data locality to ML/AI
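Lu Qiu
Graduated from George Washington University with a degree in data science and has years of experience contributing to open source communities. Joined the Alluxio team in 2018; mainly responsible for Alluxio's integration with public cloud scenarios, the distributed system election mechanism, journal management, the monitoring system, and research and development on data supply for machine learning workloads.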
1 .Bring data locality to ML and AI workloads Lu Qiu
2 .About me ● Master Data Science @ GWU ● Software Engineer @ Alluxio ● Email: lu@alluxio.com ● Areas: Alluxio fault tolerant system, journal system, metrics system, and POSIX API. Alluxio integration with Cloud
3 .Alluxio POSIX API
4 .Make Distributed Data Available Locally • FUSE interface makes all data available locally • Supports: HDFS, NFS, OpenStack, Ceph, Amazon S3, Azure, Google Cloud (diagram labels: HDFS #1, Obj Store, NFS, HDFS #2; slide dated 3/25/19)
5 .Data Accessibility via POSIX API ● Bash: ~$ cat /mnt/alluxio/myInput ● TensorFlow: ~$ python classify_image.py --model_dir /mnt/fuse/imagenet/ ● Note: Since Alluxio is a write-once/read-many file system, the mounted file system does not support all POSIX workloads.
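Because the mount behaves like a local directory, ordinary file I/O is enough to reach the data. The following is a minimal sketch, not Alluxio's own client code: the helper names `read_file` and `write_once` are hypothetical, and a temporary directory stands in for a mount point such as /mnt/alluxio. The write helper illustrates one way to stay within the write-once/read-many pattern: create the file in a single sequential pass and refuse to overwrite it.

```python
import os
import tempfile


def read_file(mount_root, relative_path, chunk_size=1 << 20):
    """Read a file under a mounted directory using plain sequential reads."""
    path = os.path.join(mount_root, relative_path)
    data = bytearray()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            data.extend(chunk)
    return bytes(data)


def write_once(mount_root, relative_path, payload):
    """Create a new file in one sequential pass; never modify an existing one.

    Mode "xb" raises FileExistsError if the file already exists, which keeps
    this helper away from in-place updates (an assumption about what a
    write-once file system is least likely to support).
    """
    path = os.path.join(mount_root, relative_path)
    with open(path, "xb") as f:
        f.write(payload)
    return path


if __name__ == "__main__":
    # A temporary directory stands in for the FUSE mount point here.
    with tempfile.TemporaryDirectory() as mount_root:
        write_once(mount_root, "myInput", b"sample bytes")
        print(read_file(mount_root, "myInput"))
```

To try it against a real mount, pass the mount point as `mount_root` instead of the temporary directory.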
6 .A New Alluxio POSIX Implementation
7 .Why a new POSIX Impl.?
| | Old JNR-Fuse | New JNI-Fuse (available in 2.5) |
| --- | --- | --- |
| Activeness | Personal project, not actively supported. | Actively supported by Alluxio and the whole community. |
| Performance | Has reasonable performance under 10 threads. | Has much better performance under multi-threaded, high-concurrency ML/AI training workloads (> 10 threads). |
| Correctness | Has read 0 issue with direct_io. Has other correctness issues without direct_io. | N/A. |
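The thread counts in the comparison refer to concurrent readers hitting the mount at once, as a training job's data loaders would. The following is a minimal sketch of such a workload for a rough local measurement. It is not the benchmark behind the slides: the function names are hypothetical, and a temporary directory stands in for the mounted path.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_size(path):
    """Read one file fully and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(1 << 20)
            if not chunk:
                break
            total += len(chunk)
    return total


def concurrent_read(directory, num_threads):
    """Read every regular file in `directory` with `num_threads` threads.

    Returns (total_bytes, elapsed_seconds).
    """
    with os.scandir(directory) as entries:
        paths = [entry.path for entry in entries if entry.is_file()]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        total = sum(pool.map(read_size, paths))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # A temporary directory of small files stands in for the mount point.
    with tempfile.TemporaryDirectory() as directory:
        for i in range(64):
            with open(os.path.join(directory, f"sample_{i}.bin"), "wb") as f:
                f.write(os.urandom(4096))
        for threads in (1, 8, 32):
            total, elapsed = concurrent_read(directory, threads)
            print(f"{threads} threads: {total} bytes in {elapsed:.4f}s")
```

Timings from a local temporary directory say nothing about either FUSE implementation; the comparison only becomes meaningful when `directory` points at the mounted path and the page cache is accounted for.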
8 .Old JNR-based POSIX Implementation ● Hard to debug ● Worse performance
9 .Alluxio POSIX API Performance
10 .Alluxio new POSIX API JNI-Fuse ▪ Community-driven collaboration ▪ Contributors from NJU, Alibaba, Tencent, Alluxio ▪ Already in production in Microsoft Azure ML Platform ▪ Weekly Developer Sync
11 .Improving POSIX API ● Modularized JNI-Fuse library (issue ticket) ● Better performance for reading many small files by removing local RPCs (issue ticket, design doc) ● Support libfuse 3.x (issue ticket) And many more coming. Join the Alluxio weekly community sync to create solutions together!
12 .Alluxio POSIX API Reference ● Alluxio POSIX API documentation (English or Chinese) ● Turn Cloud Storage or HDFS Into Your Local File System for Faster AI Model Training With TensorFlow (link) ● Alibaba Cloud Container Service team in practice: Alluxio optimizations improve Kubernetes deep learning training performance on the cloud several-fold (link)
13 .Questions? Welcome to join the Alluxio Community! www.alluxio.io | www.alluxio.io/slack | @alluxio