2021-01-19 Alluxio KeyNote.
Alluxio

Data Orchestration for AI and Data Analytics in the Cloud Era - Bin Fan (范斌) & Jasmine Wang

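Bin Fan (范斌)
Founding member and VP of Open Source at Alluxio, the Silicon Valley-based open-source data platform software company. Before joining Alluxio, Bin Fan worked at Google on the research and development of next-generation large-scale distributed storage systems. He received his Ph.D. in computer science from Carnegie Mellon University. During his doctoral studies he published multiple papers on distributed systems algorithms and system implementation at top international conferences including SIGCOMM, SOSP, and NSDI, and he holds multiple patents.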

Jasmine Wang
Alluxio Open Source Community Manager


1.

2. Bin Fan (范斌), Founding Member, VP of Open Source

3.

4. THE JOURNEY TO A FRAGMENTED DATA WORLD

5. THE JOURNEY TO A FRAGMENTED DATA WORLD

6. THE JOURNEY TO A FRAGMENTED DATA WORLD

7.

8. DATA PLATFORMS ARE COMPLEX

9. DATA PLATFORMS ARE COMPLEX

10. DATA PLATFORMS ARE COMPLEX

11. DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS (diagram labels: PRIVATE DATA CENTERS; Hive; REGION A, DATACENTER 1)

12. DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS (diagram labels: PRIVATE DATA CENTERS; Hive; REGION A, DATACENTER 1; REGION B, DATACENTER 2)

13. DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS (diagram labels: REGION A; PRIVATE DATA CENTERS; Hive; REGION A, DATACENTER 1; REGION B, DATACENTER 2)

14. DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS: ERROR PRONE AND NETWORK INTENSIVE DATA COPIES (diagram labels: REGION A; REGION B; PRIVATE DATA CENTERS; Hive; REGION A, DATACENTER 1; REGION B, DATACENTER 2)

15.

16. A DATA ORCHESTRATION APPROACH

17. Thousands OF COMPANIES USING ALLUXIO

18. Core Feature 1: Distributed Caching (diagram labels: Big Data ETL, Big Data Query, Model Training; three Alluxio Servers holding blocks A, B, C; Object Store with /path1/file1 and /path2/file2)
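
The caching idea on this slide can be illustrated with a small read-through cache. This is a minimal sketch under stated assumptions, not Alluxio's implementation: the class name `ReadThroughCache`, the dict standing in for the object store, and the LRU eviction policy are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy read-through block cache with LRU eviction (illustrative only)."""

    def __init__(self, under_store, capacity):
        self.under_store = under_store   # dict standing in for the object store
        self.capacity = capacity         # maximum number of cached blocks
        self.blocks = OrderedDict()      # block id -> bytes, least recent first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)   # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.under_store[block_id]       # slow path: fetch from the store
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used block
        return data


# Hypothetical blocks A, B, C, echoing the labels in the diagram.
store = {"A": b"block-a", "B": b"block-b", "C": b"block-c"}
cache = ReadThroughCache(store, capacity=2)
for block in ["A", "B", "A", "C", "A"]:
    cache.read(block)
print(cache.hits, cache.misses)
```

Repeated reads of block A are served from the cache, so only the first access to each block reaches the slower store.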

19. Core Feature 2: Flexible APIs (diagram labels: HDFS compatible API, POSIX; Big Data ETL, Big Data Query, Model Training; three Alluxio Servers; Object Store)
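
One consequence of a POSIX interface is that application code can use ordinary file calls with no storage-specific client. The sketch below assumes the data is visible under a local directory. A temporary directory stands in for a hypothetical mount point, and the helper `dataset_size` is an illustrative name, not part of any Alluxio API.

```python
import os
import tempfile


def dataset_size(root):
    """Return (file_count, total_bytes) for every file under root."""
    count, total = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Plain open()/read(): nothing here knows what storage backs the path.
            with open(os.path.join(dirpath, name), "rb") as f:
                total += len(f.read())
            count += 1
    return count, total


# Stand-in for a hypothetical POSIX mount point; a temporary directory is
# used so the sketch runs anywhere.
demo_root = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(demo_root, f"sample-{i}.bin"), "wb") as f:
        f.write(b"x" * 10)

print(dataset_size(demo_root))
```

The same function would work unchanged against any directory exposed through a POSIX mount, which is the property that matters for training frameworks that expect local files.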

20. Core Feature 3: Unified Namespace (diagram labels: Big Data ETL, Big Data Query, Model Training; three Alluxio Servers; Object Store; HDFS)
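
A unified namespace can be pictured as a mount table that maps one logical path tree onto several underlying stores. The following is a minimal sketch of longest-prefix mount resolution, not Alluxio's actual code: `MountTable`, the bucket name, and the namenode address are hypothetical.

```python
class MountTable:
    """Toy mount table: logical path prefix -> under-storage URI."""

    def __init__(self):
        self.mounts = {}

    def mount(self, prefix, uri):
        # Normalize so that "/lake/" and "/lake" are the same mount point.
        self.mounts[prefix.rstrip("/") or "/"] = uri.rstrip("/")

    def resolve(self, path):
        """Translate a logical path into the URI of the store that backs it."""
        best = None
        for prefix in self.mounts:
            # Match on whole path components, so "/lake" does not match "/lakehouse".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix          # the longest matching mount wins
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        suffix = path if best == "/" else path[len(best):]
        return self.mounts[best] + suffix


table = MountTable()
table.mount("/lake", "s3://example-bucket/lake")                 # hypothetical object store
table.mount("/warehouse", "hdfs://example-namenode:8020/hive")   # hypothetical HDFS cluster

print(table.resolve("/lake/images/cat.jpg"))
print(table.resolve("/warehouse/sales/part-0"))
```

Applications address a single path tree, while each request is routed to the object store or HDFS according to the mount that covers it.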

21. COMPANIES USING ALLUXIO (category labels: TECHNOLOGY; OTHERS; FINANCIAL SERVICES; INTERNET; PUBLIC CLOUD PROVIDERS; TELCO & MEDIA; GENERAL E-COMMERCE)

22. DATA ORCHESTRATION WITHIN A SINGLE DATACENTER OR CLOUD REGION
USE CASE 01: CLOUD. Consistent SLAs, Performance, and Cost Savings on cloud storage (PUBLIC CLOUD)
USE CASE 02: ON PREM. Speed-up analytics on on-prem object stores (ON PREMISE)
(diagram labels: Tensorflow, Spark, Alluxio)

23. https://www.alluxio.io/resources/videos/speeding-up-spark-performance-using-alluxio-at-china-unicom/

24. https://www.alluxio.io/resources/videos/enterprise-distributed-query-service-powered-by-presto-alluxio-across-clouds-at-walmartlabs/

25. Growing Workloads

26.

27. Alluxio & AI w/ K8s
• Machine Learning & AI runs on Data Lakes
• Compared to Data Analytics, AI workloads have different characteristics, but a similar mismatch between compute and storage
  ○ Access Pattern - Repeated access on a dataset
  ○ Dataset - Many small files
  ○ Preferred API - POSIX Filesystem
  ○ Workload Regularity - Predictable, bulk access

28.

29. by Google/OpenSSF (https://github.com/ossf/criticality_score)
