2021-01-19 Alluxio KeyNote.
Alluxio

Data Orchestration for AI and Data Analytics in the Cloud Era - Bin Fan & Jasmine Wang

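Bin Fan
Founding member and VP of Open Source at Alluxio, the Silicon Valley-based open source data platform software company. Before joining Alluxio, Bin Fan worked at Google on the research and development of next-generation large-scale distributed storage systems. He received his PhD in computer science from Carnegie Mellon University, where he worked on distributed systems algorithms and systems implementation, published multiple papers at top international conferences including SIGCOMM, SOSP, and NSDI, and filed multiple patents.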

Jasmine Wang
Alluxio Open Source Community Manager


1 .

2 . Bin Fan, Founding Member, VP of Open Source

3 .

4 .THE JOURNEY TO A FRAGMENTED DATA WORLD

5 .THE JOURNEY TO A FRAGMENTED DATA WORLD

6 .THE JOURNEY TO A FRAGMENTED DATA WORLD

7 .

8 .DATA PLATFORMS ARE COMPLEX

9 .DATA PLATFORMS ARE COMPLEX

10 .DATA PLATFORMS ARE COMPLEX

11 . DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS
[Diagram labels: PRIVATE DATA CENTERS, Hive, REGION A, DATACENTER 1]

12 . DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS
[Diagram labels: PRIVATE DATA CENTERS, Hive, REGION A, DATACENTER 1, REGION B, DATACENTER 2]

13 . DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS
[Diagram labels: REGION A, PRIVATE DATA CENTERS, Hive, REGION A, DATACENTER 1, REGION B, DATACENTER 2]

14 . DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS
ERROR PRONE AND NETWORK INTENSIVE DATA COPIES
[Diagram labels: REGION A, REGION B, PRIVATE DATA CENTERS, Hive, REGION A, DATACENTER 1, REGION B, DATACENTER 2]

15 .

16 .A DATA ORCHESTRATION APPROACH

17 .THOUSANDS OF COMPANIES USING ALLUXIO

18 . Core Feature 1: Distributed Caching
[Diagram labels: Big Data ETL, Big Data Query, Model Training; three Alluxio Servers caching blocks A, B, C; Object Store with /path1/file1 and /path2/file2]
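The caching idea on this slide can be illustrated with a toy read-through cache. This is only a minimal sketch under stated assumptions, not Alluxio's implementation: the `ReadThroughCache` class, the LRU eviction policy, and the in-memory dictionary standing in for the object store are all hypothetical. Only the file paths come from the slide.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy read-through LRU cache in front of a slow backing store (illustrative only)."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # dict standing in for an object store
        self.capacity = capacity            # maximum number of cached files
        self.cache = OrderedDict()          # path -> bytes, least recently used first
        self.remote_reads = 0               # number of reads served by the backing store

    def read(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)    # cache hit: mark as most recently used
            return self.cache[path]
        data = self.backing_store[path]     # cache miss: fetch from the backing store
        self.remote_reads += 1
        self.cache[path] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return data


# Hypothetical contents; the paths mirror the labels on the slide.
object_store = {"/path1/file1": b"AAAA", "/path2/file2": b"BBBB"}
cache = ReadThroughCache(object_store, capacity=2)

# Three consumers (e.g. ETL, query, training) read the same file.
for _ in range(3):
    cache.read("/path1/file1")
print(cache.remote_reads)
```

Only the first read goes to the backing store; the later reads are served from the cache, which is the effect the slide attributes to placing a caching layer between compute and storage.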

19 . Core Feature 2: Flexible APIs
[Diagram labels: HDFS compatible API, POSIX; Big Data ETL, Big Data Query, Model Training; three Alluxio Servers; Object Store]

20 . Core Feature 3: Unified Namespace
[Diagram labels: Big Data ETL, Big Data Query, Model Training; three Alluxio Servers; Object Store, HDFS]
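A unified namespace can be pictured as a mount table that maps one logical path tree onto several storage systems. The sketch below is a hypothetical illustration, not Alluxio's code: the `MountTable` class, the host name `namenode:8020`, and the bucket name `example-bucket` are assumptions made up for this example.

```python
class MountTable:
    """Toy mount table: resolves a logical path to a URI in a backing store."""

    def __init__(self):
        self.mounts = {}  # logical mount point -> backing URI prefix

    def mount(self, mount_point, target_uri):
        key = mount_point.rstrip("/") or "/"
        self.mounts[key] = target_uri.rstrip("/")

    def resolve(self, path):
        # Pick the longest mount point that is a path-prefix of the request.
        best = None
        for mount_point in self.mounts:
            prefix = mount_point if mount_point == "/" else mount_point + "/"
            if path == mount_point or path.startswith(prefix):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount covers {path}")
        rest = path[len(best):].lstrip("/")
        base = self.mounts[best]
        return f"{base}/{rest}" if rest else base


table = MountTable()
table.mount("/warehouse", "hdfs://namenode:8020/hive/warehouse")  # hypothetical HDFS
table.mount("/models", "s3://example-bucket/models")              # hypothetical bucket

print(table.resolve("/warehouse/t1/part-0"))
print(table.resolve("/models/resnet/ckpt"))
```

Applications address data by the logical path only, so the same job could run against HDFS or an object store without changing the paths it reads.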

21 . COMPANIES USING ALLUXIO TECHNOLOGY OTHERS FINANCIAL SERVICES INTERNET PUBLIC CLOUD PROVIDERS TELCO & MEDIA GENERAL E-COMMERCE

22 .DATA ORCHESTRATION WITHIN A SINGLE DATACENTER OR CLOUD REGION
USE CASE 01: CLOUD (PUBLIC CLOUD): Consistent SLAs, Performance, and Cost Savings on cloud storage
USE CASE 02: ON PREM (ON PREMISE): Speed-up analytics on on-prem object stores
[Diagram labels: Tensorflow, Spark, Alluxio]

23 .https://www.alluxio.io/resources/videos/speeding-up-spark-performance-using-alluxio-at-china-unicom/

24 . https://www.alluxio.io/resources/videos/enterprise-distributed-query-service-powered-by-presto-alluxio-across-clouds-at-walmartlabs/

25 .Growing Workloads

26 .

27 .Alluxio & AI w/ K8s
• Machine Learning & AI run on Data Lakes
• Compared to Data Analytics, AI workloads have different characteristics, but a similar mismatch between compute and storage
  ○ Access Pattern - Repeated access on a dataset
  ○ Dataset - Many small files
  ○ Preferred API - POSIX Filesystem
  ○ Workload Regularity - Predictable, bulk access
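The access pattern listed on this slide (repeated, predictable passes over many small files through a POSIX interface) can be sketched as follows. This is a minimal illustration under assumptions: the helper names `read_epoch` and `make_demo_dataset` are hypothetical, and a temporary local directory stands in for whatever POSIX mount point would expose the dataset in a real deployment.

```python
import os
import tempfile


def read_epoch(dataset_dir):
    """Read every file under dataset_dir once using ordinary POSIX file I/O."""
    total_bytes = 0
    for root, _dirs, files in os.walk(dataset_dir):
        for name in sorted(files):
            with open(os.path.join(root, name), "rb") as f:
                total_bytes += len(f.read())
    return total_bytes


def make_demo_dataset(num_files=100, size=16):
    """Create a throwaway directory of small files (stand-in for a real dataset)."""
    dataset_dir = tempfile.mkdtemp()
    for i in range(num_files):
        with open(os.path.join(dataset_dir, f"sample_{i:04d}.bin"), "wb") as f:
            f.write(b"\0" * size)
    return dataset_dir


dataset_dir = make_demo_dataset()

# Each training epoch re-reads the whole dataset: repeated, predictable, bulk access.
bytes_per_epoch = [read_epoch(dataset_dir) for _ in range(3)]
print(bytes_per_epoch)
```

Because every epoch touches the same files in the same way, a cache close to the compute nodes only needs to pay the cost of fetching from remote storage on the first pass.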

28 .

29 .by Google/OpenSSF (https://github.com/ossf/criticality_score)
