2021-01-19 Alluxio KeyNote.
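Data Orchestration for AI and Data Analytics in the Cloud Era - Bin Fan (范斌) & Jasmine Wang
Bin Fan (范斌)
Founding member and VP of Open Source at Alluxio, the Silicon Valley-based open-source data platform software company. Before joining Alluxio, Bin Fan worked at Google on the research and development of next-generation large-scale distributed storage systems. He received his PhD from the Computer Science Department at Carnegie Mellon University, where he published multiple papers at top international conferences including SIGCOMM, SOSP, and NSDI on distributed systems algorithms and system implementation, along with multiple patents.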
Jasmine Wang
Alluxio Open Source Community Manager
2 . Bin Fan (范斌), Founding Member, VP of Open Source
4 .THE JOURNEY TO A FRAGMENTED DATA WORLD
8 .DATA PLATFORMS ARE COMPLEX
11 . DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS (diagram labels: Private Data Centers, Hive, Region A, Datacenter 1)
12 . DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS (diagram labels: Private Data Centers, Hive, Region A, Datacenter 1, Region B, Datacenter 2)
13 . DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS (diagram labels: Region A, Private Data Centers, Hive, Region A, Datacenter 1, Region B, Datacenter 2)
14 . DATA SILOS ACROSS DATA CENTERS, REGIONS, CLOUDS: ERROR PRONE AND NETWORK INTENSIVE DATA COPIES (diagram labels: Region A, Region B, Private Data Centers, Hive, Region A, Datacenter 1, Region B, Datacenter 2)
16 .A DATA ORCHESTRATION APPROACH
17 . THOUSANDS OF COMPANIES USING ALLUXIO
18 . Core Feature 1: Distributed Caching (diagram labels: Big Data ETL, Big Data Query, Model Training; three Alluxio Servers holding blocks A, B, C; Object Store with /path1/file1 and /path2/file2)
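The caching idea on slide 18 can be illustrated with a small read-through cache. This is a minimal sketch, not Alluxio code: `ObjectStore`, `CacheWorker`, and `CacheCluster` are hypothetical names, block placement by a byte-sum hash is an assumption made for the demo, and LRU eviction is just one possible policy.

```python
from collections import OrderedDict


class ObjectStore:
    """Hypothetical stand-in for a remote object store; counts its reads."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.objects[key]


class CacheWorker:
    """Toy cache server holding a bounded number of blocks (LRU eviction)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def read(self, key, store):
        if key in self.blocks:
            # Cache hit: mark the block as most recently used.
            self.blocks.move_to_end(key)
            return self.blocks[key]
        # Cache miss: fetch from the object store and keep a local copy.
        data = store.get(key)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data


class CacheCluster:
    """Routes each block to one worker so every workload shares the cache."""

    def __init__(self, workers, store):
        self.workers = workers
        self.store = store

    def read(self, key):
        # Deterministic placement (an assumption made for this sketch).
        index = sum(key.encode()) % len(self.workers)
        return self.workers[index].read(key, self.store)


store = ObjectStore({"A": b"block-A", "B": b"block-B", "C": b"block-C"})
cluster = CacheCluster([CacheWorker(capacity=2) for _ in range(3)], store)

# Three workloads (ETL, query, training) read the same three blocks.
for workload in ("etl", "query", "training"):
    for key in ("A", "B", "C"):
        cluster.read(key)

print(store.reads)
```

In this toy run the object store is read once per block (three reads in total), while the remaining six reads are served from the workers.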
19 . Core Feature 2: Flexible APIs (diagram labels: HDFS compatible API, POSIX; Big Data ETL, Big Data Query, Model Training; three Alluxio Servers; Object Store)
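To show what a POSIX interface means for a training job, the sketch below reads a dataset with ordinary file calls only. It assumes the data is exposed as a directory on the local filesystem; the temporary directory here is a stand-in for such a mount point, and `load_dataset` and the file names are hypothetical.

```python
import os
import tempfile


def load_dataset(root):
    """Read every file under root using plain POSIX-style file calls."""
    samples = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                samples[os.path.relpath(path, root)] = handle.read()
    return samples


# Stand-in for a mounted data directory (an assumption for this sketch).
mount_point = tempfile.mkdtemp(prefix="fake-mount-")
for i in range(3):
    with open(os.path.join(mount_point, f"sample-{i}.bin"), "wb") as handle:
        handle.write(bytes([i]) * 4)

dataset = load_dataset(mount_point)
print(sorted(dataset))
```

The loader needs no storage-specific client library, so the same code runs against a local disk or any mount that presents a filesystem view.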
20 . Core Feature 3: Unified Namespace (diagram labels: Big Data ETL, Big Data Query, Model Training; three Alluxio Servers; Object Store, HDFS)
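One way to picture a unified namespace is a mount table that maps logical paths onto different storage backends. The sketch below is illustrative only: `UnifiedNamespace`, the mount prefixes, and the backend URIs (`hdfs://namenode:8020/...`, `s3://example-bucket/...`) are hypothetical, and longest-prefix matching is an assumption about how nested mounts might be resolved.

```python
class UnifiedNamespace:
    """Toy mount table: one logical path tree over several backends."""

    def __init__(self):
        self.mounts = {}

    def mount(self, logical_prefix, backend_uri):
        prefix = logical_prefix.rstrip("/") or "/"
        self.mounts[prefix] = backend_uri.rstrip("/")

    def resolve(self, logical_path):
        """Return the backend URI for a logical path (longest prefix wins)."""
        best = None
        for prefix in self.mounts:
            boundary = prefix.rstrip("/") + "/"
            if logical_path == prefix or logical_path.startswith(boundary):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no mount covers {logical_path}")
        suffix = logical_path if best == "/" else logical_path[len(best):]
        return self.mounts[best] + suffix


namespace = UnifiedNamespace()
namespace.mount("/warehouse", "hdfs://namenode:8020/warehouse")
namespace.mount("/training", "s3://example-bucket/datasets")

print(namespace.resolve("/warehouse/t/part-0"))
print(namespace.resolve("/training/images/0.jpg"))
```

Applications address only the logical paths, so a dataset can move between backends by changing one mount entry rather than every job.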
21 . COMPANIES USING ALLUXIO (sector labels: Technology, Others, Financial Services, Internet, Public Cloud Providers, Telco & Media, General, E-Commerce)
22 . DATA ORCHESTRATION WITHIN A SINGLE DATACENTER OR CLOUD REGION
USE CASE 01: CLOUD (public cloud): Consistent SLAs, Performance, and Cost Savings on cloud storage
USE CASE 02: ON PREM (on premise): Speed-up analytics on on-prem object stores
(diagram labels: Tensorflow, Spark, Alluxio)
23 . https://www.alluxio.io/resources/videos/speeding-up-spark-performance-using-alluxio-at-china-unicom/
24 . https://www.alluxio.io/resources/videos/enterprise-distributed-query-service-powered-by-presto-alluxio-across-clouds-at-walmartlabs/
25 .Growing Workloads
27 . Alluxio & AI w/ K8s
• Machine Learning & AI run on Data Lakes
• Compared to Data Analytics, AI workloads have different characteristics, but a similar mismatch between compute and storage
  ○ Access Pattern - Repeated access on a dataset
  ○ Dataset - Many small files
  ○ Preferred API - POSIX Filesystem
  ○ Workload Regularity - Predictable, bulk access
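The access pattern listed on slide 27 (repeated passes over many small files) is where a cache close to compute tends to pay off. The simulation below is a rough sketch under simplifying assumptions: `simulate_epochs` is a hypothetical helper, every file costs one read, each epoch reads files in the same order, and the cache simply keeps the first files it sees with no eviction.

```python
def simulate_epochs(num_files, num_epochs, cache_capacity):
    """Count remote reads when a training job rereads a dataset each epoch."""
    cache = set()
    remote_reads = 0
    total_reads = 0
    for _ in range(num_epochs):
        for file_id in range(num_files):
            total_reads += 1
            if file_id not in cache:
                # Miss: the file comes from remote storage.
                remote_reads += 1
                if len(cache) < cache_capacity:
                    cache.add(file_id)  # keep it while there is room
    return remote_reads, total_reads


# Whole dataset fits in the cache: only the first epoch goes remote.
full_remote, full_total = simulate_epochs(1000, 5, cache_capacity=1000)

# Half the dataset fits: later epochs still fetch the uncached half.
half_remote, half_total = simulate_epochs(1000, 5, cache_capacity=500)

print(full_remote, full_total)
print(half_remote, half_total)
```

With five epochs over 1000 files, a cache that holds the full dataset leaves 1000 of 5000 reads remote, while a half-sized cache leaves 3000.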
29 .by Google/OpenSSF (https://github.com/ossf/criticality_score)