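Dr. Bin Fan will share the features Alluxio 2.0 focuses on and the challenges it faces, and will present the goals, design, and progress of key features under development in the community: the RPC system upgrade, full support for asynchronous writes, data replica management, and a self-contained HA mode (with no dependency on Zookeeper or HDFS). As a core developer of the Alluxio open source project, Bin Fan will also share lessons learned and engineering best practices for distributed systems development that the Alluxio team has accumulated over the past several years.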


1.Alluxio 2.0 in a Nutshell Bin Fan binfan@alluxio.com

2.About Me Bin Fan Alluxio Founding Member CS PhD @ CMU Previously worked at Google Twitter: binfan Email: binfan@alluxio.com

3.Company Overview Founded Feb. 2015 – Haoyuan Li PhD research project “Tachyon” at UC Berkeley AMPLab Venture Backed: Andreessen Horowitz etc. Open Source Business Model Project site: www.alluxio.org Open Sourced in Dec. 2012 Open source v1.0 released Feb. 2016 Latest stable version v1.8.1 in Sept. 2018 Office in San Mateo, CA Team: Google, Palantir, VMware, AMD, Cisco…

4.Fast Growing Open Source Project in Data Eco-System Fastest growing open-source project in the data ecosystem Running in the world’s largest production clusters 800+ Contributors from 100+ organizations 10/27/18

5.Agenda 1 Overview 2 Case Study 3 Architecture 4 What’s new in 2.0 5 Lessons

6.Emerging Data Ecosystem: Bigdata + ML Many Compute Frameworks Many Storage Systems Most not co-located

7.Moving to Cloud A turnkey solution to self-managed data platforms on IaaS Pros: Cheaper; Scalable; Easier to maintain Cons: Vendor-dependent; Many apps are not cloud-native; Data locality

8.Problems in Data Ecosystem Complexity: Costly to integrate new compute or storage; Hard to keep data sources plug-and-play; Complicated to create data pipelines Efficiency: Slow and expensive to access remote data repeatedly; Data locality remains questionable; Potential performance penalty and semantics mismatch

9.How to Address the Challenges A unified data access layer

10.Alluxio as a New Data Access Layer Local Application: VFS, OS Buffer Cache, Disk Device Distributed Application: VDFS (Alluxio), Persistent Storage

11.Alluxio in Data Ecosystem Apps only talk to Alluxio Simple Add/Remove No App Changes Highest performance in Memory No Lock-in

12.Technology Overview

13.Alluxio Innovations Storage Unification: Bring all files into a single interface Common Data API: Interact with any data using one API Intelligent Cache: Accelerate slow data transparently

14.Alluxio Innovation: Storage Unification Enables effective data management across different storages

15.Alluxio Innovation: Common Data API Convert from Client-side Interface to native Storage Client-side interfaces: Bigdata Filesystem API, POSIX Filesystem API Storage connectors: HDFS Connector, S3A Connector, Swift Connector, Google Cloud Connector

16.Alluxio Innovation: Intelligent Cache Local performance from remote data using multi-tier storage: RAM (Hot), SSD (Warm), HDD (Cold) Read & Write Buffering Transparent to App Policies for pinning, promotion/demotion, TTL
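The policies named on slide 16 (pinning, promotion/demotion, TTL) can be illustrated with a toy tiered cache. This is only a sketch under stated assumptions: the class `TieredCache`, its methods, and the LRU-based demotion rule are hypothetical and are not Alluxio's actual implementation or API.

```python
import time
from collections import OrderedDict


class TieredCache:
    """Toy multi-tier cache (hypothetical): tier 0 is fastest (e.g. RAM)."""

    def __init__(self, capacities):
        # One LRU-ordered dict per tier; capacities are counted in blocks.
        self.capacities = capacities
        self.tiers = [OrderedDict() for _ in capacities]
        self.pinned = set()
        self.expiry = {}  # block id -> absolute expiry time

    def put(self, block_id, data, ttl=None, now=None):
        now = time.time() if now is None else now
        self._remove(block_id)
        if ttl is not None:
            self.expiry[block_id] = now + ttl
        self._insert(0, block_id, data)

    def get(self, block_id, now=None):
        now = time.time() if now is None else now
        if block_id in self.expiry and now >= self.expiry[block_id]:
            self._remove(block_id)  # TTL elapsed: drop the block
            return None
        for tier in self.tiers:
            if block_id in tier:
                data = tier.pop(block_id)
                self._insert(0, block_id, data)  # promote to the top tier
                return data
        return None

    def pin(self, block_id):
        # Pinned blocks are never chosen as demotion victims.
        self.pinned.add(block_id)

    def tier_of(self, block_id):
        for level, tier in enumerate(self.tiers):
            if block_id in tier:
                return level
        return None

    def _remove(self, block_id):
        for tier in self.tiers:
            tier.pop(block_id, None)
        self.expiry.pop(block_id, None)

    def _insert(self, level, block_id, data):
        if level >= len(self.tiers):
            self.expiry.pop(block_id, None)  # fell off the last tier: evicted
            return
        tier = self.tiers[level]
        tier[block_id] = data
        while len(tier) > self.capacities[level]:
            # Demote the least recently used unpinned block to the next tier.
            victim = next((b for b in tier if b not in self.pinned), None)
            if victim is None:
                break  # everything here is pinned; let the tier overflow
            self._insert(level + 1, victim, tier.pop(victim))
```

With `TieredCache([1, 1])`, writing block `a` and then block `b` leaves `b` in tier 0 and demotes `a` to tier 1; reading `a` afterwards promotes it back and demotes `b`.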

17.Case Study

18.100+ Known Production Deployments AND MORE!

19.Machine Learning Case Study – A Top Hedge Fund Challenge – Slow training of models for algorithmic trading in a $46B data-driven hedge fund; data access was slow, costing them $$ in compute cost and lower modeler productivity Solution – With Alluxio, data access is 10-30X faster Impact – Increased efficiency in training ML algorithms, lowered compute cost and increased modeler productivity, resulting in a 14-day ROI on Alluxio Diagram: Spark, HDFS, Mesos, Public Internet Confidential © Alluxio, Inc. All Rights Reserved.

20.Big Data Case Study Challenge – Gain an end-to-end view of the business with a large volume of data; queries were slow / not interactive, resulting in operational inefficiency Solution – ETL data from Teradata to Alluxio Impact – Faster time to market – “Now we don’t have to work Sundays” Use Case: http://bit.ly/2oMx95W Diagram: Spark, Teradata

21.Machine Learning Case Study Challenge – Large training dataset on Azure blob store, not accessible from TensorFlow directly; repeated data access, no caching Solution – Alluxio POSIX API to serve TensorFlow Impact – Enabler for Deep Learning workloads in their environment Diagram: TensorFlow, Azure Blob Store Read more at https://blogs.msdn.microsoft.com/cloudai/2018/05/01/tensorflow-on-azure-enabling-blob-storage-via-alluxio/

22.HPC / Machine Learning Partnership – Alluxio maximizes GPU investment: Self-serve data access for data scientists; Rapid integration of new data sources; Improved memory management & performance Learn more at https://www.slideshare.net/Alluxio/flexible-and-fast-storage-for-deep-learning-with-alluxio

23.

24.Architecture: A Distributed Storage System Under the Hood

25.Alluxio Architecture Alluxio Master Zookeeper Standby Master Alluxio Worker Alluxio Worker Under Store RAM / SSD / HDD RAM / SSD / HDD Control Path Data Path

26.Read Data not Cached in Alluxio + Caching RAM / SSD / HDD Application Alluxio Client Alluxio Worker Under Store

27.Read Cached Data in Alluxio Alluxio Worker RAM / SSD / HDD Application Alluxio Client
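The two read paths on slides 26 and 27 amount to read-through caching: a miss goes to the under store and populates the worker's cache, a hit is served locally. The sketch below is a minimal illustration; `UnderStore` and `CachingWorker` are hypothetical stand-ins, not Alluxio classes.

```python
class UnderStore:
    """Hypothetical stand-in for a remote store such as HDFS or S3."""

    def __init__(self, files):
        self.files = dict(files)
        self.reads = 0  # counts remote reads so the effect of caching is visible

    def read(self, path):
        self.reads += 1
        return self.files[path]


class CachingWorker:
    """Hypothetical worker that caches whatever it reads from the under store."""

    def __init__(self, under_store):
        self.under_store = under_store
        self.cache = {}  # stands in for RAM / SSD / HDD storage

    def read(self, path):
        if path in self.cache:
            return self.cache[path]         # slide 27: data already cached
        data = self.under_store.read(path)  # slide 26: fetch from under store
        self.cache[path] = data             # cache it for later readers
        return data
```

Reading the same path twice through `CachingWorker.read` touches the under store only once; the second read is served from the worker's cache.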

28.Write data only to Alluxio Alluxio Worker RAM / SSD / HDD Application Alluxio Client

29.Write to Alluxio and Under Store Synchronously RAM / SSD / HDD Application Alluxio Client Alluxio Worker Under Store
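The difference between the write paths on slides 28 and 29 is whether the write call returns before or after the under store has the data. A minimal sketch follows, assuming dicts in place of real storage; the names `WriteMode`, `CACHE_ONLY`, `SYNC_THROUGH`, and `WritingWorker` are hypothetical and chosen for illustration only.

```python
from enum import Enum


class WriteMode(Enum):
    CACHE_ONLY = 1    # slide 28: write data only to Alluxio
    SYNC_THROUGH = 2  # slide 29: write to Alluxio and under store synchronously


class WritingWorker:
    """Hypothetical worker; dicts stand in for local tiers and the under store."""

    def __init__(self):
        self.cache = {}
        self.under_store = {}

    def write(self, path, data, mode):
        self.cache[path] = data
        if mode is WriteMode.SYNC_THROUGH:
            # The call returns only after the under store holds the data,
            # so the writer pays under-store latency but the data is durable.
            self.under_store[path] = data
```

A `CACHE_ONLY` write completes at local speed but the data exists only in the worker; a `SYNC_THROUGH` write is durable on return at the cost of under-store latency.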

30.Alluxio 2.0 Timeline: Alluxio 2.0-preview in Early 2019

31.Production-Ready Async Writes to Under Store Diagram: Application (Alluxio Client), Alluxio Master, Alluxio Worker (RAM / SSD / HDD), Under Store Async Writes: Step 1: App writes to Alluxio; Step 2: Alluxio writes to UFS Benefits: Apps write at Alluxio speed; Data gets persisted Challenges: Data mutation; Fault-tolerance
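The two-step async write on slide 31 can be sketched with a background thread that drains a queue of paths to persist. This is an illustration only: `AsyncPersistWorker` is a hypothetical name, a dict stands in for the UFS, and the fault-tolerance challenge the slide lists is not addressed here.

```python
import queue
import threading


class AsyncPersistWorker:
    """Hypothetical worker that persists writes to the under store in the background."""

    def __init__(self, under_store):
        self.cache = {}
        self.under_store = under_store  # a dict standing in for the UFS
        self.pending = queue.Queue()
        self.thread = threading.Thread(target=self._persist_loop, daemon=True)
        self.thread.start()

    def write(self, path, data):
        # Step 1: the app writes to Alluxio and returns immediately.
        self.cache[path] = data
        self.pending.put(path)

    def _persist_loop(self):
        # Step 2: Alluxio writes to the UFS in the background.
        while True:
            path = self.pending.get()
            try:
                # Read the current contents at persist time, so a file that
                # was overwritten in the meantime is persisted in its latest
                # version (one simple way to handle data mutation).
                data = self.cache.get(path)
                if data is not None:
                    self.under_store[path] = data
            finally:
                self.pending.task_done()

    def wait_until_persisted(self):
        # Blocks until every queued path has been processed.
        self.pending.join()
```

If the process dies between step 1 and step 2, the queued data is lost in this sketch; a production design would need to journal pending persist jobs.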

32.Replication: More Popular Data Gets More Replicas Diagram: Applications (Alluxio Clients), Alluxio Master, Alluxio Workers holding copies of Block-1, Under Store Pros: data layout is adaptive based on demand Cons: some data (e.g., common tables to join) gets excessive copies

33.Active Replication Control in Alluxio 2.0 Diagram: Applications (Alluxio Clients), Alluxio Master, Alluxio Workers holding copies of Block-1, Under Store SetReplicaMax(2)

34.Active Replication Control in Alluxio 2.0 Diagram: Applications (Alluxio Clients), Alluxio Master, Alluxio Workers holding copies of Block-1, Under Store SetReplicaMin(3)
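The effect of `SetReplicaMax` and `SetReplicaMin` on slides 33 and 34 can be sketched as a planning function that decides where to add or drop copies of a block. The function `plan_replication`, its parameters, and the choice of which workers to pick are hypothetical; the slides do not describe how Alluxio selects workers.

```python
def plan_replication(holders, all_workers, replica_min=0, replica_max=None):
    """Return (to_add, to_remove): workers that should gain or drop a replica.

    holders: workers that currently hold the block.
    replica_max=None means there is no upper bound.
    """
    if replica_max is not None and replica_min > replica_max:
        raise ValueError("replica_min must not exceed replica_max")
    holders = sorted(holders)
    if len(holders) < replica_min:
        # Too few copies: replicate onto workers that do not hold the block.
        candidates = sorted(w for w in all_workers if w not in holders)
        need = replica_min - len(holders)
        return candidates[:need], []
    if replica_max is not None and len(holders) > replica_max:
        # Too many copies: drop the surplus replicas.
        extra = len(holders) - replica_max
        return [], holders[-extra:]
    return [], []
```

For a block held by four workers, a maximum of 2 yields two removals; for a block held by one worker, a minimum of 3 yields two additions.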

35.Support One Billion Files/Dirs in Alluxio Goal: support 1 billion files in one namespace Current Bottleneck: File system metadata is in JVM on-heap memory Approaches: Move metadata storage out of JVM Diagram: Alluxio Master JVM (File System Metadata, Block Metadata, Worker Metadata, RPC Service)

36.Mounting HDFS at Different Versions UFS (HDFS) becomes completely isolated Benefits: Simplify development and maintenance Challenges: Library isolation Diagram: Alluxio Master; Mount /hdfs2.6 (HDFS 2.6); Mount /hdfs3.0 (HDFS 3.0)
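The mount points `/hdfs2.6` and `/hdfs3.0` on slide 36 imply a table that maps Alluxio paths to under-store locations. A minimal sketch of longest-prefix mount resolution is shown below; `MountTable` is a hypothetical class, and the host names in the usage note are made up for illustration. Library isolation between HDFS client versions, which the slide names as the real challenge, is outside the scope of this sketch.

```python
class MountTable:
    """Hypothetical mount table: Alluxio path prefix -> under store URI."""

    def __init__(self):
        self.mounts = {}

    def mount(self, alluxio_path, ufs_uri):
        # Trailing slashes are dropped so prefixes compare consistently.
        self.mounts[alluxio_path.rstrip("/")] = ufs_uri.rstrip("/")

    def resolve(self, path):
        # The longest matching mount point wins.
        best = None
        for prefix in self.mounts:
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        return self.mounts[best] + path[len(best):]
```

After `mount("/hdfs2.6", "hdfs://nn-old:8020/")` and `mount("/hdfs3.0", "hdfs://nn-new:8020/data")`, resolving `/hdfs3.0/x` gives `hdfs://nn-new:8020/data/x`, so each subtree can be served by a different HDFS cluster and version.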

37.Alluxio 1.x HA Relies on ZK/HDFS Running Alluxio in HA: Zookeeper: Serve and elect the leader master for HA; HDFS: Journal Storage shared among masters Problems: Limited choice of journal storage (local, streaming writes); Hard to debug/recover on service outage; Hard to maintain Diagram: Leading Master (write journal), Standby Masters (read journal), Shared Storage, “Hello, leader”

38.A New HA Mode without External Services Consensus achieved internally Leading master commits state changes Benefits: Local disk for journal Challenges: Performance tuning Diagram: Leading Master and two Standby Masters exchanging State Changes via Raft
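The idea on slide 38, that the leading master commits a state change once the masters agree on it internally, rests on a majority rule. The sketch below shows only that rule and is a heavy simplification: it is not Raft (there are no terms, leader election, or log repair), and `Master` and `commit` are hypothetical names.

```python
class Master:
    """Hypothetical master with a local journal (a list stands in for local disk)."""

    def __init__(self, name):
        self.name = name
        self.journal = []
        self.reachable = True

    def append(self, entry):
        # An unreachable master cannot acknowledge the entry.
        if not self.reachable:
            return False
        self.journal.append(entry)
        return True


def commit(leader, standbys, entry):
    """Return True if a majority of all masters stored `entry`.

    Simplification: entries that fail to reach a majority are not rolled back.
    """
    cluster_size = 1 + len(standbys)
    acks = 1 if leader.append(entry) else 0
    acks += sum(1 for m in standbys if m.append(entry))
    return acks > cluster_size // 2
```

With three masters, a state change still commits when one standby is down (2 of 3 acknowledge) but not when both standbys are down, which is why no shared journal storage is needed.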

39.RPC System in Alluxio 1.x Master RPC using Thrift: Filesystem metadata operations Worker RPC using Netty: Data operations Problems: Hard to maintain and extend two systems; Thrift is not maintained, no streaming RPC support Diagram: Application (Alluxio Client), Alluxio Master, Alluxio Worker; links labeled Thrift RPC, Thrift RPC, Netty RPC

40.Switch to gRPC in Alluxio 2.0 Unify all RPC interfaces using gRPC Benefits: Streaming I/O; Protobuf everywhere; Well maintained & documented Challenges: Performance tuning Diagram: Application (Alluxio Client), Alluxio Master, Alluxio Worker; all links labeled gRPC

41.Lessons Learned: Many lessons after 5 years of development

42.Lessons Learned - Design Identify public APIs and ensure they are extensible in a backwards compatible fashion; Modularization to improve productivity; Implementation hiding helps productivity and testing, and discourages abstraction violations; Account for exceptional control flow during resource management (e.g., client, file, network channel) to prevent resource leaks; Design for failure, especially when it comes to concurrency
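The point about exceptional control flow during resource management can be illustrated with a pool whose resources are returned in a `finally` block, so an exception in the caller cannot leak them. This is a generic sketch; `ChannelPool` is a hypothetical class and integers stand in for real network channels.

```python
from contextlib import contextmanager


class ChannelPool:
    """Hypothetical fixed-size pool; integers stand in for network channels."""

    def __init__(self, size):
        self.free = list(range(size))

    @contextmanager
    def channel(self):
        if not self.free:
            raise RuntimeError("pool exhausted")
        ch = self.free.pop()
        try:
            yield ch
        finally:
            # Runs on normal exit and when the caller's block raises,
            # so the channel always goes back to the pool.
            self.free.append(ch)
```

Used as `with pool.channel() as ch: ...`, the channel is released even if the body raises, which is the property a manual acquire/release pair tends to lose on error paths.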

43.Lessons Learned - Implementation Do not hold exclusive locks while performing RPCs or I/O; Implementing highly concurrent (and correct) rename / delete / create is hard, but it is fun and makes a difference for important workloads (MapReduce / Spark job commit); Use protobuf early in the project for compatibility to save time down the road; When developing distributed systems, sooner or later you will need to implement a replicated state machine; Invest in logging early; it will pay dividends when debugging
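The first lesson on slide 43, not holding exclusive locks during RPCs or I/O, can be sketched as follows: take the lock only for the in-memory lookup and update, and release it around the slow call. `InodeTable` and `fetch_status` are hypothetical names; the slow call may be issued more than once under contention, which this sketch accepts as the price of not blocking other threads.

```python
import threading


class InodeTable:
    """Hypothetical metadata table that never holds its lock across a slow call."""

    def __init__(self, fetch_status):
        self._lock = threading.Lock()
        self._entries = {}
        self._fetch_status = fetch_status  # slow call: an RPC or under-store I/O

    def get_status(self, path):
        with self._lock:
            if path in self._entries:
                return self._entries[path]
        # The lock is released here, so other threads can use the table
        # while the slow call is in flight.
        status = self._fetch_status(path)
        with self._lock:
            # Another thread may have stored a value meanwhile; keep the first.
            return self._entries.setdefault(path, status)
```

The trade-off is that two threads missing on the same path may both perform the slow call; the `setdefault` ensures they still agree on a single stored result.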

44.Lessons Learned - Operation Users / applications will misbehave (e.g. fail or forget to close connections); System resources are finite: ports, file descriptors, network bandwidth, …; Distributed storage latency can vary and be very large; Distributed storage might implement rename as copy; HDFS is considered reliable; when writing critical information (e.g., journals) be careful (and good luck); S3 can go down

45.Conclusion Alluxio provides a unified data access layer for bigdata and ML applications Alluxio 2.0 is coming Try it out: www.alluxio.org

46.Thank you binfan@alluxio.com

47.Storage Interface for ML Applications

48.FUSE-Based POSIX API Deep Learning Frameworks Unified Data Storage Systems POSIX Filesystem API

49.Alluxio POSIX API Make all distributed data available locally SUPPORTS: HDFS, NFS, OpenStack, Ceph, Amazon S3, Azure, Google Cloud IT OPS FRIENDLY: Storage mounted into Alluxio by central IT; Security in Alluxio mirrors source data; Authentication through LDAP/AD; Wireline encryption Diagram: HDFS #1, Obj Store, NFS, HDFS #2

50.Accelerate Deep Learning Input Pipeline Deep Learning training involves three stages that use different resources: Data reads (I/O): e.g. choose and read image files from source. Data preprocessing (CPU): e.g. decode image records into images, preprocess, and organize into mini-batches. Model training (GPU): calculate and update the parameters in the multiple convolutional layers.
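Because the three stages on slide 50 use different resources (I/O, CPU, GPU), they can overlap instead of running one after another. The sketch below pipelines them with threads and bounded queues. It is an illustration under stated assumptions: `run_pipeline` and its parameters are hypothetical, the stage functions are supplied by the caller, and there is no error handling (a stage that raises would stall the pipeline).

```python
import queue
import threading


def run_pipeline(paths, read, preprocess, train, depth=2):
    """Overlap data reads, preprocessing, and training using bounded queues."""
    raw = queue.Queue(maxsize=depth)      # read -> preprocess
    batches = queue.Queue(maxsize=depth)  # preprocess -> train
    done = object()                       # sentinel marking end of stream

    def reader():
        # Stage 1 (I/O): read each input and hand it to the next stage.
        for p in paths:
            raw.put(read(p))
        raw.put(done)

    def preprocessor():
        # Stage 2 (CPU): transform raw items into batches.
        while (item := raw.get()) is not done:
            batches.put(preprocess(item))
        batches.put(done)

    threads = [threading.Thread(target=reader),
               threading.Thread(target=preprocessor)]
    for t in threads:
        t.start()
    results = []
    # Stage 3 (GPU in a real setup): consume batches in the calling thread.
    while (batch := batches.get()) is not done:
        results.append(train(batch))
    for t in threads:
        t.join()
    return results
```

The bounded queues apply back-pressure: if training is the slowest stage, reads stop getting ahead by more than `depth` items; if reads are the slowest stage, training waits on I/O, which is the bottleneck a cache close to the compute is meant to reduce.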

51.Overcomes I/O bottleneck on Cloud More details at https://www.alluxio.com/blog/flexible-and-fast-storage-for-deep-learning-with-alluxio

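Alluxio is the world's first middleware that unifies separate, heterogeneous storage systems into a single platform and provides near-memory access speed; it is widely used to accelerate business data analytics in enterprise and hybrid cloud environments.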
