Alluxio 2.5 + Bring data locality to ML and AI workloads
Alluxio

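Jiacheng Liu received his master's degree in computer science from Columbia University in 2017. He joined the Alluxio development team in 2019, where he works on Alluxio's cloud-related features and some of its core components.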


1.What’s new in Alluxio 2.5
Alluxio Day II, 2021/03/11 - Jiacheng Liu

2.About me
● Jiacheng Liu
● Master of CS @ Columbia University
● Software Engineer @ Alluxio
● Core maintainer

3.Features

4.Feature List
Community
● JNI Based POSIX API
● S3 Northbound API v2
● ADLS Gen 2 UFS Connector
● Native GCS Connector*
● Remote Logging in K8s Environments
Enterprise
● AWS STS Support
● Hybrid Quickstart with Alluxio Hub
● Compatibility with Ranger 2.0

5.JNI Based POSIX API
Description: Alluxio 2.5 introduces a new JNI-based FUSE integration to support POSIX data access. It improves performance by 3x to 5x for high-performance, high-concurrency workloads such as AI/ML training.
Specification:
● Envs - Community Testing - K8s.
● Scale - Community Testing - up to 200 nodes.
● Multi-Tenancy - Single user support.
● Workflows - Community Testing - AI/ML workloads. Conceptually - Traditional POSIX workloads.
Impact: Fundamental module for supporting POSIX workloads going forward. Success with the community will lead to a reference architecture and product definition that expand the use cases. Expect this in the 2.6 development cycle.
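
To illustrate the kind of POSIX access this slide describes, here is a minimal sketch that reads a directory of files concurrently using ordinary file calls, roughly as a training data loader might. The mount point `/mnt/alluxio-fuse` and the helper names are assumptions for illustration and do not come from the talk; the code works against any directory.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical FUSE mount point; the talk does not name one.
MOUNT_POINT = "/mnt/alluxio-fuse"


def read_file(path):
    """Read one file with plain POSIX calls and return its size in bytes."""
    with open(path, "rb") as f:
        return len(f.read())


def read_dataset(root, workers=8):
    """Read every regular file under root concurrently.

    Returns a dict mapping each file path to the number of bytes read.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    # Threads are enough here because the work is I/O bound.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = list(pool.map(read_file, paths))
    return dict(zip(paths, sizes))


if __name__ == "__main__":
    if os.path.isdir(MOUNT_POINT):
        sizes = read_dataset(MOUNT_POINT)
        print(f"read {len(sizes)} files, {sum(sizes.values())} bytes")
```

Because the application only sees a file system path, no Alluxio-specific client code is needed on the reading side.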

6.S3 Northbound API v2
Description: Support for S3Browser to browse the Alluxio namespace through the S3 Northbound API.
Specification:
● Scale - Administrative QPS (< 100 / s).
● Compatibility - Supports a select subset of the S3 interface to enable basic browsing, modification, and downloading/uploading capabilities.
● Multi-Tenancy/Authentication - Multi-user, Simple (server trusts client).
Impact: S3Browser is an administrative tool, especially for Windows platforms. Availability of a REST management interface greatly improves Alluxio compatibility.
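
As a rough sketch of what a client of this REST interface addresses, the helper below builds bucket and object URLs. The host, the port `39999`, and the `/api/v1/s3` prefix are assumptions based on the Alluxio proxy's usual defaults and are not stated in the talk; `s3_url` is a hypothetical helper name. Check all of them against the actual deployment.

```python
from urllib.parse import quote

# Assumed defaults, not taken from the talk; adjust for the deployment.
PROXY_HOST = "localhost"
PROXY_PORT = 39999
S3_PREFIX = "/api/v1/s3"


def s3_url(bucket, key=None, host=PROXY_HOST, port=PROXY_PORT):
    """Build the REST URL for a bucket (key is None) or for an object."""
    url = f"http://{host}:{port}{S3_PREFIX}/{quote(bucket, safe='')}"
    if key is not None:
        # Keep "/" unescaped so nested keys stay readable as paths.
        url += "/" + quote(key, safe="/")
    return url


if __name__ == "__main__":
    print(s3_url("datasets"))
    print(s3_url("datasets", "train/part 0.csv"))
```

A tool such as S3Browser issues requests against URLs of this shape, which is why only a subset of the S3 interface is needed for browsing.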

7.ADLS Gen 2/GCS Connector
Description: Alluxio has a new connector to ADLS Gen 2, and an updated connector for GCS.
Specification:
● Environments - ADLS Gen 2, GCS.
Impact: Users can run Alluxio with ADLS Gen 2. GCS is also better supported because the native SDK has more optimizations and the latest features. One important feature for GCS is support for JSON-based authentication.

8.Remote Logging in K8s Environments
Description: Alluxio 2.5 supports a remote log server in K8s environments. One challenge users face in a containerized environment is that logs are discarded or overwritten when a container is killed or restarted. With the remote logger, the logs are sent to a centralized location (a dedicated pod).
Specification:
● Environments - Cloud/On-Prem K8s.
Impact: Enables us to effectively gather logs in K8s environments.

9.AWS STS Support
Description: Alluxio supports connecting to S3 using the AWS Security Token Service (STS) instead of traditional authentication methods such as specifying an access key and secret key. STS is AWS’s recommended authentication paradigm and offers benefits such as temporary credentials, cross-account bucket sharing, and fine-grained privilege control.
Specification:
● Environments - AWS S3 as under storage.
Impact: This is a common requirement for user environments that are AWS cloud native and already use STS.
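
The practical consequence of temporary credentials is that a client has to renew them before they expire. The sketch below shows that pattern in isolation. It is not Alluxio's implementation: `TemporaryCredentials`, `RefreshingCredentials`, and the `fetch` callable (which would wrap a real STS request) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class TemporaryCredentials:
    """Short-lived credentials of the kind a token service hands out."""
    access_key: str
    secret_key: str
    session_token: str
    expires_at: datetime


class RefreshingCredentials:
    """Caches temporary credentials and renews them shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[], TemporaryCredentials],  # hypothetical STS wrapper
        margin: timedelta = timedelta(minutes=5),
        clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ):
        self._fetch = fetch
        self._margin = margin
        self._clock = clock
        self._current: Optional[TemporaryCredentials] = None

    def get(self) -> TemporaryCredentials:
        """Return valid credentials, fetching new ones when needed."""
        current = self._current
        if current is None or self._clock() >= current.expires_at - self._margin:
            self._current = self._fetch()
        return self._current
```

The injectable `clock` exists only to make the expiry logic easy to exercise without waiting; a real client would leave it at its default.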

10.Hybrid Quickstart with Alluxio Hub - 2.4
● Alluxio Hub was added in Alluxio 2.4
● Management console for Alluxio clusters
● Wizards for connecting to external storage and validating integrations
● System monitoring
● Configuration management

11.Hybrid Quickstart with Alluxio Hub - 2.5
Description: Hub is now supported on Kubernetes to aid cluster configuration and connectivity across private data centers or public clouds. AWS users now also have access to a quickstart that uses Terraform to deploy an Alluxio cluster with Amazon EMR in minutes. Once an Alluxio cluster is deployed, either with the new Terraform quickstart or with Helm on Kubernetes, the Hub is available to manage subsequent changes.
Specification:
● Environments - Cloud PaaS (e.g. AWS EC2) with K8s, On-prem K8s, AWS EMR.

12.Compatibility with Ranger 2.0
Description: Alluxio 2.5 supports integration with Ranger 2.0 for third-party authorization. Ranger 2.0 includes finer-grained security policies and is the default for CDP 7.
Specification:
● Environments - Ranger 2.0.
Impact: Customers on CDP 7 will be able to integrate with Ranger 2.0 and provide access to data that is guarded by Ranger.

13.Improvements

14.Metrics
We added a few key metrics to show Alluxio’s value and to troubleshoot system limitations. Expect a more comprehensive metrics/instrumentation story in 2.6.
● Better Web UI metric graphs
● Alluxio data cache hit rate
● Alluxio metadata cost savings
● Lock utilization (troubleshoot out-of-locks issues)
● Inode cache effectiveness (troubleshoot RocksDB)
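
As a worked example of what a data cache hit rate expresses, the sketch below computes the share of bytes served from Alluxio storage rather than from the under file system (UFS). The talk does not give the exact formula Alluxio uses, so this definition and the function name `cache_hit_rate` are assumptions for illustration.

```python
def cache_hit_rate(bytes_from_alluxio, bytes_from_ufs):
    """Fraction of read bytes served from the cache (assumed definition).

    Returns 0.0 when nothing has been read yet, to avoid dividing by zero.
    """
    total = bytes_from_alluxio + bytes_from_ufs
    if total == 0:
        return 0.0
    return bytes_from_alluxio / total


if __name__ == "__main__":
    # 750 GB served from Alluxio and 250 GB fetched from the UFS.
    print(cache_hit_rate(750, 250))  # 0.75
```

Under this definition, a hit rate of 0.75 means that three quarters of the bytes read never had to be fetched from the under storage.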

15.Logging / Recovery
We’ve made several improvements to the cycle of reporting errors, attempting to recover gracefully, and cold restarts.
● Make error messages more user friendly (10+ changes, constant effort)
● Allow journal replay without a ZK connection (reduces the chance of master restarts failing due to GC)
● Improve tiered storage out-of-space handling

16.Welcome to join the Alluxio community!
Alluxio DingTalk Group
