The Procedure v2 Implementation of WAL Splitting and ACL

This talk was given by Mei Yi, an HBase Committer from Xiaomi and the only female Committer in China.
The talk has three parts:

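Part 1 introduces the core principles of ProcedureV2. Her slides walk through each ProcedureV2 component and demonstrate the execution and rollback flows. Of all the ProcedureV2 material I have seen, this is probably the clearest and easiest to follow, and I highly recommend the slides to anyone interested in ProcedureV2.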

Part 2 describes how ProcedureV2 was used to refactor the community's HBase Grant/Revoke ACL flow. The refactoring had several goals:

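  • The original design uses the Zookeeper notification mechanism to update the ACL on each RegionServer. The whole process depends on Zookeeper and is effectively asynchronous. If the ACL cache update fails on some RegionServers (possible, though unlikely), ACL permissions can become inconsistent across nodes. After the rewrite on ProcedureV2 the whole flow is synchronous, so this problem no longer arises, and the feature no longer depends on the Zookeeper service.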

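  • The other motivation was to expose Coprocessor hooks when Grant and Revoke are executed. A classic scenario: some users want to run offline jobs by scanning a Snapshot, but scanning a Snapshot requires HDFS permissions, and HBase permission management and HDFS permission management are two entirely separate mechanisms. A Coprocessor can then synchronize HBase permissions with HDFS permissions during Grant and Revoke, so that users with table permissions can access the table's Snapshots immediately. This feature will be supported in HBASE-18659.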
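
To make the first goal concrete, here is a minimal Python sketch of what a synchronous grant might look like. It is a toy model, not HBase code: `ToyRegionServer`, `refresh`, `grant_synchronously`, and the retry count are all hypothetical, and the retry-until-acknowledged behaviour is an assumption about what "synchronous" buys, not a description of the actual implementation.

```python
class ToyRegionServer:
    """Hypothetical stand-in for a RegionServer holding an ACL cache."""

    def __init__(self, name, failures=0):
        self.name = name
        self.cache = set()
        self.failures = failures  # how many refresh calls fail before one succeeds

    def refresh(self, acl_entries):
        """Replace the cache with the given entries; the return value is the ack."""
        if self.failures > 0:
            self.failures -= 1
            return False
        self.cache = set(acl_entries)
        return True


def grant_synchronously(acl_table, servers, entry, max_attempts=3):
    """Record the grant, then retry until every server acks or attempts run out."""
    acl_table.add(entry)
    pending = list(servers)
    for _ in range(max_attempts):
        # Keep only the servers whose refresh did not succeed this round.
        pending = [rs for rs in pending if not rs.refresh(acl_table)]
        if not pending:
            return True   # every cache now matches the acl table
    return False          # the caller learns that some cache is still stale


acl_table = set()
servers = [ToyRegionServer("rs1"), ToyRegionServer("rs2", failures=1)]
ok = grant_synchronously(acl_table, servers, ("alice", "t1", "READ"))
```

The point of the sketch is the return value: the caller either knows all caches agree or knows the grant did not fully propagate, which a fire-and-forget notification cannot tell it.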

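Part 3 describes how the WAL Splitting flow was rewritten on ProcedureV2. The considerations are similar to those for ACL: the asynchronous flow is rewritten as a more controllable synchronous flow, and the dependency on Zookeeper is removed. For more details, see the talk's slides and video.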

2. The Procedure v2 Implementation of WAL Splitting and ACL
meiyi@xiaomi.com, HBase Committer

3. Abstract
❏ Introduction of Procedure v2
  ❏ Overview
  ❏ Execution and Rollback
  ❏ Models
❏ ACL
  ❏ ACL based on ZK Notification
  ❏ ACL based on Procedure v2
❏ WAL Splitting
  ❏ WAL Splitting based on ZK Coordination
  ❏ WAL Splitting based on Procedure v2

5. Goal of Procedure v2
Aims to provide a unified way to build:
  • multi-steps procedure in case of failure (e.g. Create table)
  • notifications across multiple machines (e.g. ACLs/Quota cache updates)
  • coordination of long-running/heavy procedures (e.g. splits)
  • procedures across multiple machines (e.g. Assignment)

6. Build and run state machines
[Diagram: a generic state machine (step1, step2, step3 with subprocedures step3-1, step3-2, step3-3, step4) shown beside the steps of create table: do some checks; create table fs layout; add regions to meta; assign region (three subprocedures); update descriptors cache.]

7. Overview
[Diagram: ProcedureExecutor with several Workers; ProcedureScheduler with meta, server, peer, and table queues; ProcedureStore backed by HDFS. Arrow labels: Push, Poll, Insert, Update, Delete, Load.]

8. Overview
[Same diagram with the arrows numbered: 1. Submit, 2. Insert, 3. Push, 4. Poll, 5. Update/Delete.]
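
The numbered arrows on slide 8 can be read as the flow sketched below. This is a toy, single-threaded Python model rather than HBase code: `ToyStore`, `ToyScheduler`, and `ToyExecutor` are hypothetical names, a dict stands in for the HDFS-backed store, and a single FIFO queue stands in for the meta/server/peer/table queues.

```python
from collections import deque


class ToyStore:
    """Stand-in for ProcedureStore; records each persistence call."""

    def __init__(self):
        self.rows = {}
        self.log = []

    def insert(self, pid, step):
        self.rows[pid] = step
        self.log.append(("insert", pid))

    def update(self, pid, step):
        self.rows[pid] = step
        self.log.append(("update", pid))

    def delete(self, pid):
        del self.rows[pid]
        self.log.append(("delete", pid))


class ToyScheduler:
    """Stand-in for ProcedureScheduler with one FIFO queue."""

    def __init__(self):
        self.queue = deque()

    def push(self, item):
        self.queue.append(item)

    def poll(self):
        return self.queue.popleft() if self.queue else None


class ToyExecutor:
    def __init__(self):
        self.store = ToyStore()
        self.scheduler = ToyScheduler()
        self.next_id = 0

    def submit(self, steps):
        pid = self.next_id                    # 1. Submit
        self.next_id += 1
        self.store.insert(pid, 0)             # 2. Insert
        self.scheduler.push((pid, steps, 0))  # 3. Push
        return pid

    def run_worker(self):
        finished = []
        while (item := self.scheduler.poll()) is not None:  # 4. Poll
            pid, steps, i = item
            steps[i]()
            if i + 1 < len(steps):
                self.store.update(pid, i + 1)               # 5. Update
                self.scheduler.push((pid, steps, i + 1))
            else:
                self.store.delete(pid)                      # 5. Delete
                finished.append(pid)
        return finished


trace = []
executor = ToyExecutor()
pid = executor.submit([lambda: trace.append("a"), lambda: trace.append("b")])
finished = executor.run_worker()
```

Persisting the step index before re-queueing is what would let a restarted executor reload unfinished procedures from the store; the toy omits that reload path.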

9. Procedure execution
[Diagram: procedure 0 and subprocedures 1, 2, 3, with states labeled A through G; legend: letters are states, numbers are procedures.]

10-20. Procedure execution
[Animation frames for procedure 0 (states A, B, C, D) and subprocedures 1, 2, 3. Queue is listed front to back, stack bottom to top.]
  • 10: queue 0; stack empty
  • 11: queue empty; stack empty
  • 12: queue empty; stack 0
  • 13: queue empty; stack 0 0
  • 14: queue 1 2 3; stack 0 0 0
  • 15: queue 2 3; stack 0 0 0
  • 16: queue 3; stack 0 0 0 1
  • 17: queue empty; stack 0 0 0 1 2
  • 18: queue 0; stack 0 0 0 1 2 3
  • 19: queue empty; stack 0 0 0 1 2 3
  • 20: queue empty; stack 0 0 0 1 2 3 0
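
The queue and stack contents on slides 10 to 20 are consistent with the toy model below. It is a sketch under assumptions read off the frames, not HBase code: the root procedure 0 keeps running until its third step spawns subprocedures 1, 2, 3; each subprocedure has one step; and the parent is re-queued when its last child finishes. The function name and parameters are hypothetical, and snapshots are taken only after each executed step, so the intermediate "just polled" frames of the animation are not reproduced.

```python
from collections import deque


def simulate(root_steps=4, spawn_step=2, children=(1, 2, 3)):
    queue = deque([0])  # scheduler queue; index 0 is the front
    stack = []          # rollback stack; the last element is the top
    next_step = 0       # next step of root procedure 0
    waiting = 0         # children that have not finished yet
    frames = []
    while queue:
        pid = queue.popleft()
        if pid == 0:
            # Keep running the root until it spawns children or finishes.
            while next_step < root_steps:
                stack.append(0)
                spawned = next_step == spawn_step
                next_step += 1
                if spawned:
                    queue.extend(children)
                    waiting = len(children)
                frames.append((list(queue), list(stack)))
                if spawned:
                    break
        else:
            stack.append(pid)
            waiting -= 1
            if waiting == 0:
                queue.append(0)  # last child done: wake the parent
            frames.append((list(queue), list(stack)))
    return frames


frames = simulate()
```

Each executed step leaves one entry on the stack, which is why the final stack records the full execution order and can later be unwound for rollback.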

21-24. Procedure Rollback
[Animation frames: procedure 0 with states A and B, and procedure 1 marked as the failed procedure. The queue is empty throughout; stack is listed bottom to top.]
  • 21: stack 0 0 1
  • 22: stack 0 0
  • 23: stack 0
  • 24: stack empty
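
The rollback frames show the stack being popped from the top until it is empty. A minimal Python sketch of that idea follows; it is not HBase code, and the `(name, do, undo)` step tuples and the function name are hypothetical. The failed step is pushed before it runs, matching slide 21 where the failed procedure sits on top of the stack.

```python
def execute_with_rollback(steps):
    """Run (name, do, undo) steps in order; on failure, undo in reverse order."""
    stack, log = [], []
    for name, do, undo in steps:
        stack.append((name, undo))  # pushed before running, so a failed step is undone too
        try:
            do()
            log.append(f"do {name}")
        except Exception:
            log.append(f"fail {name}")
            while stack:
                undo_name, undo_fn = stack.pop()  # pop from the top of the stack
                undo_fn()
                log.append(f"undo {undo_name}")
            return False, log
    return True, log


def failing_step():
    raise RuntimeError("subprocedure 1 failed")


def no_op():
    pass


ok, log = execute_with_rollback([
    ("0:A", no_op, no_op),
    ("0:B", no_op, no_op),
    ("1", failing_step, no_op),
])
```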

25. StateMachineProcedure
● enum of states, describing the various steps of the procedure
● transition from one state to another after calling executeFromState method
[Diagram: steps of create table and their states]
  • do some checks: CREATE_TABLE_PRE_OPERATION
  • create table fs layout: CREATE_TABLE_WRITE_FS_LAYOUT
  • add regions to meta: CREATE_TABLE_ADD_TO_META
  • assign region (three subprocedures): CREATE_TABLE_ASSIGN_REGIONS
  • update descriptors cache: CREATE_TABLE_UPDATE_DESC_CACHE
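
The enum-plus-transition idea on this slide can be sketched in a few lines of Python. The state names are taken from the slide; everything else (`ToyCreateTableProcedure`, `execute_from_state` returning the next state directly, assigning regions inline instead of as subprocedures) is a simplification made for illustration, not the HBase API.

```python
from enum import Enum


class CreateTableState(Enum):
    CREATE_TABLE_PRE_OPERATION = 1
    CREATE_TABLE_WRITE_FS_LAYOUT = 2
    CREATE_TABLE_ADD_TO_META = 3
    CREATE_TABLE_ASSIGN_REGIONS = 4
    CREATE_TABLE_UPDATE_DESC_CACHE = 5


class ToyCreateTableProcedure:
    def __init__(self, regions):
        self.regions = list(regions)
        self.log = []

    def execute_from_state(self, state):
        """Run the work for one state and return the next state (None when done)."""
        s = CreateTableState
        if state is s.CREATE_TABLE_PRE_OPERATION:
            self.log.append("do some checks")
            return s.CREATE_TABLE_WRITE_FS_LAYOUT
        if state is s.CREATE_TABLE_WRITE_FS_LAYOUT:
            self.log.append("create table fs layout")
            return s.CREATE_TABLE_ADD_TO_META
        if state is s.CREATE_TABLE_ADD_TO_META:
            self.log.append("add regions to meta")
            return s.CREATE_TABLE_ASSIGN_REGIONS
        if state is s.CREATE_TABLE_ASSIGN_REGIONS:
            # The slide shows one subprocedure per region; the toy runs them inline.
            for region in self.regions:
                self.log.append(f"assign region {region}")
            return s.CREATE_TABLE_UPDATE_DESC_CACHE
        if state is s.CREATE_TABLE_UPDATE_DESC_CACHE:
            self.log.append("update descriptors cache")
            return None
        raise ValueError(f"unhandled state: {state}")

    def run(self):
        state = CreateTableState.CREATE_TABLE_PRE_OPERATION
        while state is not None:
            state = self.execute_from_state(state)
        return self.log


log = ToyCreateTableProcedure(regions=["r1", "r2", "r3"]).run()
```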

26. RemoteProcedureDispatcher
  • Dispatch aggregated RPCs to remote servers
  • Remote servers report procedure execution states in a heartbeat
[Diagram: the master and three RegionServers, with arrows labeled remoteDispatch and reportProcedureDone.]
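
"Aggregated RPCs" can be pictured as buffering operations per target server and sending each buffer as one call. The Python sketch below illustrates only that batching idea; `ToyDispatcher`, its `send` callback, and its method names are hypothetical and do not mirror the real RemoteProcedureDispatcher API.

```python
from collections import defaultdict


class ToyDispatcher:
    def __init__(self, send):
        self.send = send                   # hypothetical transport: send(server, [proc_ids])
        self.buffers = defaultdict(list)   # operations waiting to be sent, per server
        self.in_flight = set()             # sent but not yet reported done

    def add_operation(self, server, proc_id):
        self.buffers[server].append(proc_id)

    def flush(self):
        """Send one batched call per server; return the number of calls made."""
        calls = 0
        for server, proc_ids in self.buffers.items():
            if proc_ids:
                self.send(server, list(proc_ids))
                self.in_flight.update(proc_ids)
                calls += 1
        self.buffers.clear()
        return calls

    def report_procedure_done(self, proc_id):
        """Called when a remote server reports that a procedure finished."""
        self.in_flight.discard(proc_id)


sent = []
dispatcher = ToyDispatcher(send=lambda server, proc_ids: sent.append((server, proc_ids)))
for proc_id in (1, 2, 3):
    dispatcher.add_operation("rs1", proc_id)
dispatcher.add_operation("rs2", 4)
call_count = dispatcher.flush()
dispatcher.report_procedure_done(1)
```

Four operations for two servers cost two calls here instead of four, which is the saving the slide's first bullet refers to.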

28. ACL Overview
  • Every server keeps a permission cache.
  • AccessController hooks check whether an operation is allowed.
[Diagram: Zookeeper; a Master and RegionServers, each with a PermissionCache and an AccessController; a Region with its own AccessController.]
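
The two bullets amount to "look up the local cache inside a hook that runs before the operation". A minimal Python sketch of that pattern follows. It is not the HBase AccessController: the class names, the `pre_get` and `pre_put` hook names, and the `(user, table, action)` key are hypothetical.

```python
class AccessDeniedError(Exception):
    pass


class PermissionCache:
    """Per-server, in-memory set of (user, table, action) grants."""

    def __init__(self):
        self._grants = set()

    def grant(self, user, table, action):
        self._grants.add((user, table, action))

    def revoke(self, user, table, action):
        self._grants.discard((user, table, action))

    def is_allowed(self, user, table, action):
        return (user, table, action) in self._grants


class ToyAccessController:
    """Hooks that run before an operation and reject it if the cache has no grant."""

    def __init__(self, cache):
        self.cache = cache

    def _require(self, user, table, action):
        if not self.cache.is_allowed(user, table, action):
            raise AccessDeniedError(f"{user} lacks {action} on {table}")

    def pre_get(self, user, table):
        self._require(user, table, "READ")

    def pre_put(self, user, table):
        self._require(user, table, "WRITE")


cache = PermissionCache()
cache.grant("alice", "t1", "READ")
controller = ToyAccessController(cache)
controller.pre_get("alice", "t1")  # passes silently
```

Because every check reads only the local cache, a server whose cache missed an update will keep making decisions from stale data.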

29. The Process of Grant/Revoke
[Diagram: Client; Master; the RegionServer hosting the hbase:acl region; other RegionServers, each with a PermissionCache; Zookeeper with znodes labeled table, @namespace, and hbase:acl.]
  1. grant/revoke (from the Client)
  2. put or delete in acl table
  3. update ZK
  4. notification (to each PermissionCache)
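
Steps 2 to 4 follow a watcher pattern, sketched below in Python. This is a toy model, not the Zookeeper or HBase API: `ToyZookeeper`, `ToyServer`, `grant`, and the `broken` flag are hypothetical, and the swallowed exception is an assumption used to show how a failed cache refresh can go unnoticed by the writer.

```python
class ToyZookeeper:
    """Stores one value per znode and notifies every registered watcher."""

    def __init__(self):
        self.znodes = {}
        self.watchers = []

    def watch(self, callback):
        self.watchers.append(callback)

    def set_data(self, path, data):
        self.znodes[path] = data
        for callback in self.watchers:
            try:
                callback(path, data)  # 4. notification
            except Exception:
                pass                  # a failed refresh is not reported to the writer


class ToyServer:
    def __init__(self, zk, broken=False):
        self.cache = {}
        self.broken = broken
        zk.watch(self.on_change)

    def on_change(self, path, data):
        if self.broken:
            raise RuntimeError("cache refresh failed")
        self.cache[path] = data


def grant(acl_table, zk, table, user, action):
    acl_table.setdefault(table, set()).add((user, action))  # 2. put in acl table
    zk.set_data(table, frozenset(acl_table[table]))         # 3. update ZK


zk = ToyZookeeper()
healthy = ToyServer(zk)
stale = ToyServer(zk, broken=True)
acl_table = {}
grant(acl_table, zk, "t1", "alice", "READ")  # 1. grant from the client
```

After the call returns, `healthy` has the new entry and `stale` does not, yet `grant` has no way to tell the two outcomes apart.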