Recent work on HBase at Pinterest

Lianghong Xu, a software engineer and tech lead at Pinterest, shared recent work on HBase at Pinterest.
Pinterest currently runs around 50 HBase clusters, all deployed on AWS, with data volume at the petabyte scale. The company started on HBase 0.94 in 2013 and upgraded to 1.2 in 2016.

Pinterest supports transactions on HBase through Apache Omid. After hitting performance bottlenecks with Omid, the team built its own system, Sparrow. The main improvements:

  • Commit operations move to the client side, eliminating the Transaction Manager as a single point of contention.

  • The Transaction Manager becomes multi-threaded, so begin operations no longer wait for commits to complete. Compared with Omid, Sparrow cuts P99 latency by roughly 100x for begin and 3x for commit. A sketch of the resulting commit path follows.
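
To make the division of labor concrete, here is a minimal sketch of Sparrow's client-side commit as described above, assuming hypothetical SparrowTM and CommitTable interfaces; Pinterest has not published this API.

    import java.util.Set;

    // Hypothetical interfaces standing in for Sparrow's unpublished internals.
    interface SparrowTM {
        long begin();                                             // allocate a start timestamp
        long checkConflicts(Set<byte[]> writeSet, long startTs);  // negative on conflict
    }
    interface CommitTable {
        void put(long startTs, long commitTs);                    // persist commit/abort record
    }

    class SparrowCommitSketch {
        static final long ABORT_MARKER = -1;

        // The TM does parallel, in-memory conflict detection and returns a commit
        // timestamp; the client, not the TM, persists the record, which removes
        // the centralized batch commit that bottlenecked Omid's TM.
        static void commitTx(SparrowTM tm, CommitTable commitTable,
                             Set<byte[]> writeSet, long startTs) {
            long commitTs = tm.checkConflicts(writeSet, startTs);
            commitTable.put(startTs, commitTs > 0 ? commitTs : ABORT_MARKER);
        }
    }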

Pinterest also built Argus, which pairs with Kafka to provide a WAL-based notification mechanism. Roughly: writes that need notifications are annotated by the client; the annotations are carried down into the WAL; the WAL is streamed through Kafka to Argus observers, which apply user-defined processing. A client-side sketch follows.
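
The annotation step might look like the sketch below, which uses HBase's standard Put.setAttribute to attach metadata to a single write. The attribute key is made up here; per the talk, it is Pinterest's internal HBase changes that carry such request metadata down into the WAL.

    import java.io.IOException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ArgusAnnotatedWrite {
        public static void main(String[] args) throws IOException {
            try (Connection conn = ConnectionFactory.createConnection();
                 Table table = conn.getTable(TableName.valueOf("pins"))) {
                Put put = new Put(Bytes.toBytes("row-1"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("c"), Bytes.toBytes("v"));
                // Hypothetical annotation key: marks this write as needing a
                // notification; Pinterest's servers propagate it into the WAL.
                put.setAttribute("argus.notify", Bytes.toBytes("HIGH"));
                table.put(put);
            }
        }
    }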

Pinterest built Ixia, based on the open-source Lily HBase Indexer, to build secondary indexes on HBase in near real time, and integrated it with Muse to provide SQL-like queries. Roughly: data written to HBase flows to a Replication Proxy and is fed through Kafka into the indexer; the index manager reads the affected columns from HBase and, if they need indexing, writes the data into Muse, which handles retrieval according to the data's schema. Queries run against Muse, which consults HBase when needed. A sketch of the indexer loop follows.
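
The indexer's inner loop could be sketched as follows. The Kafka and HBase calls are standard, but the WAL event format, IndexManager, and MuseClient types are hypothetical stand-ins for unpublished Ixia internals.

    import java.io.IOException;
    import java.time.Duration;
    import java.util.List;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    class IxiaIndexerSketch {
        // Hypothetical stand-ins for Ixia's unpublished internals.
        interface WalEvent { byte[] rowKey(); String table(); String column(); }
        interface WalDecoder { WalEvent parse(byte[] raw); }
        interface IndexManager { boolean shouldIndex(String column); }
        interface MuseClient { void index(String table, Result row); }

        static void run(KafkaConsumer<String, byte[]> consumer, Table hbase,
                        WalDecoder decoder, IndexManager mgr, MuseClient muse)
                throws IOException {
            consumer.subscribe(List.of("hbase-wal-events"));   // topic name assumed
            while (true) {
                for (ConsumerRecord<String, byte[]> rec : consumer.poll(Duration.ofMillis(500))) {
                    WalEvent ev = decoder.parse(rec.value());
                    if (!mgr.shouldIndex(ev.column())) continue;  // index only schema'd columns
                    Result row = hbase.get(new Get(ev.rowKey())); // read back the current row
                    muse.index(ev.table(), row);                  // hand off to Muse
                }
            }
        }
    }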

Lianghong Xu pointed out the benefits shared by the Argus and Ixia designs:

  • Both rely on asynchronous replication, so the impact on writes is minimal.

  • Both are decoupled from HBase and run separately, so they can process data quickly.

The slides add more detail:

Introduction

HBase at Pinterest
  • Backend for many critical services
  • Graph database (Zen)
  • Generic KV store (UMS)
  • Around 50 HBase clusters
  • HBase 0.94 since 2013, HBase 1.2 since 2016
  • Internal repo with ZSTD, CCSMAP, bucket cache, etc.

Agenda
  • Omid: transaction layer for NoSQL databases
  • Sparrow: Omid made scalable
  • Argus: database observer framework
  • Ixia: near-realtime HBase indexing

NoSQL embracing transactions: SQL databases are relational, transactional, and expressive, while NoSQL databases are simple, fast, and scalable.

Apache Omid at Pinterest
  • Omid (Optimistically transaction Management In Datastores)
  • Transaction framework on top of KV stores, with HBase support
  • Open-sourced by Yahoo! in 2016
  • Powers the next generation of Ads indexing at Pinterest
  • Pros: simple, reasonable performance, HA, pluggable backend with native HBase support
  • Cons: no SQL interface, limited isolation levels, requires MVCC support

Omid architecture (diagram): the client issues begin/commit calls to the Transaction Manager (TM) and receives timestamps and commit status; the TM persists commits to the commit table; the client reads and writes the data tables directly, checking the commit table for commit status.

Omid internals
  • Leverages multi-version concurrency control (MVCC) support in HBase
  • Transaction ID (begin timestamp) is stored in the cell version; the commit timestamp goes in a shadow cell
  • OCC: lock-free implementation with a central conflict-detection mechanism
(Diagram: Omid data and commit table.)
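
For context, Omid's client API looks roughly like this, adapted from Apache Omid's basic usage example; table, family, and qualifier names are placeholders, and the TTable constructor differs across Omid versions.

    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.omid.transaction.HBaseTransactionManager;
    import org.apache.omid.transaction.TTable;
    import org.apache.omid.transaction.Transaction;
    import org.apache.omid.transaction.TransactionManager;

    public class OmidExample {
        public static void main(String[] args) throws Exception {
            TransactionManager tm = HBaseTransactionManager.newInstance();
            try (TTable txTable = new TTable(ConnectionFactory.createConnection(), "MY_TX_TABLE")) {
                Transaction tx = tm.begin();                  // begin timestamp from the TM
                Put put = new Put(Bytes.toBytes("EXAMPLE_ROW"));
                put.addColumn(Bytes.toBytes("CF"), Bytes.toBytes("Q"), Bytes.toBytes("v"));
                txTable.put(tx, put);   // cell version = the transaction's begin timestamp
                tm.commit(tx);          // conflict check; commit timestamp lands in shadow cells
            }
        }
    }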

Omid scalability problem (diagram): the TM uses a single-threaded request/reply processor for serializability and performs centralized batch commits to HBase; both become bottlenecks.

Sparrow architecture (diagram): commit records are written by the clients themselves (distributed client-side commit), and the TM processes requests in parallel with parallel conflict detection, removing the TM performance bottleneck.

Sparrow techniques
  • Client-side commit
    • The client writes to the commit table when there are no conflicts
    • Aborted transactions are explicitly marked in the commit table (-1)
    • A reader may back off and abort a concurrent writer in case of client failure or network partition (see the sketch after this list)
    • Avoids the performance bottleneck on the TM
  • Parallel request processing
    • Multi-threaded request processor with an in-memory conflict map
    • beginTx no longer waits until the whole commit batch is written to HBase
    • Timestamp allocation still needs to be synchronized (with negligible overhead)
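
The failure-handling rule in the first bullet group can be sketched as follows; the CommitTable interface and backoff constant are hypothetical, but the logic (back off, then write the -1 abort marker if the commit record still has not appeared) is as described above.

    class SparrowReaderSketch {
        interface CommitTable {
            Long get(long startTs);                        // null if no record yet
            void putIfAbsent(long startTs, long commitTs); // atomic first-writer-wins
        }

        static final long ABORT_MARKER = -1;
        static final long BACKOFF_MS = 50;                 // illustrative value

        // If a cell has no commit record, its writer may still be committing or
        // may have died (client failure, network partition). Back off once; if
        // the record still has not appeared, abort the writer by claiming the
        // -1 marker, then re-read whichever record won the race.
        static boolean isCommitted(CommitTable commitTable, long cellStartTs)
                throws InterruptedException {
            Long commitTs = commitTable.get(cellStartTs);
            if (commitTs == null) {
                Thread.sleep(BACKOFF_MS);
                commitTable.putIfAbsent(cellStartTs, ABORT_MARKER);
                commitTs = commitTable.get(cellStartTs);
            }
            return commitTs != ABORT_MARKER;
        }
    }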

Sparrow vs. Omid: beginTx P99 latency drops by ~100x; commitTx P99 latency drops by ~3x.

Argus: motivation and problem statement
  • Clients request a real-time notification feature similar to a database trigger
  • Incremental processing based on database changes
  • Notifications cannot be missed ("at least once")
  • Notification events can have different priorities and object types

Kafka-based notification pipeline: Percolator vs. Argus
  • Percolator (Google): a special notification column; observer threads periodically scan for changes; heavy-weight distributed scans and locking
  • Argus: async notification by tailing the HBase WAL; Kafka provides a replayable DB change stream; supports different priorities and types; lightweight, with minimal impact on the DB

Argus architecture (diagram): the client sends annotated read/write requests to HBase; the HBase WAL is shipped to the Replication Proxy, which publishes notification events to Kafka for Argus observers.
  • HBase annotation: extra metadata in HBase requests, passed down into the WAL
  • Replication Proxy: "fake" regionservers with only the replication RPC implemented
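
An Argus observer then reduces to a Kafka consumer that invokes a user-defined handler; committing offsets only after processing yields the at-least-once delivery called for above. The topic name, handler interface, and event payload are assumptions.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    class ArgusObserverSketch {
        interface Handler { void process(byte[] changeEvent); }  // user-defined processing

        static void run(Properties kafkaProps, Handler handler) {
            // kafkaProps should set enable.auto.commit=false so offsets are
            // committed only after processing, giving at-least-once delivery.
            try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(kafkaProps)) {
                consumer.subscribe(List.of("argus-notifications-high")); // per-priority topic assumed
                while (true) {
                    for (ConsumerRecord<String, byte[]> rec : consumer.poll(Duration.ofSeconds(1))) {
                        handler.process(rec.value());  // e.g., incremental recomputation
                    }
                    consumer.commitSync();
                }
            }
        }
    }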