Recent work on HBase at Pinterest
Lianghong Xu, a tech lead at Pinterest, shared the latest developments with HBase at Pinterest.
Pinterest currently runs around 50 HBase clusters, all deployed on AWS, with data volume at roughly the petabyte scale. The company adopted HBase 0.94 in 2013 and upgraded to 1.2 in 2016.
Pinterest adds transaction support to HBase through Apache Omid. After hitting performance bottlenecks with Omid in production, the team built its own system, Sparrow, whose main improvements are (a short sketch of the Omid client API follows the list):
- Commit operations are moved to the client side, removing the Transaction Manager as a single-point bottleneck.
- The Transaction Manager is made multi-threaded, so begin operations no longer have to wait for commits to complete. Compared with Omid, Sparrow reduces P99 latency by about 100x in the begin phase and about 3x in the commit phase.
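For context on the starting point, Omid exposes a small client-side API for transactions over HBase. Below is a minimal sketch based on the Apache Omid quickstart; the table, family, and row names are placeholders, and exact signatures can vary across Omid versions:

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.omid.transaction.HBaseTransactionManager;
import org.apache.omid.transaction.RollbackException;
import org.apache.omid.transaction.TTable;
import org.apache.omid.transaction.Transaction;
import org.apache.omid.transaction.TransactionManager;

public class OmidExample {
    public static void main(String[] args) throws Exception {
        // Connects to the central Transaction Manager (TM)
        TransactionManager tm = HBaseTransactionManager.newInstance();
        try (TTable txTable = new TTable("MY_TX_TABLE")) {
            Transaction tx = tm.begin();                 // TM hands out a start timestamp
            Put row = new Put(Bytes.toBytes("EXAMPLE_ROW"));
            row.addColumn(Bytes.toBytes("MY_CF"), Bytes.toBytes("q"),
                          Bytes.toBytes("value"));
            txTable.put(tx, row);                        // write is tagged with the start timestamp
            try {
                tm.commit(tx);  // TM checks write-write conflicts and persists the commit
            } catch (RollbackException e) {
                // A conflicting concurrent transaction won; retry or give up
            }
        }
    }
}
```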
Pinterest also built Argus, which works together with Kafka to provide a WAL-based notification mechanism. Roughly, it works as follows: data that needs notifications is annotated by the client at write time; the annotations are carried down to the WAL; the WAL is shipped through Kafka to Argus observers, which process the data with user-defined logic.
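Argus itself is internal to Pinterest, but the observer side of such a pipeline can be pictured as an ordinary Kafka consumer. A hypothetical sketch follows (the topic name, group id, and handler are illustrative, not Argus's actual code); offsets are committed only after processing, which preserves the at-least-once delivery the design calls for:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class WalObserver {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("group.id", "argus-observer");          // hypothetical consumer group
        props.put("enable.auto.commit", "false");         // commit only after processing
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("hbase-wal-events"));  // hypothetical topic of WAL edits
            while (true) {
                for (ConsumerRecord<String, byte[]> rec : consumer.poll(Duration.ofSeconds(1))) {
                    handle(rec.key(), rec.value());           // user-defined processing
                }
                consumer.commitSync();  // at-least-once: offsets advance only after handling
            }
        }
    }

    static void handle(String rowKey, byte[] walEdit) {
        // User-defined logic, keyed off the annotations carried in the WAL entry
    }
}
```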
Pinterest built Ixia on top of the open-source Lily indexer to construct HBase secondary indexes in near real time, and integrated it with Muse to provide SQL-like queries. Roughly: data written to HBase is forwarded to a Replication Proxy and pushed through Kafka to the Indexer; the index manager reads the columns of the HBase data and writes those that need indexing into Muse, which indexes them according to their schema; queries are served by Muse and fall back to HBase when needed.
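Muse and Ixia are likewise internal, but the described read path (index first, HBase only when needed) can be sketched as follows. The MuseClient interface and table name are hypothetical stand-ins, not Pinterest's API:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class IxiaQuerySketch {
    /** Stand-in for Pinterest's internal Muse search API (hypothetical). */
    interface MuseClient {
        List<byte[]> search(String sqlLikeQuery);  // returns matching HBase row keys
    }

    static Result[] query(MuseClient muse, Connection hbase, String q) throws Exception {
        // 1) Resolve the query against the secondary index in Muse.
        List<byte[]> rowKeys = muse.search(q);
        // 2) Fetch full rows from HBase only when the index alone is not enough.
        List<Get> gets = new ArrayList<>();
        for (byte[] rowKey : rowKeys) {
            gets.add(new Get(rowKey));
        }
        try (Table table = hbase.getTable(TableName.valueOf("my_table"))) {
            return table.get(gets);
        }
    }
}
```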
Xu described the benefits of the Argus and Ixia designs:
- They are based on an asynchronous replication mechanism, so the impact on writes is minimal.
- They are independent of HBase and run separately, so data can be processed quickly.
Recent work on HBase at Pinterest
Lianghong Xu, Software Engineer / Tech Lead, Pinterest
Introduction
HBase at Pinterest
• Backend for many critical services
  • Graph database (Zen)
  • Generic KV store (UMS)
• Around 50 HBase clusters
• HBase 0.94 since 2013, HBase 1.2 since 2016
• Internal repo with ZSTD, CCSMAP, Bucket cache, etc.
Agenda
• Omid: transaction layer for NoSQL database
• Sparrow: Omid made scalable
• Argus: database observer framework
• Ixia: near-realtime HBase indexing
NoSQL Embracing Transactions
• SQL: relational, transactional, expressive
• NoSQL: simple, fast, scalable
Apache Omid at Pinterest
• Omid (Optimistically transaction Management In Datastores)
• Transaction framework on top of KV stores with HBase support
• Open-sourced by Yahoo! in 2016
• Powers next generation of Ads indexing at Pinterest
• Pros: simple, reasonable performance, HA, pluggable backend with native HBase support
• Cons: no SQL interface, limited isolation levels, requires MVCC support
Omid Architecture (diagram): the client sends begin/commit to the central Transaction Manager (TM) and receives timestamps and commit status; the TM persists commit records to the commit table; the client reads/writes the data tables directly and checks the commit table to resolve commit status.
Omid internals
• Leverages Multi-Version Concurrency Control (MVCC) support in HBase
• Transaction ID (begin timestamp) in version, commit timestamp in shadow cell
• OCC: lock-free implementation with central conflict detection mechanism
(diagram: Omid data and commit table)
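The visibility rule implied by these timestamps can be made concrete. Below is a simplified sketch of an Omid-style snapshot-isolation check; the names are illustrative, and the real read path also falls back to the commit table when a shadow cell has not been written yet:

```java
/**
 * Simplified Omid-style visibility check under snapshot isolation.
 * A cell written by transaction W (whose begin timestamp is stored as the
 * cell's HBase version) is visible to reader R only if W committed, and
 * did so before R began.
 */
public final class SnapshotVisibility {
    // writerCommitTs: commit timestamp from the shadow cell (or commit
    // table); null means the writing transaction has not committed.
    static boolean isVisible(long cellBeginTs, Long writerCommitTs, long readerBeginTs) {
        if (cellBeginTs == readerBeginTs) {
            return true;  // a transaction always sees its own writes
        }
        return writerCommitTs != null && writerCommitTs < readerBeginTs;
    }
}
```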
Omid Scalability Problem (diagram): the TM is a single-threaded request/reply processor (for serializability) and performs centralized batch commits to HBase; both are performance bottlenecks.
Sparrow Architecture (diagram): Sparrow moves commit persistence to the client (distributed client-side commit) and turns the TM into a parallel request processor with parallel conflict detection.
Sparrow techniques
• Client-side commit
  • Client writes to the commit table when there are no conflicts
  • Explicitly marks aborted transactions in the commit table (-1)
  • A reader may back off and abort a concurrent writer in case of client failure or network partition
  • Avoids the performance bottleneck on the TM
• Parallel request processing
  • Multi-threaded request processor with an in-memory conflict map
  • beginTx no longer needs to wait until the whole commit batch is written to HBase
  • Timestamp allocation still needs to be synchronized (with negligible overhead)
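A hypothetical sketch of the client-side commit protocol described above (not Pinterest's actual code): the commit table acts as the arbiter, a writer persists its own commit record, and a reader that suspects a dead writer can place the -1 abort marker first:

```java
// Hypothetical sketch of Sparrow-style client-side commit; all names and
// the CommitTable interface are assumptions made for illustration.
public class ClientSideCommitSketch {
    static final long ABORTED = -1L;  // explicit abort marker in the commit table

    interface CommitTable {
        // Atomically records commitTs for startTs; returns false if an entry
        // (e.g. an abort marker placed by a reader) already exists.
        boolean putIfAbsent(long startTs, long commitTs);
        Long get(long startTs);
    }

    /** Writer path: the client itself persists the commit record. */
    static boolean commit(CommitTable ct, long startTs, long commitTs, boolean hasConflicts) {
        if (hasConflicts) {
            ct.putIfAbsent(startTs, ABORTED);  // record the abort explicitly
            return false;
        }
        // Fails if a reader already aborted us (client failure suspected).
        return ct.putIfAbsent(startTs, commitTs);
    }

    /** Reader path: resolve a cell whose writer has no commit record yet. */
    static Long resolve(CommitTable ct, long writerStartTs) {
        Long commitTs = ct.get(writerStartTs);
        if (commitTs == null) {
            // After backing off, abort the apparently dead writer so the
            // read can proceed deterministically.
            ct.putIfAbsent(writerStartTs, ABORTED);
            commitTs = ct.get(writerStartTs);
        }
        return commitTs == ABORTED ? null : commitTs;
    }
}
```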
Sparrow vs. Omid
• beginTx P99: ~100x reduction
• commitTx P99: ~3x reduction
Argus: Motivation and Problem Statement
• Clients request a real-time notification feature similar to a database trigger
• Incremental processing based on database changes
• Notifications cannot be missed ("at least once")
• Notification events can have different priorities and object types
Kafka-based Notification Pipeline
Percolator (Google):
• Special notification column
• Observer threads periodically scan for changes
• Heavy-weight distributed scan and locking
Argus:
• Async notification by tailing the HBase WAL
• Kafka for a replayable DB change stream
• Supports different priorities and types
• Lightweight, minimal impact on the DB
Argus Architecture (diagram): annotated requests flow from the client to HBase; the WAL is shipped through a replication proxy and published as notification events to Kafka, which Argus observers consume.
• HBase annotation: extra metadata in HBase requests, passed down into the WAL
• Replication Proxy: "fake" regionservers with only the replication RPC implemented
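On the client side, the annotation can be pictured as a standard HBase mutation attribute. A sketch follows; the attribute name is made up, and carrying attributes down into the WAL relies on the internal changes the talk describes rather than stock HBase behavior:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class AnnotatedWrite {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection();
             Table table = conn.getTable(TableName.valueOf("pins"))) {
            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            // Hypothetical annotation asking the pipeline to emit a
            // high-priority notification for this write; propagating it
            // into the WAL is Pinterest's internal change.
            put.setAttribute("argus.notify", Bytes.toBytes("HIGH"));
            table.put(put);
        }
    }
}
```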