Transactions in HBase
1. Transactions in HBase
Andreas Neumann (anew at apache.org), Gokul Gunasekaran (gokul at cask.co)
HBaseCon, June 2017
@caskoid
2. Goals of this Talk
- Why transactions?
- Optimistic Concurrency Control
- Three Apache projects: Omid, Tephra, Trafodion
- How are they different?
3. Transactions in noSQL?
History:
• SQL: RDBMS, EDW, …
• noSQL: MapReduce, HDFS, HBase, …
• n(ot)o(nly)SQL: Hive, Phoenix, …
Motivation:
• Data consistency under highly concurrent loads
• Partial outputs after failure
• Consistent view of data for long-running jobs
• (Near) real-time processing
4. Stream Processing
[Diagram with labels: Queue, Flowlet, HBase Table]
5. Write Conflict!
[Diagram with labels: Queue, Flowlet, HBase Table]
6. Transactions to the Rescue
[Diagram with labels: Queue, Flowlet, HBase Table]
- Atomicity of all writes involved
- Protection from concurrent update
7. ACID Properties
From good old SQL:
• Atomic - Entire transaction is committed as one
• Consistent - No partial state change due to failure
• Isolated - No dirty reads, transaction is only visible after commit
• Durable - Once committed, data is persisted reliably
8. What is HBase?
[Architecture diagram with labels: Client; Region Servers, each with a Coprocessor and multiple Regions]
9. What is HBase?
Simplified:
• Distributed Key-Value Store
• Key = <row>.<family>.<column>.<timestamp>
• Partitioned into Regions (= contiguous range of rows)
• Each Region Server hosts multiple regions
• Optional: Coprocessor in Region Server
• Durable writes
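To make the key layout concrete, here is a minimal in-memory sketch of a versioned key-value store keyed by (row, family, column, timestamp). `VersionedStore` and its methods are hypothetical names for illustration only; this is not the HBase client API, and it ignores regions, durability, and coprocessors.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for an HBase table; not the HBase client API.
CellKey = Tuple[str, str, str, int]  # (row, family, column, timestamp)


class VersionedStore:
    def __init__(self) -> None:
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: str, family: str, column: str,
            timestamp: int, value: bytes) -> None:
        # Each write adds a new version; older versions stay in place.
        self._cells[(row, family, column, timestamp)] = value

    def get(self, row: str, family: str, column: str,
            max_timestamp: Optional[int] = None) -> Optional[bytes]:
        # Return the newest version at or below max_timestamp
        # (the newest overall when max_timestamp is None).
        best: Optional[Tuple[int, bytes]] = None
        for (r, f, c, ts), value in self._cells.items():
            if (r, f, c) != (row, family, column):
                continue
            if max_timestamp is not None and ts > max_timestamp:
                continue
            if best is None or ts > best[0]:
                best = (ts, value)
        return None if best is None else best[1]
```

Reading with a timestamp bound is the building block that the timestamp-based isolation discussed on later slides relies on.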
10. ACID Properties in HBase
• Atomic
  • At cell, row, and region level
  • Not across regions, tables or multiple calls
• Consistent - No built-in rollback mechanism
• Isolated - Timestamp filters provide some level of isolation
• Durable - Once committed, data is persisted reliably
How to implement full ACID?
11. Implementing Transactions
• Traditional approach (RDBMS): locking
  • May produce deadlocks
  • Causes idle wait
  • Complex and expensive in a distributed environment
• Optimistic Concurrency Control
  • Lockless: allow concurrent writes to go forward
  • On commit, detect conflicts with other transactions
  • On conflict, roll back all changes and retry
• Snapshot Isolation
  • Similar to repeatable read
  • Take snapshot of all data at transaction start
  • Read isolation
12. Optimistic Concurrency Control
[Timeline diagram]
client1: start, x=10, fail/rollback
client2: start, read x, commit (must see the old value of x)
13. Optimistic Concurrency Control
[Timeline diagram]
client1: start, incr x (x=10, then x=11), commit
client2: start, incr x (sees the old value of x=10), commit, rollback
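The two-client increment scenario can be sketched as follows. This is a simplified illustration, not how any of the projects in this talk implement it: `Counter`, `ConflictError`, and `increment_with_retry` are hypothetical names, writes are buffered in the transaction instead of being written and rolled back, and a single version number stands in for conflict detection on one cell.

```python
class ConflictError(Exception):
    """Raised when another transaction committed to x first."""


class Counter:
    """Hypothetical single-value store with a commit version."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.version = 0

    def begin(self) -> dict:
        # Snapshot the value and remember which version was read.
        return {"read_version": self.version, "value": self.value}

    def commit(self, tx: dict) -> None:
        # Optimistic check: fail if someone else committed since begin().
        if tx["read_version"] != self.version:
            raise ConflictError("x changed since transaction start")
        self.value = tx["value"]
        self.version += 1


def increment_with_retry(counter: Counter, max_attempts: int = 3) -> int:
    for _ in range(max_attempts):
        tx = counter.begin()
        tx["value"] += 1
        try:
            counter.commit(tx)
            return tx["value"]
        except ConflictError:
            continue  # dropping the buffered write is the rollback
    raise ConflictError("gave up after repeated conflicts")
```

If two transactions both start at x=10 and increment, the first commit sets x=11 and the second raises `ConflictError`; retrying the second on a fresh snapshot yields 12 rather than overwriting 11 with 11.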
14-20. Conflicting Transactions
[Timeline diagram, built up over seven slides. Labels: time; tx:A; tx:B; tx:C (A fails); tx:D (A fails); tx:E (E fails); tx:F (F fails); tx:G]
21. Conflicting Transactions
• Two transactions have a conflict if both hold:
  • they write to the same cell
  • they overlap in time
• If two transactions conflict, the one that commits later rolls back
• Active change set = set of transactions t such that:
  • t is committed, and
  • there is at least one in-flight tx t' that started before t's commit time
• This change set is needed in order to perform conflict detection.
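The conflict rule and the active change set can be sketched in a few lines. This is an illustrative single-process model under stated assumptions: `TxManager` and its methods are hypothetical names, a logical counter serves as both transaction ID and clock, and cells are plain strings.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


class TxManager:
    """Hypothetical in-memory transaction manager (illustration only)."""

    def __init__(self) -> None:
        self._clock = 0
        self._in_flight: Dict[int, int] = {}  # tx id -> start time
        # Active change set: (commit time, cells written) per committed tx.
        self._committed: List[Tuple[int, FrozenSet[str]]] = []

    def _tick(self) -> int:
        self._clock += 1
        return self._clock

    def start(self) -> int:
        tx_id = self._tick()
        self._in_flight[tx_id] = tx_id  # start time equals the tx id here
        return tx_id

    def commit(self, tx_id: int, change_set: Iterable[str]) -> bool:
        cells = frozenset(change_set)
        start = self._in_flight[tx_id]
        # Conflict: a tx that committed after we started wrote the same cell.
        for commit_time, other_cells in self._committed:
            if commit_time > start and cells & other_cells:
                self.abort(tx_id)  # the later committer rolls back
                return False
        del self._in_flight[tx_id]
        self._committed.append((self._tick(), cells))
        self._prune()
        return True

    def abort(self, tx_id: int) -> None:
        del self._in_flight[tx_id]
        self._prune()

    def _prune(self) -> None:
        # Keep a committed tx only while some in-flight tx started
        # before its commit time.
        if not self._in_flight:
            self._committed.clear()
            return
        oldest_start = min(self._in_flight.values())
        self._committed = [(t, c) for t, c in self._committed
                           if t > oldest_start]

    def active_change_set_size(self) -> int:
        return len(self._committed)
```

Pruning is what keeps the state bounded: once no in-flight transaction overlaps a committed one, its change set can no longer cause a conflict and is dropped.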
22. HBase Transactions in Apache
[Project logos: Apache Tephra (incubating), Apache Omid (incubating), Apache Trafodion (incubating)]
23. In Common
• Optimistic Concurrency Control must:
  • maintain Transaction State:
    • what tx are in flight and committed?
    • what is the change set of each tx? (for conflict detection, rollback)
    • what transactions are invalid (failed to roll back due to crash etc.)
  • generate unique transaction IDs
  • coordinate the life cycle of a transaction
    • start, detect conflicts, commit, rollback
• All of { Omid, Tephra, Trafodion } implement this
  • but vary in how they do it
24. Apache Tephra
• Based on the original Omid paper: Daniel Gómez Ferro, Flavio Junqueira, Ivan Kelly, Benjamin Reed, Maysam Yabandeh: Omid: Lock-free transactional support for distributed data stores. ICDE 2014.
• Transaction Manager:
  • Issues unique, monotonic transaction IDs
  • Maintains the set of excluded (in-flight and invalid) transactions
  • Maintains change sets for active transactions
  • Performs conflict detection
• Client:
  • Uses transaction ID as timestamp for writes
  • Filters excluded transactions for isolation
  • Performs rollback
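The client-side read filter described above can be sketched roughly as follows. This is a simplified model, not Tephra's actual classes or API: `Snapshot`, `visible`, and `read_latest` are hypothetical names, and the sketch assumes every cell version's timestamp is the ID of the transaction that wrote it.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Snapshot:
    tx_id: int                # this transaction's ID, also its write timestamp
    excluded: FrozenSet[int]  # in-flight and invalid tx IDs at start


def visible(version_ts: int, snap: Snapshot) -> bool:
    # A transaction always sees its own writes.
    if version_ts == snap.tx_id:
        return True
    # Otherwise: written by an earlier transaction that is not excluded.
    return version_ts < snap.tx_id and version_ts not in snap.excluded


def read_latest(versions: Dict[int, str], snap: Snapshot) -> Optional[str]:
    # versions maps write timestamp (= writer's tx ID) to the cell value.
    for ts in sorted(versions, reverse=True):
        if visible(ts, snap):
            return versions[ts]
    return None
```

With versions written by transactions 10, 20, and 30, a reader with ID 25 that excludes 20 skips the version from 30 (too new) and from 20 (excluded) and reads the value written by 10.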
25-29. Transaction Lifecycle
[State diagram, built up over several slides. Labels: start new tx; in progress; write to HBase; detect conflicts; ok; complete; make visible]