Transactions in HBase

1. Transactions in HBase
Andreas Neumann (anew at apache.org), Gokul Gunasekaran (gokul at cask.co)
HBaseCon, June 2017, @caskoid

2. Goals of this Talk
- Why transactions?
- Optimistic Concurrency Control
- Three Apache projects: Omid, Tephra, Trafodion
- How are they different?

3. Transactions in noSQL?
History:
• SQL: RDBMS, EDW, …
• noSQL: MapReduce, HDFS, HBase, …
• n(ot)o(nly)SQL: Hive, Phoenix, …
Motivation:
• Data consistency under highly concurrent loads
• Partial outputs after failure
• Consistent view of data for long-running jobs
• (Near) real-time processing

4. Stream Processing
[Diagram: Queue, Flowlet, HBase Table]

5. Write Conflict!
[Diagram: Queue, Flowlet, HBase Table]

6. Transactions to the Rescue
[Diagram: Queue, Flowlet, HBase Table]
- Atomicity of all writes involved
- Protection from concurrent update

7. ACID Properties
From good old SQL:
• Atomic - Entire transaction is committed as one
• Consistent - No partial state change due to failure
• Isolated - No dirty reads, transaction is only visible after commit
• Durable - Once committed, data is persisted reliably

8. What is HBase?
[Diagram: Client; Region Servers, each with a Coprocessor and multiple Regions]

9. What is HBase?
Simplified:
• Distributed Key-Value Store
• Key = <row>.<family>.<column>.<timestamp>
• Partitioned into Regions (= contiguous range of rows)
• Each Region Server hosts multiple regions
• Optional: Coprocessor in Region Server
• Durable writes
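The key layout above can be pictured with a toy in-memory model. This is only a sketch for illustration and not HBase client code: the class `VersionedStore` and its methods are hypothetical names, and regions, region servers and durability are left out entirely.

```python
from collections import defaultdict


class VersionedStore:
    """Toy model of HBase cells: (row, family, column) -> {timestamp: value}."""

    def __init__(self):
        self._cells = defaultdict(dict)

    def put(self, row, family, column, timestamp, value):
        # Each write adds a new version instead of overwriting the old one.
        self._cells[(row, family, column)][timestamp] = value

    def get(self, row, family, column, max_timestamp=None):
        """Return the newest value with timestamp <= max_timestamp, or None."""
        versions = self._cells.get((row, family, column), {})
        visible = [ts for ts in versions
                   if max_timestamp is None or ts <= max_timestamp]
        if not visible:
            return None
        return versions[max(visible)]


store = VersionedStore()
store.put("user1", "info", "name", 100, "Ada")
store.put("user1", "info", "name", 200, "Ada L.")
latest = store.get("user1", "info", "name")
older = store.get("user1", "info", "name", max_timestamp=150)
```

Reading with an upper bound on the timestamp is the property that the transaction systems discussed later build on: a reader can ignore versions newer than its own view.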

10. ACID Properties in HBase
• Atomic
  - At cell, row, and region level
  - Not across regions, tables or multiple calls
• Consistent - No built-in rollback mechanism
• Isolated - Timestamp filters provide some level of isolation
• Durable - Once committed, data is persisted reliably
How to implement full ACID?

11. Implementing Transactions
• Traditional approach (RDBMS): locking
  - May produce deadlocks
  - Causes idle wait
  - Complex and expensive in a distributed environment
• Optimistic Concurrency Control
  - Lockless: allow concurrent writes to go forward
  - On commit, detect conflicts with other transactions
  - On conflict, roll back all changes and retry
• Snapshot Isolation
  - Similar to repeatable read
  - Take snapshot of all data at transaction start
  - Read isolation

12. Optimistic Concurrency Control
[Timeline diagram]
• client1: start, x=10, fail/rollback
• client2: start, read x, commit (must see the old value of x)

13. Optimistic Concurrency Control
[Timeline diagram; x=10 before client1 commits, x=11 after]
• client1: start, incr x, commit
• client2: start, incr x (sees the old value of x=10), commit, rollback
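The two-client increment can be replayed in a few lines. This is a minimal sketch under simplifying assumptions, not how any of the systems in this talk is implemented: writes are buffered in the transaction and applied at commit, a single logical clock stands in for timestamps, and `Store`, `Conflict` and all method names are made up for the example.

```python
class Conflict(Exception):
    """Raised when a commit collides with a concurrent committed write."""


class Store:
    def __init__(self):
        self.data = {}         # key -> committed value
        self.last_commit = {}  # key -> logical time of the last committed write
        self.clock = 0         # logical clock shared by starts and commits

    def begin(self):
        self.clock += 1
        return {"start": self.clock, "writes": {}}

    def read(self, tx, key):
        # A transaction sees its own buffered writes, else committed data.
        return tx["writes"].get(key, self.data.get(key))

    def write(self, tx, key, value):
        tx["writes"][key] = value

    def commit(self, tx):
        # Conflict: someone committed a write to the same key after we started.
        for key in tx["writes"]:
            if self.last_commit.get(key, 0) > tx["start"]:
                tx["writes"].clear()  # roll back the buffered changes
                raise Conflict(key)
        self.clock += 1
        for key, value in tx["writes"].items():
            self.data[key] = value
            self.last_commit[key] = self.clock


store = Store()
setup = store.begin()
store.write(setup, "x", 10)
store.commit(setup)

tx1 = store.begin()
tx2 = store.begin()
store.write(tx1, "x", store.read(tx1, "x") + 1)  # client1 reads 10, writes 11
store.write(tx2, "x", store.read(tx2, "x") + 1)  # client2 also reads 10
store.commit(tx1)
try:
    store.commit(tx2)
    tx2_committed = True
except Conflict:
    tx2_committed = False
    retry = store.begin()  # retry from the new committed value
    store.write(retry, "x", store.read(retry, "x") + 1)
    store.commit(retry)
```

Without the conflict check both increments would commit 11 and one update would be lost; with it, the second client rolls back and its retry produces 12.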

20. Conflicting Transactions
[Timeline diagram of transactions over time]
• tx:A
• tx:B
• tx:C (A fails)
• tx:D (A fails)
• tx:E (E fails)
• tx:F (F fails)
• tx:G

21. Conflicting Transactions
• Two transactions have a conflict if
  - they write to the same cell
  - they overlap in time
• If two transactions conflict, the one that commits later rolls back
• Active change set = set of transactions t such that:
  - t is committed, and
  - there is at least one in-flight tx t’ that started before t’s commit time
• This change set is needed in order to perform conflict detection.
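The definitions on this slide translate almost directly into predicates. The sketch below assumes integer start and commit times and represents written cells as strings; the `Tx` record and the function names are illustrative and do not come from any of the three projects.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Tx:
    name: str
    start: int
    commit: Optional[int]  # None while the transaction is still in flight
    writes: FrozenSet[str]


def overlaps(a: Tx, b: Tx) -> bool:
    """Two committed transactions overlap if each started before the other committed."""
    return a.start < b.commit and b.start < a.commit


def conflicts(a: Tx, b: Tx) -> bool:
    """Conflict = overlap in time and at least one cell written by both."""
    return overlaps(a, b) and bool(a.writes & b.writes)


def loser(a: Tx, b: Tx) -> Tx:
    """Of two conflicting transactions, the later committer rolls back."""
    return a if a.commit > b.commit else b


def active_change_set(committed: List[Tx], in_flight: List[Tx]) -> List[Tx]:
    """Committed transactions that some in-flight transaction could still conflict with."""
    return [t for t in committed
            if any(f.start < t.commit for f in in_flight)]


a = Tx("A", start=1, commit=5, writes=frozenset({"x"}))
b = Tx("B", start=2, commit=4, writes=frozenset({"x"}))
c = Tx("C", start=6, commit=7, writes=frozenset({"x"}))
d = Tx("D", start=2, commit=3, writes=frozenset({"y"}))
f = Tx("F", start=6, commit=None, writes=frozenset())
still_needed = active_change_set([a, b, c], [f])
```

Here A and B conflict and A, which commits later, rolls back. C writes the same cell but starts after A commits, and D overlaps A but writes a different cell, so neither conflicts with A. Only C's change set must be retained for the in-flight F; everything committed before F started can be discarded.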

22. HBase Transactions in Apache
• Apache Omid (incubating)
• Apache Tephra (incubating)
• Apache Trafodion (incubating)

23. In Common
• Optimistic Concurrency Control must:
  - maintain transaction state:
    - which transactions are in flight and committed?
    - what is the change set of each transaction? (for conflict detection, rollback)
    - which transactions are invalid? (failed to roll back due to crash etc.)
  - generate unique transaction IDs
  - coordinate the life cycle of a transaction:
    - start, detect conflicts, commit, rollback
• All of { Omid, Tephra, Trafodion } implement this
  - but vary in how they do it

24. Apache Tephra
• Based on the original Omid paper:
  Daniel Gómez Ferro, Flavio Junqueira, Ivan Kelly, Benjamin Reed, Maysam Yabandeh: Omid: Lock-free transactional support for distributed data stores. ICDE 2014.
• Transaction Manager:
  - Issues unique, monotonic transaction IDs
  - Maintains the set of excluded (in-flight and invalid) transactions
  - Maintains change sets for active transactions
  - Performs conflict detection
• Client:
  - Uses transaction ID as timestamp for writes
  - Filters excluded transactions for isolation
  - Performs rollback
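The division of labour between manager and client can be sketched as follows. This is a simplified illustration of the roles listed on the slide, not the Tephra API: `TransactionManager`, `Client` and their methods are hypothetical, the table is a plain dictionary, commit times are drawn from the same counter as transaction IDs, and old change sets are never pruned.

```python
class TransactionManager:
    def __init__(self):
        self._next_id = 0
        self.in_flight = set()  # IDs of running transactions
        self.invalid = set()    # IDs whose rollback did not complete
        self.committed = {}     # commit ID -> frozenset of keys written

    def _new_id(self):
        self._next_id += 1      # unique and monotonic
        return self._next_id

    def start(self):
        tx_id = self._new_id()
        excluded = frozenset(self.in_flight | self.invalid)
        self.in_flight.add(tx_id)
        return tx_id, excluded

    def commit(self, tx_id, change_set):
        # Conflict if a transaction that committed after we started wrote
        # one of the same keys.
        for commit_id, keys in self.committed.items():
            if commit_id > tx_id and keys & change_set:
                return False
        self.committed[self._new_id()] = frozenset(change_set)
        self.in_flight.discard(tx_id)
        return True

    def abort(self, tx_id, rolled_back=True):
        self.in_flight.discard(tx_id)
        if not rolled_back:
            self.invalid.add(tx_id)  # keep hiding its writes from readers


class Client:
    def __init__(self, manager, table):
        self.manager = manager
        self.table = table  # key -> {transaction ID used as timestamp: value}

    def begin(self):
        self.tx_id, self.excluded = self.manager.start()
        self.writes = set()

    def put(self, key, value):
        self.table.setdefault(key, {})[self.tx_id] = value
        self.writes.add(key)

    def get(self, key):
        versions = self.table.get(key, {})
        visible = [ts for ts in versions
                   if ts <= self.tx_id and ts not in self.excluded]
        return versions[max(visible)] if visible else None

    def commit(self):
        if self.manager.commit(self.tx_id, self.writes):
            return True
        for key in self.writes:  # rollback: delete our own versions
            del self.table[key][self.tx_id]
        self.manager.abort(self.tx_id)
        return False


table = {}
manager = TransactionManager()
c1, c2, c3, c4 = (Client(manager, table) for _ in range(4))

c1.begin()
c1.put("x", 10)
c1_ok = c1.commit()

c2.begin()
c3.begin()              # c2 is in flight, so c3 excludes it
c2.put("x", 11)
c3_read = c3.get("x")   # c2's uncommitted write is filtered out
c3.put("x", c3_read + 1)
c2_ok = c2.commit()
c3_ok = c3.commit()     # conflicts with c2, rolls back

c4.begin()
c4_read = c4.get("x")
```

Because writes go to the table immediately with the transaction ID as their timestamp, isolation comes entirely from the read filter, and a failed transaction must clean up after itself or be recorded as invalid.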

29. Transaction Lifecycle
[Diagram: start new tx, in progress, write to HBase, detect conflicts, ok, complete, make visible]