PingCAP Infra Meetup 91: Distributed Transaction in TiDB

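This talk covers how distributed transactions are implemented in TiDB and is organized around three topics:
1. The definition of a distributed transaction
2. How transactions are implemented in Percolator
3. How transactions are implemented in TiDB, and what to watch out for
The first part defines distributed transactions by introducing ACID and the four common isolation levels. The second part walks through the transaction implementation in Percolator, focusing on: 1. the advantages and disadvantages of building on snapshot isolation; 2. how two-phase commit provides distributed transactions across rows and tables. The last part describes the implementation of distributed transactions in TiDB in detail, including how TiDB maps relational data onto key-value storage, the implementation details and error handling of two-phase commit in TiDB, and caveats for using TiDB transactions.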

1. Head First Distributed Transaction in TiDB Presented by wuxuelian

2.Agenda
● ACID
● ISOLATION LEVEL
● Percolator
● Transaction in TiDB

3.Part I - ACID

4.ACID
Bob gives Joe $7. Before the transfer: Bob: $10, Joe: $2.
● Success: Bob: $3, Joe: $9
● Failed: Bob: $10, Joe: $2
● Partial results that must never be visible: Bob: $10, Joe: $9 or Bob: $3, Joe: $2

5.ACID
● Atomicity
  ○ Each transaction is treated as a single "unit", which either succeeds completely or fails completely.
● Consistency
  ○ Any data written to the database must be valid according to all defined rules.
● Isolation
  ○ Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
● Durability
  ○ Once a transaction has been committed, it will remain committed even in the case of a system failure.
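The atomicity property can be tried out on a single node. The following is a minimal sketch using Python's built-in `sqlite3` module rather than TiDB; the `account` table, the `transfer` helper, and the `balances` helper are assumptions made for illustration. It replays the Bob and Joe transfer and shows that a failed transfer leaves both balances untouched.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Bob", 10), ("Joe", 2)])
conn.commit()
```

Calling `transfer(conn, "Bob", "Joe", 7)` once moves the money; calling it a second time fails the balance check and is rolled back, so the debit from Bob is never visible on its own.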

6.Part II - Isolation Levels

7.Read uncommitted
Session A: begin;
Session A: select account from account where id = 1; // will get 1000
Session B: begin;
Session B: update account set account=account+500 where id = 1; // not committed yet
Session A: select account from account where id = 1; // will get 1500 (Dirty read)
Session B: rollback;

8.Read committed
Session A: begin;
Session A: select account from account where id=1; // get 1000
Session B: begin;
Session B: update account set account = account+500 where id=1;
Session B: commit;
Session A: select account from account where id = 1; // get 1500 (Non-repeatable reads)
Session A: commit;

9.Repeatable read
Session A: begin;
Session A: select account from account where id=1; // get 1000
Session B: begin;
Session B: update account set account = account+500 where id=1;
Session B: commit;
Session A: select account from account where id = 1; // get 1000
Session A: commit;

10.Repeatable read
Session A: begin;
Session A: select id from account; // get id(1), id(2)
Session B: begin;
Session B: insert into account values(3,"Dada",5000);
Session B: commit;
Session A: select id from account; // get id(1), id(2)
Session A: insert into account values(3,"Dada",5000); // ERROR 1062 (23000): Duplicate entry '3' for key 'PRIMARY' (Phantom reads)

11.Serializable Session: A Session: B

12.Summary

13.Part III - Percolator

14.Snapshot Isolation
[Figure: transactions on a timeline, Time 1, 2, 3]
● Read: read from a stable snapshot at some timestamp
● Write: protects against write-write conflicts.
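The two rules of snapshot isolation can be sketched with a toy single-node multi-version store. This is only an illustration under stated assumptions: `SnapshotStore`, `Txn`, and `ConflictError` are hypothetical names, timestamps come from a local counter instead of a timestamp oracle, and nothing here is distributed.

```python
class ConflictError(Exception):
    """Raised when another transaction committed the same key after our snapshot."""


class SnapshotStore:
    """Toy MVCC store: versions[key] is a list of (commit_ts, value)."""

    def __init__(self):
        self.versions = {}
        self.ts = 0

    def next_ts(self):
        self.ts += 1
        return self.ts

    def begin(self):
        return Txn(self, self.next_ts())


class Txn:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        # Read rule: newest version committed at or before start_ts.
        visible = [
            (ts, value)
            for ts, value in self.store.versions.get(key, [])
            if ts <= self.start_ts
        ]
        return max(visible, key=lambda pair: pair[0])[1] if visible else None

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Write rule: abort if any written key has a version newer than our snapshot.
        for key in self.writes:
            for ts, _ in self.store.versions.get(key, []):
                if ts > self.start_ts:
                    raise ConflictError(key)
        commit_ts = self.store.next_ts()
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts
```

If transaction A starts, transaction B then updates and commits the same key, A keeps reading the old value for its whole lifetime, and A's own attempt to commit a write to that key raises `ConflictError`.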

15. 2 Phase Commit
Bob has $10, Joe has $2, and Bob will give Joe $7.
key | data | lock | write
Bob | 5: $10 | (empty) | 6: data @5
Joe | 5: $2 | (empty) | 6: data @5

16.Phase#1 : Prewrite
key | data | lock | write
Bob | 5: $10, 7: $3 | 7: I’m primary | 6: data @5
Joe | 5: $2, 7: $9 | 7: primary @ Bob | 6: data @5

17.Phase#2: Primary Commit (Sync)
Bob has $3, Joe has $9 now.
key | data | lock | write
Bob | 5: $10, 7: $3 | (7: I’m primary, erased by the commit) | 6: data @5, 8: data @7
Joe | 5: $2, 7: $9 | 7: primary @ Bob | 6: data @5

18.Phase#2: Secondary Commit (Async)
Bob has $3, Joe has $9 now.
key | data | lock | write
Bob | 5: $10, 7: $3 | (7: I’m primary, erased by the commit) | 6: data @5, 8: data @7
Joe | 5: $2, 7: $9 | (7: primary @ Bob, erased by the commit) | 6: data @5, 8: data @7
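The walk-through above can be replayed with a small in-memory model of the `data`, `lock`, and `write` columns. This is a sketch under stated assumptions, not Percolator's or TiKV's API: `ToyPercolator` and its methods are hypothetical, conflict checks and rollback are left out, and timestamps are passed in by hand. It shows why the asynchronous secondary commit is safe: a reader that finds Joe's leftover lock looks up the primary (Bob), sees that the primary is committed, and rolls Joe's key forward itself.

```python
class KeyIsLocked(Exception):
    """The writer has not committed its primary yet; the reader should retry."""


class ToyPercolator:
    """In-memory model of the three columns from the slides."""

    def __init__(self):
        self.data = {}   # (key, start_ts)  -> value
        self.lock = {}   # key              -> (start_ts, primary_key)
        self.write = {}  # (key, commit_ts) -> start_ts

    def prewrite(self, mutations, primary, start_ts):
        # Phase 1: stage every value and lock every key, pointing at the primary.
        for key, value in mutations.items():
            self.data[(key, start_ts)] = value
            self.lock[key] = (start_ts, primary)

    def commit(self, key, start_ts, commit_ts):
        # Phase 2 for one key: publish the write record, then erase the lock.
        self.write[(key, commit_ts)] = start_ts
        self.lock.pop(key, None)

    def _primary_commit_ts(self, primary, start_ts):
        for (key, commit_ts), ts in self.write.items():
            if key == primary and ts == start_ts:
                return commit_ts
        return None

    def get(self, key, read_ts):
        lock = self.lock.get(key)
        if lock is not None and lock[0] <= read_ts:
            start_ts, primary = lock
            commit_ts = self._primary_commit_ts(primary, start_ts)
            if commit_ts is None:
                raise KeyIsLocked(key)
            self.commit(key, start_ts, commit_ts)  # roll the secondary forward
        visible = [
            (c, s) for (k, c), s in self.write.items() if k == key and c <= read_ts
        ]
        if not visible:
            return None
        _, start_ts = max(visible)
        return self.data[(key, start_ts)]


# Replay the slides: start_ts = 7, commit_ts = 8, primary = Bob.
store = ToyPercolator()
store.data.update({("Bob", 5): 10, ("Joe", 5): 2})
store.write.update({("Bob", 6): 5, ("Joe", 6): 5})
store.prewrite({"Bob": 3, "Joe": 9}, primary="Bob", start_ts=7)
store.commit("Bob", start_ts=7, commit_ts=8)  # synchronous primary commit
```

At this point Joe's lock is still present. A read of Joe at timestamp 9 resolves the lock through the primary and returns the new balance, while a read at timestamp 6 still sees the old snapshot.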

19.Summary
● Advantage
  ○ Simple
  ○ Implements cross-row transactions on top of single-row transactions (BigTable)
  ○ Decentralized lock management
● Disadvantage
  ○ Centralized timestamp oracle
  ○ More RPCs

20.Part IV - Transaction in TiDB

21.Architecture
[Figure: several TiDB servers send metadata / timestamp requests to the Placement Driver (PD). Data lives on the TiKV nodes tikv1, tikv2, tikv3, and tikv4; Region 1, Region 2, and Region 3 are each stored as three replicas that form a Raft group. PD also drives the control flow: Balance / Failover.]

22.How to convert from SQL to Key-Value
SQL Model:
id (primary) | name (unique) | age (non-unique) | score
1 | Bob | 12 | 99
Key-Value Model:
index_type | key | value
primary_index | 1 | (Bob, 12, 99)
name (unique) | Bob | 1
age (non-unique) | (12, 1) | null
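The mapping on this slide can be written down as a short function. The sketch below is illustrative only: `encode_row` and its tuple-shaped keys are assumptions, and TiDB's real byte-level key encoding is different. It does capture the three cases: the primary index stores the remaining columns, a unique index maps the indexed value to the row id, and a non-unique index appends the row id to the key so that equal values do not collide.

```python
def encode_row(table, row, primary, unique=(), non_unique=()):
    """Return the key-value pairs that one relational row turns into."""
    handle = row[primary]  # the row id taken from the primary key column
    kvs = {}
    # Primary index: row id -> all other columns.
    kvs[(table, "primary_index", handle)] = tuple(
        value for column, value in row.items() if column != primary
    )
    # Unique index: indexed value -> row id.
    for column in unique:
        kvs[(table, column, row[column])] = handle
    # Non-unique index: (indexed value, row id) -> nothing.
    for column in non_unique:
        kvs[(table, column, (row[column], handle))] = None
    return kvs


bob = {"id": 1, "name": "Bob", "age": 12, "score": 99}
bob_kvs = encode_row("t", bob, primary="id", unique=["name"], non_unique=["age"])
```

A single-row insert therefore writes three key-value pairs here, which is why one SQL statement can count as several entries against the transaction limits.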

23.Column Families in RocksDB
Column Family | Key | Value
Data | key, start_ts | value
Lock | key | start_ts, primary_key, ttl
Write | key, commit_ts | start_ts [, short_value]
● Start_ts: timestamp when the transaction begins.
● Commit_ts: timestamp obtained after prewrite and used in commit.
● Primary_key: key used to store the status of the transaction.
● Short_value: a value that is short (length < 64 bytes).
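A read against this layout first consults the Write column family and only then, if needed, the Data column family. The following sketch models each column family as a plain dictionary; `put_committed` and `mvcc_get` are hypothetical helpers, the Lock column family is ignored, and the 64-byte threshold is the one quoted on the slide. It assumes a short value is kept inline in the write record, as the `[, short_value]` field suggests.

```python
SHORT_VALUE_LIMIT = 64  # bytes, as stated on the slide


def put_committed(write_cf, data_cf, key, value, start_ts, commit_ts):
    """Record an already-committed version of `key` (prewrite and locks omitted)."""
    if len(value) < SHORT_VALUE_LIMIT:
        # Short values ride along in the write record.
        write_cf[(key, commit_ts)] = (start_ts, value)
    else:
        data_cf[(key, start_ts)] = value
        write_cf[(key, commit_ts)] = (start_ts, None)


def mvcc_get(write_cf, data_cf, key, read_ts):
    """Return the value of `key` visible at read_ts, or None if there is none."""
    commit_timestamps = [ts for (k, ts) in write_cf if k == key and ts <= read_ts]
    if not commit_timestamps:
        return None
    start_ts, short_value = write_cf[(key, max(commit_timestamps))]
    if short_value is not None:
        return short_value  # no second lookup needed
    return data_cf[(key, start_ts)]
```

The design choice being illustrated is that small values save one lookup per read, at the cost of a slightly larger write record.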

24. 2PC in TiDB

25. Prewrite
Errors:
● WriteConflict (a newer version exists)
● KeyIsLocked

26. Commit
Errors:
● Lock Not Found
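The three errors named on the Prewrite and Commit slides correspond to simple checks against the lock and write columns. The sketch below is a simplified model, not TiKV's code: the dictionary-based store, `new_store`, `prewrite`, and `commit` are assumptions, and TTL handling and lock resolution are left out. A lock can be missing at commit time when, for example, another transaction has already cleaned it up.

```python
class WriteConflict(Exception):
    pass


class KeyIsLocked(Exception):
    pass


class LockNotFound(Exception):
    pass


def new_store():
    # data: (key, start_ts) -> value; lock: key -> (start_ts, primary)
    # write: (key, commit_ts) -> start_ts
    return {"data": {}, "lock": {}, "write": {}}


def prewrite(store, key, value, primary, start_ts):
    # WriteConflict: someone committed this key at or after our start_ts.
    for (k, commit_ts) in store["write"]:
        if k == key and commit_ts >= start_ts:
            raise WriteConflict(key)
    # KeyIsLocked: another transaction currently holds the lock.
    lock = store["lock"].get(key)
    if lock is not None and lock[0] != start_ts:
        raise KeyIsLocked(key)
    store["data"][(key, start_ts)] = value
    store["lock"][key] = (start_ts, primary)


def commit(store, key, start_ts, commit_ts):
    lock = store["lock"].get(key)
    if lock is None or lock[0] != start_ts:
        # A repeated commit that already succeeded is harmless.
        if store["write"].get((key, commit_ts)) == start_ts:
            return
        raise LockNotFound(key)
    store["write"][(key, commit_ts)] = start_ts
    del store["lock"][key]
```

With this model, a second transaction that prewrites a key while the first still holds its lock gets `KeyIsLocked`, and a transaction whose `start_ts` is older than an existing commit gets `WriteConflict`.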

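27.Notes on Using Optimistic Locking
session 1: begin;
session 2: begin;
session 1: select balance from T where id = 1;
session 2: update T set balance=balance - 100 where id = 1;
session 2: update T set balance=balance - 100 where id = 2;
session 2: commit;
session 1: // use the result of select
session 1: if balance > 100 { update T set balance = balance + 100 where id = 2; }
session 1: commit; // auto retry
The automatic retry replays session 1's update without re-running the select, so the application's check on balance is based on a stale read. To avoid this, disable automatic retry:
Set @@global.tidb_disable_txn_auto_retry = 1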
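With automatic retry disabled, the application has to handle the commit failure itself, and the safe way is to re-run the whole transaction body, including the read and the `if`. The sketch below illustrates that pattern against a toy optimistic store; `OptimisticDB`, `OptimisticTxn`, `CommitConflict`, and `run_with_retry` are hypothetical names and not part of any TiDB client library.

```python
class CommitConflict(Exception):
    """A key we wrote was changed by another transaction after we began."""


class OptimisticDB:
    def __init__(self, rows):
        self.rows = dict(rows)                 # id -> balance
        self.version = {key: 0 for key in rows}

    def begin(self):
        return OptimisticTxn(self)


class OptimisticTxn:
    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.rows)          # what this transaction can see
        self.seen = dict(db.version)           # versions at begin time
        self.writes = {}

    def select(self, key):
        return self.writes.get(key, self.snapshot[key])

    def update(self, key, delta):
        self.writes[key] = self.select(key) + delta

    def commit(self):
        # Conflicts are only detected here, at commit time.
        for key in self.writes:
            if self.db.version[key] != self.seen[key]:
                raise CommitConflict(key)
        for key, value in self.writes.items():
            self.db.rows[key] = value
            self.db.version[key] += 1


def run_with_retry(db, body, attempts=3):
    """Run body(txn) in a fresh transaction until it commits or attempts run out."""
    for _ in range(attempts):
        txn = db.begin()
        body(txn)  # re-executes the reads and the application logic every time
        try:
            txn.commit()
            return True
        except CommitConflict:
            continue
    return False
```

In the scenario from the slide, with balances of 150 and 0, the first attempt of session 1 conflicts with session 2's commit. The second attempt reads the new balance of 50, skips the update because the condition no longer holds, and commits cleanly.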

28.Notes on large transactions
Because TiDB commits through a distributed two-phase commit, large transactions that modify data can be particularly problematic:
● Long duration
● More conflicts
● And so on ...
TiDB intentionally sets some limits on transaction sizes to reduce this impact:
● Each Key-Value entry is no more than 6MB
● The total number of Key-Value entries is no more than 300,000
● The total size of Key-Value entries is no more than 100MB
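One common way to stay under these limits is to split a bulk change into several smaller transactions. The sketch below shows only the splitting logic; `split_into_batches` is a hypothetical helper, the limits are interpreted in binary megabytes as an assumption, and entry size is approximated as key length plus value length. Splitting gives up atomicity across batches, so it only fits workloads that tolerate partially applied changes.

```python
MAX_ENTRY_BYTES = 6 * 1024 * 1024      # per Key-Value entry
MAX_ENTRIES = 300_000                  # entries per transaction
MAX_TOTAL_BYTES = 100 * 1024 * 1024    # total size per transaction


def split_into_batches(
    entries,
    max_entries=MAX_ENTRIES,
    max_total=MAX_TOTAL_BYTES,
    max_entry=MAX_ENTRY_BYTES,
):
    """Yield lists of (key, value) byte pairs, each within the given limits."""
    batch, size = [], 0
    for key, value in entries:
        entry_size = len(key) + len(value)
        if entry_size > max_entry:
            raise ValueError("entry exceeds the per-entry limit")
        # Close the current batch before it would break a limit.
        if batch and (len(batch) >= max_entries or size + entry_size > max_total):
            yield batch
            batch, size = [], 0
        batch.append((key, value))
        size += entry_size
    if batch:
        yield batch
```

Each yielded batch would then be written inside its own transaction. The caller passes in key-value entries, not SQL rows, because one row can expand into several entries once its index entries are counted.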

29.Notes on small transactions
With auto_commit, every statement is its own transaction and pays for its own two-phase commit; grouping the statements shares that cost.
# original version with auto_commit
UPDATE my_table SET a='new_value' WHERE id = 1;
UPDATE my_table SET a='newer_value' WHERE id = 2;
UPDATE my_table SET a='newest_value' WHERE id = 3;
# improved version
START TRANSACTION;
UPDATE my_table SET a='new_value' WHERE id = 1;
UPDATE my_table SET a='newer_value' WHERE id = 2;
UPDATE my_table SET a='newest_value' WHERE id = 3;
COMMIT;

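TiDB is a converged database product positioned for Hybrid Transactional/Analytical Processing (HTAP). Its key features include one-click horizontal scaling, strongly consistent multi-replica data safety, distributed transactions, and real-time OLAP.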