Millions of Core Banking online payment transactions with TiDB

本PPT解释了作为支持交易型分布式数据库系统的TiDB核心产品架构及其主要组件,包括TiDB,TiKV,Placement Driver,TiSpark,TheFlash,Tool,TiDB-operator for k8s等,对其基本作用进行阐述,并对其中的核心组件TiKV重点分析,解释了基本数据组织方式,执行方式,数据管理,水平扩展和负载均衡,以及分布式一致性等基本问题。最好对其分析引擎TiSpark也进行了简要功能说明。

1. Millions of core banking online payment transactions with TiDB Yu Jun

2.About me Yu Jun ( 余军 ), Senior Solution Architect , PingCAP CTO@BSG China/ AE@Teradata / Red Hat/ Hewlett-Packard Distributed & availability/FOSS/FSI Industry 20+ years as an infrastructure engineer, open source advocate

3.TiDB Stras Increasing Trend TiDB Contributors Increasing Trend



6.TiDB platform NewSQL: the best features of both RDBMS and NoSQL Full-featured SQL MySQL compatibility ACID compliance HA with strong consistency Elastic scalability HTAP Serve both OLTP & Real-time OLAP HTAP

7.TiDB Architecture TiDB TiDB Worker Spark Driver TiKV Cluster (Storage) Metadata TiKV TiKV TiKV MySQL Clients Syncer Data location Job TiSpark DistSQL API TiKV TiDB TSO/Data location Worker Worker Spark Cluster TiDB Cluster TiDB ... ... ... DistSQL API PD PD PD Cluster TiKV TiKV TiDB PD

8.Components TiDB (tidb-server) TiKV (tikv-server) Placement Driver (PD) TiSpark ( Analytic Engine) TheFlash (Cloumn based Analytic Engine) Tools (syncer / TiDB-Lightning / {tikv,pd}-ctl) TiDB-operator for Kubernetes

9.TiDB (tidb-server) Stateless SQL layer Client can connect to any existing tidb-server instance TiDB * will not * re-shuffle the data across different tidb-servers Full-featured SQL Layer Speak MySQL wire protocol Why not reusing MySQL? Homemade parser & lexer RBO & CBO Secondary index support DML & DDL (online, non-blocking) SQL AST Logical Plan Optimized Logical Plan Cost Model Selected Physical Plan TiKV TiKV TiKV tidb-server Statistics TiKV TiKV TiKV TiKV Cluster

10.Table mapping - From SQL to KV What happens behind: CREATE TABLE user ( id INT PRIMARY KEY, name TEXT, email TEXT );

11.Table mapping - From SQL to KV Key Value user/1 bob | user/2 tom | ... ... INSERT INTO user VALUES (1, “bob”, “”); INSERT INTO user VALUES (2, “tom”, “”); Inside TiKV SQL

12.Table mapping - Secondary index Global index All indexes in TiDB are transactional and fully consistent Stored as separate key-value pairs in TiKV Keyed by a concatenation of the index prefix and primary key in TiKV For example: table := {id, name, email} , id is primary key. If we want to build an index on the name column, for example we have a row: (2, ‘tom’, ‘’) , we could store another kv pair just like: idx:name/ tom_2 => nil idx:name/tom_3 => nil For unique index idx:name/ tom => 2

13.TiKV (tikv-server) The storage layer for TiDB Distributed Key-Value store Support ACID Transactions Replicate logs by Raft Range partitioning Split / merge dynamically Support coprocessor for SQL operators pushdown TiKV TiKV TiKV TiKV TiKV TiKV PD PD PD Placement Driver TiKV TiKV TiKV TiKV TiKV TiKV TiKV Nodes TiDB Nodes Metadata Dataflow

14.TiKV (tikv-server) - Physical stack Highly layered TiKV API (gRPC) Transaction MVCC Multi- Raft (gRPC) RocksDB Raw KV API Transactional KV API

15.TiKV (tikv-server) - Logical view (1/2) Stores Key-Value pairs Infinite sorted (in byte-order) Key-Value map Key space is split into regions (Range-based) dynamically , like HBase Metadata: [start_key, end_key) Each region has multiple replicas (default 3) across different physical nodes, data is replicated by Raft All regions in the same node share the same RocksDB instance TiKV Key Space [ start_key, end_key) (-∞, +∞) Sorted Map 96 MB

16.TiKV (tikv-server) - Logical view (2/2) Region 1 Region 2 Region 3 Region 1 Region 2 Region 3 Region 1 Region 2 Region 3 Raft Group 1 Raft Group 2 Raft Group 3 A - D D - H H - K Key Space ... ... TiKV A TiKV B TiKV C

17.TiKV (tikv-server) - Region split & merge Region A Region A Region B Region A Region A Region B Split Region A Region A Region B Merge Node 2 Node 1 Region splitting and merging affect all replicas of one region. The correctness and consistency are guaranteed by Raft .

18.TiKV (tikv-server) - Scaling & Rebalancing Region 1 Region 3 Region 1 Region 2 Region 1* Region 2 Region 2 Region 3 Region 3 Node A Node B Node C Node D

19.TiKV (tikv-server) - Scaling & Rebalancing Region 1 Region 3 Region 1^ Region 2 Region 1* Region 2 Region 2 Region 3 Region 3 Node A Node B Node E 1) Transfer leadership of region 1 from Node A to Node B Node C Node D

20.TiKV (tikv-server) - Scaling & Rebalancing Region 1 Region 3 Region 1* Region 2 Region 2 Region 2 Region 3 Region 1 Region 3 Node A Node B 2) Add Replica to Node E Node C Node D Node E Region 1

21.TiKV (tikv-server) - Scaling & Rebalancing Region 1 Region 3 Region 1* Region 2 Region 2 Region 2 Region 3 Region 1 Region 3 Node A Node B 3) Remove Replica from Node A Node C Node D Node E

22.TiKV (tikv-server) - Transaction Inspired by Google Percolator Large-scale Incremental Processing Using Distributed Transactions and Notifications 2010 OSDI Classical 2 phase commit , nothing fancy Almost decentralized An ACID compliant layer on top of a Key-Value system

23.TiKV (tikv-server) - Transaction Client 1 Client 2 Client 3 GetTs 12:25 12:26 12:27 GetTs GetTs TSO (TimeStamp Oracle) TSO allocates monotonically increasing timestamp

24.Inject your own logic to TiKV nodes Expose new RPC APIs Used for SQL predicate/aggregation pushdown TiKV (tikv-server) - Coprocessor

25.TiKV (tikv-server) - Coprocessor Partial Aggregate COUNT(c1) Filter c2 = “ foo ” Read Index idx1: (10, +∞) Physical Plan on TiKV (index scan) Read Row Data by RowID RowID Row Row Final Aggregate SUM(COUNT(c1)) DistSQL Scan Physical Plan on TiDB COUNT(c1) COUNT(c1) TiKV TiKV TiKV COUNT(c1) COUNT(c1) SELECT COUNT(c1) FROM t WHERE c1 > 10 AND c2 = ‘foo’ ;

26.Placement Driver (PD) Concept from the paper of Spanner The brain of the TiKV cluster Timestamp allocator Metadata storage Replica scheduling PD PD PD Raft Raft Metadata

27.Placement Driver (PD) Region A Region B TiKV Node 1 TiKV Node 2 PD Scheduling Strategy Cluster Info Admin HeartBeat Scheduling Command Region C Config Movement

28.Placement Driver (PD) - Schedulers PD TiKV TiKV TiKV Store Heartbeat Region Heartbeat Add Replica Remove Replica Transfer Leader ... Schedule Operator PD(M) PD

29.Placement Driver (PD) - Schedulers (Region) Use size for calculation R1 - 0 MB R2 - 0 MB R3 - 0 MB R4 - 64 MB R5 - 64 MB R6 - 96 MB R1 - 0 MB R5 - 64 MB R3 - 0 MB R4 - 64 MB R2 - 0 MB R6 - 96 MB After scheduling