PingCAP Infra Meetup 101: Evolution of TiKV (Tang Liu)

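In this talk, Tang Liu first walked through the architectural changes to the whole TiDB cluster in 3.0, chiefly how TiKV + TiFlash together deliver true HTAP. He then covered several TiKV performance optimizations in 3.0, and finally went bottom-up through the TiKV architecture to detail the work planned for the engine, Raft, transaction, scheduler, and other layers.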
1.Evolution of TiKV LiuTang@PingCAP

2.About me ● Chief Engineer of PingCAP ● Leader of TiKV ● Open source projects: go-mysql, go-mysql-elasticsearch, LedisDB, etc. PingCAP.com

3.Agenda ● Architecture ● 3.0 Optimization ● Future Plan

4.Architecture

5.[Architecture diagram of the TiDB ecosystem. Components shown, by group:
● Core: Application via MySQL Protocol, TiDB, DistSQL API, KV API, TiKV, TiFlash, PD Cluster (PD nodes)
● Spark Cluster: Spark Driver, Spark SQL, Workers
● Data Migration: Upstream Database, DM Master, DM Workers, Lightning (KV Importer)
● Backup & Recovery: KV Dumper
● TiDB Binlog: Pumps, Drainer, Downstream Database
● Diagnosis and Monitoring: TiDB Vision, TiDB Insight, Schrodinger
● Deployment: TiDB Operator, TiDB Ansible]

6.TiKV: The whole picture [Diagram: a Client talks over RPC to the Placement Driver (PD 1 to PD 3) and to TiKV nodes 1 to 4, which host Stores 1 to 4. Each Store holds replicas of several Regions (Regions 1 to 5); the replicas of one Region on different Stores form a Raft Group.]

7.[Diagram: the layers of a TiKV node, drawn for three nodes. A Client connects over gRPC; below that sit the Txn API, MVCC, Raft, and RocksDB layers. Prometheus is shown alongside.]

8.What’s new in 3.0

9.Titan [Diagram labels: WAL, Memory Table, Immutable Memory Table, Flush, SST files, vlog files, Compaction (on SSTs), GC (on vlog files).]

10.Raft - Prevote [State diagram. States: Follower, Pre-Candidate, Candidate, Leader. Transition labels: starts up; times out, starts election; times out, new election; receives votes from majority of servers; discovers current leader or new term; discovers server with higher term.]
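
The state diagram above can be sketched as a small state machine. This is a minimal illustration, not TiKV's raft-rs code: the class and method names are hypothetical, and a failed pre-vote simply drops the node back to Follower, which is a simplification. The point it shows is that the term is bumped only after a majority grants the pre-vote.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    PRE_CANDIDATE = "pre-candidate"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class Node:
    """Toy Raft node (hypothetical names), tracking only role and term."""
    cluster_size: int
    term: int = 0
    role: Role = Role.FOLLOWER

    def majority(self) -> int:
        return self.cluster_size // 2 + 1

    def on_election_timeout(self) -> None:
        # With Prevote, timing out does not increase the term yet.
        if self.role is not Role.LEADER:
            self.role = Role.PRE_CANDIDATE

    def on_prevote_result(self, granted: int) -> None:
        # `granted` counts pre-votes, including the node's own.
        if self.role is not Role.PRE_CANDIDATE:
            return
        if granted >= self.majority():
            self.term += 1  # the real election starts only now
            self.role = Role.CANDIDATE
        else:
            self.role = Role.FOLLOWER  # simplification: give up immediately

    def on_vote_result(self, granted: int) -> None:
        if self.role is Role.CANDIDATE and granted >= self.majority():
            self.role = Role.LEADER

    def on_higher_term(self, term: int) -> None:
        # Any role steps down when it sees a higher term.
        if term > self.term:
            self.term = term
            self.role = Role.FOLLOWER
```

A node that is partitioned away keeps failing its pre-vote, so its term stays unchanged and it does not disturb the current leader when it rejoins.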

11.Raft - Learner [Diagram: a Raft group with one Leader, two Followers, and one Learner.]

12.Multi Raft - Threaded Raft [Diagram of TiKV node 1: requests for Regions 1 to 4 come in through the gRPC Pool and are spread across several Raft Store Threads. Other labels: Remote TiKV peer, Append log, RocksDB, Raft Apply Thread Pool.]

13.Multi Raft - Hibernate Region [Diagram: Nodes A, B, and C each hold replicas of Regions 1 to 3. Regions marked Hot exchange Heartbeats between nodes; Regions marked Cold do not.]
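
The hot/cold distinction can be sketched as follows. This is a toy model under stated assumptions: the `RegionPeer` class, the `idle_limit` threshold, and the tick-based wake-up rule are all hypothetical and are not taken from TiKV's implementation.

```python
from dataclasses import dataclass


@dataclass
class RegionPeer:
    """Toy leader peer of one Region (hypothetical model)."""
    region_id: int
    idle_limit: int = 3  # assumed: idle ticks before the Region hibernates
    idle_ticks: int = 0
    hibernated: bool = False

    def on_request(self) -> None:
        # Any read or write wakes the Region and resets the idle counter.
        self.idle_ticks = 0
        self.hibernated = False

    def tick(self) -> bool:
        """Advance one tick; return True if a heartbeat is sent."""
        if self.hibernated:
            return False
        self.idle_ticks += 1
        if self.idle_ticks >= self.idle_limit:
            self.hibernated = True
            return False
        return True


def heartbeats_sent(peer: RegionPeer, ticks: int, busy: bool) -> int:
    """Count heartbeats over `ticks` ticks; a busy Region gets a request per tick."""
    sent = 0
    for _ in range(ticks):
        if busy:
            peer.on_request()
        sent += peer.tick()
    return sent
```

With many Regions per node and most of them cold, skipping these heartbeats removes a large share of idle Raft traffic.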

14.Transaction - Distributed GC
① TiDB resolves locks on TiKV
② TiDB sets the SafePoint in PD
③ TiKV polls the SafePoint from PD
④ TiKV nodes run GC themselves
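
The four steps can be sketched as below. This is a simplified model with hypothetical class names: real lock resolution commits or rolls back each transaction, whereas the toy version just drops old locks, and MVCC data is reduced to a list of `(commit_ts, value)` pairs per key.

```python
class PD:
    """Toy Placement Driver that only stores the GC safe point."""

    def __init__(self):
        self.safe_point = 0

    def set_safe_point(self, ts):
        self.safe_point = max(self.safe_point, ts)  # never moves backwards

    def get_safe_point(self):
        return self.safe_point


class TiKVNode:
    def __init__(self):
        self.locks = {}     # key -> start_ts of the locking transaction
        self.versions = {}  # key -> [(commit_ts, value), ...] in ascending order
        self.local_safe_point = 0

    def resolve_locks(self, safe_point):
        # Step 1 (simplified): clear locks left by transactions at or below the safe point.
        self.locks = {k: ts for k, ts in self.locks.items() if ts > safe_point}

    def poll_and_gc(self, pd):
        # Steps 3 and 4: poll PD, then collect old versions locally.
        safe_point = pd.get_safe_point()
        if safe_point <= self.local_safe_point:
            return 0
        removed = 0
        for key, versions in self.versions.items():
            old = [v for v in versions if v[0] <= safe_point]
            new = [v for v in versions if v[0] > safe_point]
            keep = old[-1:]  # newest version at or below the safe point stays readable
            removed += len(old) - len(keep)
            self.versions[key] = keep + new
        self.local_safe_point = safe_point
        return removed


def tidb_start_gc(pd, nodes, safe_point):
    """Steps 1 and 2, driven by TiDB."""
    for node in nodes:
        node.resolve_locks(safe_point)
    pd.set_safe_point(safe_point)
```

After `tidb_start_gc` returns, TiDB is no longer involved; each node calls `poll_and_gc` on its own schedule.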

15.Coprocessor - Vectorization SELECT SUM(foo) FROM Table WHERE foo>bar [Diagram: a pipeline of Batch Table Scan, Batch Selection (filter foo > bar), and Batch Aggregation (SUM of foo), operating on columns foo and bar a batch at a time. The sample column values are not recoverable from the extracted text.]
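
The scan, selection, and aggregation pipeline for this query can be sketched as follows. The function names, the batch size, and the sample data in the usage note are illustrative assumptions, not the coprocessor's actual interfaces; the sketch only shows each operator consuming a whole batch of column values instead of one row at a time.

```python
from typing import Iterator, List, Optional, Tuple

Column = List[Optional[int]]  # None stands for SQL NULL


def batch_table_scan(foo: Column, bar: Column,
                     batch_size: int) -> Iterator[Tuple[Column, Column]]:
    """Yield the two columns in aligned batches of at most `batch_size` rows."""
    for start in range(0, len(foo), batch_size):
        yield foo[start:start + batch_size], bar[start:start + batch_size]


def batch_selection(foo: Column, bar: Column) -> Column:
    """Keep foo where foo > bar; a comparison involving NULL is not true."""
    return [f for f, b in zip(foo, bar)
            if f is not None and b is not None and f > b]


def batch_sum(acc: Optional[int], foo: Column) -> Optional[int]:
    """Fold one batch into the running SUM; SUM over no rows stays NULL."""
    for f in foo:
        acc = f if acc is None else acc + f
    return acc


def sum_foo_where_foo_gt_bar(foo: Column, bar: Column,
                             batch_size: int = 2) -> Optional[int]:
    acc = None
    for foo_batch, bar_batch in batch_table_scan(foo, bar, batch_size):
        acc = batch_sum(acc, batch_selection(foo_batch, bar_batch))
    return acc
```

For example, with `foo = [10, 1, 42, None, 5]` and `bar = [4, 10, 0, 1, None]`, only the rows with foo values 10 and 42 pass the filter, so the function returns 52.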

16.gRPC - Batch Message [Diagram: TiDB sends requests to each TiKV over a batch request stream and receives results over a batch response stream, tracking in-flight requests in map[req_id]Request.]
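
The role of the `map[req_id]Request` table can be sketched as below. This is a minimal model with no real gRPC involved: `BatchClient` and `fake_tikv` are hypothetical names, and the streams are replaced by plain lists so the ID bookkeeping is visible.

```python
import itertools


class BatchClient:
    """Toy client side of a batch stream (hypothetical, no networking)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.inflight = {}  # req_id -> request, like map[req_id]Request
        self._pending = []  # requests queued for the next batch

    def submit(self, request):
        req_id = next(self._ids)
        self.inflight[req_id] = request
        self._pending.append((req_id, request))
        return req_id

    def flush(self):
        """Return everything queued so far as one batch message."""
        batch, self._pending = self._pending, []
        return batch

    def on_batch_response(self, responses):
        """Match each (req_id, response) pair back to its original request."""
        done = {}
        for req_id, response in responses:
            request = self.inflight.pop(req_id, None)
            if request is not None:  # ignore unknown or duplicate IDs
                done[req_id] = (request, response)
        return done


def fake_tikv(batch):
    """Stand-in server: answers every request, deliberately out of order."""
    return [(req_id, "value-of-" + request) for req_id, request in reversed(batch)]
```

Because responses carry their request ID, the server is free to answer out of order and to pack several responses into one message.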

17.PD - Region storage [Diagram: three PD nodes connected by Raft. Each node keeps meta in etcd and Region information in LevelDB; Region information is copied between nodes by Async Replication.]

18.PD - Store Limit [Diagram labels: Scheduler, create operator, PD, send operator, three TiKV nodes.]

19.Future Plan

20.Engine - Abstraction [Diagram: Node A runs on RocksDB, Node B on LevelDB, Node C on TiFlash.]

21.Engine - RocksDB Guard [Diagram: guards split the key space into [a, b), [b, c), and [c, +∞); SST files at Level 0, Level 1, and Level 2 are drawn against those ranges.]

22.Raft - Follower / Learner read [Diagram: Write and Read go to the Leader; Read is also shown against the Follower and the Learner.]

23.Raft - Chain Replication [Diagram labels: Leader, Follower, Learner, Read.]

24.Raft - Witness [Diagram: a Leader, two Followers, and a Witness. Followers receive Log Replication; the Witness receives Log Meta Replication and is marked Only Vote.]

25.Raft - Joint Consensus

26.Raft - Flexible Raft [Diagram: a Leader and four Followers, grouped into a Write Group and an Election Group.]

27.Multi Raft - Huge Region [Diagram: Region 1 has replicas on Nodes A, B, C, and D. Two replicas cover the full range [a, z); the other two cover the partial ranges [a, b) and [d, e). Labels: Write, Read, Full Read, Partial Read.]

28.Transaction - Pessimistic lock

29.Transaction - 1 PC
[Two sequence diagrams between TiDB, PD, and TiKV.]
● Two-phase commit: Get start_ts, Prewrite, Get commit_ts, Commit
● 1 PC: Get start_ts, Prewrite & Commit (Calculate a valid commit_ts)
When there’s only one Region affected by the transaction...

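TiDB is a converged database product positioned for Hybrid Transactional/Analytical Processing (HTAP). Its key features include one-click horizontal scaling, strongly consistent multi-replica data safety, distributed transactions, and real-time OLAP.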