TiDB Architecture and Practice

Shen Li
VP of Engineering, PingCAP

About Me
● Shen Li (申砾)
● VP of Engineering @ PingCAP
● Infrastructure Engineer / Open-source advocate
● Netease/360/PingCAP
● Tech lead of TiDB
● shenli@pingcap.com

About PingCAP
● Since 2015
● Offices @ Beijing, Shanghai, Guangzhou, Hangzhou, Chengdu, Shenzhen, Silicon Valley
● Open-Source infrastructure software
● TiDB, TiKV, TiSpark, TiDB-Operator

Agenda
● Introduction
● Architecture
● Evolution
● TiDB in Real World

Why we want to build a "New"SQL Database
● From the beginning
● RDBMS
● How to scale?
○ Middleware
○ NoSQL
● NewSQL: F1 & Spanner
[Timeline: 1970s, 2010, 2015, Present]
● RDBMS: MySQL, PostgreSQL, Oracle, DB2...
● NoSQL: Redis, HBase, Cassandra, MongoDB
● NewSQL: Google Spanner, Google F1, TiDB

What's TiDB?
TiDB - A Distributed, Consistent, Scalable, SQL Database that supports the best features of both traditional RDBMS and NoSQL.
● Built from scratch
● Key features:
○ Horizontal Scalability
○ High Availability
○ ACID Transaction
○ SQL at Scale

Open Source
● From day one
● Active community
● 300+ contributors
One of the most popular open-source distributed relational databases in the world!

TiDB Architecture
[Architecture diagram; components and labels]
● TiDB Cluster: multiple TiDB servers, accessed by MySQL Clients
● TiKV Cluster (Storage): multiple TiKV nodes, reached through the DistSQL API
● PD Cluster: PD nodes; labeled flows are TSO/Data location, Data location, and Metadata
● TiSpark: Spark Driver and Workers (Spark Cluster); labeled flows are Job and DistSQL API
● Syncer
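
The diagram labels PD as the source of TSO (timestamps) for the TiDB servers. A minimal sketch of how such a timestamp oracle can hand out strictly increasing timestamps follows. It assumes a layout of physical milliseconds in the high bits and an 18-bit logical counter in the low bits; the class `ToyTSO`, the helper `split_ts`, and the injectable clock are hypothetical names for illustration and are not PD's API.

```python
import time

LOGICAL_BITS = 18  # assumed width of the logical counter


class ToyTSO:
    """Toy timestamp oracle: (physical ms << LOGICAL_BITS) | logical counter."""

    def __init__(self, clock=lambda: int(time.time() * 1000)):
        self._clock = clock   # injectable clock, so the sketch can be tested
        self._physical = 0
        self._logical = 0

    def next_ts(self):
        now = self._clock()
        if now > self._physical:
            # Wall clock moved forward: adopt it and restart the counter.
            self._physical = now
            self._logical = 0
        else:
            # Same millisecond, or the clock went backwards: bump the counter.
            self._logical += 1
            if self._logical >= (1 << LOGICAL_BITS):
                # Counter exhausted: advance the physical part instead.
                self._physical += 1
                self._logical = 0
        return (self._physical << LOGICAL_BITS) | self._logical


def split_ts(ts):
    """Split a timestamp back into (physical_ms, logical)."""
    return ts >> LOGICAL_BITS, ts & ((1 << LOGICAL_BITS) - 1)
```

Because the counter is bumped whenever the clock stalls or steps back, every returned timestamp is larger than the previous one, which is the property a transaction layer needs from a single ordering source.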

The SQL Layer
● Stateless SQL layer
○ Client can connect to any existing tidb-server instance
○ TiDB *will not* re-shuffle the data across different tidb-servers
● Full-featured SQL Layer
○ Speaks the MySQL wire protocol
■ Why not reuse MySQL?
○ Homemade parser & lexer
○ Secondary index support
○ DML & DDL
[Diagram: inside tidb-server, SQL → AST → Logical Plan → Optimized Logical Plan → Selected Physical Plan, using RBO & CBO with Statistics and a Cost Model; the selected plan runs against the TiKV Cluster]
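
The step from statistics and a cost model to a selected physical plan can be sketched as below. This is a toy illustration only: the two candidate plans, the per-row cost factors, and the uniform-distribution estimate are assumptions made for the example and are not TiDB's actual cost model.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    row_count: int         # total rows in the table
    distinct_values: int   # distinct values in the indexed column


# Hypothetical per-row cost factors (not TiDB's real numbers).
SCAN_COST_PER_ROW = 1.0
INDEX_LOOKUP_COST_PER_ROW = 4.0   # index read plus fetching the row


def estimate_rows_eq(stats):
    """Estimated rows matching `col = const`, assuming a uniform distribution."""
    return stats.row_count / max(stats.distinct_values, 1)


def choose_physical_plan(stats):
    """Return (plan_name, cost) of the cheapest candidate for `col = const`."""
    candidates = {
        "TableScan": stats.row_count * SCAN_COST_PER_ROW,
        "IndexLookup": estimate_rows_eq(stats) * INDEX_LOOKUP_COST_PER_ROW,
    }
    name = min(candidates, key=candidates.get)
    return name, candidates[name]
```

With a selective column the index lookup wins; with a column that has only a couple of distinct values the full scan is cheaper, which is why the optimizer needs statistics rather than a fixed rule.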

The Storage Layer (1/2)
● The storage layer for TiDB
● Distributed Key-Value storage engine
○ Supports ACID transactions
○ Replicates logs with Raft
○ Range partitioning
■ Split / merge dynamically
○ SQL operators pushdown
[Diagram: the Client exchanges Metadata with the Placement Driver (PD nodes) and Dataflow with the TiKV Nodes]
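
Range partitioning with dynamic splitting can be sketched as follows. This is a toy, in-memory model under stated assumptions: the class `RegionMap`, the split threshold, and splitting at the median key are hypothetical choices for illustration, and merging is left out.

```python
import bisect


class RegionMap:
    """Toy range-partitioned key space: a Region covers [its start, next start)."""

    def __init__(self, max_keys_per_region=4):
        self.starts = [""]     # sorted Region start keys; "" is the lowest key
        self.keys = [[]]       # sorted keys held by each Region
        self.max_keys = max_keys_per_region

    def locate(self, key):
        """Index of the Region whose range contains `key`."""
        return bisect.bisect_right(self.starts, key) - 1

    def put(self, key):
        i = self.locate(key)
        region = self.keys[i]
        pos = bisect.bisect_left(region, key)
        if pos == len(region) or region[pos] != key:
            region.insert(pos, key)
        if len(region) > self.max_keys:
            self._split(i)

    def _split(self, i):
        """Split Region i at its median key into two adjacent Regions."""
        region = self.keys[i]
        mid = len(region) // 2
        self.starts.insert(i + 1, region[mid])
        self.keys[i:i + 1] = [region[:mid], region[mid:]]
```

Because Regions are contiguous key ranges, locating a key is a binary search over the start keys, and a split only touches the one Region that grew too large.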

The Storage Layer (2/2)
[Diagram: the Client talks over RPC to TiKV node 1 to TiKV node 4 (Store 1 to Store 4) and to the Placement Driver (PD 1, PD 2, PD 3). Regions 1 to 5 are replicated across the stores; the replicas of one Region form a Raft Group.]
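
Each Raft Group in the diagram replicates one Region's log across several stores. A minimal sketch of the majority rule a Raft leader uses to decide how far the log is committed is shown below. The function names are hypothetical, and the sketch leaves out the additional Raft requirement that the entry belongs to the leader's current term.

```python
def quorum(replicas):
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def commit_index(match_index):
    """Highest log index stored on a majority of replicas.

    match_index holds, for every replica (leader included), the highest
    log index known to be stored on that replica.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[quorum(len(match_index)) - 1]
```

For example, with three replicas at log indexes 7, 5, and 3, index 5 is the highest one present on two of the three, so a write up to index 5 survives the loss of any single store.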

Ecosystem Tools

HTAP
● Hybrid transactional/analytical platform
● Real-time Data
● No more ETL!
● Cascades Optimizer
● Vectorized+Parallel Execution Engine
● Row-Column mixed Storage Engine
● Isolation of resources

Cloud Native
[Diagram: TiDB Operator on Kubernetes]
● TiDB Operator
○ TiDB Controller Manager: TiDB Cluster Controller, with PD Controller, TiKV Controller, and TiDB Controller
○ TiDB Scheduler: Kube Scheduler and TiDB Scheduler
● Kubernetes Core: Scheduler, Controller Manager, API Server
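
The controllers in the diagram follow the general Kubernetes controller pattern: compare the desired state with the observed state and act on the difference. A minimal sketch of that reconcile step is below; the function `reconcile`, the action names, and the dictionary shapes are hypothetical and are not TiDB Operator's code or API.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` replica counts to `desired`.

    Both arguments map a component name (for example "pd", "tikv", "tidb")
    to a replica count.
    """
    actions = []
    for component in sorted(desired):
        want = desired[component]
        have = actual.get(component, 0)
        if have < want:
            actions.append(("scale_out", component, want - have))
        elif have > want:
            actions.append(("scale_in", component, have - want))
    return actions
```

A real controller runs this comparison in a loop against the API server, so a failed or missing replica shows up as a difference and is corrected on the next pass.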

Say goodbye to sharding

Real-time Data Platform
● Real-time
● Data Convergence Platform
● Middle-End System

Geo Replication

Q&A
Thank You!
https://github.com/pingcap/tidb
https://github.com/tikv/tikv
https://github.com/pingcap/pd
https://github.com/pingcap/tispark
https://github.com/pingcap/tidb-operator
https://github.com/pingcap/docs
https://github.com/pingcap/docs-cn