Horizontally Scale MySQL with TiDB-Avoiding Manual Sharding



1. Horizontally Scale MySQL with TiDB While Avoiding Manual Sharding Peter Zaitsev, CEO, Percona Morgan Tocker, Senior Product and Community Manager, TiDB May 1st, 2019 Percona Technical Webinars © 2019 Percona.

2.In This Presentation • Scaling MySQL • Why and When Sharding Is Needed • Problems to Consider • Solutions TiDB Offers

3.MySQL Scalability (Single Instance)

4.Single MySQL Instance Can Do • Hundreds of Thousands of Queries/Sec • Tens of Thousands of Updates/Sec • Traverse Tens of Millions of Rows/Sec • Comfortably Handle Several TB Database Size

5.Let's Do Some Math • 100,000 QPS • 10 Queries per User Interaction • 10,000 User Interactions/sec • 864,000,000 User Interactions/Day • 30 User Interactions/User Avg • 28,800,000 Daily Active Users Possible • 15M Daily Active Users counting time of day skew
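
The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch: the function name `daily_active_users` is an illustrative choice, not something from the talk, and the time-of-day skew that brings the slide's figure down to 15M is not modeled because the slide does not state the factor used.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_active_users(qps, queries_per_interaction, interactions_per_user):
    """Upper bound on daily active users for one instance, ignoring time-of-day skew."""
    interactions_per_sec = qps / queries_per_interaction
    interactions_per_day = interactions_per_sec * SECONDS_PER_DAY
    return interactions_per_day / interactions_per_user


# Inputs taken from the slide: 100,000 QPS, 10 queries per interaction,
# 30 interactions per user per day.
dau = daily_active_users(100_000, 10, 30)
print(f"{dau:,.0f} daily active users before skew")
```

With the slide's inputs this gives 10,000 interactions per second, 864,000,000 interactions per day, and about 28.8 million users before any skew adjustment.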

6.Is It Enough? • More than enough for small-medium size applications • Not enough for the next Uber or Facebook

7.Additional Worries • Single-threaded query execution means no scalability for complicated queries • Huge instances are painful, especially in the age of Cloud, Containers, Kubernetes

8.Solution Sharding – Splitting the data across multiple instances by some criteria

9.Approach to Sharding • Manual Sharding: application manually places data in the right location • Automatic Sharding: sharding implemented at the database engine level
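
To make "application manually places data in the right location" concrete, here is a minimal sketch of hash-based routing done in application code. The shard names, the choice of `user_id` as the sharding key, and the use of MD5 are all assumptions made for illustration; the talk does not prescribe a particular scheme.

```python
import hashlib

# Hypothetical shard identifiers; a real application would hold connection settings here.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(user_id: int) -> str:
    """Map a sharding key to one shard with a stable hash.

    A stable hash (rather than Python's built-in hash()) keeps the mapping
    identical across processes and restarts.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[bucket]


# Every query for this user must be sent to the shard returned here.
print(shard_for(42))
```

Every code path that touches user data has to call something like `shard_for` first, and changing the number of shards changes the modulo result for most keys, so data has to be moved. Both costs land on the application when sharding is manual.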

10.Sharding Pains • Manual Sharding: increases application complexity; reduces development velocity • Automated Sharding: more complicated database engine; danger of relying on magic

11.Sharding Problems to Consider • Picking Right Sharding Key • Query Routing • Cross Shard Query Execution • Schema Maintenance • Consistent Backups • Cluster Scaling and Shard Balancing
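
Cross-shard query execution is the item on this list that most often surprises application developers. The sketch below uses in-memory SQLite databases as stand-ins for MySQL shards (an assumption made so the example runs anywhere); the `orders` table and its rows are made up for illustration. It computes a global average by gathering partial sums and counts from each shard.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory database standing in for one MySQL shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


# Hypothetical data, already split across three shards.
shards = [
    make_shard([(1, 10), (4, 25)]),
    make_shard([(2, 5), (5, 40)]),
    make_shard([(3, 15)]),
]


def total_and_count(shards):
    """Scatter the aggregate to every shard, then gather the partial results."""
    total = count = 0
    for conn in shards:
        partial_sum, partial_count = conn.execute(
            "SELECT COALESCE(SUM(amount), 0), COUNT(*) FROM orders"
        ).fetchone()
        total += partial_sum
        count += partial_count
    return total, count


total, count = total_and_count(shards)
average = total / count  # correct global average
print(total, count, average)
```

Note that the application cannot simply run `AVG(amount)` on each shard and average the results, because the shards hold different numbers of rows. It has to push down `SUM` and `COUNT` and combine them itself, which is the kind of work a distributed engine or proxy would otherwise do.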

12.Automating Sharding • Application Level: manual sharding done right; custom API; pain for smaller group of backend developers • Proxy: use existing MySQL backend; easy compatibility for routed queries; hard to handle distributed queries optimally • Custom Distributed Engine: can be designed to solve all the sharding problems; compatibility with MySQL is harder

13.Examples for MySQL • Application Level: Hibernate Shards • Proxy: ProxySQL, Vitess • Custom Engine: TiDB

14.Beyond MySQL Have been moving certain workloads off MySQL • Analytical workloads to Spark, Hadoop, ClickHouse, RedShift • Full text search workloads to Elastic and Solr • Document and key-value workloads to MongoDB and Cassandra

15.Polyglot Persistence • Great: allows you to use the best tool for the job • Not So Great: increases complexity in development and operations

16.TiDB Allows you to horizontally scale MySQL workloads and limit technology sprawl

17.Thank You!

18.How to horizontally scale MySQL with TiDB while avoiding sharding issues May 2019 Morgan Tocker, PingCAP (@morgo)

19.Agenda ● History and Community ● Technical Walkthrough ● Use Cases ● MySQL Compatibility ● Benchmarks

20.History and Community ● Founded in 2015 in China ● Ti = Titanium ● Apache 2.0 Licensed ● Storage layer (TiKV) a CNCF project since 2018 ● US Office since 2018 ● Quick Numbers: 700+ Annual Conference Attendees, 300+ Production Deployments, 250+ GitHub Contributors (the TiDB server alone)

22.Introduction ● TiDB is a distributed database that speaks the MySQL protocol ● It is not based on the MySQL source code ● It is an ACID/strongly consistent database ● The inspiration is Google Spanner/F1 ● It separates SQL processing and storage into separate components ● Both of them are independently scalable ● The SQL processing layer is stateless ● It is designed for both Transaction and Analytical Processing (HTAP)

23. [Architecture diagram: two TiDB servers and a PD Cluster sit in front of a TiKV Cluster of three nodes (TiKV Node 1, TiKV Node 2, TiKV Node 3). Each node holds a replica of Regions 1 through 4, and one replica of each Region is marked L (leader).]
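
The Regions in this diagram are contiguous key ranges, which is what lets the cluster route a request without the application knowing about shards. The following is a simplified sketch of such a range lookup, not TiDB or PD code: the split keys, the leader placement, and the function name `locate` are all hypothetical values chosen for illustration.

```python
import bisect

# Hypothetical split points. Region 1 covers keys below b"g", Region 2 covers
# [b"g", b"n"), Region 3 covers [b"n", b"t"), Region 4 covers everything from b"t" up.
SPLIT_KEYS = [b"g", b"n", b"t"]
REGIONS = ["Region 1", "Region 2", "Region 3", "Region 4"]

# Hypothetical leader placement; in a real cluster this metadata is tracked centrally.
LEADERS = {
    "Region 1": "TiKV Node 1",
    "Region 2": "TiKV Node 3",
    "Region 3": "TiKV Node 2",
    "Region 4": "TiKV Node 3",
}


def locate(key: bytes):
    """Return the Region owning a key and the node holding that Region's leader."""
    region = REGIONS[bisect.bisect_right(SPLIT_KEYS, key)]
    return region, LEADERS[region]


print(locate(b"apple"))
print(locate(b"monkey"))
```

Because ranges rather than hash buckets define ownership, a Region that grows too large can be split at a new key, and a replica can be moved to another node, without rehashing the rest of the data.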

24.Row + Column Storage (Announced Jan 2019) [Diagram: a Spark Cluster of TiSpark Workers and the TiDB servers sit above two storage clusters: the TiKV Cluster (TiKV Nodes 1 to 3, holding replicas of Regions 1 to 4) and the TiFlash Extension Cluster (TiFlash Nodes 1 and 2).]

25.TiDB: The SQL Layer [Diagram: a MySQL client, any ORM which supports MySQL, or ODBC/JDBC connects over the MySQL Network Protocol to TiDB, where a request passes through the SQL Parser, the Cost-based Optimizer, and the Distributed Executor (Coprocessor) before reaching TiKV (Node1 to Node4).]

26.TiKV: The Storage Foundation [Diagram: the Client and the PD Cluster talk to three TiKV Instances over gRPC. Each instance exposes a Txn KV API and a Coprocessor API on top of Transaction, Raft, and RocksDB layers; the Raft layers of the three instances form a Raft Group.]

27.Migration (in and out of TiDB) [Diagram: DM feeds the MySQL binlog from MySQL instances into TiDB; Lightning loads SQL dump files into TiDB; Binlog carries changes out of TiDB.]

29.Use Cases 1. Approaching the maximum size for MySQL on a single server. Debating whether or not to shard. 2. Already sharded MySQL, but having a hard time doing analytics on up-to-date data.