- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
How to build a ‘self-driving’ database
展开查看详情
1 .How to build a Self-Driving database An overview Dongxu
2 .Who am I ! Dongxu Huang ! CTO & Cofounder, PingCAP ! Distributed system engineer / Open source advocator ! h@pingcap.com ! TiDB / TiKV / TiSpark
3 .Self-Driving?
4 .
5 .
6 .Why now? ! Better hardware ! Better tools ! Workload is different
7 .SQL Tuning
8 .
9 .
10 .Imagine we got a logical plan: Its physical plan could be: or:
11 .Cost estimation Network cost Memory cost CPU cost In TiDB, the default memory factor is 5 and CPU factor is 0.8. For example: Operator Sort(r), its cost would be:
12 .Data placement
13 .Problem: Uneven Data Distribution ! Dealing with hotspot ○ Choose a wrong sharding key ! Inefficient usage ○ Some are busy ○ Some are idle ! Caused by the nature of RDBMS
14 .A little bit about how data is organized in TiDB PD PD TSO/Data location Data location PD PD Cluster Metadata Spark TiDB Driver TiKV TiKV DistSQL API DistSQL API Job TiDB Application TiDB Worker SparkSQ KV API TiKV TiKV L Syncer Worker TiDB TiKV TiKV Worker TiDB Spark Cluster TiDB TiKV Cluster Cluster (Storage) TiSpark
15 . Any ORM which ODBC/JDBC MySQL Client supports MySQL MySQL Network Protocol SQL Parser TiDB Cost-based Optimizer Distributed Executor (Coprocessor) TiKV Node1 Node2 Node3 Node4
16 .Why Raft? ! Saft split/merge ! Self-healing Start Follower ! Easy to implement Election Timeout, Start new election. Receive higher Find leader or term msg receive higher term msg Election, re- Candidat campaign Leader e Receive majority vote
17 . Table mapping: Rows => Key-Value pairs (-∞, a) Dataset [a, b) (b, +∞)
18 .Data movement Step 1:
19 .Data movement Step 2:
20 .Placement Driver: The scheduler
21 .
22 .Only by Region count? Size also matters.
23 .Hot Balance
24 .Scheduler - More ! More… ○ Weight Balance - High-weight TiKV will save more data ○ Evict Leader Balance - Some TiKV node can’t have any Raft leader ! OpInfluence - Avoid over frequent balancing
25 .Autonomy
26 .
27 .Our work: Oasis
28 .Isolation Forest Start API ! Anomaly detection New Job ! Easy to implement ! Paper IForest Get Metrics From Prometheus Model Features Train Error Predict Alert Report Finish
29 .Related work - OtterTune ! Database Tuning-as-a-Service ○ Automatically generates DBMS knob configurations ○ Reuse data from previous tuning sessions ! Supported systems ○ PostgreSQL ○ MySQL ○ Greenplum ○ Vectorwise