- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
An Introduction To TiDB SQL Layer
展开查看详情
1 .An Introduction To TiDB SQL Layer zhangjian@pingcap.com Tech Leader of TiDB SQL Engine Team
2 .Agenda ● TiDB Overview ● The SQL Layer ○ Query Optimizer ○ Statistics ○ Execution Engine ● What’s Next
3 .Part I - TiDB Overview
4 .What is TiDB? A Distributed Relational Database that speaks MySQL Protocol ● MySQL Compatibility ● Distributed Transaction ● High Availability ● Scalability ● HTAP
5 .TiDB Architecture PD PD TSO/Data location Data location PD PD Cluster Metadata Spark Driver TiDB MySQL Clients TiKV TiKV Job TiDB DistSQL API DistSQL API Worker TiDB TiKV TiKV Syncer Worker TiDB TiKV TiKV Worker TiDB ... ... ... Spark Cluster TiDB Cluster TiKV Cluster (Storage) TiSpark
6 .Part II - TiDB SQL Layer
7 .TiDB SQL Layer listener SQL Core Layer Packet Privilege Manager SQL AST parser validator Logical AST Connection Schema Manager Plan Context Physical Logical Optimize Optimize Command Session DDL Worker Protocol SQL Context Physical Decode Statistics Plan Data GC Worker Feedback Protocol TiDB Distributed encode Executor Data Coprocessor BG Job Worker Data Data Data Protocol Layer TiKV
8 .TiDB SQL Layer SQL Core Layer Privilege Manager SQL AST parser validator Logical AST Schema Manager Plan Physical Logical Optimize Optimize Session DDL Worker Context Physical Statistics Plan GC Worker Feedback TiDB Distributed Executor Data Coprocessor BG Job Worker Data Data TiKV
9 .TiDB SQL Layer Key components: SQL Core Layer ● Query Optimizer SQL AST parser validator ● Statistics Logical AST ● TiDB Executor Physical Plan Logical Optimize Optimize Physical Statistics Plan Feedback TiDB Distributed Data Executor Data Coprocessor Data TiKV
10 .Part III - Query Optimizer
11 .Query Optimizer Find a reasonable plan in a reasonable time ● Operator Pushdown ● Access Path Selection ● Join Order/Algorithm Selection ● Subquery Evaluation
12 .Query Optimizer Phase 1: Logical Optimization ● Equally logical transformation ● transformation rules should always be beneficial What rules does TiDB have?
13 .Query Optimizer/Logical Optimization ● Column Pruning ● Partition Pruning ● Group By Elimination ● Max/Min Eliminatation ● Project Emination ● Outer Join Elimination ● Outer Join Simplification ● Subquery Decorrelation ● Predicate Push Down ● Aggregate Push Down ● TopN/Limit Push Down ● Join Reordering ● ...
14 .Query Optimizer/Outer Join Elimination Outer Join Elimination, parent operator only wants outer columns ● Join Key on inner side is unique select t1.* from t1 left join t2 on t1.a=t2.unique_key select * from t1;
15 .Query Optimizer/Join Reorder Join Reorder ● extract join nodes ● apply a DP algorithm if join group size is small ● apply a greedy algorithm otherwise
16 .Query Optimizer Phase 1: Logical Optimization Phase 2: Physical Optimization
17 .Query Optimizer/Physical Optimization Physical Property: ● task type ● data order ● data distribution(in the future) A Dynamic Programming Progress: ● (Logical Plan, Required Physical Property) -> Physical Plan ● A top-down search approach ● Memorization
18 .Part IV - Statistics
19 .Statistics Cardinality Estimation ● WHERE t.col > 0 ● ON t1.col1 = t2.col2 ● GROUP BY t.col1, t.col2 What kinds of statistics does TiDB need?
20 .Statistics/Histograms What kinds of statistics does TiDB need? ● Equi-depth Histogram
21 .Statistics/Histograms What kinds of statistics does TiDB need? ● Equi-depth Histogram ● Count-Min Sketch
22 .Statistics/Histograms What kinds of statistics does TiDB need? ● Equi-depth Histogram ● Count-Min Sketch How is these stats info collected?
23 .Statistics/Histograms What kinds of statistics does TiDB need? ● Equi-depth Histogram ● Count-Min Sketch How is these stats info collected? ● ANALYZE command ● Auto Analyze ● Query Feedback
24 .Part V - Execution Engine
25 .Execution Engine Evolved from Iterator Model to Vectorized/Batch Model ● Iterator Model ○ Volcano model ○ One tuple per Next() ● Vectorized/Batch Model ○ Based on the Iterator Model ○ Multiple tuples at a time
26 .Execution Engine Chunk input = child.Next() output = proj(input, t.a+t.b) return output π (t.a+t.b) Chunk select t.a+t.b while |output|<batchSize: from t ρ (t.c>10) input = child.Next() where t.c>10 output += filter(input, t.c>10) return output t2 Chunk while |output|<batchSize: output += scan(t) return output
27 .Execution Engine ● TiDB 1.0: Iterator Model ● Since TiDB 2.0: Vectorized/Batch Model
28 .Execution Engine ● TiDB 1.0: Iterator Model ● Since TiDB 2.0: Vectorized/Batch Model
29 .Execution Engine Evolved from Iterator Model to Vectorized/Batch Model Intra-Operator Parallelism ● Hash Join, Index Join, Hash Aggregate, etc. ● Under Development: https://github.com/pingcap/tidb/projects/11