- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
PingCAP-Infra-Meetup-95-yaokenan-Introduction-of-TiDB-SQL-Layer
展开查看详情
1 . Welcome! 加入 Infra Meetup No.95 交流群 和大家一起讨论吧~ 我们将分享本期 Meetup 资料
2 .Introduction of TiDB SQL Layer Presented by Kenan Yao
3 .Agenda ● TiDB Overview ● TiDB SQL Layer ○ Query Optimizer ○ Statistics ○ Execution Engine
4 .Part I - TiDB Overview
5 .What is TiDB? A distributed relational database that speaks MySQL protocol ● MySQL compatibility ● Distributed transaction ● High availability ● Scalability ● HTAP PingCAP.com
6 .TiDB Architecture PD PD TSO/Data location Data location PD PD Cluster Metadata Spark Driver TiDB MySQL Clients TiKV TiKV Job TiDB DistSQL API DistSQL API Worker TiDB TiKV TiKV Syncer Worker TiDB TiKV TiKV Worker TiDB ... ... ... Spark Cluster TiDB Cluster TiKV Cluster (Storage) TiSpark PingCAP.com
7 .Part II - TiDB SQL Layer
8 .TiDB Layer listener SQL Core Layer Packet Privilege Manager SQL AST parser validator Logical AST Connection Schema Manager Plan Context Physical Logical Optimize Optimize Command Session DDL Worker Protocol SQL Context Physical Decode Statistics Plan Data GC Worker Feedback Protocol TiDB Distributed encode Executor Data Coprocessor BG Job Worker Data Data Data Protocol Layer TiKV PingCAP.com
9 .TiDB SQL Layer Key components: SQL Core Layer ● Query Optimizer SQL AST parser validator ● Statistics Logical AST ● Execution Engine Physical Plan Logical Optimize Optimize Physical Statistics Plan Feedback TiDB Distributed Data Executor Data Coprocessor Data TiKV PingCAP.com
10 .Part III - Query Optimizer
11 .Query Optimizer Find a reasonable plan in reasonable time ● Operator Pushdown ● Access Path Selection ● Join Order/Algorithm Selection ● Aggregation Algorithm Selection ● Subquery Optimization ● ... PingCAP.com
12 .Execution Plan Example Example select t1.key, t2.val π t1.key, t2.val from t1, t2 where t1.key=t2.key ⋈ t1.key=t2.key and t2.val>22; ρ t2.val > 22 t1 t2 PingCAP.com
13 .Try PointGet Plan For simple point get queries 1. Single table SELECT/UPDATE/DELETE 2. Filters should be point get, e.g, unique_key = xx Benefit 1. Reduce the query optimization overhead 2. The PointGet executor is also very efficient
14 .PlanBuilder Build logical plan tree bottom-up from input AST ● Add flags for later logical optimization ● Optimization starts from here ○ Fold constant ○ Eliminate IFNULL(pk, 0 ,1) ○ Subquery handling ○ ...
15 .Query Optimizer Phase 1: Logical Optimization / RBO ● Logical ● Equal ● Beneficial Phase 2: Physical Optimization / CBO
16 .Logical Optimization Apply applicable rules in order ● Column Pruning ● Outer Join Simplification ● Partition Pruning ● Subquery Decorrelation ● Group By Elimination ● Predicate Push Down ● Max/Min Eliminatation ● Aggregate Push Down ● Project Elimination ● TopN/Limit Push Down ● Outer Join Elimination ● Join Reorder ● ...
17 .Logical Optimization Example 1: Outer Join Simplification & Predicate Push Down select * Filter: Filter: Inner Join: t1.val > 11 t1.val > 11 from t1 t1.key = t2key t2.val > 22 t2.val > 22 left join t2 on t1.key = t2.key where t1.val > 11 Left Join: Inner Join: Filter: Filter: and t2.val > 22; t1.key = t2.key t1.key = t2.key t1.val > 11 t2.val > 22 t1 t2 t1 t2 t1 t2 PingCAP.com
18 .Logical Optimization Example 2: Aggregation Push Down Aggregate: Aggregate: group by: t1.key group by: t1.key select sum(t1.val) sum(t1.val) sum(t1.val) from t1, t2 on t1.key = t2.key group by t1.key; Inner Join: Inner Join: t1.key = t2.key t1.key = t2.key t1 t2 Aggregate: group by: t1.key t2 sum(t1.val) t1 PingCAP.com
19 .Logical Optimization Example 3: Outer Join Elimination ● Cond 1: parent operator only needs outer columns ● Cond 2: ○ Cond 2-1: join key on inner side is unique select t1.* from t1 left join t2 on t1.a=t2.unique_key select * from t1;
20 .Logical Optimization Example 3: Outer Join Elimination ● Cond 1: parent operator only needs outer columns ● Cond 2: ○ Cond 2-1: join key on inner side is unique ○ Cond 2-2: parent conly needs distinct outer columns select sum(distinct t1.a) from t1 left join t2 on t1.b=t2.b group by t1.b; select sum(distinct t1.a) from t1;
21 .Logical Optimization Example 4: Join Reorder ● extract join nodes ● pull up statistics ● apply a bottom-up DP algorithm for small join group use bitmask to represent join nodes 1. 6(110) represents a join group with node 1 and 2 2. f[6] represents the best join order for join group {1, 2} 3. f[group] = min{join(f[sub], f[group-sub])}
22 .Logical Optimization Example 4: Join Reorder ● extract join nodes ● pull up statistics ● apply a bottom-up DP algorithm for small join group ● apply a greedy algorithm otherwise Join 600 Join Join Join t2 800 200 200 100 t1 t1 t2 t1 t3 t1 t3 10 10 100 10 100 10 100 t1 t2 t3 10 100 100
23 .Query Optimizer Phase 1: Logical Optimization Phase 2: Physical Optimization
24 .Physical Optimization start None None Sort NominalSort select t1.a from t1 None (t1.c1) (t1.c1) join t2 on t1.c1 = t2.c1 order by t1.c1 HashJoin MergeJoin IndexJoin (t2.c1) (t1.c1) None None (t1.c1) None None IndexScan TableScan IndexScan TableScan t1 t1 t2 t2
25 .Physical Optimization Goals: ● Which index should be used? ● Physical operator implementation ○ Hash or Stream Aggregate? ○ Hash Join, Merge Join or Index Lookup Join? ● Split the Root Task and Coprocessor Task ● Push Agg/Limit/TopN to coprocessor? PingCAP.com
26 .Physical Optimization Physical Property: ● task type ● data order ● expected count A Dynamic Programming Progress: ● (Logical Plan, Required Physical Property) -> Physical Plan ● A top-down search approach ● Memorize the search result ● Cost model
27 .Part IV - Statistics
28 .Statistics Cardinality Estimation ● WHERE t.col > 0 ● ON t1.col1 = t2.col2 ● GROUP BY t.col1, t.col2 What kinds of statistics does TiDB need?
29 .Statistics What kinds of statistics does TiDB need? ● Equi-depth Histogram