An Introduction To TiDB SQL Layer

TiDB 和 TiDB SQL 层的基本架构介绍,并带大家了解 TiDB SQL 中 SQL 引擎的优化与执行。
展开查看详情

1.An Introduction To TiDB SQL Layer zhangjian@pingcap.com Tech Leader of TiDB SQL Engine Team

2.Agenda ● TiDB Overview ● The SQL Layer ○ Query Optimizer ○ Statistics ○ Execution Engine ● What’s Next

3.Part I - TiDB Overview

4.What is TiDB? A Distributed Relational Database that speaks MySQL Protocol ● MySQL Compatibility ● Distributed Transaction ● High Availability ● Scalability ● HTAP

5.TiDB Architecture PD PD TSO/Data location Data location PD PD Cluster Metadata Spark Driver TiDB MySQL Clients TiKV TiKV Job TiDB DistSQL API DistSQL API Worker TiDB TiKV TiKV Syncer Worker TiDB TiKV TiKV Worker TiDB ... ... ... Spark Cluster TiDB Cluster TiKV Cluster (Storage) TiSpark

6.Part II - TiDB SQL Layer

7.TiDB SQL Layer listener SQL Core Layer Packet Privilege Manager SQL AST parser validator Logical AST Connection Schema Manager Plan Context Physical Logical Optimize Optimize Command Session DDL Worker Protocol SQL Context Physical Decode Statistics Plan Data GC Worker Feedback Protocol TiDB Distributed encode Executor Data Coprocessor BG Job Worker Data Data Data Protocol Layer TiKV

8.TiDB SQL Layer SQL Core Layer Privilege Manager SQL AST parser validator Logical AST Schema Manager Plan Physical Logical Optimize Optimize Session DDL Worker Context Physical Statistics Plan GC Worker Feedback TiDB Distributed Executor Data Coprocessor BG Job Worker Data Data TiKV

9.TiDB SQL Layer Key components: SQL Core Layer ● Query Optimizer SQL AST parser validator ● Statistics Logical AST ● TiDB Executor Physical Plan Logical Optimize Optimize Physical Statistics Plan Feedback TiDB Distributed Data Executor Data Coprocessor Data TiKV

10.Part III - Query Optimizer

11.Query Optimizer Find a reasonable plan in a reasonable time ● Operator Pushdown ● Access Path Selection ● Join Order/Algorithm Selection ● Subquery Evaluation

12.Query Optimizer Phase 1: Logical Optimization ● Equally logical transformation ● transformation rules should always be beneficial What rules does TiDB have?

13.Query Optimizer/Logical Optimization ● Column Pruning ● Partition Pruning ● Group By Elimination ● Max/Min Eliminatation ● Project Emination ● Outer Join Elimination ● Outer Join Simplification ● Subquery Decorrelation ● Predicate Push Down ● Aggregate Push Down ● TopN/Limit Push Down ● Join Reordering ● ...

14.Query Optimizer/Outer Join Elimination Outer Join Elimination, parent operator only wants outer columns ● Join Key on inner side is unique select t1.* from t1 left join t2 on t1.a=t2.unique_key select * from t1;

15.Query Optimizer/Join Reorder Join Reorder ● extract join nodes ● apply a DP algorithm if join group size is small ● apply a greedy algorithm otherwise

16.Query Optimizer Phase 1: Logical Optimization Phase 2: Physical Optimization

17.Query Optimizer/Physical Optimization Physical Property: ● task type ● data order ● data distribution(in the future) A Dynamic Programming Progress: ● (Logical Plan, Required Physical Property) -> Physical Plan ● A top-down search approach ● Memorization

18.Part IV - Statistics

19.Statistics Cardinality Estimation ● WHERE t.col > 0 ● ON t1.col1 = t2.col2 ● GROUP BY t.col1, t.col2 What kinds of statistics does TiDB need?

20.Statistics/Histograms What kinds of statistics does TiDB need? ● Equi-depth Histogram

21.Statistics/Histograms What kinds of statistics does TiDB need? ● Equi-depth Histogram ● Count-Min Sketch

22.Statistics/Histograms What kinds of statistics does TiDB need? ● Equi-depth Histogram ● Count-Min Sketch How is these stats info collected?

23.Statistics/Histograms What kinds of statistics does TiDB need? ● Equi-depth Histogram ● Count-Min Sketch How is these stats info collected? ● ANALYZE command ● Auto Analyze ● Query Feedback

24.Part V - Execution Engine

25.Execution Engine Evolved from Iterator Model to Vectorized/Batch Model ● Iterator Model ○ Volcano model ○ One tuple per Next() ● Vectorized/Batch Model ○ Based on the Iterator Model ○ Multiple tuples at a time

26.Execution Engine Chunk input = child.Next() output = proj(input, t.a+t.b) return output π (t.a+t.b) Chunk select t.a+t.b while |output|<batchSize: from t ρ (t.c>10) input = child.Next() where t.c>10 output += filter(input, t.c>10) return output t2 Chunk while |output|<batchSize: output += scan(t) return output

27.Execution Engine ● TiDB 1.0: Iterator Model ● Since TiDB 2.0: Vectorized/Batch Model

28.Execution Engine ● TiDB 1.0: Iterator Model ● Since TiDB 2.0: Vectorized/Batch Model

29.Execution Engine Evolved from Iterator Model to Vectorized/Batch Model Intra-Operator Parallelism ● Hash Join, Index Join, Hash Aggregate, etc. ● Under Development: https://github.com/pingcap/tidb/projects/11

TiDB 是一款定位于在线事务处理/在线分析处理( HTAP: Hybrid Transactional/Analytical Processing)的融合型数据库产品,实现了一键水平伸缩,强一致性的多副本数据安全,分布式事务,实时 OLAP 等重要特性。