Vectorized executor in TiDB

介绍最近TiDB向量化执行引擎近期的工作,包括表达式计算的向量化优化,以及Join、Aggregation等算子的向量化优化的方法和带来的成果。

展开查看详情

1. Recent work on Vectorized exectuor Presented by Feng Liyuan

2.About me ● Feng Liyuan (冯立元) ● Was: Experienced Engineer, Cloud Storage, Qiniu ● Now: Engineer, TiDB SQL Engine Team ● Focus on: TiDB Runtime PingCAP.com

3.Agenda ● Backgroud ○ What is executor? ○ What is vectorized executor on TiDB ● Recent work ○ Hash Join ○ Stream Aggregation ○ Vectorized expression evaluation ● Future work & Call for participation

4.Part I - Backgroud

5.Row-based executor Volcano Iterator Model next() tuple Aggregation: SELECT SUM(price) sum(price) next() tuple FROM goods Selection: shop_id = 10 WHERE shop_id = 10 next() tuple Scan: goods PingCAP.com

6.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

7.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

8.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

9.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

10.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 1 5 10.5

11.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

12.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

13.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 2 10 1.2

14.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 2 10 1.2

15.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

16.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

17.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

18.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 3 10 13.7

19.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 3 10 13.7

20.Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) sum(price) 14.9

21.Volcano Iterator Model ● A.k.a Tuple at a time next() tuple ● Elegant, flexible, extensible and Hash powerful[2] GroupBy ● Efficient at that time next() tuple ○ disk I/O is the main overhead Hash Join ● Low memory cost next() tuple next() ● Very natural for row stores Selection Selection [2] Graefe, Goetz. "Volcano/spl minus/an extensible and parallel query evaluation system." IEEE Transactions on Knowledge and Data Engineering 6.1 (1994): 120-135.

22.Vectorized executor Vectorized We call it We call it chunk column Row-based We call it id shop_id price row 1 5 10.5 id shop_id price 2 10 1.2 1 5 10.5 3 10 13.7 2 10 1.2 ... ... ... 3 10 13.7 ... ... ... 1024 10 15.3 1025 ... 2048 PingCAP.com

23.Vectorized executor nextChunk() chunk Aggregation: SELECT SUM(price) sum(price) nextChunk() chunk FROM goods Selection: shop_id = 10 WHERE shop_id = 10 nextChunk() chunk Scan: goods PingCAP.com

24. SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

25. SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 nextChunk() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

26. SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 nextChunk() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

27. SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 nextChunk() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

28. SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 nextChunk() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) A chunk which has 1024 rows

29. SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 nextChunk() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) A chunk which has 3 rows

TiDB 是一款定位于在线事务处理/在线分析处理( HTAP: Hybrid Transactional/Analytical Processing)的融合型数据库产品,实现了一键水平伸缩,强一致性的多副本数据安全,分布式事务,实时 OLAP 等重要特性。