申请试用
HOT
登录
注册
 
PingCAP-Infra-Meetup-98-xuhuaiyu-Compiled+and+Vectorized+Query
TiDB
/
发布于
/
4152
人观看
本次分享徐怀宇老师为大家介绍了论文《Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask》,主要包括: •介绍经典 Volcano 模型的执行流程,并分析其运行时性能。 •介绍行存、列存的基本概念,并进而引出向量化执行,分析其如何克服经典 Volcano 模型的缺点。 •介绍代码生成的基本概念,结合案例分析其如何克服经典 Volcano 模型的缺点。 最后,结合论文内容,重点从 micro-architecture, data-parallel execution 两个方面,分析对比向量化执行和代码生成的特性,进而引出论文结论:向量化执行在 memory-bound 类的查询中更有优势,代码生成在 calculation-heavy 类的查询中更有优势。但是总体来看,在 OLAP 场景中,向量化执行和代码生成的执行性能相近。
展开查看详情

1 . Welcome! 加入 Infra Meetup No.98 交流群 和大家一起讨论吧~ 我们将分享本期 Meetup 资料

2 . Compiled and Vectorized Queries Presented by Huaiyu Xu Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask

3 .About Me ● Huaiyu Xu (徐怀宇) ● Kernel Engineer for TiDB (SQL execution engine, distributed computing framework) ● xuhuaiyu@pingcap.com PingCAP.com

4 .Agenda ● Classic Volcano-style Model ● Vectorized Execution ● Compiled Execution ● Comparison and Analysis PingCAP.com

5 .Part I - Volcano-style model

6 .Volcano-style model ● A.k.a Tuple at a time ● Open-Next-Close next() tuple ● Elegant, flexible, extensible and Hash powerful[1] GroupBy next() tuple ● Efficient at that time Hash ○ disk I/O is the main overhead Join next() tuple next() ● Pull-based model Selection Selection [1] Graefe, Goetz. "Volcano/spl minus/an extensible and parallel query evaluation system." IEEE Transactions on Knowledge and Data Engineering 6.1 (1994): 120-135.

7 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

8 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

9 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

10 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

11 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 1 5 10.5

12 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

13 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

14 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 2 10 1.2

15 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 2 10 1.2

16 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

17 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

18 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

19 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 3 10 13.7

20 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 3 10 13.7

21 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) sum(price) 14.9

22 .Runtime performance ● Expensive virtual function calls and type dispatch ○ Too many non-productive / control flow instructions ○ CPU pipeline bubbles ● Function call overhead for each next ○ Poor cache utilization

23 .Part II - Vectorized Execution

24 .Vectorized Execution ● Row Stores ● Column Stores

25 .Row Stores ● Suitable for OLTP workloads ● Mostly reading / writing all of the values for entire records && typically a small number of records at a time: SELECT * FROM goods WHERE id = 10

26 .Row Stores ● Not suitable for OLAP workloads ● Only one column is referred, all data has to be read SELECT SUM(UNITS_SOLD) FROM SALE

27 .Column Stores SELECT SUM(UNITS_SOLD) FROM SALE ● Only one column is referenced ● Only reads small part of the table

28 .Column-at-a-time Model SELECT Id, Name, Age, (Age-30)*50 AS Bonus FROM People WHERE Age > 30

29 .Column-at-a-time Model ● However, column may not not be fit into the cache or memory Total Runtime Column Size Cache Size Store in files??? Memory Size

0 点赞
0 收藏
5下载