PingCAP-Infra-Meetup-98-xuhuaiyu-Compiled+and+Vectorized+Query

下载 10

快召唤伙伴们来围观吧
微博 QQ QQ空间 贴吧
文档嵌入链接
<iframe src="https://www.slidestalk.com/TiDB/PingCAP_Infra_Meetup_98_xuhuaiyu_Compiled_and_Vectorized_Query_60995?embed" frame border="0" width="640" height="360" scrolling="no" allowfullscreen="true">复制
微信扫一扫分享
已成功复制到剪贴板

TiDB

发布于

6年前

4448

人观看

#信息技术

本次分享徐怀宇老师为大家介绍了论文《Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask》，主要包括： •介绍经典 Volcano 模型的执行流程，并分析其运行时性能。 •介绍行存、列存的基本概念，并进而引出向量化执行，分析其如何克服经典 Volcano 模型的缺点。 •介绍代码生成的基本概念，结合案例分析其如何克服经典 Volcano 模型的缺点。最后，结合论文内容，重点从 micro-architecture, data-parallel execution 两个方面，分析对比向量化执行和代码生成的特性，进而引出论文结论：向量化执行在 memory-bound 类的查询中更有优势，代码生成在 calculation-heavy 类的查询中更有优势。但是总体来看，在 OLAP 场景中，向量化执行和代码生成的执行性能相近。

展开查看详情

1 . Welcome! 加入 Infra Meetup No.98 交流群和大家一起讨论吧～我们将分享本期 Meetup 资料

2 . Compiled and Vectorized Queries Presented by Huaiyu Xu Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask

3 .About Me ● Huaiyu Xu (徐怀宇) ● Kernel Engineer for TiDB (SQL execution engine, distributed computing framework) ● xuhuaiyu@pingcap.com PingCAP.com

4 .Agenda ● Classic Volcano-style Model ● Vectorized Execution ● Compiled Execution ● Comparison and Analysis PingCAP.com

5 .Part I - Volcano-style model

6 .Volcano-style model ● A.k.a Tuple at a time ● Open-Next-Close next() tuple ● Elegant, flexible, extensible and Hash powerful[1] GroupBy next() tuple ● Efficient at that time Hash ○ disk I/O is the main overhead Join next() tuple next() ● Pull-based model Selection Selection [1] Graefe, Goetz. "Volcano/spl minus/an extensible and parallel query evaluation system." IEEE Transactions on Knowledge and Data Engineering 6.1 (1994): 120-135.

7 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

8 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

9 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

10 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

11 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 1 5 10.5

12 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

13 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

14 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 2 10 1.2

15 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 2 10 1.2

16 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

17 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

18 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)

19 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 3 10 13.7

20 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 3 10 13.7

21 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) sum(price) 14.9

22 .Runtime performance ● Expensive virtual function calls and type dispatch ○ Too many non-productive / control flow instructions ○ CPU pipeline bubbles ● Function call overhead for each next ○ Poor cache utilization

23 .Part II - Vectorized Execution

24 .Vectorized Execution ● Row Stores ● Column Stores

25 .Row Stores ● Suitable for OLTP workloads ● Mostly reading / writing all of the values for entire records && typically a small number of records at a time: SELECT * FROM goods WHERE id = 10

26 .Row Stores ● Not suitable for OLAP workloads ● Only one column is referred, all data has to be read SELECT SUM(UNITS_SOLD) FROM SALE

27 .Column Stores SELECT SUM(UNITS_SOLD) FROM SALE ● Only one column is referenced ● Only reads small part of the table

28 .Column-at-a-time Model SELECT Id, Name, Age, (Age-30)*50 AS Bonus FROM People WHERE Age > 30

29 .Column-at-a-time Model ● However, column may not not be fit into the cache or memory Total Runtime Column Size Cache Size Store in files??? Memory Size

0点赞

0收藏

10下载