- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
PingCAP-Infra-Meetup-98-xuhuaiyu-Compiled+and+Vectorized+Query
展开查看详情
1 . Welcome! 加入 Infra Meetup No.98 交流群 和大家一起讨论吧~ 我们将分享本期 Meetup 资料
2 . Compiled and Vectorized Queries Presented by Huaiyu Xu Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask
3 .About Me ● Huaiyu Xu (徐怀宇) ● Kernel Engineer for TiDB (SQL execution engine, distributed computing framework) ● xuhuaiyu@pingcap.com PingCAP.com
4 .Agenda ● Classic Volcano-style Model ● Vectorized Execution ● Compiled Execution ● Comparison and Analysis PingCAP.com
5 .Part I - Volcano-style model
6 .Volcano-style model ● A.k.a Tuple at a time ● Open-Next-Close next() tuple ● Elegant, flexible, extensible and Hash powerful[1] GroupBy next() tuple ● Efficient at that time Hash ○ disk I/O is the main overhead Join next() tuple next() ● Pull-based model Selection Selection [1] Graefe, Goetz. "Volcano/spl minus/an extensible and parallel query evaluation system." IEEE Transactions on Knowledge and Data Engineering 6.1 (1994): 120-135.
7 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
8 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
9 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
10 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
11 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 1 5 10.5
12 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
13 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
14 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 2 10 1.2
15 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 2 10 1.2
16 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
17 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
18 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
19 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 3 10 13.7
20 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 3 10 13.7
21 .Volcano-style model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) sum(price) 14.9
22 .Runtime performance ● Expensive virtual function calls and type dispatch ○ Too many non-productive / control flow instructions ○ CPU pipeline bubbles ● Function call overhead for each next ○ Poor cache utilization
23 .Part II - Vectorized Execution
24 .Vectorized Execution ● Row Stores ● Column Stores
25 .Row Stores ● Suitable for OLTP workloads ● Mostly reading / writing all of the values for entire records && typically a small number of records at a time: SELECT * FROM goods WHERE id = 10
26 .Row Stores ● Not suitable for OLAP workloads ● Only one column is referred, all data has to be read SELECT SUM(UNITS_SOLD) FROM SALE
27 .Column Stores SELECT SUM(UNITS_SOLD) FROM SALE ● Only one column is referenced ● Only reads small part of the table
28 .Column-at-a-time Model SELECT Id, Name, Age, (Age-30)*50 AS Bonus FROM People WHERE Age > 30
29 .Column-at-a-time Model ● However, column may not not be fit into the cache or memory Total Runtime Column Size Cache Size Store in files??? Memory Size