Flare and TensorFlare

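Spark performance has improved impressively, but a large gap remains between Spark and what the best query engines or hand-written low-level C code can achieve on modern server-class hardware. We present a new back-end for Spark SQL that yields significant speedups by compiling Catalyst query plans to native code.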
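To make the idea of compiling a query plan concrete, here is a minimal sketch, not Flare's implementation. It assumes a hypothetical plan format (a list of `filter` and `sum` steps, unrelated to Catalyst's real plan classes) and generates Python source rather than C, purely to show how a plan can be fused into one specialized loop instead of being interpreted operator by operator.

```python
# Toy illustration only: the plan format is hypothetical and the target is
# Python source, standing in for the native code a real compiler would emit.

ALLOWED_CMP = {"<", "<=", ">", ">=", "==", "!="}

def compile_plan(plan):
    """Fuse a sequence of filter steps ending in a sum into one loop."""
    if not plan or plan[-1]["op"] != "sum":
        raise ValueError("this sketch expects the plan to end with a sum step")
    lines = [
        "def query(rows):",
        "    total = 0",
        "    for row in rows:",
    ]
    indent = "        "
    for step in plan:
        if step["op"] == "filter":
            if step["cmp"] not in ALLOWED_CMP:
                raise ValueError("unsupported comparison: " + step["cmp"])
            # Each filter becomes a nested `if`; no intermediate rows are materialized.
            lines.append(f"{indent}if row[{step['column']!r}] {step['cmp']} {step['value']!r}:")
            indent += "    "
        elif step["op"] == "sum":
            lines.append(f"{indent}total += row[{step['column']!r}]")
        else:
            raise ValueError("unsupported operator: " + step["op"])
    lines.append("    return total")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)  # turn the generated source into a callable
    return namespace["query"], source

# Hypothetical plan: SELECT sum(price) FROM rows WHERE quantity < 24
plan = [
    {"op": "filter", "column": "quantity", "cmp": "<", "value": 24},
    {"op": "sum", "column": "price"},
]
rows = [
    {"quantity": 10, "price": 5},
    {"quantity": 30, "price": 7},
    {"quantity": 20, "price": 11},
]
query, source = compile_plan(plan)
print(source)
print(query(rows))
```

The generated function contains only the code this particular query needs, which is the property a compiling back-end exploits. A real system would also have to handle joins, grouping, data types and data loading.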
1. Flare and TensorFlare: Native Compilation for Spark and TensorFlow Pipelines. Gregory Essertel, Purdue University; Tiark Rompf, Purdue University. #Res5SAIS

6. How Fast Is Spark?

8. Demo

9. Spark Architecture

10. Flare: a New Back-end for Spark

11. Results

12. Single-Core Running Time: TPCH. Absolute running time in milliseconds (ms) for Postgres, Spark, HyPer and Flare at scale factor 10 (SF10).

13. Apache Parquet Format
Query  Spark CSV  Spark Parquet  Flare CSV  Flare Parquet
Q1         16762           3728        641            187
Q2         12244          13520        168             17
Q3         21730           9099        757            125
Q4         19836           6083        698            127
Q5         19316           8706        758            151
Q6         12278            535        568             99
Q7         24484          13555        788            183
Q8         17726           5512        875            160
Q9         30050          19413       1417            698
Q10        29533          21822        854            309
Q11         5224           3926        128              9
Q12        21688           5570        701            133
Q13         8554           7034        388            246
Q14        12962            719        573             86
Q15        26721           4506        551             88
Q16        12941          21834        150             66
Q17        24690           5176       1426            264
Q18        27012           6757       1229            181
Q19        12409           2681        605            178
Q20        19369           8562        792            165
Q21        57330          25089       1868            324
Q22         7050           5295        178             22
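The Parquet figures above are easier to compare as ratios. The sketch below copies the Spark Parquet and Flare Parquet values from the slide and divides one by the other for each query. Treating the figures as running times, so that a larger ratio means Flare is faster, is an assumption carried over from the preceding slide, since this slide states no unit.

```python
# Values copied from the slide; reading them as running times is an assumption.
queries = [f"Q{i}" for i in range(1, 23)]
spark_parquet = [3728, 13520, 9099, 6083, 8706, 535, 13555, 5512, 19413, 21822, 3926,
                 5570, 7034, 719, 4506, 21834, 5176, 6757, 2681, 8562, 25089, 5295]
flare_parquet = [187, 17, 125, 127, 151, 99, 183, 160, 698, 309, 9,
                 133, 246, 86, 88, 66, 264, 181, 178, 165, 324, 22]

# Ratio of Spark Parquet to Flare Parquet for each query.
speedup = {q: s / f for q, s, f in zip(queries, spark_parquet, flare_parquet)}

smallest = min(speedup, key=speedup.get)  # query with the lowest ratio
largest = max(speedup, key=speedup.get)   # query with the highest ratio

for q in queries:
    print(f"{q}: {speedup[q]:.1f}x")
print(f"smallest ratio: {smallest}, largest ratio: {largest}")
```

The same calculation applies to the CSV columns by swapping in those two rows of values.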

14. What about parallelism?

15. Parallel Scaling Experiment. Scaling up Flare and Spark SQL at SF20. Hardware: single NUMA machine with 4 sockets, 18 Xeon E5-4657L cores per socket, and 256 GB RAM per socket (1 TB total).

16. NUMA Optimization

17. NUMA Optimization. Scaling up Flare at SF100 with NUMA optimizations in different configurations: threads pinned to one, two and four sockets. Hardware: single NUMA machine with 4 sockets, 18 Xeon E5-4657L cores per socket, and 256 GB RAM per socket (1 TB total).
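As a rough illustration of what pinning to one, two or four sockets involves, the sketch below builds the set of core ids for the first N sockets and restricts the current process to them. It is not how Flare does it. It assumes core ids are numbered contiguously per socket (cores 0 to 17 on socket 0, and so on), which real machines do not guarantee, and it covers only thread placement, not memory placement. The helper names are hypothetical, and `os.sched_setaffinity` is available on Linux only.

```python
import os

SOCKETS = 4            # from the slide
CORES_PER_SOCKET = 18  # from the slide

def cores_for_sockets(num_sockets):
    """Core ids for the first `num_sockets` sockets, assuming contiguous numbering."""
    if not 1 <= num_sockets <= SOCKETS:
        raise ValueError("num_sockets must be between 1 and 4")
    return set(range(num_sockets * CORES_PER_SOCKET))

def pin_to_sockets(num_sockets):
    """Restrict the calling process to the chosen sockets where the OS supports it."""
    wanted = cores_for_sockets(num_sockets)
    if not hasattr(os, "sched_setaffinity"):
        return False  # not Linux: nothing is changed
    usable = wanted & os.sched_getaffinity(0)  # keep only cores this machine exposes
    if not usable:
        return False
    os.sched_setaffinity(0, usable)
    return True

for n in (1, 2, 4):
    print(n, "socket(s):", len(cores_for_sockets(n)), "cores")
```

On a real system the socket-to-core mapping should be read from the machine's topology rather than assumed.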

18. Heterogeneous Workloads: UDFs and ML Kernels

19. TensorFlow -> TensorFlare

20. TensorFlare architecture. Diagram labels: Flare produces; SQL Engine; Specialized data loading; TensorFlow Model; TensorFlow Runtime; HDD; XLA.

21. Demo

24. FLARE. Thank You! Web: flaredata.github.io. Twitter: @flaredata