Internals of Speeding up PySpark with Arrow
Posted by Spark开源社区 (Spark open source community)

Back in the old days of Apache Spark, using Python with Spark was an exercise in patience. Data was constantly moving back and forth between Python and Scala, being serialised and deserialised at every step. Leveraging Spark SQL and avoiding UDFs made things better, as did the steady improvement of the optimisers (Catalyst and Tungsten). But since Spark 2.3, PySpark has sped up tremendously thanks to the addition of the Arrow serialisers. In this talk you will learn how the Spark Scala core communicates with the Python processes, how data is exchanged between the two sub-systems, and the development efforts, both shipped and underway, to make it as fast as possible.
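As a taste of what the talk covers, here is a minimal sketch of the Arrow path in PySpark: switching on the Arrow serialisers and using a vectorised (pandas) UDF instead of a row-at-a-time Python UDF. The toy DataFrame, column names, and the times_two function are purely illustrative; the config key shown is the Spark 3.x name, while Spark 2.3/2.4 exposed it as spark.sql.execution.arrow.enabled. Running it requires pyarrow and pandas to be installed alongside PySpark.

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = (
    SparkSession.builder
    .appName("arrow-demo")
    # Ship columnar Arrow batches between the JVM and the Python workers
    # instead of pickling individual rows (available since Spark 2.3).
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .getOrCreate()
)

# Illustrative data: a single numeric column.
df = spark.range(1_000_000).withColumnRenamed("id", "x")

# Vectorised UDF: the Python worker receives whole pandas Series backed by
# Arrow record batches, so it operates on columns rather than one row at a time.
@pandas_udf(DoubleType())
def times_two(x: pd.Series) -> pd.Series:
    return x * 2.0

df.select(times_two("x").alias("doubled")).show(5)

# toPandas() also goes through Arrow when the flag above is enabled,
# avoiding per-row pickle serialisation on the driver.
pdf = df.limit(10).toPandas()
```

The same flag also accelerates createDataFrame() from a pandas DataFrame; without it, both conversions fall back to the slower row-by-row pickle path the abstract alludes to.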
