Make your PySpark Data Fly with Arrow

Published by Spark开源社区 (Spark open source community)
In the big data world, it's not always easy for Python users to move huge amounts of data around. Apache Arrow defines a common format for data interchange, while Arrow Flight, introduced in version 0.11.0, provides a means to move that data efficiently between systems. Arrow Flight is a framework for Arrow-based messaging built on gRPC. It enables data microservices in which clients produce and consume streams of Arrow data and share them over the wire. In this session, I'll give a brief overview of Arrow Flight from a Python perspective and show that it's easy to build high-performance connections when systems can talk Arrow. I'll also cover some ongoing work on using Arrow Flight to connect PySpark with TensorFlow, two systems with great Python APIs but very different internal data representations.