Apache Spark's Built-in File Sources in Depth

Published by Spark开源社区 (Spark Open Source Community)

In the Spark 3.0 release, all the built-in file source connectors (Parquet, ORC, JSON, Avro, CSV, and Text) are re-implemented using the new Data Source API V2. This talk gives a technical overview of how Spark reads and writes these file formats based on user-specified data layouts. It also explains the differences between the Hive SerDe and the native connectors, and shares practical experience on tuning the connectors and choosing data layouts for the best performance.
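One common user-specified layout is Hive-style partitioning, where a writer such as `DataFrameWriter.partitionBy` places files under `column=value` directories and the reader recovers those columns from the paths. The sketch below illustrates the idea in plain Python only; it is not Spark's implementation, and the file listing and helper names (`parse_partition_values`, `prune_files`) are hypothetical.

```python
from pathlib import PurePosixPath


def parse_partition_values(path):
    """Extract Hive-style key=value directory segments from a file path."""
    values = {}
    # Only directory segments are inspected; the file name itself is skipped.
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = value
    return values


def prune_files(paths, **wanted):
    """Keep only the files whose partition values match every filter."""
    kept = []
    for path in paths:
        values = parse_partition_values(path)
        # Values parsed from paths are strings, so compare as strings.
        if all(values.get(k) == str(v) for k, v in wanted.items()):
            kept.append(path)
    return kept


# Hypothetical listing of a table partitioned by year and month.
FILES = [
    "/data/events/year=2019/month=12/part-00000.parquet",
    "/data/events/year=2020/month=01/part-00000.parquet",
    "/data/events/year=2020/month=02/part-00000.parquet",
]

if __name__ == "__main__":
    print(parse_partition_values(FILES[0]))
    print(prune_files(FILES, year=2020))
```

A filter on a partition column can be answered from directory names alone, so files in non-matching directories never need to be opened. This is one reason the choice of layout affects read performance.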
