Improving Apache Spark’s Reliability with DataSourceV2


Spark Open Source Community (Spark开源社区)
DataSourceV2 is Spark’s new API for working with data from tables and streams, but “v2” also includes a set of changes to SQL internals, the addition of a catalog API, and changes to the DataFrame read and write APIs. This talk covers the context for those additional changes and how “v2” will make Spark more reliable and predictable for building enterprise data pipelines.

The talk includes:

* Problem areas where the current behavior is unpredictable or unreliable
* The new standard SQL write plans (and the related SPIP)
* The new table catalog API and a new Scala API for table DDL operations (and the related SPIP)
* Netflix’s use case that motivated these changes