
Working with Complex Types in DataFrames - Optics to the Rescue

Spark开源社区 (Spark open-source community)

Working with complex types shouldn’t be a complex job. DataFrames provide a great SQL-oriented API for data transformation, but that API doesn’t help much when the time comes to update elements of complex types such as structs or arrays. In such cases, your program quickly turns into a humongous tangle of struct calls and parentheses as you transform the inner elements and then reconstruct the whole column. This is exactly the same problem that we encounter when working with immutable data structures in functional programming, and optics were invented to solve it. Couldn’t we use something similar to optics in the DataFrame realm?
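To make the functional-programming side of that comparison concrete, here is a minimal sketch of a lens over immutable Python dataclasses. The `Lens` class, the `field_lens` helper, and the `Person`/`Address`/`Street` types are all hypothetical names made up for this illustration; they are not taken from spark-optics or Monocle.

```python
from dataclasses import dataclass, replace
from typing import Callable, Generic, TypeVar

S = TypeVar("S")  # the whole structure
A = TypeVar("A")  # the part the lens focuses on
B = TypeVar("B")  # a part nested inside A


@dataclass(frozen=True)
class Street:
    name: str
    number: int


@dataclass(frozen=True)
class Address:
    city: str
    street: Street


@dataclass(frozen=True)
class Person:
    name: str
    address: Address


@dataclass(frozen=True)
class Lens(Generic[S, A]):
    """Hypothetical lens: a getter plus a copying setter for one part of S."""
    getter: Callable[[S], A]
    setter: Callable[[S, A], S]

    def modify(self, whole: S, f: Callable[[A], A]) -> S:
        # Read the focused part, transform it, and write it back into a copy.
        return self.setter(whole, f(self.getter(whole)))

    def compose(self, inner: "Lens[A, B]") -> "Lens[S, B]":
        # Focus deeper: first through self, then through inner.
        return Lens(
            getter=lambda s: inner.getter(self.getter(s)),
            setter=lambda s, b: self.setter(s, inner.setter(self.getter(s), b)),
        )


def field_lens(name: str) -> Lens:
    """Hypothetical helper: a lens onto one field of a frozen dataclass."""
    return Lens(
        getter=lambda obj: getattr(obj, name),
        setter=lambda obj, value: replace(obj, **{name: value}),
    )


person = Person("Ada", Address("London", Street("Baker Street", 221)))

# Without a lens: every enclosing level has to be rebuilt by hand.
updated_by_hand = replace(
    person,
    address=replace(
        person.address,
        street=replace(
            person.address.street,
            name=person.address.street.name.upper(),
        ),
    ),
)

# With lenses: compose the path once, then modify through it.
street_name = (
    field_lens("address").compose(field_lens("street")).compose(field_lens("name"))
)
updated_with_lens = street_name.modify(person, str.upper)
```

Both updates produce the same new `Person` and leave the original untouched. The hand-written version grows with every level of nesting, while the lens version only needs one more `compose` call.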

In this talk, we will show how to enrich the DataFrame API with the design patterns that lenses, one of the most common types of optics, put forward for manipulating immutable data structures. We will show how these patterns are implemented in the spark-optics library, an analogue of the Scala Monocle library, and illustrate its use with several examples. Last but not least, we will take advantage of the dynamic type system of DataFrames to do more than transform sub-columns, for example pruning elements and renaming them.
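The three operations mentioned above (transforming, pruning, and renaming a nested element) can be sketched without Spark by letting plain nested dictionaries stand in for a row with struct columns. This is only an illustration of the idea under that assumption: the functions `modify`, `prune`, and `rename` and the dotted-path convention are hypothetical, not the spark-optics API. Pruning and renaming change the shape of the structure, which is why they fit a dynamically typed schema better than a statically typed lens whose setter must return the same type.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # stand-in for a row whose struct columns are nested dicts


def _rebuild(row: Row, path: List[str], edit: Callable[[Row, str], Row]) -> Row:
    """Copy `row`, applying `edit` to the struct that owns the last path segment."""
    head, *rest = path
    if not rest:
        return edit(row, head)
    # Rebuild this level with the updated child; sibling fields are reused as-is.
    return {**row, head: _rebuild(row[head], rest, edit)}


def modify(row: Row, path: str, f: Callable[[Any], Any]) -> Row:
    """Return a copy with the element at `path` transformed by `f`."""
    return _rebuild(row, path.split("."), lambda s, k: {**s, k: f(s[k])})


def prune(row: Row, path: str) -> Row:
    """Return a copy without the element at `path`."""
    return _rebuild(
        row,
        path.split("."),
        lambda s, k: {name: value for name, value in s.items() if name != k},
    )


def rename(row: Row, path: str, new_name: str) -> Row:
    """Return a copy where the element at `path` is called `new_name`."""
    return _rebuild(
        row,
        path.split("."),
        lambda s, k: {
            (new_name if name == k else name): value for name, value in s.items()
        },
    )


row = {"id": 1, "user": {"name": "ada", "address": {"city": "london", "zip": "NW1"}}}

upper = modify(row, "user.address.city", str.upper)
pruned = prune(row, "user.address.zip")
renamed = rename(row, "user.name", "full_name")
```

Each call returns a new structure and leaves `row` unchanged, mirroring how a DataFrame transformation yields a new DataFrame instead of mutating the old one.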
