Working with Complex Types in DataFrames - Optics to the Rescue
Working with complex types shouldn’t be a complex job. DataFrames provide a great SQL-oriented API for data transformation, but it doesn’t help much when the time comes to update elements of complex types like structs or arrays. In such cases, your program quickly turns into a humongous tangle of struct calls and parentheses as you try to transform the inner elements and then reconstruct the column. This is exactly the same problem we encounter when working with immutable data structures in functional programming, and optics were invented to solve it. Couldn’t we use something similar to optics in the DataFrame realm?
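The functional-programming side of this problem can be seen with plain Scala case classes, no Spark involved. In the sketch below, the `Person` and `Address` types and the `relocate` helper are hypothetical, chosen only for illustration: changing one inner field forces the code to rebuild every enclosing level by hand, which is the same pattern that nested struct columns force on DataFrame code.

```scala
// Hypothetical nested, immutable data: a person with an address.
final case class Address(street: String, city: String)
final case class Person(name: String, address: Address)

object NestedCopyDemo {
  // Updating a single inner field means copying each enclosing level explicitly.
  def relocate(p: Person, newCity: String): Person =
    p.copy(address = p.address.copy(city = newCity))

  def main(args: Array[String]): Unit = {
    val alice = Person("Alice", Address("Main St", "Madrid"))
    val moved = relocate(alice, "Seville")
    println(moved) // Person(Alice,Address(Main St,Seville))
    println(alice) // the original is untouched: Person(Alice,Address(Main St,Madrid))
  }
}
```

With two levels this is tolerable; every extra level of nesting adds another `copy` wrapped around the previous one.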
In this talk, we will show how we can enrich the DataFrame API with the design patterns that lenses, one of the most common types of optic, put forward for manipulating immutable data structures. We will show how these patterns are implemented in the spark-optics library, an analogue of the Scala Monocle library, and will illustrate its use with several examples. Last but not least, we will take advantage of the dynamic type system of DataFrames to do more than transform sub-columns, such as pruning and renaming elements.
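The core idea behind a lens fits in a few lines of plain Scala: a getter paired with a setter that returns an updated copy, plus a way to compose lenses so they reach deeper into a structure. The sketch below is a hand-rolled illustration of that pattern only; the `Lens` type and its `modify` and `andThen` methods are our own names and make no claim about how Monocle or spark-optics actually define their APIs.

```scala
// A minimal, hand-rolled lens. Illustrative sketch only, not the
// Monocle or spark-optics API.
final case class Lens[S, A](get: S => A, set: (S, A) => S) {
  // Apply a function to the focused value and rebuild the whole structure.
  def modify(s: S)(f: A => A): S = set(s, f(get(s)))

  // Compose with a lens that focuses deeper inside A.
  def andThen[B](other: Lens[A, B]): Lens[S, B] =
    Lens[S, B](
      s => other.get(get(s)),
      (s, b) => set(s, other.set(get(s), b))
    )
}

// Hypothetical nested, immutable data used for the demo.
final case class Address(street: String, city: String)
final case class Person(name: String, address: Address)

object LensDemo {
  val address: Lens[Person, Address] =
    Lens[Person, Address](_.address, (p, a) => p.copy(address = a))
  val city: Lens[Address, String] =
    Lens[Address, String](_.city, (a, c) => a.copy(city = c))

  // Composed lens pointing from a Person straight at its city.
  val personCity: Lens[Person, String] = address andThen city

  def main(args: Array[String]): Unit = {
    val alice = Person("Alice", Address("Main St", "Madrid"))
    println(personCity.get(alice))                   // Madrid
    println(personCity.modify(alice)(_.toUpperCase)) // Person(Alice,Address(Main St,MADRID))
  }
}
```

Composition is what removes the boilerplate: once the `address` and `city` lenses exist, a deep update is a single `modify` call instead of nested `copy` calls. Per the abstract, spark-optics carries this same lens idea over to nested struct columns in DataFrames.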