Cobrix: A Mainframe Data Source for Spark SQL and Streaming

Published by the Spark Open Source Community (Spark开源社区)
Seamless integration of diverse data sources into an enterprise data lake is of great value to data-driven companies. In the financial and banking industries, to which our company ABSA belongs, mainframes are among the most common platforms, yet their interoperability with other platforms remains challenging. In this talk, we introduce Cobrix (https://github.com/AbsaOSS/cobrix), a new Spark data source that radically simplifies consuming mainframe data from Spark.

Currently, a wide range of approaches is used to integrate mainframe data with analytics platforms: message queues, direct ODBC/JDBC connectors, tools like Sqoop and LegStar, or running Spark directly on mainframes. These approaches have several limitations. For instance, the existing tools primarily focus on relational data, so the original hierarchical schema is flattened, exploded, and/or projected. As a consequence, the resulting table may become extremely wide (~10k columns), which complicates its further processing.

Our solution, Cobrix, extends the Spark SQL API with a data source for mainframe data. It reads binary files stored in HDFS in their native mainframe format and parses them into Spark DataFrames, with the schema provided as a COBOL copybook. Because Spark natively supports nested structures and arrays, the original hierarchical schema is retained. As a result, Cobrix offers a new and convenient way of processing mainframe data.

In this talk, we first review the differences in data definition models between mainframes and PCs. Then we explain how Cobrix maps COBOL schemas to Spark schemas. Next, we demonstrate Cobrix usage for reading simple and multi-segment files and present the performance and scalability characteristics of the data source. Finally, we discuss the broader picture of mainframe integration with Cobrix, Spark, Avro, Kafka, etc., through use-case examples.
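The abstract describes parsing binary records in native mainframe format according to a COBOL copybook and keeping the hierarchical schema rather than flattening it. The Python sketch below is not Cobrix code. It is a minimal illustration under stated assumptions: a hypothetical copybook (shown in the comment), EBCDIC code page 037 for text fields, and COMP-3 (packed decimal) encoding for numeric fields. It decodes one fixed-length record into a nested structure, then flattens it the way relational tools would, to show where the column explosion comes from.

```python
from decimal import Decimal

# Hypothetical copybook used only for this sketch (not taken from the talk):
#   01  ACCOUNT-RECORD.
#       05  ACCOUNT-ID    PIC X(6).
#       05  BALANCE       PIC S9(7)V99 COMP-3.
#       05  TXN OCCURS 2 TIMES.
#           10  TXN-AMOUNT  PIC S9(5) COMP-3.
# Fixed record length: 6 + 5 + 2 * 3 = 17 bytes.
RECORD_LENGTH = 17
TXN_COUNT = 2
TXN_SIZE = 3


def decode_text(raw: bytes) -> str:
    """Decode a PIC X field, assuming EBCDIC code page 037, and strip trailing spaces."""
    return raw.decode("cp037").rstrip()


def decode_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a COMP-3 (packed decimal) field.

    Every nibble is one BCD digit, except the low nibble of the last byte,
    which carries the sign (0xD means negative; 0xC or 0xF mean positive).
    `scale` is the number of implied decimal places (the V in the PIC clause).
    """
    digits = []
    for byte in raw[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(raw[-1] >> 4)
    value = int("".join(map(str, digits)))
    if (raw[-1] & 0x0F) == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


def decode_record(rec: bytes) -> dict:
    """Decode one record into a nested dict that mirrors the copybook hierarchy."""
    if len(rec) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(rec)}")
    txns = []
    for i in range(TXN_COUNT):
        start = 11 + i * TXN_SIZE
        txns.append({"TXN_AMOUNT": decode_comp3(rec[start:start + TXN_SIZE])})
    return {
        "ACCOUNT_ID": decode_text(rec[0:6]),
        "BALANCE": decode_comp3(rec[6:11], scale=2),
        "TXN": txns,  # OCCURS group kept as an array of structs
    }


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten groups and arrays of groups into one wide row, as relational tools do."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "_"))
        elif isinstance(value, list):
            # Assumes array elements are groups (dicts), as in this copybook.
            for i, item in enumerate(value):
                flat.update(flatten(item, f"{name}_{i}_"))
        else:
            flat[name] = value
    return flat


# Sample record: "ACC001" in EBCDIC, BALANCE = +1234567.89, TXN = [-250, +1000].
SAMPLE = "ACC001".encode("cp037") + bytes(
    [0x12, 0x34, 0x56, 0x78, 0x9C, 0x00, 0x25, 0x0D, 0x01, 0x00, 0x0C]
)

if __name__ == "__main__":
    nested = decode_record(SAMPLE)
    print(nested)           # hierarchy preserved: TXN is a list of structs
    print(flatten(nested))  # one column per array element and field
```

With `OCCURS 2 TIMES` the flat row has only four columns, but the same scheme applied to, say, an array of 1,000 entries with 10 fields each would produce 10,000 columns, the kind of width the abstract mentions. Keeping `TXN` as an array of structs, which Spark's nested types allow, avoids that.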