
From HelloWorld to Configurable and Reusable Apache Spark Application in Scala

Spark Open Source Community / 3609 views

We can think of an Apache Spark application as the unit of work in complex data workflows. Building a configurable and reusable Apache Spark application comes with its own challenges, especially for developers who are just starting in the domain: configuration, parametrization, and reusability of the application code all take effort to get right. Solving these lets the developer focus on value-adding work instead of mundane tasks such as writing a lot of configuration code, initializing the SparkSession, or even kicking off a new project.

Using code samples, this presentation describes a developer’s journey from the first steps into Apache Spark all the way to a simple open-source framework that helps kick off an Apache Spark project very easily, with a minimal amount of code. The main ideas covered in this presentation are derived from the separation of concerns principle.

The first idea is to make it even easier to code and test new Apache Spark applications by separating the application logic from the configuration logic.

The second idea is to make the applications easy to configure, providing SparkSessions out of the box, along with data readers, data writers, and application parameters that are easy to set up through configuration alone.

The third idea is that taking a new project off the ground should be very easy and straightforward. These three ideas are a good start in building reusable and production-worthy Apache Spark applications.
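As a rough illustration of the first two ideas, the sketch below keeps argument parsing, the transformation, and the SparkSession wiring in three separate objects. It is only a minimal sketch and not the spark-utils API: `FilterConfig`, `FilterLogic`, `FilterApp`, the `input.path` / `output.path` / `min.value` keys, the `value` column, and the Parquet format are all hypothetical names chosen for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import scala.util.Try

// Configuration logic: everything the job needs, parsed in one place.
// (Hypothetical names; not taken from spark-utils.)
final case class FilterConfig(inputPath: String, outputPath: String, minValue: Long)

object FilterConfig {
  private def parseLong(key: String, value: String): Either[String, Long] =
    Try(value.toLong).toOption.toRight(s"Invalid $key: $value")

  // Accepts arguments of the form key=value; anything else is ignored.
  def fromArgs(args: Seq[String]): Either[String, FilterConfig] = {
    val params = args.flatMap { arg =>
      arg.split("=", 2) match {
        case Array(key, value) => Some(key -> value)
        case _                 => None
      }
    }.toMap
    for {
      in  <- params.get("input.path").toRight("Missing input.path")
      out <- params.get("output.path").toRight("Missing output.path")
      raw <- params.get("min.value").toRight("Missing min.value")
      min <- parseLong("min.value", raw)
    } yield FilterConfig(in, out, min)
  }
}

// Application logic: a transformation that knows nothing about
// paths or argument parsing, so it can be tested on any DataFrame.
object FilterLogic {
  def run(input: DataFrame, config: FilterConfig): DataFrame =
    input.filter(col("value") >= config.minValue)
}

// Wiring: the only place that creates the SparkSession and touches storage.
object FilterApp {
  def main(args: Array[String]): Unit =
    FilterConfig.fromArgs(args.toSeq) match {
      case Left(error) => sys.error(error)
      case Right(config) =>
        val spark = SparkSession.builder().appName("FilterApp").getOrCreate()
        try {
          val input = spark.read.parquet(config.inputPath)
          FilterLogic.run(input, config).write.mode("overwrite").parquet(config.outputPath)
        } finally spark.stop()
    }
}
```

With this split, `FilterConfig.fromArgs` can be tested without a Spark runtime, and `FilterLogic.run` can be tested against a small in-memory DataFrame, while `FilterApp` stays small enough to be generic boilerplate that a framework could supply.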

The resulting framework, spark-utils, is already available and ready to use as an open-source project, but even more important are the ideas and principles behind it.
