Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage

Spark开源社区 · 8,298 views
At Pure Storage, our strong belief in aggressive automated testing means our continuous integration (CI) systems generate massive amounts of messy log data. Spark’s flexible computing platform allows us to write a single application to understand the state of our CI pipeline for both streaming (over a million events per second) and batch jobs (at 40 TB/hour). Decoupling our data storage enabled us to orchestrate and independently scale stateless pipeline components (Spark, Kafka, rsyslog, and custom code) using Nomad. In this talk, we will discuss how we architected our data pipeline to leverage simple orchestration and enable resiliency with ephemeral compute components.
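The abstract does not include code, but the "single application for both streaming and batch" idea maps naturally onto Spark Structured Streaming, where one transformation can be applied to a Kafka stream and to files already landed on shared storage. The sketch below illustrates that pattern under assumed details: the broker address, topic name, storage path, and log format are illustrative placeholders, not Pure Storage's actual setup.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object CiLogPipeline {
  // One transformation shared by the streaming and batch paths:
  // parse each CI log line and count events per severity level.
  def summarize(raw: DataFrame): DataFrame =
    raw.selectExpr("CAST(value AS STRING) AS line")
      .withColumn("level", regexp_extract(col("line"), "(ERROR|WARN|INFO)", 1))
      .groupBy("level")
      .count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ci-log-pipeline").getOrCreate()

    // Streaming path: consume raw CI events from Kafka.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka:9092") // hypothetical broker
      .option("subscribe", "ci-logs")                  // hypothetical topic
      .load()

    val query = summarize(stream).writeStream
      .outputMode("complete")
      .format("console")
      .start()

    // Batch path: the same logic over logs already written to shared storage,
    // which is what storage/compute decoupling makes possible.
    val batch = spark.read.text("s3a://ci-logs/archive/") // hypothetical path
    summarize(batch).show()

    query.awaitTermination()
  }
}

Because the compute is stateless and the data lives on decoupled storage, either path can be restarted or rescaled independently by the orchestrator (Nomad, in the talk's architecture) without losing data.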