Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage

At Pure Storage, our strong belief in aggressive automated testing means our continuous integration (CI) systems generate massive amounts of messy log data. Spark's flexible computing platform allows us to write a single application to understand the state of our CI pipeline for both streaming (over a million events per second) and batch jobs (at 40 TB/hour). Decoupling our data storage enabled us to orchestrate and independently scale stateless pipeline components (Spark, Kafka, rsyslog, and custom code) using Nomad. In this talk, we discuss how we architected our data pipeline to leverage simple orchestration and enable resiliency with ephemeral compute components.

1. Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Ivan Jibaja, Software Engineer
© 2018 PURE STORAGE INC. PURE PROPRIETARY

2. Our Log Analytics Pipeline in Numbers
• 1.5 - 2M events / second
• 0.5 - 1 PB of data / day
• 5-second SLA
• Six 9s of reliability

3. Data Pipeline – Early Stages
[Architecture diagram. Recoverable labels: 1,000+ VMs; 20,000+ tests; 100+ FBs; 400+ clients; rsyslog; 10+ Jenkins; custom code. Size labels shown in the diagram: 6T, 18T, 18T, 6G.]

4. Data Pipeline - Now
[Architecture diagram. Recoverable labels: 120,000+ tests / day; 2,500+ VMs; 350+ FBs; 1,000+ clients; rsyslog; 20+ Jenkins. Size labels shown in the diagram: 72T, 24T, 72T, 800G, 200T, 90G, 189T, 50G.]
Pipeline outputs:
• Duplicate bug
• Infrastructure failure
• Performance regression
• Low level details
• Easy to read graphs

5. Reliability, Scalability, Flexibility

6. Software Crashes
You need to be able to restart each stage of your pipeline without affecting correctness.
Idempotency
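One way to picture the idempotency requirement is a stage that writes each result under a key derived from the event itself, so replaying a batch after a crash overwrites earlier results instead of duplicating them. The following is only a minimal sketch under assumptions: the event fields, the `event_key` and `process_batch` helpers, and the in-memory `sink` dictionary are hypothetical stand-ins, not the pipeline described in the talk.

```python
import hashlib
import json


def event_key(event: dict) -> str:
    """Derive a deterministic key from the event's content (hypothetical scheme)."""
    canonical = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def process_batch(events: list, sink: dict) -> None:
    """Upsert one result per event; replaying a batch overwrites, never duplicates."""
    for event in events:
        sink[event_key(event)] = {**event, "length": len(event["message"])}


# Hypothetical sample events; the dict stands in for a durable store.
sink = {}
batch = [
    {"host": "vm-1", "seq": 1, "message": "test started"},
    {"host": "vm-1", "seq": 2, "message": "test passed"},
]

process_batch(batch, sink)
snapshot = dict(sink)

# Simulate a restart that replays the same batch from the beginning.
process_batch(batch, sink)
```

After the replay, `sink` holds exactly the same two entries as `snapshot`, which is the property that makes restarting a stage safe.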

7. Growth
Each stage of your pipeline may grow at different speeds.
Orchestration
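As a rough illustration of why stages need to scale independently, the sketch below sizes each stage from its own per-instance throughput. The per-instance capacities are made-up numbers chosen only for the example, and `replicas_needed` is a hypothetical helper; only the 2M events/second peak comes from the slides. An orchestrator would then be asked to run that many instances of each stage.

```python
import math


def replicas_needed(input_rate: float, per_instance_rate: float) -> int:
    """Smallest instance count whose combined capacity covers the input rate."""
    return max(1, math.ceil(input_rate / per_instance_rate))


# Peak event rate quoted on the "Pipeline in Numbers" slide.
peak_rate = 2_000_000

# Hypothetical per-instance capacities in events/second (assumptions, not measurements).
capacity = {"rsyslog": 250_000, "kafka": 500_000, "spark": 100_000}

# Each stage gets its own count, so one stage can grow without resizing the others.
plan = {stage: replicas_needed(peak_rate, rate) for stage, rate in capacity.items()}
```

Under these assumed capacities the stages end up with different instance counts, and a change in one stage's capacity or load only changes that stage's entry in `plan`.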

8. Efficiency and Flexibility
1. There is an application stack to solve every kind of problem, and they are easy to set up
2. Application silos are inefficient and increase operational cost
3. Scale may require re-architecting a given stage
Decouple compute and storage
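The idea of decoupling compute from storage can be sketched as a worker that keeps no local state: its input, its progress marker, and its output all live in shared storage, so a replacement worker on any machine can resume where the previous one stopped. This is a toy model under assumptions; `SharedStore`, `run_worker`, and the `crash_after` parameter are hypothetical names, and an in-memory object stands in for real shared storage.

```python
class SharedStore:
    """Stand-in for shared storage that outlives any individual compute worker."""

    def __init__(self, log):
        self.log = list(log)   # input records
        self.offsets = {}      # stage name -> next index to process
        self.results = {}      # record index -> processed output


def run_worker(store, stage, crash_after=None):
    """Process records from the stored offset; optionally stop early to mimic a crash."""
    offset = store.offsets.get(stage, 0)
    processed = 0
    while offset < len(store.log):
        if crash_after is not None and processed >= crash_after:
            break  # the worker disappears; nothing of value was held locally
        store.results[offset] = store.log[offset].upper()  # keyed write, safe to repeat
        offset += 1
        store.offsets[stage] = offset  # progress is persisted outside the worker
        processed += 1
    return processed


store = SharedStore(["boot ok", "test started", "test passed", "cleanup", "done"])

first = run_worker(store, "parse", crash_after=2)  # ephemeral worker dies early
second = run_worker(store, "parse")                # a fresh worker picks up the rest
```

Because the worker is stateless, the second call could run on a different host or in a different container; all it needs is access to the same store.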

9. Technologies we use
• Docker: Containers
• Nomad: Orchestration
• Prometheus: Monitoring
• Grafana: Dashboards
• Consul: Service discovery
• Chef: Container build
• Jenkins: Continuous Integration
• Kafka Manager: Kafka Interface
• Artifactory: Image repository
• Ansible: Configuring servers

10. Takeaways
• Reliability: Idempotency
• Scalability: Orchestration
• Flexibility and Efficiency: Decoupled compute and storage

11. Questions?