Why Splunk Chose Pulsar - Karthik Ramasamy


1. © 2019 SPLUNK INC. Why Splunk Chose Pulsar June 2019 Karthik Ramasamy Splunk

2. Karthik Ramasamy Senior Director of Engineering @karthikz streaming @splunk | ex-CEO of @streamlio | co-creator of @heronstreaming | ex @Twitter | Ph.D

3. Forward-Looking Statements
During the course of this presentation, we may make forward-looking statements regarding future events or plans of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results may differ materially. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, it may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements made herein.
In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionalities described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Data-to-Everything, D2E and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names or trademarks belong to their respective owners. © 2020 Splunk Inc. All rights reserved.

4. Agenda
1) Introduction to Splunk
2) Streaming system requirements
3) How Pulsar satisfies the requirements
4) Apache Pulsar at Splunk
5) Questions?

5. New technologies are enabling and fueling digitization: Cloud, 5G, IoT, AI, Platforms, Mobility, Virtualization, Robotic Process Automation, Blockchain, VR

6. Data is Transforming Everything: the way we work, live and play

7. The Data-to-Everything Platform
[Diagram: the platform serves IT, Security, DevOps and Business Processes, shown alongside Point Data Management Solutions: Data Lakes, Master Data Management, ETL, Data Silos]

8. Core of Emerging Use Cases: Messaging / Streaming Systems
✦ Real-time monitoring and notifications
✦ Log processing and analytics
✦ Interactive applications
✦ IoT analytics
✦ Streaming data transformation
✦ Data distribution
✦ Real-time analytics
✦ Event-driven workflows

9. Streaming System Requirements: Scalability, Durability, Fault Tolerance, High Availability, Sharing & Isolation, Client Languages, Messaging Models, Persistence, Type Safety, Deployment in k8s

10. Streaming System Requirements: Operability, TCO, Observability, Disaster Recovery, Ecosystem, Adoption, Community, Licensing

11. Requirement #1 - Scalability
✦ Traffic can vary wildly while the system is in production
✦ System needs to scale up with no effect on publish/consume throughput and latency
✦ Support for linear increase/decrease in publish/consume throughput as new nodes are added
✦ Automatic spreading of load to new machines as new nodes are added
✦ Scalability across different dimensions - serving and storage

12. Scalability
[Diagram: three layers. Function processing: workers. Messaging: producers and consumers connected to brokers. Event storage: bookies]
✦ Independent layers for processing, serving and storage
✦ Messaging and processing built on Apache Pulsar
✦ Storage built on Apache BookKeeper

13. Requirement #2 - Durability
✦ Splunk applications have different types of durability
✦ Persistent Durability - No data loss in the presence of node failures or entire cluster failure - e.g. security & compliance
✦ Replicated Durability - No data loss in the presence of limited node failures - e.g. machine logs
✦ Transient Durability - Data may be lost in the presence of failures - e.g. metrics data

14. Durability
[Diagram: a producer publishes to a broker, which writes to three bookies; each bookie fsyncs its journal]
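
The write path in the Durability diagram can be sketched as a toy model. This is only an illustration under stated assumptions: the broker is assumed to acknowledge the producer once a configurable number of bookies report a durable journal write, and the names `ToyBookie`, `publish` and `ack_quorum` are hypothetical, not Pulsar or BookKeeper APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBookie:
    """Hypothetical stand-in for a bookie: a journal list plus an 'up' flag."""
    name: str
    up: bool = True
    journal: list = field(default_factory=list)

    def append_and_fsync(self, entry: bytes) -> bool:
        # A real bookie appends to its journal and fsyncs before replying;
        # here a list append stands in for the durable write.
        if not self.up:
            return False
        self.journal.append(entry)
        return True


def publish(entry: bytes, bookies: list, ack_quorum: int) -> bool:
    """Write to every bookie; acknowledge only if ack_quorum of them persisted it."""
    acks = sum(1 for b in bookies if b.append_and_fsync(entry))
    return acks >= ack_quorum


bookies = [ToyBookie("bookie-1"), ToyBookie("bookie-2"), ToyBookie("bookie-3")]
acked_all_up = publish(b"event-1", bookies, ack_quorum=2)

bookies[2].up = False  # one bookie fails: two durable copies still satisfy the quorum
acked_one_down = publish(b"event-2", bookies, ack_quorum=2)

bookies[1].up = False  # two bookies fail: the write cannot be acknowledged
acked_two_down = publish(b"event-3", bookies, ack_quorum=2)
```

In this sketch a lower `ack_quorum` trades durability for latency, which is one way to think about the persistent, replicated and transient durability classes listed in Requirement #2.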

15. Requirement #3 - Fault Tolerance
✦ Ability of the system to function under component failures
✦ Ideally without any manual intervention, up to a certain degree

16. Pulsar Fault Tolerance
[Diagram: a serving layer of brokers above a storage layer of bookies, each bookie holding segments 1..n]
✦ Broker Failure
    ✦ Topic reassigned to an available broker based on load
    ✦ Can construct the previous state consistently
    ✦ No data needs to be copied
✦ Bookie Failure
    ✦ Immediate switch to a new node
    ✦ Background process copies segments to other bookies to maintain replication factor
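
The bookie-failure bullets describe a background process that copies segments to other bookies until the replication factor is restored. A minimal sketch of that idea follows; the function name `rereplicate`, the placement map and the bookie names are hypothetical, and the real recovery logic in BookKeeper is more involved.

```python
def rereplicate(placement: dict, failed: str, live: list, replication: int) -> dict:
    """Return a new segment -> bookies map that restores the replication factor.

    placement maps a segment name to the list of bookies holding a copy.
    """
    repaired = {}
    for segment, holders in placement.items():
        survivors = [b for b in holders if b != failed]
        # Candidates are live bookies that do not already hold this segment.
        candidates = [b for b in live if b not in survivors]
        missing = replication - len(survivors)
        repaired[segment] = survivors + candidates[:max(missing, 0)]
    return repaired


placement = {
    "segment-1": ["bookie-1", "bookie-2"],
    "segment-2": ["bookie-2", "bookie-3"],
    "segment-3": ["bookie-3", "bookie-4"],
}
live_bookies = ["bookie-1", "bookie-3", "bookie-4"]  # bookie-2 has failed
repaired = rereplicate(placement, failed="bookie-2", live=live_bookies, replication=2)
```

Segments that never lived on the failed bookie are left untouched, so only the under-replicated segments generate copy traffic.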

17. Requirement #4 - High Availability
✦ System should continue to function in the cloud or on-prem in the following conditions, if applicable
    ✦ When two nodes/instances fail
    ✦ When an availability zone or a rack fails

18. Pulsar High Availability
[Diagram: a serving layer of brokers above a storage layer of bookies, with segments 1..n spread across Zone A, Zone B and Zone C]
✦ Node Failures
    ✦ Broker failures
    ✦ Bookie failures
    ✦ Handled similarly to the respective component failures
✦ Zone/Rack Failures
    ✦ Bookies provide rack awareness
    ✦ Brokers replicate data to different racks/zones
    ✦ In the presence of a zone/rack failure, data is available in other zones
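
Rack or zone awareness means that copies of the same data land in different failure domains. The sketch below shows one simple way to do that, assuming each bookie carries a zone label; `place_across_zones` and the bookie/zone names are hypothetical and do not reflect BookKeeper's actual placement policy.

```python
def place_across_zones(bookies: dict, copies: int) -> list:
    """Pick `copies` bookies, each from a different zone.

    bookies maps a bookie name to its zone label.
    """
    chosen, used_zones = [], set()
    for name in sorted(bookies):
        zone = bookies[name]
        if zone not in used_zones:
            chosen.append(name)
            used_zones.add(zone)
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough distinct zones for the requested copies")


bookies = {
    "bookie-1": "zone-a", "bookie-2": "zone-a",
    "bookie-3": "zone-b", "bookie-4": "zone-c",
}
replicas = place_across_zones(bookies, copies=3)

# If zone-a fails, the copies in the other zones are still readable.
surviving = [b for b in replicas if bookies[b] != "zone-a"]
```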

19. Requirement #5 - Sharing and Isolation
✦ System should have the capabilities to
    ✦ Share many applications on the same cluster for cost and manageability purposes
    ✦ Isolate different applications on their own machines in the same cluster when needed

20. Sharing and Isolation
[Diagram: one Apache Pulsar cluster shared by several applications, each with its own topics (Topic-1, Topic-2) and a storage quota (5 TB, 7 TB, 10 TB). Labels: Marketing, Data Serving, Product Safety, ETL, Microservice, Campaigns, Fraud Detection, Risk Classification, Account History, User Clustering, Customer Authentication, Budgeted Spend, Demographic Classification, Location Resolution]
✦ Software isolation
    Storage quotas, flow control, back pressure, rate limiting
✦ Hardware isolation
    Constrain some tenants on a subset of brokers/bookies
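
Rate limiting is one of the software isolation mechanisms named on this slide. A token bucket is a common way to implement a per-tenant limit; the sketch below uses it purely as an illustration and makes no claim about how Pulsar implements its limits. The class `TokenBucket`, the tenant names and the rates are assumptions.

```python
class TokenBucket:
    """Toy per-tenant publish limiter: `rate` messages/second, burst up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


limits = {
    "marketing": TokenBucket(rate=1, capacity=2),
    "data-serving": TokenBucket(rate=100, capacity=100),
}

# Five publishes from each tenant at the same instant (t = 0 seconds).
marketing = [limits["marketing"].allow(0.0) for _ in range(5)]
serving = [limits["data-serving"].allow(0.0) for _ in range(5)]
marketing_later = limits["marketing"].allow(1.0)  # one token refilled after 1 s
```

Because each tenant has its own bucket, a burst from one tenant is throttled without affecting the other tenant's publishes on the same cluster.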

21. Requirement #6 - Client Languages
[Diagram: Python, Java, Go, C++ and C clients connecting to an Apache Pulsar cluster]
Officially supported by the project

22. Requirement #7 - Multiple Messaging Models
✦ Splunk applications require different consuming models
    ✦ Collect once and deliver once capability (e.g. process an S3 file and ingest it into an index)
    ✦ Receive data once and deliver many times (e.g. multiple pipelines sharing the same data for different types of processing)
✦ Avoid two systems, if possible - from a cost and operations perspective
✦ Avoid any additional infra-level code, if possible, that emulates one semantics on top of another system

23. Pulsar Messaging Models
Messaging
• Exclusive Subscription
• Failover Subscription
Queuing
• Shared Subscription
• Key Shared Subscription
Native support avoids two systems and extra infrastructure code that requires maintenance
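
The four subscription types differ in how messages are handed to consumers. The toy dispatcher below illustrates the difference without using the Pulsar client; `dispatch`, the mode strings and the CRC32-based key routing are assumptions made for the sketch. In Pulsar itself an exclusive subscription admits a single consumer, which is modeled here as the first consumer receiving everything.

```python
from itertools import cycle
from zlib import crc32


def dispatch(messages, consumers, mode):
    """Return {consumer: [payloads]} for a list of (key, payload) messages."""
    delivered = {c: [] for c in consumers}
    if mode in ("exclusive", "failover"):
        # One active consumer receives everything, in order. In failover mode
        # the remaining consumers are standbys that take over if it fails.
        for _, payload in messages:
            delivered[consumers[0]].append(payload)
    elif mode == "shared":
        # Queuing: messages are spread round-robin across consumers.
        for (_, payload), consumer in zip(messages, cycle(consumers)):
            delivered[consumer].append(payload)
    elif mode == "key_shared":
        # All messages with the same key go to the same consumer.
        for key, payload in messages:
            index = crc32(key.encode()) % len(consumers)
            delivered[consumers[index]].append(payload)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return delivered


messages = [("a", "m1"), ("b", "m2"), ("a", "m3"), ("b", "m4")]
consumers = ["c1", "c2"]
exclusive = dispatch(messages, consumers, "exclusive")
shared = dispatch(messages, consumers, "shared")
key_shared = dispatch(messages, consumers, "key_shared")
```

The same message list yields ordered delivery to one consumer, load-balanced delivery, or per-key ordering, depending only on the mode. That is the point of the slide: one system covers both messaging and queuing.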

24. Requirement #8 - Persistence
[Diagram: producers and consumers attached to a topic backed by hot storage, with older data moved to cold storage]
✦ Offload cold data to lower-cost storage (e.g. cloud storage, HDFS)
✦ Manual or automatic (configurable threshold)
✦ Transparent to publishers and consumers
✦ Allows near-infinite event storage at low cost (e.g. compliance and security)
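
Threshold-based offload can be sketched as follows. The assumption here is that the oldest segments move to cold storage until the hot tier fits under a size threshold, and that readers look segments up without caring which tier holds them. The function names and sizes are hypothetical.

```python
def offload(segments, threshold_bytes):
    """Split segments into (hot, cold) so hot storage stays within the threshold.

    segments is a list of (name, size_bytes) ordered oldest first.
    """
    hot = list(segments)
    cold = []
    while hot and sum(size for _, size in hot) > threshold_bytes:
        cold.append(hot.pop(0))  # move the oldest segment to cold storage
    return hot, cold


def read(name, hot, cold):
    """Readers see one logical topic regardless of which tier holds the segment."""
    for tier in (hot, cold):
        for segment_name, size in tier:
            if segment_name == name:
                return segment_name, size
    raise KeyError(name)


segments = [("segment-1", 40), ("segment-2", 40), ("segment-3", 40)]
hot, cold = offload(segments, threshold_bytes=100)
oldest = read("segment-1", hot, cold)  # served from the cold tier
```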

25. Requirement #9 - Type Safety
✦ Splunk applications are varied
    ✦ One class requires a fixed schema
    ✦ Another class requires a fixed schema with evolution
    ✦ Another class requires the flexibility of no schema, or a schema handled at the application level
✦ Avoid bringing in another system for schema management
✦ Support for multiple different types -

26. Pulsar Schema Registry
✦ Provides type safety to applications built on top of Pulsar
✦ Server side - system enforces type safety and ensures that producers and consumers remain synced
✦ Schema registry enables clients to upload data schemas on a per-topic basis
✦ Schemas dictate which data types are recognized as valid for that topic
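
Server-side enforcement of a per-topic schema can be illustrated with a small in-memory registry. This is a sketch only: `ToySchemaRegistry` is hypothetical, it represents a schema as a mapping from field name to Python type, and it does not model schema evolution or Pulsar's actual schema formats.

```python
class ToySchemaRegistry:
    """Hypothetical registry: one schema (field name -> Python type) per topic."""

    def __init__(self):
        self.schemas = {}

    def upload(self, topic: str, schema: dict) -> None:
        self.schemas[topic] = schema

    def validate(self, topic: str, message: dict) -> bool:
        schema = self.schemas.get(topic)
        if schema is None:
            return True  # no schema registered: the application handles typing itself
        return (message.keys() == schema.keys()
                and all(isinstance(message[k], t) for k, t in schema.items()))


registry = ToySchemaRegistry()
registry.upload("metrics", {"host": str, "cpu": float})

ok = registry.validate("metrics", {"host": "web-1", "cpu": 0.42})
wrong_type = registry.validate("metrics", {"host": "web-1", "cpu": "high"})
missing_field = registry.validate("metrics", {"host": "web-1"})
untyped = registry.validate("raw-logs", {"anything": 1})
```

Topics without a registered schema accept any message, which corresponds to the class of applications in Requirement #9 that handle schemas at the application level.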

27. Requirement #10 - Ease of Deployment in k8s
✦ Splunk uses k8s for orchestration
✦ System should be easily deployable in k8s
✦ Surface area of the system exposed outside k8s should be minimal - one single end point backed by
✦ Should be able to segregate the nodes receiving external traffic
✦ Should be flexible to deploy from CI/CD pipelines for testing and development

28. Pulsar Deployment in k8s
[Diagram: two layouts, Aggregated Deployment and Segregated Deployment, each with a load balancer (LB) in front of proxies, brokers, and bookies holding segments 1..n]

29. Requirement #11 - Operability
✦ System should be online and continue to serve production traffic in the following scenarios
    ✦ OS upgrades
    ✦ Security patches
    ✦ Disk swapping
    ✦ Upgrading
✦ Self-adjusting components
    ✦ Bookies turn themselves read-only when 90% of disk is full
    ✦ Load manager to balance traffic across brokers
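
The read-only behavior in the last bullets can be sketched as a simple threshold check. The 90% figure comes from the slide; the function names, the disk figures and the choice of `>=` at exactly 90% are assumptions.

```python
READ_ONLY_THRESHOLD = 0.90  # from the slide: read-only when 90% of disk is full


def bookie_mode(used_bytes: int, total_bytes: int) -> str:
    """Return 'read-only' once disk usage reaches the threshold, else 'read-write'."""
    usage = used_bytes / total_bytes
    return "read-only" if usage >= READ_ONLY_THRESHOLD else "read-write"


def writable_bookies(disks: dict) -> list:
    """Bookies that can still accept new entries; read-only ones keep serving reads."""
    return [name for name, (used, total) in sorted(disks.items())
            if bookie_mode(used, total) == "read-write"]


disks = {"bookie-1": (450, 1000), "bookie-2": (910, 1000), "bookie-3": (899, 1000)}
targets = writable_bookies(disks)
```

New writes are steered only to the bookies in `targets`, so a nearly full bookie stops filling up while its existing data stays readable.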
