Kafka on Pulsar: bringing native Kafka protocol support to Pulsar

1. Introducing Kafka-on-Pulsar: Bring native Kafka protocol support to Apache Pulsar Sijie Guo / Pierre Zemb Pulsar Summit 2020-06-17

2. Who are we? ● Sijie Guo (@sijieg) ● Co-Founder & CEO, StreamNative ● PMC Member of Pulsar/BookKeeper ● Ex Co-Founder, Streamlio ● Ex-Twitter, Ex-Yahoo ● Work on messaging and streaming data technologies for many years

3. Who are we? ● Pierre Zemb (@PierreZ) ● Technical Leader ● Working around distributed systems ● Newcomer as an Apache contributor ○ Apache {Flink, HBase, Pulsar} ● Involved in local communities

4. Agenda ● What is Apache Pulsar? ● Why KoP? ● Introduction to the protocol handler ● Kafka vs. Pulsar, the protocol version ● How we implement KoP ● Demo ● Roadmap ● Q&A

5. What is Apache Pulsar?

6. Flexible pub/sub messaging backed by durable log storage

8. Cloud-Native Event Streaming

9. Apache Pulsar ● Publish-subscribe: unified messaging model (streaming + queueing) ● Infinite event stream storage: Apache BookKeeper + Tiered Storage ● Connectors: ingest events without writing code ● Process events in real-time ○ Pulsar Functions for serverless / lightweight computation ○ Spark / Flink for unified data processing ○ Presto for interactive queries

10. Pulsar Highlights ● Multi-tenancy ● Unified messaging (queuing + streaming) ● Layered Architecture ● Tiered Storage ● Built-in schema support ● Built-in geo-replication

11. The Need for KoP ● Adoption ● Inbound requests ● Migration

12. The Existing Efforts ● Kafka Java Wrapper ● Pulsar IO Connector

13. Implement Kafka protocol on Pulsar? ● Proxy / Gateway ● Implement Kafka protocol on Pulsar broker

14. KoP as a proxy, OVHcloud version We first implemented KoP as a proxy PoC in Rust: ● Rust async was out in the nightly compiler when we started ● We wanted no GC on proxy layers ● Rust has awesome libraries at the TCP level Our goal was to convert TCP frames from Kafka to Pulsar
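
To make "converting TCP frames" concrete, here is a minimal sketch of the first step such a proxy has to perform: reading the header of a Kafka request frame. It is written in Python for brevity and is not the Rust PoC. It assumes the classic request header layout (a 4-byte big-endian size prefix, then API key, API version, correlation ID, and a nullable client ID string); the function names are hypothetical.

```python
import struct

# Fixed part of the header: api_key (int16), api_version (int16),
# correlation_id (int32), client_id length (int16), all big-endian.
HEADER_FMT = ">hhih"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 10 bytes


def encode_request_header(api_key, api_version, correlation_id, client_id):
    """Build a size-prefixed Kafka request frame with a header and no body."""
    cid = client_id.encode("utf-8")
    header = struct.pack(HEADER_FMT, api_key, api_version, correlation_id, len(cid)) + cid
    return struct.pack(">i", len(header)) + header


def decode_request_header(frame):
    """Parse the size prefix and request header from a raw frame."""
    (size,) = struct.unpack_from(">i", frame, 0)
    api_key, api_version, correlation_id, cid_len = struct.unpack_from(HEADER_FMT, frame, 4)
    start = 4 + HEADER_LEN
    # A negative length marks a null client ID.
    client_id = None if cid_len < 0 else frame[start:start + cid_len].decode("utf-8")
    return {
        "size": size,
        "api_key": api_key,
        "api_version": api_version,
        "correlation_id": correlation_id,
        "client_id": client_id,
    }


# Round trip: API key 18 is ApiVersions in the Kafka protocol.
frame = encode_request_header(18, 0, 7, "demo")
decoded = decode_request_header(frame)
```

A proxy would then dispatch on `api_key` and translate the request body into the matching Pulsar command, which is the part that turned out to be hard for some APIs.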

15. KoP as a proxy, OVHcloud version [Diagram: clients and the "Hyperion" proxy layer]

16. KoP as a proxy, OVHcloud version Everything is a state machine: ● TCP connections from Kafka clients ● TCP connections to Pulsar brokers These event-driven finite-state machines were triggered by TCP frames from their respective protocols. A third one sat above the two to provide synchronization.
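
The three-machine layout described above can be sketched as follows. This is an illustrative Python model, not the Rust PoC: the state names, event names, and the one-request-in-flight simplification are all assumptions made for the example.

```python
from enum import Enum, auto


class ConnState(Enum):
    IDLE = auto()
    AWAITING_RESPONSE = auto()
    CLOSED = auto()


class ConnectionMachine:
    """Finite-state machine for one TCP connection, driven by frame events."""

    TRANSITIONS = {
        (ConnState.IDLE, "request"): ConnState.AWAITING_RESPONSE,
        (ConnState.AWAITING_RESPONSE, "response"): ConnState.IDLE,
        (ConnState.IDLE, "close"): ConnState.CLOSED,
        (ConnState.AWAITING_RESPONSE, "close"): ConnState.CLOSED,
    }

    def __init__(self, name):
        self.name = name
        self.state = ConnState.IDLE

    def on_event(self, event):
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            raise ValueError(f"{self.name}: event {event!r} not allowed in {self.state.name}")
        self.state = self.TRANSITIONS[key]
        return self.state


class Synchronizer:
    """Third machine sitting above the two connection machines."""

    def __init__(self):
        self.kafka_side = ConnectionMachine("kafka-client")
        self.pulsar_side = ConnectionMachine("pulsar-broker")

    def on_kafka_request(self):
        # A Kafka frame arrived: mark the client side busy, forward to Pulsar.
        self.kafka_side.on_event("request")
        self.pulsar_side.on_event("request")

    def on_pulsar_response(self):
        # A Pulsar frame arrived: release the broker side, answer the client.
        self.pulsar_side.on_event("response")
        self.kafka_side.on_event("response")


sync = Synchronizer()
sync.on_kafka_request()
states_in_flight = (sync.kafka_side.state, sync.pulsar_side.state)
sync.on_pulsar_response()
```

Keeping the transition table explicit makes illegal sequences (for example, a response with no pending request) fail loudly instead of silently desynchronizing the two sides.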

17. KoP as a proxy, OVHcloud version [Diagram: clients, the "Hyperion" proxy layer, and its state machines]

18. KoP as a proxy, OVHcloud version Pros ● Working at the TCP layer enables performance ● Nice PoC to discover both protocols ● Rust is blazing fast ● Proxifying production is easy ● We could bump old versions of Kafka frames for old Kafka clients Cons ● Rewrite everything ● Some things were hard to proxify: ○ Group coordinator ○ Offsets management ● Difficult to open-source (different language)

19. The group-coordinator/offsets problem In Kafka, the group coordinator is an elected actor within the cluster responsible for: ● assigning partitions to consumers of a consumer group ● managing offsets for each consumer group In Pulsar, partition assignment is managed by the broker on a per-partition basis. Offset management is done by storing the acknowledgements in cursors by the owner broker of that partition.

20. The group-coordinator/offsets problem Simulating this at the proxy level is hard (missing low-level info)
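
The mismatch between the two models can be illustrated with a toy sketch. Both classes below are hypothetical simplifications written for this example: the round-robin assignment stands in for Kafka's pluggable assignors, and a set of acknowledged message IDs stands in for a real Pulsar cursor.

```python
class KafkaGroupCoordinator:
    """One elected actor holds assignments and offsets for a whole group."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.assignments = {}  # consumer -> list of partitions
        self.offsets = {}      # partition -> committed offset

    def assign(self, consumers, partitions):
        # Round-robin, for illustration only.
        self.assignments = {c: [] for c in consumers}
        for i, partition in enumerate(partitions):
            self.assignments[consumers[i % len(consumers)]].append(partition)
        return self.assignments

    def commit(self, partition, offset):
        self.offsets[partition] = offset


class PulsarPartitionOwner:
    """The broker owning one partition keeps a cursor per subscription."""

    def __init__(self, partition):
        self.partition = partition
        self.cursors = {}  # subscription -> acknowledged message ids (toy cursor)

    def acknowledge(self, subscription, message_id):
        self.cursors.setdefault(subscription, set()).add(message_id)


# Kafka: group state lives in one place.
coordinator = KafkaGroupCoordinator("billing")
assignment = coordinator.assign(["c1", "c2"], [0, 1, 2])
coordinator.commit(0, 42)

# Pulsar: the equivalent state is spread over each partition's owner broker.
owners = {p: PulsarPartitionOwner(p) for p in [0, 1, 2]}
owners[0].acknowledge("billing", 42)
```

A proxy sitting between the two only sees frames, so it has neither the coordinator's group-wide view nor the owner brokers' cursor state, which is why emulating one model on top of the other at that layer is difficult.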

21. And then we saw this 😍

22. Which led to our collaboration 🤝

23. What is Apache Pulsar?

24. How Pulsar implements its protocol

25. Protocol Handler

26. What is the protocol handler?

27. What is the protocol handler? How to load plugins in a JVM without using the classpath? Pulsar uses NAR to load plugins! ● Pulsar Function ● Pulsar Connector ● Pulsar Offloader ● Pulsar Protocol Handler
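
The idea of loading a plugin from an archive or directory that is not on the global classpath has a rough analogue in Python, sketched below. This is only an analogy and not how NAR works internally: the plugin file, its `PROTOCOL_NAME` attribute, and the `load_plugin` helper are all hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_plugin(path, module_name):
    """Load a module from an explicit file path, bypassing the import path."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as plugin_dir:
    # Stand-in for a plugin dropped into a plugins directory.
    plugin_file = Path(plugin_dir) / "kafka_handler.py"
    plugin_file.write_text("PROTOCOL_NAME = 'kafka'\n")

    handler = load_plugin(plugin_file, "kafka_handler")
    protocol_name = handler.PROTOCOL_NAME
```

The host process discovers the plugin by location rather than by having it on its own import path, which keeps the plugin and its dependencies separate from the host's.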

28. How to load protocol handlers? 1. Upgrade your cluster to 2.5 2. Set the following configurations: 3. Configure each protocol handler 4. Restart your broker 5. Enjoy!

29. Kafka-on-Pulsar Protocol Handler
