1. #TGIP-CN EP-010 Protocol Handler & Kafka-on-Pulsar
2. Who are we? ● Sijie Guo (@sijieg) ● Co-Founder & CEO, StreamNative ● PMC Member of Pulsar/BookKeeper ● Ex Co-Founder, Streamlio ● Ex-Twitter, Ex-Yahoo ● Work on messaging and streaming data technologies for many years
3. Who are we? ● Pierre Zemb (@PierreZ) ● Tech lead ● Working around distributed systems ● Newcomer as an Apache contributor ● Involved in local communities
4. Agenda ● What is Apache Pulsar? ● Why KoP? ● Introduction of protocol handler ● Kafka VS Pulsar, the protocol version ● How we implement KoP ● Demo ● Roadmap ● Q&A
5. What is Apache Pulsar?
6. Flexible pub/sub messaging backed by durable log storage
7. Flexible pub/sub messaging backed by durable log storage
8. Cloud-Native Event Streaming
9. Apache Pulsar ● Publish-subscribe: unified messaging model (streaming + queueing) ● Infinite event stream storage: Apache BookKeeper + Tiered Storage ● Connectors: ingest events without writing code ● Process events in real-time ○ Pulsar Functions for serverless / lightweight computation ○ Spark / Flink for unified data processing ○ Presto for interactive queries
10. Pulsar Highlights ● Multi-tenancy ● Unified messaging (queuing + streaming) ● Layered Architecture ● Tiered Storage ● Built-in schema support ● Built-in geo-replication
11. The Need for KoP ● Adoption ● Inbound requests ● Migration
12. The Existing Efforts ● Kafka Java Wrapper ● Pulsar IO Connector
13. Implement Kafka protocol on Pulsar? ● Proxy / Gateway ● Implement Kafka protocol on Pulsar broker
14. KoP as a proxy, OVHcloud version We first implemented KoP as a proxy PoC in Rust: ● Rust async was available in the nightly compiler when we started ● We wanted no GC on proxy layers ● Rust has awesome libraries at TCP-level Our goal was to convert TCP frames from Kafka to Pulsar
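To make the frame-conversion goal concrete, here is a minimal sketch of the first step such a proxy needs: decoding the front of one Kafka request frame. Kafka requests are a 4-byte big-endian size prefix followed by a header (api_key, api_version, correlation_id, nullable client_id). This is not the Hyperion code, which was written in Rust; the function names are hypothetical and only the v1 request header layout is handled.

```python
import struct


def encode_request_header(api_key, api_version, correlation_id, client_id, payload=b""):
    """Build a size-prefixed Kafka request frame (hypothetical helper, for the example only)."""
    cid = client_id.encode("utf-8")
    body = struct.pack(">hhih", api_key, api_version, correlation_id, len(cid)) + cid + payload
    return struct.pack(">i", len(body)) + body


def decode_request_header(frame):
    """Decode the size prefix and the v1 request header of one Kafka request frame."""
    # 4-byte big-endian size: the number of bytes that follow the prefix.
    (size,) = struct.unpack_from(">i", frame, 0)
    if len(frame) - 4 < size:
        raise ValueError("incomplete frame: wait for more bytes from the socket")
    # int16 api_key, int16 api_version, int32 correlation_id, int16 client_id length.
    api_key, api_version, correlation_id, cid_len = struct.unpack_from(">hhih", frame, 4)
    offset = 4 + 10
    if cid_len < 0:
        client_id = None  # a length of -1 encodes a null string
    else:
        client_id = frame[offset:offset + cid_len].decode("utf-8")
        offset += cid_len
    return {
        "api_key": api_key,
        "api_version": api_version,
        "correlation_id": correlation_id,
        "client_id": client_id,
        "payload": frame[offset:4 + size],  # request body, still undecoded
    }
```

A proxy would dispatch on `api_key` (for example 0 for Produce, 1 for Fetch) and translate the body into the matching Pulsar command, echoing `correlation_id` back in the Kafka response.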
15. KoP as a proxy, OVHcloud version Proxy layer "Hyperion" clients
16. KoP as a proxy, OVHcloud version Everything is a state-machine: ● TCP connections from Kafka clients ● TCP connections to Pulsar brokers Those event-driven finite-state machines were triggered by TCP frames from their respective protocol. A third one sat above the two to provide synchronization
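The three-machine layout can be sketched as follows: one machine per connection side, each advanced by events standing in for decoded frames, plus a synchronizer that forwards progress from one side to the other. All state and event names here are hypothetical (the Pulsar side loosely follows the CONNECTED / SEND / SEND_RECEIPT commands of Pulsar's binary protocol); the real PoC was written in Rust and is not reproduced here.

```python
from enum import Enum, auto


class KafkaState(Enum):
    AWAITING_REQUEST = auto()
    AWAITING_BROKER = auto()
    CLOSED = auto()


class PulsarState(Enum):
    CONNECTING = auto()
    READY = auto()
    IN_FLIGHT = auto()
    CLOSED = auto()


# (current state, event) -> next state; anything else is rejected.
KAFKA_TRANSITIONS = {
    (KafkaState.AWAITING_REQUEST, "request"): KafkaState.AWAITING_BROKER,
    (KafkaState.AWAITING_BROKER, "response"): KafkaState.AWAITING_REQUEST,
    (KafkaState.AWAITING_REQUEST, "close"): KafkaState.CLOSED,
}
PULSAR_TRANSITIONS = {
    (PulsarState.CONNECTING, "connected"): PulsarState.READY,
    (PulsarState.READY, "send"): PulsarState.IN_FLIGHT,
    (PulsarState.IN_FLIGHT, "send_receipt"): PulsarState.READY,
    (PulsarState.READY, "close"): PulsarState.CLOSED,
}


class Machine:
    """Event-driven finite-state machine backed by a transition table."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions

    def fire(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"event {event!r} not allowed in state {self.state.name}")
        self.state = self.transitions[key]
        return self.state


class Synchronizer:
    """Third machine: keeps the Kafka side and the Pulsar side in step."""

    def __init__(self):
        self.kafka = Machine(KafkaState.AWAITING_REQUEST, KAFKA_TRANSITIONS)
        self.pulsar = Machine(PulsarState.CONNECTING, PULSAR_TRANSITIONS)

    def on_kafka_frame(self, event):
        self.kafka.fire(event)
        if event == "request":
            self.pulsar.fire("send")  # forward the translated request to the broker

    def on_pulsar_frame(self, event):
        self.pulsar.fire(event)
        if event == "send_receipt":
            self.kafka.fire("response")  # answer the waiting Kafka client
```

Keeping the transitions in tables makes illegal sequences (such as a Kafka request arriving before the broker connection is ready) fail loudly instead of silently corrupting the session.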
17. KoP as a proxy, OVHcloud version Proxy layer "Hyperion" clients State machines
18. KoP as a proxy, OVHcloud version Pros: ● Working at TCP layer enables performance ● nice PoC to discover both protocols ● Rust is blazing fast ● Proxify production is easy ● We could bump old version of Kafka frames for old Kafka clients Cons: ● Rewrite everything ● Some things were hard to proxify: ○ Group coordinator ○ Offsets management ● Difficult to open-source (different language)
19. The group-coordinator/offsets problem In Kafka, the group coordinator is an elected actor within the cluster responsible for: ● assigning partitions to consumers of a consumer group ● managing offsets for each consumer group In Pulsar, partition assignment is managed by broker on a per-partition basis. Offset management is done by storing the acknowledgements in cursors by the owner broker of that partition.
20. The group-coordinator/offsets problem In Kafka, the group coordinator is an elected actor within the cluster responsible for: ● assigning partitions to consumers of a consumer group ● managing offsets for each consumer group In Pulsar, partition assignment is managed by broker on a per-partition basis. Offset management is done by storing the acknowledgements in cursors by the owner broker of that partition. Simulating this at the proxy level is hard (missing low-level info)
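A toy model of the two bookkeeping styles shows where the mismatch comes from. Everything below is a simplification under stated assumptions: class and method names are hypothetical, and entries are identified by plain integers, whereas real Pulsar message IDs combine a ledger id and an entry id.

```python
class KafkaGroupCoordinator:
    """One elected actor stores committed offsets for a whole consumer group."""

    def __init__(self):
        # (group, topic, partition) -> next offset the group should read
        self.committed = {}

    def commit(self, group, topic, partition, offset):
        self.committed[(group, topic, partition)] = offset

    def fetch(self, group, topic, partition):
        return self.committed.get((group, topic, partition))


class PulsarCursor:
    """Per-subscription cursor, kept by the broker that owns the partition."""

    def __init__(self):
        self.mark_delete = -1     # every entry up to and including this one is acked
        self.acked_ahead = set()  # entries acked individually past mark_delete

    def ack(self, entry_id):
        if entry_id <= self.mark_delete:
            return  # already covered
        self.acked_ahead.add(entry_id)
        # Advance mark_delete over any contiguous run of acked entries.
        while self.mark_delete + 1 in self.acked_ahead:
            self.mark_delete += 1
            self.acked_ahead.remove(self.mark_delete)

    def kafka_style_offset(self):
        """Closest single-number equivalent of a Kafka committed offset."""
        return self.mark_delete + 1
```

A Kafka client expects one offset per group and partition from a single coordinator, while the acknowledgement state lives inside each owner broker's cursor. A proxy sitting outside the brokers only sees frames, so it has no direct access to that cursor state.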
21. And then we saw this 😍
22. Which led to our collaboration 🤝
23. What is Apache Pulsar??
24. How Pulsar implements its protocol
26. What is the protocol handler?
27. What is the protocol handler? How to load plugins in a JVM without using the classpath? Pulsar uses NAR to load plugins! - Pulsar Function - Pulsar Connector - Pulsar Offloader - Pulsar Protocol Handler
28. How to load protocol handlers? 1. Upgrade your cluster to 2.5 2. Set the following configurations: 3. Configure each protocol handler 4. Restart your broker 5. Enjoy!
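The settings for step 2 are not reproduced in this transcript. As an assumption based on the Pulsar broker configuration keys `messagingProtocols` and `protocolHandlerDirectory`, the sketch below renders the two lines one would add to `broker.conf`. The helper names, the `kafka` protocol name, and the `./protocols` directory are illustrative values and should be checked against the documentation of the Pulsar and KoP versions in use.

```python
def protocol_handler_settings(protocols, handler_directory):
    """Return the broker settings that enable protocol handlers (assumed key names)."""
    return {
        # Comma-separated list of protocol handler names to load.
        "messagingProtocols": ",".join(protocols),
        # Directory the broker scans for protocol handler NAR files.
        "protocolHandlerDirectory": handler_directory,
    }


def render_properties(settings):
    """Render settings as key=value lines, the format used by broker.conf."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


print(render_properties(protocol_handler_settings(["kafka"], "./protocols")))
# prints:
# messagingProtocols=kafka
# protocolHandlerDirectory=./protocols
```

Handler-specific options (step 3) go in the same file and depend on the handler, so they are left out here.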
29. Kafka-on-Pulsar Protocol Handler