Kafka on Pulsar


1.

2. Kafka on Pulsar (KoP), 翟佳 (Jia Zhai)

3. Who am I? Jia Zhai / 翟佳
  Apache Pulsar Committer & PMC Member
  Apache BookKeeper Committer & PMC Member
  EMC -> StreamNative; StreamNative Core Engineer
  HUST -> ICT

4. What is Apache Pulsar? Flexible Pub/Sub Messaging backed by Durable log/stream Storage

5. Barrier for users? Unified messaging protocol; apps built on old systems

6. How Pulsar handles it?
  Pulsar Kafka Wrapper on Kafka Java API: https://pulsar.apache.org/docs/en/adaptors-kafka/
  Pulsar IO Connect: https://pulsar.apache.org/docs/en/io-overview/

7. Kafka on Pulsar (KoP)

8. KoP Feasibility: Log Topic

9. KoP Feasibility: Log Topic [Diagram labels: Producer, Consumer]

10. KoP Feasibility: Log Topic [Diagram labels: Kafka, Producer, Consumer]

11. KoP Feasibility: Log Topic [Diagram labels: Pulsar, Producer, Consumer]

12. KoP Feasibility: Others. Topic Lookup; Produce; Consume; Offset; Consumption State [Diagram labels: Producer, Consumer]

13. KoP Overview [Architecture diagram. Clients: Pulsar Producer and Pulsar Consumer (Pulsar lib); Kafka Producer and Kafka Consumer (Kafka lib). Pulsar Broker: Pulsar Protocol handler, Kafka Protocol handler, Pulsar Topic, Managed Ledger, BK Client, Load Balancer, Geo-Replicator. Also shown: ZooKeeper, Bookie.]

14. KoP Implementation
  Topic flat map: Broker sets `kafkaNamespace`
  Message ID and Offset: LedgerId + EntryId
  Message: Convert key/value/timestamp/headers (properties)
  Topic Lookup: Pulsar admin topic lookup -> owner broker
  Produce: Convert, then call PulsarTopic.publishMessage
  Consume: Convert, then call non-durable-cursor.readEntries
  Group Coordinator: Keep in topic `public/__kafka/__offsets`
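The Group Coordinator item says committed offsets are kept in the topic `public/__kafka/__offsets`. As a minimal sketch of that idea (not KoP's actual code), the topic can be modelled as an append-only log that is replayed with last-write-wins semantics to recover each group's position. The class and function names here are hypothetical, and an in-memory list stands in for the real topic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

OFFSETS_TOPIC = "public/__kafka/__offsets"  # topic name taken from the slide


@dataclass(frozen=True)
class OffsetCommit:
    """One committed position of a consumer group on a topic partition."""
    group: str
    topic: str
    partition: int
    offset: int


@dataclass
class InMemoryOffsetsLog:
    """Hypothetical stand-in for the offsets topic: an append-only list."""
    name: str = OFFSETS_TOPIC
    records: List[OffsetCommit] = field(default_factory=list)

    def append(self, commit: OffsetCommit) -> None:
        self.records.append(commit)


def replay(log: InMemoryOffsetsLog) -> Dict[Tuple[str, str, int], int]:
    """Rebuild the coordinator state: the last commit per key wins."""
    state: Dict[Tuple[str, str, int], int] = {}
    for record in log.records:
        state[(record.group, record.topic, record.partition)] = record.offset
    return state


log = InMemoryOffsetsLog()
log.append(OffsetCommit("g1", "test", 0, 5))
log.append(OffsetCommit("g1", "test", 0, 9))   # later commit overrides the first
log.append(OffsetCommit("g2", "test", 0, 2))
committed = replay(log)
```

Keeping this state in a topic means a coordinator that restarts or moves to another broker can rebuild it by reading the log again.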

15. KoP Implementation: Topic Map
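Kafka topic names are flat, while Pulsar topic names carry a tenant and namespace. A minimal sketch of a flat mapping follows, assuming the broker is configured with one tenant and namespace for all Kafka topics. The defaults `public` and `default` and the function name are assumptions for illustration. The `-partition-N` suffix follows Pulsar's naming for partitioned topics.

```python
from typing import Optional


def kafka_to_pulsar_topic(
    kafka_topic: str,
    partition: Optional[int] = None,
    tenant: str = "public",       # assumed tenant
    namespace: str = "default",   # assumed value of the broker's kafkaNamespace
) -> str:
    """Map a flat Kafka topic name onto a fully qualified Pulsar topic name."""
    name = f"persistent://{tenant}/{namespace}/{kafka_topic}"
    if partition is not None:
        # Pulsar names each partition of a partitioned topic with this suffix.
        name += f"-partition-{partition}"
    return name


whole_topic = kafka_to_pulsar_topic("test")
third_partition = kafka_to_pulsar_topic("test", partition=2)
```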

16. KoP Implementation: Offset [Diagram labels: Kafka Producer, Kafka lib, LedgerId, entryId, Offset]
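A Kafka offset is a single 64-bit integer, while a Pulsar message position is a (LedgerId, EntryId) pair. One way to combine them, sketched below, is to pack both into one integer so that offsets keep the same order as positions. The bit widths are assumptions chosen for illustration, not values confirmed by the slides.

```python
from typing import Tuple

# Assumed bit layout (illustrative only): 19 + 32 + 12 = 63 bits,
# so the result always fits in a non-negative signed 64-bit Kafka offset.
LEDGER_BITS = 19
ENTRY_BITS = 32
BATCH_BITS = 12  # low bits reserved for an index inside a batch


def to_offset(ledger_id: int, entry_id: int) -> int:
    """Pack (ledger_id, entry_id) into one integer offset."""
    if not 0 <= ledger_id < (1 << LEDGER_BITS):
        raise ValueError("ledger_id out of range for this layout")
    if not 0 <= entry_id < (1 << ENTRY_BITS):
        raise ValueError("entry_id out of range for this layout")
    return (ledger_id << (ENTRY_BITS + BATCH_BITS)) | (entry_id << BATCH_BITS)


def from_offset(offset: int) -> Tuple[int, int]:
    """Recover (ledger_id, entry_id) from a packed offset."""
    entry_id = (offset >> BATCH_BITS) & ((1 << ENTRY_BITS) - 1)
    ledger_id = offset >> (ENTRY_BITS + BATCH_BITS)
    return ledger_id, entry_id


packed = to_offset(3, 7)
unpacked = from_offset(packed)
```

Because the ledger id sits in the high bits, an entry in a later ledger always has a larger offset than any entry in an earlier one. With this encoding, consecutive messages do not necessarily have consecutive offsets.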

17. KoP Implementation: Message Map
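The message mapping converts key, value, timestamp, and headers (properties) between the two formats. The sketch below shows one possible field-by-field conversion. The record types are hypothetical stand-ins for the real client classes. It assumes keys and header values are UTF-8 text, that the Kafka timestamp maps to a Pulsar event time, and that duplicate header keys collapse to the last value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class KafkaRecord:
    """Hypothetical, simplified view of a Kafka record."""
    key: Optional[bytes]
    value: bytes
    timestamp_ms: int
    headers: List[Tuple[str, bytes]] = field(default_factory=list)


@dataclass
class PulsarMessage:
    """Hypothetical, simplified view of a Pulsar message."""
    key: Optional[str]
    payload: bytes
    event_time_ms: int
    properties: Dict[str, str] = field(default_factory=dict)


def kafka_to_pulsar(record: KafkaRecord) -> PulsarMessage:
    """Convert on the produce path; assumes UTF-8 keys and header values."""
    return PulsarMessage(
        key=record.key.decode("utf-8") if record.key is not None else None,
        payload=record.value,
        event_time_ms=record.timestamp_ms,
        properties={name: raw.decode("utf-8") for name, raw in record.headers},
    )


def pulsar_to_kafka(message: PulsarMessage) -> KafkaRecord:
    """Convert on the consume path; the inverse of kafka_to_pulsar."""
    return KafkaRecord(
        key=message.key.encode("utf-8") if message.key is not None else None,
        value=message.payload,
        timestamp_ms=message.event_time_ms,
        headers=[(name, text.encode("utf-8")) for name, text in message.properties.items()],
    )


original = KafkaRecord(b"user-1", b"hello", 1_000, [("trace", b"abc")])
converted = kafka_to_pulsar(original)
round_trip = pulsar_to_kafka(converted)
```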

18. KoP Implementation: Topic Lookup
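A Kafka client needs to know which broker to contact for a topic, and in Pulsar that is the broker that currently owns the topic. The sketch below models the lookup as two table reads: find the owner broker, then return the address where that broker accepts Kafka connections. Both tables, the broker names, the host names, and the default namespace are hypothetical. In a real deployment the ownership answer would come from a Pulsar topic lookup.

```python
from typing import Dict, Tuple

# Hypothetical ownership table standing in for a Pulsar topic lookup result.
OWNERSHIP: Dict[str, str] = {
    "persistent://public/default/test": "broker-2",
}

# Hypothetical address on which each broker accepts Kafka protocol connections.
KAFKA_LISTENERS: Dict[str, Tuple[str, int]] = {
    "broker-1": ("broker-1.example", 9092),
    "broker-2": ("broker-2.example", 9092),
}


def lookup_owner(pulsar_topic: str, ownership: Dict[str, str]) -> str:
    """Return the broker that owns the topic, or raise if it is unknown."""
    try:
        return ownership[pulsar_topic]
    except KeyError:
        raise LookupError(f"no owner found for {pulsar_topic}") from None


def leader_for(
    kafka_topic: str,
    ownership: Dict[str, str],
    listeners: Dict[str, Tuple[str, int]],
    namespace: str = "public/default",  # assumed Kafka namespace
) -> Tuple[str, int]:
    """Answer a Kafka-style metadata question with the owner broker's address."""
    pulsar_topic = f"persistent://{namespace}/{kafka_topic}"
    owner = lookup_owner(pulsar_topic, ownership)
    return listeners[owner]


leader = leader_for("test", OWNERSHIP, KAFKA_LISTENERS)
```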

19. KoP Implementation: Pro/Con

20. KoP Implementation: Pro/Con

21. KoP Now [Same architecture diagram as slide 13, KoP Overview]

22. KoP Now: Layered Architecture; Independent Scale; Instant Recovery; Balance-free expand

23. KoP Now
  Durability: Data replicated and synced to disk
  Ordering: Guaranteed ordering
  Delivery Guarantees: At least once, at most once and effectively once
  High throughput: Can reach 1.8 M messages/s in a single partition
  Low Latency: Low publish latency of 5ms
  Unified messaging model: Support both Streaming and Queuing
  Multi-tenancy: A single cluster can support many tenants and use cases
  Geo-replication: Out of box support for geographically distributed applications
  Highly scalable & available (HA): Can support millions of topics

24. Demo https://kafka.apache.org/quickstart
  Demo1: Kafka Producer / Consumer
  Demo2: Kafka Connect
  https://archive.apache.org/dist/kafka/2.0.0/kafka_2.12-2.0.0.tgz

25. Demo [Same architecture diagram as slide 13, KoP Overview]

26. Demo1: K-Producer -> K-Consumer [Diagram labels: Kafka Producer (Kafka lib), Pulsar Protocol handler, Kafka Protocol handler, Broker, Pulsar Topic, Kafka Consumer (Kafka lib)]
  bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
  bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

27. Demo1: P-Producer -> K-Consumer [Diagram labels: Pulsar Producer (Pulsar lib), Kafka Producer (Kafka lib), Pulsar Protocol handler, Kafka Protocol handler, Pulsar Broker, Pulsar Topic, Pulsar Consumer (Pulsar lib), Kafka Consumer (Kafka lib)]
  bin/pulsar-client produce test -n 1 -m "Hello from Pulsar Producer, Message 1"
  bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

28. Demo1: K-Producer -> P-Consumer [Diagram labels: Pulsar Producer (Pulsar lib), Kafka Producer (Kafka lib), Pulsar Protocol handler, Kafka Protocol handler, Pulsar Broker, Pulsar Topic, Pulsar Consumer (Pulsar lib), Kafka Consumer (Kafka lib)]
  bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
  bin/pulsar-client consume -s sub-name test -n 0

29. Demo2: Kafka Connect

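StreamNative is an open-source infrastructure software company building a next-generation streaming data platform around Apache Pulsar and Apache BookKeeper. Guided by two beliefs, that event streaming is the future cornerstone of big data and that open source is the future of infrastructure software, it focuses on building the open-source ecosystem and community and is committed to cutting-edge technology.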