Apache Pulsar: How a Segment-Oriented Architecture Delivers Bett

This talk introduces Apache Pulsar, a durable, distributed messaging system. Pulsar uses Apache BookKeeper as its storage layer for message persistence. In this talk, Jia will discuss how this layered architecture allows Pulsar to achieve a high level of scalability and performance. In particular, it will cover:
- Pulsar’s segment-oriented architecture, which allows easy elasticity for your messaging cluster without cumbersome rebalancing operations.
- The benefits that this segment-oriented architecture brings.
- A benchmark of the throughput and latency of the Pulsar messaging system.
In short, this talk will be a deep dive into the storage internals of a large-scale, production-validated distributed messaging system.

1. Fast, Durable, Flexible Pub/Sub based on Segment-Oriented Architecture. Speaker: Jia Zhai (翟佳), Streamlio

2. What is Apache Pulsar?
• Durability: Data replicated and synced to disk
• Ordering: Guaranteed ordering
• Delivery guarantees: At least once, at most once, and effectively once
• Geo-replication: Out-of-the-box support for geographically distributed applications
• Multi-tenancy: A single cluster can support many tenants and use cases
• Low latency: Low publish latency of 5 ms at the 99th percentile
• Unified messaging model: Supports both Topic and Queue semantics in a single model
• High throughput: Can reach 1.8 M messages/s in a single partition
• Highly scalable: Can support millions of topics
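The "unified messaging model" point can be illustrated with a toy, in-memory model. This is a sketch under simplifying assumptions, not the Pulsar client API: `ToyTopic`, the subscription names, and the consumer names are all hypothetical. It assumes that every subscription receives every message (topic semantics) and that consumers attached to the same subscription split the messages round-robin (queue semantics).

```python
from collections import defaultdict
from itertools import cycle


class ToyTopic:
    """In-memory stand-in for a topic; NOT the Pulsar client API."""

    def __init__(self):
        # subscription name -> list of consumer names attached to it
        self.subscriptions = defaultdict(list)

    def subscribe(self, subscription, consumer):
        self.subscriptions[subscription].append(consumer)

    def publish_all(self, messages):
        """Return {consumer: [messages it received]}."""
        delivered = defaultdict(list)
        for consumers in self.subscriptions.values():
            # Every subscription sees every message (topic semantics) ...
            targets = cycle(consumers)
            for msg in messages:
                # ... but within one subscription, messages are spread
                # round-robin across its consumers (queue semantics).
                delivered[next(targets)].append(msg)
        return dict(delivered)


topic = ToyTopic()
topic.subscribe("audit", "auditor")      # single consumer: receives everything
topic.subscribe("workers", "worker-1")   # two consumers share the load
topic.subscribe("workers", "worker-2")

result = topic.publish_all(["m0", "m1", "m2", "m3"])
for consumer, msgs in sorted(result.items()):
    print(consumer, msgs)
```

In this sketch the same published stream serves both patterns: the "audit" subscription behaves like a topic subscriber, while the "workers" subscription behaves like a work queue.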


4. Architecture view - Separate layers for brokers and bookies


6. Bookies - Apache BookKeeper • Durable and Consistent • I/O Isolation • High Throughput • Low Latency

7. Bookies - Apache BookKeeper

8. Architecture view • Unbounded topic partition storage • Instant scaling without data rebalance • Independent scalability
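The "instant scaling without data rebalance" claim can be sketched with a toy model of segment placement. Everything here is an assumption made for illustration: `ToyPartition`, the bookie names, and the least-loaded placement rule are hypothetical and do not reproduce BookKeeper's actual placement policy. The idea shown is only that a partition is a sequence of segments, each stored on its own set of bookies, so a new bookie can take new segments while existing segments stay where they are.

```python
from collections import Counter


class ToyPartition:
    """Toy model: a topic partition stored as a sequence of segments."""

    def __init__(self, bookies, replicas=2):
        self.bookies = list(bookies)
        self.replicas = replicas
        self.segments = []  # one tuple of bookie names per segment

    def add_bookie(self, name):
        # A new storage node only becomes eligible for FUTURE segments.
        self.bookies.append(name)

    def roll_segment(self):
        """Close the current segment and place a new one.

        Assumed placement rule: pick the least-loaded bookies
        (ties broken by name) so the outcome is deterministic.
        """
        load = Counter(b for seg in self.segments for b in seg)
        ranked = sorted(self.bookies, key=lambda b: (load[b], b))
        ensemble = tuple(ranked[: self.replicas])
        self.segments.append(ensemble)
        return ensemble


partition = ToyPartition(["b1", "b2", "b3"])
for _ in range(3):
    partition.roll_segment()
before = list(partition.segments)

partition.add_bookie("b4")            # expand the storage layer
new_segment = partition.roll_segment()

print("old segments:", before)
print("new segment: ", new_segment)
```

In this model, expanding the cluster moves no existing data: the first three segments keep their original placement, and the new bookie starts receiving writes as soon as the next segment is created.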

9. A Comparison


11. Seamless - broker failure

12. Seamless - bookie failure
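A toy sketch of why losing a bookie can be handled segment by segment. The `recover` helper, the segment IDs, and the bookie names are hypothetical, and the rule for picking a replacement (first spare bookie by name) is an assumption for illustration, not BookKeeper's recovery logic. It assumes that only segments with a replica on the failed bookie need to be re-replicated, and that at least one spare bookie is available.

```python
def recover(segments, failed, live_bookies):
    """Return segment placement after the `failed` bookie is lost.

    segments: dict mapping segment id -> set of bookies holding a replica.
    Assumes at least one live bookie does not already hold each
    affected segment.
    """
    repaired = {}
    for seg_id, replicas in segments.items():
        if failed not in replicas:
            # Segment never touched the failed bookie: nothing to do.
            repaired[seg_id] = set(replicas)
            continue
        survivors = replicas - {failed}
        # Copy the segment from a survivor to a bookie that lacks it.
        candidates = sorted(
            b for b in live_bookies if b not in survivors and b != failed
        )
        repaired[seg_id] = survivors | {candidates[0]}
    return repaired


segments = {0: {"b1", "b2"}, 1: {"b2", "b3"}, 2: {"b3", "b1"}}
after = recover(segments, failed="b2", live_bookies=["b1", "b3", "b4"])

for seg_id in sorted(after):
    print(seg_id, sorted(after[seg_id]))
```

Only segments 0 and 1 are copied in this example; segment 2 is untouched, and every segment is back to two replicas.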

13. Seamless - cluster expansion

14. Conclusion • Unbounded topic partition storage • Instant scaling without data rebalance • Seamless - broker failure recovery • Seamless - bookie failure recovery • Seamless - cluster expansion • Independent scalability

15. Benchmark https://github.com/openmessaging/openmessaging-benchmark



18. Pulsar Functions

19. Pulsar Functions
- Lightweight stream processing
- New in Pulsar 2.0
- Currently supports Java and Python

Python:

    def process(input):
        # Replace every occurrence of "jia" in the incoming message
        return input.replace("jia", "anonymous")

Java:

    import java.util.function.Function;

    public class Anon implements Function<String, String> {
        @Override
        public String apply(String input) {
            // Replace every occurrence of "jia" in the incoming message
            return input.replace("jia", "anonymous");
        }
    }
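Because the Python function on this slide is an ordinary function, its logic can be checked locally before it is deployed. The sketch below repeats `process` from the slide and runs it on a few sample strings; the sample inputs are made up for illustration, and no Pulsar cluster or client library is involved.

```python
def process(input):
    # Same logic as the slide: anonymize every occurrence of "jia".
    return input.replace("jia", "anonymous")


# Hypothetical sample messages, standing in for data on the input topic.
samples = ["hello jia", "jia and jia", "nobody here"]
outputs = [process(s) for s in samples]

for original, anonymized in zip(samples, outputs):
    print(f"{original!r} -> {anonymized!r}")
```

Messages that do not contain the string pass through unchanged, which is worth confirming before wiring the function between an input and an output topic.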

20. Pulsar Functions

Input Topic -> Function -> Output Topic

Python:

    # pulsar-admin functions create \
        --py anon.py --className anon \
        --fqfn lc3-tenant/demo/anony \
        --inputs persistent://lc3-tenant/demo/input \
        --output persistent://lc3-tenant/demo/output

Java:

    # pulsar-admin functions create \
        --jar anon.jar --className Anon \
        --fqfn lc3-tenant/demo/anony \
        --inputs persistent://lc3-tenant/demo/input \
        --output persistent://lc3-tenant/demo/output

21. Curious to Get More
• Apache Pulsar: http://pulsar.incubator.apache.org
• Apache BookKeeper: http://bookkeeper.apache.org
• Technical Blog: https://streaml.io/blog/
• Twitter: @apache_pulsar @asfbookkeeper
• Slack: https://apache-pulsar.herokuapp.com/ and https://apachebookkeeper.herokuapp.com/