• Introduction to Pulsar
• What makes Pulsar fundamentally different
• Current state of Pulsar


1. Cloud-Native Messaging & Streaming 翟佳 — StreamNative

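2. About me
• Open-source enthusiast: PMC member of Pulsar, BookKeeper, and DistributedLog
• Sugon (中科曙光) -> EMC -> Streamlio -> StreamNative
• Huazhong University of Science and Technology -> Institute of Computing Technology, Chinese Academy of Sciences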

3. • Introduction to Pulsar • What makes Pulsar fundamentally different • Current state of Pulsar

4. What is Apache Pulsar?

5. What is Apache Pulsar? “Flexible Pub/Sub messaging backed by durable log/stream storage”

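6. Problems with other messaging systems
• Existing systems have several problems
• The partition model tightly couples storage and compute; it is not a cloud-native design
• The storage model is too simple and depends heavily on the file system
• Turning on persistence so that no data is lost, or increasing the number of topics, degrades performance severely
• No I/O isolation: a consumer draining its backlog affects other producers and consumers
• Operations are painful: replacing machines or scaling out the service requires a lengthy data rebalancing process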

7. Why Apache Pulsar?
• Durability: Data replicated and synced to disk
• Ordering: Guaranteed ordering
• Delivery Guarantees: At least once, at most once and effectively once
• Geo-replication: Out of box support for geographically distributed applications
• Multi-tenancy: A single cluster can support many tenants and use cases
• Low Latency: Low publish latency of 5 ms at the 99th percentile
• Unified messaging model: Support both Streaming and Queuing in a single model
• High throughput: Can reach 1.8 M messages/s in a single partition
• Highly scalable: Can support millions of topics

8. Project status: best open source software https://www.infoworld.com/article/3306454/big-data/the-best-open-source-software-for-data-storage-and-analytics.html#slide3

9. Project status: GitHub

10. • Introduction to Pulsar • What makes Pulsar fundamentally different • Current state of Pulsar

11. Messaging [Diagram labels: Messaging, Processing, Stream Storage]

12. The trend toward separating storage and compute

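13. Pulsar system architecture
• Peer-to-peer nodes with no special roles
• Layered, segment-based architecture
• Highly available service
• Easy to scale and operate
[Diagram: Producer and Consumer connect to the Apache Pulsar layer (Pulsar Broker 1, Pulsar Broker 1, Pulsar Broker 1), which sits on the Apache BookKeeper layer (Bookie 1, Bookie 2, Bookie 3, Bookie 4, Bookie 5)]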

14. A simple model

15. A unified messaging model: streaming (Kafka, Kinesis, …) and queuing (SQS, ActiveMQ, RabbitMQ, …)
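
To make the two consumption styles on this slide concrete, here is a minimal toy sketch in plain Java. It is not the Pulsar client API: the class and method names are hypothetical, and it only illustrates the idea that one topic log can be read either as an ordered stream by a single consumer or as a queue shared by competing consumers. Pulsar exposes comparable behavior through its exclusive and shared subscription types.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names, not the Pulsar client API):
// one topic log, two ways of consuming it.
class UnifiedModelSketch {
    // Streaming style: a single consumer sees every message, in log order.
    static List<String> exclusive(List<String> log) {
        return new ArrayList<>(log);
    }

    // Queuing style: messages are dispatched round-robin to competing consumers,
    // so each message is handled by exactly one of them.
    static List<List<String>> shared(List<String> log, int consumers) {
        if (consumers < 1) {
            throw new IllegalArgumentException("need at least one consumer");
        }
        List<List<String>> perConsumer = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            perConsumer.add(new ArrayList<>());
        }
        for (int i = 0; i < log.size(); i++) {
            perConsumer.get(i % consumers).add(log.get(i));
        }
        return perConsumer;
    }

    public static void main(String[] args) {
        List<String> log = List.of("m0", "m1", "m2", "m3");
        System.out.println(exclusive(log));
        System.out.println(shared(log, 2));
    }
}
```

Both methods read the same log; only the dispatch rule differs, which is the sense in which the model is unified.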

16. Multi-tenancy and other enterprise-grade features

17. Storage [Diagram labels: Messaging, Processing, Stream Storage]

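18. Storage - Apache BookKeeper: a distributed log/stream store
• Low-latency multi-replica replication: Quorum Parallel Replication
• Durability: every operation is acknowledged only after it has been flushed to disk
• Strong consistency: Repeatable Read Consistency
• High availability for reads and writes
• Read/write isolation on storage nodes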
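
The combination of parallel quorum replication and "acknowledge only after flush" can be sketched as follows. This is a simplified illustration under stated assumptions, not BookKeeper's implementation: the `Bookie` interface, the `write` method, and the failure rule are all hypothetical. The entry is sent to every node in the write set at once, and the write is reported successful as soon as `ackQuorum` nodes confirm that they have persisted it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

class QuorumWriteSketch {
    // Hypothetical stand-in for a storage node. The future completes with true
    // once the entry has been flushed to disk on that node.
    interface Bookie {
        CompletableFuture<Boolean> persist(long entryId, byte[] data);
    }

    // Sends the entry to all nodes in parallel. Completes with true once ackQuorum
    // nodes have confirmed, or false once too many have failed for that to happen.
    static CompletableFuture<Boolean> write(List<Bookie> writeSet, int ackQuorum,
                                            long entryId, byte[] data) {
        if (ackQuorum < 1 || ackQuorum > writeSet.size()) {
            throw new IllegalArgumentException("ackQuorum must be in 1..writeSet.size()");
        }
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        AtomicInteger acks = new AtomicInteger();
        AtomicInteger failures = new AtomicInteger();
        int tolerableFailures = writeSet.size() - ackQuorum;
        for (Bookie bookie : writeSet) {
            bookie.persist(entryId, data).whenComplete((ok, err) -> {
                if (err == null && Boolean.TRUE.equals(ok)) {
                    if (acks.incrementAndGet() >= ackQuorum) {
                        result.complete(true);
                    }
                } else if (failures.incrementAndGet() > tolerableFailures) {
                    result.complete(false);
                }
            });
        }
        return result;
    }

    public static void main(String[] args) {
        Bookie healthy = (id, d) -> CompletableFuture.completedFuture(true);
        Bookie broken = (id, d) -> CompletableFuture.completedFuture(false);
        // Two of three nodes confirm, which satisfies an ack quorum of 2.
        System.out.println(write(List.of(healthy, healthy, broken), 2, 1L, new byte[0]).join());
    }
}
```

Because the producer waits only for the fastest `ackQuorum` responses rather than for every replica, one slow node does not set the write latency.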

19. Storage - Apache BookKeeper. Log: Source of Truth of Everything …
• HDFS NameNode
• Databases: Twitter Manhattan, Salesforce NewSQL Database
• Messaging: Twitter EventBus, Pulsar

20. Storage - The Apache BookKeeper model

21. Storage - Segment-based architecture

22. Storage - Segment-based architecture https://jack-vanlightly.com/sketches/2018/10/2/kafka-vs-pulsar-rebalancing-sketch
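
One way to see why a segment-based layout avoids the lengthy rebalancing described earlier is the toy placement model below. It is an illustrative sketch only: the class, its methods, and the "least-loaded node first" rule are assumptions made for this example and are not BookKeeper's actual placement policy. Each new segment of a topic is placed on whichever storage nodes currently hold the fewest segments, so a newly added node starts receiving segments immediately while existing segments stay where they are.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy placement model (hypothetical; not BookKeeper's real placement policy).
class SegmentPlacementSketch {
    private final List<String> bookies = new ArrayList<>();
    private final Map<String, Integer> load = new HashMap<>();
    private final List<List<String>> placement = new ArrayList<>();
    private final int replicas;

    SegmentPlacementSketch(List<String> initialBookies, int replicas) {
        if (replicas < 1 || replicas > initialBookies.size()) {
            throw new IllegalArgumentException("replicas must be in 1..number of bookies");
        }
        this.bookies.addAll(initialBookies);
        this.replicas = replicas;
    }

    // Scaling out only registers the node; no existing segment is moved.
    void addBookie(String name) {
        bookies.add(name);
    }

    // Starts a new segment on the `replicas` least-loaded nodes
    // (the sort is stable, so ties keep registration order).
    List<String> rollSegment() {
        List<String> candidates = new ArrayList<>(bookies);
        candidates.sort(Comparator.comparingInt((String b) -> load.getOrDefault(b, 0)));
        List<String> chosen = new ArrayList<>(candidates.subList(0, replicas));
        for (String b : chosen) {
            load.merge(b, 1, Integer::sum);
        }
        placement.add(chosen);
        return chosen;
    }

    // Segment index -> nodes holding that segment.
    List<List<String>> placement() {
        return placement;
    }

    public static void main(String[] args) {
        SegmentPlacementSketch topic = new SegmentPlacementSketch(List.of("b1", "b2", "b3"), 2);
        topic.rollSegment();
        topic.rollSegment();
        topic.addBookie("b4");
        topic.rollSegment();
        System.out.println(topic.placement());
    }
}
```

In a partition-bound layout the whole partition would have to be copied to the new node before it could take load; here only future segments are affected.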

23. Compute [Diagram labels: Messaging, Processing, Stream Storage]

24. Compute 1.0: DAG [Diagram: a DAG in which Source 1 and Source 2 feed Action nodes, which feed Sink 1 and Sink 2]

25. Compute API 2.0: DSL (Functional)
    Builder.newBuilder()
        .newSource(() -> StreamletUtils.randomFromList(SENTENCES))
        .flatMap(sentence -> Arrays.asList(sentence.toLowerCase().split("\\s+")))
        .reduceByKeyAndWindow(word -> word, word -> 1,
            WindowConfig.TumblingCountWindow(50),
            (x, y) -> x + y);
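
For readers unfamiliar with the DSL above, the following plain-Java sketch shows roughly what the pipeline computes: sentences are split into lowercase words, and word counts are summed per tumbling window of a fixed number of words. It uses only the standard library and hypothetical names; it is not the DSL's implementation, and emitting a partial final window is an assumption made so that a finite input produces complete output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone approximation of flatMap + reduceByKeyAndWindow with a tumbling count window.
class TumblingWordCountSketch {
    static List<Map<String, Integer>> count(List<String> sentences, int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<Map<String, Integer>> windows = new ArrayList<>();
        Map<String, Integer> current = new LinkedHashMap<>();
        int seen = 0;
        for (String sentence : sentences) {
            // flatMap step: one sentence becomes many lowercase words.
            for (String word : sentence.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                // reduce step: key = word, value = 1, combined with (x, y) -> x + y.
                current.merge(word, 1, Integer::sum);
                if (++seen == windowSize) {
                    // The window is full: emit it and start a fresh one.
                    windows.add(current);
                    current = new LinkedHashMap<>();
                    seen = 0;
                }
            }
        }
        if (!current.isEmpty()) {
            windows.add(current); // assumption: flush the partial last window
        }
        return windows;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("a b a", "b c"), 3));
    }
}
```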

26. Compute API 3.0: Serverless. Abstract View: Incoming Messages -> f(x) -> Output Messages
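
The f(x) view maps directly onto an ordinary Java function: one incoming message goes in, one output message comes out. The minimal sketch below uses only `java.util.function.Function` from the standard library; the class name and the transformation are made up for illustration. Pulsar Functions can run a class written against this language-native interface, although the sketch itself does not depend on Pulsar.

```java
import java.util.function.Function;

// Hypothetical example function: appends "!" to every incoming message.
class ExclamationSketch implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input + "!";
    }

    public static void main(String[] args) {
        // Stand-in for the framework calling f(x) once per incoming message.
        System.out.println(new ExclamationSketch().apply("hello"));
    }
}
```

Keeping the logic in a pure function leaves topic wiring, scaling, and deployment to the framework, which is the point of the serverless model.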

27. Pulsar Function features
• Simple API: Method/Procedure/Function; Multi Language API; Scale developers
• Stream-native: Input/Output/Log as topics
• Simple deployment: Simple standalone applications vs system managed applications

28. Pulsar Functions
    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.PulsarFunction;

    // Consumes String messages and produces no output message (Void).
    public class CounterFunction implements PulsarFunction<String, Void> {
        @Override
        public Void process(String input, Context context) throws Exception {
            // Split the input on literal dots and bump a counter for each piece.
            for (String word : input.split("\\.")) {
                context.incrCounter(word, 1);
            }
            return null;
        }
    }

29. Pulsar Functions: Running as a standalone application
    bin/pulsar-admin functions localrun \
      --input persistent://sample/standalone/ns1/test_input \
      --output persistent://sample/standalone/ns1/test_result \
      --className CounterFunction \
      --jar myjar.jar
• Runs as a standalone process
• Run as many instances as you want; the framework automatically balances data
• Run and manage via Mesos/K8/Nomad/your favorite tool