Apache Pulsar: Cloud-Native Messaging & Streaming
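1 . Cloud-Native Messaging & Streaming, Jia Zhai (翟佳), StreamNative
2 . About me
• Open-source enthusiast: PMC member of Pulsar, BookKeeper, and DistributedLog
• Sugon (中科曙光) -> EMC -> Streamlio -> StreamNative
• Huazhong University of Science and Technology -> Institute of Computing Technology, Chinese Academy of Sciences
3 . • Introduction to Pulsar • What makes Pulsar fundamentally different • Current state of Pulsar
4 . What is Apache Pulsar?
5 . What is Apache Pulsar? “Flexible Pub/Sub messaging backed by durable log/stream storage”
6 . Problems with other messaging systems
• Existing systems have the following problems:
• The partition model tightly couples storage and compute; it is not a cloud-native design
• The storage model is too simple and depends heavily on the file system
• Enabling persistence to guarantee no data loss, or increasing the number of topics, degrades performance severely
• No I/O isolation: a consumer draining its backlog affects other producers and consumers
• Operations are painful: replacing machines or scaling out the service requires a lengthy data rebalancing process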
7 . Why Apache Pulsar?
• Durability: Data replicated and synced to disk
• Ordering: Guaranteed ordering
• Delivery Guarantees: At least once, at most once and effectively once
• Geo-replication: Out of box support for geographically distributed applications
• Multi-tenancy: A single cluster can support many tenants and use cases
• Low Latency: Low publish latency of 5ms at 99pct
• Unified messaging model: Support both Streaming and Queuing in a single model
• High throughput: Can reach 1.8 M messages/s in a single partition
• Highly scalable: Can support millions of topics
8 . Project status: best open source software https://www.infoworld.com/article/3306454/big-data/the-best-open-source-software-for-data-storage-and-analytics.html#slide3
9 . Project status: GitHub
10 . • Introduction to Pulsar • What makes Pulsar fundamentally different • Current state of Pulsar
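11 . Messaging (diagram: Messaging, Processing, Stream Storage)
12 . The trend toward separating storage and compute
13 . Pulsar system architecture (diagram: Producer and Consumer connect to a layer of Pulsar brokers, which sits on top of Bookie 1 to Bookie 5 in Apache BookKeeper; together they form Apache Pulsar)
• Peer nodes
• Layered, segment-based architecture
• Highly available service
• Easy to scale and operate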
14 . Simple model
15 . Unified messaging model: streaming (Kafka, Kinesis, …) and queuing (SQS, ActiveMQ, RabbitMQ, …)
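The unified model on this slide can be pictured as one append-only topic log that is read through different subscription modes. The sketch below is a toy in-memory model in plain Java, not the Pulsar client API; the names `UnifiedModelSketch`, `exclusive`, and `shared` are made up for illustration. Streaming-style consumption hands every message, in order, to a single consumer, while queue-style consumption spreads the messages across several consumers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: all names are hypothetical and this is not the Pulsar client API.
class UnifiedModelSketch {

    // Streaming style: one consumer receives every message in log order.
    static List<String> exclusive(List<String> log) {
        return new ArrayList<>(log);
    }

    // Queue style: messages are distributed round-robin across consumers,
    // so each message is handled by exactly one of them.
    static List<List<String>> shared(List<String> log, int consumers) {
        List<List<String>> perConsumer = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            perConsumer.add(new ArrayList<>());
        }
        for (int i = 0; i < log.size(); i++) {
            perConsumer.get(i % consumers).add(log.get(i));
        }
        return perConsumer;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("m0", "m1", "m2", "m3", "m4");
        System.out.println("streaming: " + exclusive(log));
        System.out.println("queuing:   " + shared(log, 2));
    }
}
```

Both methods read the same log, which is the point of the slide: the storage is shared and only the delivery mode differs.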
16 . Multi-tenancy and other enterprise-grade features
17 . Storage (diagram: Messaging, Processing, Stream Storage)
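18 . Storage - Apache BookKeeper: distributed log/stream storage
• Low-latency multi-replica replication: Quorum Parallel Replication
• Durability: every operation is acknowledged only after it has been flushed to disk
• Strong consistency: Repeatable Read Consistency
• Highly available reads and writes
• Read/write isolation on storage nodes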
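One way to picture quorum-based parallel replication is to stripe entries round-robin over a set of bookies and acknowledge a write once enough of them have confirmed it. The sketch below is a simplified illustration under that assumption, not BookKeeper's actual API: the class `QuorumSketch`, its methods, and the example values (5 bookies, 3 copies per entry, 2 acknowledgements) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch; class and method names are hypothetical, not BookKeeper's API.
class QuorumSketch {

    // Indices (within the ensemble) of the bookies that receive a given entry:
    // writeQuorum consecutive bookies, starting at entryId mod ensembleSize.
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorum) {
        List<Integer> bookies = new ArrayList<>();
        for (int i = 0; i < writeQuorum; i++) {
            bookies.add((int) ((entryId + i) % ensembleSize));
        }
        return bookies;
    }

    // The write is acknowledged to the client once ackQuorum bookies
    // have confirmed that the entry is durably stored.
    static boolean isAcked(int confirmations, int ackQuorum) {
        return confirmations >= ackQuorum;
    }

    public static void main(String[] args) {
        // Assumed example values: 5 bookies, 3 copies per entry.
        for (long entry = 0; entry < 5; entry++) {
            System.out.println("entry " + entry + " -> bookies " + writeSet(entry, 5, 3));
        }
    }
}
```

Because the copies of an entry are sent to its bookies in parallel and only a subset has to confirm, a single slow bookie does not have to delay the acknowledgement.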
19 . Storage - Apache BookKeeper. Log: Source of Truth of Everything …
• HDFS NameNode
• Databases: Twitter Manhattan, Salesforce NewSQL Database
• Messaging: Twitter EventBus, Pulsar
20 . Storage - Apache BookKeeper model
21 . Storage - segment-based architecture
22 . Storage - segment-based architecture https://jack-vanlightly.com/sketches/2018/10/2/kafka-vs-pulsar-rebalancing-sketch
23 . Compute (diagram: Messaging, Processing, Stream Storage)
24 . Compute 1.0: DAG (diagram: a DAG of Source, Action, and Sink nodes)
25 . Compute API 2.0: functional DSL

Builder.newBuilder()
    .newSource(() -> StreamletUtils.randomFromList(SENTENCES))
    .flatMap(sentence -> Arrays.asList(sentence.toLowerCase().split("\\s+")))
    .reduceByKeyAndWindow(word -> word, word -> 1,
        WindowConfig.TumblingCountWindow(50),
        (x, y) -> x + y);
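To make the pipeline on this slide concrete, here is a plain-Java sketch of what it appears to compute: a word count over tumbling windows of a fixed number of words. It does not use the streamlet DSL. The class `WindowedWordCount`, the method `count`, and the choice to drop an incomplete trailing window are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the pipeline's logic; not the streamlet DSL itself.
class WindowedWordCount {

    static List<Map<String, Integer>> count(List<String> sentences, int windowSize) {
        List<Map<String, Integer>> windows = new ArrayList<>();
        Map<String, Integer> current = new LinkedHashMap<>();
        int seen = 0;
        for (String sentence : sentences) {
            // flatMap step: lower-case the sentence and split it on whitespace.
            for (String word : sentence.toLowerCase().split("\\s+")) {
                // reduce step: key = word, value = 1, reducer = (x, y) -> x + y.
                current.merge(word, 1, Integer::sum);
                seen++;
                // A tumbling count window closes after windowSize words.
                if (seen == windowSize) {
                    windows.add(current);
                    current = new LinkedHashMap<>();
                    seen = 0;
                }
            }
        }
        // Words in an incomplete trailing window are not emitted in this sketch.
        return windows;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("the quick brown fox", "the lazy dog");
        System.out.println(count(sentences, 3));
    }
}
```

The slide uses a window of 50; the `main` method uses 3 only so that the windows are easy to inspect by hand.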
26 . Compute API 3.0: serverless. Abstract view: Incoming Messages -> f(x) -> Output Messages
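The abstract view can be sketched as a function applied to each incoming message, with the results becoming the output messages. The code below is a toy illustration in plain Java, not the Pulsar Functions runtime; `FunctionViewSketch` and `run` are hypothetical names, and treating a `null` result as "no output" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy illustration of the abstract view; not the Pulsar Functions runtime.
class FunctionViewSketch {

    // Applies f to every incoming message and collects the results as the
    // output messages. A null result means "no output for this message".
    static <I, O> List<O> run(List<I> incoming, Function<I, O> f) {
        List<O> output = new ArrayList<>();
        for (I message : incoming) {
            O result = f.apply(message);
            if (result != null) {
                output.add(result);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Function<String, String> exclaim = s -> s + "!";
        System.out.println(run(Arrays.asList("hello", "pulsar"), exclaim));
    }
}
```

The user writes only `f`; reading the input, invoking the function, and publishing the output are left to the framework.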
27 . Pulsar Functions features
• Simple API: Method/Procedure/Function; Multi Language API; Scale developers
• Stream-native: Input/Output/Log as topics
• Simple deployment: Simple standalone applications vs system managed applications
28 . Pulsar Functions

import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.PulsarFunction;

public class CounterFunction implements PulsarFunction<String, Void> {
    @Override
    public Void process(String input, Context context) throws Exception {
        // Split the input on literal periods and increment a counter per piece.
        for (String word : input.split("\\.")) {
            context.incrCounter(word, 1);
        }
        return null;
    }
}
29 . Pulsar Functions: running as a standalone application

bin/pulsar-admin functions localrun \
  --input persistent://sample/standalone/ns1/test_input \
  --output persistent://sample/standalone/ns1/test_result \
  --className CounterFunction \
  --jar myjar.jar

• Runs as a standalone process
• Run as many instances as you want; the framework automatically balances data
• Run and manage via Mesos/Kubernetes/Nomad/your favorite tool