申请试用
HOT
登录
注册
 
Pulsar介绍
献良
/
发布于
/
6774
人观看
Apache Pulsar下一代实时消息引擎,基于Apache BookKeeper,以segment为粒度做数据存储,既保证了数据存储的性能,也确保消息数据持久性,架构的弹性伸缩,无论Broker还是Bookie意外退出,都不会让数据访问和一致性出现问题。作为一个消息引擎,它统一了发布-订阅的主题模式和队列两种语意模型,99%的消息延迟都能在5毫秒以内,单个分区可以支持高达180万消息/秒;此外,它还原生支持轻量级的消息处理函数,让一些简单的消息处理逻辑不再依赖外部处理工具。
展开查看详情

1 . Apache Pulsar 实时数据处理理中 消息,计算和存储的统⼀一 演讲者/streamlio 翟佳

2 . 4 What’s the state of the art

3 . 5 What’s the state of the art

4 . 6 Apache Pulsar — Unify Messaging Computing Pulsar Broker Pulsar Functions Segment Store Stream Table BookKeeper

5 . 7 Why Apache Pulsar? Durability Ordering Delivery Guarantees Data replicated and Guaranteed ordering At least once, at most synced to disk once and effectively once Geo-replication Multi-tenancy Low Latency Out of box support for A single cluster can Low publish latency of geographically support many tenants 5ms at 99pct distributed and use cases applications Unified messaging High throughput Highly scalable model Can reach 1.8 M Can support millions of Support both messages/s in a topics Streaming and single partition Queuing in a single model

6 . 8 Pulsar Architecture Producer Consumer Separate layers between brokers bookies • Broker and bookies can Pulsar Broker 1 Pulsar Broker 1 Pulsar Broker 1 be added independently • Traffic can be shifted very quickly across Bookie 1 Bookie 2 Bookie 4 Bookie 5 Bookie 3 brokers Apache BookKeeper • New bookies will ramp up on traffic quickly Apache Pulsar

7 . 9 Messaging Messaging Computing Pulsar Broker Pulsar Functions Segment Store Stream Table BookKeeper

8 . 10 Messaging - Concepts

9 . 11 Messaging - Namespace

10 . 12 Messaging - Queuing & Streaming (kafka, kinesis, …) (SQS, ActiveMQ, RabbitMQ, …)

11 . 13 Messaging - ACK Cumulative Individual

12 . 14 Messaging - Retention

13 . 15 Storage Messaging Computing Pulsar Broker Pulsar Functions Segment Store Stream Table BookKeeper

14 . 16 Storage - Apache BookKeeper • A replicated log storage • Low-latency durable writes • Simple repeatable read consistency • Highly available • Store many logs per node • I/O Isolation

15 . 17 Storage - Apache BookKeeper

16 . 18 Storage - Segment Centric

17 . 19 Storage - Segment/Stream/Table Messaging Computing Pulsar Broker Pulsar Functions Segment Store Stream Table BookKeeper

18 . 20 Compute Messaging Computing Pulsar Broker Pulsar Functions Segment Store Stream Table BookKeeper

19 . 21 Compute Representation Abstract View f(x) Incoming Messages Output Messages

20 . 22 Lessons learnt A significant percentage of transformations are simple ETL/Reactive Services/Classification/Real-time Aggregation Event Routing/Microservices The emergence of Serverless Simple Function API Run per event Composition APIs to do complex things Wildly popular

21 . 23 Whats needed: Stream-Native Compute Insight gained from serverless Simplest possible API Method/Procedure/Function Multi Language API Scale developers Stream native concepts Input/Output/Log as topics Flexible runtime Simple standalone applications vs system managed applications

22 . 24 Pulsar Functions — API SDK less API import java.util.function.Function; public class ExclamationFunction implements Function<String, String> { @Override public String apply(String input) { return input + "!"; } } SDK API import org.apache.pulsar.functions.api.PulsarFunction; import org.apache.pulsar.functions.api.Context; public class ExclamationFunction implements PulsarFunction<String, String> { @Override public String process(String input, Context context) { return input + "!"; } }

23 . 25 Pulsar Functions Running as a standalone application bin/pulsar-admin functions localrun \ --input persistent://sample/standalone/ns1/test_input \ --output persistent://sample/standalone/ns1/test_result \ --className org.mycompany.ExclamationFunction \ --jar myjar.jar Runs as a standalone process Run as many instances as you want. Framework automatically balances data Run and manage via Mesos/K8/Nomad/your favorite tool

24 . 26 Pulsar Functions: Use Cases Sensor devices generate tons of data Lot of local actions Edge Computing Simple filtering, threshold detection, regex matching, etc Resource Constrained Limited scope for Full blown schedulers/Job Managers Models computed via offline analysis Model Incoming requests should be classified using the model Serving Function is a natural representation for the classification action Model itself can be stored in Bookkeeper

25 . 27 Pulsar Functions Unify Messaging and Compute cluster into one Function executed for every message of input topic Supports multiple topics as inputs Runtime User Controlled Guarantees: ATMOST_ONCE / ATLEAST_ONCE / EFFECTIVE_ONCE Built-in State Management: Unified Stream & State Store with BookKeeper. Simplified application development

26 . 28 Unified Streaming Solution Messaging Computing Spark DATA Pulsar Broker Pulsar Functions Flink DATA DATA HDFS DATA 。。。 Segment Store Stream Table DATA BookKeeper

27 . 29 Messaging Benchmark https://github.com/openmessaging/openmessaging-benchmark

28 . 30 Benchmark • Testing goals • Throughput & latency under different conditions • Min 2 guaranteed copies • Running on 3 EC2 VMs with local SSDs

29 . 31 Kafka settings • Topic settings replicationFactor=3 min.insync.replicas=2 log.flush.interval.ms= # Using default: means no fsyncs • Kafka producer config acks=all linger.ms=1 batch.size=131072

113 点赞
8 收藏
41下载
确认
3秒后跳转登录页面
去登陆