EP-002:Message Lifecycle

TGIP-CN Episode 2


1.

2.Data Lifecycle TGIP-CN Episode 002

3.Data Lifecycle ❏ Data Flow ❏ Data Retention

4.Data Flow

5.Brokers + Bookies Brokers are “stateless”. They serve clients for producing and consuming events. The processes for storing data are called bookies. They persist data for Pulsar. (Diagram: Broker 0, Broker 1, Broker 2; Bookie 0, Bookie 1, Bookie 2)

6.ZooKeeper Brokers are “stateless”. They serve clients for producing and consuming events. ZooKeeper is used for storing the metadata for Pulsar and BookKeeper, as well as for discovering brokers and bookies. The processes for storing data are called bookies. They persist data for Pulsar. (Diagram: Broker 0, Broker 1, Broker 2; ZooKeeper, ZooKeeper, ZooKeeper; Bookie 0, Bookie 1, Bookie 2)

7.Pulsar Lego (Diagram: Producer 0, Producer 1; Topic with Partition 0, Partition 1, Partition 2; Broker X, Broker Y, Broker Z; Consumer (P012); Subscription A)

8.Write Path 1. A message is created and a topic partition is selected. 2. The message is sent to the owner broker that serves the selected partition. 3. The message is written to N bookies in parallel by the owner broker. The message is written once and stored in its entirety. 4. Once the message has been written by 2 bookies, the broker will acknowledge the message. (Diagram: Producer 0, Producer 1; Topic with Partition 0, Partition 1, Partition 2; Broker 0, Broker 1, Broker 2; Bookie 0, Bookie 1, Bookie 2)
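The four write-path steps can be sketched as a toy model. This is not the Pulsar or BookKeeper API: `Bookie`, `OwnerBroker`, `ack_quorum` and `select_partition` are hypothetical names introduced for illustration, the CRC32 partition choice is an assumption, and the parallel writes are simulated sequentially.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Bookie:
    """Toy storage node (hypothetical, not the BookKeeper API)."""
    name: str
    up: bool = True
    entries: dict = field(default_factory=dict)

    def add_entry(self, entry_id: int, payload: bytes) -> bool:
        if not self.up:
            return False  # a dead bookie never confirms
        self.entries[entry_id] = payload  # each bookie stores the whole message
        return True


@dataclass
class OwnerBroker:
    """Toy broker that owns one partition and keeps no durable data itself."""
    bookies: list
    ack_quorum: int = 2  # the slide's "written by 2 bookies"
    next_entry_id: int = 0

    def publish(self, payload: bytes) -> bool:
        entry_id = self.next_entry_id
        self.next_entry_id += 1
        # Step 3: hand the same entry to all N bookies (sequential here, parallel in the slide).
        confirmed = sum(b.add_entry(entry_id, payload) for b in self.bookies)
        # Step 4: acknowledge to the producer once enough bookies confirmed.
        return confirmed >= self.ack_quorum


def select_partition(key: str, num_partitions: int) -> int:
    # Step 1: pick a partition; CRC32 of the key is an assumed scheme for this sketch.
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

With three bookies and an acknowledgment threshold of two, `publish` in this model still succeeds when one bookie is down and fails when two are down.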

9.Read Path (Cached) 1. The consumer subscribes to a topic. It connects to the owner brokers serving the partitions. 2. Broker sends messages for the partition coming out of its memory cache. 3. Consumer acknowledges a message after processing it. Broker updates cursor once it receives acknowledgment. (Diagram: Topic with Partition 0, Partition 1, Partition 2; Broker 0, Broker 1, Broker 2; Bookie 0, Bookie 1, Bookie 2; Consumer (P012))

10.Read Path (BK) 1. The consumer subscribes to a topic. It connects to the owner brokers serving the partitions. 2. Broker does not have the data in memory and will read from one of the bookies that have the data. 3. Consumer acknowledges a message after processing it. Broker updates cursor once it receives acknowledgment. (Diagram: Topic with Partition 0, Partition 1, Partition 2; Broker 0, Broker 1, Broker 2; Bookie 0, Bookie 1, Bookie 2; Consumer (P012))
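The two read paths differ only in where the broker finds the message. A minimal sketch, assuming a hypothetical `Broker` class in which each bookie is a plain dict and the cursor is a single integer for one subscription (none of these names come from Pulsar itself):

```python
class Broker:
    """Toy broker for one partition and one subscription (hypothetical model)."""

    def __init__(self, bookies):
        self.bookies = bookies   # list of dicts: entry_id -> payload
        self.cache = {}          # in-memory cache of recent entries
        self.cursor = -1         # highest acknowledged entry id
        self.bookie_reads = 0    # counts how often the broker had to go to storage

    def read(self, entry_id):
        # Cached path: serve straight from broker memory.
        if entry_id in self.cache:
            return self.cache[entry_id]
        # BK path: read from any one bookie that holds the entry.
        for bookie in self.bookies:
            if entry_id in bookie:
                self.bookie_reads += 1
                return bookie[entry_id]
        raise KeyError(entry_id)

    def acknowledge(self, entry_id):
        # The broker moves the cursor once the consumer's acknowledgment arrives.
        self.cursor = max(self.cursor, entry_id)
```

The model treats acknowledgment as cumulative, which is a simplification chosen to keep the cursor a single number.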

11.Failure Handling In-flight messages will be automatically retried by Pulsar clients. Brokers are stateless. A broker process that dies does not impact data storage. When a bookie dies, all the data is still accessible and will be re-replicated from the other replicas. (Diagram: Producer 0, Producer 1; Topic with Partition 0, Partition 1, Partition 2; Broker 0, Broker 1, Broker 2; Bookie 0, Bookie 1, Bookie 2; Consumer (P012))
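The bookie-failure case can be illustrated with a small sketch. It assumes a hypothetical `re_replicate` helper that works on plain dicts; the set of entries the dead bookie held stands in for the metadata a real cluster would consult, and nothing here is BookKeeper's actual recovery code.

```python
def re_replicate(bookies: dict, dead: str, spare: str) -> int:
    """Copy every entry the dead bookie held from a surviving replica to a spare.

    bookies maps a bookie name to a dict of entry_id -> payload.
    Returns the number of entries copied.
    """
    lost = bookies.pop(dead)          # the dead bookie's copies are gone
    target = bookies.setdefault(spare, {})
    copied = 0
    for entry_id in lost:
        # Look for a surviving replica of this entry on any other bookie.
        source = next(
            (store for name, store in bookies.items()
             if name != spare and entry_id in store),
            None,
        )
        if source is None:
            continue                  # no surviving copy, nothing to restore
        target[entry_id] = source[entry_id]
        copied += 1
    return copied
```

Reads keep working throughout in this model, because every entry that had more than one copy is still present on a surviving bookie.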

12.Cloud-Native ❏ Apache Pulsar was designed to be cloud-native ❏ Apache Pulsar separates serving from storage ❏ Apache Pulsar scales horizontally to meet storage and serving needs

13.Brokers and Bookies All brokers are stateless and can scale independently. Producers do not directly interact with the BookKeeper cluster. Consumers do not directly interact with the BookKeeper cluster. All bookies in the BookKeeper cluster are stateful and can scale independently. (Diagram: Producer, Brokers, Consumer, Bookies)

14.Data Retention

15.Data Retention ❏ Retention ❏ TTL ❏ Msg Backlog ❏ Storage Size

16.Subscription & Cursor Subscription B (2, 2) Subscription C (3, 2) Subscription A (1, 1) Partition (Event Stream)

17.Subscription Initial Position Partition (Event Stream)

18.Earliest SubscriptionInitialPosition Earliest Partition (Event Stream)

19.Latest SubscriptionInitialPosition Latest Partition (Event Stream)
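The difference between the Earliest and Latest initial positions comes down to where a new subscription's cursor starts. A minimal sketch, with the partition modeled as a Python list; `InitialPosition`, `initial_cursor` and `messages_for_new_subscription` are hypothetical names for this illustration, not the Pulsar client API:

```python
from enum import Enum


class InitialPosition(Enum):
    EARLIEST = "earliest"
    LATEST = "latest"


def initial_cursor(stream_length: int, position: InitialPosition) -> int:
    """Index of the first message a brand-new subscription will receive."""
    if position is InitialPosition.EARLIEST:
        return 0              # start from the oldest message still in the stream
    return stream_length      # start after the newest message


def messages_for_new_subscription(stream: list, position: InitialPosition) -> list:
    """Messages already in the stream that the new subscription will see."""
    return stream[initial_cursor(len(stream), position):]
```

In this model a Latest subscription sees none of the existing messages and only receives what is published after it was created.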

20.Seek or Reader Subscription (x, y) Partition (Event Stream)

21.Unsubscribe Subscription (x, y) Partition (Event Stream)

22.Message Retention Subscription B (2, 2) Subscription C (3, 2) Partition (Event Stream)

23.Message Retention Subscription B (2, 2) Subscription C (3, 2) OK to delete Partition (Event Stream) NOT OK to delete

24.Message Retention Subscription B (2, 2) Subscription C (3, 2) OK to delete Message Retention Partition (Event Stream) NOT OK to delete

25.Message Retention Subscription B (2, 2) Subscription C (3, 2) OK to delete Message Retention Partition (Event Stream) Yet to be processed

26.Message Retention Msg 1 to Msg 8: Acked. Msg 9 to Msg 11: Unacked. Labels: Deleted; Retention; Yet to be processed.
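The three regions in the retention slides can be sketched as a classification over message ids. The assumptions are loud ones: cursors are cumulative (a cursor value is the last acknowledged id), there is at least one subscription, and retention is expressed as a count of acknowledged messages to keep, purely to keep the example small. `classify` and `retained_count` are hypothetical names.

```python
def classify(message_ids, cursors, retained_count):
    """Label each message as 'deleted', 'retention' or 'yet to be processed'.

    message_ids: ids in stream order.
    cursors: subscription name -> last acknowledged id (cumulative).
    retained_count: how many fully acknowledged messages to keep (assumed policy).
    """
    slowest = min(cursors.values())   # only messages acked by every subscription may go
    acked_by_all = [m for m in message_ids if m <= slowest]
    keep = set(acked_by_all[-retained_count:]) if retained_count > 0 else set()
    labels = {}
    for m in message_ids:
        if m > slowest:
            labels[m] = "yet to be processed"   # NOT OK to delete
        elif m in keep:
            labels[m] = "retention"             # acked, but kept by the retention policy
        else:
            labels[m] = "deleted"               # OK to delete
    return labels
```

For example, with messages 1 to 11, cursors at 8 and 10, and three retained messages, ids 1 to 5 are deleted, 6 to 8 are retained, and 9 to 11 are yet to be processed.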

27.Message Expiry Msg 5 to Msg 8: Acked. Msg 9 to Msg 15: Unacked. Labels: Not within TTL; Deleted; Retention; Within the applied TTL (may still be processed).

28.Message Expiry Msg 5 to Msg 12: Acked. Msg 13 to Msg 15: Unacked. Labels: Not within TTL; Deleted; Retention; (may still be processed).
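Across the two expiry slides, the unacknowledged messages that fall outside the TTL end up marked as acknowledged. A minimal sketch of that step, assuming a cumulative cursor and made-up publish timestamps; `expire` is a hypothetical helper, not a Pulsar function:

```python
def expire(messages, cursor, now, ttl_seconds):
    """Advance the cursor past unacknowledged messages older than the TTL.

    messages: list of (msg_id, publish_time) in stream order.
    cursor: last acknowledged msg_id.
    Returns the new cursor.
    """
    for msg_id, published_at in messages:
        if msg_id <= cursor:
            continue                      # already acknowledged
        if now - published_at > ttl_seconds:
            cursor = msg_id               # not within TTL: treat as acknowledged
        else:
            break                         # within TTL: may still be processed
    return cursor
```

With messages 5 to 15 published ten seconds apart, a cursor at 8 and a TTL that covers only the last three messages, the cursor in this sketch moves to 12, leaving 13 to 15 unacknowledged.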

29.Msg Backlog Msg 5 to Msg 8: Acked. Msg 9 to Msg 15: Unacked. Labels: Deleted; Retention; Yet to be processed.

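StreamNative is an open-source infrastructure software company building a next-generation streaming data platform around Apache Pulsar and Apache BookKeeper. Guided by two beliefs, that event streaming is the future foundation of big data and that open source is the future of infrastructure software, it focuses on building the open-source ecosystem and community and is committed to cutting-edge technology.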