Elastic Data Processing with Apache Pulsar and Apache Flink
展开查看详情
1. Elastic Data Processing with Apache Pulsar and Apache Flink Kaixing Zhao, Apache Flink Committer Sijie Guo, Apache Pulsar Committer 2019-03-23 Hangzhou
2. Agenda • What is Apache Flink? • Flink use cases and ecosystem • Pulsar - Segmented Stream Architecture • Pulsar - Tiered Storage • Pulsar + Flink Integration
3.Pulsar == Segmented Streams
4.Batch - HDFS
5.Stream - Pub/Sub
6.A Flink View on Computing “Batch processing is a special case of Stream processing”
7.A Pulsar View on Data
8. Topic Producers Topic Consumers Time
9.Partitions P0 Producers P1 P2 Consumers P3 Time
10. Segments P0 Segment 1 Segment 2 Segment 3 Producers P1 Segment 1 Segment 2 Segment 3 Segment 4 P2 Segment 1 Segment 2 Segment 3 Consumers P3 Segment 1 Segment 2 Segment 3 Time
11. Stream P0 Segment 1 Segment 2 Segment 3 Producers P1 Segment 1 Segment 2 Segment 3 Segment 4 P2 Segment 1 Segment 2 Segment 3 Consumers P3 Segment 1 Segment 2 Segment 3 Time
12. Stream Producers Stream Segment 1 Segment 2 Segment 3 Segment 4 Consumers Time
13. Access Patterns ✓ Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 ✓ Tailing Read 追尾 Time ✓ Catchup Read 追赶读
14. Access Patterns ✓ Write Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 ✓ Tailing Read 追尾 Time ✓ Catchup Read 追赶读
15. Access Patterns Tailing Read ✓ Write Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 ✓ Tailing Read 追尾 Time ✓ Catchup Read 追赶读
16. Access Patterns Tailing Read ✓ Write Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 ✓ Tailing Read 追尾 Time Catchup Read ✓ Catchup Read 追赶读
17. Write Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)
18. Tailing Read Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)
19. Catchup Read Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)
20. Infinite Stream ✓ Write 写 Bookies Brokers Stream Segment 1 Segment 2 Segment 3 Segment 4 ✓ Tailing Read 追尾 Time ✓ Catchup Read 追赶读
21. Infinite Stream ✓ Write 写 Tiered Storage Bookies Brokers Stream Segment 1 Segment 2 Segment 3 Segment 4 ✓ Tailing Read 追尾 Time ✓ Catchup Read 追赶读
22. Tiered Storage • Offloader • When: size-based, time-based, or triggered by pulsar-admin • How: copy a segment to tiered storage, and delete it from bookkeeper • Access: broker knows how to read the data back, or bypass read the offloaded segments directly • Available Offloaders • Cloud Offloder : AWS, GCS, Azure, … • HDFS, Ceph, …
23. Stream as a Unified View on Data Segment Readers Producers Stream Segment 1 Segment 2 Segment 3 Segment 4 Segment 5 Segment 6 Consumers Time
24. Data Processing on Pulsar Bounded Stream Bounded Stream Stream Segment 1 Segment 2 Segment 3 Segment 4 Segment 5 Segment 6 Time Unbounded Stream Unbounded Stream
25.Pulsar + Flink
26. Goals • Flink + Pulsar • Streaming Connectors • Source Connectors • PulsarCatalog: Schema Integration • PulsarStateBackend • Pulsar for the unified view of Data, Flink for the unified view of Computing
27.Done
28.Streaming Source -> Streaming Sink
29.Streaming Source -> Streaming Table Sink