Elastic Data Processing with Apache Pulsar and Apache Flink
1 . Elastic Data Processing with Apache Pulsar and Apache Flink. Kaixing Zhao, Apache Flink Committer; Sijie Guo, Apache Pulsar Committer. 2019-03-23, Hangzhou
2 . Agenda
• What is Apache Flink?
• Flink use cases and ecosystem
• Pulsar - Segmented Stream Architecture
• Pulsar - Tiered Storage
• Pulsar + Flink Integration
3 . Pulsar == Segmented Streams
4 . Batch - HDFS
5 . Stream - Pub/Sub
6 . A Flink View on Computing: “Batch processing is a special case of Stream processing”
7 . A Pulsar View on Data
8 . Topic (diagram: producers write to a topic and consumers read from it, along a time axis)
9 . Partitions (diagram: the topic is split into partitions P0 to P3 between producers and consumers, along a time axis)
10 . Segments (diagram: each partition is divided into segments; P0, P2, and P3 hold Segments 1 to 3, and P1 holds Segments 1 to 4)
11 . Stream (diagram: the same partition and segment layout as slide 10, under the title Stream)
12 . Stream (diagram: a single stream of Segments 1 to 4 between producers and consumers, along a time axis)
13 . Access Patterns ✓ Write ✓ Tailing Read ✓ Catchup Read (diagram: a stream of Segments 1 to 4 along a time axis)
14 . Access Patterns (build of slide 13: the diagram adds a Write label)
15 . Access Patterns (build of slide 13: the diagram adds a Tailing Read label)
16 . Access Patterns (build of slide 13: the diagram adds a Catchup Read label)
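The three access patterns on slides 13 to 16 can be illustrated with a toy model. This is only a sketch, not Pulsar's implementation: the `SegmentedStream` class, its fixed-size rollover rule, and the integer offsets are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Segment:
    """A run of entries; a sealed segment is never written to again."""
    entries: List[str] = field(default_factory=list)
    sealed: bool = False


class SegmentedStream:
    """Toy append-only stream that rolls to a new segment when one fills up."""

    def __init__(self, max_entries_per_segment: int = 3) -> None:
        # Hypothetical rollover rule: a fixed number of entries per segment.
        self.max_entries = max_entries_per_segment
        self.segments: List[Segment] = [Segment()]

    def length(self) -> int:
        return sum(len(s.entries) for s in self.segments)

    def write(self, entry: str) -> int:
        """Append to the open (last) segment and return the entry's offset."""
        current = self.segments[-1]
        if len(current.entries) >= self.max_entries:
            current.sealed = True
            current = Segment()
            self.segments.append(current)
        current.entries.append(entry)
        return self.length() - 1

    def read_from(self, offset: int) -> Iterator[str]:
        """Yield every entry at or after `offset`, crossing segment boundaries."""
        position = 0
        for segment in self.segments:
            for entry in segment.entries:
                if position >= offset:
                    yield entry
                position += 1


stream = SegmentedStream(max_entries_per_segment=3)
for i in range(7):
    stream.write(f"event-{i}")

tail_offset = stream.length()      # a tailing reader starts at the current end
stream.write("event-7")

tailing = list(stream.read_from(tail_offset))  # only entries written after it attached
catchup = list(stream.read_from(0))            # the full history, oldest first
segment_sizes = [len(s.entries) for s in stream.segments]
```

In this model a tailing read and a catchup read use the same call and differ only in the starting offset: the tailing reader touches just the open segment, while the catchup reader walks through the sealed ones first.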
17 . Write: Kafka vs. Pulsar (diagram: two groups of three brokers, with three partition replicas labeled Leader, Follower, Follower)
18 . Tailing Read: Kafka vs. Pulsar (same diagram as slide 17)
19 . Catchup Read: Kafka vs. Pulsar (same diagram as slide 17)
20 . Infinite Stream ✓ Write ✓ Tailing Read ✓ Catchup Read (diagram: a stream of Segments 1 to 4 along a time axis, with layers labeled Brokers and Bookies)
21 . Infinite Stream ✓ Write ✓ Tailing Read ✓ Catchup Read (same diagram as slide 20, with an added layer labeled Tiered Storage)
22 . Tiered Storage
• Offloader
  • When: size-based, time-based, or triggered manually through pulsar-admin
  • How: copy a segment to tiered storage, then delete it from BookKeeper
  • Access: the broker knows how to read the data back, or readers can bypass the broker and read the offloaded segments directly
• Available Offloaders
  • Cloud offloaders: AWS, GCS, Azure, …
  • HDFS, Ceph, …
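The size-based "when" and the copy-then-delete "how" from slide 22 can be sketched as follows. This is a minimal model under stated assumptions, not the Pulsar offloader: the `TieredLog` class, its two dictionaries standing in for BookKeeper and the tiered store, and the rule "offload the oldest sealed segments until the primary store is back under the threshold" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SealedSegment:
    segment_id: int
    data: bytes


class TieredLog:
    """Toy two-tier store: a primary tier plus a cheaper tiered store."""

    def __init__(self, size_threshold_bytes: int) -> None:
        self.threshold = size_threshold_bytes       # hypothetical size-based trigger
        self.primary: Dict[int, SealedSegment] = {}  # stands in for BookKeeper
        self.tiered: Dict[int, bytes] = {}           # stands in for S3, GCS, HDFS, ...

    def add_sealed(self, segment: SealedSegment) -> None:
        self.primary[segment.segment_id] = segment

    def primary_size(self) -> int:
        return sum(len(s.data) for s in self.primary.values())

    def offload_by_size(self) -> List[int]:
        """Move the oldest segments out until the primary tier fits the threshold."""
        offloaded: List[int] = []
        for segment_id in sorted(self.primary):
            if self.primary_size() <= self.threshold:
                break
            segment = self.primary[segment_id]
            self.tiered[segment_id] = segment.data  # copy to tiered storage first
            del self.primary[segment_id]            # only then delete the primary copy
            offloaded.append(segment_id)
        return offloaded

    def read(self, segment_id: int) -> bytes:
        """Read path that hides where the segment currently lives."""
        if segment_id in self.primary:
            return self.primary[segment_id].data
        return self.tiered[segment_id]


log = TieredLog(size_threshold_bytes=250)
for segment_id in range(1, 5):
    log.add_sealed(SealedSegment(segment_id, bytes([segment_id]) * 100))

moved = log.offload_by_size()
oldest = log.read(1)
```

Copying before deleting means a segment is always readable from at least one tier, and the `read` method plays the role of the broker that "knows how to read the data back".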
23 . Stream as a Unified View on Data (diagram: producers, consumers, and segment readers over a stream of Segments 1 to 6, along a time axis)
24 . Data Processing on Pulsar (diagram: a stream of Segments 1 to 6 along a time axis, with spans labeled Bounded Stream and Unbounded Stream)
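The bounded and unbounded views on slide 24 can be sketched with two generators over the same list of segments. The function names, the list-of-lists representation, and the `wait_for_more` callback are assumptions for illustration; a real reader would block on the broker rather than call back into the test.

```python
from itertools import islice
from typing import Callable, Iterator, List


def bounded_read(segments: List[List[str]], first: int, last: int) -> Iterator[str]:
    """Batch view: read a fixed range of segments and then stop."""
    for segment in segments[first:last + 1]:
        yield from segment


def unbounded_read(
    segments: List[List[str]],
    first: int,
    wait_for_more: Callable[[], None],
) -> Iterator[str]:
    """Streaming view: start at `first` and keep following new segments forever."""
    index = first
    while True:
        while index < len(segments):
            yield from segments[index]
            index += 1
        # Caught up with the producers; a real reader would block here.
        wait_for_more()


segments = [["a1", "a2"], ["b1", "b2"], ["c1"]]


def produce_more() -> None:
    # Hypothetical producer that seals one new single-entry segment per call.
    segments.append([f"new-{len(segments)}"])


batch = list(bounded_read(segments, 0, 1))
# The unbounded reader never finishes on its own, so take a finite prefix.
streaming = list(islice(unbounded_read(segments, 1, produce_more), 5))
```

Both readers share the same per-segment loop; the bounded one simply has an end, which mirrors the statement on slide 6 that batch processing is a special case of stream processing.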
25 . Pulsar + Flink
26 . Goals
• Flink + Pulsar
• Streaming Connectors
• Source Connectors
• PulsarCatalog: Schema Integration
• PulsarStateBackend
• Pulsar for the unified view of Data, Flink for the unified view of Computing
27 . Done
28 . Streaming Source -> Streaming Sink
29 . Streaming Source -> Streaming Table Sink