翟佳 郭斯杰_使⽤Flink和Pulsar进⾏批流⼀体弹性计算

翟佳 郭斯杰_使⽤Flink和Pulsar进⾏批流⼀体弹性计算
展开查看详情

1.Elastic Stream and Batch Processing with Apache Flink and Apache Pulsar 使⽤用Flink和Pulsar进⾏行行批流⼀一体弹性计算 演讲者: Jia Zhai (@Jia_Zhai) Sijie Guo (@sijieg)

2.Agenda 1. What is Apache Pulsar?
 什什么是Pulsar 2. IO Access Patterns
 IO访问模式 3. Stream and Batch Processing with Apache Flink
 使⽤用Apache Flink进⾏行行批流⼀一体的数据处理理

3.Agenda 1. What is Apache Pulsar?
 什什么是Pulsar a. Flexible, Unified Messaging API : Queue + Stream
 灵活且统⼀一的消息API:⼯工作队列列 + 流 b. Layered Architecture & Segment Centric Storage
 分层架构和分⽚片存储 2. IO Access Patterns
 IO访问模式 3. Stream and Batch Processing with Apache Flink
 使⽤用Apache Flink进⾏行行批流⼀一体的数据处理理

4.What is Apache Pulsar? 
 Apache Pulsar是什什么?

5.2003 2010 2012 2006 2011

6.Flexible Pub-Sub messaging backed by durable log storage — 基于分布式⽇日志存储的 灵活的消息发布-订阅系统

7.Pulsar Namespaces

8.Topic, Producer & Consumer

9.Topic Producers Topic Consumers Time

10.Partition P0 Producers P1 P2 Consumer P3 Time

11.Segment P0 Segment 1 Segment 2 Segment 3 Producers P1 Segment 1 Segment 2 Segment 3 Segment 4 P2 Segment 1 Segment 2 Segment 3 Consumers P3 Segment 1 Segment 2 Segment 3 Time

12.Layered Architecture Layered Architecture
 计算和存储分离 Independent Scalability
 独⽴立扩展性 Fault Tolerance
 灵活容错 Instant Scalability
 快速扩容

13.Segment Centric Storage Logical Partition
 逻辑分区 Partition divided into Segments
 分区被切成分⽚片 Size-based & Time-based
 基于⼤大⼩小和时间的分⽚片 Uniformly distributed across the cluster 
 分⽚片被均匀打散存储到集群中

14.Pulsar Subscriptions

15. Pulsar IO Cassandra Kinesis MySQL MongoDB Messaging Event Processing Complex Stream Pulsar Brokers Pulsar Functions Processing Stream Storage Analytics Other BookKeeper Presto SQL Frameworks Tiered Storage Google Cloud Azure Blob AWS S3 HDFS Storage Storage

16.Agenda 1. What is Apache Pulsar?
 什什么是Pulsar 2. IO Access Patterns
 IO访问模式 a. Write & Tailing Read
 写和追尾读 b. Catchup Read & Tiered Storage
 追赶读和层级存储 3. Stream and Batch Processing with Apache Flink
 使⽤用Apache Flink进⾏行行批流⼀一体的数据处理理

17.Stream P0 Segment 1 Segment 2 Segment 3 Producers P1 Segment 1 Segment 2 Segment 3 Segment 4 P2 Segment 1 Segment 2 Segment 3 Consumers P3 Segment 1 Segment 2 Segment 3 Time

18.Stream Producers Stream Segment 1 Segment 2 Segment 3 Segment 4 Consumers Time

19.Access Patterns Write
 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read
 追尾读 Time Catchup Read
 追赶读

20.Access Patterns Write Write
 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read
 追尾读 Time Catchup Read
 追赶读

21.Access Patterns Tailing Read Write Write
 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read
 追尾读 Time Catchup Read
 追赶读

22.Access Patterns Tailing Read Write Write
 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read
 追尾读 Time Catchup Read
 Catchup Read 追赶读

23.Write Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)

24.Tailing Read Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)

25.Catchup Read Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)

26.Putting all together - IO Isolation Kafka Pulsar Broker Broker Broker Broker Broker Broker L1 Cache Partition Partition Partition (Leader) (Follower) (Follower) L2 Cache

27.Batch & Stream Stream Processing Write
 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read
 追尾读 Time Catchup Read
 Batch Processing 追赶读

28.Tiered Storage Write
 Tiered Storage Bookies Brokers 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read
 追尾读 Time Catchup Read
 追赶读

29.Stream as a Unified View on Data Segment Readers Producers Stream Segment 1 Segment 2 Segment 3 Segment 4 Segment 5 Segment 6 Consumers Time