- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
翟佳 郭斯杰_使⽤Flink和Pulsar进⾏批流⼀体弹性计算
展开查看详情
1 .Elastic Stream and Batch Processing with Apache Flink and Apache Pulsar 使⽤用Flink和Pulsar进⾏行行批流⼀一体弹性计算 演讲者: Jia Zhai (@Jia_Zhai) Sijie Guo (@sijieg)
2 .Agenda 1. What is Apache Pulsar? 什什么是Pulsar 2. IO Access Patterns IO访问模式 3. Stream and Batch Processing with Apache Flink 使⽤用Apache Flink进⾏行行批流⼀一体的数据处理理
3 .Agenda 1. What is Apache Pulsar? 什什么是Pulsar a. Flexible, Unified Messaging API : Queue + Stream 灵活且统⼀一的消息API:⼯工作队列列 + 流 b. Layered Architecture & Segment Centric Storage 分层架构和分⽚片存储 2. IO Access Patterns IO访问模式 3. Stream and Batch Processing with Apache Flink 使⽤用Apache Flink进⾏行行批流⼀一体的数据处理理
4 .What is Apache Pulsar? Apache Pulsar是什什么?
5 .2003 2010 2012 2006 2011
6 .Flexible Pub-Sub messaging backed by durable log storage — 基于分布式⽇日志存储的 灵活的消息发布-订阅系统
7 .Pulsar Namespaces
8 .Topic, Producer & Consumer
9 .Topic Producers Topic Consumers Time
10 .Partition P0 Producers P1 P2 Consumer P3 Time
11 .Segment P0 Segment 1 Segment 2 Segment 3 Producers P1 Segment 1 Segment 2 Segment 3 Segment 4 P2 Segment 1 Segment 2 Segment 3 Consumers P3 Segment 1 Segment 2 Segment 3 Time
12 .Layered Architecture Layered Architecture 计算和存储分离 Independent Scalability 独⽴立扩展性 Fault Tolerance 灵活容错 Instant Scalability 快速扩容
13 .Segment Centric Storage Logical Partition 逻辑分区 Partition divided into Segments 分区被切成分⽚片 Size-based & Time-based 基于⼤大⼩小和时间的分⽚片 Uniformly distributed across the cluster 分⽚片被均匀打散存储到集群中
14 .Pulsar Subscriptions
15 . Pulsar IO Cassandra Kinesis MySQL MongoDB Messaging Event Processing Complex Stream Pulsar Brokers Pulsar Functions Processing Stream Storage Analytics Other BookKeeper Presto SQL Frameworks Tiered Storage Google Cloud Azure Blob AWS S3 HDFS Storage Storage
16 .Agenda 1. What is Apache Pulsar? 什什么是Pulsar 2. IO Access Patterns IO访问模式 a. Write & Tailing Read 写和追尾读 b. Catchup Read & Tiered Storage 追赶读和层级存储 3. Stream and Batch Processing with Apache Flink 使⽤用Apache Flink进⾏行行批流⼀一体的数据处理理
17 .Stream P0 Segment 1 Segment 2 Segment 3 Producers P1 Segment 1 Segment 2 Segment 3 Segment 4 P2 Segment 1 Segment 2 Segment 3 Consumers P3 Segment 1 Segment 2 Segment 3 Time
18 .Stream Producers Stream Segment 1 Segment 2 Segment 3 Segment 4 Consumers Time
19 .Access Patterns Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read 追尾读 Time Catchup Read 追赶读
20 .Access Patterns Write Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read 追尾读 Time Catchup Read 追赶读
21 .Access Patterns Tailing Read Write Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read 追尾读 Time Catchup Read 追赶读
22 .Access Patterns Tailing Read Write Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read 追尾读 Time Catchup Read Catchup Read 追赶读
23 .Write Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)
24 .Tailing Read Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)
25 .Catchup Read Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)
26 .Putting all together - IO Isolation Kafka Pulsar Broker Broker Broker Broker Broker Broker L1 Cache Partition Partition Partition (Leader) (Follower) (Follower) L2 Cache
27 .Batch & Stream Stream Processing Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read 追尾读 Time Catchup Read Batch Processing 追赶读
28 .Tiered Storage Write Tiered Storage Bookies Brokers 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 Tailing Read 追尾读 Time Catchup Read 追赶读
29 .Stream as a Unified View on Data Segment Readers Producers Stream Segment 1 Segment 2 Segment 3 Segment 4 Segment 5 Segment 6 Consumers Time