Apache Flink 和 Apache Pulsar 的开源数据技术框架可以以不同的方式融合,来提供大规模弹性数据处理。4 月 2 日,郭斯杰受邀在 Flink Forward San Francisco 2019 大会上发表演讲,介绍了 Flink 和 Pulsar 在批流应用程序的融合情况。

本次分享会简要介绍 Apache Pulsar 及其与其他消息系统的不同之处,并讲解如何融合 Pulsar 和 Flink 协同工作,为大规模弹性数据处理提供无缝的开发人员体验。

Pulsar 和 Flink 对应用程序在数据和计算级别如何处理数据的视图基本一致,将“批”作为“流”的特殊情况进行“流式优先”处理。通过 Pulsar 的 Segmented Streams 方法和 Flink 在一个框架下统一批处理和流处理工作负载的几个步骤,可以应用多种方法融合两种技术,提供大规模的弹性数据处理。

注脚

展开查看详情

1. Elastic Data Processing with Apache Flink and Apache Pulsar Sijie Guo (sijieg) 2019-04-02

2. Who am I • Apache Pulsar PMC Member • Apache BookKeeper PMC Member • Interested in technologies around Event Streaming

3. Agenda • What is Apache Pulsar? • A Pulsar View on Data - Segmented Stream • Pulsar - Access Pattern & Tiered Storage • Pulsar - Schema • When Flink meets Pulsar

4.What is Apache Pulsar?

5. Pub/Sub Messaging 2003 2010 2012 2006 2011

6. “Flexible Pub/Sub messaging backed by durable log/stream storage”

7.Pulsar - Pub/Sub

8.Pulsar - Multi Tenancy

9.Pulsar - Queue + Streaming

10.Pulsar - Cloud Native Layered Architecture • Independent Scalability • Instant Failure Recovery • Balance-free on cluster expansions

11.A Pulsar View on Data

12.Batch - HDFS

13.Stream - Pub/Sub

14.A Flink View on Computing “Batch processing is a special case of Stream processing”

15.Pulsar = Segmented Stream

16. Topic Producers Topic Consumers Time

17.Partitions P0 Producers P1 P2 Consumers P3 Time

18. Segments P0 Segment 1 Segment 2 Segment 3 Producers P1 Segment 1 Segment 2 Segment 3 Segment 4 P2 Segment 1 Segment 2 Segment 3 Consumers P3 Segment 1 Segment 2 Segment 3 Time

19. Stream P0 Segment 1 Segment 2 Segment 3 Producers P1 Segment 1 Segment 2 Segment 3 Segment 4 P2 Segment 1 Segment 2 Segment 3 Consumers P3 Segment 1 Segment 2 Segment 3 Time

20. Stream Producers Stream Segment 1 Segment 2 Segment 3 Segment 4 Consumers Time

21. Segmented Stream • Segmented Stream Systems • Apache Pulsar, Twitter EventBus, EMC Pravega • All Apache BookKeeper based • Used BK in a different way • Pulsar, EventBus - Uses BK as the segment store • Pravega - Uses BK as the journal only

22. Access Patterns ✓ Write
 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 ✓ Tailing Read
 追尾 Time ✓ Catchup Read
 追赶读

23. Access Patterns ✓ Write
 Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 ✓ Tailing Read
 追尾 Time ✓ Catchup Read
 追赶读

24. Access Patterns Tailing Read ✓ Write
 Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 ✓ Tailing Read
 追尾 Time ✓ Catchup Read
 追赶读

25. Access Patterns Tailing Read ✓ Write
 Write 写 Stream Segment 1 Segment 2 Segment 3 Segment 4 ✓ Tailing Read
 追尾 Time Catchup Read ✓ Catchup Read
 追赶读

26. Write Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)

27. Tailing Read Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)

28. Catchup Read Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)

29. IO Isolation Kafka Pulsar Broker Broker Broker Broker Broker Broker Partition Partition Partition (Leader) (Follower) (Follower)