What turns stream processing from a tool into a platform?

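Stream processing is a powerful and genuinely interesting programming paradigm, and that holds especially for Apache Flink. Release after release, year after year, we have seen Flink put to work across all kinds of use cases, and we have also seen plenty of challenges. Looking beyond Flink itself (Flink's CTO is bold enough to do just that), stream processing covers a much broader scope. One example is bringing different compute and data architectures together into a single platform that supports analytics, data wrangling, SQL, machine learning, data source management, databases, and every other kind of data-driven infrastructure. In this keynote opening, a Flink founder explains how Flink intends to achieve all of this in the future, and go further still.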


2. A platform makes building new applications simple by taking care of the common and repeatable parts.

3. Internal streaming data platforms built with Apache Flink

4. Observation 1: Stream processing is about building applications

5. Batch / Data Lake Architecture, a.k.a. "collect now, figure out later"

6. Streaming / Data-driven Applications: build applications directly on data streams

7. Observation 2: Stream processing changes the database-centric architecture

8. Recall last Flink Forward…
• Classic tiered architecture: compute layer on top of a database layer
• Streaming architecture: compute + application state, on top of stream storage and snapshot storage (backup)
• Annotation: application working state + historic state

9. Changing the Two Tier Architecture
• Classic tiered architecture: reads/writes across the tier boundary
• Streaming architecture: all modifications are local; asynchronous writes of large blobs
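To make the contrast on this slide concrete, here is a toy sketch in Python. It is not Flink code: the class names `TieredCounter` and `StreamingCounter`, and the in-memory dicts standing in for the database and for snapshot storage, are assumptions for illustration. Both count events per key and tally how many operations cross the tier boundary. For simplicity the snapshot is written synchronously, whereas the slide describes asynchronous writes.

```python
from typing import Dict, List


class TieredCounter:
    """Classic tiered architecture: every event does a read and a write against the database tier."""

    def __init__(self) -> None:
        self.database: Dict[str, int] = {}  # stand-in for a remote database
        self.remote_calls = 0

    def process(self, key: str) -> None:
        current = self.database.get(key, 0)  # read across the tier boundary
        self.remote_calls += 1
        self.database[key] = current + 1  # write across the tier boundary
        self.remote_calls += 1


class StreamingCounter:
    """Streaming architecture: state is local; only whole-state snapshots leave the process."""

    def __init__(self) -> None:
        self.local_state: Dict[str, int] = {}
        self.snapshot_storage: List[Dict[str, int]] = []  # stand-in for durable snapshot storage
        self.remote_calls = 0

    def process(self, key: str) -> None:
        # all modifications are local reads/writes
        self.local_state[key] = self.local_state.get(key, 0) + 1

    def snapshot(self) -> None:
        # one write of the full state as a single blob; copied so later updates do not leak in
        self.snapshot_storage.append(dict(self.local_state))
        self.remote_calls += 1
```

Feeding both 300 events and snapshotting the streaming version every 100 events yields identical counts, but 600 boundary crossings for the tiered version versus 3 for the streaming one.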

10. Application Platforms

11. Application Platforms: Logging, Metrics, Resource Manager, CI / CD

12. Kubernetes: deploying new applications, scaling applications

13. Kubernetes & Stateful Applications (diagram: a database running on Kubernetes)

14. What about stateful containers?
• Example: scaling down a replicated database on Kubernetes (3 replicas, 4 nodes)
• On scale-down, data needs to be moved or reorganized before a container shuts down
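The scale-down problem can be illustrated with a small placement model in Python. This is a hypothetical sketch, not how any particular database or Kubernetes operator works: `drain_node` and the partition-to-nodes map are assumptions. Before the leaving node's container may shut down, each replica it holds is copied to a remaining node that does not already have that partition, so the replication factor is preserved.

```python
from typing import Dict, Set


def drain_node(placement: Dict[str, Set[str]], nodes: Set[str], leaving: str) -> Dict[str, Set[str]]:
    """Return a new placement with `leaving` removed and each of its replicas re-homed.

    `placement` maps a partition id to the set of nodes holding a replica of it.
    """
    remaining = nodes - {leaving}
    new_placement = {p: set(holders) for p, holders in placement.items()}  # do not mutate input

    # current number of replicas per remaining node, used to spread the moved replicas
    load = {n: 0 for n in remaining}
    for holders in new_placement.values():
        for n in holders:
            if n in load:
                load[n] += 1

    for partition, holders in sorted(new_placement.items()):
        if leaving not in holders:
            continue
        candidates = [n for n in remaining if n not in holders]
        if not candidates:
            raise RuntimeError(f"cannot keep replication factor for {partition}")
        target = min(candidates, key=lambda n: (load[n], n))
        # copy the replica to the target BEFORE dropping it from the leaving node
        holders.add(target)
        load[target] += 1
        holders.discard(leaving)
    return new_placement
```

Draining one node of a four-node cluster with three replicas per partition succeeds; draining again down to two nodes raises, because three replicas no longer fit.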

15. Stateful Questions
• Consistent stateful upgrades: application evolution and bug fixes
• Migration of application state: cluster migration, A/B testing
• Re-processing and reinstatement: fix corrupt results, bootstrap new applications
• State evolution (schema evolution)
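The last question, state (schema) evolution, amounts to transforming stored state when the application's state type changes. The sketch below is a generic, hypothetical illustration in Python; it does not show Flink's own serializer-based schema evolution. `MIGRATIONS` and `upgrade_state` are made-up names, and state is modeled as plain dicts keyed by record key.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]

# Hypothetical migration steps, keyed by the schema version they upgrade FROM.
MIGRATIONS: Dict[int, Callable[[State], State]] = {
    # v1 -> v2: add a new field with a default value
    1: lambda s: {**s, "last_seen": None},
    # v2 -> v3: rename "count" to "total"
    2: lambda s: {"total": s["count"], "last_seen": s["last_seen"]},
}


def upgrade_state(snapshot: Dict[str, State], from_version: int, to_version: int) -> Dict[str, State]:
    """Apply each migration step in order to every per-key state entry of a snapshot."""
    migrated = dict(snapshot)
    for version in range(from_version, to_version):
        step = MIGRATIONS[version]
        migrated = {key: step(value) for key, value in migrated.items()}
    return migrated
```

Chaining small per-version steps means a snapshot taken under any older schema can be brought forward, at the cost of keeping every step around.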

16. dA Platform: a container-based platform for stateful data-driven applications
• Kubernetes: container-based resource orchestration
• Apache Flink: stateful stream processing & snapshots
• Application Manager: code, resource, config, and snapshot management

17. App Manager: web interface, CI/CD, job control, snapshot management, resource allocation; connected to Kubernetes and storage

18. Versioned Applications, not Jobs/Jars
• A stream processing application evolves through upgrades: Version 1 → Version 2 → Version 3
• Fork / duplicate: a new application is derived from an existing version (Version 2a) and upgraded on its own (Version 3a)
• Each version = code + application snapshot
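The idea that the managed unit is a versioned application (code plus snapshot), rather than a job or jar, can be sketched as a small data model. This is a hypothetical Python model: `Version`, `Application`, `upgrade`, and `fork` are illustrative names, not the dA Platform API, and the code and savepoint identifiers are plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    name: str
    code: str  # hypothetical identifier of the application code (e.g. an artifact tag)
    snapshot: Optional[str]  # hypothetical savepoint location; None for a fresh start
    parent: Optional[str] = None  # version this one was derived from


@dataclass
class Application:
    name: str
    versions: List[Version] = field(default_factory=list)

    @classmethod
    def create(cls, name: str, code: str) -> "Application":
        app = cls(name)
        app.versions.append(Version(f"{name}-v1", code, None))
        return app

    def current(self) -> Version:
        return self.versions[-1]

    def upgrade(self, new_code: str, savepoint: str) -> Version:
        """Same application, new code, state carried over from a savepoint of the current version."""
        version = Version(f"{self.name}-v{len(self.versions) + 1}", new_code, savepoint, self.current().name)
        self.versions.append(version)
        return version

    def fork(self, new_name: str, savepoint: str) -> "Application":
        """New application that starts from this application's current code and state."""
        base = self.current()
        forked = Application(new_name)
        forked.versions.append(Version(f"{new_name}-v1", base.code, savepoint, base.name))
        return forked
```

Because each version records its parent, an upgrade and a fork differ only in which application's history the new version is appended to.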

19. Architecture
• dA Application Manager: application lifecycle management
• Apache Flink: stateful stream processing
• Kubernetes: container platform
• Logging, Metrics

20. What could the future of a Streaming Data Platform look like?

21. The Usual Suspects
• Role-based access control
• Metadata management
• Cross-datacenter failover / disaster recovery

22. Support for Batch Processing: everything is a stream; finite applications are a special case.
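A minimal way to see "finite applications as a special case": write the logic once as a streaming operator over an iterator, and treat a batch job as the same operator run over a stream that ends. This Python sketch uses only the standard library; `running_counts` and `batch_counts` are hypothetical names, and it says nothing about how Flink itself schedules bounded inputs.

```python
import itertools
from collections import Counter
from typing import Dict, Iterable, Iterator


def running_counts(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming operator: emit the updated per-key counts after every event."""
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        yield dict(counts)


def batch_counts(events: Iterable[str]) -> Dict[str, int]:
    """Batch as a special case: run the same operator on a finite stream and keep the last emission."""
    result: Dict[str, int] = {}
    for result in running_counts(events):  # drains the finite stream; `result` ends as the last emission
        pass
    return result


# The same operator also runs on an unbounded stream; here only its first four outputs are taken.
first_four = list(itertools.islice(running_counts(itertools.cycle(["x", "y"])), 4))
```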

23. Periodic Bursty Stream Processing
• Bursty event stream over time (events only at end of day)
• Checkpoint / savepoint store
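One reading of this slide is that, between bursts, the application need not hold compute resources: it can stop after a savepoint and resume from it when the next burst arrives. A hypothetical Python sketch of that cycle follows; `SavepointStore` and `run_daily_burst` are illustrative names, the store is in memory rather than durable, and events are assumed to be (account, amount) pairs.

```python
import json
from typing import Dict, List, Tuple


class SavepointStore:
    """In-memory stand-in for a checkpoint/savepoint store (hypothetical; real stores are durable)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def write(self, app_id: str, state: Dict[str, int]) -> None:
        self._blobs[app_id] = json.dumps(state)  # serialize the whole state as one blob

    def read(self, app_id: str) -> Dict[str, int]:
        blob = self._blobs.get(app_id)
        return json.loads(blob) if blob is not None else {}


def run_daily_burst(store: SavepointStore, app_id: str, burst: List[Tuple[str, int]]) -> Dict[str, int]:
    """Start, process one end-of-day burst, save state, and stop until the next burst."""
    state = store.read(app_id)  # restore whatever the previous burst left behind
    for account, amount in burst:
        state[account] = state.get(account, 0) + amount
    store.write(app_id, state)  # persist before releasing resources
    return state
```

Calling `run_daily_burst` once per day with that day's events accumulates totals across days, even though nothing runs in between.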

24. Support a Broad Developer Audience … Streaming Data Platform

25. Use Case Vertical Libraries: SQL, CEP, Machine Learning, … on the Streaming Data Platform

26. dA Platform is a turnkey solution for stateful stream processing with Apache Flink.
• dA Application Manager: application lifecycle management
• Apache Flink: stateful stream processing
• Kubernetes: container platform
• Logging, Metrics