What turns stream processing from a tool into a platform?
1. STREAM PROCESSING FROM APPLICATIONS TO PLATFORMS - STEPHAN EWEN, CO-FOUNDER & CTO
2. A platform makes building new applications simple by taking care of the common and repeatable parts.
3. Internal streaming data platforms built with Apache Flink
4. Observation 1: Stream processing is about building applications
5. Batch / Data Lake Architecture, a.k.a. collect now, figure out later
6. Streaming / Data-driven Applications: build applications directly on data streams
7. Observation 2: Stream processing changes the database-centric architecture
8. Recall last Flink Forward…
• Classic tiered architecture: compute layer on top of a database layer
• Streaming architecture: compute + application state, backed by stream storage and snapshot storage (backup)
• Diagram label: application working state + historic state
9. Changing the Two Tier Architecture
• Classic tiered architecture: reads/writes across tier boundary
• Streaming architecture: all modifications are local; asynchronous writes of large blobs
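The contrast on this slide can be illustrated with a minimal sketch. It is not how Apache Flink implements snapshots; the class `LocalStateOperator` and the in-memory `snapshot_store` (a stand-in for remote blob storage) are hypothetical names used only for illustration. The point is that each event touches local state only, while a copy of that state is written out in the background as one large blob.

```python
import copy
import threading


class LocalStateOperator:
    """Keeps its working state in memory and snapshots it asynchronously."""

    def __init__(self):
        self.counts = {}          # application state, local to the compute task
        self.snapshot_store = {}  # hypothetical stand-in for remote blob storage
        self._lock = threading.Lock()  # guards snapshot_store

    def process(self, key):
        # Every modification is a local read/write; no database round trip.
        self.counts[key] = self.counts.get(key, 0) + 1

    def snapshot(self, snapshot_id):
        # Copy the state synchronously so later events cannot leak into it,
        # then write the copy as one large blob in a background thread.
        frozen = copy.deepcopy(self.counts)

        def upload():
            with self._lock:
                self.snapshot_store[snapshot_id] = frozen

        writer = threading.Thread(target=upload)
        writer.start()
        return writer


op = LocalStateOperator()
for event in ["a", "b", "a"]:
    op.process(event)
pending = op.snapshot(1)
op.process("a")  # processing continues while the snapshot is being written
pending.join()
```

After `pending.join()`, the stored snapshot reflects the state at the moment `snapshot(1)` was called, while the live state already includes the later event.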
10. Application Platforms
11. Application Platforms: Logging, Metrics, Resource Manager, CI / CD
12. Kubernetes: deploying new applications, scaling applications
13. Kubernetes & Stateful Applications (diagram labels: Database, Kubernetes)
14. What about stateful containers?
• Example: Scaling down a replicated database
• 3 replicas, 4 nodes
• Scale down: need to move or reorganize data before container shutdown
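The scale-down example can be made concrete with a small sketch. It assumes a hypothetical layout of four shards, each stored on 3 of 4 nodes; the function `plan_scale_down` and the node names are illustrative only and do not correspond to any Kubernetes or database API. It computes which shard copies must be moved before the leaving container may shut down.

```python
def plan_scale_down(placement, leaving):
    """Return {shard: target_node} moves needed before `leaving` shuts down.

    placement maps node name -> set of shard ids stored on that node.
    """
    remaining = {n: set(s) for n, s in placement.items() if n != leaving}
    moves = {}
    for shard in sorted(placement[leaving]):
        # Only nodes that do not hold this shard yet can take the displaced copy.
        candidates = [n for n in sorted(remaining) if shard not in remaining[n]]
        if not candidates:
            raise RuntimeError(f"no remaining node can host shard {shard}")
        # Prefer the least loaded candidate to keep data balanced.
        target = min(candidates, key=lambda n: len(remaining[n]))
        remaining[target].add(shard)
        moves[shard] = target
    return moves


# Hypothetical layout: 4 nodes, every shard replicated 3 times.
placement = {
    "n1": {0, 2, 3},
    "n2": {0, 1, 3},
    "n3": {0, 1, 2},
    "n4": {1, 2, 3},
}
moves = plan_scale_down(placement, "n4")
```

Here every shard held by `n4` needs a new home before the container stops, which is the step a stateless scale-down never has to consider.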
15. Stateful Questions
• consistent stateful upgrades: application evolution and bug fixes
• migration of application state: cluster migration, A/B testing
• re-processing and reinstatement: fix corrupt results, bootstrap new applications
• state evolution (schema evolution)
16. dA Platform: Container-based platform for stateful data-driven applications
• Kubernetes: Container-based Resource Orchestration
• Apache Flink: Stateful Stream Processing & Snapshots
• Application Manager: Code, Resource, Config, and Snapshot Management
17. Diagram components
• Web interface, CI/CD
• App Manager: Job Control, Snapshot Management, Resource Allocation
• Kubernetes
• Storage
18. Versioned Applications, not Jobs/Jars
• Stream Processing Application: Version 1, Version 2, Version 3 (upgrade)
• New Application: Version 2a, Version 3a (fork / duplicate, then upgrade)
• Each version: Code and Application Snapshot
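One way to picture "versioned applications" is as a chain of records that each pair code with a state snapshot. The sketch below is an assumption-laden illustration, not the dA Platform data model: `AppVersion`, `upgrade`, `fork`, `lineage`, and all artifact and snapshot names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppVersion:
    app: str        # application name
    label: str      # version label, e.g. "2" or "2a"
    code: str       # reference to the code artifact (hypothetical)
    snapshot: str   # reference to the state snapshot this version starts from
    parent: Optional["AppVersion"] = None


def upgrade(prev, label, code, snapshot):
    # New code for the same application, started from a snapshot of the old one.
    return AppVersion(prev.app, label, code, snapshot, parent=prev)


def fork(prev, new_app, label):
    # Duplicate: same code and same snapshot, but under a new application.
    return AppVersion(new_app, label, prev.code, prev.snapshot, parent=prev)


def lineage(version):
    # Walk back through parents and return labels from oldest to newest.
    labels = []
    while version is not None:
        labels.append(version.label)
        version = version.parent
    return labels[::-1]


v1 = AppVersion("orders", "1", "code-v1", "snap-0")
v2 = upgrade(v1, "2", "code-v2", "snap-1")
v3 = upgrade(v2, "3", "code-v3", "snap-2")
v2a = fork(v2, "orders-experiment", "2a")
v3a = upgrade(v2a, "3a", "code-v3a", "snap-1a")
```

Because a version carries both code and snapshot, a fork such as `v2a` starts with the same state as `v2` and can then evolve independently.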
19. Architecture
• dA Application Manager: Application lifecycle management
• Apache Flink: Stateful stream processing
• Kubernetes: Container platform
• Logging, Metrics
20. What could the future of a Streaming Data Platform look like?
21. The Usual Suspects
• Role-based access control
• Metadata management
• Cross Datacenter Failover / Disaster Recovery
22. Support for Batch Processing: Everything is a stream. Finite applications as a special case.
23. Periodic Bursty Stream Processing
• Bursty Event Stream over time (events only at end-of-day)
• Checkpoint / Savepoint Store
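One possible reading of this slide is that the application only needs to run while a burst arrives, restoring its state from the savepoint store at start and writing it back before releasing resources. The sketch below illustrates that reading only; `run_burst`, the `"latest"` key, and the dict-based store are hypothetical and are not Flink APIs.

```python
def run_burst(savepoint_store, events):
    """Process one burst of (key, amount) events, resuming from the last savepoint."""
    # Restore the state left behind by the previous run (empty on the first run).
    state = dict(savepoint_store.get("latest", {}))
    for key, amount in events:
        state[key] = state.get(key, 0) + amount
    # Persist the state; after this the job's resources could be released.
    savepoint_store["latest"] = dict(state)
    return state


store = {}
run_burst(store, [("eu", 5), ("us", 7)])   # burst at the end of day 1
totals = run_burst(store, [("eu", 1)])     # burst at the end of day 2
```

The second run continues from the totals of the first, even though nothing was running in between.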
24. Support a Broad Developer Audience: … Streaming Data Platform
25. Use Case Vertical Libraries: SQL, CEP, Machine Learning, … on top of the Streaming Data Platform
26. dA Platform is a turnkey solution for stateful stream processing with Apache Flink.