Apache Flink: Past, Present, and Future

3. [Timeline slide: 2009 - 2014; 2014]

4. Flink stack: DataStream API (Stream Processing) and DataSet API (Batch Processing), both on top of the Runtime (Distributed Streaming Dataflow)

5. Periodic Snapshots: Source (Offset), Computation (State), Sink

6. Checkpoint barriers in the data stream: Checkpoint Barrier N and Checkpoint Barrier N-1 divide the records into those that are part of Checkpoint N+1, part of Checkpoint N, and part of Checkpoint N-1
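
The snapshot and barrier slides above can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Flink's implementation: it assumes a single source feeding a single counting operator, so there is no barrier alignment across multiple inputs. The names `source`, `run`, `Crash`, and `barrier_every` are hypothetical and exist only for this example.

```python
from collections import Counter


class Crash(Exception):
    """Simulated failure; carries the snapshots completed before the crash."""

    def __init__(self, snapshots):
        super().__init__("simulated failure")
        self.snapshots = snapshots


def source(records, start_offset, barrier_every):
    """Emit records from start_offset and inject a barrier after every
    `barrier_every` records. A barrier carries its checkpoint id and the
    offset of the next record to read."""
    for offset in range(start_offset, len(records)):
        yield ("record", offset, records[offset])
        if (offset + 1) % barrier_every == 0:
            yield ("barrier", (offset + 1) // barrier_every, offset + 1)


def run(records, barrier_every, snapshot=None, fail_at_offset=None):
    """Count records per key. On each barrier, store the source offset
    together with a copy of the operator state."""
    state = Counter(snapshot["state"]) if snapshot else Counter()
    start = snapshot["offset"] if snapshot else 0
    snapshots = []
    for kind, first, second in source(records, start, barrier_every):
        if kind == "barrier":
            snapshots.append({"id": first, "offset": second, "state": dict(state)})
        else:
            if fail_at_offset is not None and first == fail_at_offset:
                raise Crash(snapshots)  # fail before processing this record
            state[second] += 1
    return state, snapshots


records = ["a", "b", "a", "c", "a", "b", "c"]

# First attempt fails at offset 5, after checkpoints 1 and 2 have completed.
try:
    run(records, barrier_every=2, fail_at_offset=5)
except Crash as crash:
    last = crash.snapshots[-1]

# Recovery: restore the state and resume reading from the snapshot's offset.
recovered, _ = run(records, barrier_every=2, snapshot=last)
expected, _ = run(records, barrier_every=2)
print(last, dict(recovered), recovered == expected)
```

Because the source offset and the operator state are captured at the same barrier, replaying from that offset neither drops nor double-counts the record at offset 4, which had been processed after the last barrier and before the failure.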

7. Checkpoint, State, Time, Window

9. Job submission flow: 1. Submit job; 2. Start job; 3. Request slots; 4. Allocate Container; 5. Start Task Manager; 6. Schedule Task. Components: Client, Dispatcher, Job Manager, Resource Manager, Task Managers, Cluster Manager (YARN RM, K8S RM)

12. Stream Mode:

12:01> SELECT Name, SUM(Score), MAX(Time) FROM USER_SCORES GROUP BY Name;

USER_SCORES
-------------------------
| Name  | Score | Time  |
-------------------------
| Julie | 7     | 12:01 |
| Frank | 3     | 12:03 |
| Julie | 1     | 12:03 |
| Frank | 2     | 12:06 |
| Julie | 4     | 12:07 |
-------------------------

Result for [-inf, 12:01)
-------------------------
| Name  | Score | Time  |
-------------------------
|       |       |       |
-------------------------

Result for [12:01, 12:04)
-------------------------
| Name  | Score | Time  |
-------------------------
| Julie | 8     | 12:03 |
| Frank | 3     | 12:03 |
-------------------------

Result for [12:04, now)
-------------------------
| Name  | Score | Time  |
-------------------------
| Julie | 12    | 12:07 |
| Frank | 5     | 12:06 |
-------------------------
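
The way the query result changes as rows arrive can be sketched without any Flink dependency. The following is a minimal illustration in plain Python that assumes rows arrive in the order listed on the slide and that the aggregate is updated once per row; `continuous_group_by` is a hypothetical helper, not a Flink API. Times are zero-padded `HH:MM` strings, so comparing them as strings orders them correctly.

```python
ROWS = [
    ("Julie", 7, "12:01"),
    ("Frank", 3, "12:03"),
    ("Julie", 1, "12:03"),
    ("Frank", 2, "12:06"),
    ("Julie", 4, "12:07"),
]


def continuous_group_by(rows):
    """Maintain SUM(Score) and MAX(Time) per name and yield the full
    result after each incoming row, tagged with that row's time."""
    state = {}
    for name, score, time in rows:
        total, latest = state.get(name, (0, time))
        state[name] = (total + score, max(latest, time))
        yield time, dict(state)


snapshots = list(continuous_group_by(ROWS))
for time, result in snapshots:
    print(time, result)
```

The result after the third row corresponds to the [12:01, 12:04) table on the slide, and the result after the last row corresponds to [12:04, now). The sketch also exposes intermediate results that the slide does not show, because it emits one per row.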

14. Batch Processing; Continuous Processing & Streaming Analytics; Event-driven Applications

16. Two architecture stacks, side by side.
Left stack: Table API & SQL (Relational); DataStream API (Stream Processing) and DataSet API (Batch Processing); Runtime (Distributed Streaming Dataflow); Local (Single JVM), Cluster (Standalone, YARN), Cloud (GCE, EC2).
Right stack: DataStream (Physical) and Table API & SQL (Relational); Query Processor; DAG & StreamOperator; Runtime (Distributed Streaming Dataflow); Local (Single JVM), Cluster (Standalone, YARN), Cloud (GCE, EC2).

17. Pull-based operator

26. Flink and Hive; Flink and Zeppelin

28. Batch Processing; Continuous Processing & Streaming Analytics; Event-driven Applications
