Apache Flink® 1.7 and Beyond Part II

下载 3

快召唤伙伴们来围观吧
微博 QQ QQ空间 贴吧
文档嵌入链接
<iframe src="https://www.slidestalk.com/FlinkChina/Apache_Flink_17_and_Beyond_Part_II?embed" frame border="0" width="640" height="360" scrolling="no" allowfullscreen="true">复制
微信扫一扫分享
已成功复制到剪贴板

Flink China中文社区

发布于

6年前

2049

人观看

#信息技术

Apache Flink® 1.7 and Beyond Part II

展开查看详情

1 . End-to-end SQL Only Pipelines Hive Meta Store • Support for external catalogs (Confluent Schema Registry, Hive Meta Store) Input schema information Output schema information • Data definition language (DDL) SQL Table Source Table Sink Query

2 . Capability Spectrum Batch Streaming analytics Event-driven applications offline real time Flink

3 . Flink as a Library • Deploying Flink applications should be as easy as starting a process • Bundle application code and Flink into a single image • Process connects to other application processes and figures out its role • Expose Flink’s managed state to the embedding application P1 P2 P3 P4 New process

4 . Reactive vs. Active • Active mode • Flink is aware of underlying cluster framework • Flink allocate resources • E.g. existing YARN and Mesos integration • Reactive mode • Flink is oblivious to its runtime environment • External system allocates and releases resources • Flink scales with respect to available resources • Relevant for environments: Kubernetes, Docker, as a library

5 . Dynamic Scaling • Latency • Throughput • Resource utilization • Connector signals

6 . Batch-Streaming Unification • No fundamental difference between batch and stream processing • Batch allows optimizations because data is bounded and ”complete” • Batch and streaming still separately treated from task level upwards • Working toward a single runtime for batch and streaming workloads

7 . Flink Scheduler • Lazy scheduling (batch case) src build side • Deploy tasks starting from the sources build side join join • Whenever data is produced start consumers probe probe side • Scheduling of idling tasks → resource under- src side src utilization

8 . Batch Scheduler (1) • More efficient scheduling by taking dependencies into account src build side (2) • E.g. probe side is only scheduled after build side has been join build side join processed probe probe side side src src (2) (3)

9 . Extendable Scheduler Scheduler • Make Flink’s scheduler extendable & pluggable • Scheduler considers dependencies and reacts to signals from Streaming Scheduler Batch Scheduler ExecutionGraph • Specialized scheduler for different use cases Speculative Scheduler

10 . Flink’s Shuffle Service • Tasks own produced result partitions • Containers cannot be freed until result is consumed • One implementation for streaming and batch loads Container Result partition

11 . External & Persistent Shuffle Service • Result partitions are written to an external shuffle service • Containers can be freed early • Different implementations based on use case External shuffle service (e.g. Yarn, DFS)

12 . TL;DL • Flink 1.7.0 added many new features around SQL, connectors and state evolution • A lot of new features in the pipeline • Join the community! • Subscribe to mailing lists • Participate in Flink development • Become active

13 .

14 .Flink Forward San Francisco 2019

0点赞

0收藏

3下载