- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Apache Flink® 1.7 and Beyond Part II
展开查看详情
1 . End-to-end SQL Only Pipelines Hive Meta Store • Support for external catalogs (Confluent Schema Registry, Hive Meta Store) Input schema information Output schema information • Data definition language (DDL) SQL Table Source Table Sink Query
2 . Capability Spectrum Batch Streaming analytics Event-driven applications offline real time Flink
3 . Flink as a Library • Deploying Flink applications should be as easy as starting a process • Bundle application code and Flink into a single image • Process connects to other application processes and figures out its role • Expose Flink’s managed state to the embedding application P1 P2 P3 P4 New process
4 . Reactive vs. Active • Active mode • Flink is aware of underlying cluster framework • Flink allocate resources • E.g. existing YARN and Mesos integration • Reactive mode • Flink is oblivious to its runtime environment • External system allocates and releases resources • Flink scales with respect to available resources • Relevant for environments: Kubernetes, Docker, as a library
5 . Dynamic Scaling • Latency • Throughput • Resource utilization • Connector signals
6 . Batch-Streaming Unification • No fundamental difference between batch and stream processing • Batch allows optimizations because data is bounded and ”complete” • Batch and streaming still separately treated from task level upwards • Working toward a single runtime for batch and streaming workloads
7 . Flink Scheduler • Lazy scheduling (batch case) src build side • Deploy tasks starting from the sources build side join join • Whenever data is produced start consumers probe probe side • Scheduling of idling tasks → resource under- src side src utilization
8 . Batch Scheduler (1) • More efficient scheduling by taking dependencies into account src build side (2) • E.g. probe side is only scheduled after build side has been join build side join processed probe probe side side src src (2) (3)
9 . Extendable Scheduler Scheduler • Make Flink’s scheduler extendable & pluggable • Scheduler considers dependencies and reacts to signals from Streaming Scheduler Batch Scheduler ExecutionGraph • Specialized scheduler for different use cases Speculative Scheduler
10 . Flink’s Shuffle Service • Tasks own produced result partitions • Containers cannot be freed until result is consumed • One implementation for streaming and batch loads Container Result partition
11 . External & Persistent Shuffle Service • Result partitions are written to an external shuffle service • Containers can be freed early • Different implementations based on use case External shuffle service (e.g. Yarn, DFS)
12 . TL;DL • Flink 1.7.0 added many new features around SQL, connectors and state evolution • A lot of new features in the pipeline • Join the community! • Subscribe to mailing lists • Participate in Flink development • Become active
13 .
14 .Flink Forward San Francisco 2019