- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
【分一06-马国维+陶阳宇】Runtime Improvements for Flink as a Unified Engine
展开查看详情
1 .Runtime Improvements For Flink As A Unified Engine
2 .Ø Flink in Alibaba Ø FLINK Ø New Flink API Stack For Unified Processing Ø Flink API Ø Flink Runtime Improvements Ø Flink Ø Future Plans Ø
3 .• Flink in Alibaba
4 .Flink In Alibaba Cluster State Events TPS 10K Nodes PetaBytes Trillions/PerDay 472M/Sec
5 .Flink Computing Platform in Alibaba AOP Bahamut Dataphin Streaming … Recommendation Search BI FOAS Flink Open Api Service Table Sql DataStream Relational Relational Streaming Process Flink Runtime Distributed Streaming DataFlow Yarn Cluster Management HDFS
6 .• New Flink API Stack For Unified Processing
7 .Current Flink API Stack DataStream API DataSetAPI StreamTransform Operator Tree Table API & SQL Relational Batch Plan DataStream API DataSet API Streaming Process Batch Process StreamGraph Optimization Plan Runtime Distributed Streaming DataFlow Local Cluster Cloud Single JVM Standalone/Yarn EC2/GCE JobGraph StreamTask & OP BatchTask & Driver
8 .Proposed New Flink API Stack OLD NEW Table API & SQL DataStream DataSet Table & SQL Relational Streaming Batch Relational DataStream API DataSet API DAG & StreamOperator Streaming Process Batch Process Unified Core API Runtime Runtime Distributed Streaming DataFlow Distributed Streaming DataFlow Local Cluster Cloud Local Cluster Cloud Single JVM Standalone/Yarn EC2/GCE Single JVM Standalone/Yarn EC2/GCE
9 .Proposed New StreamOperator API Build Probe Input1 Input2 Notified when input is finished Notified when input is finished InputSelection endInput1() InputSelection endInput2() enum InputSelection new new Hash 1. BuildFrom(Input1) Join InputSeclection firstInputSelection() 2. ProbeFrom(Input2) new new Which Input side to read first ? Which Input side to process next ? void processElement1(StreamRecord) InputSelection processElement1(StreamRecord) void processElement2(StreamRecord) InputSelection processElement2(StreamRecord) old new
10 .Ø Flink Runtime Improvements Ø Flink Ø Flink Architecture Ø Flink Ø Improvements For Job Scheduler Ø Ø Pluggable Shuffle Service Architecture Ø Ø Improvements For Task Execution Ø Ø Improvements For Job FailOver Ø
11 .• Flink Architecture
12 .Flink Architecture 2. Start Job 6. Schedule Task Task Task Dispatcher JobMaster Data Exechange Managers Managers 1. Submit Job 3. Request Slot 5. Launch TM 4. Allocate Container Client ResourceManager Cluster
13 .• Job Scheduler
14 .Current Schedule Mode
15 .Pluggable Job Schedule Framework JobMaster DagEvent DAGManager Plugable ExecutionGraph ScheduleTask UpdateState TaskScheduler Task Report Deploy Task TaskManager TaskManager
16 .• Pluggable Shuffle Service
17 .Current Shuffle Service Worker Machines Worker Machines Finished TM TM TM TM Free Slot Free Slot Task Task Server Client Client Server Client Server Client Server JOB1 FileSystem FileSystem JOB2 No Resource Available
18 .External Yarn Shuffle Service Finished Worker Machines Worker Machines TM TM TM TM Task Task Task Task JOB1 External External External External External External External External JOB2 Writer Reader Writer Reader Reader Writer Writer Reader YarnShuffle Service YarnShuffle Service FileSystem FileSystem
19 .Hard to Extend New Shuffle Service Worker Machines TM TM Task Task External Writer External Reader External Writer External Reader ! ! YarnShuffle Service FileSystem
20 .Pluggable Shuffle Architecture JobGraph JobMaster Deploy Task Deploy Task Worker Machines Worker Machines New YarnShuffleManager() TM TM TM TM Task Task Task Task ShuffleManager ShuffleManager ShuffleManager ShuffleManager create create create create create create create create New RDMAManager() Writer Reader Writer Reader Writer Reader Writer Reader FileSystem FileSystem
21 .• Task Execution
22 .StreamTask Based on OperatorTree After Chain Chain Policy Chain Chain JobVertex Operator Forward KeyBy
23 .StreamTask Based on OperatorDag After Chain Chain Policy Chain Chain JobVertex Operator Forward KeyBy
24 .• Job FailOver
25 .Current JM FailOver - Restart All
26 .JM FailOver - Take Over
27 .• Failover Improvements
28 .Failover Improvements • Some failures are non-recoverable, such as • DividebyZero where task can immediately . fail. ExecutionJobVertex ExecutionJobVertex
29 .Failover Improvements • There are some machines with hardware • issues that we could detect and avoid to schedule to that machines. JobMaster Machine Machine TaskManager TaskManager Task Task Task Task ......