Operating Flink on Mesos at Scale

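Mesosphere engineers walk through the integration of Apache Flink and Mesos, including best practices for the full lifecycle: deployment, monitoring, resource scaling, upgrades, debugging, and more.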

1. Operating Flink on Mesos at Scale (@joerg_schad, biswajit@branch.io)

2. Biswajit Das, Chief Architect @Branch (biswajit@branch.io); Jörg Schad, Tech Lead Community @Mesosphere (@joerg_schad, @joerg.mesosphere) © 2018 Mesosphere, Inc. All Rights Reserved.

3. Apache Mesos in a Nutshell
● Resource Manager
  ○ Dynamic resource allocation
  ○ Running multiple applications
  ○ 2-level scheduling
● Fault-tolerant, battle-tested
● Scalable to 10,000+ nodes
● Created by Mesosphere founder @ UC Berkeley; used in production by 100+ web-scale companies [1]
[1] http://mesos.apache.org/documentation/latest/powered-by-mesos/

4. Why Flink & Mesos
● Mesos offers full functionality to implement fault-tolerant and elastic distributed applications
● 30% of survey respondents were running Flink on Mesos (September 2016, prior to proper Mesos support*)
● Other deployment models
  ○ Standalone
  ○ YARN
  ○ Kubernetes
*Kudos to Eron Wright for this work

5. Why Mesos?
[Diagram: a typical datacenter with separate silos for Flink, Flink Test, Kafka, Kubernetes, and HDFS: siloed, over-provisioned servers, low utilization]

7. [Diagram: Flink, Flink 2, HDFS, Kubernetes, and Kafka in a datacenter, comparing two models]
● Typical Datacenter: siloed, over-provisioned servers, low utilization
● Mesos/DC/OS: automated schedulers, workload multiplexing onto the same machines

8. [Diagram: 3 AM operations (deploy, scale, configure, recover) across Flink, Flink 2, HDFS, Kafka, and Kubernetes in a typical datacenter: siloed, over-provisioned servers, low utilization ...]

9. Mesos Architecture: Two-level Scheduling
1. Agents advertise resources to Master
2. Master offers resources to Framework
3. Framework rejects / uses resources
4. Agent reports task status to Master
[Diagram: framework schedulers (Flink, Spark, Kafka) on top of three Mesos Masters; Mesos Agents below run executors (CDB, Docker, Spark, Cassandra) and their tasks (Spark, Docker, Spark, Cassandra)]
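
The four steps of two-level scheduling listed on this slide can be sketched as a small in-memory simulation. This is only an illustration, not the Mesos API: the `Offer`, `Framework`, and `Master` classes and their methods are hypothetical names introduced here, and a real framework would talk to the master through the Mesos scheduler API instead.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """Hypothetical framework scheduler: accepts any offer that fits one task."""
    name: str
    task_cpus: float
    task_mem_mb: int
    launched: list = field(default_factory=list)

    def on_offer(self, offer: Offer) -> bool:
        # Step 3: the framework uses or rejects the offered resources.
        if offer.cpus >= self.task_cpus and offer.mem_mb >= self.task_mem_mb:
            self.launched.append(offer.agent)
            return True
        return False


class Master:
    """Hypothetical stand-in for the Mesos master's bookkeeping."""

    def __init__(self):
        self.free = {}    # agent name -> (free cpus, free mem in MB)
        self.status = {}  # agent name -> last reported task state

    def advertise(self, agent: str, cpus: float, mem_mb: int) -> None:
        # Step 1: an agent advertises its resources to the master.
        self.free[agent] = (cpus, mem_mb)

    def offer_round(self, framework: Framework) -> None:
        # Step 2: the master offers each agent's free resources to the framework.
        for agent, (cpus, mem_mb) in list(self.free.items()):
            if framework.on_offer(Offer(agent, cpus, mem_mb)):
                self.free[agent] = (cpus - framework.task_cpus,
                                    mem_mb - framework.task_mem_mb)
                # Step 4: the agent reports the task status back to the master.
                self.status[agent] = "TASK_RUNNING"


master = Master()
master.advertise("agent-1", 4.0, 8192)
master.advertise("agent-2", 1.0, 1024)
flink = Framework("flink", task_cpus=2.0, task_mem_mb=4096)
master.offer_round(flink)
```

In this run the framework accepts the offer from `agent-1` and rejects the one from `agent-2`, which is too small for a task. The point of the design is that the master only decides who gets offered what, while placement decisions stay with each framework.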

11. Powered by Apache Mesos

12. Flink Mesos Integration (old/simplified)
[Diagram: the Apache Flink framework and the Mesos Master. Components: Mesos App Master (with the Flink Mesos ResourceManager and the JobManager) and two Mesos tasks, each running a TaskManager. Labeled interactions: Allocate Resources, Launch Mesos tasks, Register TaskManager, Execute Job]

13. Flink Mesos Integration
(1) Start and monitor dispatcher
(2) HTTP POST JobGraph/Jars
(3) Allocate container for Flink master
(4) Start process (and supervise)
(5) Request slots
(6) Allocate containers for TaskManagers
(7) Register
(8) Deploy tasks
[Diagram components: Marathon, Mesos Master, Flink Client, Mesos Dispatcher (in the Mesos cluster), Flink Master Process (Flink Mesos ResourceManager, JobManager), Mesos tasks running TaskManagers]

15. Tranquility SECOR

16. [Diagram: streaming path and downstream batch path; labels: HDFS/S3, Re-Publish, Master Data/Warehouse, Chronos/Schedule, SECOR]

17. Private Docker hub; job template

18. ➢ Custom scheduler that submits a job once it satisfies the resource criteria
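
The slide does not show the custom scheduler itself, so the following is only a minimal sketch of the idea: poll the cluster's free capacity and submit the job once it fits. All names here (`Resources`, `satisfies`, `submit_when_ready`, and the `poll_available` and `submit` callbacks) are hypothetical, and a real tool would wait between polls and query Mesos or Marathon for capacity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Resources:
    cpus: float
    mem_mb: int


def satisfies(available: Resources, required: Resources) -> bool:
    """True when the free capacity covers every required dimension."""
    return available.cpus >= required.cpus and available.mem_mb >= required.mem_mb


def submit_when_ready(job_name: str,
                      required: Resources,
                      poll_available: Callable[[], Resources],
                      submit: Callable[[str], None],
                      max_polls: int = 10) -> bool:
    """Poll free capacity up to max_polls times; submit once the criteria hold.

    Returns True if the job was submitted, False if capacity never sufficed.
    """
    for _ in range(max_polls):
        if satisfies(poll_available(), required):
            submit(job_name)
            return True
    return False


# Example: capacity grows over three polls until the job fits.
snapshots = iter([Resources(1.0, 2048), Resources(2.0, 4096), Resources(4.0, 8192)])
submitted = []
ok = submit_when_ready("stream-job", Resources(4.0, 8192),
                       poll_available=lambda: next(snapshots),
                       submit=submitted.append)
```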

19.
➢ 50 streaming jobs
➢ Stream throughput: 120k RPS
➢ 10B+ events/day
➢ 2.5 TB/day
➢ 200+ node Mesos cluster
➢ Marathon on Marathon
➢ Auto-scaling with custom tool x-scale & ASG
➢ Custom monitoring platform with Prometheus and ELK

20. Operating Flink on Mesos

21. Deployments
● Versioned app definition/job
● Immutable Docker tags
● Private Docker registry
● CI/CD
● No manual deployments to Prod

22. HA Setup
● Use HDFS for HA setup
● dcos package install hdfs
● dcos hdfs endpoints
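
As a rough illustration of what "use HDFS for HA setup" means on the Flink side, the sketch below renders the high-availability entries of a `flink-conf.yaml`. The three configuration keys are Flink's ZooKeeper HA options; the helper name `render_flink_ha_conf` and both example values (the ZooKeeper address and the HDFS path) are assumptions made for this example and must be replaced with the endpoints of the actual cluster.

```python
def render_flink_ha_conf(zk_quorum: str, storage_dir: str) -> str:
    """Render the HA section of flink-conf.yaml as 'key: value' lines."""
    settings = {
        "high-availability": "zookeeper",
        "high-availability.zookeeper.quorum": zk_quorum,
        # JobManager metadata is persisted here; only a pointer lives in ZooKeeper.
        "high-availability.storageDir": storage_dir,
    }
    return "\n".join(f"{key}: {value}" for key, value in settings.items()) + "\n"


# Both values are placeholders (assumptions), not taken from the slides.
conf = render_flink_ha_conf(
    zk_quorum="master.mesos:2181",
    storage_dir="hdfs://hdfs/flink/ha",
)
print(conf)
```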

23. Containerization
● Which container runtime
● UCR vs Docker
● No need to build Docker images

    {
      "id": "/flink-app",
      "cmd": "$JAVA_HOME/bin/java -jar MyApp.jar",
      "instances": 1,
      "fetch": [
        { "uri": "http://…/MyApp.jar" },
        { "uri": "https://.../jre-8u121-linux-x64.tar.gz" }
      ]
    }
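
The app definition on this slide can also be generated and checked programmatically, which fits the versioned, CI/CD-driven deployments described earlier. The sketch below builds a Marathon app definition that fetches the jar and a JRE archive at launch instead of baking them into an image. The function name `marathon_app`, the `cpus` and `mem` values, and the `example.com` URIs are placeholders introduced here, since the slide elides the real locations.

```python
import json


def marathon_app(app_id: str, jar_uri: str, jre_uri: str,
                 cpus: float = 1.0, mem_mb: int = 1024) -> dict:
    """Build a Marathon app definition that fetches its artifacts at launch."""
    return {
        "id": app_id,
        # The fetcher unpacks the JRE archive into the task sandbox;
        # JAVA_HOME must point at the unpacked directory.
        "cmd": "$JAVA_HOME/bin/java -jar MyApp.jar",
        "instances": 1,
        "cpus": cpus,
        "mem": mem_mb,
        "fetch": [
            {"uri": jar_uri},
            {"uri": jre_uri, "extract": True},
        ],
    }


# Placeholder URIs (assumptions); the slide does not give the real ones.
app = marathon_app(
    "/flink-app",
    jar_uri="https://example.com/artifacts/MyApp.jar",
    jre_uri="https://example.com/artifacts/jre.tar.gz",
)
payload = json.dumps(app, indent=2)  # serializing guarantees well-formed JSON
print(payload)
```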

24. Containerization
● JVM and container
  ○ Not aware of cgroups
  ○ Much better with JDK 9 & 10
  ○ Overwrite JVM default values
https://cloakable.irdeto.com/2017/08/24/java-is-a-first-class-citizen-in-a-docker-ecosystem-now/

25. Resource Allocation
● Depends on your job :)
  ○ Monitor usage/allocation
● Memory
  ○ Consider overhead in addition to heap
● Flexibility thanks to FLIP-6
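
To make "consider overhead in addition to heap" concrete, here is a small sketch that derives a JVM heap size from a container memory limit by reserving a cutoff for off-heap memory, metaspace, and other native overhead. The function name `jvm_heap_mb` is hypothetical. The default ratio and minimum are modeled on Flink 1.x's `containerized.heap-cutoff-ratio` and `containerized.heap-cutoff-min` settings as I understand them, so they should be checked against the Flink version in use.

```python
def jvm_heap_mb(container_mem_mb: int,
                cutoff_ratio: float = 0.25,
                cutoff_min_mb: int = 600) -> int:
    """Return the heap size (MB) left after reserving non-heap overhead.

    The reserved cutoff is the larger of a fixed minimum and a fraction
    of the container memory.
    """
    cutoff = max(cutoff_min_mb, int(container_mem_mb * cutoff_ratio))
    if cutoff >= container_mem_mb:
        raise ValueError("container too small to leave any heap after the cutoff")
    return container_mem_mb - cutoff


large = jvm_heap_mb(4096)  # ratio dominates: 4096 - 1024
small = jvm_heap_mb(1024)  # minimum dominates: 1024 - 600
heap_flag = f"-Xmx{large}m"
```

Setting the heap explicitly like this is one way to override JVM defaults when the JVM is not aware of the container's cgroup limit.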

26. Multi-User: Quota
● Share resources between multiple frameworks/jobs
● Without static partitioning
● One role per job/entity
● Use quota per role
● Min and max resource allocation
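
A per-role quota can be expressed as a request body for the Mesos master's `/quota` endpoint. The sketch below only builds that body and covers the guaranteed (minimum) side; the role name `flink-team-a`, the resource amounts, and the helper name `quota_request` are examples made up for illustration, and the request would still need to be POSTed to the master by an operator with the right permissions.

```python
import json


def quota_request(role: str, cpus: float, mem_mb: float) -> dict:
    """Build the JSON body for a quota guarantee on one Mesos role."""
    def scalar(name: str, value: float) -> dict:
        return {"name": name, "type": "SCALAR", "scalar": {"value": value}}

    return {
        "role": role,
        "guarantee": [scalar("cpus", cpus), scalar("mem", mem_mb)],
    }


# Hypothetical role and sizes; one role per job or team, as the slide suggests.
body = quota_request("flink-team-a", cpus=12, mem_mb=6144)
print(json.dumps(body, indent=2))
```

For the quota to apply, the Flink framework has to register under the same role, which in Flink's Mesos mode is set through the `mesos.resourcemanager.framework.role` option.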

27. Monitoring

28. Configuration Changes and Updates
Currently manual changes and redeploy
● Checkpoints
● Parallel deployments
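
One way to read this slide is: persist the job's state, start the updated job from that state, then retire the old job. The sketch below assembles the corresponding `flink` CLI invocations as argument lists without running them. It assumes the "Checkpoints" bullet refers to resuming from saved state such as a savepoint; the job ID, paths, and jar name are placeholders, and the helper function names are hypothetical.

```python
from typing import List, Optional


def savepoint_cmd(job_id: str, target_dir: str) -> List[str]:
    """Command that triggers a savepoint for a running job."""
    return ["flink", "savepoint", job_id, target_dir]


def run_from_savepoint_cmd(savepoint_path: str, jar: str,
                           parallelism: Optional[int] = None) -> List[str]:
    """Command that starts a job in detached mode from a savepoint."""
    cmd = ["flink", "run", "-d", "-s", savepoint_path]
    if parallelism is not None:
        cmd += ["-p", str(parallelism)]
    cmd.append(jar)
    return cmd


def cancel_cmd(job_id: str) -> List[str]:
    """Command that cancels the old job once the new one is healthy."""
    return ["flink", "cancel", job_id]


# Placeholder values (assumptions) for a parallel deployment of a new version.
plan = [
    savepoint_cmd("<old-job-id>", "hdfs:///flink/savepoints"),
    run_from_savepoint_cmd("<savepoint-path>", "MyApp-v2.jar", parallelism=8),
    cancel_cmd("<old-job-id>"),
]
```

Each entry in `plan` could be passed to `subprocess.run` in turn, with the actual savepoint path taken from the output of the first command.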

29. Demo
1. Financial data created by generator
2. Written to Kafka topics
3. Kafka topics consumed by Flink
4. Results written back into Kafka stream (another topic)
7. Results displayed
[Diagram labels: Generator, Display]