From YARN to Kubernetes - How to Meet the Challenges of Resource Management and Job Scheduling

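Kubernetes has become the standard container management platform and is steadily becoming a cornerstone of the infrastructure layer for next-generation data processing and AI platforms. More and more people are trying to migrate big data applications that used to run on YARN onto Kubernetes. During this transition, challenges in resource management and job scheduling have gradually emerged. In this talk, we introduce how to use Apache YuniKorn (Incubating) to build a Kubernetes-based data processing and AI computing platform, and how to achieve fine-grained resource control and efficient job scheduling.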

1. From YARN to Kubernetes - How to Meet the Challenges of Resource Management and Job Scheduling. 杨巍威 (Weiwei Yang)

4. “Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.”

6. “CDP delivers powerful self-service analytics across hybrid and multi-cloud environments, along with sophisticated and granular security and governance policies that IT and data leaders demand.”
“Open Data Hub is a blueprint for building an AI as a service platform on Red Hat's Kubernetes-based OpenShift® Container Platform and Ceph Object Storage. It inherits from upstream efforts such as Kafka/Strimzi and Kubeflow, and is the foundation for Red Hat's internal data science and AI platform.”
“The launch of Cloud Dataproc on Kubernetes is significant in that it provides customers with a single control plane for deploying and managing Apache Spark jobs on Google Kubernetes Engine in both public cloud and on-premises environments.”

7. Separate Compute and Storage: Resource Management, Job Scheduling

8.Apache YuniKorn (Incubating) is a light-weight, universal resource scheduler for container orchestrator systems. It provides fine-grained resource sharing for various workloads efficiently on a large scale, multi-tenant, and cloud-native environment.

10. DECOUPLED: An abstraction of scheduler-interface that decouples the scheduler-core from the underlying platforms.
SCHEDULING: Built-in advanced scheduling capabilities that support both batch and long-running workloads.
CLOUD-NATIVE: Highly extensible and scalable; works natively on-prem and in the cloud.
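The DECOUPLED point can be illustrated with a minimal sketch. It is not YuniKorn's actual scheduler-interface; the names `PlatformShim`, `SchedulerCore`, `InMemoryShim`, and `AllocationRequest` are hypothetical and exist only to show a scheduling core that talks to the platform through a narrow contract.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class AllocationRequest:
    """A request for CPU (in millicores) on behalf of an application."""

    def __init__(self, app_id: str, cpu_millis: int) -> None:
        self.app_id = app_id
        self.cpu_millis = cpu_millis


class PlatformShim(ABC):
    """Hypothetical contract: the only surface the core sees of the platform."""

    @abstractmethod
    def list_nodes(self) -> Dict[str, int]:
        """Return free CPU (millicores) keyed by node name."""

    @abstractmethod
    def bind(self, request: AllocationRequest, node: str) -> None:
        """Place the request on the chosen node."""


class SchedulerCore:
    """Platform-agnostic core: it never imports Kubernetes or YARN code."""

    def __init__(self, shim: PlatformShim) -> None:
        self.shim = shim

    def schedule(self, request: AllocationRequest) -> Optional[str]:
        # Keep only nodes with enough free CPU, then pick the emptiest one.
        fitting = [(free, name) for name, free in self.shim.list_nodes().items()
                   if free >= request.cpu_millis]
        if not fitting:
            return None
        _, node = max(fitting)
        self.shim.bind(request, node)
        return node


class InMemoryShim(PlatformShim):
    """Stand-in platform for illustration; a real shim would call a cluster API."""

    def __init__(self, free: Dict[str, int]) -> None:
        self.free = dict(free)
        self.bound: List[Tuple[str, str]] = []

    def list_nodes(self) -> Dict[str, int]:
        return dict(self.free)

    def bind(self, request: AllocationRequest, node: str) -> None:
        self.free[node] -= request.cpu_millis
        self.bound.append((request.app_id, node))
```

Swapping `InMemoryShim` for another `PlatformShim` implementation would leave `SchedulerCore` unchanged, which is the property the slide describes.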

11. Feature comparison (Default Scheduler vs. YUNIKORN), with notes:
Scheduling at app dimension: App is the first-class citizen in YuniKorn. YuniKorn schedules apps with respect to, e.g., their submission order, priority, and resource usage.
Job ordering: YuniKorn supports FIFO/FAIR/Priority (WIP) job ordering policies.
Fine-grained resource capacity management: Manage cluster resources with hierarchical queues; a queue provides the guaranteed resources (min) and the resource quota (max).
Resource fairness: Inter-queue resource fairness.
Natively support Big Data workloads: The default scheduler mainly targets long-running services. YuniKorn is designed for Big Data app workloads and natively supports Spark, Flink, TensorFlow, etc.
Scale & Performance: YuniKorn is optimized for performance and is suitable for high-throughput, large-scale environments.
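The hierarchical queues with guaranteed resources (min) and a resource quota (max) can be sketched as follows. This is a simplified illustration under stated assumptions, not YuniKorn's implementation: the `Queue` class, the single integer resource (think vcores), and the used/guaranteed fairness ratio are all hypothetical choices made for the example.

```python
from typing import List, Optional


class Queue:
    """A queue in a hierarchy with a guaranteed share (min) and a quota (max)."""

    def __init__(self, name: str, guaranteed: int, max_quota: int,
                 parent: Optional["Queue"] = None) -> None:
        self.name = name
        self.guaranteed = guaranteed  # min: assumed to be greater than zero
        self.max_quota = max_quota    # max: hard cap for this queue's subtree
        self.parent = parent
        self.used = 0

    def can_allocate(self, amount: int) -> bool:
        # The request must fit under the quota of this queue and every ancestor.
        queue: Optional[Queue] = self
        while queue is not None:
            if queue.used + amount > queue.max_quota:
                return False
            queue = queue.parent
        return True

    def allocate(self, amount: int) -> bool:
        if not self.can_allocate(amount):
            return False
        # Charge the usage to this queue and to every ancestor.
        queue: Optional[Queue] = self
        while queue is not None:
            queue.used += amount
            queue = queue.parent
        return True


def most_underserved(queues: List[Queue]) -> Queue:
    """Assumed fairness measure: lowest used/guaranteed ratio is served first."""
    return min(queues, key=lambda q: q.used / q.guaranteed)
```

For example, with a root queue capped at 100 and two children, a child can be refused either by its own quota or by the root's, and `most_underserved` picks the child furthest below its guaranteed share.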

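12. [Diagram: scheduler workflow on the Kubernetes master, showing the API Server, ETCD, Kubelet, and the Resource Scheduler, with numbered steps 1 to 3. Default Scheduler stages: Sort, Filter, Score, Extensions. YUNIKORN: Apps, Queues, Nodes, and Request, with Queue Sort, App Sort, and Node Sort as Pluggable Policies.]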
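The App Sort stage and the FIFO/FAIR/Priority orderings named on the slides can be sketched as interchangeable functions. This is a minimal illustration, not YuniKorn code: the `App` fields, the policy names in `APP_SORT_POLICIES`, and the tie-breaking by submission time are assumptions made for the example.

```python
from typing import Callable, Dict, List


class App:
    """An application waiting to be scheduled."""

    def __init__(self, app_id: str, submit_time: int, priority: int, used: int) -> None:
        self.app_id = app_id
        self.submit_time = submit_time  # lower means submitted earlier
        self.priority = priority        # higher means more important
        self.used = used                # resources the app already holds


SortPolicy = Callable[[List[App]], List[App]]


def fifo(apps: List[App]) -> List[App]:
    # Earliest submission first.
    return sorted(apps, key=lambda a: a.submit_time)


def fair(apps: List[App]) -> List[App]:
    # Apps holding the fewest resources first; submission time breaks ties.
    return sorted(apps, key=lambda a: (a.used, a.submit_time))


def by_priority(apps: List[App]) -> List[App]:
    # Highest priority first; submission time breaks ties.
    return sorted(apps, key=lambda a: (-a.priority, a.submit_time))


# Policies are looked up by name, so a new ordering can be plugged in
# without touching the code that consumes the sorted list.
APP_SORT_POLICIES: Dict[str, SortPolicy] = {
    "fifo": fifo,
    "fair": fair,
    "priority": by_priority,
}


def next_app(apps: List[App], policy: str = "fifo") -> App:
    """Return the app the chosen policy would schedule next."""
    return APP_SORT_POLICIES[policy](apps)[0]
```

The same pattern would apply to queue and node sorting: each stage takes a list and returns it reordered, so the policy is a configuration choice and not a code change.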

14. [Diagram: a Central Resource and Job scheduler (YUNIKORN, behind the API Server) serving multiple Compute Pool Managers and Storage, alongside a Control Plane providing Metadata, Security, and Governance.]

15. Schedule 50,000 pods on 2,000/4,000 nodes. Compare scheduling throughput (pods per second allocated by the scheduler).
[Charts: 50k pods on 2k nodes; 50k pods on 4k nodes. Red line: YuniKorn. Green line: Default Scheduler.]
Detail report: https://github.com/apache/incubator-yunikorn-core/blob/master/docs/evaluate-perf-function-with-Kubemark.md
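The throughput metric on this slide (pods per second allocated by the scheduler) can be computed from allocation timestamps. The sketch below assumes such timestamps are available as seconds; the function names are hypothetical and the code does not reproduce the benchmark's own tooling or results.

```python
from collections import Counter
from typing import Dict, List


def throughput_per_second(bind_times: List[float]) -> Dict[int, int]:
    """Count allocations in each one-second bucket since the first allocation."""
    if not bind_times:
        return {}
    start = min(bind_times)
    buckets = Counter(int(t - start) for t in bind_times)
    return dict(sorted(buckets.items()))


def average_throughput(bind_times: List[float]) -> float:
    """Pods allocated divided by the time between first and last allocation."""
    if not bind_times:
        return 0.0
    elapsed = max(bind_times) - min(bind_times)
    # If every pod was allocated at the same instant, report the raw count.
    return len(bind_times) / elapsed if elapsed > 0 else float(len(bind_times))
```

Plotting the per-second buckets over time gives a curve of the kind the slide compares, while the average gives a single figure per run.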

17. THANKS! If you are interested in YuniKorn, scan the QR code to join the group, or send an email to dev@yunikorn.apache.org