The Path to GPU as a Service
1. The Path to GPU as a Service in Kubernetes
March 27, 2018
Viraj Chavan, Renaud Gaubert <@RenaudWasTaken>
2. Kubernetes: The State of GPUs
3. The State of GPUs in Kubernetes: In 1.7 and Before, --accelerators
● Completely experimental support (1.6 supports one GPU per node)
● Manually mount the volumes in your pod spec (see the sketch after this slide)
● No GPU monitoring or health checks
  ○ Black hole effect
● Not supported by NVIDIA
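For context, the pre-1.8 workflow this slide describes looked roughly like the sketch below: the experimental `alpha.kubernetes.io/nvidia-gpu` resource (behind the kubelet's `--feature-gates=Accelerators=true` flag) plus hostPath volumes for the driver libraries mounted by hand. The image, mount paths, and driver version are illustrative assumptions, not taken from the slides.

```yaml
# Sketch of the 1.6/1.7-era setup: the alpha resource name plus manual
# hostPath mounts for the NVIDIA driver libraries living on the node.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-legacy
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:8.0-runtime          # assumed image
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1   # experimental resource, later removed
    volumeMounts:
    - name: nvidia-libs
      mountPath: /usr/local/nvidia/lib64
  volumes:
  - name: nvidia-libs
    hostPath:
      path: /usr/lib/nvidia-384             # depends on the driver installed on the node
```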
4. The State of GPUs in Kubernetes: In 1.8 and 1.9, the Device Plugin
● Pluggable system in alpha state (see the pod spec sketch after this slide)
● GPU health checks
● Official NVIDIA support
  ○ Through the use of the new NVIDIA container runtime
Limits:
● It's an alpha feature
  ○ 1.8 plugins are not compatible with 1.9
  ○ 1.9 plugins are not compatible with 1.10
● You might get some races when the Kubelet restarts
● Init containers are counted as regular containers in 1.8
● Homogeneous nodes only (e.g. you can't have a 1070 and a 1080 on the same node)
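With the device plugin, a pod requests GPUs through the `nvidia.com/gpu` extended resource and no longer mounts driver volumes itself. A minimal sketch (the image tag is an assumption):

```yaml
# Minimal GPU request via the NVIDIA device plugin (1.8+): the plugin and the
# NVIDIA container runtime take care of exposing devices and driver libraries.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-device-plugin
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:9.0-base     # assumed image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1           # whole GPUs only, no fractional requests
```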
5. The State of GPUs in Kubernetes: In 1.10
● Graduated to a beta system
● GPU metrics are now advertised by cAdvisor
● Complete CRI-O support
6. The State of GPUs in Kubernetes: Going Forward
● Kubernetes is still missing a number of important features for GPUs:
  ○ NUMA
  ○ GPU topology
  ○ Multi-node
  ○ GPU sharing
  ○ GPU attributes
  ○ More GPU metrics
  ○ GPU soft quotas?
7. Kubernetes Optimized for NVIDIA GPUs
8. Kubernetes Optimized for NVIDIA GPUs: Mission
● A specialized Kubernetes for specialized computing
● Maximize individual GPU utilization and cluster-level GPU occupancy
● Provide early access to complex GPU features
● Provide frictionless adoption of Kubernetes for NVIDIA GPUs
9. Kubernetes Optimized for NVIDIA GPUs: Why
● Similar to TensorFlow, we will upstream features as fast as possible
● We want to provide these features today, not a year from now
● Some features are specific to GPUs and don't need to be in core Kubernetes
● A single product offering rather than 10 plugins
● Support for upstream changes
10. FEATURE OVERVIEW
11. Full Docker Runtime Support (Cluster Admin Facing)
▶ Use case
  ▶ I want the minimum amount of setup when provisioning a node
  ▶ I want the NVIDIA runtime to be run only for NVIDIA images
▶ Before
  ▶ The NVIDIA runtime was run for all images (default runtime)
  ▶ Images that did not request GPUs might have had all GPUs exposed
▶ After: the NVIDIA runtime is selected only for NVIDIA images (see the configuration sketch after this slide)
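For reference, the "before" state usually came from registering the NVIDIA runtime in Docker's daemon configuration (typically /etc/docker/daemon.json) and making it the default, so every container went through it whether or not it asked for GPUs. A sketch of that configuration, with the binary path shown as the common default rather than something taken from the slides:

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

Dropping `"default-runtime": "nvidia"` and selecting the runtime only for NVIDIA images is the behavior the "after" state describes.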
12. Full CRI-O Runtime Support (Cluster Admin Facing)
▶ Use case
  ▶ For enterprise customers running RHEL, CRI-O is becoming the default runtime
▶ Before
  ▶ Same issues as with Docker (default runtime, …)
▶ After: the NVIDIA runtime is selected only for NVIDIA images
  ▶ Additionally, this will be in upstream 1.10
13. GPU Attributes (Cluster Admin and User Facing)
▶ Use case
  ▶ I want to request 2 different GPUs
  ▶ I want to request N GPUs with a minimum of 16 GB of memory
▶ Before: only homogeneous nodes, plus manually labeling nodes with GPU attributes (see the node-label sketch after this slide)
  ▶ Attributes needed to be exposed automatically
  ▶ Attributes needed an explicit API
▶ After: GPU selection can be done on memory, compute capability, and ECC
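The "before" workaround mentioned above amounted to hand-maintained node labels plus a nodeSelector; a sketch assuming a label named `accelerator` applied by the admin (e.g. `kubectl label nodes gpu-node-1 accelerator=nvidia-tesla-v100`):

```yaml
# Manual attribute selection before attribute-aware scheduling: the admin labels
# nodes by GPU model, and pods pin themselves to matching nodes.
apiVersion: v1
kind: Pod
metadata:
  name: needs-v100
spec:
  nodeSelector:
    accelerator: nvidia-tesla-v100   # hand-applied node label, one model per node
  containers:
  - name: cuda
    image: nvidia/cuda:9.0-base      # assumed image
    resources:
      limits:
        nvidia.com/gpu: 2            # still cannot mix GPU models on one node
```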
14. GPU Sharing (User Facing)
▶ Use case
  ▶ Sharing the same GPU between multiple containers
  ▶ Requesting "shares" of any GPU for a container
▶ Before: no sharing
15. GPU Monitoring (Cluster Admin and User Facing)
▶ Use case
  ▶ Monitor GPU usage and health
  ▶ Prometheus and cAdvisor
  ▶ Per-process/container monitoring
▶ Before: little or no GPU monitoring (prior to the cAdvisor metrics in 1.10; see the query sketch after this slide)
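As a rough illustration of what the 1.10 cAdvisor metrics enable, the sketch below is a Prometheus rules file built on the per-container accelerator metrics cAdvisor exposes (`container_accelerator_duty_cycle`, `container_accelerator_memory_used_bytes`, `container_accelerator_memory_total_bytes`); treat the exact metric names and thresholds as assumptions to verify against your kubelet/cAdvisor version.

```yaml
# Prometheus rules over the per-container accelerator metrics scraped from cAdvisor.
groups:
- name: gpu.rules
  rules:
  - record: namespace:gpu_duty_cycle:avg
    expr: avg by (namespace) (container_accelerator_duty_cycle)
  - alert: GPUMemoryAlmostFull
    expr: |
      container_accelerator_memory_used_bytes
        / container_accelerator_memory_total_bytes > 0.95
    for: 10m
    labels:
      severity: warning
```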
16. Hard Quotas
▶ Use case
  ▶ Limit the number of GPUs per namespace
▶ Before: no quotas
▶ Will be upstreamed in 1.10 (see the ResourceQuota sketch after this slide)
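The upstream 1.10 mechanism referenced here is ResourceQuota support for extended resources; a minimal sketch (the namespace name and the limit are assumptions):

```yaml
# Cap the total number of GPUs that pods in one namespace may request.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a                # assumed namespace
spec:
  hard:
    requests.nvidia.com/gpu: "4"   # at most 4 GPUs requested across the namespace
```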
17. FEATURES IN THE RACKS
18. NUMA and Topology (User Facing)
▶ Use case
  ▶ As a GPU software engineer, I want my application to run as fast as possible
    ▶ I want my container to be pinned to the CPU(s) that match my GPU(s)
    ▶ I want a NIC on the same NUMA node as my GPU(s)
    ▶ I might want to select the minimum interconnect between my GPUs (QPI, bridge, switch, NVLINK, 2x NVLINK)
  ▶ As a cluster admin, I want to maximize GPU occupancy
    ▶ A common workaround for the NUMA issue is to request all the GPUs on a node
    ▶ Even though you might only need 2/3/5/6/…
▶ Today: NUMA and topology are not handled
19. Batch Scheduling
▶ Use case
  ▶ Run MPI jobs on a Kubernetes cluster
▶ Before: no support for batch scheduling
▶ We need to sync efforts with the NGC team
20. Dive into the Architecture
21. Conclusion
22. Thank You!
Viraj Chavan, Renaud Gaubert