Containerd ShimV2 + KataContainers as Kubernetes Runtime



1. Containerd ShimV2 + KataContainers as Kubernetes Runtime
Lei Zhang, Kubernetes Community



4. Kubernetes Control Plane
[Diagram: the control plane (api-server backed by etcd, handling workloads, scheduling and orchestration, with "pod, node list" and "bind" flows) above three nodes, each running a kubelet that manages several containers (C).]

5. Kubernetes + containerd
[Diagram: on each node, kubelet → containerd → runC, which creates containers (C) directly on the host Linux kernel via clone(), setns() and pivot_root().]

6. Linux Container
• Container Runtime
  • The dynamic view and boundary of your running process
  • Namespace + Cgroups
• Container Image
  • The static view of your program, data, dependencies, files and directories
  • rootfs
[Diagram: the busybox rootfs (/bin, /dev, /etc, /home, /lib, /lib64, /media, /mnt, /opt, /proc, /root, /run, /sbin, /sys, /tmp, /usr, /var) and the layer adding /temp.txt form the read-only image layers, with the image config (JSON) recording VOLUME /data and the CMD; above them, the init layer holds /etc/hosts, /etc/hostname and /etc/resolv.conf, and the top read-write layer, together with the /data volume, receives everything the running process writes.]
Dockerfile:
    FROM busybox
    ADD temp.txt /
    VOLUME /data
    CMD ["echo", "hello"]
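The split between read-only image layers and a per-container read-write layer can be sketched as a union of layers. The following is a minimal Python model, loosely based on overlay-style union mounts; the class name `LayeredFS`, the dict-per-layer representation and the `WHITEOUT` marker are hypothetical simplifications, not containerd's snapshotter API.

```python
from typing import Dict, List, Optional

# Hypothetical marker for "deleted in an upper layer" (overlayfs uses whiteout files).
WHITEOUT = object()


class LayeredFS:
    """Toy model of a container rootfs: read-only layers plus one writable layer."""

    def __init__(self, image_layers: List[Dict[str, object]],
                 init_layer: Dict[str, object]) -> None:
        # Read-only layers, bottom to top: image layers, then the init layer
        # (which holds /etc/hosts, /etc/hostname, /etc/resolv.conf).
        self.ro_layers = list(image_layers) + [init_layer]
        # The read-write layer is private to this container.
        self.rw_layer: Dict[str, object] = {}

    def read(self, path: str) -> Optional[object]:
        # Search from the top-most layer down; the first hit wins.
        for layer in [self.rw_layer] + self.ro_layers[::-1]:
            if path in layer:
                value = layer[path]
                return None if value is WHITEOUT else value
        return None

    def write(self, path: str, data: object) -> None:
        # Copy-on-write: writes land in the read-write layer only.
        self.rw_layer[path] = data

    def delete(self, path: str) -> None:
        # Deleting hides the file instead of modifying a read-only layer.
        self.rw_layer[path] = WHITEOUT


# Layers for the Dockerfile on this slide (file contents are made up).
busybox = {"/bin/sh": "busybox sh"}
add_temp = {"/temp.txt": "original"}
init = {"/etc/hostname": "foo"}

fs = LayeredFS([busybox, add_temp], init)
fs.write("/temp.txt", "changed")
print(fs.read("/temp.txt"))   # changed
print(add_temp["/temp.txt"])  # original
```

The image layers are shared and never modified, which is why many containers can start from one image; only the small read-write layer is per container.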

7. KataContainers
• Container Runtime
  • Each Pod is hypervisor isolated
  • Independent guest kernel
  • Secure as VM
  • Fast as container
• Container Image
  • Same as Linux container

8. Container Security
• Linux container
  • Dropping Linux capabilities
  • Read-only mount points
  • Mandatory access controls (MAC): SELinux & AppArmor
  • Dropping syscalls: SECCOMP
• KataContainers
  • Hardware virtualization
  • Independent Linux instance per Pod
    • e.g. run a Linux 3.16 container on a Linux 4.0 host
• In 99.99% of cases: wrap containers in VMs

9. Kubernetes + KataContainers
[Diagram: kubelet → ??? → KataContainers; each pod runs in its own VM through hardware virtualization on the node's Linux kernel.]

10. Container Runtime Interface (CRI)
• Describes what kubelet expects from container runtimes
• Imperative, container-centric interface
  • Why not pod-centric?
    • Every container runtime implementation would need to understand the concept of a pod.
    • The interface would have to change whenever a new pod-level feature is proposed.
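To illustrate the container-centric design, here is a hypothetical Python sketch. The method names mirror CRI RuntimeService RPCs (RunPodSandbox, CreateContainer, StartContainer), but the real interface is a gRPC service and these simplified signatures, plus the helper `run_pod` and the fake `RecordingRuntime`, are invented for illustration. The point is that grouping calls into a pod happens on the kubelet side, so a new pod-level feature changes `run_pod`, not every runtime.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class RuntimeService(ABC):
    """Hypothetical Python rendering of part of the CRI RuntimeService."""

    @abstractmethod
    def run_pod_sandbox(self, pod_name: str) -> str: ...

    @abstractmethod
    def create_container(self, sandbox_id: str, name: str) -> str: ...

    @abstractmethod
    def start_container(self, container_id: str) -> None: ...


def run_pod(rt: RuntimeService, pod_name: str,
            containers: List[str]) -> Tuple[str, List[str]]:
    """Kubelet-side logic: a pod is assembled from sandbox and container calls."""
    sandbox_id = rt.run_pod_sandbox(pod_name)
    container_ids = []
    for name in containers:
        cid = rt.create_container(sandbox_id, name)
        rt.start_container(cid)
        container_ids.append(cid)
    return sandbox_id, container_ids


class RecordingRuntime(RuntimeService):
    """Fake runtime that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def run_pod_sandbox(self, pod_name: str) -> str:
        self.calls.append(f"RunPodSandbox({pod_name})")
        return f"sandbox-{pod_name}"

    def create_container(self, sandbox_id: str, name: str) -> str:
        self.calls.append(f"CreateContainer({name})")
        return f"{sandbox_id}/{name}"

    def start_container(self, container_id: str) -> None:
        self.calls.append(f"StartContainer({container_id.split('/')[-1]})")


rt = RecordingRuntime()
print(run_pod(rt, "foo", ["A", "B"]))
print(rt.calls)
```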

11. CRI Spec
• Sandbox
  • How to isolate the Pod environment?
  • Linux container: infra container + pod-level cgroups
  • Kata: lightweight VM
• Container
  • Linux container: namespace + cgroups
  • Kata: namespace containers controlled by hyperstart
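The same Sandbox/Container split can be backed by either isolation mechanism. The sketch below is a hypothetical model (the `Sandbox` dataclass and the `linux_sandbox`/`kata_sandbox` constructors are invented names): both variants build containers from namespaces and cgroups, and what changes is the kernel those namespaces live in.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Sandbox:
    """Hypothetical model of a CRI pod sandbox."""
    pod: str
    isolation: str   # how the pod environment is isolated
    kernel: str      # which kernel the pod's containers run on
    containers: Dict[str, str] = field(default_factory=dict)

    def add_container(self, name: str) -> None:
        # Both runtimes build each container from namespaces + cgroups;
        # the difference is which kernel provides them.
        self.containers[name] = f"namespaces+cgroups on {self.kernel}"


HOST_KERNEL = "host kernel"


def linux_sandbox(pod: str) -> Sandbox:
    # Linux container: an infra container plus pod-level cgroups on the host.
    return Sandbox(pod, "infra container + pod-level cgroups", HOST_KERNEL)


def kata_sandbox(pod: str) -> Sandbox:
    # Kata: a lightweight VM with its own guest kernel; the containers inside
    # it are managed by an in-guest agent (hyperstart on the slide).
    return Sandbox(pod, "lightweight VM", f"guest kernel of VM {pod}")


runc_pod = linux_sandbox("foo")
runc_pod.add_container("A")
kata_pod = kata_sandbox("foo")
kata_pod.add_container("A")
print(runc_pod.containers["A"])
print(kata_pod.containers["A"])
```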

12. How CRI Works
[Diagram: the management layer (api-server backed by etcd; workloads, scheduling, orchestration) binds a pod to a node. On the node, kubelet's SyncLoop calls SyncPod, and the GenericRuntime issues CRI gRPC calls either to the built-in dockershim, which drives docker through its client API, or to a remote CRI shim.]
CRI Spec:
• Sandbox: Create, Delete, List
• Container: Create, Start, Exec, List
• Image: Pull, List
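The SyncPod step can be thought of as a reconciliation: compare the desired pod spec with what the runtime reports, then emit CRI calls to close the gap. This is a heavily simplified, hypothetical sketch (the function `sync_pod` and its arguments are invented; it assumes an always-restart policy and ignores image pulls, networking and garbage collection), not kubelet's actual code.

```python
from typing import Dict, List, Optional


def sync_pod(desired: List[str], sandbox_id: Optional[str],
             running: Dict[str, str]) -> List[str]:
    """Return the CRI calls needed to reach the desired pod spec.

    desired:    container names in the pod spec
    sandbox_id: existing sandbox, or None if there is none yet
    running:    container name -> state ("Running" or "Exited")
    """
    calls: List[str] = []
    if sandbox_id is None:
        # Without a sandbox there are no containers to reuse.
        calls.append("RunPodSandbox")
        running = {}
    for name in desired:
        if running.get(name) == "Running":
            continue  # already in the desired state
        # Missing or exited: (re)create and start it.
        calls.append(f"CreateContainer({name})")
        calls.append(f"StartContainer({name})")
    for name, state in running.items():
        if name not in desired and state == "Running":
            # No longer in the spec: stop it.
            calls.append(f"StopContainer({name})")
    return calls


print(sync_pod(["A", "B"], None, {}))
print(sync_pod(["A", "B"], "sb1", {"A": "Running", "B": "Exited"}))
```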

13. How CRI shim works
$ kubectl run foo …
1. RunPodSandbox(foo), which also sets up networking via CNI add()
2. CreateContainer(A)
3. StartContainer(A)
4. CreateContainer(B)
5. StartContainer(B)
[Diagram: on the node, a docker runtime places containers A and B of Pod foo directly on the host, while a VM runtime places A and B inside a VM for foo.]
Container lifecycle: null → Created (CreateContainer()) → Running (StartContainer()) → Exited (StopContainer()) → null (RemoveContainer())
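The container lifecycle on this slide is a small state machine. A minimal Python sketch, encoding only the four transitions shown (real runtimes also allow other paths, e.g. force-removing a running container); the `Container` class and `TRANSITIONS` table are hypothetical names, with `None` standing for "null".

```python
from typing import Dict, Optional, Tuple

# Lifecycle from the slide: null -> Created -> Running -> Exited -> null.
# None stands for "null" (the container does not exist).
TRANSITIONS: Dict[Tuple[Optional[str], str], Optional[str]] = {
    (None, "CreateContainer"): "Created",
    ("Created", "StartContainer"): "Running",
    ("Running", "StopContainer"): "Exited",
    ("Exited", "RemoveContainer"): None,
}


class Container:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Optional[str] = None  # not created yet

    def apply(self, call: str) -> Optional[str]:
        """Apply a CRI call and return the new state; reject invalid calls."""
        key = (self.state, call)
        if key not in TRANSITIONS:
            raise ValueError(f"{call} not allowed in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state


a = Container("A")
for call in ["CreateContainer", "StartContainer", "StopContainer", "RemoveContainer"]:
    print(call, "->", a.apply(call))
```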

14. Wrap Up
[Diagram: kubelet speaks CRI to a CRI shim ("Do your work here!"), which drives either KataContainers or plain containers (C) that use syscalls directly on the node's Linux kernel.]

15. Use containerd/cri as CRI shim

16. But …
• Too many containerd-shim processes, large resource footprint
• CRI is a well-defined interface for Kubernetes to consume, not for runtimes
  • gVisor/KataContainers/VM-based runtimes do not match existing CRI shims
• Maintenance “nightmare”
  • e.g. cri-o vs. cri-containerd, each combined with gVisor/KataContainers/VM-based runtimes, oh my …

17. Containerd ShimV2
• A “standard interface” between CRI shim and container runtime!
• CRI -> containerd -> OCI runtime
• CRI -> containerd -> shimV2 -> OCI runtime
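With shim v2, containerd picks the shim by runtime name: as far as we understand containerd's convention, a runtime type such as `io.containerd.kata.v2` resolves to a binary named `containerd-shim-kata-v2`. The Python function below (`shim_binary_name` is a hypothetical name) reproduces that naming rule as a sketch; it is not containerd's code.

```python
def shim_binary_name(runtime: str) -> str:
    """Map a runtime name such as "io.containerd.kata.v2" to a shim binary
    name following the "containerd-shim-<name>-<version>" convention.
    Returns "" for names that do not fit the pattern."""
    parts = runtime.split(".")
    if len(parts) < 2 or parts[0] == "":
        return ""
    # The last two dot-separated parts are the runtime name and version.
    name, version = parts[-2], parts[-1]
    return f"containerd-shim-{name}-{version}"


for runtime in ["io.containerd.runc.v2", "io.containerd.kata.v2"]:
    print(runtime, "->", shim_binary_name(runtime))
```

This naming rule is what lets any runtime plug into containerd by shipping one binary, instead of needing its own CRI shim.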

18. What’s the difference?
• Previous:
  • Call `containerd-shim`
  • This starts a shim process per container
• Now:
  • Call `containerd-shim start`
  • Implement the “start” operation as you wish:
    • Start containerd-shim when creating a sandbox
    • Reuse the existing containerd-shim when creating a container
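The “start” choice above can be modelled as a lookup keyed either by container (previous model) or by sandbox (shim v2 as used by Kata). The `ShimManager` class, `run_pods` helper and socket paths below are hypothetical; the sketch only counts shim processes to show why reuse shrinks the footprint. The per-container count includes the sandbox itself, which is an assumption of this model.

```python
import itertools
from typing import Dict, Optional


class ShimManager:
    """Hypothetical model of the shim "start" operation."""

    def __init__(self, shim_per_container: bool = False) -> None:
        self.shim_per_container = shim_per_container
        self.shims: Dict[str, str] = {}   # key -> shim address
        self._ids = itertools.count(1)

    def start(self, container_id: str, sandbox_id: Optional[str] = None) -> str:
        """Return the shim address for container_id.

        sandbox_id is None when the container is the sandbox itself.
        """
        if self.shim_per_container:
            key = container_id   # previous model: one shim per container
        else:
            # Now: containers in a pod reuse the sandbox's shim.
            key = sandbox_id if sandbox_id is not None else container_id
        if key not in self.shims:
            self.shims[key] = f"unix:///run/shim-{next(self._ids)}.sock"
        return self.shims[key]


def run_pods(manager: ShimManager, pods: int, containers_per_pod: int) -> int:
    """Start every sandbox and container, then return the number of shims."""
    for p in range(pods):
        sandbox = f"pod{p}"
        manager.start(sandbox)
        for c in range(containers_per_pod):
            manager.start(f"{sandbox}-c{c}", sandbox)
    return len(manager.shims)


print(run_pods(ShimManager(shim_per_container=True), 10, 2))  # 30
print(run_pods(ShimManager(), 10, 2))                         # 10
```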

19. Containerd + ShimV2 + KataContainers

20. Containerd ShimV2
[Diagram: kubelet → cri-containerd → kata-containerd-shimv2 ("Do your work here!") → KataContainers; each pod runs in its own VM through hardware virtualization on the node's Linux kernel.]


22. Live Demo
• Kubernetes + containerd + shimV2 + KataContainers
1. kubeadm-installed 3-node cluster on GCE, with nested virtualization
2. Pod lifecycle
3. Independent kernel
   • No kernel sharing with host
4. Strong isolation
   • e.g. fork bomb
5. High density, small footprint
   • 100 KataContainers in one GCE node in 2 minutes

23. Real Case
• 1.5 engineers + 1 GSoC student
• Pull Request
• Expected to be merged in the next 2 weeks

24. Read Our Story
GSoC 18: Kata Containers support for containerd

25. Thank You!
Lei Zhang