BoF: Not One Size Fits All, How to Size Kubernetes Clusters

Sizing Kubernetes clusters, at best, can be compared to throwing darts at a dartboard, in the dark. However, our enterprise-tested rules and tips can shine a little light on the dartboard and help you have enough capacity for your apps. In this lightning talk, we will go over some tips to help you throw a bullseye for sizing your clusters. A unique demo will accompany this talk. Don't throw darts in the dark -- Kube at scale is possible.

1.Not one size fits all, how to size Kubernetes clusters Sahdev Zala / Guang Ya Liu @sp_zala / @gyliu513 spzala@us.ibm.com / liugya@cn.ibm.com

2.Outline • Some basics – Containers, Kubernetes • Kubernetes cluster – What is it? – Design and sizing consideration – Optimization techniques • Large scale enterprise cluster – How we created a 1000 node cluster – Lessons learned

3.Overview of Containers • Abstraction at the app layer that packages code and dependencies together • Multiple containers can run on the same machine and share the OS kernel with other containers, each running as isolated processes in user space

4.What is Kubernetes? • Enterprise level container orchestration • Provision, manage, scale applications (containers) across a cluster • Manage infrastructure resources needed by applications • Compute • Volumes • Networks • And many many many more... • Declarative model • Provide the "desired state" and Kubernetes will make it happen • What's in a name? • Kubernetes (K8s/Kube): "Helmsman" in ancient Greek

5.Kubernetes Community Overview • Cloud Native Computing Foundation project • GitHub repositories • github.com/kubernetes/kubernetes • github.com/kubernetes/kubernetes/issues • github.com/kubernetes/kubernetes/pulls • github.com/kubernetes/website • github.com/kubernetes/community • Special Interest Groups (SIGs) • Slack channels – https://kubernetes.slack.com • Mailing lists

6.K8s – API vs Compute Resources • Pod • ReplicaSet • Deployment • Service • ConfigMap • Secrets • Jobs • …But how about your cluster and compute resources? – Node, CPU, Memory

7.Kubernetes Cluster • A running Kubernetes cluster contains a cluster control plane (AKA master) and worker node(s), with cluster state backed by a distributed storage system (etcd). A cluster can range from a single node to several nodes • Kubernetes can run on various platforms – laptops, VMs, racks of bare metal servers. The effort required to set up a cluster varies from running a single command to crafting your own customized cluster

8.Kubernetes Cluster Choices • Local-machine Solutions – Minikube, DIND, Ubuntu on LXD – IBM Cloud Private Community Edition (CE) - https://hub.docker.com/r/ibmcom/cfc-installer/ – Running Kubernetes locally has obvious development advantages, such as lower cost and faster iteration than constantly deploying and tearing down clusters on a public cloud. • Cloud Provider Solutions – AKS, EKS, GKE, IKS, etc. • Hosted/Managed cluster • On-Premises Solutions – Allow you to create Kubernetes clusters on your internal, secure cloud network with only a few commands • IBM Cloud Private, Kubermatics, etc. • Turnkey solutions, custom clusters, etc.

9.Factors Impacting Cluster Size • Single node to several nodes - what’s the right size for me? • What do you want to do with it? – Just kick the tires – i.e. learn/play/development – Production level cluster – Managed by a cloud provider, or do you want to manage it yourself? • What do you want to run on it? – One or few or many applications – Kind of applications • Big Data/Artificial Intelligence (AI) application • CPU vs Memory intensive • Stateless or Stateful – Scale vs Performance – Networking need – Monitoring, logging need • What kind of traffic do you expect? – Steady heavy traffic – Burst traffic • What is your budget? – Hardware/Virtualization infrastructure set up • Etc.
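
As a rough illustration of how the factors above turn into a node count, the sketch below sums per-application CPU and memory demand and divides by per-node capacity, with extra headroom for burst traffic. The function name, the workload numbers, the node size, and the 30% headroom are all hypothetical values picked for illustration; they are not recommendations from the talk.

```python
import math

def estimate_worker_nodes(apps, node_cpu_millicores, node_memory_mib, headroom=0.3):
    """Return a rough worker-node count for the given workloads.

    apps: list of (replicas, cpu_millicores_per_replica, memory_mib_per_replica).
    headroom: fraction of extra capacity kept free for bursts (assumed value).
    """
    total_cpu = sum(replicas * cpu for replicas, cpu, _ in apps)
    total_mem = sum(replicas * mem for replicas, _, mem in apps)
    # Size for whichever resource runs out first (CPU- vs memory-intensive apps).
    nodes_for_cpu = total_cpu * (1 + headroom) / node_cpu_millicores
    nodes_for_mem = total_mem * (1 + headroom) / node_memory_mib
    return max(1, math.ceil(max(nodes_for_cpu, nodes_for_mem)))

# Hypothetical workloads: (replicas, CPU millicores, memory MiB) per replica.
apps = [(20, 250, 64), (5, 1000, 4096)]
print(estimate_worker_nodes(apps, node_cpu_millicores=4000, node_memory_mib=16384))
```

A real estimate would also subtract capacity reserved for the operating system and Kubernetes components on each node, and would add master, management, and proxy nodes separately.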

10.Scale vs Performance • Beyond a recommended threshold, scale and performance are inversely related – Kubernetes has defined two service level objectives • Return 99% of all API calls in < 1 sec • Start 99% of pods in < 5 sec – According to one study, clusters with more than 5,000 nodes may not be able to achieve these service level objectives – In our experience, a single cluster with at most 2,500 nodes works well • Beyond that, go for a multi-cluster approach. This is not very stable yet; it is a work in progress. Learn more at https://github.com/kubernetes-sigs/federation-v2
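
One way to check a cluster against the two service level objectives above is to compute the 99th percentile of measured latencies and compare it with the threshold. The sketch below does this with a nearest-rank percentile; the function names and the sample latencies are hypothetical, and in practice the measurements would come from your monitoring system.

```python
API_CALL_SLO_SECONDS = 1.0   # 99% of API calls should return in under 1 second
POD_START_SLO_SECONDS = 5.0  # 99% of pods should start in under 5 seconds

def percentile_99(samples):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) using integer math
    return ordered[rank - 1]

def meets_slo(samples, threshold_seconds):
    """True if 99% of the samples are below the threshold."""
    return percentile_99(samples) < threshold_seconds

# Hypothetical measurements, in seconds.
api_latencies = [0.2] * 99 + [3.0]     # one slow call out of 100
pod_startups = [2.0] * 98 + [9.0] * 2  # two slow starts out of 100
print(meets_slo(api_latencies, API_CALL_SLO_SECONDS))
print(meets_slo(pod_startups, POD_START_SLO_SECONDS))
```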

11.Networking Need • NodePort, LoadBalancer, or Ingress services? – e.g. a Minikube cluster is not ideal if you want to expose your app with a LoadBalancer or Ingress service • Also, what are you using for networking – e.g. Calico, Flannel?

12.Single or Multiple Master Nodes • A single master may not be enough for a big cluster – You may need multiple master nodes to divide the master load across multiple servers. • In our case, – We had 3 master nodes – Added a management node for monitoring and logging – Multiple proxy nodes

13.Requests and Limits • Help manage compute resources for containers – Specify how much CPU and memory (RAM) each container needs by using requests and limits • Requests determine the minimum CPU/memory required by a container • Limits set the maximum CPU/memory allowed • Improve scheduler efficiency – Allow Kubernetes to increase utilization while at the same time maintaining resource guarantees for the containers that need guarantees • Example values from the slide figure: 64 MiB (memory), 250 millicore/millicpu (CPU)
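
To make requests and limits concrete, here is a minimal sketch that builds the `resources` section of a container spec as a Python dictionary and estimates how many such containers fit on one node when packing by requests. The 64Mi and 250m values echo the figures on the slide; treating them as requests is an assumption, and the limits, container name, image, node size, and helper functions are hypothetical.

```python
import json

# Container fragment as it would appear inside a Pod manifest.
# Requests use the slide's example values (assumed to be requests);
# the limits, name, and image are made up for illustration.
container = {
    "name": "app",
    "image": "example/app:1.0",
    "resources": {
        "requests": {"memory": "64Mi", "cpu": "250m"},
        "limits": {"memory": "128Mi", "cpu": "500m"},
    },
}

def cpu_millicores(quantity):
    """Convert a CPU quantity such as '250m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)

def memory_mib(quantity):
    """Convert a memory quantity to MiB (only Mi and Gi suffixes handled)."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")

def pods_per_node(container, node_cpu, node_memory):
    """How many single-container pods fit on a node, packing by requests."""
    requests = container["resources"]["requests"]
    by_cpu = cpu_millicores(node_cpu) // cpu_millicores(requests["cpu"])
    by_memory = memory_mib(node_memory) // memory_mib(requests["memory"])
    return min(by_cpu, by_memory)

print(json.dumps(container["resources"], indent=2))
print(pods_per_node(container, node_cpu="4", node_memory="16Gi"))
```

The estimate ignores capacity reserved for system components, so it is an upper bound rather than a guarantee.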

14.Node Selectors • Provide control over how pods are assigned to nodes • Simplest form of constraint • Constrain a pod to run only on nodes that carry a specific node label • Useful in certain circumstances – e.g. ensure that a pod ends up on a machine with an SSD attached to it
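
The matching rule behind a node selector is simple: a node is eligible only if it carries every label the pod asks for. The sketch below models that rule in plain Python; it is an illustration of the idea, not the scheduler's actual code, and the node names and the `disktype` and `zone` labels are hypothetical.

```python
def matches_node_selector(node_labels, node_selector):
    """True if the node carries every key/value pair in the selector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())

# Hypothetical nodes and their labels.
nodes = {
    "node-a": {"disktype": "ssd", "zone": "z1"},
    "node-b": {"disktype": "hdd", "zone": "z1"},
}

# Pod spec fragment asking for a node with an SSD attached.
pod_spec = {"nodeSelector": {"disktype": "ssd"}}

eligible = [
    name for name, labels in nodes.items()
    if matches_node_selector(labels, pod_spec["nodeSelector"])
]
print(eligible)
```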

15.Node Affinity / Anti Affinity • Node affinity expands the types of constraints compared to node selectors – Allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node – Two types • Hard – requiredDuringSchedulingIgnoredDuringExecution – must be met for a pod to be scheduled onto a node • Soft – preferredDuringSchedulingIgnoredDuringExecution – the scheduler will try to enforce it but will not guarantee it
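
The difference between the hard and soft rules can be sketched as a filter followed by a ranking: hard rules remove nodes, soft rules only reorder the survivors. The code below is a simplified model that handles just the `In` and `NotIn` operators; the node names, labels, and weight are hypothetical, and the real scheduler combines this score with many others.

```python
def term_matches(labels, term):
    """True if every matchExpression in the term holds (In and NotIn only)."""
    for expr in term["matchExpressions"]:
        value = labels.get(expr["key"])
        if expr["operator"] == "In":
            ok = value in expr["values"]
        elif expr["operator"] == "NotIn":
            ok = value not in expr["values"]
        else:
            raise ValueError(f"operator not handled in this sketch: {expr['operator']}")
        if not ok:
            return False
    return True

# Affinity fragment of a pod spec: SSD is required, zone z1 is preferred.
affinity = {
    "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                {"matchExpressions": [
                    {"key": "disktype", "operator": "In", "values": ["ssd"]}]}
            ]
        },
        "preferredDuringSchedulingIgnoredDuringExecution": [
            {"weight": 10, "preference": {"matchExpressions": [
                {"key": "zone", "operator": "In", "values": ["z1"]}]}}
        ],
    }
}

def rank_nodes(nodes, affinity):
    """Drop nodes failing the hard rule, then order the rest by soft-rule weight."""
    node_affinity = affinity["nodeAffinity"]
    required = node_affinity["requiredDuringSchedulingIgnoredDuringExecution"]["nodeSelectorTerms"]
    preferred = node_affinity["preferredDuringSchedulingIgnoredDuringExecution"]
    scored = []
    for name, labels in nodes.items():
        if not any(term_matches(labels, term) for term in required):
            continue  # hard rule not met: node is not eligible
        score = sum(p["weight"] for p in preferred if term_matches(labels, p["preference"]))
        scored.append((score, name))
    return [name for score, name in sorted(scored, key=lambda item: (-item[0], item[1]))]

nodes = {
    "node-a": {"disktype": "ssd", "zone": "z2"},
    "node-b": {"disktype": "ssd", "zone": "z1"},
    "node-c": {"disktype": "hdd", "zone": "z1"},
}
print(rank_nodes(nodes, affinity))
```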

16.Inter-pod Affinity / Anti Affinity • Allows you to constrain which nodes your pod is eligible to be scheduled on based on labels on pods that are already running on the node, rather than based on labels on nodes • Inter-pod affinity and anti-affinity require a substantial amount of processing, which can slow down scheduling in large clusters significantly. We do not recommend using them in clusters larger than several hundred nodes

17.Taints and Tolerations • Taints are the opposite of node affinity: they allow a node to repel pods. They are key-value pairs associated with an effect • A taint is added to a node; the corresponding pod carries a matching toleration • Together they ensure that pods are not scheduled onto inappropriate nodes • Provide a flexible way to steer pods away from certain nodes
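
A small model of how taints and tolerations interact may help: a pod can land on a node only if it tolerates every taint that blocks scheduling. The sketch below covers the `Equal` and `Exists` operators; the `dedicated=gpu` taint and the function names are hypothetical, and the real scheduler handles more cases (for example the soft `PreferNoSchedule` effect).

```python
def tolerates(toleration, taint):
    """True if the toleration matches the taint (Equal and Exists operators)."""
    # An empty effect in the toleration matches any effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with no key tolerates every taint.
        return "key" not in toleration or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))

def can_schedule(pod_tolerations, node_taints):
    """True if every scheduling-blocking taint on the node is tolerated."""
    return all(
        any(tolerates(toleration, taint) for toleration in pod_tolerations)
        for taint in node_taints
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )

# Hypothetical taint, as set by: kubectl taint nodes node1 dedicated=gpu:NoSchedule
taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
# Matching toleration in the pod spec.
toleration = {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}

print(can_schedule([], [taint]))
print(can_schedule([toleration], [taint]))
```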

18.Kubernetes Based IBM Cloud Private • Kubernetes alone is not enough • An enterprise Kubernetes distribution should also include other core services for logging, monitoring, etc. • Learn more about IBM Cloud Private at https://www-03.ibm.com/support/knowledgecenter/SSBS6K_2.1.0.3/kc_welcome_containers.html

19.Deployment Topology • Best Deployment – Master – Management – Worker – Proxy – Dynamic host group

20.500 Nodes Deployment Arch • IBM Cloud Private 2.1.0.2, released in March 2018 • Calico V2 with node-to-node mesh • One etcd cluster shared between Kubernetes and Calico

21.Network Impact for 500+ Nodes • Kubernetes claims to support 5,000 nodes, so why can't IBM Cloud Private 2.1.0.2? – IBM Cloud Private uses Calico as its default network – Calico in IBM Cloud Private uses a node-to-node mesh to configure peering between all Calico nodes – etcd load is very high when deploying a 1000-node cluster; most of the load comes from Calico – Node-to-node mesh stops working if there are more than 700 nodes in the cluster – In a 1000-node cluster, a full mesh means every node peers with the other 999 nodes (roughly 500,000 peerings in total), which is not acceptable! https://docs.projectcalico.org/v2.6/getting-started/kubernetes/installation/integration#requirements http://fuel-ccp.readthedocs.io/en/latest/design/k8s_1000_nodes_architecture.html https://coreos.com/etcd/docs/latest/tuning.html
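
The scaling problem with a node-to-node mesh is plain arithmetic: n nodes in a full mesh need n(n-1)/2 peerings. The sketch below compares that with a route-reflector layout of the kind the later slides move to. The reflector model (every other node peers with each reflector, and the reflectors mesh among themselves) and the count of two reflectors are assumptions for illustration, not details taken from the talk.

```python
def full_mesh_sessions(nodes):
    """Peerings needed when every node peers with every other node."""
    return nodes * (nodes - 1) // 2

def route_reflector_sessions(nodes, reflectors):
    """Peerings when clients peer only with reflectors (assumed topology)."""
    clients = nodes - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)

for n in (100, 500, 1000):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n, reflectors=2))
```

The mesh grows quadratically with node count while the reflector layout grows linearly, which is consistent with the mesh becoming the bottleneck somewhere past a few hundred nodes.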

22.ETCD Benchmark Test • ETCD Benchmark Comparison – Calico V2 with ETCD V2 API – Calico V3 with ETCD V3 API • Conclusion – Migrate to Calico V3 and use ETCD V3 API for IBM Cloud Private https://coreos.com/etcd/docs/latest/op-guide/v2-migration.html

23.1000+ Nodes Deployment Arch • IBM Cloud Private 2.1.0.3, released in May 2018 • Calico V3 with Route Reflector • ETCD V3

24.Deployment Topology Changes • IBM Cloud Private 2.1.0.2 (k8s 1.9) – Kubernetes – Calico 2.6 with node-to-node mesh – etcd • IBM Cloud Private 2.1.0.3 (k8s 1.10) – Kubernetes – Calico v3.0.4 with Route Reflector – etcd1 – etcd2 (optional)

25.Summary • Sizing a Kubernetes cluster can be challenging, especially for a large-scale cluster • Benefit from the experience of others – Do good research on what others recommend. Learn from already proven approaches. • Understand scheduler optimization techniques in Kubernetes • Back etcd storage with SSDs for better performance

26.Thank You!!
