Node Operator: Kubernetes Node Management Made Simple

Posted by ccone, 6 years ago

#Information Technology

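Kubernetes nodes depend on a lot of software and configuration on the host, including the container runtime, network plugins, and the kubelet. Maintaining these dependencies is tedious and error-prone. At Alibaba and Ant Financial, a typical cluster administrator maintains thousands upon thousands of Kubernetes nodes on average. We developed Node Operator to simplify this work and reduce its risk. In this talk, we share how to use Node Operator to maintain node software and configuration. We designed a declarative API that lets cluster administrators interact with node CRD resources to manage the lifecycle of any node. Node Operator also responds to node state changes and takes recovery action when necessary. Node Operator has an extensible design, so it can also manage host software that is not part of Kubernetes.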


1. Node Operator: Kubernetes Node Management Made Simple
陈俊, Ant Financial

2. Agenda
• Background and Motivation
• Introduction of Operators
• Node-Operator
• Advanced Topic: Kube-on-Kube-Operator
• Achievement
• Q&A

3. Background: DC/OS
From Sigma 2.0 (Swarm) to Sigma 3.1 (Kubernetes)

4. Background: Operation Requirements
• Apply to large-scale clusters
• Set up and tear down clusters quickly and conveniently
• Add and delete nodes at any time
• Upgrade master and node components reliably
• Canary rollout
• Master and node component version management

5. Motivation: Work Order Deployment
Work order: Upgrade Node Versions
• Upgrade Node 10.10.10.1
  • Upgrade docker
  • Upgrade kubelet
• Upgrade Node 10.10.10.2
  • Upgrade docker
  • Upgrade kubelet
• …

6. Motivation: Work Order Deployment Disadvantages
• Inconsistency
• Not failure-aware
• Complicated architecture
A work order deployment system cannot meet the requirements of resource management.

7. Operator
• Observe: watch the desired resource and the actual resource
• Analyze: compute the difference between the desired and the actual config
• Action: manage the resource toward the desired config
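The observe, analyze, action cycle above can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: the function names (`analyze`, `act`, `reconcile`), the dict-based config, and the version strings in the usage note are all assumptions made for the example. A real operator would watch kube-apiserver instead of receiving plain dicts.

```python
def analyze(desired: dict, actual: dict) -> dict:
    """Analyze: return the desired entries that the actual config lacks or gets wrong."""
    return {key: value for key, value in desired.items() if actual.get(key) != value}


def act(actual: dict, diff: dict) -> dict:
    """Action: apply the diff and return the resulting config (hypothetical, in-memory only)."""
    updated = dict(actual)
    updated.update(diff)
    return updated


def reconcile(desired: dict, actual: dict, max_rounds: int = 10) -> dict:
    """Repeat analyze and act until the actual config matches the desired one."""
    for _ in range(max_rounds):
        diff = analyze(desired, actual)  # the caller supplies the observed state
        if not diff:
            break  # converged: nothing left to change
        actual = act(actual, diff)
    return actual
```

For example, `reconcile({"docker": "18.09", "kubelet": "1.12"}, {"docker": "17.06"})` converges on the desired mapping; the package versions here are made up for illustration.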

8. Operator: Advantages
• Declarative system
• Continually manages resources toward the final state
• kube-apiserver oriented programming
• CustomResourceDefinition (CRD)
• Built on Kubernetes APIs
• Kubernetes repo support
• Agile, flexible and convenient

9. Node-Operator: Overview
• User: SREs, who scale nodes and take them offline by posting Machine CRs.
• Node-Operator: diffs Machine and Node state, and manages node software and configuration files.
• Machine: an instance of the Machine CRD carrying basic node information; it represents a node desired in the Kubernetes cluster.
• NPD (Node Problem Detector): posts Node state to kube-apiserver.
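To make the Machine versus Node comparison above concrete, here is a small Python sketch. The slides do not show the Machine CRD schema, so the `Machine` and `NodeState` classes, their fields, the action names, and the version strings are all hypothetical; the sketch only illustrates turning a desired/actual difference into a list of actions.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical desired state of one node (not the real Machine CRD schema)."""
    ip: str
    packages: dict = field(default_factory=dict)  # package name -> desired version


@dataclass
class NodeState:
    """Hypothetical observed state of one node, e.g. as reported through NPD."""
    ip: str
    packages: dict = field(default_factory=dict)  # package name -> installed version
    ready: bool = True


def plan_actions(machine: Machine, node: NodeState) -> list:
    """Return (verb, target, version) tuples that move the node toward the Machine."""
    actions = []
    for name, want in sorted(machine.packages.items()):
        have = node.packages.get(name)
        if have is None:
            actions.append(("install", name, want))
        elif have != want:
            actions.append(("upgrade", name, want))
    if not node.ready:
        # Assumed behaviour: an unhealthy node gets a recovery step after software is fixed.
        actions.append(("recover", node.ip, None))
    return actions
```

Sorting the package names keeps the plan deterministic, which makes repeated reconciliations easier to compare and log.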

10. Node-Operator: Scale Nodes

11. Node-Operator: Upgrade Nodes

12. Node-Operator: Grayscale Rollout
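The transcript keeps only the title of the grayscale rollout slide, so the strategy itself is not recorded here. One common way to stage such a rollout is to upgrade a small canary batch first and grow the batch size only while nodes stay healthy. The Python sketch below shows that idea under those assumptions; `rollout_batches`, `grayscale_rollout`, and the `upgrade` and `healthy` callbacks are hypothetical names, not part of Node-Operator.

```python
def rollout_batches(nodes: list, first_batch: int = 1, growth: int = 2) -> list:
    """Split nodes into batches whose size grows geometrically, canary batch first."""
    if first_batch < 1 or growth < 1:
        raise ValueError("first_batch and growth must be at least 1")
    batches, start, size = [], 0, first_batch
    while start < len(nodes):
        batches.append(nodes[start:start + size])
        start += size
        size *= growth
    return batches


def grayscale_rollout(nodes: list, upgrade, healthy):
    """Upgrade batch by batch; stop at the first batch containing an unhealthy node.

    Returns (nodes_confirmed_healthy, failed_batch_or_None).
    """
    done = []
    for batch in rollout_batches(nodes):
        for node in batch:
            upgrade(node)
        if not all(healthy(node) for node in batch):
            return done, batch  # halt so later batches are never touched
        done.extend(batch)
    return done, None
```

With seven nodes and the default parameters the batches have sizes 1, 2, and 4, so a bad release is caught after touching a single node.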

13. Kube-on-Kube-Operator: Overview
• Biz-Cluster: used to deploy our applications.
• Meta-Cluster: used to set up Biz-Cluster master components. We add Biz-Cluster master nodes to the Meta-Cluster.
• User: SREs, who set up and upgrade a Biz-Cluster by posting Cluster CRs.
• Kube-on-Kube-Operator: diffs Biz-Cluster CRs against the state of the Biz-Cluster master components, and manages those components through Kubernetes resources such as Deployment, Pod, etc.
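As a rough illustration of the last bullet, the Python sketch below renders the Deployments a Cluster CR would imply and compares them with what exists in the Meta-Cluster. The Cluster CR fields (`name`, `version`, `replicas`), the naming scheme, and the image strings are assumptions for the example; the slides do not show the real schema, and the component list is limited to three well-known control plane binaries.

```python
# Control plane components assumed to run as Deployments in the Meta-Cluster.
MASTER_COMPONENTS = ("kube-apiserver", "kube-controller-manager", "kube-scheduler")


def desired_deployments(cluster: dict) -> dict:
    """Render one hypothetical Deployment spec per master component from a Cluster CR."""
    return {
        f"{cluster['name']}-{component}": {
            "image": f"{component}:{cluster['version']}",  # illustrative image name
            "replicas": cluster.get("replicas", 1),
        }
        for component in MASTER_COMPONENTS
    }


def diff_deployments(desired: dict, actual: dict):
    """Return sorted (to_create, to_update, to_delete) Deployment names."""
    to_create = sorted(set(desired) - set(actual))
    to_delete = sorted(set(actual) - set(desired))
    to_update = sorted(
        name for name in desired if name in actual and desired[name] != actual[name]
    )
    return to_create, to_update, to_delete
```

Upgrading a Biz-Cluster then amounts to changing `version` in the Cluster CR: every rendered spec changes, so each existing Deployment lands in `to_update`.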

14. Work Together

15. Achievement
• Anyone can operate and maintain a Kubernetes cluster
• Set up and tear down a Kubernetes cluster in two minutes
• Automated rollouts and rollbacks
• Cluster and node self-healing

16. Q&A
Thanks for listening.
陈俊, WeChat: answer1991chen

18. Background: Cluster Scale
• Production environment
  • Dozens of clusters
  • 5k+ nodes per cluster
  • 10k+ nodes in the largest cluster
• Testing environment
  • Hundreds of clusters for CI/CD
  • 500+ nodes per cluster
