深入了解Rook

将深入探讨 Rook 项目的架构。Rook 是 Kubernetes 的开源云原生存储协调器,为各种存储解决方案提供平台、框架和支持,以便与云原生环境本地集成。我们将了解 Kubernetes 的各种扩展机制以及像控制器/运算符、资源管理和滚动升级等模式,以了解如何为云原生环境构建完全自动化的存储解决方案。针对某些特定存储解决方案(如 Ceph、CockroachDB、Minio 和 NFS),我们还将深入了解如何生成运算符以及如何执行业务流程。2018 年 1 月,Rook 被认可为云原生计算基金会主办的第一个存储项目。
展开查看详情

1.Rook Deep Dive Jared Watts Rook Senior Maintainer Upbound Founding Engineer https://rook.io/ https://github.com/rook/rook

2.Agenda ● Quick introduction to Rook (again) ● Deep dive: Ceph orchestration ● Deep dive: Storage provider integration for Minio ● Demo (if time allows) ● Questions

3.What is Rook? ● Cloud-Native Storage Orchestrator ● Extends Kubernetes with custom types and controllers ● Automates deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management ● Framework for many storage providers and solutions ● Open Source (Apache 2.0) ● Hosted by the Cloud-Native Computing Foundation (CNCF)

4.Storage Challenges ● Reliance on external storage ○ Requires these services to be accessible ○ Deployment burden ● Reliance on cloud provider managed services ○ Vendor lock-in ● Day 2 operations - who is managing the storage?

5.Possible Solutions ● Deploy storage systems INTO the cluster ● Portable abstractions for all storage needs ○ Database, message queue, cache, object store, etc. ● Power of choice: cost, features, resiliency, compliance ● Automated management by smart software

6.Custom Resource Definitions (CRDs) ● Teaches Kubernetes about new first-class objects ● Custom Resource Definition (CRDs) are arbitrary types that extend the Kubernetes API ○ look just like any other built-in object (e.g. Pod) ○ Enabled native kubectl experience ● A means for user to describe their desired state

7. Rook Operators ● Implements the Operator Pattern for storage solutions ● User defines desired state for the storage cluster ● The Operator runs reconciliation loops ○ Observe - Watches for changes in desired state and cluster ○ Analyze - Determine differences between desired and actual ○ Act - Applies changes to the cluster to drive it towards desired

8.Rook Framework for Storage Solutions ● Rook is more than just a collection of Operators and CRDs ● Framework for storage providers to integrate their solutions into cloud-native environments ○ Storage resource normalization ○ Operator patterns/plumbing ○ Common policies, specs, logic ○ Testing effort ● Ceph, CockroachDB, Minio, NFS, Cassandra, Nexenta, and more...

9.Ceph Deep Dive

10. General Orchestration Approach ● Operator runs ceph commands to initialize and bootstrap cluster (cephx auth, crush map, etc.) ● Pod template spec is generated from Cluster CRD config ○ Ceph daemon command line arguments ○ Environment variables injected ○ ceph.conf generated and written to pod filesystem ● Operator creates a Kubernetes controller primitive (Deployment) to manage lifecycle of each Ceph pod ● Health of cluster and components is monitored over time and corrective actions taken

11. Orchestration of Monitors ● Operator creates a pod for each mon specified in the CRD apiVersion: ● Deployment object wraps each mon pod ceph.rook.io/v1beta1 kind: Cluster for reliable lifecycle management metadata: ● Placement of mons ensures node isolation name: my-cluster spec: (1 mon per node) mon: count: 3 ● Service object is created per mon to multiPerNode: false establish a consistent IP address - important for quorum and mon map

12. Monitors: Surviving Pod Restarts ● Mon persistent state must survive restarts (pod restart, node reboot, power failure, etc.) ● Mon state is stored in a HostPath mounted by the mon pod ○ user configurable via dataDirHostPath in Cluster CRD ● After a power outage, mon pods start and load state from the persisted data ○ Once mons form quorum, the cluster is healthy again ● If a mon loses its persisted data, it will heal itself after a restart

13.Monitors: Maintaining quorum ● Mon quorum is critical to cluster health ● Operator regularly checks on mon quorum ● If a mon falls out of quorum for too long, the operator takes action to replace the failed mon ○ A new mon is started (new Deployment and Service IP) ○ Wait for new mon to join quorum ○ Delete the failed mon Deployment and Service ○ Remove the failed mon from the mon map

14. Orchestration of OSDs ● Operator starts OSDs according to config from Cluster CRD ● “Discover” DaemonSet identifies available devices on all nodes in the cluster ● Operator schedules a BatchJob on each node to initialize/provision its OSDs ● One Deployment (ReplicaSet/Pod) is created for each OSD ○ OSDs run independently ● Horizontal scaling: Operator automatically add OSDs to new nodes and devices

15. OSDs: Device selection ● Mode 1: Automatically consume available spec: storage: devices on all nodes useAllNodes: true useAllDevices: true ○ Safety checks ensure devices aren’t already in use spec: ● Mode 2: Only consume the devices storage: specified in the CRD useAllNodes: false nodes: ○ Full admin control for which devices will - name: "node1" devices: run OSDs - name: "sdb" - name: "node2" deviceFilter: "^sd."

16. Orchestration of RGW apiVersion: ceph.rook.io/v1beta1 ● Creates an object gateway according kind: ObjectStore to settings in the ObjectStore CRD metadata: name: my-store ● Required Ceph pools are created spec: metadataPool: ○ 5 metadata pools replicated: size: 3 ○ 1 data pool (can be erasure coded) dataPool: erasureCoded: ● RGW pods are started via Deployment dataChunks: 2 for HA/reliability codingChunks: 2 gateway: ● Service created for client access and port: 80 instances: 1 load balancing

17. Orchestration of CephFS apiVersion: ceph.rook.io/v1beta1 ● Creates a shared file system according kind: Filesystem to settings in the Filesystem CRD metadata: name: my-filesystem ● Required Ceph pools are created spec: metadataPool: ○ 1 metadata pool replicated: size: 3 ○ 1 data pool (can be erasure coded) dataPools: - replicated: ● MDS pods are started via Deployment size: 3 for HA/reliability metadataServer: activeCount: 1 ○ Standby MDS pods for quick activeStandby: true failover

18. Rook Agent ● Dynamically attaches/mounts Ceph storage for pod consumption ● Runs as DaemonSet on all schedulable nodes in cluster ● Block: rbd map ● File: mount -t ceph ● Fencing and locking for ReadWriteOnce ● Detach and reattach if pod scheduled onto another node ● Currently a Kubernetes FlexVolume, will be replaced by CSI driver in the near future (work ongoing)

19. Automated Stateful Upgrades ● Partially implemented in 0.8, more support coming in 0.9 ● Operator controls and manages software upgrade flow ● Upgrade is simply applying/reconciling desired state ● Leverages built-in functionality of K8s resources like Deployments to update components in a rolling fashion ● Health checks to ensure cluster health is maintained ● Separation of Rook and Ceph versioning to isolate impact ● Special upgrade and migration steps between major versions of Ceph (Mimic -> Nautilus) will be implemented as necessary

20.Developer Deep Dive: Storage Provider Integration Minio Operator

21. Operator Frameworks Current: Register CRDs, watch events and invoke handler functions ● Rook operator-kit: https://github.com/rook/operator-kit Future: Auto-generate APIs, CRDs, controllers, reconciliation, boilerplate code, unit tests, deployment, etc. ● Operator SDK: https://github.com/operator-framework/operator-sdk ● Kubebuilder: https://github.com/kubernetes-sigs/kubebuilder

22.Minio ObjectStore CRD

23.Minio ObjectStore Custom Object

24.Using the Object Store CRD

25.Revisiting the ObjectStore ● Rook knows how to work with common information in storage object specs (networking, node counts, etc.) ● Only the credentials are Minio-specific ● We can use this information to deploy a Minio cluster

26.Minio Operator ● We specify the container that the Minio operator will reside in ● Args are provided to inform the Rook binary that it needs to operate on Minio ● We include the CRD in the same file as this operator description

27.Minio Operator Container Image ● Contains both Minio server/tools and Rook libraries ● Optimized Docker build to collapse layers and minify image ● Base image is Alpine Linux

28.Minio ObjectStore Golang Types ● ObjectStoreSpec struct defines the config properties exposed to the user in object-store.yaml ● Notice the spec takes advantage of the common types/specs from the Rook framework

29.Minio Operator Watching for Events ● We create a new watcher to watch for add, update, or delete events ● Event handler functions are passed to the Rook operator-kit