An Operator Framework for Managing Production-Grade Workloads at eBay

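Containerizing big data applications in production is a major challenge, because these applications require human operational knowledge to run correctly while preventing data loss. At eBay, we took inspiration from the CoreOS operator pattern and built a framework that makes it easier and more agile to manage multi-component, geo-distributed stateful workloads on Kubernetes. With this operator framework, users can obtain any application they need, whether a database, a message queue, or a stream processor, from a single YAML file. The framework uses a workflow engine of our own design to deploy applications across multiple Kubernetes clusters and to manage each application together with its components automatically. It provides management functions such as self-healing, scaling, upgrade, and configuration to achieve production-grade reliability and high availability.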

1.Operator Framework to manage Stateful Workloads in eBay

2.AGENDA • Background • Introduction • Features • Conclusion

3.BACKGROUND eBay has the data platform and shared data ecosystem to provide the offline and real-time data that power eBay's vision of a buyer experience, seller insights, and an agile, data-driven business, serving eBay employees all around the world.
Data platform sites (map labels): San Jose, Las Vegas, Phoenix
190+ countries • 600 PB of data

4.BACKGROUND High IOPS • Local Persistent Volume • Disk failure

5.BACKGROUND Example: Tiered Kafka Architecture • Example: Active-Active MySQL Cluster

6.BACKGROUND
Simplify Management
• Multiple Kubernetes clusters
• Internal dependency
• Complex data operation (scaling, rolling restart)
Improve Reliability
• Always available
• Resiliency
• Highly scalable
• High performance
Security
• Integrate with existing enterprise security policies

7.BACKGROUND
StatefulSet:
1. Cross K8s clusters deployment
2. Auto recovery from disk failure
3. Defined order for rolling restart
Helm Charts:
1. On-demand configuration changes for related components
2. Customize the Docker images

8.INTRODUCTION Operator Pattern + Workflow

9.INTRODUCTION Operator Framework: One Stack Management for Stateful Workloads

10.FEATURES – Deploy Pattern
1. Declarative interface and free combination
2. Model data applications with K8s resources
• Pure Pods + Local Volume
• Deployment
• DeploymentSets
• StatefulSet
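The declarative model on this slide can be illustrated with a small sketch. The field names (`components`, `name`, `kind`, `replicas`) and the `parse_application` helper are hypothetical and are not taken from the eBay framework; the sketch only shows how a spec loaded from a YAML file might freely combine components, each mapped to one of the resource kinds listed above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Resource kinds named on the slide; "Pod" stands for "Pure Pods + Local Volume".
SUPPORTED_KINDS = {"Pod", "Deployment", "DeploymentSet", "StatefulSet"}


@dataclass(frozen=True)
class Component:
    """One building block of an application (hypothetical schema)."""
    name: str
    kind: str
    replicas: int


def parse_application(spec: Dict[str, Any]) -> List[Component]:
    """Validate a declarative spec and return its components in order."""
    components: List[Component] = []
    for item in spec["components"]:
        kind = item["kind"]
        if kind not in SUPPORTED_KINDS:
            raise ValueError(f"unsupported kind: {kind}")
        replicas = int(item.get("replicas", 1))  # default to a single replica
        components.append(Component(item["name"], kind, replicas))
    return components
```

A spec for a message queue could then combine, for example, a `StatefulSet` component for ZooKeeper with a `Pod` component for the brokers, and the same parser would accept any other combination of supported kinds.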

11.FEATURES – Deploy Pattern 3. Cross Kubernetes Clusters Deployment and Management

12. FEATURES – Lifecycle Management
1. On-demand management functions to reduce maintenance effort
• Provision / Decommission: REST API and Kubernetes-native API for one-step creation or deletion of a cluster
• Auto Remediation: if one node is down, the WISB flow is automatically triggered to bring back the missing node
• Scaling: the cluster can flex up and flex down on demand
• Rolling Restart / Upgrade: upgrade the cluster to a new version
• Configuration Management: modify the application configuration parameters and apply the update to the cluster
• Replacement: replace a bad or low-performance node
WISB Management Workflow
• Declarative with simple syntax via YAML, easy to modify and maintain
• Decompose complex logic into idempotent, reusable, and retryable tasks
• Design for failure
• Reusable and parallel subflows
• Special design for group operation
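The idea of decomposing a flow into idempotent, retryable tasks can be sketched as follows. This is a minimal illustration and not the eBay workflow engine: `Task`, `run_flow`, and `remediation_flow` are hypothetical names, and the sketch assumes that WISB refers to the desired state of the cluster. Each task first checks whether its work is already done, so running the same flow twice is harmless, and a transient failure is retried a bounded number of times.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Task:
    """One idempotent step: `is_done` inspects state, `apply` changes it."""
    name: str
    is_done: Callable[[], bool]
    apply: Callable[[], None]
    max_attempts: int = 3


def run_flow(tasks: List[Task]) -> List[str]:
    """Run tasks in order, skipping finished ones and retrying failures."""
    log: List[str] = []
    for task in tasks:
        if task.is_done():
            log.append(f"{task.name}: skipped")
            continue
        for attempt in range(1, task.max_attempts + 1):
            try:
                task.apply()
            except RuntimeError:
                if attempt == task.max_attempts:
                    raise  # give up after the last allowed attempt
                continue
            log.append(f"{task.name}: done after {attempt} attempt(s)")
            break
    return log


def remediation_flow(desired: Set[str], actual: Set[str]) -> List[Task]:
    """Build one task per node that should exist but is missing."""
    def make(node: str) -> Task:
        return Task(
            name=f"restore-{node}",
            is_done=lambda: node in actual,
            apply=lambda: actual.add(node),  # stand-in for real provisioning
        )
    return [make(node) for node in sorted(desired - actual)]
```

With a desired set of three nodes and one of them missing, `remediation_flow` yields a single restore task; running the resulting flow a second time only reports that task as skipped, which is the property that makes retries and re-triggered flows safe.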

13.FEATURES – Lifecycle Management 2. Reusable common flows and customized WISB flows

14. FEATURES – Reliability
Health Check
• Schedule-triggered HealthCheck flow to check app status
• Send a notification by email or Slack if anything unhealthy is detected
• Alerting (disk health / disk usage)
Sidecar Agent
• Proxy admin actions and permission control
• View logs and collect metrics
• Secret rotation
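A single run of a health check flow like the one on this slide might look like the sketch below. All names (`NodeStatus`, `find_problems`, `health_check_flow`) and the 85% disk usage limit are assumptions made for illustration; the framework's actual checks and thresholds are not given in the slides. The `notify` callback stands in for the email or chat notification, and a scheduler would be expected to call the flow periodically.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NodeStatus:
    """Snapshot of one node as reported by a hypothetical agent."""
    name: str
    app_running: bool
    disk_healthy: bool
    disk_usage: float  # fraction of capacity in use, 0.0 to 1.0


def find_problems(node: NodeStatus, usage_limit: float = 0.85) -> List[str]:
    """Return a human-readable list of problems for one node."""
    problems: List[str] = []
    if not node.app_running:
        problems.append("app not running")
    if not node.disk_healthy:
        problems.append("disk unhealthy")
    if node.disk_usage > usage_limit:
        problems.append(
            f"disk usage {node.disk_usage:.0%} above {usage_limit:.0%}"
        )
    return problems


def health_check_flow(
    nodes: List[NodeStatus],
    notify: Callable[[Dict[str, List[str]]], None],
    usage_limit: float = 0.85,
) -> Dict[str, List[str]]:
    """Check every node and send one notification if any is unhealthy."""
    report: Dict[str, List[str]] = {}
    for node in nodes:
        problems = find_problems(node, usage_limit)
        if problems:
            report[node.name] = problems
    if report:
        notify(report)  # e.g. forward to email or chat in a real system
    return report
```

Collecting all problems into one report before notifying keeps a cluster-wide incident from producing one message per node.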

15. FEATURES – Security
Authentication: Keystone authentication and integration with LDAP
RBAC: grant the CRU permission on a specific namespace; only users who have the CRU permission can manage the cluster
Non-root user: security context / setcap
Standard approval process: sudo action / human approval / trace system

16.Conclusion Operator Framework: We use the operator pattern together with our workflow engine to build a framework that makes it easier and more agile to manage multi-component, geo-distributed stateful workloads.

17.Thanks!
