Operator Framework for Managing Production-Grade Workloads at eBay
1.Operator Framework to manage Stateful Workloads in eBay
2.AGENDA • Background • Introduction • Features • Conclusion
3.BACKGROUND eBay has a data platform and shared data ecosystem that provide the offline and real-time data powering eBay's vision of a buyer experience, seller insights and an agile, data-driven business, serving eBay employees all around the world. Map labels: Data at San Jose, Las Vegas and Phoenix. 190+ countries; 600 PB of data.
4.BACKGROUND High IOPS • Local Persistent Volume • Disk failure
5.BACKGROUND Example: Tiered Kafka Architecture • Example: Active-Active MySQL Cluster
6.BACKGROUND
Simplify Management: Multiple Kubernetes Clusters • Internal Dependency • Complex Data Operation (Scaling, Rolling Restart)
Improve Reliability: Always Available • Resiliency • Highly Scalable • High Performance
Security: Integrate with existing enterprise security policies
7.BACKGROUND StatefulSet: 1. Cross-K8s-cluster deployment 2. Auto-recovery from disk failure 3. Defined order for rolling restart. Helm Charts: 1. On-demand configuration changes for related components 2. Customized Docker images
8.INTRODUCTION Operator Pattern + Workflow
9.INTRODUCTION Operator Framework: One Stack Management for Stateful Workloads
10.FEATURES – Deploy Pattern 1. Declarative interface and Free combination 2. Model data applications with K8s resources • Pure Pods + Local Volume • Deployment • DeploymentSets • StatefulSet
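To make the "declarative interface and free combination" idea concrete, here is a minimal sketch in Python of an application described as a combination of components, each modeled by one of the resource types listed on the slide. The slides do not show the framework's actual schema, so every name here (`ComponentSpec`, `ApplicationSpec`, the `kind` strings, the example application) is a hypothetical illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical kind names, derived from the resource types on the slide.
ALLOWED_KINDS = {"PodWithLocalVolume", "Deployment", "DeploymentSet", "StatefulSet"}


@dataclass
class ComponentSpec:
    name: str
    kind: str      # which Kubernetes resource models this component
    replicas: int

    def validate(self) -> None:
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported kind: {self.kind}")
        if self.replicas < 0:
            raise ValueError("replicas must be non-negative")


@dataclass
class ApplicationSpec:
    name: str
    components: List[ComponentSpec] = field(default_factory=list)

    def validate(self) -> None:
        names = [c.name for c in self.components]
        if len(names) != len(set(names)):
            raise ValueError("component names must be unique")
        for component in self.components:
            component.validate()

    def total_replicas(self) -> int:
        return sum(c.replicas for c in self.components)


# Illustrative application: components freely combine different resource kinds.
kafka = ApplicationSpec(
    name="kafka-demo",
    components=[
        ComponentSpec("broker", "PodWithLocalVolume", 3),
        ComponentSpec("zookeeper", "StatefulSet", 3),
        ComponentSpec("mirror", "Deployment", 2),
    ],
)
kafka.validate()
```

The spec only states what should exist; turning it into running resources would be the operator's job.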
11.FEATURES – Deploy Pattern 3. Cross Kubernetes Clusters Deployment and Management
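One simple way to think about cross-cluster deployment is as a placement step that splits an application's replicas across several Kubernetes clusters. The slides do not describe the framework's placement policy, so the even-spread rule and the name `spread_replicas` below are assumptions made purely for illustration.

```python
from typing import Dict, List


def spread_replicas(replicas: int, clusters: List[str]) -> Dict[str, int]:
    """Split replicas as evenly as possible; earlier clusters get the remainder."""
    if not clusters:
        raise ValueError("at least one cluster is required")
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    base, extra = divmod(replicas, len(clusters))
    return {
        cluster: base + (1 if index < extra else 0)
        for index, cluster in enumerate(clusters)
    }


# Hypothetical cluster names.
plan = spread_replicas(7, ["cluster-a", "cluster-b", "cluster-c"])
print(plan)  # {'cluster-a': 3, 'cluster-b': 2, 'cluster-c': 2}
```

A real policy would likely also weigh capacity, locality and failure domains; this sketch ignores all of that.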
12. FEATURES – Lifecycle Management 1. On-demand management functions to reduce maintenance effort
Provision / Decommission: REST API and Kubernetes-native API for one-step creation or deletion of a cluster
Auto Remediation: If one node is down, the WISB flow is automatically triggered to bring back the missing node
Scaling: The cluster can flex up and flex down on demand
Rolling Restart / Upgrade: Upgrade the cluster to a new version
Configuration Management: Modify the application configuration parameters and apply the update to the cluster
Replacement: Replace a bad or low-performance node
WISB Management Workflow: • Declarative with simple syntax via YAML, easy to modify and maintain • Decompose complex logic into idempotent, reusable and retryable tasks • Design for failure • Reusable and parallel subflows • Special design for group operation
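The workflow properties above (idempotent, retryable tasks and a remediation flow that brings back a missing node) can be sketched in a few lines of Python. This is not the framework's engine or API: the state model, `run_with_retry`, `remediation_flow` and the node names are all hypothetical, and the "cluster" is just an in-memory set.

```python
from typing import Callable, Dict, List, Set

State = Dict[str, Set[str]]          # hypothetical cluster state: {"nodes": {...}}
Task = Callable[[State], None]


def run_with_retry(task: Task, state: State, max_attempts: int = 3) -> int:
    """Run a task, retrying on RuntimeError; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            task(state)
            return attempt
        except RuntimeError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


def make_add_node(name: str) -> Task:
    def add_node(state: State) -> None:
        # Idempotent: adding a node that already exists changes nothing.
        state["nodes"].add(name)
    return add_node


def make_flaky(task: Task, failures: int) -> Task:
    """Wrap a task so it fails a fixed number of times before succeeding."""
    remaining = {"count": failures}

    def flaky(state: State) -> None:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError("transient failure")
        task(state)
    return flaky


def remediation_flow(desired: Set[str], state: State) -> List[Task]:
    """Build one add-node task per node that should exist but is missing."""
    return [make_add_node(name) for name in sorted(desired - state["nodes"])]


def run_flow(tasks: List[Task], state: State) -> None:
    for task in tasks:
        run_with_retry(task, state)


desired = {"node-0", "node-1", "node-2"}
state: State = {"nodes": {"node-0", "node-2"}}   # node-1 is down
run_flow(remediation_flow(desired, state), state)
```

Because each task is idempotent, rerunning the whole flow after a partial failure is safe, which is what makes blanket retries a reasonable default.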
13.FEATURES – Lifecycle Management 2. Reusable common flows and customized WISB flows
14. FEATURES – Reliability
Health Check: • Scheduled trigger of the HealthCheck flow to check app status • Send notification by email or Slack if anything unhealthy is detected • Alerting (disk health / disk usage)
Sidecar Agent: • Proxy admin actions and permission control • View logs and collect metrics • Secret rotation
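A health-check flow of this kind can be pictured as: collect each node's status, list the problems, and hand each one to a notifier. The sketch below assumes a hypothetical `NodeStatus` record, an arbitrary 85% disk-usage threshold, and a plain callable standing in for the email or chat notifier; none of these come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class NodeStatus:
    name: str
    app_running: bool
    disk_usage: float   # fraction of disk used, 0.0 to 1.0


def find_problems(nodes: List[NodeStatus], disk_threshold: float = 0.85) -> List[str]:
    """Return one human-readable message per detected problem."""
    problems = []
    for node in nodes:
        if not node.app_running:
            problems.append(f"{node.name}: application is not running")
        if node.disk_usage >= disk_threshold:
            problems.append(
                f"{node.name}: disk usage {node.disk_usage:.0%} "
                f"is at or above {disk_threshold:.0%}"
            )
    return problems


def health_check_flow(nodes: List[NodeStatus], notify: Callable[[str], None]) -> bool:
    """Notify on every problem; return True only if the cluster is healthy."""
    problems = find_problems(nodes)
    for message in problems:
        notify(message)
    return not problems


sent: List[str] = []   # stand-in for an email or chat channel
nodes = [
    NodeStatus("broker-0", app_running=True, disk_usage=0.40),
    NodeStatus("broker-1", app_running=False, disk_usage=0.50),
    NodeStatus("broker-2", app_running=True, disk_usage=0.91),
]
healthy = health_check_flow(nodes, sent.append)
```

Passing the notifier in as a callable keeps the check itself independent of the delivery channel, so a scheduler could invoke the same flow with different notifiers.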
15. FEATURES – Security
Authentication: Keystone authentication and integration with LDAP
RBAC: Grant the CRU permission on a specific namespace; only users who have the CRU permission can manage the cluster
Non-root user: security context / setcap
Standard approval process: sudo action / human approval / trace system
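The namespace-scoped RBAC rule can be illustrated with a small in-memory check. The slide does not define "CRU"; reading it as create/read/update is an assumption, and the `NamespaceRBAC` class and the user and namespace names are hypothetical, not the framework's or Kubernetes' API.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumption: "CRU" on the slide means create, read and update.
CRU: Set[str] = {"create", "read", "update"}


class NamespaceRBAC:
    """Hypothetical in-memory grant table keyed by (user, namespace)."""

    def __init__(self) -> None:
        self._grants: Dict[Tuple[str, str], Set[str]] = {}

    def grant(self, user: str, namespace: str, verbs: Iterable[str]) -> None:
        self._grants.setdefault((user, namespace), set()).update(verbs)

    def can_manage(self, user: str, namespace: str) -> bool:
        # Managing a cluster requires the full CRU set in that namespace.
        return CRU <= self._grants.get((user, namespace), set())


rbac = NamespaceRBAC()
rbac.grant("alice", "kafka-prod", CRU)
rbac.grant("bob", "kafka-prod", {"read"})
```

Here `alice` can manage clusters in `kafka-prod` only; a grant in one namespace gives no rights in another.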
16.Conclusion Operator Framework: We use the operator pattern together with a workflow engine to build a framework that makes it easier and more agile to manage multi-component, geo-distributed stateful workloads.
17.Thanks!