The Highs and Lows of Running a Distributed Database on Kubernetes

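In recent years, as modern organizations have rapidly adopted containers, stateful applications such as databases have proven harder than other workloads to move into this brave new world. When persistent state is involved, more is required of both the container orchestration system and the stateful application itself to ensure that data stays durable and available.
This talk covers my experience running CockroachDB (an open source distributed SQL database) reliably on Kubernetes, optimizing its performance, and helping others do the same in their heterogeneous environments. We will look at which kinds of stateful applications are easiest to run in containers, which Kubernetes features and usage patterns are most helpful for running them, and the many pitfalls encountered along the way. Finally, we will consider what is still missing for running databases in containers and what the future may bring.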


1. The Highs and Lows of Running a Distributed Database on Kubernetes
Presented by Alex Robinson / Systems Engineer @alexwritescode

2. Databases are critical to the applications that use them

3. You need to be very careful when making big changes to your database

4. Containers are a huge change

5. To succeed, you must:

6. To succeed, you must:
  1. Understand your database

7. To succeed, you must:
  1. Understand your database
  2. Understand your orchestration system

8. To succeed, you must:
  1. Understand your database
  2. Understand your orchestration system
  3. Plan for the worst

9. Let’s talk about databases in Kubernetes
• Why would you even want to run databases in Kubernetes?
• What do databases need to run reliably?
• What should you know about your orchestration system?
• What’s likely to go wrong and what can you do about it?

10. My experience with databases and containers
• Worked directly on Kubernetes and GKE from 2014-2016
  ○ Part of the original team that launched GKE
• Led all container-related efforts for CockroachDB from 2016-2019
  ○ Configurations for Kubernetes, DC/OS, Docker Swarm, even Cloud Foundry
  ○ AWS, GCP, Azure, On-Prem
  ○ From single availability zone deployments to multi-region
  ○ Helped users deploy and troubleshoot their custom setups

11. These days...

12. Why even bother? We’ve been operating databases for decades

13. Traditional management of databases
  1. Provision one or more beefy machines with large/fast disks
  2. Copy binaries and configuration onto machines
  3. Run binaries with provided configuration
  4. Never change anything unless absolutely necessary

14. Traditional management of databases
• Pros
  ○ Stable, predictable, understandable
• Cons
  ○ Most management is manual, especially to scale or recover from hardware failures
    ■ And that manual intervention may not be very well practiced

15. So why move state into Kubernetes?
• The same reasons you’d move stateless applications to Kubernetes
  ○ Automated deployment, scheduling, resource isolation, scalability, failure recovery, rolling upgrades
    ■ Less manual toil, less room for operator error
• Avoid separate workflows for stateless vs stateful applications

16. Challenges of managing state: “Understand your database”

17. What do stateful systems need?

18. What do stateful systems need?
• Process management
• Persistent storage

19. What do stateful systems need?
• Process management
• Persistent storage
• If distributed, also:
  ○ Network connectivity
  ○ Consistent name/address
  ○ Peer discovery

20. Managing State on Kubernetes: “Understand your orchestration system”

21. Let’s skip over the basics
• Unless you want to manually pin pods to nodes, you should use either:
  ○ StatefulSet:
    ■ decouples replicas from nodes
    ■ persistent address for each replica, DNS-based peer discovery
    ■ network-attached storage instance associated with each replica
  ○ DaemonSet:
    ■ pin one replica to each node
    ■ use node’s disk(s)
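
As a rough illustration of the "persistent address" and "DNS-based peer discovery" points: each StatefulSet replica gets a stable DNS name of the form `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>` through its governing headless Service. The Python sketch below only builds those names. The StatefulSet, Service, and namespace names are placeholders rather than values from the talk, and `cluster.local` is assumed as the cluster domain; 26257 is CockroachDB's default port.

```python
def peer_addresses(statefulset, service, namespace, replicas, port,
                   cluster_domain="cluster.local"):
    """Return the stable DNS address of every replica in a StatefulSet.

    Pod names are <statefulset>-<ordinal>, with ordinals starting at 0,
    and each pod is resolvable under the governing headless Service.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}:{port}"
        for ordinal in range(replicas)
    ]


# Placeholder names (assumptions, not from the talk).
peers = peer_addresses("cockroachdb", "cockroachdb", "default", 3, 26257)

# A distributed database can be handed this list at startup so that every
# replica can find the others, however the pods get rescheduled.
print("--join=" + ",".join(peers))
```

Because the names depend only on the replica's ordinal and not on the node it lands on, the list stays valid when a pod is rescheduled.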

22. Where do things go wrong?

23.

24. Don’t trust the defaults!
• If you don’t specifically ask for persistent storage, you won’t get any
  ○ Always think about and specify where your data will live

25. Don’t trust the defaults!
• If you don’t specifically ask for persistent storage, you won’t get any
  ○ Always think about and specify where your data will live
    1. Data in container
    2. Data on host filesystem
    3. Data in network storage
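
To make the three options concrete, here is a minimal Python sketch of what a pod spec would declare for each one, written as plain dicts that mirror the Kubernetes volume fields `hostPath` and `persistentVolumeClaim`. The function name, the volume name `datadir`, and the host path are hypothetical and chosen only for illustration.

```python
import json


def data_volume(location, name="datadir"):
    """Return the pod-spec volume entry for one of the three data locations.

    Returns None for "container": with no volume declared, data lives in the
    container's own writable layer and is lost when the container is replaced.
    """
    if location == "container":
        return None
    if location == "host":
        # Data survives container restarts but is tied to this one node.
        # The path is a made-up example.
        return {"name": name, "hostPath": {"path": "/mnt/disks/ssd0"}}
    if location == "network":
        # Data lives in a PersistentVolume bound to the named claim and can
        # follow the pod to another node.
        return {"name": name, "persistentVolumeClaim": {"claimName": name}}
    raise ValueError(f"unknown location: {location!r}")


for location in ("container", "host", "network"):
    print(location, "->", json.dumps(data_volume(location)))
```

The first case is the default the slide warns about: nothing in the pod spec mentions storage, so nothing persists.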

26. Ask for a dynamically provisioned PersistentVolume
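
The manifest shown on the original slide is not part of this transcript, so the following is only a sketch of what such a request can look like: a PersistentVolumeClaim built as a Python dict and printed as JSON, which `kubectl apply -f` accepts alongside YAML. The claim name and size are assumptions. When `storageClassName` is left out, the cluster's default StorageClass (if one is configured) decides what kind of disk gets provisioned.

```python
import json


def persistent_volume_claim(name, size, storage_class=None):
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    claim = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # One node mounts the volume read-write, the usual mode for a
            # database replica's data directory.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }
    if storage_class is not None:
        # Without this field the cluster's default StorageClass is used.
        claim["spec"]["storageClassName"] = storage_class
    return claim


# Hypothetical name and size, for illustration only.
print(json.dumps(persistent_volume_claim("datadir", "100Gi"), indent=2))
```

In a StatefulSet the same `spec` would typically go under `volumeClaimTemplates`, so that each replica gets its own claim.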

27. Don’t trust the defaults!
• Now your data is persistent
• But how’s performance?

28. Don’t trust the defaults!
• If you don’t create and request your own StorageClass, you’re probably getting slow disks
  ○ Default on GCE is non-SSD (pd-standard)
  ○ Default on Azure is non-SSD (non-managed blob storage)
  ○ Default on AWS is gp2, which is backed by SSDs but offers fewer IOPS than io2
• This really affects database performance

29. Use a custom StorageClass
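
The slide's own manifest is not included in this transcript. As a hedged sketch, a custom SSD-backed StorageClass for GCE could be built like this, again as a Python dict printed as JSON. The class name `fast-ssd` is an assumption; `kubernetes.io/gce-pd` is the in-tree GCE persistent disk provisioner and `pd-ssd` its SSD disk type. Newer clusters may use a CSI driver with a different provisioner name, so check what your cluster actually runs.

```python
import json


def storage_class(name, provisioner, parameters):
    """Build a StorageClass manifest as a plain dict."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": parameters,
    }


# "fast-ssd" is a hypothetical name; claims opt in to this class by setting
# storageClassName to the same value.
ssd = storage_class("fast-ssd", "kubernetes.io/gce-pd", {"type": "pd-ssd"})
print(json.dumps(ssd, indent=2))
```

A claim that names this class asks for SSD-backed disks explicitly instead of inheriting whatever the cluster default happens to be.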