The Highs and Lows of Running a Distributed Database on Kubernetes
1 .The Highs and Lows of Running a Distributed Database on Kubernetes Presented by Alex Robinson / Systems Engineer @alexwritescode
2 .Databases are critical to the applications that use them
3 . You need to be very careful when making big changes to your database
4 .Containers are a huge change
5 .To succeed, you must:
6 . To succeed, you must: 1. Understand your database
7 . To succeed, you must: 1. Understand your database 2. Understand your orchestration system
8 . To succeed, you must: 1. Understand your database 2. Understand your orchestration system 3. Plan for the worst
9 .Let’s talk about databases in Kubernetes
• Why would you even want to run databases in Kubernetes?
• What do databases need to run reliably?
• What should you know about your orchestration system?
• What’s likely to go wrong and what can you do about it?
10 .My experience with databases and containers
• Worked directly on Kubernetes and GKE from 2014-2016
  ○ Part of the original team that launched GKE
• Led all container-related efforts for CockroachDB from 2016-2019
  ○ Configurations for Kubernetes, DC/OS, Docker Swarm, even Cloud Foundry
  ○ AWS, GCP, Azure, On-Prem
  ○ From single availability zone deployments to multi-region
  ○ Helped users deploy and troubleshoot their custom setups
11 .These days...
12 .Why even bother? We’ve been operating databases for decades
13 .Traditional management of databases
1. Provision one or more beefy machines with large/fast disks
2. Copy binaries and configuration onto machines
3. Run binaries with provided configuration
4. Never change anything unless absolutely necessary
14 .Traditional management of databases
• Pros
  ○ Stable, predictable, understandable
• Cons
  ○ Most management is manual, especially to scale or recover from hardware failures
    ■ And that manual intervention may not be very well practiced
15 .So why move state into Kubernetes?
• The same reasons you’d move stateless applications to Kubernetes
  ○ Automated deployment, scheduling, resource isolation, scalability, failure recovery, rolling upgrades
    ■ Less manual toil, less room for operator error
• Avoid separate workflows for stateless vs stateful applications
16 .Challenges of managing state “Understand your databases”
17 .What do stateful systems need?
18 .What do stateful systems need? • Process management • Persistent storage
19 .What do stateful systems need?
• Process management
• Persistent storage
• If distributed, also:
  ○ Network connectivity
  ○ Consistent name/address
  ○ Peer discovery
20 .Managing State on Kubernetes “Understand your orchestration system”
21 .Let’s skip over the basics
• Unless you want to manually pin pods to nodes, you should use either:
  ○ StatefulSet:
    ■ decouples replicas from nodes
    ■ persistent address for each replica, DNS-based peer discovery
    ■ network-attached storage instance associated with each replica
  ○ DaemonSet:
    ■ pin one replica to each node
    ■ use node’s disk(s)
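As a rough sketch of the StatefulSet option described above, the manifest below pairs a headless Service (for per-replica DNS) with a three-replica StatefulSet. All names, the image, and the port are hypothetical placeholders, not from the talk:

```yaml
# Headless Service: gives each replica a stable DNS name
# (mydb-0.mydb, mydb-1.mydb, ...) usable for peer discovery.
apiVersion: v1
kind: Service
metadata:
  name: mydb
spec:
  clusterIP: None        # "headless": no load-balanced virtual IP
  selector:
    app: mydb
  ports:
  - port: 26257          # hypothetical database port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb
spec:
  serviceName: mydb      # ties replicas to the headless Service above
  replicas: 3
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
      - name: mydb
        image: mydb:latest   # placeholder image
```

Unlike a Deployment, each replica keeps its ordinal identity (mydb-0, mydb-1, ...) across rescheduling, which is what makes DNS-based peer discovery workable.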
22 .Where do things go wrong?
24 .Don’t trust the defaults!
• If you don’t specifically ask for persistent storage, you won’t get any
  ○ Always think about and specify where your data will live
25 .Don’t trust the defaults!
• If you don’t specifically ask for persistent storage, you won’t get any
  ○ Always think about and specify where your data will live
    1. Data in container
    2. Data on host filesystem
    3. Data in network storage
26 .Ask for a dynamically provisioned PersistentVolume
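In a StatefulSet, the usual way to ask for dynamically provisioned storage is `volumeClaimTemplates`: the controller stamps out one PersistentVolumeClaim per replica, and the cluster's provisioner binds each claim to a freshly created PersistentVolume that survives pod rescheduling. A fragment of a StatefulSet spec (the `datadir` name, size, and mount path are illustrative placeholders):

```yaml
# Fragment of a StatefulSet .spec — not a standalone manifest.
  template:
    spec:
      containers:
      - name: mydb
        image: mydb:latest       # placeholder image
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/mydb   # placeholder data directory
  volumeClaimTemplates:          # one PVC per replica, e.g. datadir-mydb-0
  - metadata:
      name: datadir
    spec:
      accessModes: ["ReadWriteOnce"]   # attached to one node at a time
      resources:
        requests:
          storage: 100Gi
```

Because the claims outlive the pods, a replica rescheduled to another node reattaches to its same volume rather than starting empty.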
27 .Don’t trust the defaults! • Now your data is persistent • But how’s performance?
28 .Don’t trust the defaults!
• If you don’t create and request your own StorageClass, you’re probably getting slow disks
  ○ Default on GCE is non-SSD (pd-standard)
  ○ Default on Azure is non-SSD (non-managed blob storage)
  ○ Default on AWS is gp2, which is backed by SSDs but with fewer IOPS than io2
• This really affects database performance
29 .Use a custom StorageClass
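A minimal sketch of a custom StorageClass that requests SSDs instead of the slow defaults above, using the GCE persistent-disk provisioner (the `fast-ssd` name is a placeholder; the `type` parameter is provisioner-specific, so the equivalent value differs on AWS or Azure):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/gce-pd   # GCE persistent disks
parameters:
  type: pd-ssd                      # SSD instead of the default pd-standard
```

Claims then opt in with `storageClassName: fast-ssd` in their spec; alternatively, the class can be annotated with `storageclass.kubernetes.io/is-default-class: "true"` so that claims which name no class get SSDs by default.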