Stop Hitting Yourself!

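Software should be easy to use, and it should try to stop you from hurting yourself. Much of today's successful software ships with simple, easy-to-use defaults that make getting started extremely easy. The problem is that the "getting started experience" often turns into "production" before the user or the application is really ready. This talk covers common mistakes users run into when going to production with Kubernetes and offers actionable solutions so that people stop making them. I will cover resource limits/quotas, pod disruption budgets, affinity, upgrade strategies, readiness probes, logging, and monitoring, and explain why you absolutely need them before any application can be considered truly production ready. I will also talk about why I think software is this way, and what I think people can do to make life better for everyone.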

1.Stop hitting yourself! Michael Russell

2.“Computers do what you say, software should do what you mean.” - Me

3.“Software is easy to use but hard to run.” - Also me

4.About me
- Australian living in the Netherlands
- Works at Elastic
- Responds to: Michael, Mick, Micky, Mike, Mikey, Crazybus, Rusty
- Likes: Food, travelling, gaming

5.Stop hitting yourself!

6.Disclaimer
- A lot of this talk is going to come off pretty negative
- I actually do like Kubernetes
- I didn’t get paid to say that
- I didn’t get paid at all actually

7.Restore
- Dev: “Are you busy?” (yes)
- Dev: “We need to restore the production database.” (Oh noes)
- Me: “Do we have backups?”
- Dev: “Yeah there is a bash script and a cronjob somewhere.”

8.Bash bashing
    #!/bin/bash
    # Author: Some dude
    #
    # Changelog
    # 1995/05/23 - Some dude - Added mysql backups for important company database
    mysqldump --all-databases > /mnt/backup/$(date +%s).sql

9.20 years later...
    #!/bin/bash
    # Author: Some dude
    #
    # Changelog
    # 1995/05/23 - Some dude - Added mysql backups for important company database
    # 1995/05/27 - Some dude - Added debug logging and log backup duration
    # 1998/02/13 - New dude - Create separate dumps per backups
    # 2010/09/03 - Ops dude - Add set -e to make sure backup actually fails
    # 2010/09/15 - Dev dude - Compress backups to save space

10.Looks good to me!
    # important don't remove!!!
    set -e
    echo "Started backup"
    start=`date +%s`
    timestamp=$(date +%s)
    for table in $(echo "use jobsite_db; show tables;" | mysql -uroot -psupersecretpassword); do
      echo "Backing up table: ${table}"
      mysqldump jobsite_db ${table} | gzip > /mnt/backup/${timestamp}/${table}.sql.gz
      echo "Finished backing up table: ${table}"
    done
    end=`date +%s`
    runtime=$((end-start))
    echo "Backup successful!!! Finished in ${runtime} seconds!"

11.This is fine
- No table locking
- Need to restore the tables in the reverse order
- Need to write a restore script
- Only alerts when something went wrong

12.Things that email is good at
1. Humans sending a message to another human who already knows the original human
2. Saving articles for yourself that you plan on reading later

13.Things that email is not so good at
- Monitoring alerts
- Computers contacting humans
- Alerting the fire brigade

14.Wow, it actually worked
- Restore worked perfectly first go
- Site started up
- Site actually worked

15.‘Worked’ is a loaded term
- The exit code of my restore script was 0
- Stuff was on the website
- Stuff was in the database
- There were actually only 8 jobs online

16.OK, let’s go back in time
- git blame: “initial commit from svn”
- “Maybe it was just a bad backup? Let’s look at yesterday’s backup.”

17.Hmm...
    $ du -hs /mnt/backups/*/jobs.tar.gz
    1.0M    1539761034/jobs.tar.gz
    500.3M  1539674634/jobs.tar.gz
    183.5M  1539588234/jobs.tar.gz
    302.3M  1539501834/jobs.tar.gz
    10.7M   1539415434/jobs.tar.gz
    485.9M  1539329034/jobs.tar.gz
    56.3M   1539242634/jobs.tar.gz
    152.3M  1539156234/jobs.tar.gz
    501.3M  1539069834/jobs.tar.gz
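Sizes that swing this much from one day to the next suggest that some dumps stopped part-way. As a minimal sketch of one way to catch that, the check below looks for the `-- Dump completed` trailer that mysqldump writes at the end of a finished dump. It assumes the dumps were produced by mysqldump with comments enabled (the default); `dump_is_complete` and the stand-in files are hypothetical names made up for this illustration, not part of the original script.

```shell
#!/usr/bin/env bash
# dump_is_complete is a hypothetical helper, not part of MySQL.
# Assumption: dumps come from mysqldump with comments enabled (the default),
# so a finished dump ends with a "-- Dump completed" line.
dump_is_complete() {
  # $1: path to a gzip-compressed dump
  gzip -dc "$1" 2>/dev/null | tail -n 5 | grep -q '^-- Dump completed'
}

# Demo with two stand-in files created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'INSERT INTO jobs VALUES (1);\n-- Dump completed on 2018-10-17  9:23:54\n' \
  | gzip > "${tmpdir}/good.sql.gz"
printf 'INSERT INTO jobs VALUES (1);\nINSERT INTO jobs VAL' \
  | gzip > "${tmpdir}/truncated.sql.gz"

good_rc=0
dump_is_complete "${tmpdir}/good.sql.gz" || good_rc=$?
truncated_rc=0
dump_is_complete "${tmpdir}/truncated.sql.gz" || truncated_rc=$?

echo "good dump:      rc=${good_rc}"
echo "truncated dump: rc=${truncated_rc}"
rm -rf "${tmpdir}"
```

Note that both stand-in files are valid gzip archives, so a plain `gzip -t` would accept the truncated one as well; the check has to look at the content, not just the container.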

18.Let’s take a closer look
    # important don't remove!!!
    set -e
    echo "Started backup"
    start=`date +%s`
    timestamp=$(date +%s)
    for table in $(echo "use jobsite_db; show tables;" | mysql -uroot -psupersecretpassword); do
      echo "Backing up table: ${table}"
      mysqldump jobsite_db ${table} | gzip > /mnt/backup/${timestamp}/${table}.sql.gz
      echo "Finished backing up table: ${table}"
    done
    end=`date +%s`
    runtime=$((end-start))
    echo "Backup successful!!! Finished in ${runtime} seconds!"

19.Changelog
    #!/bin/bash
    # Author: Some dude
    #
    # Changelog
    # 1995/05/23 - Some dude - Added mysql backups for important company database
    # 1995/05/27 - Some dude - Added debug logging and log backup duration
    # 1998/02/13 - New dude - Create separate dumps per backups
    # 2010/09/03 - Ops dude - Add set -e to make sure backup actually fails
    # 2010/09/15 - Dev dude - Compress backups to save space

20.But, you were supposed to help me?
- It’s OK if every part of the script failed except for the last line: default
- Do you care if any of them failed or just the last one? set -e
- OK, but do you care if parts of each line failed or just the last one? set -o pipefail
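The three settings above can be compared side by side. In this small sketch, `false` stands in for a failing mysqldump and `gzip` for the compressor; `run_case` is a hypothetical helper written for this illustration. Each case runs in its own `bash -c` process so the options do not leak between cases.

```shell
#!/usr/bin/env bash
# run_case is a hypothetical helper for this demo.
# $1: shell options to enable before the pipeline (may be empty)
run_case() {
  bash -c "$1
false | gzip > /dev/null
echo 'Backup successful!!!'"
}

# Default: only the last line decides the exit code.
default_rc=0
default_out=$(run_case '') || default_rc=$?

# set -e: a failing line stops the script, but a pipeline only counts
# as failed if its LAST command (gzip) fails.
errexit_rc=0
errexit_out=$(run_case 'set -e') || errexit_rc=$?

# set -e -o pipefail: any failing command in the pipeline fails the line.
pipefail_rc=0
pipefail_out=$(run_case 'set -e -o pipefail') || pipefail_rc=$?

echo "default:            rc=${default_rc} output='${default_out}'"
echo "set -e:             rc=${errexit_rc} output='${errexit_out}'"
echo "set -e -o pipefail: rc=${pipefail_rc} output='${pipefail_out}'"
```

Only the last case stops before printing the success message and exits non-zero; with `set -e` alone the failed dump is still reported as a successful backup.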

21.“Software is easy to use but hard to run.” - Me again

22.“Data is easy to back up, but hard to restore.” - Always me

23.Times have changed
- OK, this was 1995...
- And it was bash
- So things have improved, right?

24.Hello Docker
    $ docker run hello-world
    Hello from Docker!
    This message shows that your installation appears to be working correctly.
    To generate this message, Docker took the following steps:
     1. The Docker client contacted the Docker daemon.
     2. The Docker daemon pulled the "hello-world" image from the Docker Hub. (amd64)
     3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
     4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.

25.“The Docker daemon pulled the "hello-world" image from the Docker Hub.” - Docker daemon (in haiku)

26.What actually happened
- The Docker daemon checked if the image hello-world with the tag latest was already downloaded
- It already existed, so it actually did nothing and didn’t connect to Docker Hub

27.What didn’t happen
- The Docker daemon didn’t check that this was the right image from the Docker Hub
- It didn’t download docker.io/library/hello-world
- It didn’t download the latest latest

28.Nit picking?
- latest is the default tag for the client - it doesn’t have to be latest, or updated or even exist in the registry
- If you already have some version of this tag it won’t pull a new one
- Your latest may not be my latest or even exist on Docker Hub anymore
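One way to act on this is to refuse image references that silently fall back to latest. The sketch below is only an illustration: `is_pinned_image` is a hypothetical helper, not part of Docker, and it only inspects the reference string without contacting any registry. It treats a reference as pinned when it carries a digest or an explicit tag other than latest.

```shell
#!/usr/bin/env bash
# is_pinned_image is a hypothetical helper, not part of Docker.
# Returns 0 if the reference has a digest or an explicit non-latest tag.
is_pinned_image() {
  local ref=$1
  case "$ref" in
    *@sha256:*) return 0 ;;   # pinned by content digest
  esac
  # Look only at the last path component so that a registry port
  # (for example localhost:5000/rabbitmq) is not mistaken for a tag.
  local last=${ref##*/}
  case "$last" in
    *:latest) return 1 ;;     # explicit latest
    *:*)      return 0 ;;     # explicit version tag
    *)        return 1 ;;     # no tag: the client defaults to latest
  esac
}

for ref in rabbitmq rabbitmq:latest rabbitmq:2.8.7 localhost:5000/rabbitmq; do
  if is_pinned_image "$ref"; then
    echo "pinned:     $ref"
  else
    echo "not pinned: $ref"
  fi
done
```

A version tag such as `rabbitmq:2.8.7` is more specific than latest, but a tag can still be re-pushed in the registry; only a digest reference identifies exact image content.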

29.Down the rabbit hole
- 3 servers, 3 rabbitmq nodes running in Docker with image “rabbitmq”
- All running 2.8.7 at the time
- Updates happened to the rabbitmq docker image
- No updates happened to our setup
- New server was added to the pool
- Yay, we just upgraded to version 3.0.0!