Highway to Hell or Stairway to Cloud

At Zalando we run PostgreSQL at scale: a few hundred database clusters, ranging in size from a few megabytes to more than ten terabytes. What could be a bigger challenge than operating a write-heavy, multi-terabyte OLTP PostgreSQL cluster? Migrating such a cluster from a bare-metal data-center environment to AWS.

There were plenty of problems to solve and questions to answer:

Which instance type to choose: i3 with ephemeral volumes, or m4/r4 with EBS volumes? Should we give Amazon Aurora a try?
There is no direct connection between AWS and the data center. If a VPN is not an option, how do we build a replica on AWS and keep it in sync? A few hundred employees use this database for ad-hoc queries; ideally, they should retain access through the old connection URL.
How do we back up such a huge database on AWS? If something goes wrong, we must be able to switch back to the data center.

In this talk I will explain in detail how we successfully solved all of these problems.


1. Highway to Hell or Stairway to Cloud? Percona Live 2018, Frankfurt ALEXANDER KUKUSHKIN 06-11-2018

2. ABOUT ME
Alexander Kukushkin, Database Engineer @ZalandoTech, the Patroni guy
alexander.kukushkin@zalando.de, Twitter: @cyberdemn

3. WE BRING FASHION TO PEOPLE IN 17 COUNTRIES
● 17 markets
● 7 fulfillment centers
● 23 million active customers
● 4.5 billion € net sales in 2017
● 200 million visits per month
● 15,000 employees in Europe

4. FACTS & FIGURES
● > 300 databases on premise
● > 650 clusters in the Cloud (AWS)

5. AGENDA
● About the old setup
● Choosing your cloud options
● Retain access & make it secure
● Data migration & switchover
● Backup & recovery
● Conclusions

6. The old setup (data center)
● Provisioned in 2015
● DELL PowerEdge R730xd
● 2 × Intel Xeon E5-2667 v3 (16 cores)
● 256 GB RAM
● 14 × 1.5 TB SSD in RAID 10 (10.5 TB)
● Network: 2 × 10 GBit/s
● PostgreSQL 9.6
(Diagram: apps app1–app3 connect through virtual IPs to the primary and to a replica.)

7. Under the hood
● 3000 tables
  ○ two tables per event
    ■ hot data (last 10 days)
    ■ archived data
  ○ no primary/unique keys!
● About 100 million inserts/day
● Size (before the migration): 10 TB
● Average growth: 2 TB per year

8. Upgrade or migrate? Free space left: 500 GB

9. Migrate it!
● Minimize costs (cloud isn’t cheap)
● How to switch back to the data center if something goes wrong?
● How to retain access through the old connection URL?
● Make it secure
● Minimal downtime

10. AGENDA (repeated): next up, Choosing your cloud options

11. Candidates
● Amazon Aurora
● DIY
  ○ i3 instances
  ○ EBS-backed instances
    ■ gp2
    ■ io1

12. Amazon Aurora
PROS:
● AWS promises decent performance
● Storage auto-scaling
● Price for storage is the same as for gp2 EBS, $0.119/GB-month
  ○ all instances share the same storage!
CONS:
● $0.22 per 1 million I/O requests
● plproxy extension is not available

13. i3 instances
PROS:
● Local NVMe volumes:
  ○ low latency
  ○ high bandwidth and throughput
● Low storage price
● 488 GB RAM
CONS:
● Ephemeral volumes
  ○ minimum 3 instances for HA
● The biggest instance has “only” 15 TB

14. EBS-backed instances (m4/r4)
PROS:
● Data on EBS survives instance restart
● Easy to scale up or down
● Makes it possible to run only two instances
CONS:
● I/O latencies
● Limited IOPS and bandwidth per volume:
  ○ gp2: 160 MB/s, 10000 IOPS
  ○ io1: 500 MB/s, 32000 IOPS
● Price per GB (compared with i3)

15. gp2 vs io1 (benchmark chart comparing 10000 IOPS and 30000 IOPS)

16. Do benchmarks
● Cloud makes it very easy to conduct experiments
● Apply load similar to production
  ○ ideally, replicate the production workload
● Use Spot instances to make it cheaper
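A production-like load test can be as simple as a pgbench run against the candidate instance. The sketch below is a minimal example; the hostname, user, database, and scale factor are placeholders, not values from the talk:

```shell
# Initialize a large pgbench dataset (scale 10000 is roughly 150 GB);
# host/user/database are placeholders for the candidate test instance.
pgbench -i -s 10000 -h test-instance.example.internal -U postgres postgres

# Run a 30-minute read/write benchmark with production-like concurrency,
# printing progress every 60 seconds.
pgbench -c 64 -j 8 -T 1800 -P 60 \
    -h test-instance.example.internal -U postgres postgres
```

Replaying the actual production workload (e.g. captured statement logs) gives more realistic numbers than pgbench’s synthetic TPC-B-like load, but pgbench is a quick first filter for instance types.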

17. It’s all about the money (and risks): cost comparison chart, single instance vs. HA cluster

18. The cloud setup
● r4.8xlarge
  ○ 32 vCPU cores
  ○ 244 GB RAM
  ○ 37500 IOPS
  ○ 875 MB/s
● 20 TB EBS gp2
  ○ 6 × 3333 GB, RAID 0
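Striping six gp2 volumes into one RAID 0 array is what lifts the per-volume IOPS/bandwidth limits to the instance-level ones. A minimal sketch with mdadm; the device names and mount point are assumptions, not from the talk:

```shell
# Stripe six 3333 GB EBS volumes into a single ~20 TB RAID 0 array.
mdadm --create /dev/md0 --level=0 --raid-devices=6 \
    /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde /dev/xvdf /dev/xvdg

mkfs.ext4 /dev/md0                          # create the filesystem
mount /dev/md0 /home/postgres/pgdata        # mount it as the data directory
mdadm --detail --scan >> /etc/mdadm.conf    # persist the array definition
```

RAID 0 offers no redundancy, which is acceptable here because durability comes from the standby and the S3 backups, not from the local array.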

19. AGENDA (repeated): next up, Retain access & make it secure

20. How to retain access via the old conn_url?
● Possible options:
  ○ DNS
  ○ “proxy” (iptables/HAProxy/pgbouncer)
● Think about security:
  ○ Internet traffic MUST be encrypted!
  ○ some of the legacy applications are not using SSL
    ■ nobody wants to fix legacy code :(
  ○ how to protect against a man-in-the-middle attack?

21. Pgbouncer to the rescue
(Diagram: apps app1–app3 in the data center keep connecting to the old primary and replica virtual IPs, which now point to pgbouncer instances; each pgbouncer forwards over SSL to port 5432 of the primary or replica inside the AWS cluster security group.)

22. Make it secure
● Set up a CA
● Generate server and client keys
● Sign server and client certs with the CA private key
● Postgres must validate the client certificate from pgbouncer
● Pgbouncer must validate the Postgres server certificate
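The CA and certificate steps above can be sketched with plain openssl commands. The subject names (`migration-ca`, `postgres.example.com`, `pgbouncer`) and validity periods are illustrative assumptions, not values from the talk:

```shell
# 1. Set up a CA: a self-signed certificate both sides will trust
#    (-nodes leaves the private key unencrypted for non-interactive use).
openssl req -new -x509 -days 3650 -nodes -subj "/CN=migration-ca" \
    -keyout ca.key -out ca.crt

# 2. Generate the Postgres server key and CSR, sign the cert with the CA key.
openssl req -new -nodes -subj "/CN=postgres.example.com" \
    -keyout server.key -out server.csr
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -days 365 -out server.crt

# 3. Generate the pgbouncer client key and CSR, sign it with the same CA.
openssl req -new -nodes -subj "/CN=pgbouncer" \
    -keyout client.key -out client.csr
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -days 365 -out client.crt
```

With both certificates chained to the same private CA, mutual validation (server verifies client cert, client verifies server cert) defeats a man-in-the-middle even though the traffic crosses the public internet.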

23. Postgres configuration
● postgresql.conf
  ○ ssl_cert_file = 'server.crt'
  ○ ssl_key_file = 'server.key'
  ○ ssl_ca_file = 'ca.crt'
● pg_hba.conf (A.B.C.D is the data center public IP)
  ○ hostssl all all A.B.C.D/32 md5 clientcert=1
  ○ hostnossl all all A.B.C.D/32 reject

24. Pgbouncer configuration
● Configure pgbouncer (in the data center)
  ○ pool_mode = session
  ○ auth_file = users.conf
  ○ auth_query = SELECT * FROM pgbouncer.user_lookup($1)
  ○ server_tls_sslmode = verify-ca
  ○ server_tls_ca_file = ca.crt
  ○ server_tls_cert_file = client.crt
  ○ server_tls_key_file = client.key
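The auth_query above assumes a helper function on the Postgres side. The standard pattern (documented for pgbouncer) is a SECURITY DEFINER function that looks up the password hash, so pgbouncer’s own login role does not need direct access to pg_shadow; the role name `pgbouncer` below is an assumption:

```sql
CREATE SCHEMA pgbouncer;

-- Runs with the privileges of its owner (a superuser), so the calling
-- role never reads pg_shadow directly.
CREATE OR REPLACE FUNCTION pgbouncer.user_lookup(
    IN i_username text, OUT uname text, OUT phash text)
RETURNS record AS $$
BEGIN
    SELECT usename, passwd INTO uname, phash
      FROM pg_catalog.pg_shadow
     WHERE usename = i_username;
    RETURN;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

REVOKE ALL ON FUNCTION pgbouncer.user_lookup(text) FROM public;
GRANT EXECUTE ON FUNCTION pgbouncer.user_lookup(text) TO pgbouncer;
```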

25. AGENDA (repeated): next up, Data migration & switchover

26. Possible options
● pg_basebackup + physical replication
  ○ via VPN?
  ○ via SSH tunnel?
● S3-compatible backup tool
  ○ WAL-E
  ○ pgBackRest
  ○ WAL-G

27. Keep it Simple
(Diagram: the primary and replica in the data center, and replicas on AWS coordinated via etcd, all exchange base backups and WAL through wal-e and an S3 bucket; apps app1–app3 keep connecting through the virtual IPs inside the cluster security group.)

28. Migration statistics
● “wal-e backup-push” in the DC: 12 hours
● “wal-e backup-fetch” on AWS: 9 hours
● Replay of accumulated WAL: 4 hours
Replication lag in such a setup is usually only a few seconds, determined by the amount of write activity on the primary.
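The wal-e commands behind these numbers look roughly like the sketch below. The envdir path (wal-e’s usual way to supply S3 credentials and the bucket prefix) and the data directory are assumptions, not values from the talk:

```shell
# On the data-center primary: push a full base backup to S3 (~12 hours here).
envdir /etc/wal-e.d/env wal-e backup-push /home/postgres/pgdata

# On the new AWS node: fetch the latest base backup from S3 (~9 hours here).
envdir /etc/wal-e.d/env wal-e backup-fetch /home/postgres/pgdata LATEST

# The replica then replays accumulated WAL from the same bucket via
# restore_command in recovery.conf:
#   restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'
```

Because every WAL segment is archived to S3 continuously, no VPN or SSH tunnel between the data center and AWS is needed: the bucket is the only shared channel.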

29. Switchover plan
1. Shut down the main application writing into the DB
2. Move the replica virtual IP to the pgbouncer host
3. Shut down the replica in the data center
4. Move the primary virtual IP to the pgbouncer host
5. Shut down the primary in the data center
6. Promote the replica in the Cloud
7. Start the main application
8. Start replicas in the data center with the new recovery.conf
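Steps 6 and 8 of the plan can be sketched as shell commands for PostgreSQL 9.6. The data directory, hostname, and replication user below are placeholders, not values from the talk:

```shell
# Step 6, on the AWS replica: promote it to primary
# (PostgreSQL 9.6 still uses pg_ctl promote and recovery.conf).
pg_ctl promote -D /home/postgres/pgdata

# Step 8, on each data-center machine: write a new recovery.conf that
# follows the promoted primary on AWS, then start Postgres as a replica.
cat > /home/postgres/pgdata/recovery.conf <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=new-primary.example.aws port=5432 user=standby sslmode=verify-ca'
recovery_target_timeline = 'latest'
EOF
pg_ctl start -D /home/postgres/pgdata
```

Keeping the data-center machines running as replicas of the cloud primary is what makes the promised fallback possible: if the migration fails, the same switchover can be executed in the opposite direction.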