Preparing for disaster recovery on MySQL and PostgreSQL

Preparing for disaster recovery on MySQL and PostgreSQL
Carlos Tutte
Carlos Tutte is a Support Engineer with Percona


1.Preparing for disaster recovery on MySQL and PostgreSQL Carlos Tutte, Support Engineer April 23, 2019

2. Agenda ● Introduction ● Backups ● HA ● Load balancing ● Monitoring ● How to recover? ● Conclusions ● Questions


4. Introduction ● What is the Recovery Time Objective (RTO)? ● What is the Recovery Point Objective (RPO)? ● What Risks should be mitigated?


6. Backup types ● Logical ● Physical ○ Cold backup (offline) ○ Hot backup (online) ○ Warm backup ● Snapshot based ● (Delayed) Replicas

7. Why to take backups? ● For system recovery ● For auditing purposes ● Migrations to new versions/hw ● Testing/QA performance ● Create/restore other environments

8. Backups in MySQL:logical ● Mysqldump mysqldump -u root -p --all-databases --master-data --single-transaction --routines --events --triggers --hex-blob > /path/to/backup.sql ● Mydumper mydumper -uroot -p -t 4 --trx-consistency-only --triggers --events --routines -o /path/to/dump/ -L /path/to/log/mydumper.log -v 3 ● Mysqlpump mysqlpump -u root -p --all-databases --default-parallelism=8 --single-transaction --routines --events --triggers --hex-blob > /path/to/backup.sql ● select….into outfile

9. Backups in MySQL: physical ● Percona Xtrabackup (PXB) xtrabackup --user=root --password=pass--backup --target-dir=/data/backups/ ● MySQL enterprise backup mysqlbackup -uroot -ppass --backup_dir=/backup-dir/ ● Rsync o cp

10. Backups in PostgreSQL:logical ● Pg_dump pg_dump -U root -W -F t database_name > /path/backup_file.tar ● Pg_dumpall pg_dumpall -U root > /path/backup.sql Since 9.3, pg_dump is multi threaded

11. Backups in PostgreSQL:physical ● Pg_basebackup pg_basebackup -h mydbserver -D /path/to/backupdir/ ● pgBackRest sudo -u postgres pgbackrest --stanza=demo --type=incr --log-level-console=info backup ● Barman barman backup server-name barman receive-wal --create-slot server-name ● Rsync o cp

12. DB Agnostic: Binary logs Binary logs (WAL = write ahead logs for Psql) are not a backups, but do contain executed statements and are needed for PITR (point in time recovery). ● This can be done with barman or pgbackrest in Psql ● In MySQL you need to use mysqlbinlog utility with --read-from-remote-server flag

13. Backup tips ● Save backup on a remote location! ● Keep latest backup copy on server for faster recovery ● Have logical+binary forms ● Test your backups! ● Logical backup of physical backup ● Queries for referential integrity verification ● Queries for index scanning ● Have enough binary logs

14. Backup tips cont ● Use incremental backups ● Have incremental backups ready ● Take backups from a replica ● Have multiple backup copies in case data was altered and change was not noticed

15.High availability is a quality of a system or component that assures a high level of operational performance for a given period of time. HIGH AVAILABILITY

16. High Availability Availability Max Downtime per year 90% 36.53 days 99% 3.65 days 99.9% 8.77 hours 99.99% 4.38 minutes 99.999% 26.30 seconds

17. Reasons for HA Business won’t earn money if their servers are not working. Common causes for unavailability: ● Maintenance ● Bugs/OS/HW problems ● Human errors ● Natural disasters ● Electrical disruptions ● Intentional damage

18. HA in MySQL Only consider ROW binlog format ● Async slave ● Delayed slave ● Semi sync slave ● Virtually synchronous ○ Group Replication ○ Percona XtraDB Cluster (PXC) ● DRBD replication Hot-standby vs cold-standby

19. HA in PostreSQL ● Streaming replication ● Delayed replication ● DRBD replication Hot-standby vs cold-standby For Psql there is no native clustering Solution. It’s based on 3rd party software (Stolon, pgpool-II, pgBouncer) which monitores and load balances between many nodes

20.Load balancing have many uses but in the context of this talk, we use them as a way to redirect load to different servers in the cluster in case of failure LOAD BALANCING

21. Load balancing products ● DB Agnostic HAProxy, F5 load balancer ● MySQL: ProxySQL, MaxScale (MariaDB), MySQL Router ● Psql: Stolon, pgpool-II, pgBouncer


23. Monitoring ● Backup server disk space ● Last backup state ● Last backup verification test run ● Binlog availability


25. How to recover? ● Load balancing to other nodes ● Master-slave topologies: promote slave ● Shared disk failover ● Failover: ○ MySQL: MHA, Orchestrator ○ Psql: Patroni, repmgr (replication manager), PAF (Psql automatic failover) + pacemaker, pglookout

26. How to recover? cont ● Apply logs in delayed slave ○ START SLAVE UNTIL ● Restore backups + PITR ○ Apply binlogs from fake master


28. Conclusions ● For every server or system: ○ Specify RTO and RPO ○ Identify failure points (FP) ○ Design recovery plan for every FP ○ Implement HA and restore strategy ○ Test and monitor everything! ○ Automatize!

29. References ● MySQL reference manual ● PostgreSQL reference manual ● Percona blog ● PMM demo