Database Backup and Recovery Best Practices



1.Database Backup and Recovery Best Practices (with a focus on MySQL) • Manjot Singh, Data & Infrastrustructure Architect

2.About Manjot Singh • MySQL Fanatic • Long time user (~16 years) • Database and Systems Administrator for a number of years • Oracle, SQL Server, MySQL • Now an Architect at Percona, helping many companies scale databases and systems and implement best practices

3.About Percona • Percona Delivers enterprise-class software, support and consulting solutions for MySQL and MongoDB • We have a global 24x7x365 consulting team • Percona consults with some of the largest companies on the internet as well as new startups • Percona Consultants have decades of experience solving complete database and performance issues

4. Do I Need to Take Database Backups?


6.Why Backup and Recovery? • Backup and Recovery processes are a critical part of any application infrastructure. • A well tested backup and recovery system can be the difference between a minor outage and the end of your business.

7.Recovery Requirements Determine Backups Three important questions define backup and recovery systems for organizations: • Recovery Time Objective • Recovery Point Objective • Risk Mitigation

8.Recovery Time Objective (RTO) • The amount of time that may pass during a disruption before it exceeds the maximum allowable time specified in the Business Continuity Plan • How long does it take you to get back up?

9.Recovery Point Objective (RPO) • Duration of time and service level within which a business process must be restored after a disaster in order to avoid unacceptable consequences associated with a break in continuity • “How much data can I lose?” • When was the last backup before failure?

10.Risk Mitigation What failure scenarios must the data be protected against?

11.Risk Mitigation Backups: • Can help mitigate failure scenarios • (Multiple) Host Failure • (Multiple) Data Center Failure • Data Corruption or Loss • Satisfy Legal Regulations • Legislation • Regulation • Fulfill Industry Standards • PCI • HIPAA • etc

12.Designing a Backup and Recovery System • Based on the RTO, RPO, and address whatever risks are applicable • Not all data should have the same requirements • For example HR database could have a longer RTO than your customer facing product database. Cost of 1 hour outage for HR is small compared to not accepting orders for 1 hour.

13.Types of Backups

14.Physical or Binary Backup • A physical backup typically refers to a copy of the entire on disk database • Used to mitigate a single or multiple host failure • Can build replicas • Quick Full Recovery Time • Slow to recover single row or table (user error)

15.Physical or Binary Backup • MySQL • Percona XtraBackup (takes a copy of the data directory and applies logs) • MySQL Enterprise Backup • LVM or VM Snapshot

16.XtraBackup - Backup Standard use: [mysql-master]# xtrabackup --backup --target-dir=/backup --user=root --password=X [mysql-master]# xtrabackup --prepare --target-dir=/backup Prepare was previously apply-log

17.XtraBackup - Restore Standard use: [mysql-restore]# scp –r mysql-master:/backup [mysql-restore]# xtrabackup --copy-back --target-dir=/backup/ [mysql-restore]# chown –R mysql.mysql /var/lib/mysql

18.Logical Backup • Generates SQL files containing data that can regenerate a database • Easily restore single row, table, or database • Works across versions • SQL (such as mysqldump) can be parsed with standard UNIX tools • Restore process automatically replicated • Long Full Restore Time

19.Logical Backup • MySQL • mysqldump • mysqlpump (5.7+) • mydumper • Delayed slave can also fit similar use cases

20.mysqldump - Backup Standard use: [mysql-master]# mysqldump –triggers –routines –events –single-transaction –all-databases -r dump.sql

21.mysqldump - Restore Standard use: [mysql-restore]# scp mysql-master:dump.sql [mysql-restore]# mysql < dump.sql [mysql-restore]# # OR: [mysql-restore]# mysql mysql> source dump.sql

22.Which is best? BOTH! Binary Backups Address Host Logical Backups address data Failure corruption and user error

23.Binary Logs • Addresses RPO • Logs should be rotated and backed up • Combine with a Binary Backup to allow Point in Time recovery • If you backup logs every hour, your RPO is 1 hour in case of a failure • Can be reduced, but not to the point where the i/o and network traffic interfere with database operations • Can stream logs for real-time RPO • Adding complexity increases recovery time due to more complicated recovery procedures

24.Binary log streaming mysqlbinlog --raw --read-from-remote-server --stop-never --host mysql- master --port 3306 -u repl –p somepass

25.Binary log streaming

26.Binary log streaming

27.Binary log streaming nohup server2.conf 2>&1 > livebinlog-server2.log & More information:

28.Other Considerations

29.Retention • Protect Against Data Center Failure • Keep multiple backups • Counting on a single backup is a risk • Corruption • Malicious data • User error • Store important data offsite § Cloud storage – Google Nearline, Amazon Glacier § Vaulting – Ship drives/tapes § DR site