The Backup Methods Available for MongoDB



1. The Backup Methods Available for MongoDB Adamo Tonete

2.Agenda Backup importance for companies and backup plans. Available Methods: - Disk Snapshot - mongodump - rsync or copy - Point in time backup from Percona - MongoDB Cloud / Ops Manager backup (on-prem) - Hot Backup. Q&A

3.Replica-set and Shard Concepts 101

4.Replica-set and Shard Concepts

5.Replica-set and Shard Concepts

6.Why is Backup Important?

7.Why is Backup Important? Data is usually the most valuable asset in a company. A company that suffers severe data loss may never get back in business. Could you imagine a bank losing all its data, or an e-commerce site being offline for 1 week?

8.Why is Backup Important? Data loss can occur in 4 main situations: 1) Human error 2) DB failure/corruption 3) System failure/collapse 4) Security breach

9.Backup Plan

10.Backup Plan/Disaster Recovery Plan Backup Plan Choose the best RPO and RTO for your company. - Recovery Point Objective (RPO) - Recovery Time Objective (RTO)

11.Why is Backup Important? ● RTO is how much time the company would accept being "offline". ● How long should it take to have my application back online?

12.Why is Backup Important? ● RPO is the POINT in time the backups must be able to restore to after a data loss/incident, i.e. how much recent data the company can afford to lose. ● This is an extremely important metric for deciding how often a backup needs to be made.

13.Backup Plan/Disaster Recovery Plan 1 TB replica-set

14.Backup Plan/Disaster Recovery Plan RTO = 20 minutes, RPO = 30 minutes, 1 TB replica-set

15.Backup Plan/Disaster Recovery Plan RTO = 20 minutes, RPO = 30 minutes, 95% reads, 5% writes, 1 TB replica-set

16.Backup Plan/Disaster Recovery Plan RTO = 20 minutes, RPO = 30 minutes, 2000 inserts/day, 3000 reviews/day, 1 TB replica-set

17.Backup Plan/Disaster Recovery Plan We have 1 TB of data and... 5 GB is for user login; 2 GB/day is new writes; ~900 GB is reviews; and 40 GB is the favorites (90% of the traffic). Favorites are updated asynchronously every 20 minutes.

18.Backup Plan/Disaster Recovery Plan 90% traffic - 10% data: Login, Comment/upvote, Favorites. 10% traffic - 90% data: Historical data/non-favorites.

19.Backup Plan/Disaster Recovery Plan ● Back up the user database every 30 minutes ● Back up the favorite topics every 20 minutes (right after the sync) ● Back up the new comments incrementally (using a filter for created_at > last backup) ● Back up the aged history/non-favorites collection once per day
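The incremental bullet above can be sketched as a small helper that builds a mongodump command line restricted to documents created after the previous backup. This is only an illustration, not the speaker's tooling: the database name `app`, the collection name `comments`, the output path and the helper name `build_incremental_dump_cmd` are all hypothetical. It assumes `created_at` is stored as a BSON date and that a mongodump version accepting Extended JSON in `--query` is in use.

```python
import json
from datetime import datetime, timezone


def build_incremental_dump_cmd(db, collection, last_backup, out_dir):
    """Return a mongodump argv list that exports only documents newer than last_backup."""
    # mongodump expects the --query value as Extended JSON, so the timestamp
    # is wrapped in a {"$date": ...} object rather than passed as a bare string.
    iso = last_backup.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = {"created_at": {"$gt": {"$date": iso}}}
    return [
        "mongodump",
        f"--db={db}",
        f"--collection={collection}",  # --query only works together with --collection
        f"--query={json.dumps(query)}",
        f"--out={out_dir}",
    ]


if __name__ == "__main__":
    # Hypothetical names and timestamp, for illustration only.
    last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    cmd = build_incremental_dump_cmd("app", "comments", last, "/backups/comments")
    print(" ".join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` without shell quoting. The timestamp of the last successful run has to be stored somewhere durable, otherwise the next incremental run has no starting point.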

20.Backup Plan/Disaster Recovery Plan 5 GB user data - every 30 minutes; 40 GB favorites - every 20 minutes; comments - every hour (500 MB); 900 GB - non-favorite data

21.Backup Plan/Disaster Recovery Plan What feature should have priority in a recovery situation?

22.Backup Plan/Disaster Recovery Plan 90% traffic - 10% data: Login, Favorites, Comment/upvote

23.Replica-sets and Shard concepts ● With 10% of the data restored, the environment is handling 90% of the requests while the old data is slowly recovered. ● Not all companies consider this a full RTO, but others do. It depends on the expectations.
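A rough back-of-the-envelope sketch shows why restoring the hot data first matters for the 20-minute RTO. The data sizes come from the example plan; the sustained restore throughput of 100 MB/s is purely an assumed figure for illustration, as are the function and constant names.

```python
def restore_minutes(size_gb, throughput_mb_s):
    """Estimate restore time in minutes for size_gb at a sustained throughput."""
    return size_gb * 1000 / throughput_mb_s / 60


# Sizes taken from the example plan; the throughput is an ASSUMED figure.
ASSUMED_THROUGHPUT_MB_S = 100
RTO_MINUTES = 20
HOT_GB = 5 + 40   # user login + favorites
FULL_GB = 1000    # the whole 1 TB replica-set

if __name__ == "__main__":
    for label, size in (("hot data", HOT_GB), ("full data set", FULL_GB)):
        minutes = restore_minutes(size, ASSUMED_THROUGHPUT_MB_S)
        verdict = "within" if minutes <= RTO_MINUTES else "beyond"
        print(f"{label}: ~{minutes:.1f} min, {verdict} the {RTO_MINUTES}-minute RTO")
```

Under that assumption the 45 GB of login and favorites data fits inside the RTO, while the full 1 TB does not. Real numbers depend on the backup method, since a logical restore also has to rebuild indexes.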

24.Disk Snapshot

25.Disk Snapshot A disk snapshot is a full copy of the data currently on a disk. The snapshot process may take a while, but the advantage is that when a restore is needed the files are already in the form the database uses. There is no need to create indexes or run a file restore, so the recovery time is short.

26.Disk Snapshot Advantages: Straightforward approach: take a copy of what is on the disk and that’s all.

27.Disk Snapshot Disadvantages: - May slow down the database while the snapshot is being created. - Can take several hours, depending on the disk speed. - No "partial" restore: all or nothing.

28.Disk Snapshot Backup type: Binary copy; Time to backup: High; Complexity: Low; Time to recover: Low
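One common way to get a consistent snapshot is to flush and block writes around the snapshot step using MongoDB's `fsync` (with `lock: true`) and `fsyncUnlock` admin commands. The slides do not say whether the speaker does this, and whether the lock is needed depends on the storage layout, so treat the following as a minimal sketch under that assumption. The helper names and the `take_snapshot` callable are hypothetical; the snapshot itself would come from whatever volume or cloud tooling is in use.

```python
from contextlib import contextmanager


@contextmanager
def writes_locked(run_admin_command):
    """Hold MongoDB's fsync lock for the duration of the with-block."""
    # {fsync: 1, lock: true} flushes pending writes to disk and blocks new ones.
    run_admin_command("fsync", lock=True)
    try:
        yield
    finally:
        # Always release the lock, even if the snapshot step raised.
        run_admin_command("fsyncUnlock")


def snapshot_with_lock(run_admin_command, take_snapshot):
    """Run take_snapshot() while writes are blocked and return its result."""
    with writes_locked(run_admin_command):
        return take_snapshot()
```

With pymongo, `run_admin_command` could be `MongoClient(uri).admin.command`, and `take_snapshot` a function wrapping the snapshot tool. Passing both in as callables keeps the sketch independent of any particular driver or storage platform. Running this against a secondary rather than the primary avoids blocking application writes.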

29.Rsync or scp to a different host