Running MongoDB in Production, Part 1

1.Running MongoDB in Production, Part I Tim Vaillancourt, Sr. Technical Operations Architect, Percona

2.`whoami` { name: “tim”, lastname: “vaillancourt”, employer: “percona”, techs: [ “mongodb”, “mysql”, “cassandra”, “redis”, “rabbitmq”, “solr”, “mesos”, “kafka”, “couch*”, “python”, “golang” ] }

3.Agenda ● Backups ○ Logical vs Binary ○ Architecture ○ Percona-Lab/mongodb_consistent_backup ● Security ○ MongoDB Authorization ○ System, Network and Filesystem best practices ○ Connection and data encryption ● Monitoring ○ Methodology ○ Important Metrics ○ Percona Monitoring and Management

4.Terminology ● Data ○ Document: single *SON object, often nested ○ Field: single field in a document ○ Collection: grouping of documents ○ Database: grouping of collections ○ Capped Collection: A fixed-size FIFO collection ● Replication ○ Oplog: A special capped collection for replication ○ Primary: A replica set node that can receive writes ○ Secondary: A replica of the Primary that is read-only

5.Terminology ● Replication ○ Election: The process to determine a new Primary member ○ Voting: The process of a single node voting in an election ○ Hidden-Secondary: A replica that cannot become Primary ○ Majority: “Most” of the members are available or have acknowledged a change ■ 3 node replica set = 2 nodes required for majority ■ 5 node replica set = 3 nodes required for majority
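
The majority sizes above follow a simple rule: floor(n/2) + 1 voting members. A quick shell sketch, which also shows why even-sized sets gain no extra failure tolerance over the next-smaller odd size:

```shell
# Majority = floor(n/2) + 1 voting members must acknowledge/elect
for n in 3 4 5 6 7; do
  echo "$n-node replica set: majority = $(( n / 2 + 1 ))"
done
```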

6.Terminology ● Sharding ○ Shard: a replica set or single node containing a piece of the cluster data ○ Shard Key: the document key used to partition data ○ Chunk: a range of the shard key ○ Partitioned Collection: a collection distributed amongst shards ○ Config Server: a MongoDB server dedicated to storing the sharding metadata

7. Backups “An admin is only worth the backups they keep” ~ Unknown

8.Backups: Logical ● ‘mongodump’ tool from mongo-tools project ● Supports ○ Multi-threaded dumping in 3.2+ ○ Optional inline gzip compression of data ○ Optional dumping of oplog for single-node consistency ○ Replica set awareness (via readPreference) ■ i.e. primary, primaryPreferred, secondary, secondaryPreferred, nearest ● Process ○ Tool issues .find() with $snapshot query ○ Stores BSON data in a file per collection ○ Stores BSON oplog data in “oplog.bson”, even when compressed
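
A typical invocation combining the options above might look like this (host and paths are examples; the flags shown exist in mongo-tools 3.2+). A dump taken with --oplog must later be restored with mongorestore --oplogReplay:

```shell
# Dump with compression, oplog capture and parallel collection dumping,
# preferring a secondary to keep load off the primary
mongodump \
  --host myReplSet/db1.example.com:27017 \
  --readPreference secondaryPreferred \
  --oplog \
  --gzip \
  --numParallelCollections 4 \
  --out /backup/$(date +%Y%m%d)
```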

9.Backups: Logical ● Useful for... ○ upgrades of very old systems, e.g. a 2.6 -> 3.4 upgrade ○ protection from binary-level/storage-engine corruption ○ export/import to a different CPU architecture ● Limitations ○ Only index metadata is in the backup ■ Indexes are rebuilt entirely, in serial!! ■ Often the indexing process takes longer than restoring the data! ■ Expect hours or days of restore time ○ Not Sharding aware ■ Sharded backups are not Point-in-Time consistent

10.Backups: Logical ● Limitations ○ Fetch from storage-engine, serialization, networking, etc is very inefficient ○ Oplogs fetched in batch at end / oplog must be as long as the backup run-time ○ Wire Protocol Compression (added in 3.4+) not supported yet: (Please vote/watch Issue!)

11.Backups: Binary ● Options ○ Cold Backup ○ LVM Snapshot ○ Hot Backup ■ Percona Server for MongoDB (FREE!) ■ MongoDB Enterprise Hot Backup (non-free) ■ NOTE: MMAPv1 not supported ● Benefits ○ Indexes are backed up == faster restore! ○ Storage-engine format backed up == faster backup AND restore!

12.Backups: Binary ● Limitations ○ Increased backup storage requirements ○ Compression is storage-engine dependent ○ CPU Architecture limitations (64-bit vs 32-bit) ○ Cascading corruption ○ Batteries not included ■ Not Sharding aware ■ Not Replica Set aware ● Process ○ Cold Backup ■ Stop a mongod SECONDARY, copy/archive dbPath

13.Backups: Binary ● Process ○ LVM Snapshot ■ Optionally call ‘db.fsyncLock()’ (not required in 3.2+ with Journaling) ■ Create LVM snapshot of the dbPath ■ Copy/Archive dbPath ■ Remove LVM snapshot (as quickly as possible!) ■ NOTE: LVM snapshots can cause up to 30%* write latency impact to disk (due to COW)
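
The snapshot steps above, sketched as shell commands. The volume group, LV names, snapshot size and paths are all assumptions; adjust to your layout:

```shell
# 1. (Optional; pre-3.2 or journaling disabled) flush and lock writes
mongo --eval 'db.fsyncLock()'
# 2. Snapshot the logical volume holding dbPath
lvcreate --snapshot --size 5G --name mongo-snap /dev/vg0/mongo-data
mongo --eval 'db.fsyncUnlock()'
# 3. Mount the snapshot read-only and archive it
mkdir -p /mnt/mongo-snap
mount -o ro /dev/vg0/mongo-snap /mnt/mongo-snap
tar -czf /backup/mongo-$(date +%F).tar.gz -C /mnt/mongo-snap .
# 4. Remove the snapshot ASAP to stop the copy-on-write write penalty
umount /mnt/mongo-snap
lvremove -f /dev/vg0/mongo-snap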

14.Backups: Binary ● Process ○ Hot Backup (PSMDB or MongoDB Enterprise) ■ Pay $$$ for MongoDB Enterprise or download PSMDB for free(!) ■ db.adminCommand({ createBackup: 1, backupDir: "/data/mongodb/backup" }) ■ Copy/archive the output path ■ Delete the backup output path ■ NOTE: RocksDB-based createBackup creates filesystem hardlinks whenever possible! ■ NOTE: Delete RocksDB backupDir as soon as possible to reduce bloom filter overhead!

15.Backups: Architecture ● Risks ○ Dynamic nature of Replica Set ○ Impact of backup on live nodes ● Example: Cheap Disaster-Recovery ○ Place a ‘hidden: true’ SECONDARY in another location ○ Optionally use cloud object store (AWS S3, Google GS, etc)
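
Adding such a hidden disaster-recovery member might look like this from the mongo shell (the host name is an example). Note that hidden members must also have `priority: 0`:

```shell
mongo --eval '
  rs.add({ host: "dr1.example.com:27017", hidden: true, priority: 0 })
'
```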

16.Backups: Architecture ● Example: Replica Set Tags ○ “tags” allow fine-grained server selection with key/value pairs ○ Use key/value pair to fence various application workflows ○ Example: ■ { “role”: “backup” } == Backup Node ■ { “role”: “application” } == App Node
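
Tagging a member and then targeting it via read preference tags could look like the sketch below (the member index and tag values are examples):

```shell
# Tag member 2 of the replica set as the backup node
mongo --eval '
  cfg = rs.conf();
  cfg.members[2].tags = { "role": "backup" };
  rs.reconfig(cfg);
'
# A backup client can then select only tagged members via the URI:
# mongodb://host/?readPreference=secondary&readPreferenceTags=role:backup
```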

17.Backups: mongodb_consistent_backup ● Python project by Percona-Lab for consistent backups ● URL: ● Best-effort support, not a “Percona Product” ● Created to solve limitations in MongoDB backup tools: ○ Replica Set and Sharded Cluster awareness ○ Cluster-wide Point-in-time consistency ○ In-line Oplog backup (vs post-backup) ○ Notifications of success / failure

18.Backups: mongodb_consistent_backup ● Extra Features ○ Remote Upload (AWS S3, Google Cloud Storage and Rsync) ○ Archiving (Tar or ZBackup deduplication and optional AES-at-rest) ○ CentOS/RHEL7 RPMs and Docker-based releases (.deb soon!) ○ Single Python PEX binary ○ Multithreaded / Concurrent ○ Auto-scales to available CPUs
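
A minimal run might look like the following; the dotted option names mirror the tool's YAML config, but check the project README for the exact flags in your version (all values here are examples):

```shell
mongodb-consistent-backup \
  --host mongos1.example.com \
  --port 27017 \
  --backup.name nightly \
  --backup.location /backup/mongodb
```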

19.Backups: mongodb_consistent_backup ● Low-Impact ○ Tool focuses on low impact ○ Uses Secondary nodes only ○ Considers (Scoring) ■ Replication Lag ■ Replication Priority ■ Replication Health / State ■ Hidden-Secondary State (preferred by tool) ■ Fails if chosen Secondary becomes Primary (on purpose)

20.Backups: mongodb_consistent_backup ● Future ○ Incremental Backups ○ Binary-level Backups (Hot Backup, Cold Backup, LVM, Cloud-based, etc) ○ More Notification Methods (PagerDuty, Email, etc) ○ Restore Helper Tool ○ Instrumentation / Metrics ○ <YOUR AWESOME IDEA HERE> we take GitHub PRs (and it’s Python)!

21.Backups: mongodb_consistent_backup ● Simple Restore ○ Seamless restore: “mongorestore --oplogReplay --gzip --dir /path/to/backup” ● Restore an Entire Cluster ○ Mongorestore backups of config servers ■ If restoring old/SCCC config servers, restore to every node ■ If restoring replica-set config servers ● Ensure Replica Set is initiated (rs.initiate() / rs.config()) ● Ensure SECONDARY members are added (via PRIMARY) ● Restore to PRIMARY only ○ Update “config.shards” documents if shard hosts/ports changed

22.Backups: mongodb_consistent_backup ● Restore an Entire Cluster ○ Mongorestore each shard from backup subdirectory (matches shard name) ○ Start mongos process and test / QA ■ Tip: stopping the balancer may simplify troubleshooting any problems
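
The whole cluster-restore flow above, sketched as commands. Hosts, ports, shard names and paths are examples; the per-shard subdirectory names must match your shard names as noted:

```shell
BACKUP=/backup/mongodb/nightly/latest
# 1. Restore the config server replica set (PRIMARY only for CSRS)
mongorestore --host cfgsvr1.example.com:27019 \
    --oplogReplay --gzip --dir $BACKUP/configsvr
# 2. Restore each shard from its matching backup subdirectory
for shard in rs0 rs1; do
  mongorestore --host ${shard}-primary.example.com:27018 \
      --oplogReplay --gzip --dir $BACKUP/$shard
done
# 3. Update config.shards if shard hosts/ports changed, start mongos,
#    test with the balancer stopped, then re-enable it when satisfied
```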

23. Security “Think of the network like a public place”

24.Security: Authorization ● Always enable auth on Production Installs! ● Built-in Roles ○ Database User: Read or Write data from collections ■ “All Databases” or Single-database ○ Database Admin: Non-RW commands (create/drop/list/etc) ○ Backup and Restore: Back up and restore data ○ Cluster Admin: Add/Drop/List shards ○ Superuser/Root: All capabilities

25.Security: Authorization ● User-Defined Roles ○ Exact Resource+Action specification ○ Very fine-grained ACLs ■ DB + Collection specific
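
Creating such a DB + collection specific role from the mongo shell might look like this (role, database, collection, user and password are all examples):

```shell
mongo admin --eval '
  // Grant read-only access to a single collection only
  db.createRole({
    role: "reportsReadOnly",
    privileges: [
      { resource: { db: "app", collection: "reports" },
        actions: [ "find" ] }
    ],
    roles: []
  });
  db.createUser({
    user: "reporter",
    pwd: "CHANGE_ME",
    roles: [ { role: "reportsReadOnly", db: "admin" } ]
  });
'
```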

26.Security: Filesystem Access ● Use a service user+group ○ ‘mongod’ or ‘mongodb’ on most systems ○ Ensure data path, log file and key file(s) are owned by this user+group ● Data Path ○ Mode: 0750 ● Log File ○ Mode: 0640 ○ Contains real queries and their fields!!! ■ See Log Redaction for PSMDB (or MongoDB Enterprise) to remove these fields

27.Security: Filesystem Access ● Key File(s) ○ Files Include: keyFile and SSL certificates or keys ○ Mode: 0600
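
Applied as commands, the recommendations above look roughly like this (paths are typical packaged-install locations; adjust to your dbPath, logpath and key locations):

```shell
chown -R mongod:mongod /var/lib/mongo /var/log/mongodb /etc/mongod.key
chmod 0750 /var/lib/mongo              # data path
chmod 0640 /var/log/mongodb/mongod.log # log file (contains real queries!)
chmod 0600 /etc/mongod.key             # keyFile, SSL certs/keys
```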

28.Security: Network Access ● Firewall ○ Single TCP port ■ MongoDB Client API ■ MongoDB Replication API ■ MongoDB Sharding API ○ Sharding ■ Only the ‘mongos’ process needs access to shards ■ Client driver does not need to reach shards directly ○ Replication ■ All nodes must be accessible to the driver

29.Security: Network Access ● Internal Authentication: Use a shared keyFile to authenticate inter-node replication/sharding traffic ● Creating a dedicated network segment for Databases is recommended! ● NEVER allow MongoDB to talk to the internet!!!
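
Generating a shared keyFile for internal authentication can be done with openssl; the file name here is an example, and the same file must be distributed to every node and referenced via `security.keyFile` in mongod.conf:

```shell
# 756 random bytes, base64-encoded (MongoDB accepts 6-1024 base64 chars)
openssl rand -base64 756 > mongod-keyfile
chmod 600 mongod-keyfile   # mongod refuses world/group-readable key files
```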