MongoDB HA, What Can Go Wrong

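Redundancy and high availability are the foundation of every production deployment. With MongoDB, this is achieved by deploying a replica set. In this talk, we explore how MongoDB replication works and what the components of a replica set are. Using examples of misconfigured deployments, we focus on how to run a replica set correctly in production, whether on premises or in the cloud.
- How MongoDB replication works
- Replica set components and deployment types
- Examples of misconfigured deployments
- Hidden nodes, arbiter nodes, priority 0 nodes
- Availability zones and HA within a single region
- Monitoring replica set status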

1. MongoDB HA, what can go wrong? May-29-2019

2. About Me {"name": "Igor Donchovski", "live_in": "Skopje", "email": "donchovski@pythian.com", "current_role": "Lead database consultant", "education": [{"type": "College", "name": "FEIT", "graduated": "2008", "university": "UKIM"}, {"type": "Master", "name": "FINKI", "graduated": "2013", "university": "UKIM"}], "work": [{"role": "Web developer", "start": "2007", "end": "2012", "company": "Gord Systems"}, {"role": "DBA", "start": "2012", "end": "2014", "company": "NOVP"}, {"role": "Database consultant", "start": "2014", "end": "2016", "company": "Pythian"}, {"role": "Lead database consultant", "start": "2016", "company": "Pythian"}], "certificates": [{"name": "C100DBA", "year": "2016", "description": "MongoDB certified DBA"}], "social": [{"network": "LinkedIn", "url": "https://mk.linkedin.com/in/igorle"}, {"network": "Twitter", "url": "https://twitter.com/igorle"}], "interests": ["Hiking", "Biking", "Traveling"], "hobbies": ["Painting", "Photography", "Cooking"], "proud_of": ["Volunteering", "Helping the Community"]} © 2019 Pythian. Confidential

3. Overview
• What is a replica set, how replication works
• Replication concept
• Replica set features, deployment architectures
• Hidden nodes, Arbiter nodes, Priority 0 nodes
• Production failures
• Monitoring replica set
• Q&A

4. Replication

5. Replica Set
• Group of mongod processes that maintain the same data set
• Redundancy and high availability
• Increased read capacity (scaling reads)
• Automatic failover
# Members   # Nodes Required to Elect New Primary   Fault Tolerance
3           2                                       1
4           3                                       1
5           3                                       2
6           4                                       2
7           4                                       3
(Diagram: three members, each with priority: 1, votes: 1)
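
The numbers in the fault tolerance table follow from the majority rule: a new Primary needs votes from more than half of the voting members. A minimal sketch of that arithmetic, with function names chosen here for illustration (they are not MongoDB APIs):

```javascript
// Votes needed to elect a Primary: strictly more than half of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// Print one row per replica set size covered by the slide.
for (const n of [3, 4, 5, 6, 7]) {
  console.log(`${n} members: majority ${majority(n)}, fault tolerance ${faultTolerance(n)}`);
}
```

Note that going from 3 to 4 members, or from 5 to 6, raises the required majority without raising the fault tolerance.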

10. Replication Concept
1. Write operations go to the Primary node
2. All changes are recorded in the operations log (oplog)
3. Asynchronous replication to the Secondaries
4. Secondaries copy the Primary oplog
5. A Secondary can use another Secondary as its sync source*
*settings.chainingAllowed (true by default)
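
The five steps can be sketched as a toy in-memory model. This is an illustration only, assuming a simplified oplog of `{ ts, key, value }` entries; the `Node` class and its methods are hypothetical and do not reflect how mongod is implemented.

```javascript
// Toy model of oplog-based replication (illustrative, not MongoDB internals).
class Node {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // key -> value
    this.oplog = [];       // ordered entries: { ts, key, value }
  }

  // Apply one operation locally and record it in this node's own oplog.
  apply(entry) {
    this.data.set(entry.key, entry.value);
    this.oplog.push(entry);
  }

  // Timestamp of the last entry this node has applied (0 if none).
  lastTs() {
    return this.oplog.length ? this.oplog[this.oplog.length - 1].ts : 0;
  }

  // Pull and apply every entry from the sync source that is newer than lastTs().
  syncFrom(source) {
    const last = this.lastTs();
    for (const entry of source.oplog) {
      if (entry.ts > last) this.apply(entry);
    }
  }
}

const primary = new Node("primary");
const secondaryA = new Node("secondaryA");
const secondaryB = new Node("secondaryB");

// Steps 1 and 2: writes go to the Primary and land in its oplog.
primary.apply({ ts: 1, key: "x", value: 1 });
primary.apply({ ts: 2, key: "x", value: 2 });

// Steps 3 and 4: a Secondary copies the Primary oplog asynchronously.
secondaryA.syncFrom(primary);

// Step 5: chained replication, a Secondary syncing from another Secondary.
secondaryB.syncFrom(secondaryA);
```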

11. Replica Set Oplog
• Special capped collection that keeps a rolling record of all operations that modify the data stored in the databases
• Idempotent
• Default oplog size (for Unix and Windows systems):
Storage Engine   Default Oplog Size       Lower Bound   Upper Bound
In-memory        5% of physical memory    50MB          50GB
WiredTiger       5% of free disk space    990MB         50GB
MMAPv1           5% of free disk space    990MB         50GB
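
The default sizing rule in the table amounts to taking 5% of the available resource and clamping it between the lower and upper bound. A small sketch of that calculation, assuming sizes in megabytes and 1GB = 1024MB; the function and key names are made up for this example:

```javascript
// Bounds from the table, in MB (assuming 1GB = 1024MB).
const OPLOG_BOUNDS_MB = {
  inMemory: { lower: 50, upper: 50 * 1024 },
  wiredTiger: { lower: 990, upper: 50 * 1024 },
  mmapv1: { lower: 990, upper: 50 * 1024 },
};

// availableMB is physical memory for the in-memory engine,
// and free disk space for WiredTiger and MMAPv1.
function defaultOplogSizeMB(storageEngine, availableMB) {
  const bounds = OPLOG_BOUNDS_MB[storageEngine];
  if (!bounds) throw new Error(`unknown storage engine: ${storageEngine}`);
  const fivePercent = availableMB / 20;
  return Math.min(Math.max(fivePercent, bounds.lower), bounds.upper);
}

console.log(defaultOplogSizeMB("wiredTiger", 100 * 1024));      // 100GB free disk
console.log(defaultOplogSizeMB("wiredTiger", 10 * 1024));       // small disk, lower bound applies
console.log(defaultOplogSizeMB("wiredTiger", 2 * 1024 * 1024)); // 2TB free disk, upper bound applies
```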

12. Configuration

13. Configuration Options
• Up to 50 members per replica set (at most 7 voting members)
• Arbiter node
• Priority 0 node
• Hidden node
• Delayed node

14. Arbiter Node
• Does not hold a copy of the data
• Votes in elections
(Diagram labels: Arbiter; hidden: true)

15. Priority 0 Node
Priority: a floating point (i.e. decimal) number between 0 and 1000
• Cannot become primary, cannot trigger an election
• Visible to the application (accepts reads)
• Votes in elections
(Diagram: Secondary with priority: 0)

16. Hidden Node
• Not visible to the application
• Never becomes primary, but can vote in elections
• Use cases
  ○ Reporting
  ○ Backups
(Diagram: Secondary with hidden: true, priority: 0)

17. Delayed Node
• Must be a priority 0 member
• Should be a hidden member (not mandatory)
• Mainly used for backups (historical snapshot of data)
• Recovery in case of human error
(Diagram: Secondary with slaveDelay: 3600, priority: 0, hidden: true)
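
The member rules from the configuration slides can be expressed as a small check over a replica set config document. This is a sketch: the set name and hostnames are placeholders, `validateMembers` is a hypothetical helper rather than a MongoDB function, and it only covers the rules stated on the slides. The field names `priority`, `hidden`, `votes`, and `slaveDelay` are real member settings; newer MongoDB versions rename `slaveDelay` to `secondaryDelaySecs`.

```javascript
// Hypothetical config: two regular members plus one hidden, delayed member.
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017", priority: 1 },
    { _id: 1, host: "db2.example.net:27017", priority: 1 },
    { _id: 2, host: "db3.example.net:27017", priority: 0, hidden: true, slaveDelay: 3600 },
  ],
};

// Return a list of rule violations; an empty list means the checks passed.
function validateMembers(cfg) {
  const problems = [];
  if (cfg.members.length > 50) problems.push("more than 50 members");
  const voters = cfg.members.filter((m) => (m.votes ?? 1) > 0).length;
  if (voters > 7) problems.push("more than 7 voting members");
  for (const m of cfg.members) {
    const priority = m.priority ?? 1; // priority defaults to 1
    if (priority < 0 || priority > 1000) problems.push(`${m.host}: priority out of range`);
    if (m.hidden && priority !== 0) problems.push(`${m.host}: hidden member must have priority 0`);
    if ((m.slaveDelay ?? 0) > 0 && priority !== 0) problems.push(`${m.host}: delayed member must have priority 0`);
  }
  return problems;
}

console.log(validateMembers(config));
```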

18. Everyone on the same page?

19. Failures

20. Small Oplog Size
1. Primary/Secondary node down
   ○ Node failure
   ○ Planned maintenance
2. Automatic failover
…… (several hours later)
3. The oplog on the new Primary rolls over, overwriting the entries the failed node still needs
4. Failed node needs a resync
MongoDB >= 3.6: db.adminCommand({replSetResizeOplog: 1, size: 32000}) (size is in megabytes)
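
Whether a node can catch up after downtime depends on the oplog window: roughly, the oplog size divided by the rate at which the oplog is written. A back-of-the-envelope sketch, assuming a constant write rate (real workloads vary); the function names and the 330MB/hour figure are illustrative. On a live system, `rs.printReplicationInfo()` in the shell reports the actual window.

```javascript
// Approximate hours of history the oplog can hold at a constant write rate.
function oplogWindowHours(oplogSizeMB, oplogWriteMBPerHour) {
  return oplogSizeMB / oplogWriteMBPerHour;
}

// A node that was down longer than the oplog window cannot catch up and needs a resync.
function needsResync(downtimeHours, oplogSizeMB, oplogWriteMBPerHour) {
  return downtimeHours > oplogWindowHours(oplogSizeMB, oplogWriteMBPerHour);
}

// Assumed workload: 330MB of oplog written per hour, node down for 5 hours.
console.log(needsResync(5, 990, 330));   // 990MB oplog
console.log(needsResync(5, 32000, 330)); // oplog resized to 32000MB
```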

21. Arbiter Nodes
● Votes in elections
● Does not hold a copy of the data
● If 2 nodes are down, there is no majority to elect a new Primary
● Fault tolerance is still 1 node
● 4 data nodes + 1 Arbiter makes more sense
(Diagram label: Heartbeat)

22. Priority 0 Nodes
● Application driver sends writes to the Primary
● Reads go to the Primary by default
● Secondaries can serve reads
● Read preference
  ○ primary (default)
  ○ primaryPreferred
  ○ secondary
  ○ secondaryPreferred
  ○ nearest

23. Priority 0 Nodes
• Primary node fails
• Replica set starts an election for a new Primary
• Zero nodes are eligible to become Primary
• Application cannot send writes
• Database is read only*
*depends on the read preference setting
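
This failure can be caught before it happens by counting how many members could still become Primary once a given node is down. A sketch under the assumption that a member is electable when it is up, holds data, and has priority above 0; `electableMembers` and the hostnames are hypothetical.

```javascript
// Members that could still be elected Primary, given a list of hosts that are down.
function electableMembers(members, downHosts) {
  return members.filter(
    (m) => !downHosts.includes(m.host) && !m.arbiterOnly && (m.priority ?? 1) > 0
  );
}

// The deployment from the slide: one regular member and two priority 0 Secondaries.
const members = [
  { host: "db1.example.net:27017", priority: 1 },
  { host: "db2.example.net:27017", priority: 0 },
  { host: "db3.example.net:27017", priority: 0 },
];

// With db1 (the Primary) down, nobody is left to take over.
console.log(electableMembers(members, ["db1.example.net:27017"]).length);
```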

24. Hidden Nodes
● Application driver sends writes to the Primary
● Reads go to the Primary by default
● Secondaries cannot serve reads
● Read preference
  ○ primary

25. Hidden Nodes
• Primary node fails
• Replica set starts an election for a new Primary
• Zero nodes are eligible to become Primary (priority: 0)
• Application cannot send writes or reads
• Downtime
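
The difference between "read only" with priority 0 nodes and full downtime with hidden nodes comes down to which surviving members the driver can see. A simplified sketch of that decision, ignoring tag sets and staleness limits; `canServeReads` and the member shape (`state`, `hidden`) are assumptions made for this example, not driver APIs.

```javascript
// Can the application read, given the surviving members and its read preference?
function canServeReads(readPreference, survivors) {
  const visible = survivors.filter((m) => !m.hidden); // hidden members are invisible to drivers
  const hasPrimary = visible.some((m) => m.state === "PRIMARY");
  const hasSecondary = visible.some((m) => m.state === "SECONDARY");
  switch (readPreference) {
    case "primary":
      return hasPrimary;
    case "secondary":
      return hasSecondary;
    case "primaryPreferred":
    case "secondaryPreferred":
    case "nearest":
      return hasPrimary || hasSecondary;
    default:
      throw new Error(`unknown read preference: ${readPreference}`);
  }
}

// After the Primary fails: two priority 0 Secondaries vs. two hidden Secondaries.
const priorityZeroSurvivors = [{ state: "SECONDARY" }, { state: "SECONDARY" }];
const hiddenSurvivors = [
  { state: "SECONDARY", hidden: true },
  { state: "SECONDARY", hidden: true },
];

console.log(canServeReads("secondaryPreferred", priorityZeroSurvivors));
console.log(canServeReads("secondaryPreferred", hiddenSurvivors));
```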

26. Hardware
• Primary node fails
• Secondary elected as new Primary
• Working set does not fit in memory
• Performance degradation
• Application stalls
(Diagram: one node with 64GB RAM, 16 CPU; two nodes with 32GB RAM, 8 CPU each)

27. Hardware
• Dataset grows
• No disk space left on the Secondary
• mongod process fails
• Replica set is left with 2 nodes
• Zero tolerance for failures
(Diagram: two nodes with Disk: 300GB; one node with Disk: 200GB)

28. Network
● Heartbeat lost
● Primary steps down
● New Primary election
● Application timeout*
● Rollback
Best Practice: Test Primary step down for your application
*Retryable writes since MongoDB 3.6
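
Retryable writes are requested through the connection string. The sketch below builds such a URI; `replicaSet`, `retryWrites`, and `w` are standard connection string options, while `buildUri`, the set name, and the hostnames are placeholders for this example. A step down can then be exercised in a test environment with the `rs.stepDown()` shell helper while the application is running.

```javascript
// Build a replica set connection string with retryable writes and majority write concern.
function buildUri(hosts, replicaSet) {
  const params = new URLSearchParams({
    replicaSet,
    retryWrites: "true",
    w: "majority",
  });
  return `mongodb://${hosts.join(",")}/?${params.toString()}`;
}

const uri = buildUri(
  ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
  "rs0"
);
console.log(uri);
```

Listing every member in the seed list lets the driver discover the new Primary after an election even if one of the listed hosts is unreachable.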

29. Cloud Deployment
• All replica set members deployed in a single Availability Zone
• Availability Zone #1 goes down
• Downtime
(Diagram: Cloud Region #1, Availability Zone #1)
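
A deployment plan can be checked for this failure mode by simulating the loss of each Availability Zone in turn and verifying that the remaining members still form a majority with at least one electable node. A sketch, where the `zone` field is an annotation invented for this example (it is not a MongoDB config field) and `survivesZoneLoss` is a hypothetical helper.

```javascript
// True if losing any single zone still leaves a voting majority and an electable member.
function survivesZoneLoss(members) {
  const voters = members.filter((m) => (m.votes ?? 1) > 0);
  const needed = Math.floor(voters.length / 2) + 1;
  const zones = [...new Set(members.map((m) => m.zone))];
  return zones.every((zone) => {
    const remaining = members.filter((m) => m.zone !== zone);
    const remainingVotes = remaining.filter((m) => (m.votes ?? 1) > 0).length;
    const hasCandidate = remaining.some((m) => !m.arbiterOnly && (m.priority ?? 1) > 0);
    return remainingVotes >= needed && hasCandidate;
  });
}

// The slide's deployment: everything in one zone.
const singleZone = [
  { host: "db1", zone: "az1" },
  { host: "db2", zone: "az1" },
  { host: "db3", zone: "az1" },
];

// One member per zone within the same region.
const threeZones = [
  { host: "db1", zone: "az1" },
  { host: "db2", zone: "az2" },
  { host: "db3", zone: "az3" },
];

console.log(survivesZoneLoss(singleZone));
console.log(survivesZoneLoss(threeZones));
```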