MongoDB Sharding 101



1.MongoDB Sharding 101

2.Agenda ● What is MongoDB? ● Single Instances ● Replica-set architecture ● Shard architecture ● Q&A

3.What is MongoDB

4.MongoDB MongoDB is a free and open-source, cross-platform, document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc.

5.MongoDB ● Open source document-oriented database ● Made to run in the cloud, easily scalable ● Quick installation and configuration ● Fast deployment, schema-free

6.Single Instances

7.Single Instance ● Commonly used for development/tests ● Embedded systems

8.Single Instance MongoDB Mobile is currently in beta. Targets: IoT, Android, TV, iOS


10.Replica-set ● Way to scale out ● Ability to elect a new primary in case of failure (auto HA) ● Data are the same in the replicas, asynchronous replication. ● Single master - PRIMARY

11.Replica-set How does a replica-set work? ● PRIMARY ● Secondary ● Secondary Main collection when running a replica-set is;

12.Replica-set - oplogs How does a replica-set work? Secondaries pull the oplog and apply it locally.
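The pull-and-apply loop can be sketched in plain Python. This is a toy model, not MongoDB's replication code: `Node`, `OplogEntry`, `primary_write`, and `pull_and_apply` are hypothetical names, timestamps are simple integers, and only inserts ("i") and deletes ("d") are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class OplogEntry:
    ts: int              # logical timestamp, increasing on the primary
    op: str              # "i" = insert, "d" = delete (simplified subset)
    key: str
    value: object = None


@dataclass
class Node:
    data: dict = field(default_factory=dict)
    oplog: list = field(default_factory=list)

    def last_ts(self):
        # Timestamp of the last applied entry, 0 if nothing was applied yet.
        return self.oplog[-1].ts if self.oplog else 0

    def apply(self, entry):
        # Apply the operation to local data, then record it in the local oplog.
        if entry.op == "i":
            self.data[entry.key] = entry.value
        elif entry.op == "d":
            self.data.pop(entry.key, None)
        self.oplog.append(entry)


def primary_write(primary, op, key, value=None):
    # A write on the primary is applied and logged with the next timestamp.
    primary.apply(OplogEntry(primary.last_ts() + 1, op, key, value))


def pull_and_apply(secondary, source):
    # Fetch entries newer than the last one applied locally, apply in order.
    start = secondary.last_ts()
    new_entries = [e for e in source.oplog if e.ts > start]
    for entry in new_entries:
        secondary.apply(entry)
    return len(new_entries)


primary, secondary = Node(), Node()
primary_write(primary, "i", "a", 1)
primary_write(primary, "i", "b", 2)
primary_write(primary, "d", "a")
applied = pull_and_apply(secondary, primary)
print(applied, secondary.data == primary.data)
```

Because the secondary keeps its own oplog, another secondary could use it as `source`, which is the idea behind chained replication.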

13.Replica-set Replica-set ● Heartbeat ● Votes ● Priority ● Arbiter ● Hidden

14.Replica-set How does a replica-set work? Chained replication: a secondary pulls the oplog from another secondary instead of the primary and applies it locally.

15.Replica-set How does a replica-set work? Delayed secondary: the secondary pulls the oplog and applies it locally after a configured delay.

16.Replica-set What if a heartbeat fails? An election promotes a secondary to primary when the primary is no longer available. The most up-to-date instance is preferred as the new primary, although it does not always win.

17.Replica-set What if a heartbeat fails? Each instance can have a different priority; in short, the higher the priority, the more likely the instance is to become primary. Arbiters do not hold data; they only vote to break ties in elections. Use an arbiter if a replica set has an even number of members.
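The arbiter advice follows from majority arithmetic: a primary can only be elected by a strict majority of voting members. A minimal sketch, with `votes_needed` and `fault_tolerance` as hypothetical helper names:

```python
def votes_needed(voting_members: int) -> int:
    """Strict majority of voting members required to elect a primary."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Voting members that can be lost while an election can still succeed."""
    return voting_members - votes_needed(voting_members)


for n in (2, 3, 4, 5):
    print(n, votes_needed(n), fault_tolerance(n))
```

With 2 voting members no failure is tolerated; adding an arbiter makes 3 voters and tolerates 1 failure. Going from 3 to 4 voters adds nothing, which is why odd counts are preferred.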

18.Replica-set ● Tweaking the consistency - readPreference - writeConcern

19.Replica-set - writeConcern The default write concern is w: 1 (changed to "majority" in MongoDB 5.0), which means a write is considered complete as soon as the primary has applied it. If an election is triggered before the write replicates, the operation can be lost. A different writeConcern can be set per operation or per connection. Possible values: a number of members (1, 2, 3, ... N), "majority", or a custom tag name.
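How the numeric and "majority" values differ can be sketched as a small acknowledgement check. This is an illustration only: `is_acknowledged` is a hypothetical function, tag-based write concerns are left out, and the majority is computed over all voting members as a simplification.

```python
def is_acknowledged(w, acks: int, voting_members: int) -> bool:
    """Return True once enough members have confirmed the write.

    w     -- an integer or the string "majority"
    acks  -- members that have applied the write, primary included
    """
    if w == "majority":
        needed = voting_members // 2 + 1
    else:
        needed = int(w)
    return acks >= needed


# Three-member replica set, only the primary has applied the write so far.
print(is_acknowledged(1, acks=1, voting_members=3))           # w: 1 is satisfied
print(is_acknowledged("majority", acks=1, voting_members=3))  # majority is not
```

A write acknowledged with w: 1 exists only on the primary at that moment, which is why it can be lost if the primary fails before a secondary pulls it.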

20.Replica-set - readPreference The default read preference is primary: every read goes to the primary unless the driver is told to read from secondaries. Important! Reading from secondaries may return stale data. However, it is a very common way to scale out read-intensive applications.
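One common way to set the read preference per connection is through the connection string. The sketch below only builds the URI; the host names and the replica set name `rs0` are made-up examples, and `build_uri` is a hypothetical helper. `replicaSet` and `readPreference` are standard connection string options.

```python
from urllib.parse import urlencode


def build_uri(hosts, replica_set, read_preference="primary"):
    # Options are appended as a query string after the host list.
    options = {"replicaSet": replica_set, "readPreference": read_preference}
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


uri = build_uri(
    ["db1.example.net:27017", "db2.example.net:27017"],  # hypothetical hosts
    "rs0",                                               # hypothetical set name
    read_preference="secondaryPreferred",
)
print(uri)
```

`secondaryPreferred` reads from a secondary when one is available and falls back to the primary otherwise, so the application has to tolerate stale results.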

21.Cluster (Shards)

22.Sharded Cluster ● Config Servers ● mongos ● Shard1 ● Shard2

23.Sharded Cluster architecture ● Mongos ● Config ● Shards

24.Sharded Cluster - chunks A database can have multiple collections, and each collection can be sharded differently. Each shard (shard1, shard2) holds chunks and their documents.

25.Sharded Cluster Primary Shard Data are split among shards in small chunks by the shard key. Each chunk holds up to 64 MB of data by default (128 MB since MongoDB 6.0). Chunks are distributed among the shards, but they can also all live on a single one.

26.Sharded Cluster - Shard key Shard key: Field(s) used to distribute the data among the shards. Once data is partitioned, the shard key cannot be changed (newer releases relax this: MongoDB 4.4 can refine a shard key and 5.0 can reshard a collection). On a sharded collection, only the _id index and indexes prefixed by the shard key can be unique. A shard key can distribute data by: ● Hash ● Range ● Zones

27.Sharded Cluster Hashed Shard key:
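Hashed distribution can be illustrated with a toy router. This is a sketch under assumptions, not MongoDB's implementation: the first 8 bytes of an MD5 digest stand in for the hashed index value, and the 64-bit hash space is cut into one equal range per shard. `hashed_value` and `shard_for` are hypothetical names.

```python
import hashlib
from collections import Counter


def hashed_value(key) -> int:
    # Stand-in hash: first 8 bytes of MD5, read as an unsigned 64-bit integer.
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(key, shards):
    # Split the 64-bit hash space into len(shards) equal ranges.
    return shards[(hashed_value(key) * len(shards)) >> 64]


shards = ["shard1", "shard2"]
# Monotonically increasing keys, e.g. an auto-incremented id.
counts = Counter(shard_for(k, shards) for k in range(10_000))
print(counts)
```

Even though the keys increase one by one, consecutive values hash far apart, so writes spread roughly evenly over the shards. The trade-off is that range queries on the key have to visit every shard.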

28.Sharded Cluster Range Shard key:
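Range distribution maps contiguous shard key ranges (chunks) to shards. A minimal sketch, assuming a numeric shard key and a made-up chunk table; `CHUNKS` and `shard_for` are hypothetical names:

```python
from bisect import bisect_right

# Each chunk is (inclusive lower bound, owning shard), sorted by lower bound.
# A chunk covers values from its lower bound up to the next chunk's bound.
CHUNKS = [
    (float("-inf"), "shard1"),
    (1000, "shard2"),
    (2000, "shard1"),
]


def shard_for(value, chunks=CHUNKS):
    lower_bounds = [lower for lower, _ in chunks]
    # The owning chunk is the last one whose lower bound is <= value.
    index = bisect_right(lower_bounds, value) - 1
    return chunks[index][1]


print(shard_for(999), shard_for(1000), shard_for(2500))
```

Nearby values land in the same chunk, which makes range queries cheap, but a monotonically increasing key sends every new insert to the last chunk and therefore to a single shard.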

29.Sharded Cluster Zones (tags) Shard key:
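Zones (formerly tags) pin ranges of the shard key to groups of shards. The sketch below is a toy model with made-up zone names and ranges; `SHARD_ZONES`, `ZONE_RANGES`, `zone_for`, and `eligible_shards` are hypothetical names.

```python
# Which zones each shard belongs to (hypothetical layout).
SHARD_ZONES = {
    "shard1": {"EU"},
    "shard2": {"US"},
    "shard3": {"US"},
}

# Shard key ranges assigned to zones: lower bound inclusive, upper exclusive.
ZONE_RANGES = [
    (0, 5000, "EU"),
    (5000, 10000, "US"),
]


def zone_for(value):
    for lower, upper, zone in ZONE_RANGES:
        if lower <= value < upper:
            return zone
    return None  # value is not covered by any zone


def eligible_shards(value):
    zone = zone_for(value)
    if zone is None:
        # Data outside every zone range may live on any shard.
        return sorted(SHARD_ZONES)
    return sorted(s for s, zones in SHARD_ZONES.items() if zone in zones)


print(eligible_shards(42), eligible_shards(7000), eligible_shards(20000))
```

Chunks in a zoned range only move between the shards of that zone, which is how data can be kept in a given region or on given hardware.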