Managing Data and Operation Distribution in MongoDB

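In a sharded MongoDB cluster, scale and data distribution are defined by your shard key. Even with the right shard key, ongoing maintenance and review are needed to keep performance optimal.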
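This presentation reviews how shard key selection and chunk distribution can create scenarios in which you may need to manually move, split, or merge chunks in a sharded cluster. These scenarios can arise with both optimal and suboptimal shard keys. Example use cases offer tips on choosing a shard key, detecting problems, understanding why these scenarios occur, and the concrete steps you can take to correct them.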


1. Managing Data and Operation Distribution in MongoDB
Antonios Giannopoulos and Jason Terpko, DBAs @ Rackspace/ObjectRocket
linkedin.com/in/antonis/ | linkedin.com/in/jterpko/

2. Introduction
Antonios Giannopoulos, Jason Terpko
www.objectrocket.com

3. Overview
• Sharded Cluster
• Shard Key Selection
• Shard Key Operations
• Chunk Management
• Data Distribution
• Orphaned documents
• Q&A

4. Sharded Cluster
• Cluster Metadata
• Data Layer
• Query Routing
• Cluster Communication

5. Cluster Metadata

6. Data Layer
(Diagram: shards s1, s2, …, sN)

7. Replication
Data redundancy relies on an idempotent log of operations (the oplog).
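Idempotence matters because an oplog entry may be applied more than once (for example, during initial sync). MongoDB records operators such as $inc in the oplog as a $set of the resulting value. The sketch below is a simplified model, not server code: `applyInc` and `applySet` are hypothetical helpers that show why replaying the rewritten entry is safe while replaying the original operator is not.

```javascript
// Simplified model of why oplog entries must be idempotent.
// Documents are plain objects; applyInc/applySet are hypothetical helpers, not server APIs.

// Non-idempotent: applying the same $inc twice changes the result again.
function applyInc(doc, field, amount) {
  const out = Object.assign({}, doc);
  out[field] = (out[field] || 0) + amount;
  return out;
}

// Idempotent: a $set of the resulting value can be replayed safely.
function applySet(doc, field, value) {
  const out = Object.assign({}, doc);
  out[field] = value;
  return out;
}

// The primary executes {$inc: {n: 1}} on {n: 5}; the oplog carries {$set: {n: 6}} instead.
const original = { _id: 1, n: 5 };
const onPrimary = applyInc(original, "n", 1);
const loggedValue = onPrimary.n; // value carried by the rewritten oplog entry

// A secondary that (wrongly) replays each form twice:
const replayedInc = applyInc(applyInc(original, "n", 1), "n", 1);
const replayedSet = applySet(applySet(original, "n", loggedValue), "n", loggedValue);
```

Replaying the $inc twice ends at 7, while replaying the $set twice stays at 6 and matches the primary.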

8. Query Routing
(Diagram: mongos routing queries to shards s1, s2, …, sN)

9. Sharded Cluster
(Diagram: shards s1, s2, …, sN)

10. Cluster Communication
How do independent components become a cluster and communicate?
● Replica Set
  ○ Replica Set Monitor
  ○ Replica Set Configuration
  ○ Network Interface ASIO Replication / Network Interface ASIO Shard Registry
  ○ Misc: replSetName, keyFile, clusterRole
● Mongos Configuration
  ○ configDB Parameter
  ○ Network Interface ASIO Shard Registry
  ○ Replica Set Monitor
  ○ Task Executor
● Post Add Shard
  ○ Collection config.shards
  ○ Replica Set Monitor
  ○ Task Executor Pool
  ○ config.system.sessions

11. Primary Shard
(Diagram: database <foo> and shards s1, s2, …, sN)

12. Collection UUID
With featureCompatibilityVersion 3.6, all collections are assigned an immutable UUID.
(Diagram: the UUID is stored in config.collections in both the cluster metadata and the data layer (mongod))

13. Collection UUID (continued)
Important
• UUIDs for a namespace must match between the cluster metadata and the data layer
• Use 4.0+ tools for a sharded cluster restore
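One way to check the matching rule is to compare the UUID recorded in the cluster metadata with the UUID each shard reports. The sketch below assumes you have already collected those values as strings (for example, the `uuid` field of the collection's `config.collections` document on the config servers, and `info.uuid` from `db.getCollectionInfos()` on each shard primary). `findUuidMismatches` is a hypothetical helper, and the sample UUIDs are made up.

```javascript
// Compare the cluster-metadata UUID for a namespace with the UUID each shard reports.
// Inputs are assumed to be pre-collected strings; gathering them is out of scope here.
// metadataUuid: UUID from the config servers' config.collections entry.
// shardUuids:   { shardName: uuidString, or null if that shard has no copy of the collection }
function findUuidMismatches(metadataUuid, shardUuids) {
  const mismatches = [];
  Object.keys(shardUuids).forEach(function (shard) {
    const uuid = shardUuids[shard];
    // In this sketch, a shard without the collection is not treated as a mismatch.
    if (uuid !== null && uuid !== metadataUuid) {
      mismatches.push({ shard: shard, found: uuid, expected: metadataUuid });
    }
  });
  return mismatches;
}

// Hypothetical sample values:
const report = findUuidMismatches("3b241101-e2bb-4255-8caf-4136c566a962", {
  s1: "3b241101-e2bb-4255-8caf-4136c566a962",
  s2: "9a1f0c2e-0000-4000-8000-000000000000",
  s3: null
});
```

Here only `s2` is reported; that is the kind of inconsistency a restore with pre-4.0 tools can leave behind.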

14. Shard Key Selection
• Profiling
• Identify shard key candidates
• Pick a shard key
• Challenges

15. Sharding
• Shards are physical partitions
• Chunks are logical partitions
(Diagram: collection <foo> in database <foo>, divided into chunks spread across shards s1, s2, …, sN)

16. What is a Chunk?
The shard key's job is to create chunks: the logical partitions your collection is divided into, which determine how data is distributed across the cluster.
● Maximum size is defined in config.settings
  ○ Default 64MB
● Before 3.4.11: hardcoded maximum document count of 250,000
● Version 3.4.11 and higher: maximum document count of 1.3 × (configured chunk size ÷ average document size)
● Chunk map is stored in config.chunks
  ○ Continuous range from MinKey to MaxKey
● Chunk map is cached at both the mongos and mongod
  ○ Query Routing
  ○ Sharding Filter
● Chunks distributed by the Balancer
  ○ Using moveChunk
  ○ Up to each shard's maxSize
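A small model of the chunk map and of the document limit above, which is the limit that applies when a chunk is migrated. Assumptions: chunks are plain objects shaped like `config.chunks` entries (`min`, `max`, `shard`) for a single numeric shard key field `x`, with `-Infinity`/`Infinity` standing in for MinKey/MaxKey. `findOwningShard` and `isTooLargeToMigrate` are hypothetical helpers, not MongoDB APIs.

```javascript
// Chunk map for a collection sharded on {x: 1}. Ranges are [min, max): min inclusive, max exclusive.
// -Infinity / Infinity stand in for MinKey / MaxKey in this sketch.
const chunks = [
  { min: { x: -Infinity }, max: { x: 100 }, shard: "s1" },
  { min: { x: 100 }, max: { x: 500 }, shard: "s2" },
  { min: { x: 500 }, max: { x: Infinity }, shard: "s3" }
];

// Route a shard key value to the shard owning its range (the lookup mongos does with its cached map).
function findOwningShard(chunkMap, value) {
  for (let i = 0; i < chunkMap.length; i++) {
    const c = chunkMap[i];
    if (value >= c.min.x && value < c.max.x) return c.shard;
  }
  return null; // not reached if the map covers MinKey..MaxKey without gaps
}

// Per-chunk document limit for migrations.
// chunkSizeMB comes from config.settings {_id: "chunksize"} (default 64).
// legacy = true models versions before 3.4.11.
function isTooLargeToMigrate(docCount, chunkSizeMB, avgObjSizeBytes, legacy) {
  if (legacy) return docCount > 250000;
  const chunkSizeBytes = chunkSizeMB * 1024 * 1024;
  return docCount > 1.3 * (chunkSizeBytes / avgObjSizeBytes);
}
```

For example, `isTooLargeToMigrate(90000, 64, 1024, false)` is true: with 1 KB documents, the 64 MB limit works out to about 85,197 documents.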

17. Shard Key Selection: Profiling
• Helps identify your workload
• Requires level 2: db.setProfilingLevel(2)
• May need to increase the size of the profiler collection (system.profile)
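Enlarging the profiler means recreating the capped `system.profile` collection while profiling is off. The function below wraps that mongo shell sequence. `resizeProfiler` and the example size are illustrative, and `db` is passed in so the function can be exercised against a stub. Profiling is per mongod, so run it on the mongod you want to profile, and lower the level again when the capture window ends, since level 2 records every operation.

```javascript
// Recreate system.profile with a larger capped size, then enable level-2 profiling.
// Intended for the mongo shell connected to a mongod (profiling is per mongod, not per mongos).
function resizeProfiler(db, sizeBytes) {
  db.setProfilingLevel(0);   // profiling must be off before system.profile can be dropped
  db.system.profile.drop();
  db.createCollection("system.profile", { capped: true, size: sizeBytes });
  db.setProfilingLevel(2);   // level 2: record every operation
}

// Example (mongo shell): resizeProfiler(db.getSiblingDB("foo"), 256 * 1024 * 1024);
```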

18. Shard Key Selection: Profiling → Candidates
• Export statement types with frequency
• Export statement patterns with frequency
• Produces a list of shard key candidates
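The export step can be approximated by grouping profiler entries by operation type and by query shape (the sorted top-level filter field names, ignoring values). This sketch assumes the entries have already been reduced to `{ op, filter }` pairs. Where the filter lives in a real `system.profile` document varies by operation type and server version, so that extraction is left out. `summarizeProfile` and the sample data are hypothetical.

```javascript
// Group simplified profiler entries by operation type and by query shape.
// Assumed input: [{ op: "query" | "update" | ..., filter: { ...query predicate... } }]

// Shape = sorted top-level filter field names; values are ignored.
function shapeOf(filter) {
  return Object.keys(filter || {}).sort().join(",") || "(empty)";
}

function summarizeProfile(entries) {
  const byOp = {};
  const byShape = {};
  entries.forEach(function (e) {
    byOp[e.op] = (byOp[e.op] || 0) + 1;
    const pattern = e.op + " {" + shapeOf(e.filter) + "}";
    byShape[pattern] = (byShape[pattern] || 0) + 1;
  });
  // Most frequent patterns first: their fields are natural shard key candidates.
  const patterns = Object.keys(byShape)
    .map(function (p) { return { pattern: p, count: byShape[p] }; })
    .sort(function (x, y) { return y.count - x.count; });
  return { byOp: byOp, patterns: patterns };
}

// Hypothetical sample (values differ, shapes repeat):
const summary = summarizeProfile([
  { op: "query", filter: { customerId: 7, status: "open" } },
  { op: "query", filter: { status: "open", customerId: 9 } },
  { op: "update", filter: { customerId: 7 } },
  { op: "query", filter: { orderId: 3 } }
]);
```

Here `query {customerId,status}` comes out on top with two occurrences, and `customerId` appears in three of the four statements.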

19. Shard Key Selection: Built-in Constraints
(Diagram: Profiling → Candidates → Constraints)
• Key and value are immutable
• Must not contain nulls
• Update and findAndModify operations must contain the shard key
• Unique indexes must be prefixed by the full shard key
• A shard key cannot contain special index types (e.g. text)
• Potentially reduces the list of candidates
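Two of these constraints can be checked mechanically against a candidate key pattern. The sketch below is a hypothetical helper, not a MongoDB API. It rejects special index types in the key and reports unique indexes that are not prefixed by the full shard key, exempting the default `_id` index. Field order is compared, but index direction is not.

```javascript
// Index types that cannot back a shard key (hashed is allowed, so it is not listed).
const SPECIAL_INDEX_TYPES = ["text", "2d", "2dsphere", "geoHaystack"];

// shardKey: candidate key pattern, e.g. { customerId: 1 }
// uniqueIndexes: key patterns of the collection's unique indexes
function checkBuiltInConstraints(shardKey, uniqueIndexes) {
  const problems = [];
  const keyFields = Object.keys(shardKey);

  keyFields.forEach(function (f) {
    if (SPECIAL_INDEX_TYPES.indexOf(shardKey[f]) !== -1) {
      problems.push("field " + f + " uses special index type " + shardKey[f]);
    }
  });

  uniqueIndexes.forEach(function (idx) {
    const idxFields = Object.keys(idx);
    if (idxFields.length === 1 && idxFields[0] === "_id") return; // default _id index is allowed
    // Every shard key field must appear, in order, at the start of the unique index.
    const prefixed = keyFields.every(function (f, i) { return idxFields[i] === f; });
    if (!prefixed) {
      problems.push("unique index {" + idxFields.join(", ") + "} is not prefixed by the shard key");
    }
  });
  return problems;
}
```

For example, `{customerId: 1}` passes with a unique index on `{customerId: 1, orderId: 1}` but is flagged by a unique index on `{email: 1}`.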

20. Shard Key Selection: Schema Constraints
(Diagram: Profiling → Candidates → Built-in and schema constraints)
• Cardinality
• Monotonically increasing values
• Data hotspots
• Operational hotspots
• Targeted vs. scatter-gather operations
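Cardinality, monotonic growth, and data hotspots can be estimated from a sample of candidate values taken in insertion order. `profileCandidate` is a hypothetical helper, and how you draw the sample is up to you. In this sketch, a non-decreasing sequence counts as monotonically increasing.

```javascript
// Estimate schema-level properties of a candidate shard key from sampled values,
// taken in insertion order. Non-decreasing sequences count as monotonically increasing.
function profileCandidate(values) {
  const counts = new Map();
  values.forEach(function (v) { counts.set(v, (counts.get(v) || 0) + 1); });

  let maxCount = 0;
  counts.forEach(function (c) { if (c > maxCount) maxCount = c; });

  let increasing = values.length > 1;
  for (let i = 1; i < values.length; i++) {
    if (values[i] < values[i - 1]) { increasing = false; break; }
  }

  return {
    cardinality: counts.size,            // distinct values in the sample
    monotonicallyIncreasing: increasing, // if true, inserts concentrate in the chunk holding MaxKey
    hottestValueShare: values.length ? maxCount / values.length : 0 // share of the most common value
  };
}

const timestamps = profileCandidate([1, 2, 3, 4]);
const statuses = profileCandidate(["open", "closed", "open", "open"]);
```

The timestamp-like sample has full cardinality but is monotonically increasing. The status sample has cardinality 2, with 75% of documents on one value, which is a data hotspot.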

21. Shard Key Selection: Future Constraints
(Diagram: Profiling → Candidates → Built-in, schema, and future constraints)
• Poor cardinality
• Growth and data hotspots
• Data pruning and TTL indexes
• Schema changes
• Try to simulate the dataset in 3, 6, and 12 months
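A rough way to simulate the dataset is to project growth per shard key value and flag values whose data would outgrow a single chunk. All documents with one shard key value must stay in one chunk, so such a value ends up in an unsplittable (jumbo) chunk. The linear growth model, the input shape, and `projectOversizedValues` are assumptions for illustration only.

```javascript
// Project per-value data size at several horizons, assuming linear growth (an assumption).
// stats: [{ value, currentMB, growthMBPerMonth }] for each shard key value (hypothetical input).
// Returns { months: [values whose projected size exceeds one chunk] }.
function projectOversizedValues(stats, chunkSizeMB, horizonsMonths) {
  const result = {};
  horizonsMonths.forEach(function (m) {
    result[m] = stats
      .filter(function (s) { return s.currentMB + s.growthMBPerMonth * m > chunkSizeMB; })
      .map(function (s) { return s.value; });
  });
  return result;
}

const projection = projectOversizedValues(
  [
    { value: "tenantA", currentMB: 20, growthMBPerMonth: 10 },
    { value: "tenantB", currentMB: 5, growthMBPerMonth: 1 },
    { value: "tenantC", currentMB: 50, growthMBPerMonth: 2 }
  ],
  64,
  [3, 6, 12]
);
```

With a 64 MB chunk size, nothing is flagged at 3 months, tenantA at 6 months, and tenantA and tenantC at 12 months. That pattern signals that the candidate key needs more cardinality.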

22. Shard Key Operations
• Apply a shard key
• Revert a shard key

23. Apply a shard key
1) Create the associated index
2) Make sure the balancer is stopped: sh.stopBalancer(), then confirm with sh.getBalancerState()
3) Apply the shard key: sh.shardCollection("foo.col", {field1: 1, ..., fieldN: 1})
4) Allow a burn period
5) Start the balancer: sh.startBalancer()
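The same steps can be written as a mongo shell function. `applyShardKey` is a hypothetical wrapper. It takes `db` and `sh` as parameters (in the shell you pass the globals) so it can be tested against stubs, and it assumes `sh.enableSharding()` has already been run for the database. It deliberately leaves the balancer off; restart it yourself after the burn period.

```javascript
// Apply a shard key with the balancer stopped (hypothetical wrapper, not a MongoDB API).
// db and sh are the mongo shell globals; passing them in keeps the function testable.
// Does NOT restart the balancer: run sh.startBalancer() after the burn period.
function applyShardKey(db, sh, ns, key) {
  const dot = ns.indexOf(".");
  const dbName = ns.substring(0, dot);
  const collName = ns.substring(dot + 1);

  // 1) Create the index that backs the shard key.
  db.getSiblingDB(dbName).getCollection(collName).createIndex(key);

  // 2) Stop the balancer and confirm it is off before sharding.
  sh.stopBalancer();
  if (sh.getBalancerState() !== false) {
    throw new Error("balancer is still enabled; not sharding " + ns);
  }

  // 3) Shard the collection. Leave the balancer off for the burn period.
  return sh.shardCollection(ns, key);
}

// Example (mongo shell): applyShardKey(db, sh, "foo.col", { field1: 1 });
```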

24. Sharding
(Diagram: sh.shardCollection("foo.foo", <key>) → burn period → sh.startBalancer(); chunks of collection <foo> in database <foo> spread across shards s1, s2, …, sN)

25. Revert a shard key
Two categories of problems:
○ Affects functionality (exceptions, inconsistent data,…)
○ Affects performance (operational hotspots…)
Dump/Restore
○ Requires downtime: writes and, in some cases, reads
○ Time-consuming operation
○ You may restore to a sharded or an unsharded collection
○ Preferably pre-create the indexes
○ The same or a new cluster can be used
○ Streaming dump/restore is an option
○ In special cases, such as time-series data, it can be fast

26. Revert a shard key
Dual writes
○ Mongo-to-Mongo connector or change streams
○ No downtime
○ Requires extra capacity
○ May increase latency
○ The same or a new cluster can be used
○ Adds complexity
Alter the config database
○ Requires downtime, but minimal
○ Easy during the burn period
○ Time-consuming if chunks are distributed
○ Has overhead during chunk moves
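For the dual-write option, the core of a change-stream approach is replaying each change event against the target collection. The sketch below models the target as an in-memory `Map` keyed by `_id`, so the logic can be shown without a server. The event fields (`operationType`, `documentKey`, `fullDocument`, `updateDescription`) follow the change event format, while `applyChangeEvent` itself is hypothetical. A real implementation would also persist the resume token so it can continue after a restart.

```javascript
// Apply one change stream event to an in-memory target (Map of _id -> document).
// Only top-level fields are handled for updates; dotted paths are out of scope here.
function applyChangeEvent(target, event) {
  const id = event.documentKey._id;
  switch (event.operationType) {
    case "insert":
    case "replace":
      target.set(id, Object.assign({}, event.fullDocument));
      break;
    case "update": {
      // If the document is missing on the target, this sketch starts from {_id} only.
      const doc = Object.assign({}, target.get(id) || { _id: id });
      const upd = event.updateDescription;
      Object.keys(upd.updatedFields).forEach(function (f) { doc[f] = upd.updatedFields[f]; });
      upd.removedFields.forEach(function (f) { delete doc[f]; });
      target.set(id, doc);
      break;
    }
    case "delete":
      target.delete(id);
      break;
    default:
      throw new Error("unhandled operationType: " + event.operationType);
  }
}
```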

27. Revert a shard key
Process:
1) Disable the balancer: sh.stopBalancer()
2) Move all chunks to the primary shard (skip during the burn period)
3) Stop one secondary of the config server replica set (for rollback)
4) Stop all mongos and all shards
5) On the config server replica set primary, execute:
   db.getSiblingDB('config').chunks.remove({ns: <collection name>})
   db.getSiblingDB('config').collections.remove({_id: <collection name>})
6) Start all mongos and shards
7) Start the secondary of the config server replica set
Rollback:
• After step 6, stop all mongos and shards
• Stop the running members of the config server replica set and wipe their data directories
• Start all config server replica set members
• Start all mongos and shards
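Steps 2 and 5 lend themselves to small shell helpers. The sketch below assumes the 4.0-era `config.chunks` layout (documents with `ns`, `min`, `max`, `shard`), matching the `remove({ns: ...})` call in step 5. The function names are hypothetical, and the config and admin databases are passed in so the logic can be checked against stubs. Step 2 uses the `moveChunk` admin command with `bounds`. Step 5 must only run while mongos and shards are stopped, as the process requires.

```javascript
// Step 2: move every chunk of ns that is not already on the primary shard.
// configDb = db.getSiblingDB("config"); adminDb = db.getSiblingDB("admin").
function moveAllChunksToPrimary(configDb, adminDb, ns, primaryShard) {
  const toMove = configDb.chunks.find({ ns: ns, shard: { $ne: primaryShard } }).toArray();
  toMove.forEach(function (c) {
    const res = adminDb.runCommand({ moveChunk: ns, bounds: [c.min, c.max], to: primaryShard });
    if (res.ok !== 1) throw new Error("moveChunk failed for chunk starting at " + JSON.stringify(c.min));
  });
  return toMove.length; // number of chunks moved
}

// Step 5: remove the collection's sharding metadata (config server replica set primary only,
// with all mongos and shards stopped).
function removeShardingMetadata(configDb, ns) {
  configDb.chunks.remove({ ns: ns });
  configDb.collections.remove({ _id: ns });
}

// Example (mongo shell, step 2):
// moveAllChunksToPrimary(db.getSiblingDB("config"), db.getSiblingDB("admin"), "foo.col", "s1");
```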

28. Revert a shard key
Online option requested in SERVER-4000; may be supported in 4.2
Further reading: Morphus: Supporting Online Reconfigurations in Sharded NoSQL Systems, http://dprg.cs.uiuc.edu/docs/ICAC2015/Conference.pdf
Special use cases:
• Extend a shard key by adding field(s) ({a:1} to {a:1, b:1})
  ○ Possible (and easier) if b's min and max (per a) are predefined
  ○ For example, {year:1, month:1} extended to {year:1, month:1, day:1}
• Reduce the elements of a shard key ({a:1, b:1} to {a:1})
  ○ Possible (and easier) if each distinct "a" value resides on a single shard
  ○ Chunks that share the same "a" min value add complexity
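One way to express these special cases is as a rewrite of chunk bounds. This sketch takes a different route from the predefined per-`a` values on the slide: to extend {a:1} to {a:1, b:1}, it appends the new field with its lowest possible value (MinKey in practice) to every bound, and with MaxKey to the top bound, so each chunk still covers the same documents. For the reduce case, it lists chunk layouts that block a simple rewrite, following the two conditions above. Both helpers are hypothetical. They work on plain chunk objects, use `-Infinity`/`Infinity` in place of MinKey/MaxKey, and do not touch the collection's `key` in config.collections, which would also need to change.

```javascript
// Sentinels standing in for MinKey / MaxKey in this sketch.
const MIN = -Infinity;
const MAX = Infinity;

// Extend every chunk bound with a new trailing shard key field.
function extendChunkBounds(chunks, field) {
  function extend(bound) {
    const out = Object.assign({}, bound);
    // The global upper bound stays at the top; every other bound starts at the new field's minimum.
    const atTop = Object.keys(bound).every(function (k) { return bound[k] === MAX; });
    out[field] = atTop ? MAX : MIN;
    return out;
  }
  return chunks.map(function (c) {
    return Object.assign({}, c, { min: extend(c.min), max: extend(c.max) });
  });
}

// Reduce to a single field: report layouts that block a simple metadata rewrite.
// Assumes chunks are sorted by min in shard key order.
function reduceBlockers(chunks, keepField) {
  const blockers = [];
  const seenMins = {};
  chunks.forEach(function (c, i) {
    const a = String(c.min[keepField]);
    if (seenMins[a]) blockers.push("several chunks start at " + keepField + "=" + a);
    seenMins[a] = true;
    if (i === 0) return;
    const prev = chunks[i - 1];
    // A dropped field above MinKey means this boundary splits one keepField value across two chunks.
    const splitsValue = Object.keys(c.min).some(function (k) { return k !== keepField && c.min[k] !== MIN; });
    if (splitsValue && prev.shard !== c.shard) {
      blockers.push(keepField + "=" + a + " spans shards " + prev.shard + " and " + c.shard);
    }
  });
  return blockers;
}
```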

29. Revert a shard key
• Always perform a dry run
• Balancer/autosplit must be disabled
• You must take downtime during the change
*There might be a more optimal code path, but the one above worked like a charm
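A dry run can start with a precondition check on `config.settings`. The sketch assumes the documents have already been read into an array, for example with `db.getSiblingDB("config").settings.find().toArray()`. In that collection, `sh.stopBalancer()` records `{_id: "balancer", stopped: true}` and `sh.disableAutoSplit()` records `{_id: "autosplit", enabled: false}`. `readyForShardKeyChange` is a hypothetical helper.

```javascript
// Check config.settings documents for the preconditions of a shard key change.
// Returns a list of problems; an empty list means the balancer and autosplit are both off.
function readyForShardKeyChange(settingsDocs) {
  const byId = {};
  settingsDocs.forEach(function (d) { byId[d._id] = d; });
  const problems = [];
  // A missing document means the default applies: balancer running, autosplit enabled.
  if (!byId.balancer || byId.balancer.stopped !== true) problems.push("balancer is not stopped");
  if (!byId.autosplit || byId.autosplit.enabled !== false) problems.push("autosplit is not disabled");
  return problems;
}
```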