Performing MongoDB Massive Write Operations Efficiently

无论是因为一个新特性、一个bug,还是仅仅出于存档目的,在生产中经常需要更新或删除大量文档。
这种类型的操作的挑战不仅在于设计高效的流程查询,而且在于能够在生产中执行它,而不会使服务器衰弱或导致辅助设备滞后。
有一些策略可以用来创建高度控制的写入过程,这些过程可以在雷达下运行数天,从而在不严重影响应用程序性能的情况下完成工作。
在本期课程中,我将与大家分享在MongoDB中创建大规模写入操作时需要考虑的要点、执行的实际流程的示例以及一些经验教训。

展开查看详情

1.Performing MongoDB Massive Write Operations Efficiently Gabriel Ciciliani - Pythian May 29th, 2019 © The Pythian Group Inc., 2018 1

2.AGENDA ● Massive write operations ● Scenarios ● Useful facts ● Different solutions ● Additional features ● Conclusions ● Q&A © The Pythian Group Inc., 2019 2018 2

3.When do we perform massive write operations? ● We need to archive old data for query or space efficiency.. ● We need to fix an error introduced by an application.. ● We need to update/create a key because of a new feature.. ● We need to get rid of entire databases in multi tenant environments.. © The Pythian Group Inc., 2019 2018 3

4.Massive write operations: challenges What’s wrong with running a single update(multi) or remove() statement: ● Excesive locking ● Could require large collection scans ○ Evicting frequently used pages from the cache ○ Causing IO spikes ● Hard to keep track of progress and recover from failures ● Lag on secondaries © The Pythian Group Inc., 2019 2018 4

5.General strategy: Split the operation in smaller, controllable tasks © The Pythian Group Inc., 2018 5

6.Useful facts ## All collections have an indexed _id key that is unique and not null ## There is an optimization for sort() + limit(1) where only 1 key and 1 document is read db.restaurants.find({cuisine:"American"},{ _id:1}).sort({cuisine:-1}).limit(1).explain(true) db.restaurants.find({cuisine:"American"},{ _id:-1}).sort({cuisine:-1}).limit(1).explain(true ) ... "totalKeysExamined" : 1, "totalDocsExamined" : 1, ... 6 © The Pythian Group Inc., 2019 2018

7.Useful facts ## limit() will effectively limit the amount of keys and documents examined after the specified number of documents matching the filter condition were found. ## When count() is executed against an indexed condition, it will only read the index db.restaurants.explain(true).find({cuisine:"American"},{_id:1}).count() ... "totalKeysExamined" : 6184, "totalDocsExamined" : 0, ... © The Pythian Group Inc., 2019 2018 7

8.Scenarios We will analyze the following situations: 1) Update and delete documents when there is an index for the condition 2) Update and delete documents when there is no index for the condition 3) Dropping databases 4) Dropping collections © The Pythian Group Inc., 2019 2018 8

9.1) Update and delete documents when there is an index for the condition © The Pythian Group Inc., 2018 9

10.Solution approach 1) Update and delete documents when there is an index for the condition a) Update cases: i) If the search condition is affected by the update operation -> iterate with limit() until there are no more documents to update ii) If the search condition is *not* affected by the update -> split the total amount of documents in chunks and run the update for each one b) Delete: i) Iterate with limit() until there are no more documents to delete © The Pythian Group Inc., 2019 2018 10

11.2) Update and delete documents when there is no index for the condition © The Pythian Group Inc., 2018 11

12.Solution approach 2) Update and delete documents when there is no index for the condition a) Update and delete: i) Split the entire collection in chunks and run the operation for each one © The Pythian Group Inc., 2019 2018 12

13.3) Dropping databases © The Pythian Group Inc., 2018 13

14.Solution approach Database Drop ● db.dropDatabase() requires a global write lock, blocking other operations until it has completed. ● db.collection.drop() only requires a database level lock ● Drop all collections first, then drop the database during off-peak hours © The Pythian Group Inc., 2019 2018 14

15.Database drop PoC -Instance type: t2.large -Storage: 80Gb gp2 EBS volume -MongoDB version: 4.0.6 -Database size: 53Gb Without dropping collections Dropping collections first db.dropDatabase() real 1.248 secs 0.056 secs time © The Pythian Group Inc., 2019 2018 15

16.4) Dropping collections © The Pythian Group Inc., 2018 16

17.Solution approach Collection Drop ● db.collection.drop() requires a database-level write lock, blocking other operations until it has completed. ● db.collection.remove() only requires a collection-level lock ● Drop all documents first, then drop the collection © The Pythian Group Inc., 2019 2018 17

18.Additional features ● Add a mechanism to stop/restart the process ● Keep track of the process progress ● Dry run option © The Pythian Group Inc., 2019 2018 18

19.Add a mechanism to stop and restart the process ● Use an auxiliary collection and use a specific document as a flag ● Check the status of that document between iterations ● Register last _id processed and completed collections so you can resume the process © The Pythian Group Inc., 2019 2018 19

20.Keep track of the process progress ● Use an auxiliary collection and update it upon every iteration, specifying number and timestamp ● If possible, compute the total documents to be modified / removed and the amount of iterations required © The Pythian Group Inc., 2019 2018 20

21.Dry run option ● It is always good to test the code by disabling the destructive operation (update, remove, drop, etc) to make sure the script is about to remove what we are expecting ● This can be implemented by simply adding a boolean parameter and by checking its value before dropping, updating or pruning a collection © The Pythian Group Inc., 2019 2018 21

22.Conclusions ● Split the overall operation in smaller, controllable tasks ● Make sure the most efficient approach is used, depending on the type of operation and indexes available ● Use delays to reduce pressure on the system ● Calculate overall iterations required and log progress using an auxiliary collection ● Use an auxiliary document as emergency stop © The Pythian Group Inc., 2019 2018 22

23.Code samples You will find all code reviewed during this session at: https://github.com/gabocic/mongodb/tree/pl19tx/pl19tx © The Pythian Group Inc., 2019 2018 23

24.Q&A © The Pythian Group Inc., 2018 24

25.Thanks! We are always growing! Come and see me after the session or reach out at ciciliani@pythian.com © The Pythian Group Inc., 2018 25