How to visually spot and analyze slow MongoDB operations

你有没有想过为什么MongoDB数据库看起来比预期的慢得多?你是否经历过周期性的热点,并且想知道罪魁祸首?您想知道哪些数据库操作最昂贵,哪些操作最有意义吗?
在本文中,您将很快了解MongoDB内置的数据库探查器的局限性,特别是在分片环境中。我将介绍一个我们为解决这些问题而开发的开源工具。我们创建了一个工具来收集、存储和帮助可视化分析缓慢的MongoDB操作。通过使用该工具,您可以更好地了解MongoDB集群,而无需更深入的MongoDB专业知识。该工具针对多个数据库或Mongod同时运行数据库命令,只需点击几下,就可以轻松地干预和发现性能问题。

展开查看详情

1.How to visually spot and analyze slow MongoDB operations Kay Agahd idealo internet GmbH

2.About me ● Current location: Berlin/Germany ● Education: Engineer's degree, Software Engineering ● Experience: ○ Software developer from 1998 - 2009 in Paris/France ○ Database engineer since 2012 at idealo in Berlin/Germany ● Certifications: ○ MongoDB certified Java developer (final grade 100%) ○ MongoDB certified DBA (final grade 96%) linkedin.com/in/kayag/ 2

3.About idealo ● founded in 2000 ● Europe’s leading price comparison website and app ● Germany, Austria, United Kingdom, France, Italy and Spain ● > 1 billion offers online (November 2018) ● fast growing 3

4.idealo and MongoDB ● different types of databases: Oracle, MySQL, PostgreSQL, MongoDB ● MongoDB in production since v1.6 (ca. 2011) ● sharding in production since MongoDB v1.8 ● MongoDB stores mainly offers for back-end usage ● > 2 billions docs in offerStore, up to 1 bn both read/writes per day ● > 10 billions docs in offerHistory 4

5. Motivation Why we need a better MongoDB profiler

6.Review profiling ● MongoDB supports profiling of “slow” operations ● “slow” is a threshold to be set when turning profiling on (100 ms) ● profiler writes collected data to a capped collection of the profiled database ● profiling per-database or per-instance on a running mongod 6

7.Inconveniences ● each mongod and database needs to be handled separately ● sharding: shards * repl. factor * databases = #profilers ● gives only a view on a limited time span due to capped collection ● profiling/analyzing may add stress to the profiled server ● different formats of “query” field makes querying more difficult ● bug: ops through mongos omit the user (JIRA: SERVER-7538) 7

8.idealo requirements ● easily switch on/off profiling, even for many mongod’s involved ● quick overview of types of slow-ops and their quantity within a time period (“types” means op type, user, server, query shape, etc.) ● historical view to see how slow-ops evolve to extrapolate them ● discovering spikes in time or in slow-op types ● filtering by slow-op types and/or time range to drill down ● usable also by non-database-admins, e.g. software developers 8

9.What we’ve built MongoDB slow operations profiler

10.How it works mongod 1 slow ops app DB 1 profiler 1..n collector DB DB 2 DB n mongod n profiler n..m DB n1 DB n2 DB nm 10

11.Example of slow op document { "shardVersion": [ "nreturned": 185, "op": "query", Timestamp(14944, 25276), "responseLength": 119954, "ns": "offerStore.offer", ObjectId("591c6...8fcde") "protocol": "op_command", "query": { ] "millis": 4189, "find": "offer", }, "execStats": { "filter": { "keysExamined": 2210852, "stage": "PROJECTION", "shopId": 292731, "docsExamined": 232, "nReturned": 185, "opIds": { "cursorExhausted": true, "executionTimeMillisEstimate": 3941, "$in": [ "keyUpdates": 0, "works": 2210853, 29337,5478 "writeConflicts": 0, "advanced": 185, ] }, "numYield": 17272, ... 124 lines omitted ... "locks": { "offerStatus": { }, "Global": { "$gt": 0 "ts": ISODate("2018-10-26T07:17:12.747Z"), "acquireCount": { } "client": "10.135.128.219", "r": }, "allUsers": [ NumberLong("34546")}}, "projection": { { "Database": { "traceId": 1, "user": "__system", "acquireCount": { "bokey": 1, "db": "local" "r": "version": 1, } NumberLong("17273")}}, "offerTitle": 1 ], "Collection": { }, "user": "__system@local" "acquireCount": { "batchSize":1000, } "r": 11 ... NumberLong("17273")}} },

12.Condense slow op documents { "_id":ObjectId("5bd3090b68b5c4203f53ce7e"), "ts":ISODate("2018-10-26T12:31:07.752Z"), "adr":"host523.idealo.de", "lbl":"offerStoreDE", "rs":"offerStoreDE09", "db":"offerStore", "col":"offers" "op":"getmore", "fields":["shopId", "opIds.$in", "offerStatus.$gt"], "sort":["_id"], "nret":500, "reslen":94656, "millis":5322, "user":"__system@local" } 12

13.Some numbers of the collector DB ● > 250 millions slow ops stored within the last > 5 years ● average doc size = 238 Bytes ● uncompressed data size ca. 55 GB ● index size < 9 GB ● total storage size (snappy compression) < 12 GB 13

14.Configuration ● Collector { "collector":{ "hosts":["myCollectorHost_member1:27017", "myCollectorHost_member2:27017", "myCollectorHost_member3:27017"], "db":"profiling", "collection":"slowops", "adminUser":"", "adminPw":"" }, ... 14

15.Configuration ● databases to be profiled "profiled":[ { "label":"dbs foo", "hosts":["someHost1:27017", "someHost2:27017", "someHost3:27017"], "ns":["someDB.someCollection", "anotherDB.*"], "enabled": false }, { "label":"dbs bar", "hosts":["someMongoRouter1:27017","someMongoRouter2:27017"], "ns":["*.*"], "adminUser":"kay", "adminPw":"never.tell.it!:-)", "enabled":false, "slowMS":500, "responseTimeoutInMs":2000 } ],... 15

16.How it looks 16

17.How it looks - part 2 17

18.Slow ops diagram 2018/10/30 10:04 = Count:95 db=offerStore coll=offers op=query fields=[shopId,mCC.$gt] Duration: avg:322 max:990 sum:31.682 ms 18

19.Slow ops data table 19

20.Slow ops search form 20

21.Further benefits ● global collector allows to see evolution of slow ops ● aggregate slow ops of last minute grouped by label {$match: { ts: {$gt:from, $lt:to}} }, {$group: {_id: {label:"$lbl"}, count:{$sum:1}, sumMs:{$sum:"$millis"}, maxMs:{$max:"$millis"}, sumNret:{$sum:"$nret"}, sumResplen:{$sum:"$reslen"} } } ● send metrics i.e. count, durations and nReturned to graphite ● build grafana dashboard 21

22.Historical view per DBS 22

23.Further documentation ● Open source project, you are welcome to contribute: https://github.com/idealo/mongodb-slow-operations-profiler ● Blog post 1 of 2: https://medium.com/idealo-tech-blog/how-to-visually-spot- and-analyze-slow-mongodb-operations-d91ac819e0de ● Blog post 2 of 2: https://medium.com/idealo-tech-blog/mongodb-slow- operations-analyzer-2-0-24da414fad13 We are hiring: jobs.idealo.de 23

24.Rate My Session 24