Monitoring MongoDB's Engines in the Wild

MMAP has a long history in the MongoDB ecosystem, both in deployment and in monitoring. Now, however, there are new players in the form of WiredTiger and RocksDB. In this session we break down the metric output of each of these engines and discuss what the numbers mean. We also cover how to group metrics to understand how they interact, so you can explain spikes in queues, CPU, memory, and other areas.


1. Monitoring MongoDB's Engines in the Wild
Tim Vaillancourt, Sr. Technical Operations Architect

2. About Me
• Joined Percona in January 2016
• Sr. Technical Operations Architect for MongoDB
• Previously:
  • EA DICE (MySQL DBA)
  • EA SPORTS (Sys/NoSQL DBA Ops)
  • Amazon/AbeBooks Inc (Sys/MySQL+NoSQL DBA Ops)
• Main techs: MySQL, MongoDB, Cassandra, Solr, Redis, queues, etc.
• 10+ years tuning Linux for database workloads (off and on)
• Monitoring techs: Nagios, MRTG, Munin, Zabbix, Cacti, Graphite, Prometheus

3. Storage Engines
• MMAPv1
  • Mostly handled by the Linux kernel
• WiredTiger
  • Default as of 3.2
• Percona In-Memory
  • Same metrics as WiredTiger
• RocksDB
• PerconaFT / TokuMX
  • Deprecated
  • Fractal-tree based storage engine

4. Storage Engines?! The New SE API
• Introduced in MongoDB 3.0
• Abstraction layer for storage-level interaction
• Allowed integration of WiredTiger and other engines

5. Storage Engines: MMAPv1
• Default storage engine before 3.2 (now WiredTiger)
• Collection-level locking (a common performance bottleneck)
  • Monitored via Lock Ratio/Percent metrics
• In-place datafile updating (when possible)
• OS-level operations
  • Uses OS-level mmap() to map BSON files on disk <=> memory
  • Uses the OS filesystem cache as its block cache
• Much lower monitoring visibility
  • Database metrics must be gathered at the OS level
  • OS-level metrics are more vague
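
The lock metrics mentioned above can be derived from two `db.serverStatus()` samples. A minimal sketch, assuming the `globalLock.totalTime` and `globalLock.lockTime` fields (both microseconds) exposed by older, MMAPv1-era MongoDB releases; the sample numbers are invented:

```python
# Sketch: derive the MMAPv1 global lock percentage from two
# serverStatus() samples. globalLock.lockTime exists only in older
# MongoDB releases; newer versions drop it.

def lock_percent(prev, curr):
    """Locked time as a percentage of wall time between two samples."""
    total = curr["globalLock"]["totalTime"] - prev["globalLock"]["totalTime"]
    locked = curr["globalLock"]["lockTime"] - prev["globalLock"]["lockTime"]
    if total <= 0:
        return 0.0
    return 100.0 * locked / total

# Two samples taken 60 seconds apart (values in microseconds, made up)
prev = {"globalLock": {"totalTime": 10_000_000, "lockTime": 500_000}}
curr = {"globalLock": {"totalTime": 70_000_000, "lockTime": 9_500_000}}

print(lock_percent(prev, curr))  # 15.0 -> 15% of the interval spent locked
```

Sampling deltas rather than reading the raw counters matters: the counters only ever increase, so the ratio of lifetime totals hides recent lock spikes.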

6. Storage Engines: MMAPv1
• Document read path
  • Try to load from cache
  • If not in cache, load from BSON file on disk
• Document update/write path
  • Try to update the document in place
  • If too big, "move" the document on disk until free space is found

7. Storage Engines: WiredTiger
• New default engine as of 3.2
• Standalone storage engine acquired by MongoDB Inc
• B-tree based under MongoDB (WiredTiger also supports LSM)
• Integrated using the Storage Engine API
• Document-level locking
• Built-in compression
  • Index prefix compression
• MVCC and concurrency limits
• High parallelism / CPU utilisation

8. Storage Engines: WiredTiger
• Document write path
  • Update, delete or write is written to the WT log
  • Changes to data files are performed later by checkpointing
• Document read path
  • Look for the data in the in-heap cache
  • Look for the data in the WT log
  • Go to the data files for the data
    • Kernel looks in the filesystem cache first; uncompress the result if present
    • If not in FS cache, read from disk and uncompress the result
• Tip: switch compression algorithms if CPU usage is too high
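
Because reads fall through the in-heap cache first, cache pressure is the first thing to check. A small sketch computing fill and dirty ratios from the `db.serverStatus().wiredTiger.cache` section; the stat names ("maximum bytes configured", etc.) are real WiredTiger cache statistics, but the sample byte counts are made up:

```python
# Sketch: gauge WiredTiger cache pressure from serverStatus().
# A cache running near 100% full, or with a high dirty ratio, forces
# evictions onto application threads and shows up as latency spikes.

def cache_ratios(cache):
    configured = cache["maximum bytes configured"]
    used = cache["bytes currently in the cache"]
    dirty = cache["tracked dirty bytes in the cache"]
    return used / configured, dirty / configured

cache = {
    "maximum bytes configured": 8 * 1024**3,        # 8 GiB cache (sample)
    "bytes currently in the cache": 6 * 1024**3,    # 6 GiB resident
    "tracked dirty bytes in the cache": 2 * 1024**3,
}

fill, dirty = cache_ratios(cache)
print(f"fill={fill:.0%} dirty={dirty:.0%}")  # fill=75% dirty=25%
```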

9. Storage Engines: RocksDB / MongoRocks
• MongoRocks developed by Facebook
• Tiered/level compaction strategy
  • First layer is called the MemTable
  • N number of on-disk levels
  • Compaction is triggered when any level is full
• In-heap Block Cache (default 30% of RAM)
  • Holds uncompressed data
  • The Block Cache reduces the compression CPU hit
• Kernel-level page cache for compressed data
• Space amplification of the LSM is about +10%
• Optional 'counters': storage.rocksdb.counters

10. Storage Engines: RocksDB / MongoRocks
• Document write path
  • Updates, deletes and writes go to the MemTable and complete
  • Compaction resolves multiple versions of data in the background
• Document read path
  • Look for the data in the MemTable
  • Level 0 through Level N are asked for the data
    • Data is read from the filesystem cache, if present, then uncompressed
    • Otherwise, a bloom filter is used to find the data file, then the data is read and uncompressed

11. Storage Engines: RocksDB / MongoRocks
• Watch for
  • Pending compactions
  • Stalls
    • Indicate the compaction system is overwhelmed, possibly due to I/O
  • Level read latencies
    • If high, disk throughput may be too low
  • Rate of compaction in bytes vs any noticeable slowdown
  • Rate of deletes vs read latency
    • Deletes add expense to reads and compaction

12. Metric Sources: operationProfiling
• Writes slow database operations to a new MongoDB collection for analysis
  • Capped collection "system.profile" in each database, default 100mb
  • The collection is capped, i.e. profile data doesn't last forever
• Support for operationProfiling data in Percona Monitoring and Management is on the current roadmap
• Enable operationProfiling in "slowOp" mode
  • Start with a very high threshold and decrease it in steps
  • Usually 50-100ms is a good threshold
• Enable in mongod.conf:

  operationProfiling:
    slowOpThresholdMs: 100
    mode: slowOp

  Or the command-line way:

  mongod <other-flags> --profile 1 --slowms 100

13. Metric Sources: operationProfiling
• op/ns/query: type, namespace and query of a profile
• keysExamined: # of index keys examined
• docsExamined: # of docs examined to achieve the result
• writeConflicts: # of write conflicts (WCE) encountered during an update
• numYields: # of times the operation yielded for others
• locks: detailed lock statistics

14. Metric Sources: operationProfiling
• nreturned: # of documents returned by the operation
• nmoved: # of documents moved on disk by the operation
• ndeleted/ninserted/nMatched/nModified: self explanatory
• responseLength: the byte-length of the server response
• millis: execution time in milliseconds
• execStats: detailed statistics explaining the query's execution steps
  • SHARDING_FILTER = mongos sharded query
  • COLLSCAN = no index, 35k docs examined(!)
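
The fields above combine naturally: a high docsExamined-to-nreturned ratio is the signature of a COLLSCAN or a badly chosen index. A minimal sketch over system.profile documents (represented here as plain dicts with the real field names; the sample documents and the ratio threshold are invented):

```python
# Sketch: scan profiler output for inefficient queries, i.e. ones
# that examine far more documents than they return.

def inefficient(profile_docs, ratio=100):
    """Yield (namespace, docsExamined, millis) for suspect operations."""
    for doc in profile_docs:
        examined = doc.get("docsExamined", 0)
        returned = doc.get("nreturned", 0) or 1  # avoid divide-by-zero
        if examined / returned >= ratio:
            yield doc["ns"], examined, doc.get("millis")

docs = [
    {"ns": "shop.orders", "docsExamined": 35000, "nreturned": 2, "millis": 180},
    {"ns": "shop.users", "docsExamined": 4, "nreturned": 4, "millis": 1},
]

for ns, examined, ms in inefficient(docs):
    print(f"{ns}: {examined} docs examined in {ms}ms")  # flags shop.orders only
```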

15. Metric Sources: db.serverStatus()
• A function that dumps status info about MongoDB's current state
• Think "SHOW FULL STATUS" + "SHOW ENGINE INNODB STATUS"
• Sections:
  • asserts
  • backgroundFlushing
  • connections
  • dur (durability)
  • extra_info
  • globalLock + locks
  • network
  • opcounters
  • opcountersRepl
  • repl (replication)
  • storageEngine
  • mem (memory)
  • metrics
  • (optional) wiredTiger
  • (optional) rocksdb
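
Most of these sections are ever-increasing counters, so pollers such as PMM's mongodb_exporter sample them on an interval and graph the deltas. A small sketch turning two serverStatus() samples into per-second rates from the real `opcounters` section (sample values invented):

```python
# Sketch: convert two serverStatus() opcounters samples into
# per-second operation rates over the polling interval.

def op_rates(prev, curr, interval_secs):
    return {
        op: (curr["opcounters"][op] - prev["opcounters"][op]) / interval_secs
        for op in curr["opcounters"]
    }

# Two samples taken 60 seconds apart (made-up values)
prev = {"opcounters": {"insert": 1000, "query": 5000, "update": 200}}
curr = {"opcounters": {"insert": 1600, "query": 6200, "update": 260}}

print(op_rates(prev, curr, 60))  # {'insert': 10.0, 'query': 20.0, 'update': 1.0}
```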

16.Metric Sources: db.serverStatus().rocksdb

17.Metric Sources: db.serverStatus().rocksdb

18. Metric Sources: db.serverStatus().wiredTiger
• 'block-manager': disk reads/writes
• 'cache': in-heap page cache
  • Watch evictions of modified vs unmodified pages
• 'cursor': WiredTiger cursor ops/calls
• 'log': WiredTiger log stats

19. Metric Sources: db.serverStatus().wiredTiger
• 'transaction': checkpoint and transaction info
  • Watch max/min/avg checkpoint times
• 'concurrentTransactions': concurrency ticket info (!)
  • Ticket counts can be increased with an engine variable
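
Ticket exhaustion is one of the most common WiredTiger stall causes, so the `concurrentTransactions` section deserves a derived metric. A sketch using the real field names (`out`, `available`, `totalTickets`); the sample numbers are invented:

```python
# Sketch: compute read/write ticket usage from
# serverStatus().wiredTiger.concurrentTransactions. When 'available'
# stays near zero, operations queue waiting for a concurrency ticket.

def ticket_usage(ct):
    """Fraction of concurrency tickets in use, per operation kind."""
    return {kind: ct[kind]["out"] / ct[kind]["totalTickets"]
            for kind in ("read", "write")}

ct = {
    "read": {"out": 120, "available": 8, "totalTickets": 128},
    "write": {"out": 16, "available": 112, "totalTickets": 128},
}

print(ticket_usage(ct))  # {'read': 0.9375, 'write': 0.125} -> reads near the limit
```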

20.Metric Sources: db.serverStatus().wiredTiger

21. Metric Sources: rs.status()
• A function that dumps replication status
• Think "SHOW MASTER STATUS" or "SHOW SLAVE STATUS"
• Contains:
  • Replica set name and term
  • Member status
    • State
    • Optime state
    • Election state
    • Heartbeat state
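
The per-member optimes are what replication-lag graphs are built from: secondary lag is the primary's optime minus each secondary's. A sketch over rs.status()-shaped member documents, using the real `stateStr` and `optimeDate` fields (the hostnames and timestamps are invented):

```python
# Sketch: compute per-secondary replication lag in seconds from
# rs.status() member documents.

from datetime import datetime

def replication_lag(members):
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }

members = [
    {"name": "db1:27017", "stateStr": "PRIMARY",
     "optimeDate": datetime(2017, 1, 1, 12, 0, 10)},
    {"name": "db2:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2017, 1, 1, 12, 0, 8)},
]

print(replication_lag(members))  # {'db2:27017': 2.0} -> two seconds behind
```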

22. Metric Sources: Cluster Metadata
• The "config" database on cluster config servers
• Contains:
  • actionlog (3.0+)
  • changelog
  • databases
  • collections
  • shards
  • chunks
  • settings
  • mongos
  • locks
  • lockpings

23.Metric Sources: db.currentOp() • A function that dumps status info about running operations and various lock/execution details

24. Metric Sources: Log Files
• Interesting details are logged to the mongod/mongos log files
  • Slow queries
  • Storage engine details (sometimes)
  • Index operations
  • Chunk moves
  • Connections

25. Monitoring: Percona PMM
• Open-source monitoring from Percona!
• Based on open-source technology
• Simple deployment
• Examples in this demo are from PMM
• 800+ metrics per ping

26. Monitoring: Prometheus + Grafana
• Percona-Lab on GitHub
  • grafana_mongodb_dashboards for Grafana
  • prometheus_mongodb_exporter for Prometheus
    • Sources:
      • db.serverStatus()
      • rs.status()
      • sh.status()
      • Config-server metadata
      • Others, with more coming soon
    • Supports MMAPv1, WT and RocksDB
  • node_exporter for Prometheus
    • OS-level (mostly Linux) exporter

27.Monitoring: Prometheus + Grafana

28. Usual Performance Suspects
• Locking
  • Collection-level locks
  • Document-level locks
  • Software mutex/semaphore
• Limits
  • Max connections
  • Operation rate limits
  • Resource limits
• Resources
  • Lack of IOPS, RAM, CPU, network, etc.

29. MongoDB Resources and Consumers
• Memory
• CPU
  • System CPU
    • FS cache
    • Networking
    • Disk I/O
    • Threading
  • User CPU (MongoDB)
    • Compression (WiredTiger and RocksDB)
    • Session management
    • BSON (de)serialisation
    • Filtering / scanning / sorting
    • Optimiser
• Disk
  • Data file reads/writes
  • Journaling
  • Error logging
• Network
  • Query request/response
  • Replication