- 微博 QQ QQ空间 贴吧
Automated Database Monitoring at Uber With M3 and Prometheus
1 .Scaling Automated Database Monitoring at Uber … with M3 and Prometheus Richard Artoul
2 .Agenda 01 Automated database monitoring 02 Why scaling automated monitoring is hard 03 M3 architecture and why it scales so well 04 How you can use M3
3 .Uber’s “Architecture” 4000+ microservices - 2015 2019 Most of which directly or indirectly depend on storage 22+ storage technologies - Ranging from C* to MySQL 1000’s of dedicated servers running databases - Monitoring all of these is hard
4 .Monitoring Databases Application Technology Application Hardware
5 .Hardware Level Metrics
6 .Technology Level Metrics E E
7 .Application Level Metrics ● Number of successes, errors, and latency broken down by ○ All queries against a given database ○ All queries issue by a specific service ○ A specific query ■ SELECT * FROM TABLE WHERE USER_ID = ?
9 .What’s so hard about that?
10 .Monitoring Applications at Scale 1200 Microservices w/ dedicated storage 100 Instances per service 20 Instances per DB cluster 20 Queries per service X 10 Metrics per query 480 100+million millionunique time series dollars!
11 .Workload 800M 110M Datapoints emitted pre- Writes per second/s aggregation (post replication) 200B 9B Datapoints read per second Unique Metric IDs
12 .How do we do it?
13 .A Brief History of M3 ● 2014-2015: Graphite + WhisperDB ○ No replication, operations were ‘cumbersome’ ● 2015-2016: Cassandra ○ Solved operational issues ○ 16x YoY Growth ○ Expensive (> 1500 Cassandra Hosts) ○ Compactions => R.F=2 ● 2016-Today: M3DB
14 .M3DB Overview
15 .M3DB An open source distributed time series database ● Store arbitrary timestamp precision data points at any resolution for any retention ● Tag (key/value pair) based inverted index ● Optimized file-system storage with minimal need for compactions of time series data for real-time workloads
16 .High-Level Architecture Like a Log Structured Merge Tree (LSM) Except a typical LSM has levelled or size based compaction M3DB has time window compaction
17 .Topology and Consistency ● Strong consistent topology (using etcd) ○ No gossip ○ Replicated with zone/rack aware layout and configurable replication factor ● Consistency managed via synchronous quorum writes and reads ○ Configurable consistency level for both read and write
18 .Cost Savings and Performance ● Disk Usage in 2017 ○ ~1.4PB for Cassandra at R.F=2 ○ ~200TB for M3DB at R.F=3 ● Much higher throughput per node with M3DB ○ Hundreds of thousands of writes/s on commodity hardware
19 .But what about the index?
20 .Centralized Elasticsearch Index ● Actual time series data was stored in Cassandra and the M3DB ● But indexing of data (for querying) was handled by Elasticsearch ● Worked for us for a long time ● … but scaling writes and reads required a lot of engineering
21 .Elasticsearch Index - Write Path M3DB Influx of new metrics: “Don’t write cache” 1. Large Service Indexer Deployment Redis 2. Datacenter Failover m3agg Cache E.S
22 .Elasticsearch Index - Read Path 1 E.S Query 2 M3DB
23 .Elasticsearch Index - Read Path Query Redis Query Cache E.S Need high T.T.L to prevent overwhelming E.S … but high T.T.L means long delay for new time series to become queryable
24 .Elasticsearch Index - Read Path Short TTL Redis Query Cache E.S Short Merge on Query read Redis Query Cache E.S Long Long TTL
25 .Elasticsearch Index - Final Breakdown ● Two elasticsearch clusters with separate configuration ● Two query caches ● Two don’t-write caches ● A stateful indexing tier that requires consistent hashing, an in-memory cache, and breaks everything if you deploy it too quickly ● Another service just for automatically rotating the short-term elasticsearch cluster indexes ● A background process that’s always running and trying to delete stale documents from the long term index
26 .M3 Inverted Index ● Not nearly as expressive or feature-rich as Elasticsearch ● … but like M3DB, it’s designed for high throughput ● Temporal by nature (T.T.Ls are cheap) ● Fewer moving parts
27 .M3 Inverted Index ● Primary use-case, support queries in the form: ○ service = “foo” AND ○ endpoint = “bar” AND ○ client_version regexp matches “3.*” service=”foo” endpoint=”bar” client_version=”3.*” AND AND AND → Intersection client_version=”3.1” OR → Union OR client_version=”3.2” OR
28 .M3 Inverted Index - F.S.Ts ● The inverted index is similar to Lucene in that it uses F.S.T segments to build an efficient and compressed structure for fast regexp. ● Each time series’ label has its own F.S.T that can be searched to find the offset to unpack another data structure that contains the set of metrics IDs associated with a particular label value (postings list) Encoded Relationships are --> 4 ate --> 2 see --> 3 Compressed + fast regexp!
29 .M3 Inverted Index - Postings List ● For every term combination in the form of service=”foo” need to store a set of metric IDs (integers) that match, this is called a postings list. ● Index is broken into blocks and segments, so need to be able to calculate intersection (AND - across terms) and union (OR - within a term). 12P.M -> 2P.M Index Block Intersect → AND service=”foo” INTERSECT endpoint=”bar” INTERSECT client_version=”3.*” Union → OR client_version=”3.1” Union Primary data structure for the postings list in M3DB is the roaring bitmap client_version=”3.2” Union