Prometheus, onto being boring
1. Prometheus: Onto being boring. Goutham Veeramachaneni (putadent, gouthamve)
2. Who am I? putadent, gouthamve
4. Prometheus: The monitoring system
● Started in 2012 at SoundCloud
● Inspired by Google’s monitoring tools
● First blog post out in Jan 2015
● CNCF Project
  ○ Second graduated project after Kubernetes
● Used by thousands of companies, big and small!
9. Prometheus: A little history
● First public blog post in Jan 2015
● CNCF project and 1.0 release in May-July 2016
● 2.0 released on Nov 8, 2017
● A completely rewritten storage engine
● 3-5x improvements in CPU, RAM and queries
● Broke everything we wanted to break
● Laid the foundation for everything we wanted to achieve
10. Our focus in 2.x: Make it boring
● Boring?
12. Boring software
● Rock solid
● Easy to understand
● Release notes:
  ○ Performance improvements :)
  ○ It’s faster :D
● No surprises
13. Why?
● 2.0 was exciting :D
● Prometheus is now everywhere
● Our releases were just good, not great
● Enterprise ready?
14. 2.0: The storage rewrite
● 1.0: A single index with a file for each series
  ○ Bloated index and millions of files
● 2.0: Block-based with compactions
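To make "block-based with compactions" concrete, here is a minimal sketch, not Prometheus's actual implementation. The `Block` class and `compact` function are hypothetical names introduced for illustration, and the sketch assumes the input blocks cover non-overlapping time ranges.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical time-bounded block holding samples for many series."""
    min_t: int
    max_t: int
    # series name -> list of (timestamp, value), sorted by timestamp
    series: dict = field(default_factory=dict)


def compact(blocks):
    """Merge non-overlapping blocks into one block covering their full range."""
    ordered = sorted(blocks, key=lambda b: b.min_t)
    merged = Block(min_t=ordered[0].min_t, max_t=ordered[-1].max_t)
    for block in ordered:
        for name, samples in block.series.items():
            # Blocks are visited in time order, so appending keeps samples sorted.
            merged.series.setdefault(name, []).extend(samples)
    return merged


b1 = Block(0, 100, {"up{pod='a'}": [(10, 1.0), (50, 1.0)]})
b2 = Block(100, 200, {"up{pod='a'}": [(120, 1.0)], "up{pod='b'}": [(150, 1.0)]})
big = compact([b1, b2])
print(big.min_t, big.max_t, sorted(big.series))
```

The point of the sketch is that the number of storage units depends on the number of time ranges, not on the number of series, which is the contrast with one file per series.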
15. Timeseries (figure; axes: series, time)
16-18. Timeseries Query Patterns (figure; axes: series, time)
19. Timeseries: 1 file per series, holding compressed chunks (figure; axes: series, time)
20. Modern Era
● Kubernetes, Docker Swarm
● Super dynamic environments
● New IP for every update, scale up and down as you want
21. Timeseries churn (figure; axes: series, time)
22. Problem: Too Many Files
23. Problem: An index that bloats with time!
● A single index which resolves to the relevant files.
● Which means the index gets bigger with time.
● 5 million active series
● 150 million total timeseries
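A rough back-of-the-envelope sketch of why the index bloats, using the two numbers from the slide. The function `index_entries_after` and the "every rollout replaces all active series" model are assumptions made for illustration, not a description of any real deployment.

```python
def index_entries_after(rollouts, active_series):
    """Series a never-pruned index must track after `rollouts` full redeploys.

    Assumes, for illustration only, that every rollout replaces all active
    series with new ones (for example because each instance gets a new IP).
    """
    return active_series * (rollouts + 1)


ACTIVE = 5_000_000    # active series, from the slide
TOTAL = 150_000_000   # total series, from the slide

print(TOTAL // ACTIVE)  # 30: the index tracks 30x more series than are active
print(index_entries_after(rollouts=29, active_series=ACTIVE))  # 150000000
```

Under this simplified model, 29 full rollouts are enough to go from 5 million active series to 150 million total series in a single index.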
24-25. 2.0 Storage (figure; axes: series, time)
26. 2.0 Storage (figure: time axis marked t0, t1, t2, t3, now; labels: mutable, write, prometheus, query, merge)
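The diagram's labels (mutable, write, query, merge) suggest writes going to one mutable block while queries merge results from every block that overlaps the requested range. A minimal sketch of that idea follows. `TimePartitionedStore`, its methods, and the fixed `block_span` are hypothetical, and the sketch assumes samples arrive in time order; it is not Prometheus's actual storage code.

```python
class TimePartitionedStore:
    """Hypothetical store: immutable past blocks plus one mutable head block."""

    def __init__(self, block_span):
        self.block_span = block_span
        self.blocks = []      # immutable: (min_t, max_t, {series: [(t, v)]})
        self.head = {}        # mutable: {series: [(t, v)]}
        self.head_min_t = 0

    def write(self, series, t, v):
        # Cut the head into an immutable block once a sample falls past its span.
        while t >= self.head_min_t + self.block_span:
            self.blocks.append(
                (self.head_min_t, self.head_min_t + self.block_span, self.head)
            )
            self.head = {}
            self.head_min_t += self.block_span
        self.head.setdefault(series, []).append((t, v))

    def query(self, series, start, end):
        """Return samples with start <= t < end, merged across blocks."""
        head_block = (self.head_min_t, self.head_min_t + self.block_span, self.head)
        out = []
        for min_t, max_t, data in self.blocks + [head_block]:
            if max_t <= start or min_t >= end:
                continue  # block lies entirely outside the query range
            out.extend(s for s in data.get(series, []) if start <= s[0] < end)
        return out


store = TimePartitionedStore(block_span=100)
for t in (10, 90, 110, 250):
    store.write("up", t, 1.0)
print(len(store.blocks))
print(store.query("up", 50, 200))
```

A query for a narrow time range only touches the blocks that overlap it, so series that stopped existing long ago cost nothing at query time.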
27. RAM (GB): 3-5x better (chart; series: 1.5 Queried, 1.5 Unqueried, 2.0 Queried, 2.0 Unqueried; value labels: 15GB, 5GB)
28. CPU (Cores): 5-10x better (chart; series: 1.5 Queried, 1.5 Unqueried, 2.0 Queried, 2.0 Unqueried; value labels: 6 cores, 1 core)
29. IO Write (MB): 25+x better (chart; series: 1.5 Queried, 1.5 Unqueried, 2.0 (both); value labels: 80MB, 20MB)