Prometheus, onto being boring



1.Prometheus: Onto being boring ● Goutham Veeramachaneni (putadent, gouthamve)

2.Who am I? putadent gouthamve


4.Prometheus: The monitoring system
● Started in 2012 at SoundCloud
● Inspired by Google’s monitoring tools
● First blog post out in Jan 2015
● CNCF project
  ○ Second graduated project after Kubernetes
● Used by thousands of companies, big and small!

5.Prometheus: Architecture


9.Prometheus: A little history
● First public blog post in Jan 2015
● CNCF project and 1.0 release in May-July 2016
● 2.0 released on Nov 8, 2017
● A completely rewritten storage engine
● 3-5x improvements in CPU, RAM and queries
● Broke everything we wanted to break
● Laid the foundation for everything we wanted to achieve

10.Our focus in 2.x: Make it boring ● Boring?


12.Boring software
● Rock solid
● Easy to understand
● Release notes:
  ○ Performance improvements :)
  ○ It’s faster :D
● No surprises

13.Why?
● 2.0 was exciting :D
● Prometheus is now everywhere
● Our releases were just good, not great
● Enterprise ready?

14.2.0: The storage rewrite
● 1.0: A single index with a file for each series
  ○ Bloated index and millions of files
● 2.0: Block-based with compactions
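
To make "block-based with compactions" concrete, here is a minimal sketch of the idea: samples are grouped into fixed time windows, and adjacent windows are later merged into larger blocks. The 2-hour window, the merge factor of 3, and all function names are assumptions for illustration, not the actual Prometheus implementation.

```python
from collections import defaultdict

BLOCK_MS = 2 * 60 * 60 * 1000  # assumed block width: 2 hours, in milliseconds


def block_start(ts_ms, width=BLOCK_MS):
    """Return the start of the time window that a timestamp falls into."""
    return ts_ms - (ts_ms % width)


def partition(samples, width=BLOCK_MS):
    """Group (series, ts_ms, value) samples into blocks keyed by window start."""
    blocks = defaultdict(list)
    for series, ts_ms, value in samples:
        blocks[block_start(ts_ms, width)].append((series, ts_ms, value))
    return dict(blocks)


def compact(blocks, factor=3, width=BLOCK_MS):
    """Merge every `factor` adjacent windows into one larger block."""
    merged = defaultdict(list)
    for start, rows in blocks.items():
        merged[block_start(start, width * factor)].extend(rows)
    # Keep rows ordered by series, then by time, inside each merged block.
    return {s: sorted(rows, key=lambda r: (r[0], r[1])) for s, rows in merged.items()}


hour = 60 * 60 * 1000
samples = [("up{job='api'}", h * hour, 1.0) for h in range(12)]  # one sample per hour
blocks = partition(samples)
compacted = compact(blocks)
print(len(blocks), "blocks before compaction,", len(compacted), "after")
```

In this sketch, twelve hours of data land in six 2-hour blocks, which compaction folds into two 6-hour blocks. The number of files then depends on elapsed time rather than on the number of series.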

15.Timeseries [Figure: axes labeled series and time]

16.Timeseries Query Patterns [Figure, shown across slides 16-18: axes labeled series and time]

19.Timeseries ● 1 file/series ● Holding compressed chunks [Figure: axes labeled series and time]
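
The one-file-per-series layout can be sketched as follows. The hashing scheme, directory layout, file extension, and function names are hypothetical and chosen only for illustration. The chunk bytes are placeholders, not a real compression format.

```python
import hashlib
import os
import tempfile


def series_path(root, labels):
    """Map a label set to one file path via a hash of its sorted labels."""
    key = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    fingerprint = hashlib.sha1(key.encode()).hexdigest()[:16]
    return os.path.join(root, fingerprint[:2], fingerprint[2:] + ".db")


def append_chunk(root, labels, chunk):
    """Append one already-compressed chunk (bytes) to the series' own file."""
    path = series_path(root, labels)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "ab") as f:
        f.write(chunk)
    return path


def count_files(root):
    """Count every file below root."""
    return sum(len(files) for _, _, files in os.walk(root))


with tempfile.TemporaryDirectory() as root:
    for pod in range(100):
        labels = {"__name__": "http_requests_total", "pod": f"api-{pod}"}
        append_chunk(root, labels, b"\x00" * 8)  # placeholder chunk bytes
        append_chunk(root, labels, b"\x00" * 8)  # same series, so same file
    file_count = count_files(root)

print(file_count, "files for 100 series")
```

Appending to an existing series is cheap, but every new series costs a new file, so the file count tracks the number of series ever seen.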

20.Modern Era
● Kubernetes, Docker Swarm
● Super dynamic environments
● New IP for every update, scale up and down as you want

21.Timeseries churn [Figure: axes labeled series and time]
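
Series churn can be illustrated with a small simulation. It assumes every deploy replaces all pods, and that each new pod identity produces a fresh set of series. The replica, metric, and deploy counts are made-up inputs.

```python
def simulate_churn(replicas, metrics_per_pod, deploys):
    """Count active vs. total series when every deploy replaces all pods."""
    seen = set()
    active = set()
    for generation in range(deploys + 1):  # generation 0 is the initial rollout
        # New pods get new identities, so none of the old series are reused.
        active = {
            (f"metric_{m}", f"pod-{generation}-{r}")
            for r in range(replicas)
            for m in range(metrics_per_pod)
        }
        seen |= active
    return len(active), len(seen)


active, total = simulate_churn(replicas=10, metrics_per_pod=50, deploys=29)
print(active, "active series,", total, "total series")
```

The active set stays at 500 series while the total climbs to 15,000. With one file per series, storage has to carry the total, not the active set.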

22.Problem: Too Many Files

23.Problem: An index that bloats with time!
● A single index resolves every query to the relevant files
● Series that no longer exist stay in it, so the index keeps growing with time
● 5 million active series
● 150 million total timeseries
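
A toy inverted index shows why a single, global index bloats under churn. The class and label names are hypothetical. The 30 generations of 5 replicas are chosen only so that the ratio matches the slide's 150 million total to 5 million active.

```python
from collections import defaultdict


class GlobalIndex:
    """One inverted index for all time: (label, value) -> set of series ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, series_id, labels):
        for pair in labels.items():
            self.postings[pair].add(series_id)

    def lookup(self, name, value):
        return self.postings.get((name, value), set())


index = GlobalIndex()
live = set()
series_id = 0
for generation in range(30):  # 30 generations of pods
    live = set()  # only the newest generation is still being scraped
    for replica in range(5):
        index.add(series_id, {"job": "api", "pod": f"api-{generation}-{replica}"})
        live.add(series_id)
        series_id += 1

candidates = index.lookup("job", "api")
print(len(candidates), "candidates,", len(live), "still live")
```

A lookup for job="api" returns 150 candidate series even though only 5 are live, so a query over recent data pays for the whole history.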

24.2.0 Storage [Figure, shown across slides 24-25: axes labeled series and time]

26. 2.0 Storage [Diagram: blocks along a time axis (t0, t1, t2, t3, now); labels: mutable, write, prometheus, query, merge]
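
One reading of the diagram's labels is that only the newest block is mutable and receives writes, while a query reads the blocks that overlap its time range and merges the results. The sketch below follows that reading. The class names, the half-open time ranges, and the integer timestamps are assumptions for illustration.

```python
class Block:
    """An immutable block covering the half-open time range [mint, maxt)."""

    def __init__(self, mint, maxt, samples):
        self.mint, self.maxt = mint, maxt
        self._samples = tuple(sorted(samples))  # (ts, value), frozen at creation

    def overlaps(self, start, end):
        return self.mint < end and start < self.maxt

    def select(self, start, end):
        return [(t, v) for t, v in self._samples if start <= t < end]


class Head:
    """The only mutable part: receives all new writes."""

    def __init__(self, mint):
        self.mint = mint
        self._samples = []

    def append(self, ts, value):
        self._samples.append((ts, value))

    def select(self, start, end):
        return [(t, v) for t, v in self._samples if start <= t < end]


def query(blocks, head, start, end):
    """Read only the blocks overlapping [start, end), then merge with the head."""
    touched = [b for b in blocks if b.overlaps(start, end)]
    result = [s for b in touched for s in b.select(start, end)]
    if end > head.mint:
        result += head.select(start, end)
    return sorted(result), len(touched)


blocks = [Block(t, t + 10, [(t + i, float(t + i)) for i in range(10)]) for t in (0, 10, 20)]
head = Head(mint=30)
for ts in range(30, 35):
    head.append(ts, float(ts))

samples, touched = query(blocks, head, start=25, end=33)
print(touched, "block touched; timestamps:", [t for t, _ in samples])
```

The query for the range 25 to 33 touches one of the three immutable blocks plus the head; the two older blocks are never opened.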

27. RAM (GB): 3-5x better [Chart: 1.5 Queried, 1.5 Unqueried, 2.0 Queried, 2.0 Unqueried; marked values: 15GB, 5GB]

28. CPU (Cores): 5-10x better [Chart: 1.5 Queried, 1.5 Unqueried, 2.0 Queried, 2.0 Unqueried; marked values: 6 cores, 1 core]

29. IO Write (MB): 25+x better [Chart: 1.5 Queried, 1.5 Unqueried, 2.0 (both); marked values: 80MB, 20MB]