Performance Analysis and Troubleshooting Methodologies for Databases



1. Performance Analysis and Troubleshooting Methodologies for Databases Peter Zaitsev, CEO February 3nd, 2018 FOSDEM Monitoring and Cloud devroom © 2018 Percona. 1

2.Databases and Performance Databases are frequent Performance Trouble Makers © 2018 Percona. 2

3.Why Databases are Painful ? Generally Non-Linear Scalability Complex Often Poorly understood by developers © 2018 Percona. 3

4.Performance Work with Databases Troubleshooting Capacity Planning Cost and Efficiency Optimization Change Management © 2018 Percona. 4

5.Points of View BlackBox – WhiteBox – “Application “DBA, Ops” Developer” © 2018 Percona. 5

6.Developer Point of View Database as a Blackbox I throw queries at it and it responds DBaaS bring this “promise” to OPS too © 2018 Percona. 6

7.BlackBox Success Criteria for Databases Availability Response Time Correctness Cost © 2018 Percona. 7

8.Ops Point of View Load Resource Utilization System/Hardware Problems Scaling/Capacity Planing © 2018 Percona. 8

9.Methodologies for Performance Troubleshooting and Analyses © 2018 Percona. 9

10.Typical Default Troubleshooting by Random Googling © 2018 Percona. 10

11.Problems with Typical Approach Hard to Assure Outcome Hard to Train People Hard to Automate © 2018 Percona. 11

12.Methodologies Save the Day USE (Utilization, Golden Signals (Latency RED (Rate, Errors(Rate), Saturation, Errors) - Traffic - Errors - Duration) Method Tom Method by Brendan Saturations) Method by Wilkie Gregg Rob Ewaschuk © 2018 Percona. 12

13.USE Method © 2018 Percona. 13

14.USE Method Basics Developed to Troubleshoot Server Performance Issues Resolve 80% of problems with 5% of Effort Operating System Specific Checklists Available © 2018 Percona. 14

15.USE Method in One Sentence “For every resource, check utilization, saturation, and errors.” © 2018 Percona. 15

16.USE Method Terminology Defitinions Resource • all physical server functional components (CPUs, disks, busses, ...) Utilization • the average time that the resource was busy servicing work Saturation • the degree to which the resource has extra work which it can't service, often queued Errors • the count of error events © 2018 Percona. 16

17.USE Method Resources CPUs: sockets, cores, hardware threads (virtual CPUs) Memory: capacity Network interfaces Storage devices: I/O, capacity Controllers: storage, network cards Interconnects: CPUs, memory, I/O © 2018 Percona. 17

18.USE Method with Software Same Basic Resources Apply Additional Software Resources Apply Mutex Locks File Descriptors Connections © 2018 Percona. 18

19.USE Method Benefits Proven Track Record Broad Applicability Detailed Checklists Available © 2018 Percona. 19

20.USE Method Drawbacks Requires Good Understanding of System Architecture Requires Access to Low Level Resources Monitoring Hard to apply in Service “Blackbox” environments © 2018 Percona. 20

21.RED Method © 2018 Percona. 21

22.RED Method Focus Microservices “Cattle not Pets” Mapping to Resources can be fluid © 2018 Percona. 22

23.RED Method For every Service Check •Rate Request check these •Error (Rate) are within •Duration (Distribution) SLO © 2018 Percona. 23

24.RED Method for Databases Looking at Service Level Looking at Individual Database Servers Can be applied to Components/Resources Can be applied to individual Types of Queries © 2018 Percona. 24

25.RED Method Benefits Easily maps to what Developers Care About Does not require as deep understanding of architecture Does not need access to low lever resource monitoring © 2018 Percona. 25

26.RED Method Drawbacks Does not have as More focused on much tools and Answering WHAT checklists support rather WHY yet © 2018 Percona. 26

27.Four Golden Signals © 2018 Percona. 27

28.Focus Monitoring Distributed Systems from SRE Book To Be used for Alerting, Troubleshooting, Trend Analyses © 2018 Percona. 28

29.Four Golden Signals Latency • Distribution not just Average; Latency for Successful requests vs Errors Traffic • How much Demand is being placed on the System Errors • Error Codes are Easy; Bad Content is hard Saturation • How Full your system “capacity”. Forecast when Possible. © 2018 Percona. 29