Performance Analysis and Troubleshooting Methodologies for Databases
数据库的性能分析和故障排除方法
展开查看详情
1. Performance Analysis and Troubleshooting Methodologies for Databases Peter Zaitsev, CEO February 3nd, 2018 FOSDEM Monitoring and Cloud devroom © 2018 Percona. 1
2.Databases and Performance Databases are frequent Performance Trouble Makers © 2018 Percona. 2
3.Why Databases are Painful ? Generally Non-Linear Scalability Complex Often Poorly understood by developers © 2018 Percona. 3
4.Performance Work with Databases Troubleshooting Capacity Planning Cost and Efficiency Optimization Change Management © 2018 Percona. 4
5.Points of View BlackBox – WhiteBox – “Application “DBA, Ops” Developer” © 2018 Percona. 5
6.Developer Point of View Database as a Blackbox I throw queries at it and it responds DBaaS bring this “promise” to OPS too © 2018 Percona. 6
7.BlackBox Success Criteria for Databases Availability Response Time Correctness Cost © 2018 Percona. 7
8.Ops Point of View Load Resource Utilization System/Hardware Problems Scaling/Capacity Planing © 2018 Percona. 8
9.Methodologies for Performance Troubleshooting and Analyses © 2018 Percona. 9
10.Typical Default Troubleshooting by Random Googling © 2018 Percona. 10
11.Problems with Typical Approach Hard to Assure Outcome Hard to Train People Hard to Automate © 2018 Percona. 11
12.Methodologies Save the Day USE (Utilization, Golden Signals (Latency RED (Rate, Errors(Rate), Saturation, Errors) - Traffic - Errors - Duration) Method Tom Method by Brendan Saturations) Method by Wilkie Gregg Rob Ewaschuk © 2018 Percona. 12
13.USE Method © 2018 Percona. 13
14.USE Method Basics Developed to Troubleshoot Server Performance Issues Resolve 80% of problems with 5% of Effort Operating System Specific Checklists Available © 2018 Percona. 14
15.USE Method in One Sentence “For every resource, check utilization, saturation, and errors.” © 2018 Percona. 15
16.USE Method Terminology Defitinions Resource • all physical server functional components (CPUs, disks, busses, ...) Utilization • the average time that the resource was busy servicing work Saturation • the degree to which the resource has extra work which it can't service, often queued Errors • the count of error events © 2018 Percona. 16
17.USE Method Resources CPUs: sockets, cores, hardware threads (virtual CPUs) Memory: capacity Network interfaces Storage devices: I/O, capacity Controllers: storage, network cards Interconnects: CPUs, memory, I/O © 2018 Percona. 17
18.USE Method with Software Same Basic Resources Apply Additional Software Resources Apply Mutex Locks File Descriptors Connections © 2018 Percona. 18
19.USE Method Benefits Proven Track Record Broad Applicability Detailed Checklists Available © 2018 Percona. 19
20.USE Method Drawbacks Requires Good Understanding of System Architecture Requires Access to Low Level Resources Monitoring Hard to apply in Service “Blackbox” environments © 2018 Percona. 20
21.RED Method © 2018 Percona. 21
22.RED Method Focus Microservices “Cattle not Pets” Mapping to Resources can be fluid © 2018 Percona. 22
23.RED Method For every Service Check •Rate Request check these •Error (Rate) are within •Duration (Distribution) SLO © 2018 Percona. 23
24.RED Method for Databases Looking at Service Level Looking at Individual Database Servers Can be applied to Components/Resources Can be applied to individual Types of Queries © 2018 Percona. 24
25.RED Method Benefits Easily maps to what Developers Care About Does not require as deep understanding of architecture Does not need access to low lever resource monitoring © 2018 Percona. 25
26.RED Method Drawbacks Does not have as More focused on much tools and Answering WHAT checklists support rather WHY yet © 2018 Percona. 26
27.Four Golden Signals © 2018 Percona. 27
28.Focus Monitoring Distributed Systems from SRE Book To Be used for Alerting, Troubleshooting, Trend Analyses © 2018 Percona. 28
29.Four Golden Signals Latency • Distribution not just Average; Latency for Successful requests vs Errors Traffic • How much Demand is being placed on the System Errors • Error Codes are Easy; Bad Content is hard Saturation • How Full your system “capacity”. Forecast when Possible. © 2018 Percona. 29