PingCAP-Infra-Meetup-105-Chaos practice in TiDB
1.Chaos practice in TiDB PingCAP 舒科 June 1, 2019
2.Testing in TiDB ● Unit Testing ● Integration Testing ● Performance Testing ● Schrodinger Testing ○ Chaos Testing
3.Contents ● Why Chaos ● Practice in TiDB ● Schrodinger
4.Why Chaos ● Use fault injection ● by Netflix, 2010 ○ Break things ● Why 2010? ○ Netflix moved to AWS ○ lots of errors ■ hardware ■ network latency ■ ...
5.Why Chaos (cont.) ● Goal: ○ Make the system stronger ● Steps
6.Why Chaos (cont.) ● Samples ○ Chaos in EMC ■ Robot to remove hard disks in BMW POC ○ Chaos in Facebook ■ shut down a data center ● Lack of Chaos ○ 737 MAX
7.Why Chaos (cont.) ● Microservices ○ Too complex to understand ● Errors always happen ● Do Chaos to gain confidence
8.Why Chaos (cont.)
9.Why Chaos (cont.) ● etcd bug ● RocksDB bug ● Leader partitioned ● Transfer leader if busy ● Too many regions ● Crashed when processing batch raft ● ...
10.● Why Chaos ● Practice in TiDB ● Schrodinger
11.Chaos practice in TiDB
12.Chaos practice in TiDB (cont.) ● Region heartbeats ○ Check what happens with a huge number of regions on one machine ○ Choose metric: CPU ○ Hypothesis: ■ CPU is still low ○ Experiment ■ 40k regions on a machine ○ What happened? ■ OOM ■ 30% CPU occupied
13.Chaos practice in TiDB (cont.) ● Choose Metrics ○ often QPS ○ CPU ○ memory ● Hypothesis ○ QPS reverts to its previous level within X seconds ○ QPS drop of 1/x
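The QPS hypotheses on this slide can be checked mechanically once QPS samples are collected around a fault. The sketch below is illustrative only: the function names, the 10% tolerance, the one-sample-per-second interval, and the reading of "QPS drop 1/x" as a maximum allowed drop fraction are all assumptions, not part of the talk.

```python
from typing import Optional, Sequence


def recovery_seconds(samples: Sequence[float], fault_index: int, baseline: float,
                     tolerance: float = 0.1, interval_s: float = 1.0) -> Optional[float]:
    """Seconds from fault injection until QPS stays within `tolerance` of
    `baseline` for the rest of the window; None if it never recovers."""
    after = samples[fault_index:]
    if not after:
        raise ValueError("no samples after the fault")
    threshold = baseline * (1.0 - tolerance)
    last_bad = -1  # index (relative to the fault) of the last degraded sample
    for i, qps in enumerate(after):
        if qps < threshold:
            last_bad = i
    if last_bad == len(after) - 1:
        return None  # still degraded at the end of the observation window
    return (last_bad + 1) * interval_s


def hypothesis_holds(samples: Sequence[float], fault_index: int, baseline: float,
                     max_recovery_s: float, max_drop_fraction: float) -> bool:
    """True if QPS recovered within `max_recovery_s` seconds and never fell
    by more than `max_drop_fraction` of the baseline (hypothetical criteria)."""
    recovered_in = recovery_seconds(samples, fault_index, baseline)
    worst = min(samples[fault_index:])
    drop = (baseline - worst) / baseline
    return (recovered_in is not None
            and recovered_in <= max_recovery_s
            and drop <= max_drop_fraction)
```

For example, with one sample per second and a fault injected at index 2, `hypothesis_holds([1000, 1000, 400, 500, 800, 950, 1000], 2, 1000, max_recovery_s=5, max_drop_fraction=0.5)` fails because QPS fell by 60%, even though it recovered in 3 seconds.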
14.Chaos practice in TiDB (cont.) Error injection
15.Chaos practice in TiDB (cont.) ● Applications ○ kill, kill -9 ○ renice ○ SIGSTOP, SIGCONT
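The process-level faults listed here map directly onto POSIX signals and scheduling priority. A minimal sketch using only the Python standard library follows; the helper names are hypothetical, it assumes a Unix host, and it is not how the TiDB tooling itself is implemented.

```python
import os
import signal


def terminate(pid: int) -> None:
    """Equivalent of a plain `kill`: ask the process to exit (SIGTERM)."""
    os.kill(pid, signal.SIGTERM)


def kill_hard(pid: int) -> None:
    """Equivalent of `kill -9`: the process cannot catch or ignore SIGKILL."""
    os.kill(pid, signal.SIGKILL)


def pause(pid: int) -> None:
    """Freeze the process with SIGSTOP, so it looks hung to its peers."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a paused process continue with SIGCONT."""
    os.kill(pid, signal.SIGCONT)


def deprioritize(pid: int, niceness: int = 19) -> None:
    """Equivalent of `renice`: a higher niceness means less CPU time."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
```

Pausing and resuming is often the more interesting case, since the process keeps its connections open but stops responding, which exercises timeout handling rather than crash recovery.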
16.Chaos practice in TiDB (cont.) ● Memory ○ cgroup ● Storage ○ FUSE ○ rm -rf ● Network ○ tc ○ iptables
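For the network items, `tc` with the netem qdisc can add latency and `iptables` can drop traffic from a peer to simulate a partition. The sketch below only builds the command lines; the helper names, the device name, and the peer address in the usage note are placeholders, and actually running the commands (via `apply`) requires root on a Linux host.

```python
import subprocess
from typing import List


def netem_delay(dev: str, delay_ms: int) -> List[str]:
    """Command that adds a fixed delay to all egress traffic on `dev`."""
    return ["tc", "qdisc", "add", "dev", dev, "root", "netem",
            "delay", f"{delay_ms}ms"]


def netem_clear(dev: str) -> List[str]:
    """Command that removes the root qdisc added by `netem_delay`."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]


def drop_from(peer_ip: str) -> List[str]:
    """Command that drops all inbound packets from `peer_ip`."""
    return ["iptables", "-A", "INPUT", "-s", peer_ip, "-j", "DROP"]


def undrop_from(peer_ip: str) -> List[str]:
    """Command that deletes the rule added by `drop_from`."""
    return ["iptables", "-D", "INPUT", "-s", peer_ip, "-j", "DROP"]


def apply(cmd: List[str]) -> None:
    """Run one of the commands above; raises if it exits non-zero."""
    subprocess.run(cmd, check=True)
```

A typical round would call `apply(netem_delay("eth0", 100))`, observe the chosen metric, then call `apply(netem_clear("eth0"))`. Keeping each fault paired with its undo command makes cleanup after an experiment straightforward.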
17.Chaos practice in TiDB (cont.) ● Other errors ○ etcd key deleted ○ NTP errors ○ ...
18.Chaos practice in TiDB (cont.) ● Observe results ○ Learn from history
19.Chaos practice in TiDB (cont.) ● Observe results ○ Learn from log
20.Chaos practice in TiDB (cont.) ● Automation ○ Take some machines from SRE ○ Deploy ○ Experiment ○ Debug ○ Return
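The automation steps above (take machines, deploy, experiment, debug, return) can be sketched as a small pipeline in which the machines are always returned, even when the experiment raises. Everything here is hypothetical: `MachinePool`, `borrowed`, `run_experiment`, and the callback signatures are illustrative names, not the actual PingCAP tooling.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class MachinePool:
    """Toy stand-in for the pool of machines borrowed from SRE."""

    def __init__(self, hosts: List[str]) -> None:
        self._free = list(hosts)

    def take(self, n: int) -> List[str]:
        if n > len(self._free):
            raise RuntimeError("not enough free machines")
        taken, self._free = self._free[:n], self._free[n:]
        return taken

    def give_back(self, hosts: List[str]) -> None:
        self._free.extend(hosts)

    def free_count(self) -> int:
        return len(self._free)


@contextmanager
def borrowed(pool: MachinePool, n: int) -> Iterator[List[str]]:
    """Take `n` machines and guarantee they are returned afterwards."""
    hosts = pool.take(n)
    try:
        yield hosts
    finally:
        pool.give_back(hosts)


def run_experiment(pool: MachinePool, n: int,
                   deploy: Callable[[List[str]], None],
                   experiment: Callable[[List[str]], bool],
                   debug: Callable[[List[str]], None]) -> bool:
    """Deploy, run the experiment, and collect debug data only on failure."""
    with borrowed(pool, n) as hosts:
        deploy(hosts)
        passed = experiment(hosts)
        if not passed:
            debug(hosts)  # e.g. gather logs before the machines go back
        return passed
```

Putting the return step in a `finally` clause is a deliberate choice in this sketch: a crashed experiment should not leak borrowed machines.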
21.● Why Chaos ● Practice in TiDB ● Schrodinger
22.Schrodinger
23.Schrodinger (cont.)
24.Schrodinger (cont.)
25.Schrodinger (cont.)
26.Schrodinger (cont.) cat
27.Schrodinger (cont.)
28.Schrodinger (cont.) Chaos Operator
29.Schrodinger (cont.) Run with your own Helm charts