Percona XtraDB Cluster: Failure Scenarios and their Recovery

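Percona XtraDB Cluster (a.k.a. PXC) is an open source, multi-master, high-availability MySQL clustering solution. PXC works with databases created by your MySQL/Percona Server. Given the multi-master aspect, there are multiple guards that protect the cluster from entering an inconsistent state. Most of these guards are configurable to suit the user's environment. However, if they are not configured properly, they can cause the cluster to halt, fail, or error out.
In this session, we will discuss failure scenarios, including a MySQL cluster entering a non-primary state due to network partitioning. We will also discuss cluster stalls due to flow control, data inconsistency that causes a node to shut down, and common problems during the initial catch-up, a.k.a. State Snapshot Transfer (SST). Other issues include delays in purging transactions, a blocking DDL that stalls the entire cluster, and a misconfigured cluster.
We will also discuss solutions to some of these problems and how to recover safely from these failures.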

1.Percona XtraDB Cluster: Failure Scenarios and their Recovery. Krunal Bauskar (PXC Lead, Percona), Alkin Tezuysal (Sr. Technical Manager, Percona)

2.Who we are? Krunal Bauskar: ● Database enthusiast. ● Practicing databases (MySQL) for over a decade now. ● Wide interest in data handling and management. ● Worked on some real big data that powered applications @ Yahoo, Oracle, Teradata. Alkin Tezuysal (@ask_dba): ● Open Source Database Evangelist ● Global Database Operations Expert ● Cloud Infrastructure Architect AWS ● Inspiring Technical and Strategic Leader ● Creative Team Builder ● Speaker, Mentor, and Coach ● Outdoor Enthusiast

3.Agenda ● Quick sniff at PXC ● Failure Scenarios and their recovery ● PXC Genie - You wish. We implement. ● Q&A

4.Quick Sniff at PXC

5.What is PXC? ● Multi-master ● Enhanced security ● Flexible topology ● Network protection (geo-distributed) ● Auto-node provisioning ● Performance tuned

6.Failure Scenarios and their recovery

7.Scenario: New node fails to connect to cluster

8.Scenario: New node fails to connect to cluster ● Joiner log

9.Scenario: New node fails to connect to cluster ● Joiner log ● Administrator reviews configuration settings; values such as the IP address are sane and valid. ● DONOR log doesn't have any traces of JOINER trying to JOIN.

10.Scenario: New node fails to connect to cluster ● Joiner log ● Administrator reviews configuration settings; values such as the IP address are sane and valid. ● DONOR log doesn't have any traces of JOINER trying to JOIN. ● Still, JOINER fails to connect.

11.Scenario: New node fails to connect to cluster ● Joiner log ● Administrator reviews configuration settings; values such as the IP address are sane and valid. ● DONOR log doesn't have any traces of JOINER trying to JOIN. ● Cause: SELinux/AppArmor

12.Scenario: New node fails to connect to cluster ● Joiner log ● Don't confuse this error with an SST failure: the node has not yet been offered membership of the cluster, and SST comes after membership.

13.Scenario: New node fails to connect to cluster ● Solution-1: ○ Set the mode to PERMISSIVE or DISABLED.

14.Scenario: New node fails to connect to cluster ● Solution-1: ○ Set the mode to PERMISSIVE or DISABLED. ● Solution-2: ○ Configure a policy that allows access in ENFORCING mode. ○ Related blogs ■ “Lock Down: Enforcing SELinux with Percona XtraDB Cluster”. It probes which permissions are needed and adds rules accordingly. ■ “Lock Down: Enforcing AppArmor with Percona XtraDB Cluster” ■ Using this, we can continue to run with SELinux enabled. (You can also refer to the SELinux configuration on the Codership site.)

15.Scenario: New node fails to connect to cluster ● PXC can operate with SELinux/AppArmor.

16.Scenario: Catching up cluster (SST, IST)

17.Scenario: Catching up cluster (SST, IST) ● SST: complete copy-over of the data directory. ○ SST has multiple external components: the SST script, XtraBackup (XB), the network, etc. Some of these are outside the control of the PXC process. ● IST: transfer of only the missing write-sets (the node is already a member of the cluster). ○ Intrinsic to the PXC process space.
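
The SST/IST distinction above can be sketched as a small decision function. This is a simplified illustration, not PXC's actual logic: it assumes IST is possible only when the joiner already carries the cluster's state UUID and the donor still caches every write-set the joiner is missing. The names `NodeState`, `choose_transfer`, `donor_cache_first` and `donor_last` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    cluster_uuid: str  # state UUID the node last recorded (hypothetical field)
    last_seqno: int    # last write-set the node applied, -1 if unknown


def choose_transfer(joiner: NodeState, donor_uuid: str,
                    donor_cache_first: int, donor_last: int) -> str:
    """Return "none", "IST" or "SST" for a joining node (simplified model)."""
    # A node with a different history or an unknown position needs a full copy.
    if joiner.cluster_uuid != donor_uuid or joiner.last_seqno < 0:
        return "SST"
    # Nothing is missing, so no transfer is needed.
    if joiner.last_seqno >= donor_last:
        return "none"
    # IST needs every missing write-set (last_seqno + 1 .. donor_last)
    # to still be available in the donor's cache.
    if donor_cache_first <= joiner.last_seqno + 1:
        return "IST"
    return "SST"
```

For example, a joiner at seqno 100 can use IST from a donor whose cache starts at seqno 50, while a joiner at seqno 10 would fall back to SST.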

18.Scenario: Catching up cluster (SST, IST) #1 ● Joiner log

19.Scenario: Catching up cluster (SST, IST) #1 ● Joiner log ● SST failed on DONOR

20.Scenario: Catching up cluster (SST, IST) #1 ● Joiner log ● SST failed on DONOR ● wsrep_sst_auth not set on DONOR

21.Scenario: Catching up cluster (SST, IST) #1 ● Joiner log ● wsrep_sst_auth should be set on the DONOR (users often set it only on the JOINER, and SST still fails). After SST, the JOINER has a copy of that user from the DONOR.
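
A quick way to catch case #1 before starting a JOINER is to check the DONOR's configuration for `wsrep_sst_auth`. The sketch below is one possible check, assuming the configuration is available as plain text with a `[mysqld]` section; the function name `missing_wsrep_options` and the sample credentials are hypothetical, and `!include`/`!includedir` directives are not handled.

```python
import configparser


def missing_wsrep_options(cnf_text, required=("wsrep_sst_auth",)):
    """Return the required [mysqld] options that are absent or empty."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags without a value
        strict=False,          # tolerate repeated sections and options
        interpolation=None,    # passwords may contain '%'
        delimiters=("=",),     # keep "user:password" values intact
    )
    # MySQL treats '-' and '_' in option names as interchangeable.
    parser.optionxform = lambda key: key.lower().replace("-", "_")
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return list(required)
    section = parser["mysqld"]
    return [opt for opt in required if not section.get(opt)]


# Hypothetical DONOR configuration used only for illustration.
donor_cnf = """
[mysqld]
wsrep_sst_method = xtrabackup-v2
wsrep-sst-auth = sstuser:s3cretPass
"""
print(missing_wsrep_options(donor_cnf))
```

Running the same check against every node's configuration, not just the JOINER's, avoids the common mistake described on the slide.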

22.Scenario: Catching up cluster (SST, IST) #2 ● Donor log

23.Scenario: Catching up cluster (SST, IST) #2 ● Donor log ● Possible causes: ○ The specified wsrep_sst_auth user doesn't exist. ○ The credentials are wrong. ○ The user has insufficient privileges.
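
The first and third causes above can be narrowed down from the output of `SHOW GRANTS` for the SST user on the DONOR. The sketch below assumes the privilege set commonly documented for XtraBackup-based SST in PXC 5.7 (RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT); verify the list against the documentation for your version. The function name and the sample rows are hypothetical.

```python
import re

# Assumed privilege set; confirm against the PXC documentation for your release.
REQUIRED_SST_PRIVILEGES = ("RELOAD", "LOCK TABLES", "PROCESS", "REPLICATION CLIENT")


def missing_sst_privileges(show_grants_rows, required=REQUIRED_SST_PRIVILEGES):
    """Return the required global privileges not found in SHOW GRANTS rows."""
    granted = set()
    for row in show_grants_rows:
        # Only global grants (ON *.*) are considered here.
        match = re.match(r"GRANT (.+?) ON \*\.\* TO ", row, flags=re.IGNORECASE)
        if match:
            granted.update(p.strip().upper() for p in match.group(1).split(","))
    if "ALL PRIVILEGES" in granted:
        return []
    return sorted(set(required) - granted)


# Hypothetical SHOW GRANTS output for an under-privileged SST user.
rows = ["GRANT RELOAD, PROCESS ON *.* TO 'sstuser'@'localhost'"]
print(missing_sst_privileges(rows))
```

An empty list of rows (for example, because the user does not exist) reports every required privilege as missing.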

24.Scenario: Catching up cluster (SST, IST) #3 ● Joiner log

25.Scenario: Catching up cluster (SST, IST) #3 ● Joiner log ● An old-version JOINER is trying to join from a new-version DONOR (not supported). The opposite direction, a new-version JOINER joining from an old-version DONOR, is allowed.
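
The rule in case #3 can be expressed as a pre-flight check before adding a node. This is a minimal sketch under an assumption the slide does not spell out: versions are compared at the major.minor series level (for example 5.6 versus 5.7). The helper names and version strings are hypothetical.

```python
def series(version):
    """Extract (major, minor) from a version string such as '5.7.25-28'."""
    major, minor = version.split("-")[0].split(".")[:2]
    return int(major), int(minor)


def sst_direction_supported(joiner_version, donor_version):
    """A JOINER may be the same series as the DONOR or newer, never older."""
    return series(joiner_version) >= series(donor_version)


print(sst_direction_supported("5.7.25-28", "5.6.40-26"))  # newer JOINER, older DONOR
print(sst_direction_supported("5.6.40-26", "5.7.25-28"))  # older JOINER, newer DONOR
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "5.10" sorting before "5.7".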

26.Scenario: Catching up cluster (SST, IST) #4 ● Donor log ● Joiner log

27.Scenario: Catching up cluster (SST, IST) #4 ● Donor log ● Joiner log ● WSREP_SST: [WARNING] wsrep_node_address or wsrep_sst_receive_address not set. Consider setting them if SST fails.

28.Scenario: Catching up cluster (SST, IST) #5

29.Scenario: Catching up cluster (SST, IST) #5 ● Faulty SSL configuration