Errant GTIDs Breaking Replication

为了解决复制问题和提高数据库一致性,引入了GTID。
当一个复制副本上意外发生事务时,这会在该复制副本上引入不在主副本上的GTID。当主故障转移时,此副本成为新的主服务器,并且错误gtid的对应binlog已清除时,复制将中断此新主服务器的副本,因为无法从此新主服务器的binlog中检索缺少的gtid。
本演示将讨论GTID以及如何在副本上检测错误的GTID(在清除相应的binlog之前),以及如何查看binlog中的相应事务。我将给出一些可能发生在并非来自主节点的复制副本上的事务的示例,解释这是如何可能的,并分享一些关于如何避免这种情况的提示。
假设对MySQL数据库复制有基本的了解。

展开查看详情

1.Errant GTIDs breaking replication Dieter Adriaenssens Ghent University

2.Who am I? Dieter Adriaenssens • Linux System Administrator • MySQL DBA • Works at Ghent University • Open Source : former phpMyAdmin team member • Lives in Ghent, Belgium • Climber • E-mail : dieter.adriaenssens@ugent.be • Twitter : @dcadriaenssens • Pictures : − (c) Thomas Kindermans https://en.wikipedia.org/wiki/File:Gent_Graslei_zonsondergang.JPG − (c) Rouslan Placella 2

3.Errant GTIDs breaking replication How to detect and avoid them

4.Context Replicating MySQL cluster • Primary with some replicas ‘[..] the master has purged binary • Master failover logs containing GTIDs that the • Replication breaks from new primary (previous replica) to remaining replicas slave requires.’ • Where did those transactions on the new primary come from? 4

5.Overview • Replication and GTID • Errant GTIDs on a replica • Detection • Where do they come from? • How to avoid them • How to fix them • Demo 5

6.Introduction Replication, GTID, data consistency

7.Replication MySQL server replication : • Primary (master) Replic a • Several replicas (slaves) • Identical state across several servers Primar y Replic a 7

8.Replication : Why • High availability • Master failover • Disaster recovery • Scaling load • Regional distribution • ... 8

9.Replication : How MySQL master-slave replication : • All nodes start from an identical state • Replicas apply transactions from the primary node’s binary logs • Transaction in correct order • Asynchronous : replica initiates • Maintain identical state ● Read/write on primary ● Replicas should be readonly 9

10.Replication : How Keeping track of binlog position : • Several binlogfiles : mysql_bin.xxxxxx • Rotation (size based, restart server) • Every transaction has a unique (incremental) position • Replication start : ● Binary log file name ● position in the binlog • Problematic ● multiple primaries ● Master failover ● Restore from backup 10

11.GTID Global Transaction identifier • Unique identifier of a transaction Server_uuid:transaction_id • GTID = Origin ID + transaction ID 3E11FA47-71CA-11E1-9E33-C80AA9429562:23 • Origin ID : server UUID • Transaction ID : incremental number SELECT @@GLOBAL.server_uuid; 3E11FA47-71CA-11E1-9E33-C80AA9429562 11

12.GTID set Executed GTID set SHOW SLAVE STATUS; • Set of applied GTIDs on a server SHOW MASTER STATUS; SELECT @@GLOBAL.gtid_executed; • Defines a state of data consistency • Easy to compare between nodes Executed_Gtid_Set: • Replica applies GTIDs of a primary 27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552113, • Simplifies failovers (no need to remember binlog position) 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-80192973, c004c0eb-c84e-11e6-8efc-aa00009002fd: 1-6084195:6140951-6141015 12

13.ROW based replication • Changes in rows are replicated • <-> transaction-based : actual queries • In combination with GTID => data consistency across the cluster nodes 13

14.Replicating cluster with GTID orchestrator -c topology –alias=demo node1:3306 [0s,ok,5.7.25,rw,ROW,>>,GTID] MySQL Server cluster + node2:3306 [0s,ok,5.7.25,ro,ROW,>>,GTID] • Primary (R/W) + node3:3306 [0s,ok,5.7.25,ro,ROW,>>,GTID] • Replicas (R/O) Orchestrator • GTID enabled • Row based replication • ProxySQL to redirect traffic to correct cluster node Node 2 • Orchestrator managing the cluster nodes Prox Nod (automatic master failover) y e1 SQL Node 3 14

15. Errant GTID Definition, consequences, detection, examples, how to avoid, fixes

16.Errant GTID • Unreplicated transaction existing only on a replica • Data is not the same on all nodes • Cluster is no longer in a consistent state 16

17.Errant GTID : consequences • Everything is fine? • Inconsistent state between nodes ● Split brain ● Different data when reading from that replica • Unexpected behaviour when a replica is promoted to master ‘[..] the master has purged binary ● Replication might fail logs containing GTIDs that the ● If GTID is purged from binlog, on slave requires.’ master failover → replication stops 17

18.Errant GTID detection

19.Errant GTID detection • Compare executed GTID sets between primary node and replica nodes • Replica has more GTIDs than primary => errant GTID 19

20.GTID executed set # primary SELECT @@GLOBAL.gtid_executed; 27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552120 # replica SELECT @@GLOBAL.gtid_executed; 27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552113, 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-4 20

21.GTID subset SELECT GTID_SUBSET('<gtid_executed_replica>', '<gtid_executed_primary>'); SELECT GTID_SUBSET( '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552113', '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552120') AS is_subset; +-----------+ | is_subset | +-----------+ | 1| +-----------+ 1 row in set (0.00 sec) Replica GTID set is a subset of primary GTID set : OK 21

22.GTID subset SELECT GTID_SUBSET('<gtid_executed_replica>', '<gtid_executed_primary>'); SELECT GTID_SUBSET( '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552113, 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-4', '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552120') AS is_subset; +-----------+ | is_subset | +-----------+ | 0| +-----------+ 1 row in set (0.00 sec) Replica GTID set is NOT a subset of primary GTID set => Errant GTID on replica 22

23.Find errant GTIDs SELECT GTID_SUBTRACT('<gtid_executed_replica>', '<gtid_executed_primary>'); SELECT GTID_SUBTRACT( '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552113, 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-4', '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552120') AS errant_gtid; +------------------------------------------+ | errant_gtid | +------------------------------------------+ | 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-4 | +------------------------------------------+ 1 row in set (0.00 sec) Result is subset of errant GTIDs 23

24.Errant GTID : automatic detection • Icinga monitoring check • Orchestrator • Other tools : ● Mysqlfailover ● Mysqlrpladmin 24

25.Errant GTID : detection Monitoring check • Automate checking for errant GTID • Icinga compatible output format • Uses orchestrator for cluster info • https://github.com/UGent-DICT/check_mysql_gtid 25

26.Errant GTID : monitoring check ./check_mysql_gtid <clustername> ./check_mysql_gtid demo MYSQL_CLUSTER_GTID OK - GTIDs on all nodes are replicated in the cluster! Cluster demo (primary : node1) : - node2 : OK - node3 : OK Everything is fine! 26

27.Errant GTID : monitoring check ./check_mysql_gtid <clustername> ./check_mysql_gtid demo MYSQL_CLUSTER_GTID WARNING : replicas containing unreplicated GTIDs : node2 Cluster demo (primary : node1) : - node2 : GTIDs only exist on the replica : 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-4 - node3 : OK Check reports unreplicated (errant) GTIDs on one of the nodes 27

28.Errant GTID : detection Orchestrator • Reports errant GTIDs (>= v3.0.13) ● Provides ways to fix them (>= v3.0.14) 28

29.Errant GTID Find transaction • Look for GTID in binary logs • Each binlog mentions the executed GTID set (initial state) • Select relevant binlog • Find transaction in that binlog 29