MySQL High Availability and Disaster Recovery on AWS

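In this one-hour session, Peter describes the difference between high availability (HA) and disaster recovery (DR), then explains in detail how to handle each one manually and in Amazon RDS.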


1. MySQL High Availability and Disaster Recovery on AWS: From doing it manually to Amazon RDS and Aurora
Peter Zaitsev, CEO
March 6th, 2019
Percona Technical Webinars
© 2019 Percona.

2. Originally delivered at AWS re:Invent 2018. That session could not be recorded, so an updated version is presented as a webinar.

3. About this Presentation
• High Availability and Disaster Recovery
• Implementing HA and DR manually
• MySQL HA and DR with AWS DBaaS Offerings

4. What Do We Want From Applications?
• Work Correctly
• Have Good Performance
• Be “UP”

5. From Engineer Standpoint
• Correctness: Many practices from Design to Proper QA
• Performance: Benchmarks, Load Testing, Capacity Planning
• High Availability: Design, Chaos Engineering

6. High Availability is Hard
• Hard to Foresee Everything That Can Go Wrong
• Long Tail of Low Probability Problems
• A lot of Environment Specifics
• Depends on Technology and Operational Practices

7. Engineered Systems
Like Security, High Availability favors well-designed and tested “Engineered Systems”.

8. Pro Tip
Consider having an expert evaluation of your HA and DR design, implementation, and operational practices. Percona can provide one.

9. HA and DR
High Availability:
• High/Medium Event Frequency
• Low/Medium Event Impact
• No/Minimum Downtime
• No Data Loss
• No Manual Intervention Needed
Disaster Recovery:
• Rare Event Frequency
• High Event Impact
• Downtime may be allowed by Design
• Data Loss may be allowed by Design
• Manual Intervention may be needed

10. HA vs DR Example
High Availability:
• Failure of the Single Server
• Failure of Single Availability Zone
Disaster Recovery:
• Multiple AZ failure within region
• Major Software issues
• Major Security Incidents

11. High Availability Planning
• What is the estimated time between events (MTBF)?
• What is the acceptable impact to the system?
• What is the time to recover (MTTR)?
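The two planning numbers above combine into a single availability figure. As a minimal sketch, assuming the standard steady-state formula availability = MTBF / (MTBF + MTTR) and using hypothetical function names that are not from the presentation:

```python
HOURS_PER_YEAR = 24 * 365  # simplification: ignores leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given mean time between
    failures (MTBF) and mean time to recover (MTTR), both in hours."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected yearly downtime implied by the availability figure."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Illustrative inputs only: one event every 999 hours, 1 hour to recover.
    print(f"availability: {availability(999, 1):.4f}")
    print(f"downtime per year: {downtime_hours_per_year(999, 1):.2f} hours")
```

The same MTBF can yield very different availability depending on MTTR, which is why recovery time is planned for explicitly.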

12. Example: RAID5 volume disk failure
• A single SSD may have 1 million+ hours between failures (100+ years)
• With thousands of disks in the data center, a failure is not that rare an event
• Impact to the system: Performance Impact, Loss of Redundancy
• Time to Recover: Hours
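The arithmetic behind this example can be sketched as follows. It assumes a constant failure rate and independent disks, which is a simplification; the function names and the fleet size of 1000 disks are illustrative choices, not figures from the presentation beyond the 1 million hour MTBF.

```python
HOURS_PER_YEAR = 24 * 365  # simplification: ignores leap years


def mtbf_in_years(mtbf_hours: float) -> float:
    """Convert a per-device MTBF from hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures_per_year(num_disks: int, mtbf_hours: float) -> float:
    """Expected number of disk failures per year across a fleet,
    assuming a constant failure rate and independent devices."""
    return num_disks * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    mtbf = 1_000_000  # hours, per the slide
    print(f"single disk MTBF: {mtbf_in_years(mtbf):.0f} years")
    print(f"fleet of 1000 disks: "
          f"{expected_failures_per_year(1000, mtbf):.2f} failures per year")
```

Under these assumptions a fleet of 1000 such disks sees several failures a year, even though each individual disk looks extremely reliable.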

13. Achieving High Availability for a Database
• Compute Redundancy
• Data Redundancy and Replication
• Failover Management
• Service Endpoint Provisioning

14. HA and DR for MySQL
• MySQL is often your “system of record”
• Appropriate HA and DR planning is critically important

15. Choices for MySQL at AWS
• Use AWS DBaaS Services
• Build your own

16. DBaaS
Advantages:
• Save Time
• Reduce Risk
• Empower Development Team
Disadvantages:
• Potentially Higher Infrastructure Costs
• Less Flexibility
• Less Control

17. Roll Your Own
Advantages:
• More Flexibility
• More Control
• Potentially lower Infrastructure Costs
Disadvantages:
• Additional Time and Effort Required
• Potentially Additional Risks

18. Do it yourself HA options for MySQL

19. Achieving Data Redundancy
• Storage Level
• Database Level
• External/Application Level

20. Storage Level Redundancy
• Mount EBS volume to another instance
• DRBD
• Clustered File Systems

21. Database Level Redundancy
• Classical MySQL Replication
• MySQL Group Replication/MySQL InnoDB Cluster
• Percona XtraDB Cluster and other Galera-based Technologies
• MySQL Cluster (NDB)

22. External/Application Level
• Manual Application Replication
• Trigger Based Replication
• Using Kafka as Message Bus

23. Classical MySQL Replication

24. Classical MySQL Replication Properties
• Asynchronous or Semi-Synchronous
• Parallel (since MySQL 5.7)
• Many Masters to Many Slaves (MySQL 5.7)
• No Conflict Resolution or “Protection”
• No Built-in Failover
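Because classical replication has no built-in failover, something outside MySQL has to decide which replica to promote. The following is only a sketch of that decision, assuming the common heuristic of promoting the healthy replica that has applied the most transactions. `ReplicaStatus`, its fields, and `pick_promotion_candidate` are hypothetical names for illustration; they are not a MySQL API, and a real tool would gather this state from each server.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ReplicaStatus:
    """Hypothetical snapshot of one replica, collected by an external monitor."""
    name: str
    healthy: bool
    applied_transactions: int  # assumed counter of transactions applied so far


def pick_promotion_candidate(
    replicas: Sequence[ReplicaStatus],
) -> Optional[ReplicaStatus]:
    """Return the healthy replica that is furthest ahead, or None
    if no replica is fit for promotion."""
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        return None
    # Promoting the most advanced replica minimizes lost transactions
    # under asynchronous replication.
    return max(healthy, key=lambda r: r.applied_transactions)


if __name__ == "__main__":
    fleet = [
        ReplicaStatus("replica-a", healthy=True, applied_transactions=1040),
        ReplicaStatus("replica-b", healthy=False, applied_transactions=1100),
        ReplicaStatus("replica-c", healthy=True, applied_transactions=1075),
    ]
    candidate = pick_promotion_candidate(fleet)
    print(candidate.name if candidate else "no candidate")
```

With asynchronous replication even the most advanced replica may be missing the last transactions committed on the failed master, so this choice limits data loss rather than eliminating it.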

25. Advanced MySQL Replication Topologies

26. MySQL Group Replication

27. MySQL Group Replication (New in 5.7)
• “Group of Peers”
• Write-Anywhere or Dedicated Writer
• Asynchronous Replication with Flow Control
• Conflicts Prevented through Certification
• Built-in Failover
• No Automated Provisioning
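The idea of preventing conflicts through certification can be illustrated with a toy model. This is a conceptual sketch of a first-committer-wins check, not Group Replication's actual implementation: the `Certifier` class, its version counter, and the string row keys are all assumptions made for illustration.

```python
from typing import Dict, Iterable


class Certifier:
    """Toy first-committer-wins certifier.

    Each transaction reports the database version it read from
    (snapshot_version) and the set of row keys it wants to write.
    It passes certification only if none of those rows were changed
    by a transaction committed after its snapshot.
    """

    def __init__(self) -> None:
        self.version = 0                      # last committed version
        self.last_write: Dict[str, int] = {}  # row key -> version that last wrote it

    def certify(self, snapshot_version: int, write_set: Iterable[str]) -> bool:
        keys = list(write_set)
        for key in keys:
            if self.last_write.get(key, 0) > snapshot_version:
                return False  # conflict: the row changed after our snapshot
        # No conflict: commit and record the new version for each written row.
        self.version += 1
        for key in keys:
            self.last_write[key] = self.version
        return True


if __name__ == "__main__":
    c = Certifier()
    print(c.certify(0, {"row:1"}))  # first writer of row:1 commits
    print(c.certify(0, {"row:1"}))  # concurrent writer of row:1 is rejected
    print(c.certify(0, {"row:2"}))  # unrelated row commits
```

In this model the rejected transaction would be rolled back on its originating node and retried by the application, which is why write-anywhere setups need applications that tolerate commit-time failures.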

28. Percona XtraDB Cluster/Galera

29. Percona XtraDB Cluster Topology