Globalizing Player Accounts with MySQL at Riot Games



1. Globalizing Player Accounts with MySQL at Riot Games
Tyler Turk, Riot Games, Inc.

2. About Me
• Senior Infrastructure Engineer, Player Platform at Riot Games
• Father of three
• Similar talk at re:Invent last year

3. Accounts Team
• Responsible for account data
• Provides account management
• Ensures players can log in
• Aims to mitigate account compromises

4. Overview
The old and the new

5. League’s growth and shard deployment
• Launched in 2009
• Experienced rapid growth
• Deployed multiple game shards
• Each shard used its own MySQL databases

6. Some context
• Hundreds of millions of players worldwide
• Localized primary / secondary replication
• Data federated across the shards
• Account transfers were difficult

7. Why MySQL?
• Widely used and adopted at Riot
• Used extensively by Tencent
• Ensures ACID compliance

8. Catalysts for globalization
• General Data Protection Regulation (GDPR)
• Decoupling from the game platform
• Single source of truth for accounts

9. Globalization of Player Accounts
Migrating from 10 isolated databases to a single globally replicated database

10. Data deployment considerations
• Globally replicated, multi-master
• Globally replicated, single master
• Federated or sharded data
• To cache or not to cache

11. Global database expectations
• Highly available
• Geographically distributed
• < 1 second replication latency
• < 20 ms read latency
• Enables a better player experience

12. Continuent Tungsten
• Third-party vendor
• Provides cluster orchestration
• Manages data replication
• MySQL connector proxy

13. Why Continuent Tungsten?
• Prior issues with Aurora
• RDS was not multi-region
• Preferred asynchronous replication
• Automated cluster management

14. Explanation & tolerating failure

15. Deployment
• Terraform & Ansible (Docker initially)
• 4 AWS regions
• r4.8xlarge instances (10 Gbps network)
• 5 TB GP2 EBS for data
• 15 TB for logs / backups

16. Migrating the data
• Multi-step migration of data
• Consolidated data into one database
• Multiple rows for a single account

17. Load testing

18. Chaos testing

19. Monitoring

20. Performing backups
• Leverage a standalone replicator
• Back up with xtrabackup
• Compress and upload to S3
• Optional delay on the replicator
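A minimal sketch of that backup pipeline, assuming illustrative paths, bucket names, and an AWS CLI on the host (the talk does not show code):

```python
# Hedged sketch of the backup flow above: take a physical backup with
# Percona XtraBackup on the standalone replicator host, compress it, and
# ship it to S3. Paths and bucket names are illustrative.
import datetime
import os
import subprocess

def build_backup_plan(backup_dir: str, bucket: str, when: datetime.date):
    """Return the steps for one backup run as argument lists."""
    archive = f"{backup_dir}-{when.isoformat()}.tar.gz"
    return [
        # Physical, non-blocking backup of the MySQL data directory.
        ["xtrabackup", "--backup", f"--target-dir={backup_dir}"],
        # Apply the redo log so the backup is consistent and restorable.
        ["xtrabackup", "--prepare", f"--target-dir={backup_dir}"],
        # Compress before shipping off-host.
        ["tar", "-czf", archive, backup_dir],
        # Upload with the AWS CLI; boto3 would work equally well.
        ["aws", "s3", "cp", archive,
         f"s3://{bucket}/mysql/{os.path.basename(archive)}"],
    ]

def run_backup(backup_dir: str, bucket: str) -> None:
    for step in build_backup_plan(backup_dir, bucket, datetime.date.today()):
        subprocess.run(step, check=True)
```

Running the backup from a standalone replicator, as the slide suggests, keeps the load off the serving replicas entirely.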

21. Performing maintenance
• Cluster policies
• Offline and shun nodes
• Perform cluster switch
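The maintenance flow can be sketched as an ordered command sequence. The `cctrl` statements below are assumptions modeled on Continuent's documented interface (policy changes, `datasource` operations, `switch`); verify the exact syntax against your Tungsten release.

```python
# Hedged sketch of the maintenance sequence above as an ordered list of
# Tungsten cctrl commands. The hostname `db1` is made up, and the exact
# cctrl syntax is an assumption to check against your Tungsten version.
MAINTENANCE_STEPS = [
    "set policy maintenance",   # disable automated failover while we work
    "datasource db1 shun",      # take the node out of rotation
    # ... perform OS / MySQL maintenance on db1 here ...
    "datasource db1 welcome",   # return the node to the cluster
    "switch",                   # move the primary role back if needed
    "set policy automatic",     # re-enable automated cluster management
]

def run_maintenance(send) -> None:
    """`send` submits one command to cctrl, e.g. via subprocess."""
    for step in MAINTENANCE_STEPS:
        send(step)
```

The key ordering point from the slide survives any syntax differences: switch the cluster policy first, shun before touching the node, and only re-enable automatic management at the end.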

22. Performing schema changes
Schema MUST be backwards compatible

The process (per node):
• Offline node
• Wait for connections to drain
• Stop replicator
• Perform schema change
• Start replicator
• Wait for replication
• Online node

Order of operations for a schema change:
1. Replicas in non-primary region
2. Cluster switch on relay
3. Perform change on former relay
4. Repeat steps 1-3 on all non-primary regions
5. Replicas in primary region
6. Cluster switch on write primary
7. Perform change on former write primary
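The ordering above can be encoded as data. A sketch with hypothetical region names, showing how every non-primary region is worked through first and the write primary is altered last; the actual cluster operations would be issued through Tungsten's tooling, not shown here:

```python
# Sketch of the rolling schema-change order above. Region names in tests
# are hypothetical; each plan entry is a human-readable step label.
PER_NODE_STEPS = [
    "offline node",
    "wait for connections to drain",
    "stop replicator",
    "perform schema change",
    "start replicator",
    "wait for replication",
    "online node",
]

def schema_change_plan(regions, primary):
    """All non-primary regions first, then the primary region, so the
    backwards-compatible change reaches the write primary at the very end."""
    plan = []
    ordered = [r for r in regions if r != primary] + [primary]
    for region in ordered:
        plan.append(f"apply per-node steps to replicas in {region}")
        plan.append(f"cluster switch in {region}")
        plan.append(f"apply per-node steps to former relay/primary in {region}")
    return plan
```

Encoding the plan as data makes the "repeat steps 1-3" loop explicit and easy to review before anyone touches a production node.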

23. De-dockering
• Fully automated the process
• One server at a time
• Performed live
• Near-zero downtime

24. Current state
• Database deployed on the host
• No Docker for database / sidecars
• Accounts are distilled to a single row
• Servicing all game shards

25. Lessons Learned
Avoiding the same mistakes we made

26. Databases in Docker
• Partially immutable infrastructure
• Configuration divergence possible
• Upgrades required container restarts
• Pain in automating deploys

27. Large data imports
• Consider removing indexes
• Perform daily delta syncs
• Migrate in chunks if possible
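A runnable sketch of migrating in chunks, using sqlite3 so it works anywhere; in production both handles would be MySQL connections, and the `accounts` schema here is made up for illustration:

```python
# Chunked table copy: walk the primary key in fixed-size ranges so each
# transaction stays small and the migration can resume after a failure.
import sqlite3

def migrate_in_chunks(src: sqlite3.Connection, dst: sqlite3.Connection,
                      chunk_size: int = 1000) -> int:
    """Copy src.accounts into dst.accounts in ascending-id chunks."""
    copied = 0
    last_id = 0
    while True:
        rows = src.execute(
            "SELECT id, name FROM accounts WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return copied
        # Secondary indexes on dst would typically be dropped before the
        # bulk load and rebuilt afterwards, per the slide's advice.
        dst.executemany("INSERT INTO accounts (id, name) VALUES (?, ?)", rows)
        dst.commit()
        copied += len(rows)
        last_id = rows[-1][0]
```

Keying each chunk off the last seen id (rather than OFFSET) keeps every batch an index range scan, which matters at the row counts described in the talk.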

28. Think about data needs
• Synchronous vs asynchronous
• Read-heavy vs write-heavy

29. Impacts of replication latency
• Replication can take > 1 second
• Impacts strongly consistent expectations
• Immediate read-backs can fail
• Think about “eventual” consistency
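One common pattern for tolerating this (not necessarily what Riot shipped): poll the replica briefly after a write, then fall back to the primary. The `fetch` callback and `version` column are hypothetical names for illustration.

```python
# Read-after-write with bounded retries against a lagging replica.
import time

def read_after_write(fetch, expected_version: int,
                     attempts: int = 5, delay: float = 0.05):
    """Return the row once the replica has caught up to our write.

    `fetch` reads the row (with a monotonically increasing version
    column) from a read replica; both names are assumptions."""
    for _ in range(attempts):
        row = fetch()
        if row is not None and row["version"] >= expected_version:
            return row
        time.sleep(delay)  # replica may still be applying our write
    raise TimeoutError("replica did not catch up; read from the primary instead")
```

The broader lesson from the slide stands either way: design callers to expect eventual consistency instead of assuming an immediate read-back will see the write.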