MyRocks in the Real World

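In this keynote, Yoshi will share interesting lessons learned from the production deployment and operation of MyRocks at Facebook, as well as the future MyRocks development roadmap. Vadim will discuss MyRocks in Percona Server for MySQL and share performance benchmarks from on-premises and cloud deployments.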


1.MyRocks in the Real World Yoshinori Matsunobu Production Engineer / MySQL Tech Lead, Facebook Nov 2018

2.What is MyRocks ▪ MySQL on top of RocksDB (Log-Structured Merge Tree Database) ▪ Open Source, distributed from MariaDB and Percona as well ▪ [Diagram: MySQL Clients connect to MySQL, which contains SQL/Connector, Parser, Optimizer, Replication etc., with InnoDB and RocksDB as storage engines] ▪ http://myrocks.io/

3.Read, Write and Space Performance/Efficiency ▪ Pick two of them ▪ InnoDB/B-Tree favors Read at cost of Write and Space ▪ For large scale database on Flash, Space is important ▪ Read inefficiency can be mitigated by Flash and Cache tiers ▪ Write inefficiency can not be easily resolved ▪ Implementations matter (e.g. ZSTD > Zlib)

4. Read, Write and Space Performance/Efficiency - Compressed InnoDB is roughly 2x smaller than uncompressed InnoDB, MyRocks/HBase are 4x smaller - Decompression cost on read is non zero. It matters less on i/o bound workloads - HBase vs MyRocks perf differences came from implementation efficiencies rather than database architecture

5.UDB – Migration from InnoDB to MyRocks ▪ UDB: Our largest user database that stores social activities ▪ Biggest motivation was saving space ▪ 2X savings vs compressed InnoDB, 4X vs uncompressed InnoDB ▪ Write efficiency was 10X better ▪ Read efficiency was no worse than X times ▪ Could migrate without rewriting applications

6.User Database at Facebook ▪ [Bar charts: CPU, IO and Space usage against the machine limit, in two panels: "InnoDB in user database" and "MyRocks in user database". Data labels as extracted: 45%, 90%, 21%, 15%, 45%, 20%, 15%, 21%, 15%]

7.MyRocks on Facebook Messaging ▪ In 2010, we created Facebook Messenger and we chose HBase as backend database ▪ LSM database ▪ Write optimized ▪ Smaller space ▪ Good enough on HDD ▪ Successful MyRocks on UDB led us to migrate Messenger as well ▪ MyRocks used much less CPU time, worked well on Flash ▪ p95~99 latency and error rates improved by 10X ▪ Migrated from HBase to MyRocks in 2017~2018

8.FB Messaging Migration from HBase to MyRocks

9.Our current status ▪ Our two biggest database services (UDB and Facebook Messenger) have been reliably running on top of MyRocks ▪ Efficiency wins : InnoDB to MyRocks ▪ Performance and Reliability wins : HBase to MyRocks ▪ Gradually working on migrating long tail, smaller database services to MyRocks

10.Future Plans ▪ MySQL 8.0 ▪ Pushing more efficiency efforts ▪ Simple read query paths to be much more CPU efficient ▪ Working without WAL, engine crash recovery relying on Binlog ▪ Towards more general purpose database ▪ Gap Lock and Foreign Key ▪ Long running transactions ▪ Online and fast schema changes ▪ Mixing MyRocks and InnoDB in the same instance

11.(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0

12.MyRocks in the Real World Vadim Tkachenko, CTO Percona

13.Facebook-scale engine comes with Percona Server 5.7 and 8.0 RC

14.MyRocks is based on a Log Structured Merge Tree data structure ▪ [Diagram: writes enter c0; merges cascade through c1, c2, ... cN; components are split between memory and disk] ▪ LSM-tree is industry accepted (e.g. YugaByte DB)
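The write path in the diagram above can be sketched in a few lines of Python. This is a toy illustration of the general LSM idea only, not RocksDB's or MyRocks's implementation: the class name `TinyLSM`, the `memtable_limit` and `max_runs` parameters, and the "merge everything into one run" policy are all assumptions made for this example.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory buffer (c0) plus sorted runs standing in for on-disk components."""

    def __init__(self, memtable_limit=4, max_runs=2):
        self.memtable = {}                # c0: absorbs every write in memory
        self.memtable_limit = memtable_limit
        self.max_runs = max_runs
        self.runs = []                    # newest first; each run is a sorted list of (key, value)

    def put(self, key, value):
        # Writes never touch existing runs; they only update the memory buffer.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the buffer out as one new sorted, immutable run.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}
        if len(self.runs) > self.max_runs:
            self._merge()

    def _merge(self):
        # Merge all runs into one; apply oldest first so newer values win.
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [sorted(merged.items())]

    def get(self, key):
        # Reads check the newest data first: buffer, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2, max_runs=2)
for n in range(7):
    db.put(f"k{n}", n)
db.put("k0", 100)   # an overwrite lands in a newer run and shadows the old value
print(db.get("k0"), db.get("k3"), len(db.runs))
```

The sketch shows why the structure is write-friendly: a write is a memory update plus occasional sequential flushes and merges, while a read may have to consult several components.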

15.MyRocks is designed for big data sets ▪ MyRocks is Cloud efficient ▪ MyRocks is good for SSD lifetime ▪ MyRocks is compression friendly

16.MyRocks is designed for big data sets ▪ [Bar chart: throughput in transactions/sec (MORE is better), InnoDB vs. MyRocks, for three cases: 100% of data fits into memory (100GB of data for 100GB of cache), 50% of data fits into memory (200GB of data for 100GB of cache), 5% of data fits into memory (2TB of data for 100GB of cache). Data labels as extracted: 5961, 4599, 4503, 4205, 3867, 849]

17.MyRocks is designed for big data sets ▪ MyRocks is Cloud efficient ▪ MyRocks is good for SSD lifetime ▪ [Bar chart: KB written per transaction (LESS is better), InnoDB vs. MyRocks, for three cases: 100% of data fits into memory (100GB of data for 100GB of cache), 50% of data fits into memory (200GB of data for 100GB of cache), 5% of data fits into memory (2TB of data for 100GB of cache). Data labels as extracted: 288, 82, 31, 21, 21, 21]

18.MyRocks is Cloud efficient ▪ Cost of the storage per year, $ (LESS is better): InnoDB on a 15000 IOPS volume, 13200; MyRocks on a 5000 IOPS volume, 5400 ▪ Volumes are provisioned to achieve the same performance
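As a rough worked example, the two yearly figures on this slide (13200 and 5400) can be reproduced with simple arithmetic. The slide does not state the volume size or the price list, so the values below are assumptions chosen for illustration: a 1000 GB volume, 0.125 USD per GB-month, and 0.065 USD per provisioned IOPS-month. The function name `yearly_volume_cost_usd` is likewise hypothetical.

```python
def yearly_volume_cost_usd(size_gb, provisioned_iops,
                           gb_month_millidollars=125,     # assumed: 0.125 USD per GB-month
                           iops_month_millidollars=65):   # assumed: 0.065 USD per IOPS-month
    """Yearly cost of one provisioned-IOPS volume, computed in integer millidollars."""
    monthly = size_gb * gb_month_millidollars + provisioned_iops * iops_month_millidollars
    return monthly * 12 / 1000


SIZE_GB = 1000  # assumed volume size; not given on the slide

innodb_cost = yearly_volume_cost_usd(SIZE_GB, 15000)
myrocks_cost = yearly_volume_cost_usd(SIZE_GB, 5000)
print(innodb_cost, myrocks_cost, innodb_cost - myrocks_cost)
```

Under these assumptions the storage component is identical for both engines, and the whole difference comes from the provisioned IOPS that each engine needs to reach the same performance.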

19. MyRocks is designed for big data sets ▪ MyRocks is Cloud efficient ▪ [Chart, InnoDB Engine: server time distribution (User CPU, "a good one"; System CPU; Iowait time; Idle time, "a waste") plotted against the percentage of data that fits into memory, from 0% to 100%]

20. MyRocks is designed for big data sets ▪ MyRocks is Cloud efficient ▪ [Charts, InnoDB Engine and MyRocks Engine side by side: server time distribution (User CPU, "a good one"; System CPU; Iowait time; Idle time, "a waste") plotted against the percentage of data that fits into memory, from 0% to 100%]

21.MyRocks is Cloud efficient ▪ Scaling: scale with better-performing storage, i.e. volumes with better IO

22. MyRocks is compression friendly ▪ Modern compression methods

Method         Compression ratio   Compression speed   Decompression speed
LZ4 (default)  2.1                 750 MB/s            3700 MB/s
Zstandard      2.8                 470 MB/s            1380 MB/s
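To make the trade-off in the table concrete, the sketch below applies the slide's ratios and speeds to a hypothetical 100,000 MB data set. It assumes the speeds refer to uncompressed bytes processed on a single thread, which the slide does not specify; the function name `estimate` and the data set size are illustrative.

```python
def estimate(size_mb, ratio, compress_mb_s, decompress_mb_s):
    """Estimate stored size and one-pass (de)compression time for size_mb of raw data."""
    return {
        "stored_mb": size_mb / ratio,
        "compress_s": size_mb / compress_mb_s,
        "decompress_s": size_mb / decompress_mb_s,
    }


# (compression ratio, compression MB/s, decompression MB/s), taken from the table above
methods = {
    "LZ4": (2.1, 750, 3700),
    "Zstandard": (2.8, 470, 1380),
}

DATASET_MB = 100_000  # hypothetical data set size, not from the slides

for name, (ratio, comp, decomp) in methods.items():
    e = estimate(DATASET_MB, ratio, comp, decomp)
    print(f"{name}: {e['stored_mb']:.0f} MB stored, "
          f"{e['compress_s']:.0f} s to compress, {e['decompress_s']:.0f} s to decompress")
```

With these inputs Zstandard stores less data but spends more CPU time in both directions, which is the trade-off the table summarizes.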

23.So is MyRocks perfect? ▪ Are there downsides? ▪ Want to know more? ▪ My talk today: 12:20pm, room @Jones ▪ Yoshinori’s talk: 3:30pm, room @Dax