How to Rock with MyRocks

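MyRocks is a new storage engine from Facebook, available in Percona Server for MySQL. When would you want to use it? We will look at different workloads and at when MyRocks is the best fit for you. As with any new engine, setting it up and tuning it properly may not be easy, so we will also review the most important settings to pay attention to.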

1. How To Rock with MyRocks
Vadim Tkachenko, CTO, Percona

2. Thank You Sponsors!!

3. Agenda
• MyRocks intro and internals
• MyRocks limitations
• Benchmarks: when to choose MyRocks over InnoDB
• Tuning for the best results

4. LSM-trees
• “Write-optimized” data structure
[Diagram: writes enter component c0 in memory; components c0, c1, c2 … cN are merged into one another, with the later components on disk.]

5. Write-optimized?
• Write immediately
• Do the heavy work “sometime” later
[Diagram: writes (key=>value) go to the memtable (in memory) and to the WAL (redo log) on storage; when the memtable is full it is flushed to SST files in L0.]
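A minimal sketch of the write path on this slide, assuming a toy in-memory model. The class name `ToyLSM` and the `memtable_limit` parameter are hypothetical and are not part of RocksDB or MyRocks.

```python
class ToyLSM:
    """Toy write path: append to a WAL, insert into a memtable,
    and flush the memtable to an immutable sorted run when it is full."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical "full" threshold
        self.wal = []        # stand-in for the redo log on storage
        self.memtable = {}   # in-memory key => value
        self.l0 = []         # flushed sorted runs ("SST files"), newest last

    def put(self, key, value):
        # The write itself is cheap: no read of existing data is needed.
        self.wal.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The "heavy work" (sorting, later merging) is deferred to here.
        self.l0.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.wal = []  # flushed data no longer needs the log


db = ToyLSM(memtable_limit=3)
for value, key in enumerate(["c", "a", "b", "a", "d"]):
    db.put(key, value)

print(db.l0)        # one flushed run, sorted by key
print(db.memtable)  # newer writes, including a second copy of key "a"
```

Key "a" ends up both in the flushed run and in the memtable, which is the "multiple copies" effect that reads have to deal with later.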

6. Traditional engines (B+ tree)
• Write may trigger read
[Diagram: writes go to the cache (in memory) and to the WAL (redo log) on storage; if the updated data is not in memory it is read from the data file first; the cache is flushed to the data file in the background.]

7. B+ tree performance

8. Write-optimized problems
• Reads
  • Doing the heavy work “sometime” later ➔ may leave multiple copies of a key in multiple files
  • Reads are in general slower than in B+ tree engines

9. Write-optimized problems
• Reads
• Unique keys ➔ writes force a read for the unique-key constraint check
• Foreign keys: the same (not supported at the moment)
[Diagram: writes go to the memtable (in memory) and the WAL (redo log); the memtable is flushed to SST files in L0 when full; the constraint check reads through the block cache and goes to the SST files if the updated data is not in memory.]

10. Background jobs
[Diagram: the memtable (in memory) is flushed to L0 when full. L0 holds sorted runs of key=>value pairs, each a partial data set. Each level from L0 through L1 … is compacted (merged) into the next level when full. L6 on storage holds the final data set, sorted by (key).]

11. Space amplification
[Diagram: level sizes on storage: L1 = N GB, L2 = N*10 GB, L3 = N*100 GB, … L6 = N*100,000 GB. L6 is the final data set.]

12. Space amplification
• If N=1 and the final size is 100,000 GB (100 TB), then the extra size is:
  • 10,000 + 1,000 + 100 + 10 + 1 = 11,111 GB
  • Amplification is 11.1%
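The arithmetic above can be checked with a few lines of Python. This is only a sketch of the slide's model, assuming six levels that each hold 10x the previous one; the function name is made up for illustration.

```python
def space_amplification(base_gb, multiplier, levels):
    """Return (extra_gb, final_gb, ratio) for fully filled levels.

    Level i holds base_gb * multiplier**i; the last level is the final
    data set and every level above it counts as extra space.
    """
    sizes = [base_gb * multiplier**i for i in range(levels)]
    final_gb = sizes[-1]
    extra_gb = sum(sizes[:-1])
    return extra_gb, final_gb, extra_gb / final_gb


extra, final, ratio = space_amplification(base_gb=1, multiplier=10, levels=6)
print(f"extra={extra} GB, final={final} GB, amplification={ratio:.1%}")
# extra=11111 GB, final=100000 GB, amplification=11.1%
```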

13. Space amplification
• What if the final size is not 100 TB?
  • The space amplification will be bigger than 11%
• Dynamic compaction
  • N is calculated dynamically as we change data
  • level_compaction_dynamic_level_bytes = true (now default in 8.0)
• Dynamic compaction keeps space amplification minimal
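A rough sketch of why deriving the level targets from the actual data size helps. It is a simplified model, not RocksDB's real sizing code: both functions are hypothetical, the upper levels are assumed to be completely full, and the 500 GB figure is just an example.

```python
def static_extra_gb(final_gb, base_gb=1, multiplier=10):
    """Fixed targets: upper levels fill to base, base*10, base*100, ...
    no matter how large the final data set actually is."""
    extra, target = 0, base_gb
    while target < final_gb:
        extra += target
        target *= multiplier
    return extra


def dynamic_extra_gb(final_gb, multiplier=10, upper_levels=5):
    """Dynamic targets: start from the real size of the last level and
    divide by the multiplier for each level above it."""
    extra, target = 0.0, float(final_gb)
    for _ in range(upper_levels):
        target /= multiplier
        extra += target
    return extra


final_gb = 500  # example: a final data set much smaller than 100 TB
print(f"static:  {static_extra_gb(final_gb) / final_gb:.1%}")   # 22.2%
print(f"dynamic: {dynamic_extra_gb(final_gb) / final_gb:.1%}")  # 11.1%
```

With fixed targets the 1 + 10 + 100 GB of upper levels are measured against only 500 GB of final data; with targets derived from the last level the ratio stays close to 11%.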

14. Space amplification in B+ tree
• Depends on insertion order
• Worst case: all pages split in half
• Space amplification:
  • 0% if rows are inserted in sequential order
  • up to 50% if random (the worst-case scenario)
• The real number is somewhere in the middle

15. Slow reads: read amplification in LSM

16. Read amplification - Slow reads in MyRocks
• Reads are slow. This is what we pay for not doing reads at the time of write
[Diagram: a point read (select where key=N) goes through L0, L1 … L6, with a binary search on the whole level.]
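A toy version of the point read in the diagram, assuming each SST file is just a sorted Python list of (key, value) pairs. The function names and the sample data are invented for illustration; real SST files and the RocksDB read path are far more involved.

```python
import bisect

_MISSING = object()  # sentinel so that stored values may be anything


def search_file(sst, key):
    """Binary search inside one sorted file of (key, value) pairs."""
    keys = [k for k, _ in sst]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return sst[i][1]
    return _MISSING


def point_read(key, l0_files, levels):
    """Return (value or None, number of files searched)."""
    searched = 0
    # L0 files may overlap, so every one is checked, newest first.
    for sst in reversed(l0_files):
        searched += 1
        value = search_file(sst, key)
        if value is not _MISSING:
            return value, searched
    # In L1 and below the files of a level are assumed not to overlap:
    # a binary search over the level picks at most one candidate file.
    for level in levels:
        first_keys = [sst[0][0] for sst in level]
        idx = bisect.bisect_right(first_keys, key) - 1
        if idx < 0:
            continue
        searched += 1
        value = search_file(level[idx], key)
        if value is not _MISSING:
            return value, searched
    return None, searched


l0 = [[(5, "old5")], [(7, "new7")]]
levels = [
    [[(1, "a"), (3, "b")], [(5, "c"), (9, "d")]],  # L1
    [[(0, "x"), (2, "y")], [(4, "z"), (8, "w")]],  # L2
]
print(point_read(7, l0, levels))  # found in the newest L0 file
print(point_read(8, l0, levels))  # found only after L0, L1 and L2
print(point_read(6, l0, levels))  # absent: every level was still searched
```

The last call shows the worst case: a key that does not exist still costs a search in every level.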

17. Slow reads
[Diagram: a point read (select where key=N) goes through L0, L1 … L6 with a binary search on the whole level; a block cache in memory, sized by rocksdb_block_cache_size, sits in front of the SST files.]

18. Slow reads
• Quick answer if data does not exist
• Trick: Bloom filters
  • Stored in memory
  • Allow quickly dismissing levels that do not have the data
  • Enabled by filter_policy=bloomfilter:10:false
[Diagram: a point read (select where key=N) consults a bloom filter before searching the SST files of each level, L0, L1 … L6.]
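A small Bloom filter sketch to show how a level can be dismissed without touching its files. It is not RocksDB's filter implementation: the class, the SHA-256 based hashing and the choice of 7 hash functions are assumptions made for the example, and only the 10 bits per key mirrors the setting on the slide.

```python
import hashlib


class BloomFilter:
    """Answers "definitely not present" or "maybe present" for a key."""

    def __init__(self, num_keys, bits_per_key=10, num_hashes=7):
        self.size = max(1, num_keys * bits_per_key)
        self.num_hashes = num_hashes
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive all bit positions from two base hashes.
        digest = hashlib.sha256(str(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))


bloom = BloomFilter(num_keys=1000)
for key in range(1000):
    bloom.add(key)

# Keys that were added are never reported as missing.
print(all(bloom.may_contain(k) for k in range(1000)))
# Keys that were never added are usually, but not always, rejected.
false_positives = sum(bloom.may_contain(k) for k in range(1000, 2000))
print(f"false positives: {false_positives} of 1000")
```

A "no" from the filter is always correct, so the level can be skipped. A "yes" only means the level has to be searched after all.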

19. Slow reads
• L6: a bloom filter may not be needed if most queries ask for data that exists
• SELECT name FROM users WHERE email='registered@example.com'
[Diagram: a point read (select where key=N) consults bloom filters on the upper levels of L0, L1 … L6.]

20. Slow RANGE reads
• Bloom filters DO NOT WORK on ranges.
• Range queries are going to be slow.
• There is hope: Succinct Range Filter (SuRF), in the research stage at Carnegie Mellon University
[Diagram: a range read (select where key > M and key < N) performs a range scan on every level, L0, L1 … L6.]
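A sketch of why a range read has to touch every level: the matching keys from all sorted runs are merged, and for duplicate keys only the newest copy is kept. The function `range_scan` and the sample runs are hypothetical; this is a toy model, not the RocksDB iterator.

```python
import heapq


def range_scan(lo, hi, runs):
    """Return (key, value) pairs with lo < key < hi.

    `runs` is a list of sorted (key, value) lists ordered newest first,
    so a smaller run index means a more recent copy of a key.
    """
    def tagged(age, run):
        for key, value in run:
            if lo < key < hi:
                yield key, age, value

    # Every run contributes to the merge; no filter can rule one out.
    merged = heapq.merge(*(tagged(age, run) for age, run in enumerate(runs)))

    result, last_key = [], object()
    for key, _age, value in merged:
        if key != last_key:  # first hit for a key is its newest copy
            result.append((key, value))
            last_key = key
    return result


runs = [
    [(2, "new2"), (6, "new6")],           # newest run
    [(1, "a"), (2, "old2"), (4, "b")],
    [(3, "c"), (6, "old6"), (9, "d")],    # oldest run
]
print(range_scan(1, 7, runs))
# [(2, 'new2'), (3, 'c'), (4, 'b'), (6, 'new6')]
```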

21. Cost of reads
• Thanks to Mark Callaghan for doing the math
• http://smalldatum.blogspot.com/2018/07/query-cpu-overheads-in-rocksdb.html
• Assume a table with 8 billion rows
• Point lookup
  • B+ tree: ~33 comparison operations
  • LSM tree: ~80 comparison operations
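One way to arrive at the B+ tree figure, assuming the lookup behaves like a binary search over all sorted keys (about log2 of the row count). The LSM figure depends on details in the linked post and is not reproduced here.

```python
import math

rows = 8_000_000_000
# A binary search over N sorted keys needs about log2(N) comparisons.
comparisons = math.ceil(math.log2(rows))
print(comparisons)  # 33
```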

22. So… if reads are slow, is MyRocks useful?
Let’s not forget the bigger picture.
• To select by a non-primary key we need indexes.
• The more indexes, the slower the writes in a B+ tree

23. Totally unscientific chart, just for illustration
Many variables are in play:
• Memory size
• Storage performance
• Data size
• Workload
[Chart: overall performance (y-axis) against size of indexes / data (x-axis) for InnoDB (B+ tree) and MyRocks (LSM tree). Annotations: “If you are here, InnoDB is better” and “If you are here, MyRocks is better”.]

24. Where do I see a fit for MyRocks?
• Relatively big datasets
  • Over ~100 GB in size
  • 5 GB is not big
• With multiple indexes
• Write-intensive workloads

25. MyRocks: how to get it

26. The source of truth
• RocksDB: key-value library
  • https://github.com/facebook/rocksdb/
  • Only in source code
• MyRocks: engine for MySQL
  • Implements the MySQL API and calls RocksDB
  • https://github.com/facebook/mysql-5.6
  • Only in source code
  • Only for MySQL 5.6
  • You need to know which RocksDB version it works with

27. Production packages
• Percona Server 5.7 and 8.0
• All heavy-integration work and testing

28. Things to know about MyRocks
• All files are in a single .rocksdb directory
• No separation per database or per table
  001636.sst 001637.sst 001638.sst 001639.sst 001640.sst 001641.sst
• LOG file: contains a lot of useful information. A LOT…

29. Default compression LZ4
• In Percona Server, all levels are LZ4 compressed by default
• Zstd is available.
• Different compression for L6 is possible:
  • compression=kLZ4Compression;bottommost_compression=kZSTD
• Global setting for ALL databases and tables
• Column families allow settings per table and per index