How to Rock with MyRocks
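MyRocks is a new storage engine from Facebook that is available in Percona Server for MySQL. When would you want to use it? We will examine different workloads and identify when MyRocks is the best fit. As with any new engine, it is also important to set it up and tune it properly, so we will review the most important settings to pay attention to.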
1 .How To Rock with MyRocks Vadim Tkachenko CTO, Percona Webinar, Jan-16 2019
2 .Agenda • MyRocks intro and internals • MyRocks limitations • Benchmarks: When to choose MyRocks over InnoDB • Tuning for the best results
3 .LSM-trees • “Write-optimized” data structure • [Diagram: writes go to memory, then a chain of merges moves the data to disk]
4 .Write-optimized? • Write immediately • Do the heavy work “sometime” later • [Diagram: writes (key=>value) go to the Memtable (in memory) and to the WAL (redo-log) on storage; the Memtable is flushed to SST files in L0 when full]
5 .Traditional engines (B+ tree) • Write may trigger read • [Diagram: writes go to the cache (in memory) and to the WAL (redo-log) on storage; data is read from the data file if the updated data is not in memory; the cache is flushed to the data file in the background]
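The write path on this slide can be sketched in a few lines. This is a toy model, not RocksDB code: the class name `TinyLSM`, the `memtable_limit` threshold (counted in entries) and the in-memory lists standing in for the WAL and the L0 SST files are all assumptions made for illustration.

```python
class TinyLSM:
    """Toy write path: append to the WAL, insert into the memtable, flush to L0 when full."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical "full" threshold, in entries
        self.wal = []        # stands in for the redo log on storage
        self.memtable = {}   # in-memory key => value
        self.l0 = []         # flushed sorted runs ("SST files"), oldest first

    def put(self, key, value):
        self.wal.append((key, value))  # sequential log append, no read required
        self.memtable[key] = value     # the write lands in memory immediately
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable is written out as one sorted, immutable run.
        self.l0.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.wal = []  # flushed entries no longer need the log


db = TinyLSM(memtable_limit=3)
for k, v in [("c", 1), ("a", 2), ("b", 3), ("a", 4)]:
    db.put(k, v)
print(db.l0)        # one sorted run holding a, b, c
print(db.memtable)  # the newer value for "a" is still only in memory
```

Note that `put` never reads existing data. The older value of "a" stays in the L0 run until a later merge removes it, which is the "heavy work sometime later" part.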
6 .B+tree performance
7 .Write-optimized problems • Reads • Do the heavy work “sometime” later → may cause multiple copies in multiple files • Reads in general slower than in B+ Tree engines
8 .Write-optimized problems • Reads • Unique keys → writes force a read for the unique-key constraint check • Foreign keys – the same (not supported at the moment) • [Diagram: the constraint check reads through the block cache from the SST files if the updated data is not in memory; writes still go to the Memtable (in memory) and the WAL (redo-log), and the Memtable is flushed to L0 when full]
9 .Background jobs • [Diagram: the Memtable (in memory) is flushed to L0 when full; L0 holds sorted runs of key=>value pairs (partial data sets); each level L0, L1, … is compacted (merged) into the next level when full; L6 holds the final data set, sorted by key]
10 .Space amplification • [Diagram: SST files per level on storage] • L1: N GB • L2: N*10 GB • L3: N*100 GB • … • L6 (final data set): N*100,000 GB
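The compact (merge) step can be illustrated as a merge of two sorted runs in which the newer run wins on duplicate keys. This is a minimal sketch under simplifying assumptions: the function name `compact` is hypothetical, runs are plain Python lists of (key, value) pairs, and deletes (tombstones) are ignored.

```python
def compact(newer, older):
    """Merge two sorted runs of (key, value) pairs; the newer run wins on equal keys."""
    out = []
    i = j = 0
    while i < len(newer) and j < len(older):
        if newer[i][0] < older[j][0]:
            out.append(newer[i])
            i += 1
        elif newer[i][0] > older[j][0]:
            out.append(older[j])
            j += 1
        else:
            out.append(newer[i])  # same key: keep the newer value, drop the old copy
            i += 1
            j += 1
    out.extend(newer[i:])
    out.extend(older[j:])
    return out


merged = compact([("a", 9), ("c", 3)], [("a", 1), ("b", 2)])
print(merged)
```

The output is again a single sorted run, so the same step can be repeated level by level.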
11 .Space amplification • If N=1 and the final size is 100,000 GB (100 TB), then the extra size is: • 10,000 + 1,000 + 100 + 10 + 1 = 11,111 GB • Amplification is 11.1%
12 .Space amplification • What if the final size is not 100 TB? • The space amplification will be bigger than 11% • Dynamic compaction • N is calculated dynamically as the data changes • level_compaction_dynamic_level_bytes = true (now default in 8.0) • Dynamic compaction keeps space amplification minimal
13 .Space amplification in B+ tree • Depends on insertion order • Worst case – all pages split in half • Space amplification: • 0% if rows are inserted in sequential order • up to 50% if inserted in random order (the worst-case scenario) • The real number is somewhere in the middle
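The arithmetic on this slide can be reproduced with a short calculation. The function name `extra_space_gb` and its parameters are hypothetical; the factor of 10 between levels and the five levels above the final one are taken from the slides.

```python
def extra_space_gb(final_level_gb, fanout=10, upper_levels=5):
    """Total size of the levels above the final one, each `fanout` times smaller than the next."""
    return sum(final_level_gb / fanout**i for i in range(1, upper_levels + 1))


final_gb = 100_000                 # 100 TB final data set, i.e. N = 1
extra = extra_space_gb(final_gb)   # 10,000 + 1,000 + 100 + 10 + 1
amplification = extra / final_gb
print(f"extra: {extra:.0f} GB, amplification: {amplification:.1%}")
```

With a smaller `final_level_gb` and fixed upper-level sizes the ratio grows, which is the problem that dynamic level sizing addresses.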
14 .Slow reads – read amplification in LSM
15 .Read amplification - Slow reads in MyRocks • Point read (select where key=N): binary search on the whole level, for L0, L1 … L6 • Reads are slow. This is what we pay for not doing reads at the time of write
16 .Slow reads • Point read (select where key=N): binary search on the whole level, for L0, L1 … L6 • Block cache (memory): rocksdb_block_cache_size
17 .Slow reads • Trick: Bloom filters • Stored in memory • Quick answer if data does not exist: they allow levels that do not have the data to be dismissed quickly • Enabled by filter_policy=bloomfilter:10:false • [Diagram: a point read (select where key=N) checks a bloom filter per level, L0, L1 … L6]
18 .Slow reads • L6 – a bloom filter may not be needed if most queries ask for data that exists • SELECT name FROM users WHERE email=<registered@example.com>
19 .Slow RANGE reads • Range read (select where key > M and key < N): range scan on every level, L0, L1 … L6 • Bloom filters DO NOT WORK on ranges. • Range queries are going to be slow. • There is hope: Succinct Range Filter (SuRF), in the research stage at Carnegie Mellon University
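A point read in this model has to consult the memtable, then every L0 run, then each deeper level until the key turns up. The sketch below is a simplified illustration, not the MyRocks read path: `point_read` and `find_in_run` are hypothetical names, and each of L1 … L6 is treated as one sorted list.

```python
from bisect import bisect_left


def find_in_run(run, key):
    """Binary search one sorted run of (key, value) pairs."""
    i = bisect_left(run, (key,))  # (key,) sorts just before any (key, value)
    if i < len(run) and run[i][0] == key:
        return run[i][1]
    return None


def point_read(memtable, l0_runs, levels, key):
    """Return (value, number_of_runs_searched); value is None if the key is absent."""
    if key in memtable:
        return memtable[key], 0
    searched = 0
    # L0 runs may overlap, so check the newest first; then L1 ... L6 in order.
    for run in list(reversed(l0_runs)) + list(levels):
        searched += 1
        value = find_in_run(run, key)
        if value is not None:
            return value, searched
    return None, searched


l0 = [[("a", 1)], [("b", 2)]]
levels = [[("c", 3)], [("d", 4), ("e", 5)]]
print(point_read({}, l0, levels, "e"))   # found only in the last level
print(point_read({}, l0, levels, "zz"))  # a missing key touches every run
```

The worst case is a key that does not exist, because every run is searched before the read can return.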
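To show why a Bloom filter gives a quick answer for missing keys, here is a minimal, generic Bloom filter. It is not the RocksDB implementation: the class name, the SHA-256 based hashing and the sizes are assumptions chosen for clarity. Reading the `10` in `bloomfilter:10:false` as bits per key is also an assumption here, not something stated on the slide.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent", so the SST file can be skipped.
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
empty_answer = bf.might_contain("key-42")  # nothing added yet
bf.add("key-42")
present_answer = bf.might_contain("key-42")
print(empty_answer, present_answer)
```

A `False` answer lets the read skip a file without touching it; a `True` answer still requires the binary search, since it may be a false positive.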
20 .Cost of reads • Thanks to Mark Callaghan for doing the math • http://smalldatum.blogspot.com/2018/07/query-cpu-overheads-in-rocksdb.html • Assume an 8-billion-row table • Point lookup • B+ tree: ~33 comparison operations • LSM tree: ~80 comparison operations
21 .So… If reads are slow – is MyRocks useful? Let’s not forget the bigger picture. • To select by a non-primary key we need indexes. • The more indexes, the slower the writes in a B+ tree
22 .Totally unscientific chart – just for illustration • [Chart: overall performance plotted against size of indexes / data, one curve for InnoDB (B+ tree) and one for MyRocks (LSM tree); one region is labeled “If you are here – InnoDB is better”, the other “If you are here – MyRocks is better”] • Many variables are in play: • Memory size • Storage performance • Data size • Workload
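The B+ tree figure is consistent with a binary search over all rows, which needs about log2(rows) comparisons. The snippet below checks only that number; the ~80 for the LSM tree comes from the per-level analysis in the linked post and is not re-derived here. The function name is hypothetical.

```python
import math


def binary_search_comparisons(rows):
    """Approximate comparisons to locate one key among `rows` sorted keys."""
    return math.ceil(math.log2(rows))


btree_comparisons = binary_search_comparisons(8_000_000_000)
print(btree_comparisons)
```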
23 .Where do I see a fit for MyRocks? • Relatively big datasets • Over ~100GB in size • 5GB is not big • With multiple indexes • Write-intensive workloads
24 .MyRocks – how to get it
25 .The Source of truth • RocksDB – key-value library • https://github.com/facebook/rocksdb/ • Only in source code • MyRocks – engine for MySQL • Implements the MySQL API and calls RocksDB • https://github.com/facebook/mysql-5.6 • Only in source code • Only for MySQL 5.6 • You need to know what RocksDB version it works with
26 .Production packages • Percona Server 5.7 and 8.0 • All heavy-integration work and testing
27 .Things to know about MyRocks • All files are in a single .rocksdb directory • No separation per-database or per-table • Example files: 001636.sst 001637.sst 001638.sst 001639.sst 001640.sst 001641.sst • LOG file – contains a lot of useful information. A LOT…
28 .Default compression LZ4 • In Percona Server, all levels are LZ4 compressed by default • Zstd is available. • Different compression for L6 is possible: • compression=kLZ4Compression;bottommost_compression=kZSTD • Global setting for ALL databases and tables • Column families allow settings per table and per index
29 .[Chart annotations] • Traditional zlib is slow • Great speed/ratio balance • Best speed