20_09 Next Generation Cassandra Compaction Going Beyond LCS

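This article covers the current challenges of Cassandra compaction, the general purpose compaction strategies (size tiered and leveled), and the next generation compaction strategy.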


1.Next Generation Cassandra Compaction, Going beyond LCS Joey Lynch

2.Speaker: Joey Lynch, Senior Software Engineer, Cloud Data Engineering at Netflix. Distributed system addict and data wrangler

3.Outline
Main Compaction challenges
General Purpose Compaction Strategies:
● Size Tiered
● Leveled
Next Generation Compaction Strategy

4.Compaction Putting the merge in log structured merge (LSM) Compaction
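The "merge" step can be illustrated with a minimal sketch. It assumes each SSTable is a list of cells already sorted by key, that the newest timestamp wins, and that a `None` value marks a tombstone. The names `Cell` and `compact` are hypothetical and are not Cassandra's API. Dropping tombstones outright is only safe when every SSTable holding the key takes part in the merge, so treat that part as a simplification.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple, Optional


class Cell(NamedTuple):
    key: str
    timestamp: int          # write time; larger means newer
    value: Optional[str]    # None marks a tombstone (a delete)


def compact(sstables: Iterable[list[Cell]]) -> Iterator[Cell]:
    """Merge key-sorted runs, keeping only the newest cell per key."""
    # Order by key, and within one key put the newest timestamp first.
    merged = heapq.merge(*sstables, key=lambda c: (c.key, -c.timestamp))
    last_key = None
    for cell in merged:
        if cell.key == last_key:
            continue        # older version of a key that was already handled
        last_key = cell.key
        if cell.value is not None:
            yield cell      # simplified: tombstones and shadowed data are dropped


old = [Cell("a", 1, "a1"), Cell("b", 1, "b1"), Cell("c", 1, "c1")]
new = [Cell("a", 2, "a2"), Cell("b", 3, None)]
result = list(compact([old, new]))
```

Here `result` holds the newer `a` and the untouched `c`; `b` disappears because its newest cell is a tombstone.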

5.Path to Compaction: Flush Writes are coalesced into a memtable, then flushed Keep it Running
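A toy sketch of the flush path, under stated assumptions: the memtable is a plain dictionary, repeated writes to one key are coalesced by overwriting, and a flush emits an immutable key-sorted list standing in for an SSTable. The `flush_threshold` parameter (a count of distinct keys) is hypothetical; a real node decides when to flush based on memory and commit log pressure.

```python
from typing import Optional


class Memtable:
    def __init__(self, flush_threshold: int = 3):
        self.rows: dict[str, str] = {}
        # Hypothetical trigger: flush once this many distinct keys are buffered.
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> Optional[list[tuple[str, str]]]:
        """Buffer a write; return a flushed 'SSTable' if the threshold is hit."""
        self.rows[key] = value          # overwriting coalesces repeated writes
        if len(self.rows) >= self.flush_threshold:
            return self.flush()
        return None

    def flush(self) -> list[tuple[str, str]]:
        """Emit the buffered rows sorted by key and start a fresh memtable."""
        sstable = sorted(self.rows.items())
        self.rows = {}
        return sstable


m = Memtable(flush_threshold=3)
flushed = [m.write("b", "1"), m.write("b", "2"), m.write("a", "x"), m.write("c", "y")]
```

Only the last write triggers a flush, and the two writes to `b` collapse into one row. Every flush adds one more SSTable on disk, which is where the compaction problems on the following slides come from.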

6.What is an “SSTable”? One “SSTable” is actually multiple “components”

7.Problem: Space Amplification. We have an unbounded number of SSTables

8.Problem: Read Amplification. You wanted to read your data?

9.Compaction Inputs
● Have some SSTables
● Have some metrics
● Either compressed or uncompressed size
● What range of tokens does the SSTable span?
● Non-overlapping guarantee

10.Compaction Strategy Picks Candidates Keep it Running

11.Compaction Outputs Phase 1 Phase 2 Phase 3 Compaction

12.Compaction Is Expensive
● Phase 1: Reads through a bunch of data, materializing it into heap. This is very expensive
● Phase 2: Re-builds view. This can be expensive
● Phase 3: Re-builds view, drops old data, OS page cache drops. This can be expensive

13.Problems? Why?
● Space Amplification: Can’t take infinite disk space ...
● Read Amplification: Reads should be fast
● Write Amplification: Compaction is expensive
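The three amplification factors can be written as simple ratios. The sketch below uses the usual working definitions, which the slides do not spell out, so treat them as assumptions: space amplification is bytes on disk over bytes of live data, read amplification is the number of SSTables a read consults, and write amplification is bytes written to disk (flushes plus compaction rewrites) over bytes written by the application. The `Workload` type and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    live_data_bytes: int        # size of the data if stored exactly once
    disk_bytes: int             # what the SSTables actually occupy
    user_written_bytes: int     # bytes the application asked to write
    disk_written_bytes: int     # flushes plus every compaction rewrite
    sstables_per_read: float    # SSTables consulted by a typical read


def amplification(w: Workload) -> dict[str, float]:
    return {
        "space": w.disk_bytes / w.live_data_bytes,
        "read": w.sstables_per_read,
        "write": w.disk_written_bytes / w.user_written_bytes,
    }


example = amplification(Workload(
    live_data_bytes=100, disk_bytes=150,
    user_written_bytes=100, disk_written_bytes=400,
    sstables_per_read=3,
))
```

With these illustrative numbers the result is 1.5x space, 3x read, and 4x write amplification. Lowering the first two generally means compacting more, which raises the third.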

14.Compaction Strategy Goal Compaction

15.“How much disk space did I reclaim?” “Is the data really gone?” “My reads are slow!!”

16.Problems? Why?
● Full compaction: Have to be able to make guarantees
● Manual compaction: Targeted incidents (hot partitions)
● Handle tons of small SSTables: Repair is fun this way

17.Lessons Learned: Compaction is an optimization problem. Trying to reduce space and read amplification … while doing as little write amplification as we can

18.Size Tiered Compaction (STCS): Let’s do the most obvious thing

19.Group SSTables by Size Size Tiered

20.Main Tunables: Meaning?
● min_threshold: This dictates the level of tiering
● bucket_low, bucket_high: What should be bucketed together
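How these three tunables interact can be sketched roughly as follows. An SSTable joins a bucket when its size falls between `bucket_low` and `bucket_high` times the bucket's running average size, and a bucket becomes a compaction candidate once it holds at least `min_threshold` SSTables. The function names are hypothetical, the default values (0.5, 1.5, 4) are the commonly documented Cassandra defaults and should be checked against your version, and other options such as a minimum SSTable size or a maximum threshold are deliberately left out.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of roughly similar size."""
    buckets = []  # each bucket tracks its members and their running average
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket["avg"]
            if avg * bucket_low < size < avg * bucket_high:
                bucket["members"].append(size)
                bucket["avg"] = sum(bucket["members"]) / len(bucket["members"])
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append({"avg": float(size), "members": [size]})
    return [b["members"] for b in buckets]


def pick_candidates(sizes, min_threshold=4, **bucket_options):
    """Return the buckets that hold enough SSTables to be worth compacting."""
    return [b for b in bucket_by_size(sizes, **bucket_options)
            if len(b) >= min_threshold]


sizes_mb = [10, 11, 9, 12, 100, 110, 1000]
buckets = bucket_by_size(sizes_mb)
candidates = pick_candidates(sizes_mb)
```

In this example the four small SSTables form one bucket and are the only candidate; the two ~100 MB files wait for more peers, and the 1000 MB file sits alone. Raising `min_threshold` makes each tier wait for more similarly sized files before it merges.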

21.Group SSTables by Size Size Tiered

22.Advantages: Why?
● Good for write only workloads: Exponential re-write curve
● Full compaction works great: Small number of files
● Super simple: Fewer bugs

23.Problems? Why do we care?
● 50% space overhead: Hard to provision for
● Giant SSTables: Hard to backup / transfer
● Begat early re-open 🔥🔥: This had a ton of bugs
● Read amplification: Variable read performance is bad

24.Problems? Why do we care?
● Needless write amplification: Wasting CPU is bad
● Have to do periodic full compactions to give any guarantees: Re-writing the whole dataset all at once every few weeks seems like a suboptimal choice

25.Lessons Learned
● Large single files are problematic
● Need to wait until we have enough work from flushes to do
● Time bounds on full compaction is nice

26.Leveled Compaction (LCS): What if we only did useful work?

27.Sorted runs to the rescue! Leveled

28.Advantages: Why?
● Wicked fast reads: Bounded read amplification
● Relatively fixed disk usage: Bounded space amplification
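Why sorted runs bound read amplification can be shown with a small sketch. It assumes every level is a sorted run: its SSTables are ordered by first token and their token ranges never overlap, so a binary search finds at most one SSTable per level. The `SSTable` tuple and `sstables_to_read` function are hypothetical names, and the sketch ignores the first level, where freshly flushed SSTables may still overlap.

```python
import bisect
from typing import NamedTuple


class SSTable(NamedTuple):
    first_token: int
    last_token: int
    name: str


def sstables_to_read(levels: list[list[SSTable]], token: int) -> list[str]:
    """Return the SSTables that could hold `token`, at most one per level."""
    hits = []
    for level in levels:
        starts = [t.first_token for t in level]
        # Rightmost SSTable whose range starts at or before the token.
        i = bisect.bisect_right(starts, token) - 1
        if i >= 0 and level[i].last_token >= token:
            hits.append(level[i].name)
    return hits


levels = [
    [SSTable(0, 49, "L1-a"), SSTable(50, 99, "L1-b")],
    [SSTable(0, 9, "L2-a"), SSTable(10, 19, "L2-b"), SSTable(60, 69, "L2-c")],
]
reads_for_15 = sstables_to_read(levels, 15)
reads_for_55 = sstables_to_read(levels, 55)
```

However many SSTables a level contains, a point read touches at most one of them, so the worst case is bounded by the number of levels rather than the number of files.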

29.Sorted runs to the rescue! Leveled