HBase Bucket Cache On Persistent Memory

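This talk was given by Anoop and Ramkrishna, senior PMC members from Intel, with their Intel colleague Xu Kai also taking part in the presentation. Persistent Memory is a new type of persistent memory developed by Intel. According to conversations with people at Intel, it reportedly costs only about 1/3 as much as DRAM while delivering roughly 90% of DRAM's performance, and it retains data across restarts. That makes it a very cost-effective new storage medium.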

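Take Xiaomi's machines as an example: every HBase machine has 128 GB of memory plus 12 SSDs of roughly 900 GB each. A single machine can hold close to 10 TB of data, yet it has only 128 GB of memory, so the ratio of memory capacity to disk capacity is only a little over 1% (128 GB / 10.8 TB ≈ 1.2%). In practice, latency-sensitive workloads demand a higher HBase cache hit rate than that ratio allows. Intel's idea is to move the cache onto Persistent Memory, which offers far more capacity at a controllable performance cost. For example, using 1 TB of Persistent Memory as the BucketCache on a 10 TB machine would raise the cache hit rate substantially.
Their test results do show a large performance improvement.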
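The capacity argument above is simple arithmetic, and a short sketch makes the gap concrete. The figures are the ones quoted in the text (128 GB of memory, 12 SSDs of about 900 GB, and the 1 TB on 10 TB example); the function name `cache_ratio` is only an illustrative helper, not anything from HBase.

```python
def cache_ratio(cache_gb: float, data_gb: float) -> float:
    """Fraction of the on-disk data that could fit in the cache tier."""
    if data_gb <= 0:
        raise ValueError("data_gb must be positive")
    return cache_gb / data_gb


# Figures from the text: 128 GB of memory, 12 SSDs of roughly 900 GB each.
DRAM_GB = 128
SSD_GB = 12 * 900  # 10,800 GB, i.e. "close to 10 TB"

dram_ratio = cache_ratio(DRAM_GB, SSD_GB)
# The text's example: 1 TB of Persistent Memory on a 10 TB machine.
pmem_ratio = cache_ratio(1_000, 10_000)

print(f"DRAM only : {dram_ratio:.1%}")   # prints "DRAM only : 1.2%"
print(f"1 TB pmem : {pmem_ratio:.1%}")   # prints "1 TB pmem : 10.0%"
```

Even if all 128 GB were given to the cache, only about one block in a hundred could be resident, whereas the Persistent Memory example covers about one in ten.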

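Of course, we have also discussed this internally: without special hardware such as Persistent Memory, one could consider spreading the BucketCache across both memory and SSD. Put simply, the hottest data stays in memory and the next-hottest data goes to SSD. At the very least, the next-hottest data is then read directly from the local SSD. Neither an HDFS short-circuit local read nor an HDFS remote read can be faster than reading the local SSD while bypassing the HDFS protocol entirely.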
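The hot/warm split described above can be sketched as a toy two-tier LRU cache. This is only an illustration of the idea under simple assumptions, not HBase's BucketCache: the class `TwoTierCache`, its capacities counted in blocks, and the `load_from_backing_store` callback (standing in for an HDFS read) are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class TwoTierCache:
    """Toy LRU cache: a small hot tier backed by a larger warm tier."""

    def __init__(self, hot_capacity: int, warm_capacity: int,
                 load_from_backing_store: Callable[[Hashable], bytes]) -> None:
        self.hot: OrderedDict = OrderedDict()    # stands in for memory
        self.warm: OrderedDict = OrderedDict()   # stands in for local SSD
        self.hot_capacity = hot_capacity
        self.warm_capacity = warm_capacity
        self.load = load_from_backing_store      # stands in for an HDFS read
        self.stats = {"hot": 0, "warm": 0, "miss": 0}

    def get(self, key: Hashable) -> bytes:
        if key in self.hot:
            self.hot.move_to_end(key)            # refresh recency
            self.stats["hot"] += 1
            return self.hot[key]
        if key in self.warm:
            value = self.warm.pop(key)           # promote on a warm hit
            self.stats["warm"] += 1
        else:
            value = self.load(key)               # full miss: go to the backing store
            self.stats["miss"] += 1
        self._put_hot(key, value)
        return value

    def _put_hot(self, key: Hashable, value: bytes) -> None:
        self.hot[key] = value
        if len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self._put_warm(old_key, old_value)   # demote instead of dropping

    def _put_warm(self, key: Hashable, value: bytes) -> None:
        self.warm[key] = value
        if len(self.warm) > self.warm_capacity:
            self.warm.popitem(last=False)        # least recently demoted block leaves


cache = TwoTierCache(hot_capacity=2, warm_capacity=4,
                     load_from_backing_store=lambda k: f"block-{k}".encode())
for key in [1, 2, 3, 1, 3, 4, 2]:
    cache.get(key)
print(cache.stats)  # {'hot': 1, 'warm': 2, 'miss': 4}
```

Blocks evicted from the hot tier are demoted rather than discarded, so a later read of a "next-hottest" block is served from the warm tier instead of going back through the backing store.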

2.HBase Bucket Cache On Persistent Memory
Anoop Sam John, Ramkrishna S Vasudevan, Xu Kai

3.Notices and Disclaimers Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks. Performance results are based on testing as of 06 24, 2019 and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure. Configuration: See slide 9 Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at [intel.com]. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. 
No product or component can be absolutely secure. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm. Intel, the Intel logo, 3D XPoint, Optane, Xeon, Xeon logos, and Intel Optane logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others © Intel Corporation.

4.
• Persistent Memory Technology
• Operation modes
• Bucket cache On Persistent Memory
• Performance Numbers

5.Persistent memory Technology

6.Memory modes
• 2LM (Memory mode)
  o Transparent to applications
  o DRAM acts as first level cache
  o No persistence available, huge memory is made available for applications
• App direct mode
  o Application is aware of Pmem and DRAM
  o Application decides whether to use DRAM or Pmem (HBase Bucket cache)
  o Persistence available

7.Configuration modes 2-2-2 2-2-1 2-1-1

8. • HBase Bucket Cache overview:
  o Data read from HDFS is cached in BlockCache
  o HBase has various implementations of BlockCache
  o BucketCache is one implementation of BlockCache
    - BucketCache is allocated on DCPMM/DRAM using the Java DirectByteBuffer mechanism
    - Modes: offheap (DRAM), file, mmap
    - New mode: pmem (HBASE-21874), included in CDH 6.2.0
    - Supports large BlockCache for high performance
    - Large BlockCache -> low latency and higher throughput
• This case study is with Bucket Cache in Offheap (DRAM) vs Pmem (DCPMM)
  o Equivalent capacity: DCPMM can be much cheaper than DRAM with a minor performance drop
  o Same/similar cost: DCPMM gives a larger size compared to DRAM, which means more data in cache and better latency/throughput
  o Though the server with DCPMM has DRAM also, note that in DCPMM tests the amount of DRAM has no role to play in the bucket cache experiment.
[Diagram: random reads arrive at the Region Server (Region 1, Region 2, ... Region n); cache blocks are written to and read from the Bucket Cache on DCPMM, and misses are fetched from HDFS (Data Node on HDD/SSD).]
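The claim that a larger BlockCache gives lower latency follows from a simple weighted average of hit and miss costs. The sketch below assumes uniformly random reads, so the hit rate is just the cached fraction of the dataset. Every number in it (dataset size, cache sizes, latencies) is hypothetical and chosen only to make the arithmetic visible; it does not reproduce the measurements in the slides.

```python
def hit_rate(cache_gb: float, dataset_gb: float) -> float:
    """Expected hit rate for uniformly random reads: cached fraction, capped at 1."""
    if cache_gb < 0 or dataset_gb <= 0:
        raise ValueError("cache_gb must be >= 0 and dataset_gb must be > 0")
    return min(1.0, cache_gb / dataset_gb)


def mean_read_latency_us(cache_gb: float, dataset_gb: float,
                         hit_latency_us: float, miss_latency_us: float) -> float:
    """Average latency = hit_rate * hit cost + miss_rate * miss cost."""
    h = hit_rate(cache_gb, dataset_gb)
    return h * hit_latency_us + (1.0 - h) * miss_latency_us


# All numbers below are HYPOTHETICAL assumptions, not measured values.
DATASET_GB = 1000.0
MISS_US = 2000.0       # assumed cost of fetching a block from HDFS
DRAM_HIT_US = 100.0    # assumed cost of a hit in a DRAM bucket cache
PMEM_HIT_US = 110.0    # assumed: a hit on persistent memory is slightly slower

dram = mean_read_latency_us(500.0, DATASET_GB, DRAM_HIT_US, MISS_US)
pmem = mean_read_latency_us(1000.0, DATASET_GB, PMEM_HIT_US, MISS_US)

print(f"DRAM cache,  500 GB: {dram:.0f} us per read")   # 1050 us
print(f"pmem cache, 1000 GB: {pmem:.0f} us per read")   # 110 us
```

Under these assumptions the slower but larger cache wins clearly, because the miss penalty dominates the average as soon as a meaningful share of reads falls outside the cache.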

9.HBase Bucket Cache with Intel® Optane™ DC Persistent Memory – Similar Capacity
• Up to 94.0% performance of DRAM when DCPMM/DRAM have similar capacity and all the data can fit within DCPMM/DRAM*
[Chart: HBase Random-Read Normalized Performance (x), DCPMM vs. DRAM; TPS (Transactions per Second), higher is better. DRAM: 100.0%. DCPMM at 50 | 100 | 150 | 200 threads: 94.0% | 92.5% | 92.4% | 93.1%]
Configuration (DRAM Cluster | DCPMM Cluster):
• # of workers: 1
• Processors: 2nd Gen Intel Xeon Gold 6240 (dual sockets) (Cascade Lake)
• DRAM: 1.5TB (24 * 64GB) | 192 GB (12 * 16GB)
• DCPMM: N/A | 1.5TB (12 * 128GB)
• DCPMM Config: N/A | 2-2-2 (AppDirect mode)
• Storage: 7.68 TB (8 * 960GB SATA3 SSD)
• Network: 10Gb Ethernet
• SUT (system under test): CDH 6.2.0
• Benchmark kit: HBase performance evaluation tool
• Dataset Size: 1200GB (Data can fit within both DRAM and DCPMM)

10. HBase Bucket Cache with Intel® Optane™ DC Persistent Memory – Similar Cost
• Up to 13.2X* performance of DRAM when all of the data can fit within DCPMM and about ⅔ of the data can fit within DRAM
• Up to 3.1X* performance of DRAM when all data can not fit within DCPMM/DRAM, but DCPMM can hold more (DCPMM holds about ¾, DRAM holds about ½)
[Chart: HBase Random-Read Normalized Performance (x), DCPMM vs. DRAM; TPS (Transactions per Second), higher is better. DRAM: 1. All data fit within DCPMM (DCPMM/900GB at 50 | 100 | 150 | 200 threads): 13.2 | 12.5 | 11.6 | 10.2. Part of data fit within DCPMM (DCPMM/1200GB at 50 | 100 | 150 | 200 threads): 3.1 | 3.1 | 2.6 | 3.0]
Configuration (DRAM Cluster | DCPMM Cluster):
• # of workers: 1
• Processors: 2nd Gen Intel Xeon Gold 6240 (dual sockets) (Cascade Lake)
• DRAM: 768GB (12 * 64GB) | 192 GB (12 * 16GB)
• DCPMM: N/A | 1TB (8 * 128GB)
• DCPMM Config: N/A | 2-2-1 (AppDirect mode)
• Storage: 7.68 TB (8 * 960GB SATA3 SSD)
• Network: 10Gb Ethernet
• SUT (system under test): CDH 6.2.0
• Benchmark kit: HBase performance evaluation tool
• Dataset Size: 900GB (All data can fit within DCPMM, but part fit within DRAM); 1200GB (Data can not all fit within DCPMM/DRAM, but DCPMM can hold more)

11.HBase release and JIRA
• https://issues.apache.org/jira/browse/HBASE-21874 - Bucket cache on Persistent memory
• Available in CDH 6.2.0 release.

12.Future Work
• Support Tiered cache, with cache residing on both DRAM and DCPMM.
• JDK support for DCPMM with old gen objects residing on DCPMM.
• Support DCPMM in write path (WALLess HBase).

13.Thanks!

14.Backup

15.Tests on GCP
Instance types, in column order: n1-himem-96 (DRAM only) | n1-himem-32 (DRAM only) | n1-standard-96 (DRAM + AEP)
• # of instances: Varies by test | 3 | 1
• CPU: Xeon | Xeon | Xeon
• vCores: 96 | 32 | 96
• Freq (Base/Turbo): 2.0 GHz/false | 2.0 GHz/false (two values listed)
• Cache: 55 MB | 55 MB (two values listed)
• DDR4 Memory: 624 GB | 208 GB | 192 GB
• HBase Bucket Cache: 550 GB | 180 GB | 1500 GB
• DCPMM Memory: NA | NA | 1.6 TB (AD mode)
• Storage: 1 * 2 TB “SSD persistent disk” | 1 * 2 TB “SSD persistent disk” | 1 * 2 TB “SSD persistent disk”

16. • 13.5x (219999 vs. 16194) to 13.7x (177157 vs. 12871) TPS* speedup using one DCPMM-based instance compared with two DRAM-only instances.
  Scenario: HBase data exceeds DRAM bucket-cache (2 * 550 GB), HBase data fits within DCPMM bucket-cache (1.5 TB)
• 28x (177157 vs. 6194) to 33x (219999 vs. 6495) TPS* speedup using one DCPMM-based instance compared with DRAM-only instances.
  Scenario: HBase data exceeds DRAM bucket-cache (550 GB), HBase data fits within DCPMM bucket-cache (1.5 TB)
* TPS: Transactions Per Second

17. • TPS* using one DCPMM-based instance is 90.1% (217745 vs. 241594) to 91.5% (178097 vs. 194447) of the TPS using two DRAM-only instances.
  Scenario: HBase data fits within DRAM bucket-cache (2 * 550 GB), HBase data fits within DCPMM bucket-cache (1.5 TB)
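The speedups and percentages quoted on slides 16 and 17 are plain ratios of the TPS pairs shown in parentheses. A minimal sketch to recompute them, using only the numbers given on those slides (the helper name `tps_ratio` and the dictionary labels are illustrative, not from the deck):

```python
def tps_ratio(dcpmm_tps: int, dram_tps: int) -> float:
    """DCPMM throughput relative to DRAM throughput."""
    return dcpmm_tps / dram_tps


# (DCPMM TPS, DRAM TPS) pairs exactly as quoted on the slides.
quoted_pairs = {
    "slide 16, vs two DRAM-only instances": [(219999, 16194), (177157, 12871)],
    "slide 16, vs DRAM-only instances": [(177157, 6194), (219999, 6495)],
    "slide 17, vs two DRAM-only instances": [(217745, 241594), (178097, 194447)],
}

results = {
    label: [tps_ratio(dcpmm, dram) for dcpmm, dram in pairs]
    for label, pairs in quoted_pairs.items()
}

for label, ratios in results.items():
    print(label, ", ".join(f"{r:.3f}" for r in ratios))
```

The recomputed values come out at roughly 13.59 and 13.76, 28.60 and 33.87, and 0.901 and 0.916, which suggests the slide figures (13.5x, 13.7x, 28x, 33x, 90.1%, 91.5%) were truncated rather than rounded.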

18.Tests on HPE infrastructure
Scenario: HBase hot data (1.2 TB) fits within DCPMM bucket-cache (1.2 TB); HBase hot data (1.2 TB) fits within DRAM bucket-cache (1.2 TB)
* TPS: Transactions Per Second