HBase Throughput Improvement in Practice
1. Lift the Ceiling of Throughputs
   Yu Li, Lijin Bin {jueding.ly, tianzhao.blj}@alibaba-inc.com
2. Agenda
   - What/Where/When
     - History of HBase in Alibaba Search
   - Why
     - Throughput means a lot
   - How
     - Lift the ceiling of read throughput
     - Lift the ceiling of write throughput
   - About the future
3. HBase in Alibaba Search
   - HBase has been the core storage in the Alibaba search system since 2010
   - History of versions used online
     - 2010-2014: 0.20.6 → 0.90.3 → 0.92.1 → 0.94.1 → 0.94.2 → 0.94.5
     - 2014-2015: 0.94 → 0.98.1 → 0.98.4 → 0.98.8 → 0.98.12
     - 2016: 0.98.12 → 1.1.2
   - Cluster scale and use cases
     - Multiple clusters, the largest with more than 1,500 nodes
     - Co-located with Flink/YARN, serving over 40 million ops/s throughout the day
     - Main source/sink for the search and machine learning platforms
4. Throughput means a lot
   - Machine learning generates huge workloads
     - Both read and write, with no upper limit
     - Both IO-bound and CPU-bound
   - Throughput decides the speed of ML processing
     - More throughput means more iterations per unit of time
   - Speed of processing decides the accuracy of the decisions made
     - Recommendation quality
     - Fraud detection accuracy
5. Lift the ceiling of read throughput
   - NettyRpcServer (HBASE-17263)
     - Why Netty?
       - Enlightened by real-world suffering (HBASE-11297)
       - Better thread model and performance
     - Effect
       - Online RT under high pressure: 0.92 ms → 0.25 ms
       - Throughput almost doubled
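The slide does not show how the Netty-based RPC server is enabled, so the following is a hedged sketch of one way to select it through configuration. It assumes the `hbase.rpc.server.impl` key and the `org.apache.hadoop.hbase.ipc.NettyRpcServer` class from the HBASE-17263 line of work; confirm both against the HBase version actually deployed.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class EnableNettyRpcServer {
    public static void main(String[] args) {
        // In practice this setting lives in hbase-site.xml on every RegionServer;
        // it is shown programmatically here purely for illustration.
        Configuration conf = HBaseConfiguration.create();
        // Assumed key and class name from HBASE-17263; verify for your version.
        conf.set("hbase.rpc.server.impl",
                 "org.apache.hadoop.hbase.ipc.NettyRpcServer");
        System.out.println("RPC server impl: " + conf.get("hbase.rpc.server.impl"));
    }
}
```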
7. Lift the ceiling of read throughput (cont'd)
   - RowIndexDBE (HBASE-16213)
     - Why
       - Seeking within a row during random reads is one of the main CPU consumers
       - All data block encodings except Prefix Tree use sequential search
     - How
       - Add a row index to each HFileBlock so rows can be located by binary search (HBASE-16213)
     - Effect
       - Less CPU and higher throughput; for KeyValues smaller than 64 B, throughput increased by more than 10%
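As a concrete illustration of turning the encoding on, the sketch below creates a table whose column family uses `DataBlockEncoding.ROW_INDEX_V1`. It assumes a branch-1 style client API (`HTableDescriptor`/`HColumnDescriptor`) and an HBase build that already contains HBASE-16213; the table and family names are made up for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class RowIndexDbeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            HTableDescriptor table = new HTableDescriptor(TableName.valueOf("demo"));
            HColumnDescriptor cf = new HColumnDescriptor("f");
            // ROW_INDEX_V1 embeds a row index in each HFile block, so random reads
            // can binary-search for the row instead of scanning the block sequentially.
            cf.setDataBlockEncoding(DataBlockEncoding.ROW_INDEX_V1);
            table.addFamily(cf);
            admin.createTable(table);
        }
    }
}
```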
8. Lift the ceiling of read throughput (cont'd)
   - End-to-end read-path offheap
     - Why
       - Advanced disk IO capability causes quicker cache eviction
       - Suffering from GC caused by on-heap copies
     - How
       - Backport the end-to-end read-path offheap work to branch-1 (HBASE-17138)
       - For more details, please refer to Anoop/Ram's session
     - Effect
       - Throughput increased by 30%
       - Much more stable, with fewer latency spikes
9. Lift the ceiling of read throughput (cont'd)
   - End-to-end read-path offheap: before/after performance charts (figures omitted)
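The slides do not list the exact settings, but a common way to move the block cache off the Java heap is an off-heap BucketCache plus enough direct memory, as in the hedged sketch below. The sizes are placeholders rather than tuning advice, and the full end-to-end read-path offheap additionally depends on the HBASE-17138 backport mentioned above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class OffheapBlockCacheConfig {
    public static void main(String[] args) {
        // Normally placed in hbase-site.xml; shown programmatically for illustration.
        Configuration conf = HBaseConfiguration.create();
        // Keep the L2 block cache off the Java heap so cached blocks do not churn the GC.
        conf.set("hbase.bucketcache.ioengine", "offheap");
        conf.set("hbase.bucketcache.size", "8192"); // example size in MB, not a recommendation
        // The RegionServer JVM also needs enough direct memory, e.g. in hbase-env.sh:
        //   HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=12g"
    }
}
```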
10. Lift the ceiling of write throughput
   - MVCC pre-assign (HBASE-16698, HBASE-17509/17471)
     - Why
       - Issue located from real-world suffering: no more active handlers
       - MVCC is assigned after the WAL append
       - WAL append is designed to be sequential at the RegionServer level, so throughput is limited
     - How
       - Assign the mvcc before the WAL append, while still guaranteeing the append order
         - Originally designed to use a lock inside FSHLog (HBASE-16698)
         - Improved by generating the sequence id inside the existing MVCC lock (HBASE-17471)
     - Effect
       - SYNC_WAL throughput improved by 30%; ASYNC_WAL even more (>70%)
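The following is a deliberately simplified, non-HBase sketch of the ordering idea behind the pre-assign: take the sequence id and enqueue the WAL entry under a single lock, so the lone appender thread naturally writes entries in sequence-id order. Class and method names are invented for the example and do not correspond to the actual FSHLog/MVCC code paths.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class MvccPreAssignSketch {
    private final AtomicLong nextSeqId = new AtomicLong(1);
    private final BlockingQueue<WalEntry> appendQueue = new LinkedBlockingQueue<>();

    static class WalEntry {
        final long seqId;
        final byte[] payload;
        WalEntry(long seqId, byte[] payload) { this.seqId = seqId; this.payload = payload; }
    }

    /** Handler path: take the id and enqueue under one lock, so queue order == id order. */
    public long beginWrite(byte[] payload) {
        synchronized (this) {
            long seqId = nextSeqId.getAndIncrement();
            appendQueue.add(new WalEntry(seqId, payload));
            return seqId;
        }
    }

    /** Single appender thread: drains the queue, so WAL appends stay in sequence-id order. */
    public void appendLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            WalEntry entry = appendQueue.take();
            // Append entry.payload with entry.seqId to the WAL here, then group-sync;
            // readers only see the write once its mvcc/seqId is marked complete.
        }
    }
}
```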
11. Lift the ceiling of write throughput (cont'd)
   - Refine the write path (experimenting)
     - Why
       - Far from making full use of the IO capacity of new hardware such as PCIe SSDs
       - WAL sync is IO-bound, while RPC handling is CPU-bound
         - Write handlers should be non-blocking: do not wait for the sync
         - Respond asynchronously
       - WAL append is sequential, while region puts are parallel
         - Unnecessary context switches
       - WAL append is IO-bound, while MemStore insertion is CPU-bound
         - Possible to parallelize?
12. Lift the ceiling of write throughput (cont'd)
   - Refine the write path (experimenting)
     - How
       - Break the write path into 3 stages
         - Pre-append, sync, post-sync
         - Buffer/queue between stages
       - Handlers only handle the pre-append stage; the response is sent in the post-sync stage
       - Bind regions to specific handlers
         - Reduce unnecessary context switches
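To make the three-stage split more tangible, here is a toy pipeline in which handler threads run only the pre-append stage, a dedicated IO thread performs the sync, and a responder stage completes the client futures, with queues between the stages. This is an illustrative sketch of the direction described on the slide, not the experimental HBase implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

public class StagedWritePipelineSketch {
    static class WriteOp {
        final byte[] mutation;
        final CompletableFuture<Void> done = new CompletableFuture<>();
        WriteOp(byte[] mutation) { this.mutation = mutation; }
    }

    private final BlockingQueue<WriteOp> syncQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<WriteOp> respondQueue = new LinkedBlockingQueue<>();

    /** Stage 1 (handler thread): append and insert, but never wait for the WAL sync. */
    public CompletableFuture<Void> preAppend(byte[] mutation) {
        WriteOp op = new WriteOp(mutation);
        // ... append to the WAL's in-memory buffer and insert into the MemStore here ...
        syncQueue.add(op);      // hand off to the sync stage
        return op.done;         // the RPC layer responds when this future completes
    }

    /** Stage 2 (IO thread): sync the WAL for everything queued so far. */
    public void syncLoop() throws InterruptedException {
        while (true) {
            WriteOp op = syncQueue.take();
            // ... fsync the WAL up to and including this op ...
            respondQueue.add(op);   // hand off to the post-sync stage
        }
    }

    /** Stage 3 (responder thread): complete the futures so clients receive their responses. */
    public void postSyncLoop() throws InterruptedException {
        while (true) {
            respondQueue.take().done.complete(null);
        }
    }
}
```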
13. Lift the ceiling of write throughput (cont'd)
   - Refine the write path (experimenting)
     - Effect (lab data)
       - Throughput tripled: 140K → 420K with PCIe SSD
     - TODO
       - PCIe-SSD IO utilization currently only reaches 20%, so there is much more room to improve
       - Integration with write-path offheap: more to expect
       - Upstream the work after it has been verified online
14. About the future
   - HBase is still a kid, only 10 years old
     - More ceilings to break
       - Improving, but still a long way to go
       - Far from fully utilizing the hardware capability, whether CPU or IO
     - More scenarios to try
       - Embedded mode (HBASE-17743)
     - More to expect
       - 2.0 coming, 3.0 in plan
       - Hopefully more community involvement from Asia
         - More upstream, less private
15. Q & A
   Thank you!