HBase Optimization and Practice at Alibaba
1. HBase: Recent Improvements and Practice at Alibaba
Allan Yang (Alibaba / HBase Committer)
Han Yang (Alibaba)
Confidential & Proprietary
2. Agenda
• HBase at Alibaba
  – Typical scenarios
• Running architecture
  – Range data copy
  – Dual Service
• SQL
  – Performance and feature improvements
3. HBase at Alibaba
Use cases: log, chat, monitoring, trade, IoT, logistics, search, …
Scale: 100 million+ TPS, 10,000+ nodes, PB+ data
4. Double 11 Festival
Data sources (trade, order, log, …) → message middleware → real-time computing → HBase → data service (low-latency queries)
Scale: 1 GB/s+ throughput, 1 million+ TPS
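5. Risk Management at Ant Financial
Event dimensions: people, action, environment (ENV), time, method
• Incremental events flow through real-time computing into HBase
• The risk console issues real-time queries against HBase
• HBase also exchanges data with offline computing (real-time export, offline import); offline results are imported daily
Volume: 10 TB+ data per day
Expire data based on:
• TTL
• Version
• Low-value columns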
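The three expiry criteria can be pictured as a filter over cells. This is an illustrative sketch only: in HBase itself, TTL and the number of retained versions are column-family settings enforced during reads and compactions, whereas the `expire_cells` helper and the `low_value_columns` set here are hypothetical stand-ins for the "low-value column" policy, which the slide does not detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    timestamp_ms: int
    value: bytes


def expire_cells(cells, now_ms, ttl_ms, max_versions, low_value_columns=frozenset()):
    """Return the cells that survive TTL, version, and low-value-column expiry."""
    # TTL: drop cells older than ttl_ms. Low value: drop whole columns marked as such.
    alive = [
        c for c in cells
        if now_ms - c.timestamp_ms <= ttl_ms and c.column not in low_value_columns
    ]
    # Version: keep only the newest max_versions cells of each (row, column).
    alive.sort(key=lambda c: (c.row, c.column, -c.timestamp_ms))
    kept, versions_seen = [], {}
    for c in alive:
        key = (c.row, c.column)
        versions_seen[key] = versions_seen.get(key, 0) + 1
        if versions_seen[key] <= max_versions:
            kept.append(c)
    return kept
```

With 10 TB+ of new data per day, applying all three rules together is what keeps the retained volume bounded.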
6. Deployment Architecture
Client → Dual Service → two HBase clusters, kept in sync by async replication and range data copy
HDFS: 2 replicas in one cluster, 1 replica in the other (3 copies in total)
7. Range Data Copy
Flow: the Master splits the copy job into sub-tasks (coordinated through ZK); RegionServers grab the sub-tasks, write the data to Cluster2's HDFS, and bulkload it into Cluster2's RegionServers.
• A feature provided inside HBase: fully distributed, no MapReduce
• Runs on the fly, with no need to stop the service
• Recoverable from all kinds of errors and disasters
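The flow above behaves like a work queue: the job is split into key ranges, RegionServers grab sub-tasks, and a failed sub-task is re-queued instead of failing the whole job. The sketch below is a hypothetical, in-memory stand-in (`split_range`, `CopyJob`, `run_copy`, and `copy_fn` are illustrative names, not HBase APIs); the real feature coordinates through the Master and ZK and bulkloads into Cluster2.

```python
from collections import deque


def split_range(start_key, end_key, split_points):
    """Split [start_key, end_key) into contiguous sub-ranges at the given split points."""
    inner = sorted({p for p in split_points if start_key < p < end_key})
    bounds = [start_key, *inner, end_key]
    return list(zip(bounds[:-1], bounds[1:]))


class CopyJob:
    """Hypothetical master-side queue of sub-tasks that RegionServers grab."""

    def __init__(self, subtasks):
        self.pending = deque(subtasks)
        self.done = []

    def grab(self):
        return self.pending.popleft() if self.pending else None

    def report(self, task, ok):
        # A failed sub-task goes back to the queue, so the job recovers by retrying it.
        if ok:
            self.done.append(task)
        else:
            self.pending.append(task)


def run_copy(job, workers, copy_fn, max_rounds=100):
    """Let each worker grab one sub-task per round until the queue is empty."""
    for _ in range(max_rounds):
        if not job.pending:
            break
        for worker in workers:
            task = job.grab()
            if task is None:
                break
            # copy_fn stands in for: read the range, write HFiles, bulkload on Cluster2.
            job.report(task, copy_fn(worker, task))
    return job.done
```

Because each sub-task covers a disjoint key range, retrying one never duplicates work done by another, which is what makes per-task recovery safe.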
8. Range Data Copy Scenarios
• Data center relocation (IDC1 → IDC2)
• Historical data movement
• Data recovery (e.g. between replicated HBase clusters)
Other solutions and their drawbacks:
1. CopyTable: far too slow (scans the table); needs a MapReduce role
2. Snapshot (Backup in HBase 2.0): the table must be disabled during restore; no control over the data range; not designed for data migration
9. Dual Service – Why?
Causes of request glitches: region split, balance, RS down, …, GC, network, HDFS
Possible solution: Region Replicas (HBASE-10070)
• Needs internal replication
• Needs triple disk space
• Replica regions are not writable
10. Dual Service
• Takes advantage of the slave cluster
• No extra resources needed
Flow: the client sends the request to the master first. If no response arrives within the glitch timeout, an async processor also sends the request to the slave, and the callback selects whichever response returns first. The master and slave HBase clusters are kept in sync by replication.
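This flow is essentially a hedged request. A minimal sketch, assuming `call_master` and `call_slave` are blocking callables that issue the same request to each cluster (hypothetical stand-ins for the real client); the thread pool plays the role of the async processor.

```python
import concurrent.futures as cf


def dual_service_call(call_master, call_slave, glitch_timeout_s, executor):
    """Return (source, result) from the master, or from whichever cluster answers first after a glitch."""
    master = executor.submit(call_master)
    try:
        # Fast path: the master answers within the glitch timeout.
        return "master", master.result(timeout=glitch_timeout_s)
    except cf.TimeoutError:
        pass
    # Glitch: fire the same request at the slave and keep waiting on both.
    slave = executor.submit(call_slave)
    done, _ = cf.wait([master, slave], return_when=cf.FIRST_COMPLETED)
    winner = master if master in done else slave
    # If the winner raised, .result() re-raises; a real client would fall back to the other side.
    return ("master" if winner is master else "slave"), winner.result()
```

Since replication is asynchronous, a response taken from the slave may lag the master slightly; the slide does not quantify that trade-off.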
11. Dual Service – Benchmark
Call a request with RT > 50 ms a 'spike'.
Set the glitch timeout to 40 ms (the slave is called if the request is still running after 40 ms).
Spike rate:
– Before Dual Service: (requests with RT > 50 ms) / (total requests)
– After Dual Service: (proportion of requests > 40 ms on the master) × (proportion of requests > 10 ms on the slave). Because the slave call starts at 40 ms, a request exceeds 50 ms only if the master takes longer than 40 ms and the slave then needs more than 10 ms.
Result: spike rate W/O Dual Service 0.047095%, W/ Dual Service 0.001714% (about 27× lower)
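The two spike-rate definitions translate directly into code. A small sketch, assuming per-request latency samples in milliseconds are available for both clusters (the function names are hypothetical); note that multiplying the two proportions assumes master and slave latencies are independent.

```python
def spike_rate_without_dual(latencies_ms, spike_ms=50.0):
    """Fraction of requests slower than the spike threshold."""
    return sum(t > spike_ms for t in latencies_ms) / len(latencies_ms)


def spike_rate_with_dual(master_ms, slave_ms, glitch_timeout_ms=40.0, spike_ms=50.0):
    """Slide formula: P(master > timeout) * P(slave > spike - timeout).

    The product assumes master and slave latencies are independent.
    """
    p_master = sum(t > glitch_timeout_ms for t in master_ms) / len(master_ms)
    slave_budget_ms = spike_ms - glitch_timeout_ms  # 10 ms with the slide's settings
    p_slave = sum(t > slave_budget_ms for t in slave_ms) / len(slave_ms)
    return p_master * p_slave
```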
12. Why SQL?
Goal: make HBase easy and quick to use
1. Data type: schema, rich data typing, rowkey construction
2. Semantics: basic and complex queries
3. Optimize: query optimization with indexes
4. Target existing tools, optimizing transparently
13. Phoenix
Client side: the Phoenix JDBC Driver on top of the HBase Client, which talks to the ZooKeeper service and the HBase Master service
Server side: each RegionServer hosts a Phoenix coprocessor; data is stored in HDFS
14. Performance against HBase API
Latency (ms), Phoenix vs HBase API:
• Read (single-row select / Scan): Phoenix 2.3, HBase API 0.7
• Write (single-row upsert / Put): Phoenix 2.6, HBase API 1.5
15. Why is UPSERT much slower?
UPSERT statement → UpsertCompiler → MutationState → Table#batch(mutations) → data table region
Along the way the client updates its meta cache, which hits the meta region on a RegionServer and costs around 1 ms.
16. What is the Meta Cache?
• Metadata of a Phoenix table, cached in each client
• Schema
  – Columns, types, properties
• Indexes
  – Adding a new index
  – Dropping an index
17. Meta Cache Update Policy
• Initialize on first use
• Update meta periodically (PHOENIX-2520)
• Update meta on errors
• Version each meta update
  – Requests carry the meta version
  – The server always has the latest version
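The policy above lets the client skip the roughly 1 ms meta round trip on most requests while still noticing schema changes such as a new index. A sketch under stated assumptions: `FakeMetaServer` and `MetaCache` are hypothetical in-memory classes, and a stale version is signalled by a hypothetical `StaleMetaError`; the real Phoenix protocol differs in detail.

```python
class StaleMetaError(Exception):
    """Raised when a request carries an outdated meta version."""


class FakeMetaServer:
    """Hypothetical server side: always holds the latest table meta and its version."""

    def __init__(self):
        self.version = 1
        self.indexes = ()
        self.fetches = 0

    def get_meta(self):
        self.fetches += 1
        return self.version, self.indexes

    def add_index(self, name):
        self.indexes += (name,)
        self.version += 1  # every meta update bumps the version

    def execute(self, meta_version, op):
        if meta_version != self.version:
            raise StaleMetaError(op)
        return f"{op}@v{meta_version}"


class MetaCache:
    """Client-side cache: init on first use, refresh periodically, refresh on error."""

    def __init__(self, server, refresh_interval_s, clock):
        self.server = server
        self.refresh_interval_s = refresh_interval_s
        self.clock = clock
        self.version = None
        self.indexes = ()
        self.loaded_at = None

    def _refresh(self):
        self.version, self.indexes = self.server.get_meta()
        self.loaded_at = self.clock()

    def execute(self, op):
        if self.version is None or self.clock() - self.loaded_at >= self.refresh_interval_s:
            self._refresh()
        try:
            # The request carries the cached meta version.
            return self.server.execute(self.version, op)
        except StaleMetaError:
            # Update meta on error: reload once and retry.
            self._refresh()
            return self.server.execute(self.version, op)
```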
18. Lift UPSERT performance
Write latency (ms): Phoenix 2.6, alihb-sql 1.6, HBase API 1.5 (alihb-sql reduces latency by 38% compared with Phoenix)
19. Lift SELECT performance
Query path: JDBC Driver → QueryCompiler → QueryPlan → Parallel Scan → Spooling
• QueryCompiler: update the meta cache only once (-1 ms)
• QueryPlan (setCaching): use a small scan (-0.5 ms)
• Parallel Scan: use a single scan
• Spooling: no prefetch
20. Parallel or Single?
• Use small scans properly
• Use parallel scans only when necessary
Scenario: Phoenix's plan → alihb-sql's plan
• Full table scan: parallel big scan → single big scan
• Single row select: parallel big scan → single small scan
• Single-region range scan: single big scan → single small scan
• Cross-region range scan: parallel big scan → single big scan
• Aggregation: parallel big scan → parallel big scan
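The alihb-sql column of this table reduces to a few rules. A sketch, assuming the planner already knows whether a query is a point lookup or an aggregation and how many regions its key range touches; `QueryShape` and `choose_scan` are hypothetical names, not alihb-sql's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryShape:
    point_lookup: bool    # single-row select on the full primary key
    aggregation: bool     # uses GROUP BY or aggregate functions
    regions_touched: int  # number of regions the key range overlaps


def choose_scan(q: QueryShape):
    """Return (parallelism, scan size) following the alihb-sql column of the table."""
    if q.aggregation:
        # Aggregations still benefit from fanning out to every region.
        return "parallel", "big"
    if q.point_lookup or q.regions_touched == 1:
        return "single", "small"
    # Full-table and cross-region range scans: one sequential big scan.
    return "single", "big"
```

The design choice is to pay the fan-out and merge cost of a parallel scan only when per-region partial work (aggregation) actually shrinks the data returned to the client.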
21. Lift SELECT performance
Query: select * from tt where a = 10;
Read latency (ms): Phoenix 2.3, alihb-sql 0.8, HBase API 0.7 (alihb-sql reduces latency by 65% compared with Phoenix)
22. Improved performance
Latency (ms):
• Read: Phoenix 2.3, alihb-sql 0.8 (65% lower), HBase API 0.7
• Write: Phoenix 2.6, alihb-sql 1.6 (38% lower), HBase API 1.5
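The percentages follow from the chart values. A quick check using only the numbers reported above (the helper names are illustrative); it also shows that alihb-sql ends up about 0.1 ms above the raw HBase API for both reads and writes.

```python
LATENCY_MS = {
    "read": {"Phoenix": 2.3, "alihb-sql": 0.8, "HBase API": 0.7},
    "write": {"Phoenix": 2.6, "alihb-sql": 1.6, "HBase API": 1.5},
}


def reduction_pct(op):
    """Latency reduction of alihb-sql relative to Phoenix, in percent."""
    before = LATENCY_MS[op]["Phoenix"]
    after = LATENCY_MS[op]["alihb-sql"]
    return 100 * (before - after) / before


def overhead_ms(op):
    """Remaining gap between alihb-sql and the raw HBase API."""
    return LATENCY_MS[op]["alihb-sql"] - LATENCY_MS[op]["HBase API"]


for op in LATENCY_MS:
    print(f"{op}: -{reduction_pct(op):.0f}% vs Phoenix, +{overhead_ms(op):.1f} ms vs HBase API")
```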
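23. Secondary Indexing
Example data table tt (pk, col), split into two regions:
• Region1: (1, a), (2, d), (3, a)
• Region2: (4, f), (5, c)
Local index: stored with each data region on the same RegionServer (RegionServer1), keyed by (col, pk):
• Region1: (a, 1), (a, 3), (d, 2)
• Region2: (c, 5), (f, 4)
Global index: a separate table (on RegionServer2), keyed by (col, pk): (a, 1), (a, 3), (c, 5), (d, 2), (f, 4)
Example queries:
• select * from tt where (pk between 1 and 3) and col = 3;
• select * from tt where col = 3;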
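The two layouts can be rebuilt from the example rows. In this sketch, in-memory sorted lists stand in for HBase tables and the function names are hypothetical. The global index is one sorted table, so `col = value` becomes a single range scan; a local index must be probed in every region unless a predicate such as `pk between 1 and 3` narrows the candidate regions.

```python
from bisect import bisect_left, bisect_right

# Example data table tt from the slide: region -> [(pk, col), ...]
DATA_TABLE = {
    "Region1": [(1, "a"), (2, "d"), (3, "a")],
    "Region2": [(4, "f"), (5, "c")],
}


def build_local_index(regions):
    """One sorted (col, pk) index per data region, co-located with that region."""
    return {name: sorted((col, pk) for pk, col in rows) for name, rows in regions.items()}


def build_global_index(regions):
    """A single sorted (col, pk) index table covering all regions."""
    return sorted((col, pk) for rows in regions.values() for pk, col in rows)


def lookup(index_rows, value):
    """Range-scan a sorted (col, pk) index for col == value and return the pks."""
    lo = bisect_left(index_rows, (value, float("-inf")))
    hi = bisect_right(index_rows, (value, float("inf")))
    return [pk for _, pk in index_rows[lo:hi]]


def local_lookup(local_index, value, regions=None):
    """Probe the local index of each candidate region (all regions if none given)."""
    names = regions if regions is not None else sorted(local_index)
    return [pk for name in names for pk in lookup(local_index[name], value)]
```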
24. How Does a Global Index Work?
On the data table RS, the write RPC handler runs:
1. preBatch: reads the data table and builds the index updates
2. syncLog: syncs both the data table edits and the index edits to the WAL
3. postBatch: commits the index updates with a write RPC to the index table RS, whose handler syncs the index edits to its own WAL
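The write path above, reduced to an in-memory Python sketch (class names are hypothetical; in Phoenix this logic lives in coprocessor hooks on the data table RegionServer). It shows why preBatch must read the data table: changing a row's indexed column requires deleting the old index row as well as adding the new one. Keeping the index edits in the data table's WAL is what allows them to be replayed if the RS fails before postBatch completes.

```python
class IndexTableRS:
    """In-memory stand-in for the RegionServer hosting the index table."""

    def __init__(self):
        self.rows = set()  # (col, pk) index row keys
        self.wal = []

    def write(self, updates):
        self.wal.extend(updates)  # syncLog index edits
        for op, key in updates:
            if op == "put":
                self.rows.add(key)
            else:
                self.rows.discard(key)


class DataTableRS:
    """In-memory stand-in for the data table RegionServer with a global index on col."""

    def __init__(self, index_rs):
        self.rows = {}  # pk -> col
        self.wal = []
        self.index_rs = index_rs

    def upsert(self, pk, col):
        # preBatch: read the current row and build index updates from old and new values.
        old = self.rows.get(pk)
        index_updates = []
        if old != col:
            if old is not None:
                index_updates.append(("delete", (old, pk)))
            index_updates.append(("put", (col, pk)))
        # syncLog: the data edit and its index edits land in the data table's WAL together.
        self.wal.append((("put", pk, col), tuple(index_updates)))
        self.rows[pk] = col
        # postBatch: commit the index updates with a write RPC to the index table RS.
        if index_updates:
            self.index_rs.write(index_updates)
```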
25. Consistency: Index Update Failure
Same write path as on the previous slide, but the index table region/RS is not available. The data table edits and index edits have already been synced to the data table's WAL, yet postBatch cannot commit the index updates, so the data table and the index diverge.
26. Solution I: Disable Index Writing
When the index table region/RS is not available, postBatch stops writing to the index instead of failing the data table write.
27. Solution I: Disable Index Writing
• Update meta (may cause a chain collapse)
  – Set the index state to DISABLE
  – Set the index disable timestamp
• Queries degenerate to full table scans over the data table
28. Solution I: What If Updating Meta Fails?
Write path: preBatch → syncLog → postBatch (index update fails) → update meta to disable the index → if updating meta also fails: abort
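Solution I and its failure mode can be sketched as follows. Assumptions, all hypothetical: `update_meta` is a callable that persists the index state and may raise, `RegionServerAbort` stands in for aborting the RegionServer, and the earliest failed timestamp is kept on the assumption that a later index rebuild starts from there.

```python
from dataclasses import dataclass
from typing import Optional


class RegionServerAbort(Exception):
    """Hypothetical stand-in for aborting the RegionServer."""


@dataclass
class IndexMeta:
    name: str
    state: str = "ACTIVE"
    disable_ts: Optional[int] = None


def on_index_write_failure(meta, failed_edit_ts, update_meta):
    """Solution I: keep the data write, mark the index DISABLE in meta."""
    # Assumption: remember the earliest failure so a rebuild knows where to start.
    new_ts = failed_edit_ts if meta.disable_ts is None else min(meta.disable_ts, failed_edit_ts)
    try:
        # Persisting this meta change is itself an RPC that can fail.
        update_meta(meta.name, "DISABLE", new_ts)
    except Exception as exc:
        # Neither the index nor meta can be updated: the only safe option left is to abort.
        raise RegionServerAbort(meta.name) from exc
    meta.state, meta.disable_ts = "DISABLE", new_ts


def plan_query(meta):
    """Queries fall back to scanning the data table while the index is disabled."""
    if meta.state == "ACTIVE":
        return f"range scan on {meta.name}"
    return "full table scan over data table"
```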
29. Solution II: Disable Data Table Writing
When the index table region/RS is not available during postBatch, the data table RS raises an exception, so writes to the data table are rejected instead of the index being disabled.
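A sketch of Solution II, with hypothetical names throughout (`BlockingDataTableRS`, `IndexWriteFailed`, `index_recovered`). For brevity it checks index availability before applying the data edit, a simplification compared with the slide, where the exception is raised from postBatch after syncLog.

```python
class IndexWriteFailed(Exception):
    """Raised to the client when the index cannot be updated."""


class BlockingDataTableRS:
    """Solution II sketch: reject data table writes while the index is unavailable."""

    def __init__(self):
        self.rows = {}       # pk -> col
        self.index = set()   # (col, pk) global index rows
        self.index_available = True
        self.writes_blocked = False

    def index_recovered(self):
        self.index_available = True
        self.writes_blocked = False

    def upsert(self, pk, col):
        if self.writes_blocked:
            raise IndexWriteFailed("data table writes disabled until the index recovers")
        if not self.index_available:
            # Block further writes and surface the failure so the client can retry later.
            self.writes_blocked = True
            raise IndexWriteFailed("index region unavailable")
        old = self.rows.get(pk)
        if old is not None:
            self.index.discard((old, pk))
        self.index.add((col, pk))
        self.rows[pk] = col
```

Compared with Solution I, this keeps the index consistent and usable for queries at the cost of write availability on the data table.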