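This talk describes how China Life solved and optimized the performance and availability problems it ran into with HBase in basic business functions such as data import, processing, query, and export.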

1. HBase Practice at China Life Insurance Co., Ltd.
Fan Zheng, August 17, 2018

2. Content
01 Scenarios: Integration, Processing, Query, Export
02 Optimizations: Cluster, Configuration, Writing, Reading
03 Problems: Copy failed, Compaction never ends
04 Future Work: Phoenix, Realtime

3. 01 Scenarios: Integration, Processing, Query, Export

4. Scenarios: Overview
• Storage for integration
• Data source for query and processing
1. Integration: easy update, schemaless, millions of columns
2. Processing: fast batch R/W through snapshot, bulkload…
3. Query: high concurrency and low latency
4. Export: Hive external table support

5. Scenarios: Scale
• Cluster: 3 clusters, 200+ nodes
• Data: 300TB+ of HBase data; largest table 30TB+, 2500 regions
• Processing: hundreds of MR/Hive/Spark jobs per day; 50TB+ of incremental data for update and insert
• Queries: tens of millions of queries per day

6. Scenarios: Integration
• Integrate policy, customer, agent… data from various systems
• Tables: policy_detail, Agent_detail, Customer_detail, ……
• Source systems: Business Systems, Sales management systems, Customer contact, ECIF, CallCenter system, ……

7. Scenarios: Integration
• Data is integrated by the key of the business entity.
• One row holds all information for an entity.

8. Scenarios: Integration
• RDBMS -> text file -> HFile -> BulkLoad to HBase
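
The HFile step in this pipeline relies on two properties: cells inside each HFile are sorted by row key, and each file should fall inside a single region's key range so it can be handed to that region as-is. The sketch below imitates that preparation in plain Python with no HBase dependency. The tab-delimited input layout, the `d` column family, and the function names are assumptions made for illustration and are not taken from the talk.

```python
import bisect
from collections import defaultdict


def parse_line(line, columns, family="d"):
    """Turn one tab-delimited text row into (rowkey, 'family:qualifier', value) cells.

    The first field is taken as the row key (an assumed layout).
    """
    fields = line.rstrip("\n").split("\t")
    rowkey, values = fields[0], fields[1:]
    return [(rowkey, f"{family}:{col}", val) for col, val in zip(columns, values)]


def partition_cells(lines, columns, split_keys):
    """Group cells by target region and sort them within each group.

    split_keys are the sorted region boundaries: region i holds row keys in
    [split_keys[i - 1], split_keys[i]), so a key equal to a boundary belongs
    to the region that starts there.
    """
    regions = defaultdict(list)
    for line in lines:
        for cell in parse_line(line, columns):
            region = bisect.bisect_right(split_keys, cell[0])
            regions[region].append(cell)
    # Sorting tuples orders cells by row key first, then by column name.
    return {r: sorted(cells) for r, cells in sorted(regions.items())}


if __name__ == "__main__":
    rows = ["c003\tLi\t41", "a001\tWang\t35", "b002\tZhao\t29"]
    for region, cells in partition_cells(rows, ["name", "age"], ["b", "c"]).items():
        print(region, cells)
```

In a real job each per-region group would be written out as one or more HFiles and then bulk loaded; here the groups are only returned so the ordering can be inspected.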

9. Scenarios: Processing
• Analysis of entities (customer, policy, …).
• Build indexes between entities.

10. Scenarios: Processing
• Process hundreds of labels in a single I/O pass
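
One way to read this slide is that each row is fetched once and every label rule is evaluated against it before moving on, instead of running one scan per label. A minimal sketch of that shape follows. The rule names, the row layout, and `compute_labels` are hypothetical and only illustrate the idea; the talk does not show the actual framework code.

```python
def compute_labels(rows, rules):
    """Evaluate every label rule against each row in one pass over the data.

    rows  : iterable of (rowkey, {column: value}) pairs, consumed once
    rules : {label_name: function(columns) -> label value}
    """
    for rowkey, columns in rows:
        yield rowkey, {name: rule(columns) for name, rule in rules.items()}


# Hypothetical label rules for a customer entity.
RULES = {
    "is_senior": lambda c: int(c.get("age", 0)) >= 60,
    "policy_count": lambda c: sum(1 for k in c if k.startswith("policy_")),
    "has_agent": lambda c: "agent_id" in c,
}

if __name__ == "__main__":
    data = [
        ("cust1", {"age": "63", "policy_1": "P100", "policy_2": "P200"}),
        ("cust2", {"age": "30", "agent_id": "A7"}),
    ]
    # iter(data) makes the single-pass constraint explicit.
    for rowkey, labels in compute_labels(iter(data), RULES):
        print(rowkey, labels)
```

Adding a label then means adding one entry to `RULES`; the number of reads of the source table does not change.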

11. Scenarios: Processing
• Encapsulated, configurable, SQL-like development framework
Development:
• SQL-like and native API
• Offline unit test
• ……
Runtime configuration:
• Name of source/target table
• Read table/snapshot
• Write Hive/HBase
• Rowkey blacklist
• Init/Incr switch
• BulkLoad switch
• ColumnPrefixFilter switch
• Single-row processing
• Single-label processing
• ……

12. Scenarios: Query
• Unified query services centered on entities such as customer

13. Scenarios: Query
• SQL-like input
• One service for all labels

14. Scenarios: Export
• Export to Hive to support batch query and analysis

15. Scenarios: Export
• Universal MR job that reads from a snapshot

16. 02 Optimizations: Cluster, Configuration, Writing, Reading

17. Optimizations: Multiple Clusters
• Read/write splitting
• Platform/application splitting

18. Optimizations: Configuration
• Minimal impact on RS
• Regions balanced by table
• G1GC
• Daily/weekly major compaction
• Pre-split regions
• ……
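
Pre-splitting means creating the table with region boundaries chosen up front, so writes spread across RegionServers from the first load instead of piling into one region. The sketch below computes evenly spaced split keys under the assumption that row keys begin with a fixed-width hexadecimal hash prefix. The talk does not describe the actual row key design, so `hex_split_keys` and its parameters are illustrative only.

```python
def hex_split_keys(num_regions, width=4):
    """Return num_regions - 1 evenly spaced split keys over a hex prefix space.

    width is the number of hex digits in the assumed row key prefix.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    space = 16 ** width
    if num_regions > space:
        raise ValueError("more regions than distinct prefixes")
    return [format(i * space // num_regions, f"0{width}x")
            for i in range(1, num_regions)]


if __name__ == "__main__":
    print(hex_split_keys(4, width=2))
```

The resulting keys could be supplied as split points when the table is created; a table with N regions needs N - 1 of them.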

19. Optimizations: Reading
• Minimal impact on RS
• Snapshot read
• ColumnPrefixFilter
• Data blacklist
• Incremental read
• ……
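
The reading optimizations all reduce how much data a job pulls through the RegionServers. The sketch below imitates three of them on an in-memory list of cells: keeping only columns whose qualifier starts with a given prefix (the behaviour HBase's `ColumnPrefixFilter` provides on the server side), skipping blacklisted row keys, and reading only cells at or after a cutoff timestamp. The cell layout, the function name, and the use of a timestamp cutoff for the incremental read are assumptions; the slides do not say how increments are detected.

```python
def select_cells(cells, prefix=None, blacklist=frozenset(), since_ts=None):
    """Yield the cells a job needs from (rowkey, qualifier, timestamp, value) tuples.

    prefix    : keep only qualifiers starting with this string
    blacklist : row keys to skip entirely
    since_ts  : keep only cells with timestamp >= since_ts (incremental read)
    """
    for rowkey, qualifier, ts, value in cells:
        if rowkey in blacklist:
            continue
        if prefix is not None and not qualifier.startswith(prefix):
            continue
        if since_ts is not None and ts < since_ts:
            continue
        yield rowkey, qualifier, ts, value


if __name__ == "__main__":
    cells = [
        ("cust1", "policy_1", 100, "P100"),
        ("cust1", "agent_id", 100, "A7"),
        ("cust2", "policy_1", 250, "P300"),
        ("cust3", "policy_9", 300, "P900"),
    ]
    wanted = select_cells(cells, prefix="policy_", blacklist={"cust3"}, since_ts=200)
    print(list(wanted))
```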

20. Optimizations: Writing
• Minimal impact on RS
• Write HFiles -> do BulkLoad
• Skip WAL
• ……

21. 03 Problems: Copy failed, Compaction never ends

22. Problems: Failed to copy tables
Processes:
1. Create snapshot
2. Export snapshot
3. Disable table
4. Restore snapshot
5. Enable table
6. Delete snapshot
Problems:
1. Failed to create snapshot (wasn’t completed in expectedTime)
2. Failed to export snapshot (file not found)

23. Problems: Failed to copy tables
Problem 1: Failed to create snapshot (timeout / wasn’t completed in expectedTime)
Reasons:
1. Regions were compacting, splitting, or in transition while the snapshot was being created
2. Too much time was spent on flushing
Solutions:
1. Put the required columns into a small table.
2. Wait and retry.
3. Disable the table.
4. Disable region balancing temporarily.
5. skip_flush
6. Increase the timeout limit
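
The "wait and retry" solution can be wrapped in a small helper so that a transient snapshot timeout does not fail the whole copy. This is a minimal sketch: `create_snapshot` stands in for whatever call the job uses to take the snapshot, and the attempt count and delays are placeholder values, not settings from the talk.

```python
import time


def retry(action, attempts=5, first_delay=30.0, factor=2.0, sleep=time.sleep):
    """Call action() until it succeeds, waiting longer after each failure.

    Re-raises the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= factor


if __name__ == "__main__":
    calls = {"n": 0}

    def create_snapshot():
        # Hypothetical stand-in: fails twice, as if a compaction or split
        # were still running, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("snapshot wasn't completed in expectedTime")
        return "snapshot_ok"

    # A no-op sleep keeps the demo instant; a real job would use time.sleep.
    print(retry(create_snapshot, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Retrying only helps with the transient causes listed above; a flush that always exceeds the timeout still needs one of the other solutions.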

24. Problems: Failed to copy tables
Problem 2: Failed to export snapshot (file not found)
Reasons:
1. A compaction or split happened after the snapshot was created
2. Bug in ExportSnapshot
Solutions:
1. Disable the table before copying.
2. Fix the bug so that the HFile is located correctly

25. Problems: Compaction never ends
Solution: recreate the table without ‘PREFIX_TREE’

26. Problems: Unresolved
• Table unavailable at the end of a copy (disable -> restore -> enable)
• Region locality drops to 0
• ……

27. 04 Future Work: Phoenix, Realtime

28. Future Work: Phoenix
• Flexible, real-time, precise query scenarios
• Diagram labels: Apps, Real-time Customer View on Phoenix, Spark Streaming, Kafka, Shareplex, Business Systems

29. Future Work: Real-time
• Real-time detail tables and label tables

30. Thanks
