Phoenix Improvements and Practices on Cloud HBase at Alibaba

来自阿里巴巴的 Yun Zhang 介绍在阿里,他们是把 Phoenix 当做数据库来做的,既提供了直接访问下层的 HBase 能力,也提供了基于Solr构建二级索引快速访问HBase和搜索的能力,类似于传统数据库或者一些 New SQL 操作型分析和即时查询,响应时间是毫秒到秒级。
Phoenix 在数据库主要面向 TB、PB 级的数据体量,查询延时在毫秒和秒级的即时响应场景,另外过滤后的数据量不能太大,查询模式不能太多,便于 Phoenix 建立二级索引。另外,由于稳定性、维护性的问题,他们把 Phoenix 的重客户端模式演变成了轻客户端模式。最后他结合实际案例给出了一些最佳实践。



2. Phoenix Improvements and Practices on Cloud HBase at Alibaba Yun Zhang Apache Phoenix committer / Alibaba Cloud HBase Phoenix Owner


4. Overview Phoenix Control Flink XPack Spark Kafka MR Ecosystem Control Terminal • 200+ Instances BulkLoad BDS Datax Cluster Manage • Maximum increment 4TB Statistics Collection UDF Search Index Backup Recovery (daily & single instance ) Features Transactions Salt Table Local Index Job Manage Dynamic Columns Paged Queries Global Index • Maximum instance 200 TB Workflow Manage • Maximum table 80 TB Storage HBase Solr Cloud Monitor … FileSystem HDFS Clod Disk HDFS Local Disk OSS

5. About Ecosystem • X-Connector Spark / Kafka / Flume/ Hive/ Pig / Flink / MapReduce • Data Migration Tools Tools Use Scenario Transmission method Data sources Address > 100 million rows, MR, API read source and Phoenix/Text/JSON/CSV/ BulkLoad large history/increment data generate HFile HBase bulk_dataload.html < 100 million rows, API read source and write Phoenix/Mysql/PG/Hive/ Datax small history/increment data target HBase/CSV…etc history/increment/real time BDS Copy HFile + WAL sync Phoenix/HBase/MYSQL (provide by apsaradb phoenix ) data

6.Where is Phoenix in database area?

7. Use Cases Summary Business Requirements Business Features • Query: milliseconds to seconds latency • Well-known & fewer query pattern • Write: high throughput • The filters of where clause hit result set less than 1 million • Scale out • Non transactions & cross table/row transactions • Scale up(There is a advantage for cloud products) • Online/offline business Some typical reasons why users choose Phoenix • RDBMS(MYSQL) slow down when data size increase to TB • Sharding store of RDBMS will make business logical becomes complex • The latency of some operational query business is too high on Data warehouse(Hive/ODPS)

8.Architecture Evolution

9. Architecture Application Application Phoenix thin client Phoenix thick client SLB Zookeeper PQS PQS PQS Zookeeper RS cp RS cp RS cp HDFS RS cp RS cp RS cp HDFS Antiquate Now

10. Why use thin client? • Stability • Maintenance costs • Self protect • Clients upgrade • Resources limitation • Trouble shooting • Monitor • Develop new features • SQL audits • Users Experience • Requests track • Smaller binary package • Multiple languages • Fewer dependencies/conflicts • C#/Go/Python

11.Test Framework For Stability Start • Monkey Actions Deploy • HMaster kill/graceful stop/start Prepare • RS kill/graceful stop/start Start Start • Region move/split Self Test Chaos Monkey 2. Exit • Cluster Balance/compaction 3. Exit 1. Assert fail • Query server stop/start Preserve Status • Apply to data & index tables Notification End

12.Search Index

13. Background • More Query Requirements • Wildcard query (suffix match) • Spatial Search • Arbitrary query pattern • FullText Search Example: select * from tableName where c1 = ‘%_s’ select * from tableName where (c1 = ‘1_s’ and c2 = 1) or ( c2 = 2 and c3 = 4) or… select * from tableName where contains(c1, ‘hello’); • Business Requirements SQL FullText SQL KV API SQL SQL KV API KV API FullText KV API FullText Index search Index search FullText Index Index search FullText Index HBase Phoenix HBase Phoenix SOLR/ES HBase Phoenix SOLR/ES + HBase Phoenix SOLR/ES + Solr/ES Solr/ES

14. Read & Write Path Phoenix 1 1 Parser select * from data_table upsert into data_table( ID, C1, C2) values(‘id1’, 1, “hello, world”); Optimizer where search_query=’C2:Hello’ and C1 = 1 Plan Executor filter C1 = 1 and rowkey Search C2: hello, return doc id(row key) Search index DDL 2 3 HBase Search Service 2 SearchManager Solr Cloud Region Region Search Meta Inverted Index Region batchs Batch docs DocValues hlog hlog Index Rebuilder hlog hlog ReplicationConsumer Full-text 4 Fake-HBase replica RPC docs ReplicationSource Parse solr client Extract indexed column FST/BKD-Tree backpressure backpressure 5 HDFS Zookeeper


16.Global index Query Optimize

17. Global Index Query Some project columns haven't been indexed 1. Use filters to retrieve pk data from the index table 2. Generate new SQL: select * from dataTable where pk in (x1,x2,x3…)

18.Problems & Solutions • Problems 1. There is a size limitation for hitting index table result set 2. Query primary table is inefficient, especially big table is obvious. • Solutions 1. Push down filters of the primary table to the server side 2. Batch query(multi get) the primary table when scanning filtered data from the index table on the server side 3. Return Tuple of projected columns of the primary table to the client 4. The client merge sort & top n

19. Performance Improvement Average 10X performance Improvement • Rows: 5 million rows • Query : select /*+INDEX(TT IDXTT)*/ * from Test where col_1 = '28' limit 500 offset 50  60000 45000 latency(mms) 30000 15000 4X 8X 0 original optimization + non bloomfilter optimization + bloomfilter

20.Best Practices & Tips

21. Query Server Tips 1. The default format of Date type is different between thick client and thin client, the format is yyyy- MM-dd hh:mm:ss.SSS and yyyy-MM-dd respectively 2. The columns Date type can not be used aggregation or group when the format is yyyy-MM-dd 3. Use Round-robin HTTP load balancer need set model = TCP 4. Query Server query OPS is mainly decided by scanning region numbers of per query 5. Recommend the serialized option use Protocol Buffers 6. Thin client default use JVM time zone, Thick client default use GTM timezone

22. Avoid Usage Pitfalls 1. BulkLoad text data must guarantee row key unique if the primary table has index tables, or the index data will be out of sync 2. The fields of VARCHAR type: • An empty string will be stored as NULL value for the VARCHAR type • ‘\0’ is reserve value which shouldn’t exist in actual data 3. The Index columns should avoid using DESC in create index table clause. Because of indexed data will be changed to variable data type to store, query these fields may get incorrect results

23. Best practices 1. For big data scenarios, the pre-split table is a better choice than the salted table 2. Use secondary indexes or primary key to accelerate order and group queries 3. Reduce redundant indexed columns and index tables number as far as possible 4. Set autocommit = true before executing delete from … where… 5. Set UPDATE_CACHE_FREQUENCY parameter when creating the view table

24.Future Work

25.• Search Index • Supports native SQL • CBO • Index merge • Support cancel full scan query or slow query • Query Server memory manage • Continue contributing the community

26. Thanks! Dingding Yun Zhang (Wechat)