Test-suite for Automating Data-consistency checks on HBase

This was the last talk of the HBase Track 1 session, presented by Pradeep, an engineer at Flipkart (Flipkart, founded in 2007 by two former Amazon employees, is India's largest e-commerce retailer).
Because Flipkart operates in e-commerce, they have very strict data-consistency requirements for their distributed databases. They therefore designed a suite of tests to evaluate whether each HBase release meets those requirements. Typical test scenarios include: ZooKeeper network disconnection, packet loss between components, clock skew, and disks suddenly becoming read-only. This work is very helpful for providing reliable data guarantees on HBase.

In the future, Flipkart plans to open-source the test suite on GitHub so that other HBase users can reference and evaluate it.


1. Test-suite for Automating Data-consistency checks on HBase Pradeep S, Mallikarjun V Flipkart

2.About Flipkart • India’s largest online retailer: • 10M page visits a day • 2M shipments a day • 30M products across more than 70 categories • Big Billion Days ($300M sales, top ranked app on Google Play Store)

3.Agenda ● Yak: HBase Cluster @ Flipkart ○ Need for scalable, multi-tenant, strongly consistent data stores ● Yak: Need for Data-correctness Guarantees ● Why Data-consistency Test-suite? ● How we built the Test-suite

4.Yak: HBase Cluster @ Flipkart ● HBase for OLTP Key-Value store ● Bring-Your-Own-Box multi-tenancy on HBase ● RSGroup based isolation on HBase 1.2.4 & 2.1.3 ● WAL based Change-Data-Capture into Kafka ● Stores critical data-sets in e-commerce like: Orders, Payments

5.Yak: Data-correctness Guarantees Required ● Read-Your-Own-Write consistency at single row level ● ‘At-least-once’ guarantee in the change-data-capture ● Ordering guarantee in the change-data-capture ● Predictable recovery times, no data-loss upon failures ● Data-reliability for HBase admin operations

6.Why Data-consistency Test-suite? ● To enable faster upgrade cycles ● To ensure releases don't degrade the data-guarantees ● To assist in reproducing edge-case scenarios ● To accommodate additional failure scenarios

7.Different Approaches for the Test-suite ● Algorithm testing: TLA+, etc. ● Code testing: Test suite of HBase, Hadoop, Zookeeper ● Testing on running cluster: Jepsen, ChaosMonkey & ITBLL of HBase, etc.

8.Different Approaches for the Test-suite ● Algorithm testing: TLA+, etc. - Implementation, Deployment NOT tested ● Code testing: Test suite of HBase - Deployment, Integration NOT tested ● Testing on running cluster: Jepsen, ChaosMonkey & ITBLL of HBase, etc. Reason: Integrated test with { Algorithm + Code + Deployment }

9.Test-Suite ● Jepsen, ChaosMonkey based ● Test workload on HBase ● Simulated Interrupts: ○ Infrastructure failures ○ HBase component failures ○ Admin operations ● Test Report

10.Test-Suite - Test Workload ● Writes using CheckAndPut ● Observation points: ○ HBase Get ○ Kafka consumer from CDC

11.Test-Suite - Test Workload (diagram): Thread-1 repeatedly runs Generate → CheckAndPut → Save Locally → Get on Row-Key1, writing rowkey = “key1” with col:v = 1, then col:v = 2, then col:v = 3, …, saving each value locally after a successful write.

12. Test-Suite - Test Workload (diagram): N threads on M machines run the same Generate → CheckAndPut → Save Locally → Get loop in parallel, each thread owning its own row key (Thread-1 → Row-Key1, Thread-2 → Row-Key2, Thread-3 → Row-Key3, …) and incrementing col:v = 1, 2, 3, … with every accepted version saved locally.
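The workload loop above can be sketched as follows. This is a minimal, single-machine simulation: a lock-guarded dict stands in for the HBase table, whereas the real suite would issue the writes through the HBase client's checkAndPut API against a live cluster.

```python
import threading

class FakeTable:
    """In-memory stand-in for an HBase table with atomic check-and-put."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def check_and_put(self, row, expected, new):
        # Write `new` only if the current value equals `expected` (None = row absent).
        with self._lock:
            if self._data.get(row) == expected:
                self._data[row] = new
                return True
            return False

    def get(self, row):
        with self._lock:
            return self._data.get(row)

def run_workload(table, row_key, n_versions):
    """One worker thread: Generate -> CheckAndPut -> Save Locally -> Get -> assert."""
    local_version = None
    for version in range(1, n_versions + 1):
        if table.check_and_put(row_key, local_version, version):
            local_version = version  # save the accepted version locally
        # read-your-own-write check: Get must return the locally saved version
        assert table.get(row_key) == local_version

table = FakeTable()
# N threads, each owning its own row key, as in the slide
threads = [threading.Thread(target=run_workload, args=(table, f"key{i}", 100))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each thread owns a distinct row key, the CheckAndPut never races with another writer here; in the real suite the interrupts injected underneath are what make the assertion interesting.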

13.Test-Suite - Assertions ● The locally saved state is compared against the Get call result to verify no data-loss. For Key1, after every CheckAndPut: Assert ( ThreadLocal<Version> == Version from HBase )

14.Test-Suite - Assertions ● A Kafka listener reads the CDC events of each key and asserts that each read event version <= latest seen version + 1. For key1 with writes 1,2,3,4, event streams such as 1,2,2,2,3,4 or 1,2,3,2,3,4 or 1,2,4,2,3,4 pass (duplicates and replays after a failover are allowed), while 1,2,4 fails because version 3 was never delivered.
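One plausible formalization of the slide's examples (an assumption on my part, not necessarily the exact checker Flipkart runs): the CDC stream may contain duplicates and replays, but every written version must be delivered at least once, and the final delivery of each version must appear in write order.

```python
def cdc_stream_is_valid(writes, events):
    """Check at-least-once delivery plus ordering of final deliveries."""
    last_seen = {}
    for idx, version in enumerate(events):
        last_seen[version] = idx            # remember the last delivery position
    if any(v not in last_seen for v in writes):
        return False                        # at-least-once violated: a version was lost
    final_positions = [last_seen[v] for v in writes]
    return final_positions == sorted(final_positions)  # final deliveries in write order

writes = [1, 2, 3, 4]
assert cdc_stream_is_valid(writes, [1, 2, 2, 2, 3, 4])   # duplicates are fine
assert cdc_stream_is_valid(writes, [1, 2, 3, 2, 3, 4])   # replay after failover
assert cdc_stream_is_valid(writes, [1, 2, 4, 2, 3, 4])   # replay re-delivers 3
assert not cdc_stream_is_valid(writes, [1, 2, 4])        # version 3 lost
```

This reproduces the slide's verdicts: the three replayed streams pass, and the stream that silently drops version 3 fails.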

15.Test-Suite - Assertions ● HBase regions of a table stay within the RSGroup nodes ● HDFS data-blocks are stored within the RSGroup nodes Assert (Region Assignment within RSGroup Nodes) Assert (HDFS blocks within RSGroup Nodes)
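The two isolation assertions reduce to set containment. A sketch, assuming the node lists have already been fetched (via the HBase admin and HDFS APIs, not shown here):

```python
def assert_rsgroup_isolation(rsgroup_nodes, region_hosts, block_hosts):
    """Every region host and every block host must be a member of the RSGroup."""
    leaked_regions = set(region_hosts) - set(rsgroup_nodes)
    assert not leaked_regions, f"regions assigned outside RSGroup: {leaked_regions}"
    leaked_blocks = set(block_hosts) - set(rsgroup_nodes)
    assert not leaked_blocks, f"HDFS blocks stored outside RSGroup: {leaked_blocks}"

# hypothetical node names for illustration
group = {"rs1", "rs2", "rs3"}
assert_rsgroup_isolation(group, {"rs1", "rs2"}, {"rs1", "rs3"})  # passes
```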

16.Test-Suite - Assertions (summary) ● Locally saved state is compared against the Get call result for no data-loss ● Kafka listener reads the events of the key: ○ To assert no data-loss as the versions are incremental ○ To assert no ordering loss ● RSGroup isolation checks: WAL, HDFS data-blocks & Region Assignment

17.Interruptions ● Simulate issue & heal ● Types of Interruptions: ○ Infrastructure failures ○ HBase component failures ○ HBase admin operations

18.Interruptions - Infrastructure Failures ● Network failures: ○ Network partition/failure: within zookeeper, region-servers, namenode etc. ○ Packet loss: within zookeeper, master, region-servers, namenode etc. ○ Tools: iptables, tc, comcast ● Other failure modes - to be added: ○ Clock skew ○ Packet delays ○ Disks becoming read-only, etc.
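A hedged sketch of the "simulate issue & heal" pattern using the tools the slide names (iptables for partitions, tc/netem for packet loss). The commands are only constructed here, not executed; the real suite would run them over SSH with root privileges on the target nodes, and the interface/IP values are placeholders.

```python
def partition(peer_ip):
    """Build the inject/heal command pairs for a one-way network partition."""
    inject = ["iptables", "-A", "INPUT", "-s", peer_ip, "-j", "DROP"]
    heal   = ["iptables", "-D", "INPUT", "-s", peer_ip, "-j", "DROP"]
    return inject, heal

def packet_loss(iface, percent):
    """Build the inject/heal command pairs for random packet loss via netem."""
    inject = ["tc", "qdisc", "add", "dev", iface, "root", "netem",
              "loss", f"{percent}%"]
    heal   = ["tc", "qdisc", "del", "dev", iface, "root", "netem"]
    return inject, heal

inject, heal = packet_loss("eth0", 10)
print(" ".join(inject))  # tc qdisc add dev eth0 root netem loss 10%
```

The test runner injects the fault, lets the workload and assertions run for a while, then applies the matching heal command and verifies the cluster recovers with no data loss.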

19.Interruptions - Component Failures ● Kill hbase components: ○ region-server, master-node, zookeeper ● Kill hadoop components: ○ data-node, journal-node, name-node ● Node crash: ○ region-server-node, master-node, name-node, zookeeper, journal-node

20.Interruptions - HBase Admin Operations ● Split/merge region ● Assign/move region ● Restart: region-server, data-node ● Stop: region-server, data-node, etc. ● Kafka properties reload for CDC

21.Issues Uncovered & Fixed ● Assign an already assigned region causing data-loss - Fixed in 2.x ● Ordering loss in tail process from WAL file upon a region failover - Fixed in 2.x ● WAL not-isolated across region-server group - HBASE-21641

22.Further Plans ● Migration to HBase ChaosMonkey for Interruptions ● Open-source the test-suite ● WAL push to Kafka - https://github.com/flipkart-incubator/hbase-sep

23.Thanks!

24.Appendix

25.Limitations ● Tests are not deterministic; hundreds of iterations are needed. ● Doesn't catch all bugs, but catches the practical issues for this specific use-case.

26.Test-Suite - Sample Report