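An engineer from Bloomberg introduces the basic features of HBase and explains how to use HBase to serve queries over tens of billions of records with millisecond latency.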

1. Serving Billions of Queries In Millisecond Latency
Biju Nair
HBaseConAsia 2018
August 17, 2018
© 2018 Bloomberg Finance L.P. All rights reserved.

2. Agenda
• HBase principles
• Modeling
• Implementation
• Monitoring and Tuning

3. Bloomberg by the numbers
• Founded in 1981
• 325,000 subscribers in 170 countries
• Over 19,000 employees in 192 locations
• More news reporters than The New York Times + Washington Post + Chicago Tribune
• Over 5,000 engineers

4. Bloomberg Tech
• Over 5,000 software engineers
• 100+ technologists and data scientists devoted to machine learning
• One of the largest private networks in the world
• 100B+ tick messages per day, with a peak of more than 10 million messages/second
• >1.5M news stories ingested / published each day (that's 500 news stories ingested/second)
• News content from 125K+ sources
• More than a billion messages (emails and IB chats) processed each day

5. Bloomberg in a nutshell

6. Data Storage and Retrieval
• Files
• VSAM
• Network
• Hierarchical
• Relational
• MPP

7. RDBMS Application Lifecycle
• Use Case
• Entities and Relations
• Logical data model
• Physical data model
• Implementation and tuning

8. HBase Principles
• Ordered Key Value Store
• Distributed

9. Key Value
…
Key-9999   Value
Key-9998   Value
Key-9997   Value
Key-9996   Value
Key-9995   Value
Key-9994   Value
…

10. Ordered Key Value (lexicographic order)
…
Key-9999   Value
Key-9998   Value
Key-9997   Value
Key-9996   Value
Key-9995   Value
Key-9994   Value
Key-9993   Value
…

11. Distributed Ordered Key Value
Each partition holds its own ordered range of keys:
…               …               …
Key-199 Value   Key-299 Value   Key-399 Value
Key-198 Value   Key-298 Value   Key-398 Value
Key-197 Value   Key-297 Value   Key-397 Value
…               …               …

…               …               …
Key-499 Value   Key-599 Value   Key-999 Value
Key-498 Value   Key-598 Value   Key-998 Value
Key-497 Value   Key-597 Value   Key-997 Value
…               …               …
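The idea on slides 10 and 11 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how HBase is implemented: the class names `OrderedKV` and `DistributedKV` and the split keys are hypothetical, and each partition is simply a sorted list held in memory.

```python
import bisect


class OrderedKV:
    """Toy ordered key-value store: keys are kept in lexicographic order."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


class DistributedKV:
    """Routes a key to the partition with the greatest start key <= key."""

    def __init__(self, split_keys):
        # split_keys are hypothetical boundaries; the first partition starts at "".
        self._starts = [""] + sorted(split_keys)
        self._parts = [OrderedKV() for _ in self._starts]

    def partition_for(self, key):
        return bisect.bisect_right(self._starts, key) - 1

    def put(self, key, value):
        self._parts[self.partition_for(key)].put(key, value)

    def get(self, key):
        return self._parts[self.partition_for(key)].get(key)


store = DistributedKV(split_keys=["Key-2", "Key-3", "Key-4"])
for k in ["Key-199", "Key-198", "Key-299", "Key-399", "Key-499"]:
    store.put(k, "Value")
```

Because keys are ordered within a partition and partitions cover disjoint key ranges, a point lookup touches exactly one partition and a range scan reads contiguous keys.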

12. Abstraction
• Table row view
• Versioning
• ACIDity

13. Table Row View
Key:   Row Id | Column Id | Timestamp
Value: Value

14. Table Row View
Stored key-value pairs:
Key11|col1|1234567   Value-A
Key11|col2|1234567   Value-B
Key11|col3|1234567   Value-C
Key11|col4|1234567   Value-D

Row view:
        Col1      Col2      Col3      Col4
Key11   Value-A   Value-B   Value-C   Value-D
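As a minimal sketch of how the row view on slide 14 falls out of the stored key-value pairs, the following Python groups cells by row id. The function name `row_view` and the tuple layout are assumptions made for illustration; they are not an HBase API.

```python
from collections import defaultdict

# Cells as on slide 14: ((row id, column id, timestamp), value)
cells = [
    (("Key11", "col1", 1234567), "Value-A"),
    (("Key11", "col2", 1234567), "Value-B"),
    (("Key11", "col3", 1234567), "Value-C"),
    (("Key11", "col4", 1234567), "Value-D"),
]


def row_view(cells):
    """Pivot key-value cells into {row id: {column id: value}}."""
    rows = defaultdict(dict)
    for (row_id, column_id, _timestamp), value in cells:
        rows[row_id][column_id] = value
    return dict(rows)


table = row_view(cells)
```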

15. Versioning
Versions of the same column are stored in descending order:
Key11|col1|1234567   Value-A1
Key11|col1|1234566   Value-A
Key11|col2|1234567   Value-B
Key11|col3|1234567   Value-CC
Key11|col3|1234563   Value-C
Key11|col4|1234567   Value-DD
Key11|col4|1234560   Value-D1
Key11|col4|1234557   Value-D
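The ordering on slide 15 can be reproduced with a sort key that is ascending on row and column and descending on timestamp, so the newest version of a column always comes first. This is a sketch only; `sort_cells` and `get_versions` are hypothetical helper names.

```python
# Cells from slide 15 as (row id, column id, timestamp, value), deliberately shuffled.
cells = [
    ("Key11", "col4", 1234557, "Value-D"),
    ("Key11", "col1", 1234566, "Value-A"),
    ("Key11", "col3", 1234563, "Value-C"),
    ("Key11", "col4", 1234567, "Value-DD"),
    ("Key11", "col1", 1234567, "Value-A1"),
    ("Key11", "col2", 1234567, "Value-B"),
    ("Key11", "col4", 1234560, "Value-D1"),
    ("Key11", "col3", 1234567, "Value-CC"),
]


def sort_cells(cells):
    """Order by row, then column, then timestamp descending (newest first)."""
    return sorted(cells, key=lambda c: (c[0], c[1], -c[2]))


def get_versions(cells, row_id, column_id, max_versions=1):
    """Return up to max_versions (timestamp, value) pairs, newest first."""
    matches = [c for c in sort_cells(cells) if c[0] == row_id and c[1] == column_id]
    return [(ts, value) for _row, _col, ts, value in matches[:max_versions]]
```

With this ordering, reading the latest value of a column means taking the first matching cell.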

16. ACIDity
• Atomic at row level
• Consistent to a point in time before the request
• Isolation through MVCC (reads) and row locks (mutations)
• Durability is guaranteed for all successful mutations

17. Data Modeling
• Fitness for key value store
  - Can’t build relations
  - No secondary indexes
  - De-normalization
• Understand queries to design key
  - Data Skew
  - Query Skew

18. Data Skew
[Figure: key-value pairs spread across several partitions. Most partitions hold a mix of keys (Key-a, Key-b, Key-d, Key-f, Key-h, Key-x, Key-y, Key-z), while Key-e appears far more often than any other key and its partition is marked "Hot".]
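One way to check a candidate key design for data skew is to count how many rows each key range would receive. The sketch below does that for made-up data: the split keys, the sample row keys, and the helper names `rows_per_region` and `skew_ratio` are all assumptions for illustration.

```python
import bisect


def rows_per_region(row_keys, split_keys):
    """Count rows per key range; the first range starts at the empty key."""
    starts = [""] + sorted(split_keys)
    counts = {start: 0 for start in starts}
    for key in row_keys:
        idx = bisect.bisect_right(starts, key) - 1
        counts[starts[idx]] += 1
    return counts


def skew_ratio(counts):
    """Largest range divided by the mean range size; 1.0 means perfectly even."""
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean


# Hypothetical sample: twelve keys share the prefix "e", four keys do not.
sample_keys = ["e-%03d" % i for i in range(12)] + ["a-001", "b-001", "h-001", "z-001"]
counts = rows_per_region(sample_keys, split_keys=["c", "g", "p"])
ratio = skew_ratio(counts)
```

A ratio well above 1.0 suggests that one range will grow much faster than the others under this key design.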

19. Query Skew
[Figure: the same six ordered key ranges as on slide 11 (Key-197 to Key-199, Key-297 to Key-299, Key-397 to Key-399, Key-497 to Key-499, Key-597 to Key-599, Key-997 to Key-999), with an arrow labeled "Queries" directed at the partitions.]

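20. Data Write
[Figure: write path. A write reaches HBase, which (1) records it in the WAL on the file system, (2) places it in memory, and (3) writes it to store files for table t1 on the file system. Each store file is made up of blocks: Data, Idx (index), and Blm (Bloom filter).]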
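The write path on slide 20 can be sketched as a toy region that logs, buffers, and flushes. This reads the numbered steps in the diagram as WAL, then memory, then store files, which is an interpretation of the flattened figure. The class name `Region`, the `flush_threshold` parameter, and the use of in-memory lists in place of real files are assumptions.

```python
class Region:
    """Toy write path: log first, buffer in memory, flush sorted files."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # stand-in for the write-ahead log on the file system
        self.memstore = {}     # in-memory buffer of recent writes
        self.store_files = []  # each flush appends one immutable, sorted file
        self.flush_threshold = flush_threshold  # hypothetical flush trigger

    def put(self, key, value):
        self.wal.append((key, value))  # step 1: log the mutation for durability
        self.memstore[key] = value     # step 2: apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()               # step 3: persist the buffer as a store file

    def flush(self):
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}


region = Region(flush_threshold=2)
region.put("Key-b", "Value-1")
region.put("Key-a", "Value-2")  # reaches the threshold and triggers a flush
```

Sorting at flush time is what keeps each store file ordered by key, so later reads can locate a key through the file's index.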

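21. Data Read
[Figure: read path. A read reaches HBase, which holds the Memstore and the Block Cache in memory; blocks come from the store files for table t1 on the file system, which also holds the WAL. Two steps are numbered in the diagram.]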
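A simplified sketch of the read path on slide 21 follows. It assumes an LRU eviction policy and a lookup order of memstore first, then block cache, then store file blocks; real HBase merges results across these sources, so treat this as an illustration only. `BlockCache`, `read`, and `block_of` are hypothetical names.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU cache of store-file blocks, counting hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, block_id, load):
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self._blocks[block_id]
        self.misses += 1
        block = load(block_id)                  # fall back to the file system
        self._blocks[block_id] = block
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)    # evict the least recently used block
        return block


def read(key, memstore, cache, store_file_blocks, block_of):
    """Check the memstore, then the block cache, then the store file blocks."""
    if key in memstore:
        return memstore[key]
    block = cache.get(block_of(key), lambda block_id: store_file_blocks[block_id])
    return block.get(key)
```

Every cache hit avoids a trip to the file system, which is why the following slides focus on fitting more useful data into the cache.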

22. Cache
• Pack more data into cache
  - Block size
  - Column Family
• Large cache

23. Block Size vs Read Latency

Get Performance (ms), 64 K Block (20 measurements per row)
BAvg     16.731 16.728 16.761 16.763 16.418 16.371 16.37 16.431 16.152 16.14 16.169 16.158 16.308 16.29 16.325 16.307 16.34 16.381 16.391 16.352
BMedian  14 14 14 14 13 13 13 13 15 15 15 15 13 13 13 13 13 13 13 13
B95%     41 41 41 41 41 41 41 41 43 43 43 43 40 40 40 40 41 41 41 41
B99%     55 55 55 55 54 54 54 54 55 55 55 55 54 54 54 54 54 54 55 54
B99.9%   71 71 71 71 70 70 70 70 67 67 67 67 71 70 70 71 71 71 71 70
BMax     545 1062 559 567 1075 1027 561 567 564 541 558 1062 1062 561 1075 1072 1067 563 1035 1032

Get Performance (ms), 16 K Block (20 measurements per row)
Avg      3.002 5.362 5.361 5.357 6.419 6.369 6.405 6.383 6.188 6.196 6.182 6.174 6.246 6.264 6.268 6.253 5.194 5.207 5.219 3.031
Median   2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
95%      10 15 15 15 18 18 18 18 18 18 17 17 18 18 18 18 15 15 15 10
99%      15 26 26 26 30 30 30 30 28 28 28 28 29 29 29 29 25 24 25 15
99.9%    26 41 41 41 45 45 45 45 43 43 43 43 44 44 44 44 41 41 41 26
Max      2261 127 185 102 90 106 92 102 93 106 119 114 89 140 132 82 81 150 93 1910

Note: a smaller block size adds overhead because more index blocks are needed.

24. Block Size vs Index Size

16 K Blocks              8 K Blocks
Idx Sz K    Bloom K      Idx Sz K    Bloom K
266346      2368         472058      2432
247895      2240         574239      2944
225561      2096         331899      1792
253633      2368         471362      2304
224862      2016         517272      2560
225685      2096         469543      2432
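The figures on slide 24 can be summarized with a little arithmetic. The sketch below assumes that both column groups describe the same data set stored with two block sizes, which the slide does not state explicitly, and it compares totals rather than pairing rows.

```python
# Index and Bloom filter sizes (in K) copied from slide 24.
idx_16k = [266346, 247895, 225561, 253633, 224862, 225685]
bloom_16k = [2368, 2240, 2096, 2368, 2016, 2096]
idx_8k = [472058, 574239, 331899, 471362, 517272, 469543]
bloom_8k = [2432, 2944, 1792, 2304, 2560, 2432]

# Growth factor when the block size is halved from 16 K to 8 K.
index_ratio = sum(idx_8k) / sum(idx_16k)
bloom_ratio = sum(bloom_8k) / sum(bloom_16k)

print(f"index size grows by a factor of {index_ratio:.2f}")
print(f"Bloom filter size grows by a factor of {bloom_ratio:.2f}")
```

Under that assumption the total index size roughly doubles when the block size is halved, which is what one would expect if there is one index entry per data block, while the Bloom filter size changes comparatively little.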

25. Column Family
Key:   Row Id | cf1:col1 | Timestamp     Value: Value
Key:   Row Id | cf2:col1 | Timestamp     Value: Value
[Figure: on the file system, table t1 has separate store files per column family: t1:cf1, t1:cf1, t1:cf2, t1:cf2.]

26. Compaction
[Figure: before compaction, several store files on the file system each contain an entry for K-x, and one contains an entry labeled D-1. After compaction, a single store file contains one K-x entry.]

27. Compaction
• Part of regular HBase operations
• Minor Compaction
• Major Compaction
• Utilizes server and HBase resources
• Major compaction can be scheduled
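The effect of compaction shown on slide 26 can be sketched as merging several store files into one. This toy version keeps only the newest version of each key and drops keys whose newest entry is a delete marker; HBase's actual version retention is configurable, so this is a simplification. `DELETE` and `major_compact` are hypothetical names.

```python
DELETE = object()  # stand-in for a delete marker (tombstone)


def major_compact(store_files):
    """Merge store files of (key, timestamp, value) entries into one sorted file."""
    latest = {}
    for store_file in store_files:
        for key, ts, value in store_file:
            if key not in latest or ts > latest[key][0]:
                latest[key] = (ts, value)  # remember the newest version per key
    merged = []
    for key in sorted(latest):
        ts, value = latest[key]
        if value is not DELETE:            # deleted keys do not survive compaction
            merged.append((key, ts, value))
    return merged


files = [
    [("K-x", 1, "v1"), ("K-y", 1, "y1")],
    [("K-x", 2, "v2")],
    [("K-x", 3, "v3"), ("K-y", 4, DELETE)],
]
compacted = major_compact(files)
```

After the merge, a read for K-x has one file to consult instead of three, at the cost of the I/O and CPU spent rewriting the data.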

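28. Short Circuit Read
[Figure: two read paths. In the first, the HBase/DFS client requests data from HDFS over TCP, and HDFS reads the data from the file system. In the second, HDFS opens the file and passes the file descriptor (FD) to the HBase/DFS client, which reads the data from the file system itself.]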

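29. Garbage Collection
[Figure: read path with an off-heap cache. A read reaches HBase, which holds the Memstore, the Block Cache, and an Off-heap Cache; blocks come from the store files for table t1 on the file system, which also holds the WAL. Three steps are numbered in the diagram.]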
