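Apache HBase community veterans Stack and Yu Li, speaking as core developers of the project, introduce the history and current status of Apache HBase and review and analyze the major features of its most recent releases.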


1.Project Status HBaseConAsia2018, Beijing Michael Stack <stack@apache.org> Yu Li <liyu@apache.org>

2. 2.0.0

3.Pervasive... ...distributed, scalable, big data store

4.In a nutshell... ●  ...15,409 commits made by 311 contributors ●  ...representing 800,490 lines of code ●  ...mostly written in Java ●  ...has a well established, mature codebase ●  ...maintained by a very large development team ●  ...with stable Y-O-Y commits ●  ...took an estimated 222 years of effort (COCOMO model) ●  ...starting with its first commit in April, 2007 (>10 years old!) Source https://www.openhub.net/p/hbase
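
The effort figure on this slide is attributed to the COCOMO model. As a rough illustration of how such a number can arise, here is a minimal sketch of Basic COCOMO in its "organic" mode, where effort in person-months is 2.4 × KLOC^1.05. The class name `CocomoSketch` is hypothetical, and the assumption that Open Hub applies exactly these textbook coefficients is ours, so the result should match the slide's 222 years only in order of magnitude.

```java
/** Basic COCOMO, organic mode: effort (person-months) = 2.4 * KLOC^1.05. Illustrative only. */
final class CocomoSketch {
    static final double A = 2.4;   // organic-mode coefficient (textbook value, assumed)
    static final double B = 1.05;  // organic-mode exponent (textbook value, assumed)

    /** Estimated effort in person-months for the given number of source lines. */
    static double personMonths(long linesOfCode) {
        double kloc = linesOfCode / 1000.0;
        return A * Math.pow(kloc, B);
    }

    /** Same estimate expressed in person-years (12 months per year). */
    static double personYears(long linesOfCode) {
        return personMonths(linesOfCode) / 12.0;
    }

    public static void main(String[] args) {
        long loc = 800_490L; // line count quoted on the slide
        System.out.printf("Estimated effort: %.0f person-years%n", personYears(loc));
    }
}
```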

5. LOC Source https://www.openhub.net/p/hbase

6.Issues

7. Commits per month Source https://www.openhub.net/p/hbase

8. Contributors Source https://www.openhub.net/p/hbase

9.New PMC Chairperson! “The HBase project represents solid, useful computer science that solves problems and runs businesses every day. To keep that going, we need to keep bringing new ideas and approaches into the project...we need to continue to attract people from all backgrounds and parts of the world. I'd love to see more women, more people of color, and even more worldwide diversity. I'd like to see more contributions from people not employed by big data platform companies. “If we continue to strive for a diversity of ideas and experiences, we'll keep innovating so that HBase remains relevant for years to come.” Misty Linville, Vice-President of the Apache HBase Project

10. 2.0.0 Our project ●  Apache HBase is an Open Source Apache project. ●  It’s what we want to make of it. ●  No owners! ●  Anyone can help! ●  All welcome! ●  The more, the merrier!

11. 2.0.0 Active branches

Active Branches   Latest Branch Release     Release Manager
branch-1.2        1.2.6.1 (EOL’d)           Sean Busbey
branch-1.3        1.3.2.1 (Yahoo)           Francis Liu
branch-1.4        1.4.6 (Current Stable)    Andrew Purtell
branch-1.5        Coming...                 Andrew Purtell
branch-2.0        2.0.1                     Michael Stack
branch-2.1        2.1.0                     Duo Zhang
branch-2.2        <null>                    <null>
branch-3          <null>                    <null>

12. 2.0.0 hbase-2.0.0

13. 2.0.0 2.0.0: Long time coming ●  Branched four years ago ●  Released end-of-April, 2018 ●  Took > 1 year to stabilize ○  hbase-2.0.0 released April 29th, 2018 ○  hbase-2.0.0-beta2 released March 22nd, 2018 ○  hbase-2.0.0-beta1 released January 16th, 2018 ○  hbase-2.0.0-alpha4 released November 4th, 2017 ○  hbase-2.0.0-alpha3 released September 17th, 2017 ○  hbase-2.0.0-alpha2 released August 21st, 2017 ○  hbase-2.0.0-alpha1 released June 22nd, 2017 ●  Multiple Release Managers ○  Matteo Bertozzi, Stephen Yuan Jiang, yours truly...

14. 2.0.0 Let’s not do this again! ●  Backed-up mountains of “Tech Debt” ○  Rotted Unit Tests ■  “...99/100 it was the test, not Apache Infra” ○  Performance regressions ■  Not out of the woods yet…

15.2.0.0 S

16. 2.0.0 Goals: Compatibility ●  Double-down on Semantic Versioning, semver ○  Adopted in hbase-1.0.0 ○  MAJOR.MINOR.PATCH[-IDENTIFIER] ■  E.g. 2.0.0-alpha1
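
To make the MAJOR.MINOR.PATCH[-IDENTIFIER] shape concrete, here is a minimal sketch that parses and orders such version strings. The class `VersionSketch` is hypothetical and is not part of HBase. It assumes the three-part form only (a four-part string such as 1.2.6.1 is rejected) and compares identifiers lexically, which is a simplification of full semver precedence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for MAJOR.MINOR.PATCH[-IDENTIFIER]; not an HBase class. */
final class VersionSketch implements Comparable<VersionSketch> {
    private static final Pattern FORMAT =
        Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-([0-9A-Za-z.]+))?");

    final int major;
    final int minor;
    final int patch;
    final String identifier; // null when the version has no -IDENTIFIER suffix

    private VersionSketch(int major, int minor, int patch, String identifier) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
        this.identifier = identifier;
    }

    static VersionSketch parse(String text) {
        Matcher m = FORMAT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not MAJOR.MINOR.PATCH[-IDENTIFIER]: " + text);
        }
        return new VersionSketch(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)), m.group(4));
    }

    /** Under semver, an incompatible API change requires a new MAJOR number. */
    boolean sameMajor(VersionSketch other) {
        return major == other.major;
    }

    @Override
    public int compareTo(VersionSketch o) {
        int c = Integer.compare(major, o.major);
        if (c == 0) c = Integer.compare(minor, o.minor);
        if (c == 0) c = Integer.compare(patch, o.patch);
        if (c != 0) return c;
        // A version with an identifier (pre-release) sorts before the plain release.
        if (identifier == null) return o.identifier == null ? 0 : 1;
        if (o.identifier == null) return -1;
        return identifier.compareTo(o.identifier); // lexical: a simplification
    }

    public static void main(String[] args) {
        VersionSketch alpha = parse("2.0.0-alpha1");
        VersionSketch release = parse("2.0.0");
        System.out.println("alpha1 precedes release: " + (alpha.compareTo(release) < 0));
        System.out.println("same major as 1.4.6: " + release.sameMajor(parse("1.4.6")));
    }
}
```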

17. 2.0.0 Goals: Compatibility ●  But… ○  Semantic Versioning is about API only ■  What about…. ●  Internal/External Interfaces ○  Where is Client Interface when Spark/MapReduce ●  It’s complicated...

18. 2.0.0 Goals: Compatibility ●  From Hadoop… Yetus, annotations ○  InterfaceAudience.Public ■  Get/Put/Scan/Connection ○  InterfaceAudience.LimitedPrivate ■  Coprocessors, Replication, etc. ○  InterfaceAudience.Private ■  Internal only ●  What about… ○  Source/Binary compatibility ○  Serializations ■  Wire ■  Formats in HDFS/Zookeeper ○  Dependencies ●  See refguide semver section ○  http://hbase.apache.org/book.html#hbase.versioning
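
The three audience levels on this slide are expressed as Java annotations on classes. The sketch below shows the idea with locally defined stand-in annotations so that it compiles without any dependency; it does not use the real Yetus annotation library. The names `AudienceSketch`, `ClientFacingApi`, `ExtensionHook`, `InternalHelper`, and `audienceOf` are all hypothetical.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/** Stand-in audience annotations for illustration; not the real Yetus classes. */
final class AudienceSketch {
    @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) @interface LimitedPrivate { String[] value(); }
    @Retention(RetentionPolicy.RUNTIME) @interface Private {}

    /** Stable surface that client applications may depend on. */
    @Public static class ClientFacingApi {}

    /** Usable by a named group of extenders, with weaker guarantees. */
    @LimitedPrivate({"Coprocessor"}) static class ExtensionHook {}

    /** Internal only: may change in any release. */
    @Private static class InternalHelper {}

    /** Reports which audience a class was declared for. */
    static String audienceOf(Class<?> type) {
        if (type.isAnnotationPresent(Public.class)) {
            return "public";
        }
        LimitedPrivate limited = type.getAnnotation(LimitedPrivate.class);
        if (limited != null) {
            return "limited-private:" + String.join(",", limited.value());
        }
        if (type.isAnnotationPresent(Private.class)) {
            return "private";
        }
        return "unannotated";
    }

    public static void main(String[] args) {
        System.out.println(audienceOf(ClientFacingApi.class));
        System.out.println(audienceOf(ExtensionHook.class));
        System.out.println(audienceOf(InternalHelper.class));
    }
}
```

A compatibility checker could walk a jar with a function like `audienceOf` and apply semver rules only to the classes reported as public.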

19. 2.0.0 Goals: Compatibility ●  Grey areas… ○  Coprocessors ■  Free access HBase core ■  Change to hbase internals => broken Coprocessors ■  InterfaceAudience.LimitedPrivate ○  Published metrics/jmx ○  Protobufs ■  hbase-protocol/hbase-protocol-shaded

20. 2.0.0 Goals: Compatibility in 2.0.0 ●  We adhere to SemVer for DML in 2.x ○  Not for DDL ●  hbase-1.x client can work against hbase-2.x cluster ○  Even 1.x Coprocessor Endpoints work on an hbase-2.x cluster ○  Read-only DDL/Admin of hbase-2.x from hbase-1.x client ○  Replication between 1.x and 2.x works ●  Extensive curation of what is public/private ●  Purged Guava/Protobuf from API ●  Coprocessors ○  Revamped ●  No Singularity! No downtime! Rolling upgrade from hbase-1! ○  Experimental! From 1.4.x to 2.1.x has been tested.

21. 2.0.0 Goals: Compatibility ●  Still plenty to do ○  Ongoing effort... ○  3.0.0!

22. 2.0.0 Goals: Others ●  Scale ○  More Regions, bigger clusters ●  Performance ○  Inline read/write but also macro-aspect: restart, assign, etc. ○  Better resource utilization ■  I/O, RAM ●  Fix primary root of operational woes/bugs ○  Master Region Assignment

23. 2.0.0 Insides ●  Scale ○  More Regions, bigger clusters ●  Performance ○  Inline read/write but also macro restart, assign, etc. ○  Better resource utilization ■  I/O, RAM ●  Fix primary root of operational woes/bugs ○  Master Region Assignment ●  Cleanup ○  Spark narrative ○  Interfaces

24. 2.0.0 Insides ●  Currently >4500 issues resolved ○  ~3k exclusive to 2.0.0+

25. 2.0.0 Insides: Prerequisites ●  JDK8 only ●  Hadoop-2.7.7 minimum* ○  Works against the coming Hadoop-3.x *Be wary of “...not stable / production ready” Hadoops

26. 2.0.0 Insides: Features ●  New Master Core (A.K.A AMv2) ●  Off-heap Read/Write path ●  In-memory Compaction (“Accordion”) ●  And more...

27. 2.0.0 Insides: Assignment Manager ●  New Master Core (A.K.A AMv2) ○  Assignment Manager v1 (AMv1) root of many operational headaches ●  Prompt assign of millions of Regions, faster startup, larger scale ●  Scrutable/Standalone Testable ●  One hbase:meta writer only, the Master ●  No more intermediate state in ZK ○  At other end of an RPC... ○  Only final state published to hbase:meta ○  No more distributed state: some in Master memory, some in ZK, some in HDFS. ●  New degree of Resilience

28. 2.0.0 Insides: Off-heap ●  Smaller JVM heaps, less copying ○  But more accounting! ●  Off-heap Read Path ○  HDFS=>BucketCache=>Outbound Socket ~latency OFF ■  Cache more ■  Less GC, less erratic ●  Off-heap Write Path ○  RPC=>HDFS data kept off-heap ■  Async DFS WAL Client ●  Off-heap ○  Socket Socket ○  Off-heap fragmentation anyone? ○  On by default?
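
For readers unfamiliar with the term, "off-heap" means memory allocated outside the garbage-collected Java heap. The sketch below shows the general JDK mechanism, a direct `ByteBuffer`, and nothing more: it is not HBase's read or write path, and the class `OffHeapSketch` and its methods are hypothetical names used for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Generic JDK illustration of off-heap memory; not HBase code. */
final class OffHeapSketch {
    /** Copies the bytes into a direct buffer, whose contents live outside the Java heap. */
    static ByteBuffer copyOffHeap(byte[] data) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(data.length);
        buffer.put(data);
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    /** Reads the buffer back as text without disturbing the caller's position. */
    static String readBack(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // same memory, independent position and limit
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer cell = copyOffHeap("row1/cf:q/value".getBytes(StandardCharsets.UTF_8));
        System.out.println("direct buffer: " + cell.isDirect());
        System.out.println("contents: " + readBack(cell));
    }
}
```

Because the bytes sit outside the heap, the garbage collector does not scan or move them, which is the "less GC" benefit the slide mentions; the cost is that the application must track and release that memory itself, which is the "more accounting" part.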

29. 2.0.0 Insides: Offheap (figure: Before and After diagrams)

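To give HBase practitioners and enthusiasts a community where they can freely exchange HBase-related technical knowledge, HBase engineers from Alibaba, Xiaomi, Huawei, NetEase, JD.com, Didi, Zhihu, and other companies jointly founded the China HBase Technical Community.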