Apache BookKeeper - A High Performance and Low Latency Storage

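Apache BookKeeper is a high-performance, distributed, strongly consistent log storage system. Its core component, the Bookie, provides efficient and reliable writes on a single node, and storing redundant copies of data across distributed Bookie nodes provides fault tolerance and load balancing. BookKeeper originated at Yahoo and was donated to the Apache community in 2008. It is a foundational component of many big data systems, such as the Hadoop HDFS NameNode HA implementation, Twitter's Apache DistributedLog, and Pulsar.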

1.Apache BookKeeper: A High Performance and Low Latency Storage Service. @sijieg (Sijie Guo, Twitter), @jvjujjuri (JV, Salesforce)

2.Hello! I am Sijie Guo - PMC Chair of Apache BookKeeper - Co-creator of Apache DistributedLog - Twitter Messaging/Pub-Sub Team - Yahoo! R&D Beijing

3.Challenges in Distributed Systems

4. Expect Failures: up to 10% annual failure rates for disks/servers

5.Symptoms

6.Problem 1: Not Available

7.Problem 1: Not Available

8.Problem 2: Inconsistencies


10.More Issues

11. Problem 3: Split Brain: Two Writers (diagram: Writer A and Writer A’ both writing)

12.Problem 4: Failure Detection (diagram: nodes A, B, C)

13.Problem 5: Recovery (diagram: nodes A, B, C; labels: Consistency, Recovery Protocol)

14.Solutions

15.Overview: Enter Apache BookKeeper

16. BookKeeper - Durable Storage ◉ A Durable Storage Optimized for Immutable Data ◉ Serve as a building block for reliable systems (diagram labels: Client Library, Replication, Consistency, Recovery, Durability, Commodity Hardware)

17.Immutable Data Abstraction

18.Ledger ◉ Segment ◉ Block / Object ◉ Append-Only File ◉ ...

19.Guarantees ◉ If an entry has been acknowledged, it must be readable ◉ If an entry is read once, it must always be readable

20.History ◉ Initial Use Case - Hadoop NameNode HA ◉ 2008: Open Sourced Contrib of ZooKeeper ◉ 2011: Sub-Project of ZooKeeper ◉ 2012: Yahoo! Push Notification ◉ 2012~Now: DistributedLog, Pulsar, Majordodo ◉ 2015~Now: Salesforce Distributed Store

21.Details: Inside of Apache BookKeeper

22.Architecture (diagram labels: APP, Client, Ledger, Metadata Store, three Bookies)

23.Reliable Writes ◉ Store checksum along with entry ◉ Fsync entries before responding ◉ Ack when accepted by a quorum of Bookies: ○ All Previous Entries ○ This Entry
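
The write rules on the slide above can be sketched with a toy in-memory model. This is not the BookKeeper client or server API: `ToyBookie`, `quorum_add`, and the `ack_quorum` parameter are hypothetical names chosen for illustration, the fsync step is only a comment, and the ordering condition (all previous entries also accepted) is left out to keep the sketch short.

```python
import zlib


class ToyBookie:
    """In-memory stand-in for a bookie (hypothetical, not the real server)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.store = {}  # entry_id -> (crc32, payload)

    def add_entry(self, entry_id, payload):
        if not self.up:
            return False  # models a crashed or unreachable bookie
        # Store the checksum along with the entry. A real bookie would
        # also fsync its journal here before responding.
        self.store[entry_id] = (zlib.crc32(payload), payload)
        return True

    def read_entry(self, entry_id):
        crc, payload = self.store[entry_id]
        # Verify the stored checksum before handing the entry back.
        if zlib.crc32(payload) != crc:
            raise IOError(f"checksum mismatch on {self.name}")
        return payload


def quorum_add(bookies, entry_id, payload, ack_quorum):
    """Send the entry to every bookie and report success only if
    at least ack_quorum of them accepted it."""
    acks = sum(1 for b in bookies if b.add_entry(entry_id, payload))
    return acks >= ack_quorum
```

With three toy bookies and `ack_quorum=2`, an add still succeeds when one bookie is down, which is the availability property the quorum is meant to give.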

24.Consistency - LastAddPushed (diagram: the Writer adds entries 0 1 2 3 4 7 8 9 10 11 12; LastAddPushed marks the most recent entry sent)

25.Consistency - LastAddConfirmed (diagram: ownership changes from one Writer to another, with fencing; the Writer adds entries 0 1 2 3 4 7 8 9 10 11 12 and acks adds; each Reader follows LastAddConfirmed)
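
A minimal sketch of how the two pointers on these slides relate, assuming a single in-memory object stands in for the whole ledger. `ToyLedger` and `FencedError` are hypothetical names, and real fencing happens on the bookies as part of a recovery protocol rather than as a flag on one object. The point illustrated is that LastAddConfirmed only advances over a contiguous run of acknowledged entries, and readers never read past it.

```python
class FencedError(Exception):
    """Raised when a writer tries to add to a fenced ledger."""


class ToyLedger:
    def __init__(self):
        self.entries = []
        self.acked = set()
        self.last_add_pushed = -1     # highest entry id the writer has sent
        self.last_add_confirmed = -1  # highest id with every entry <= id acked
        self.fenced = False

    def push(self, payload):
        """Writer side: send a new entry (not yet acknowledged)."""
        if self.fenced:
            raise FencedError("ledger is fenced; ownership has changed")
        self.entries.append(payload)
        self.last_add_pushed += 1
        return self.last_add_pushed

    def ack(self, entry_id):
        """Record an acknowledgement; acks may arrive out of order."""
        self.acked.add(entry_id)
        # Advance LAC only across a gap-free prefix of acked entries.
        while self.last_add_confirmed + 1 in self.acked:
            self.last_add_confirmed += 1

    def fence(self):
        """New owner side: stop the previous writer from adding entries."""
        self.fenced = True

    def read(self, entry_id):
        """Reader side: only confirmed entries are visible."""
        if entry_id > self.last_add_confirmed:
            raise LookupError("entry is beyond LastAddConfirmed")
        return self.entries[entry_id]
```

If entries 0, 1, and 2 are pushed and the ack for 2 arrives first, LastAddConfirmed stays at -1 until 0 and 1 are also acknowledged.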


27. Read Entry & Read LAC ◉ Read Entry K: speculative reads on timeouts ◉ Read LAC: quorum read (diagram: two Clients, each reading from Bookies B1, B2, B3)

28.Long Poll Read (diagram: a Client issues a long poll read, with a speculative long poll, to Bookies B1, B2, B3)
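
The speculative read idea can be sketched as follows: ask one replica, and if it has not answered within a short timeout, also ask the next one without cancelling the first, then take whichever answers first. This is an illustration built on Python's standard `concurrent.futures`, not BookKeeper's client code. `speculative_read`, the `replicas` callables, and the `speculative_timeout` value are all assumptions made for the example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def speculative_read(replicas, entry_id, speculative_timeout=0.05):
    """replicas: list of callables read(entry_id) -> bytes (hypothetical).

    Requests go out one replica at a time. Earlier requests are left
    running, so a slow replica can still win if it answers first.
    """
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    pending = set()
    try:
        for read in replicas:
            pending.add(pool.submit(read, entry_id))
            # Give the outstanding requests a short window to answer
            # before speculatively contacting the next replica.
            done, pending = wait(
                pending, timeout=speculative_timeout,
                return_when=FIRST_COMPLETED,
            )
            for fut in done:
                if fut.exception() is None:
                    return fut.result()
        # Every replica has been asked; wait for whatever is left.
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                if fut.exception() is None:
                    return fut.result()
        raise IOError("no replica returned the entry")
    finally:
        # Do not block on slow replicas that are still running.
        pool.shutdown(wait=False)
```

The trade-off is extra read load on the replicas in exchange for lower tail latency when one of them is slow.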

29.Inside a Bookie