TGIP-CN 023: Apache Pulsar in Practice at BIGO


1.

2.Apache Pulsar BIGO | BIGO

3.About Me
- BIGO
- Apache Pulsar Contributor
- Apache BookKeeper Contributor
- StreamNative/Pulsar-Flink-Connector Contributor

4.Motivation BIGO Kafka Kafka Ø o Ø o ISR VS. Ø o catch-up PageCache Ø o HDD Ø o Ø Apache Kafka Apache Pulsar

5.Apache Pulsar BIGO 2019.12 2020.4 Apache Pulsar Apache Pulsar Apache Pulsar Apache Pulsar Apache Pulsar & Apache Kafka 2019.11 2020.4 2020.5

6.BIGOer Apache Pulsar

7.Apache Pulsar BIGO Baina (BIGO C++ ) KMM (Kafka Mirror Maker) Flink SQL … … PUB - SUB

8. (Apache Pulsar 2.5.1)
- Pulsar broker
- Pulsar broker Cache bookie
- broker OOM
- Bookie direct memory OOM
- Bookie
- Journal HDD fsync bookie add entry 99th latency
- bookie add entry latency
- Pulsar client "Lookup Timeout Exception"
- ZooKeeper Pulsar
- Reader API (e.g. Pulsar Flink Connector) Pulsar topic Pulsar 2.5.2

9.Apache Pulsar
- Bookie Journal/Ledger
- Journal/Ledger HDD ZooKeeper dataDir/dataLogDir Journal/Ledger
- OS: 1 ~ 2 GB
- JVM: 1/2
  - heap: 1/3
  - direct memory: 2/3
- PageCache: 1/2
- jvm heap/gc
- bytes in per broker
- message in per broker
- loadbalance
- broker Cache
- bookie client quarantine ratio
- bookie client request queue
Broker
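The memory fractions on this slide can be turned into concrete sizes with a little arithmetic. The sketch below is one possible reading, not a sizing rule from the talk: it assumes the JVM and PageCache halves apply to what remains after the OS reservation, and that the heap and direct memory thirds apply to the JVM share. The function name `plan_memory` and the 64 GB example host are hypothetical.

```python
def plan_memory(total_gb: float, os_reserved_gb: float = 2.0) -> dict:
    """Split host memory following the fractions listed on the slide.

    Assumptions (not confirmed by the slide): the 1/2 JVM and 1/2 PageCache
    shares are taken from the memory left after the OS reservation, and the
    1/3 heap and 2/3 direct memory shares are taken from the JVM share.
    """
    usable_gb = total_gb - os_reserved_gb
    if usable_gb <= 0:
        raise ValueError("total memory must exceed the OS reservation")
    jvm_gb = usable_gb / 2
    return {
        "os": os_reserved_gb,
        "jvm_heap": jvm_gb / 3,
        "jvm_direct": jvm_gb * 2 / 3,
        "page_cache": usable_gb / 2,
    }


if __name__ == "__main__":
    # Hypothetical 64 GB host with 2 GB reserved for the OS.
    for name, size_gb in plan_memory(64.0).items():
        print(f"{name}: {size_gb:.1f} GB")
```

Under these assumptions the four parts always add up to the host total, which makes the plan easy to sanity-check before applying it to JVM flags.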

10.Ø § Broker § Bookie Ø § Broker broker direct memory OOM § Broker consumer/reader consumer/reader GC Ø Cache Ø ZooKeeper Ø auto bundle split

11. —Broker (PR-6772)
new_avg = old_avg * factor + (1 - factor) * avg
- new_avg: newest average resource usage
- old_avg: old average resource usage, calculated in the last round
- factor: the decrease factor, default value is 0.9
- avg: the average resource usage of the brokers
Loadbalance: broker resource usage > average resource usage + threshold
(Figure: usage scale from 0 to 100 with avg and avg+threshold marked)
loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder
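The smoothing formula and the trigger condition on this slide can be sketched in a few lines of Python. This illustrates the arithmetic only and is not the ThresholdShedder code from PR-6772; the function names, the 10-point threshold, and the sample usage figures are assumptions made for the example.

```python
def smooth_usage(old_avg: float, avg: float, factor: float = 0.9) -> float:
    """Apply new_avg = old_avg * factor + (1 - factor) * avg from the slide."""
    return old_avg * factor + (1 - factor) * avg


def brokers_to_shed(usage_by_broker: dict, threshold: float = 10.0) -> list:
    """Return the brokers whose usage exceeds the cluster average plus threshold.

    Usage values are percentages in the range 0..100. The default threshold
    is a hypothetical value picked for this example.
    """
    if not usage_by_broker:
        return []
    cluster_avg = sum(usage_by_broker.values()) / len(usage_by_broker)
    return sorted(
        broker
        for broker, usage in usage_by_broker.items()
        if usage > cluster_avg + threshold
    )


if __name__ == "__main__":
    # A jump from 50% to 80% moves the smoothed value only part of the way.
    print(smooth_usage(old_avg=50.0, avg=80.0))
    usage = {"broker-1": 85.0, "broker-2": 40.0, "broker-3": 55.0}
    print(brokers_to_shed(usage))
```

With a factor of 0.9 a single spike changes the smoothed value by only a tenth of the difference, so a short burst is less likely to trigger shedding than sustained load.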


13.—Bookie

14.—Bookie BookKeeper PR-2327 Bookie Client


16.– Broker


18.– Consumer Consumer N message broker M entry ==> Consumer ==> GC Broker Message request Message Push

19. – Consumer (PR-6719)
avgMessagesPerEntry = avgMessagesPerEntry * avgPercent + (1 - avgPercent) * newValue
Default: avgMessagesPerEntry = 1000, avgPercent = 0.9
# Precise dispatcher flow control according to history message number of each entry
preciseDispatcherFlowControl=true
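The moving average on this slide can be sketched as follows, together with one way a dispatcher might use it to convert message permits into a number of entries to read. The slide gives only the averaging formula, the defaults, and the configuration flag; the `entries_to_read` helper and its "at least one entry" rule are assumptions made for illustration, not the code from PR-6719.

```python
def update_avg_messages_per_entry(
    avg_messages_per_entry: float,
    new_value: float,
    avg_percent: float = 0.9,
) -> float:
    """Moving average of messages per entry, following the slide's formula."""
    return avg_messages_per_entry * avg_percent + (1 - avg_percent) * new_value


def entries_to_read(available_permits: int, avg_messages_per_entry: float) -> int:
    """Estimate how many entries cover the consumer's message permits.

    Hypothetical helper: divides the permits by the running average and
    always reads at least one entry so the consumer keeps making progress.
    """
    per_entry = max(avg_messages_per_entry, 1.0)
    return max(int(available_permits / per_entry), 1)


if __name__ == "__main__":
    avg = 1000.0  # default starting value from the slide
    # Suppose the last entry read actually held 100 batched messages.
    avg = update_avg_messages_per_entry(avg, new_value=100.0)
    print(avg)
    print(entries_to_read(available_permits=1000, avg_messages_per_entry=avg))
```

Starting from the default of 1000, the estimate stays conservative: the dispatcher reads few entries at first and reads more only as the observed batch size pulls the average down.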


21. Cache (PR-6769, PR-7894)
- broker Cache Tailing Read
- bookie write Cache (Memtable)
- bookie read Cache
- OS PageCache
Has Active Cursor:
- durable cursor
- Cursor lag managedLedgerCursorBackloggedThreshold (1000 entries)
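The "Has Active Cursor" conditions can be read as a simple predicate. The sketch below is an interpretation, not the broker code from PR-6769 or PR-7894: the slide's comparison operator was lost, so treating a cursor as active when its lag is below `managedLedgerCursorBackloggedThreshold` is an assumption, and the `Cursor` class and function names are hypothetical.

```python
from dataclasses import dataclass

# Threshold in entries, the value shown on the slide.
BACKLOGGED_THRESHOLD = 1000


@dataclass
class Cursor:
    """Hypothetical stand-in for a subscription cursor on a topic."""
    name: str
    durable: bool
    lag_entries: int


def is_active(cursor: Cursor, threshold: int = BACKLOGGED_THRESHOLD) -> bool:
    """Assumed rule: a cursor is active if it is durable and not backlogged."""
    return cursor.durable and cursor.lag_entries < threshold


def has_active_cursor(cursors: list, threshold: int = BACKLOGGED_THRESHOLD) -> bool:
    """True when at least one cursor would benefit from cached tailing reads."""
    return any(is_active(cursor, threshold) for cursor in cursors)


if __name__ == "__main__":
    cursors = [
        Cursor("flink-reader", durable=False, lag_entries=10),
        Cursor("slow-sub", durable=True, lag_entries=50_000),
        Cursor("fast-sub", durable=True, lag_entries=20),
    ]
    print(has_active_cursor(cursors))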


23. ZooKeeper
- HDD ZooKeeper dataDir/dataLogDir IO bookie Journal/Ledger SSD
- ZooKeeper dataDir dataLogDir SSD
- broker/bookie ZooKeeper


25. auto bundle split Pulsar bundle split producer/consumer/reader namespace bundle auto bundle split

26.Pulsar Pulsar

27.Q&A

28. ! Before I Get Old

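StreamNative is an open-source infrastructure software company building a next-generation streaming data platform around Apache Pulsar and Apache BookKeeper. Guided by two beliefs, that event streaming is the future foundation of big data and that open source is the future of infrastructure software, it focuses on building the open-source ecosystem and community and is committed to cutting-edge technology.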