Apache Pulsar at Yahoo! JAPAN

Yahoo! Japan is a heavy user of Apache Pulsar. We have run Pulsar as a private messaging platform ever since it was open-sourced. The platform serves over 170 tenants (auction, shopping, maps, mail, etc.) and 4K topics, with 50K messages produced and consumed every second.

In this talk, we will introduce Yahoo! Japan's use cases:

Log pipeline
Notification
Job queueing
Migration from Kafka

We will also introduce our current projects: the Pulsar Node.js client (open source) and a Web UI tool for managing topics (to be open-sourced soon).


1. Apache Pulsar at Yahoo! JAPAN
Yahoo Japan Corporation
Nozomi Kurihara
Aug. 17th, 2019

2. Who am I?
Nozomi Kurihara
• Software engineer at Yahoo! JAPAN (April 2012 ~)
• Working on internal messaging platform using Apache Pulsar
• Committer of Apache Pulsar
• (Hobby: Board / video games!)
Copyright (C) 2018 Yahoo Japan Corporation. All Rights Reserved.

3. Agenda
   1. What is Apache Pulsar?
   2. Why did Yahoo! JAPAN choose Apache Pulsar?
   3. How does Yahoo! JAPAN use Apache Pulsar?
   4. Future plans

4. What is Apache Pulsar?

5. Apache Pulsar
Flexible pub-sub system backed by durable log storage
▪ History:
  › 2014 Development started at Yahoo! Inc.
  › 2015 Available in production in Yahoo! Inc.
  › Sep. 2016 Open-sourced (Apache License 2.0)
  › June 2017 Moved to Apache Incubator Project
  › Sep. 2018 Graduated as Top Level Project!
▪ Competitors:
  › Apache Kafka
  › RabbitMQ
  › Apache ActiveMQ
  › Apache RocketMQ
  etc.
▪ Users:
  › Verizon Media (Yahoo! Inc.)
  › Comcast
  › The Weather Channel
  › Mercado Libre
  › Streamlio
  › Yahoo! JAPAN
  etc.

6. Pub-Sub messaging
Message transmission from one system to another via a Topic
▪ Producers publish messages to Topics
▪ Consumers receive messages only from Topics to which they subscribe
▪ Decoupled (no need to know each other) → asynchronous, scalable, resilient
[Diagram: a Producer publishes messages (logs, notifications, etc.) to a Topic in the pub-sub system; Consumers 1, 2, and 3 subscribe to the Topic]
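The decoupling described on this slide can be sketched with a toy in-memory topic. This is only an illustration in plain Python, not the Pulsar client API; the `Topic` class and the names `notifications` and `consumer-1` to `consumer-3` are hypothetical.

```python
from collections import defaultdict, deque


class Topic:
    """Toy in-memory topic: each subscription gets its own message queue."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = defaultdict(deque)

    def subscribe(self, subscription_name):
        # Creating the queue registers the subscription. In this toy model,
        # messages published before this call are not delivered to it.
        return self.subscriptions[subscription_name]

    def publish(self, message):
        # The producer only knows the topic, never the individual consumers.
        for queue in self.subscriptions.values():
            queue.append(message)


topic = Topic("notifications")
consumer_1 = topic.subscribe("consumer-1")
consumer_2 = topic.subscribe("consumer-2")
topic.publish("content updated")
consumer_3 = topic.subscribe("consumer-3")  # subscribes after the publish

received_1 = list(consumer_1)
received_2 = list(consumer_2)
received_3 = list(consumer_3)
```

Both early subscribers get their own copy of the message, while the late subscriber gets nothing. The producer code is the same regardless of how many consumers exist, which is the point of the decoupling.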

7. Architecture
■ 3 components:
  ‣ Broker
  ‣ Bookie
  ‣ ZooKeeper
[Diagram: Producer and Consumer; Pulsar Cluster with Brokers 1-3, Bookies 1-3, and Local ZK; Configuration Store (Global ZK)]

8. Architecture - Broker
■ Broker
  ‣ Serving node for clients' requests
  ‣ No data locality (stateless)
[Diagram: Pulsar Cluster layout, as on slide 7]

9. Architecture - Bookie
■ Bookie (Apache BookKeeper)
  ‣ Storage node for messages
  ‣ Durable, Scalable, Consistent, Fault-tolerant, Low-latency
Apache BookKeeper: distributed write-ahead log system
[Diagram: Pulsar Cluster layout, as on slide 7]
Copyright © 2016 - 2018 The Apache Software Foundation, licensed under the Apache License, version 2.0.

10. Architecture - ZooKeeper
■ Apache ZooKeeper
  ‣ Stores metadata and configuration
  ‣ Local ZK: within local cluster
  ‣ Configuration Store: across all clusters
[Diagram: Pulsar Cluster layout, as on slide 7]

11. Why did Yahoo! JAPAN choose Apache Pulsar?

12. Yahoo! JAPAN
https://www.yahoo.co.jp/

13. Yahoo! JAPAN – 3 numbers
▪ 100+ services
▪ 150,000+ servers (real)
▪ 93,000,000+ unique browsers (avg in 2018/7-9)
Image: aflo

14. Why did Yahoo! JAPAN choose Pulsar?
▪ Large number of customers → High performance & scalability
▪ Large number of services → Multi-tenancy
▪ Sensitive/mission-critical messages → Durability
▪ Multiple data centers → Geo-replication
Pulsar meets all these requirements!

15. Scalability
Just adding Brokers/Bookies increases serving/storage capacity!
(no special operation, e.g. data rebalancing, is required)
[Diagram: Broker X is added next to Brokers 1-3 for more serving capacity; Bookie Y is added next to Bookies 1-3 for more storage capacity]

16. Multi-tenancy
Multiple services can share one Pulsar system
▪ Just use Pulsar as a "Tenant" → no need to maintain own messaging system
▪ Authentication/Authorization mechanism protects messages from interception
[Diagram: Services A, B, C, and D each have their own Producer, Topic, and Consumer inside one Pulsar system; Authentication/Authorization blocks unauthorized access]
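As a rough illustration of how tenants stay separated, Pulsar topic names carry the tenant and namespace (`persistent://tenant/namespace/topic`), and permissions are granted to roles. The sketch below models that idea in plain Python. It is a simplified toy, not Pulsar's authorization implementation; the `Grants` structure, the role names, and the namespaces `bids` and `indexing` are assumptions made for the example.

```python
from typing import Dict, NamedTuple, Set, Tuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(full_name: str) -> TopicName:
    """Split 'persistent://tenant/namespace/topic' into its parts."""
    domain, sep, rest = full_name.partition("://")
    parts = rest.split("/")
    valid_domain = domain in ("persistent", "non-persistent")
    if not sep or not valid_domain or len(parts) != 3 or not all(parts):
        raise ValueError(f"not a valid topic name: {full_name!r}")
    return TopicName(domain, *parts)


# Hypothetical grants: (tenant, namespace) -> role -> allowed actions.
Grants = Dict[Tuple[str, str], Dict[str, Set[str]]]


def is_allowed(grants: Grants, role: str, action: str, full_name: str) -> bool:
    """Return True only if the role was granted the action on the namespace."""
    name = parse_topic(full_name)
    roles = grants.get((name.tenant, name.namespace), {})
    return action in roles.get(role, set())


grants: Grants = {
    ("auction", "bids"): {"auction-app": {"produce", "consume"}},
    ("mail", "indexing"): {"mail-app": {"produce", "consume"}},
}
own = is_allowed(grants, "auction-app", "produce",
                 "persistent://auction/bids/new-bid")
other = is_allowed(grants, "auction-app", "consume",
                   "persistent://mail/indexing/jobs")
```

Here `own` is allowed and `other` is denied: the auction service cannot read the mail tenant's topic even though both share the same system.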

17. Geo-replication
Pulsar can replicate messages to another cluster
   1. Producers only have to publish messages to Pulsar in the same data center
   2. Pulsar asynchronously replicates messages to another cluster
   3. Consumers can receive messages from the same data center
[Diagram: a Producer in data center A publishes to a Topic on Pulsar Cluster A; the Topic is geo-replicated to Pulsar Cluster B in data center B; Consumers in each data center read from their local cluster]
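The three steps above can be sketched as a toy model of asynchronous replication between two clusters. This is plain Python for illustration only and does not reflect how Pulsar's replicator is implemented; the `Cluster` class, its `outbox`, and `replicate_once` are hypothetical names.

```python
from collections import deque


class Cluster:
    """Toy cluster holding one topic plus a queue of messages to replicate."""

    def __init__(self, name):
        self.name = name
        self.topic = []         # messages visible to local consumers
        self.outbox = deque()   # locally published messages awaiting replication
        self.peers = []

    def publish(self, message):
        # Step 1: the producer talks only to its local cluster.
        self.topic.append(message)
        self.outbox.append(message)

    def replicate_once(self):
        """Step 2: ship pending messages to every peer; return the count sent."""
        sent = 0
        while self.outbox:
            message = self.outbox.popleft()
            for peer in self.peers:
                # Replicated messages skip the peer's outbox, so they are
                # never sent back to the cluster they came from.
                peer.topic.append(message)
            sent += 1
        return sent


west, east = Cluster("west"), Cluster("east")
west.peers.append(east)
east.peers.append(west)

west.publish("m1")
east_before = list(east.topic)   # replication has not run yet
sent = west.replicate_once()
east_after = list(east.topic)    # step 3: east consumers read locally
```

The publish returns before the remote cluster has the message, which is what "asynchronously" means here: the remote copy appears only after the replication step runs.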

18. How does Yahoo! JAPAN use Apache Pulsar?

19. System architecture in Yahoo! JAPAN
[Diagram: two clusters, West and East, linked by geo-replication; each has WebSocket Proxies, Brokers, Bookies, and ZK. Clients shown: Service A (Node.js), Service B (Java), Service C (C++). Prometheus + Grafana collect and visualize metrics.]
For each cluster:
・20 WSs
・15 Brokers
・10 Bookies
・5 ZKs

20. Users
More and more services start to use Pulsar!
• 210+ tenants
• 4000+ topics
• ~100K publishes/s
• ~180K subscribes/s
Typical use cases:
• Notification
• Job queueing
• Log pipeline

21. Case 1 – Notification of contents update
▪ Various content files are pushed from partner companies to Yahoo! JAPAN
▪ A notification is sent to a topic when contents are updated
▪ Once services receive the notification, they fetch the contents from the file server
[Diagram: partner companies push content files (weather, map, news, etc.) to an FTP server (ftpd); ① a Producer on the FTP server sends a notification to a Pulsar Topic; ② Consumers in Services A, B, and C receive the notification; ③ the services fetch the content files]

22. Case 2 – Job queueing in mail service
▪ Indexing of mail can be heavy → execute it asynchronously
▪ Producers register jobs to Pulsar
▪ Consumers take jobs from Pulsar at their own pace
[Diagram: on a mail BE server, the handler for indexing requests registers a job through a Producer to a Pulsar Topic; on a mail BE server, a Consumer takes and processes a job, and a Producer re-registers the job if it fails]
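The take, process, and re-register loop on this slide can be sketched with an in-memory queue. This is a minimal sketch in plain Python, not the mail service's code or the Pulsar API; the `max_attempts` limit, the `failures` table, and the job names are assumptions added so the example terminates.

```python
from collections import deque


def run_worker(queue, handler, max_attempts=3):
    """Take jobs one at a time; re-register a job when its handler fails."""
    done, dropped = [], []
    while queue:
        job, attempt = queue.popleft()
        try:
            handler(job)
            done.append(job)
        except Exception:
            if attempt + 1 < max_attempts:
                queue.append((job, attempt + 1))  # re-register at the back
            else:
                dropped.append(job)  # give up after max_attempts tries
    return done, dropped


# Hypothetical failure counts: how many times each job fails before succeeding.
failures = {"mail-2": 1, "mail-3": 5}


def index_mail(job):
    if failures.get(job, 0) > 0:
        failures[job] -= 1
        raise RuntimeError(f"indexing failed for {job}")


jobs = deque((job, 0) for job in ["mail-1", "mail-2", "mail-3"])
done, dropped = run_worker(jobs, index_mail)
```

In this run `mail-2` succeeds on its second attempt, while `mail-3` is dropped after three failed attempts. Because a failed job goes to the back of the queue, one bad job does not block the jobs behind it.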

23. Case 3 – Log pipeline
▪ Publisher: computing platforms on which YJ applications are running
▪ Subscriber: data platforms (monitoring, analyzing, storing etc.)
[Diagram: service developers deploy apps (app1, app2, app3, …) and containers (container1, container2, …) to the computing platforms (PaaS, CaaS, …); the platforms publish to Pulsar topics PaaS_logs, CaaS_logs, …; the data platforms (Monitoring, Analyzing, Storing, …) subscribe, and service developers check the logs there]

24. Case 3 – Log pipeline
Logs can have destinations → Consumers need to filter them
[Diagram: same pipeline as slide 23; logs are labeled "To: Analyzing", "To: Monitoring", and so on, and each data platform (Monitoring, Analyzing, Storing) runs its own Filtering step that discards logs addressed elsewhere]

25. Case 3' – Log pipeline + filtering (Future plan)
▪ Filtering on Pulsar side
▪ Pulsar Functions are helpful for filtering!
[Diagram: Pulsar Functions inside Pulsar filter PaaS_logs and CaaS_logs into separate topics "For Monitoring", "For Analyzing", and "For Storing", which the matching data platforms consume]
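The routing decision such a filter has to make can be sketched as a pure function. This is plain Python rather than the Pulsar Functions API, and the record layout (a `to` field listing destinations) and the output topic names are assumptions; the slides do not show the actual log format.

```python
from typing import Dict, List

# Hypothetical mapping from a log's destination label to an output topic.
DESTINATION_TOPICS: Dict[str, str] = {
    "monitoring": "for-monitoring",
    "analyzing": "for-analyzing",
    "storing": "for-storing",
}


def route(log: dict) -> List[str]:
    """Return the output topics for one log record; an empty list means discard."""
    topics: List[str] = []
    for destination in log.get("to", []):
        topic = DESTINATION_TOPICS.get(destination)
        # Skip unknown destinations and avoid routing to a topic twice.
        if topic is not None and topic not in topics:
            topics.append(topic)
    return topics


to_analyzing = route({"to": ["analyzing"], "msg": "GET /index 200"})
to_both = route({"to": ["monitoring", "storing"], "msg": "disk 91%"})
discarded = route({"msg": "no destination"})
```

With filtering done once on the Pulsar side, each data platform subscribes only to its own topic and no longer needs to read and discard logs meant for the others.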

26. Migration from Kafka
▪ We have an internal FaaS system using Apache OpenWhisk
▪ Problem: FaaS team had to maintain Apache Kafka
▪ Solution: migrate from Kafka to our internal Pulsar
▪ Pulsar Kafka Wrapper needs only a few configuration changes (pom.xml, topic name, etc.)

      <dependency>
    -   <groupId>org.apache.kafka</groupId>
    -   <artifactId>kafka-clients</artifactId>
    -   <version>0.10.2.1</version>
    +   <groupId>org.apache.pulsar</groupId>
    +   <artifactId>pulsar-client-kafka</artifactId>
    +   <version>2.4.0</version>
      </dependency>

27. Future plans

28. Node.js Client
Node.js users can easily use Pulsar!
Implementation:
• https://github.com/apache/pulsar-client-node
• Based on C++ Client
Done:
✅ basic functionalities (producer, consumer, reader)
✅ test codes
✅ performance scripts
Todo:
• publish to npm registry
• fix release flow
• support more features (multi-topic consume, unack etc.)

29. Admin WebUI (under development)
Administrators can easily and intuitively manage Pulsar topics!
Implementation:
• https://gist.github.com/massakam/8e9bd3ca62874f18cf3ce3ecb6db1473
• Vue.js + Express
Done:
✅ basic pages (tenants, namespaces, topics etc.)
Todo:
• open repository
• advanced commands (unload, skip-messages etc.)
• authentication to Broker
