Large scale log pipeline using Apache Pulsar - Nozomi Kurihara


1. Large scale log pipeline using Apache Pulsar
Yahoo Japan Corporation
Nozomi Kurihara
June 18th, 2020

2. Who am I?
Nozomi Kurihara
• Software engineer at Yahoo! JAPAN (since April 2012)
• Working on the internal messaging platform built on Apache Pulsar
• Committer of Apache Pulsar
• (Hobby: board and video games!)
Copyright (C) 2020 Yahoo Japan Corporation. All Rights Reserved.

3. Agenda
  1. Apache Pulsar at Yahoo! JAPAN
     - About Yahoo! JAPAN
     - Why Pulsar was chosen
     - Architecture and performance
     - Use cases
  2. Large scale log pipeline

4. Apache Pulsar at Yahoo! JAPAN

5. Yahoo! JAPAN
https://www.yahoo.co.jp/

6. Yahoo! JAPAN – 3 numbers
• 100+ services
• 150,000+ servers (real)
• 49,010,000+ login users per month (2019/06)

7. Pulsar at Yahoo! JAPAN
• We have used Apache Pulsar as a centralized messaging platform for 3.5 years
• One Pulsar maintainer team operates it, and many teams (services) use Pulsar as "tenants"
[Diagram: Services A, B, and C each have a producer and a consumer attached to their own topic (Topic A, B, C) on a single Pulsar instance run by the Pulsar team.]

8. Pulsar at Yahoo! JAPAN - Users
More and more services are starting to use Pulsar!
• 270+ tenants
• 4,400+ topics
• ~50K publishes/s
• ~150K consumes/s
Typical use cases:
• Notification
• Job queueing
• Log pipeline

9. Pulsar community in Japan
TechBlog
- https://techblog.yahoo.co.jp/entry/20200312818173/
- https://techblog.yahoo.co.jp/entry/20200413827977/
- https://techblog.yahoo.co.jp/entry/2020060330002394/
Apache Pulsar Meetup Japan (in Tokyo)
- https://japan-pulsar-user-group.connpass.com/

10. Why Pulsar was chosen

11. Why did Yahoo! JAPAN choose Pulsar?
Large number of customers → High performance & scalability
Large number of services → Multi-tenancy
Sensitive/mission-critical messages → Security & durability
Multiple data centers → Geo-replication
Pulsar meets all requirements!

12. Multi-tenancy
Share one Pulsar with all YJ services → low hardware and labor costs
[Diagram: on one side, Services A, B, and C each run their own MQ between producer and consumer; on the other, each service uses its own topic on one shared Pulsar operated by the Pulsar team.]

13. Multi-tenancy – self-service
Users can create, configure, and delete their topics by themselves → management of topics is delegated to users
Internal Web UI tool to manage topics (will be replaced with pulsar-manager):
• Create tenant
• Create namespace
• See topic stats
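
Self-service works because every Pulsar topic lives under a tenant and a namespace, so ownership can be delegated per tenant. The following is a minimal sketch of how a fully qualified topic name is composed. The tenant, namespace, and topic values are hypothetical, and rejecting slashes inside a component is a simplification made for this sketch.

```python
def topic_name(tenant: str, namespace: str, topic: str, persistent: bool = True) -> str:
    """Build a fully qualified Pulsar topic name: domain://tenant/namespace/topic."""
    for part in (tenant, namespace, topic):
        # Simplification for this sketch: components must be non-empty, without "/".
        if not part or "/" in part:
            raise ValueError(f"invalid name component: {part!r}")
    domain = "persistent" if persistent else "non-persistent"
    return f"{domain}://{tenant}/{namespace}/{topic}"


# Hypothetical names, for illustration only.
print(topic_name("service-a", "notifications", "contents-updated"))
```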

14. Architecture and performance

15. Clusters in Yahoo! JAPAN
For each cluster:
• 20 WS proxies
• 15 Brokers
• 10 Bookies
• 5 ZKs
[Diagram: two clusters, West and East, each with WebSocket proxies, brokers, bookies, and ZK, linked by geo-replication.]

16. Performance – experimental settings
• Pulsar version: 2.3.2 (Broker) / 2.4.1 (Client)
• Tool: openmessaging-benchmark
• Message size: 1 KB
• Partitions: 1, 16, 32
• Rate (attempted): 100,000 and 500,000
• Server spec:
           CPU              Memory   Disk                                   NIC
  Broker   2.00GHz / 2CPU   768GB    SATA SSD 240GB x2 (RAID1)              10GBaseT
  Bookie   2.00GHz / 2CPU   768GB    Journal: SATA SSD 240GB x2 (RAID1)     10GBaseT
                                     Ledger: SATA HDD 10TB x12 (RAID1+0)

17. Performance – experimental results
- With 16 or 32 partitions, the publish rate reaches 500,000 msg/s; with 1 partition it does not
- The maximum publish rate with 1 partition appears to be about 200,000 msg/s
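
To put these message rates next to the 10GBaseT NICs in the test servers, the payload throughput can be estimated as follows. This is a back-of-the-envelope sketch that assumes 1 KB means 1,000 bytes and ignores protocol headers and replication traffic.

```python
def throughput_gbps(msgs_per_sec: int, msg_size_bytes: int = 1000) -> float:
    """Payload throughput in gigabits per second (headers and replication excluded)."""
    return msgs_per_sec * msg_size_bytes * 8 / 1e9


for rate in (200_000, 500_000):
    print(f"{rate} msg/s -> {throughput_gbps(rate):.1f} Gbps")
```

Under these assumptions, 200,000 msg/s corresponds to about 1.6 Gbps of payload and 500,000 msg/s to about 4.0 Gbps.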

18. Tuning example (Bookie)
           CPU              Memory   Disk                                   NIC
  Broker   2.00GHz / 2CPU   768GB    SATA SSD 240GB x2 (RAID1)              10GBaseT
  Bookie   2.00GHz / 2CPU   768GB    Journal: SATA SSD 240GB x2 (RAID1)     10GBaseT
                                     Ledger: SATA HDD 10TB x12 (RAID1+0)
Problem:
• As the number of users grows, writes to the SSD increase
• That shortens the lifespan of the SSD (we actually saw frequent SSD failures)
Solution: increase journalMaxGroupWaitMSec from 1 to 2
→ Writes decreased by 30% at the cost of slightly higher latency
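
Why a longer group wait reduces journal writes can be illustrated with a toy model of group commit. This is a deliberately simplified sketch, not BookKeeper's implementation: it assumes a group opens at its first entry and is flushed exactly `max_wait_ms` later, and it ignores every other flush trigger. The arrival pattern is hypothetical.

```python
def count_flushes(arrivals_ms, max_wait_ms):
    """Count journal flushes when entries are grouped for up to max_wait_ms.

    Simplified model: a group opens at its first entry and is flushed
    max_wait_ms later; entries arriving before that deadline share the flush.
    """
    flushes = 0
    deadline = None
    for t in sorted(arrivals_ms):
        if deadline is None or t >= deadline:
            flushes += 1               # a new group starts, costing one flush
            deadline = t + max_wait_ms
    return flushes


# Hypothetical workload: one entry every 0.5 ms for one second.
arrivals = [i * 0.5 for i in range(2000)]
print(count_flushes(arrivals, 1), count_flushes(arrivals, 2))
```

In this toy workload, doubling the wait halves the number of flushes. The 30% reduction reported above comes from production traffic, where arrivals are irregular and other flush conditions apply, so the two figures are not expected to match.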

19. Use cases

20. Case 1 – Notification of contents update
• Various content files (weather, map, news, etc.) are pushed from partner companies to Yahoo! JAPAN
• A notification is sent to a topic when contents are updated
• Once services receive the notification, they fetch the contents from the file server
[Diagram: partner companies push files to an FTP server (ftpd). ① A producer on the FTP server sends a notification to a Pulsar topic. ② Consumers in Services A, B, and C receive the notification. ③ The services fetch the content files from the FTP server.]
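
The three steps above can be sketched as two small functions. This is an illustration under assumptions, not the production code: `producer` and `consumer` are expected to behave like the objects of the pulsar-client Python library (`send`, `receive`, `acknowledge`, and `data()` on a message), the JSON payload layout is hypothetical, and `fetch` stands in for whatever downloads the file.

```python
import json


def send_update_notification(producer, path: str) -> None:
    """Step 1: announce that a content file changed (payload layout is hypothetical)."""
    producer.send(json.dumps({"path": path}).encode("utf-8"))


def handle_next_notification(consumer, fetch) -> str:
    """Steps 2 and 3: receive one notification, fetch the file, then acknowledge."""
    msg = consumer.receive()
    path = json.loads(msg.data().decode("utf-8"))["path"]
    fetch(path)                   # e.g. download the file from the file server
    consumer.acknowledge(msg)     # acknowledge only after the fetch succeeded
    return path
```

With the real client, such objects would typically come from `pulsar.Client(service_url)` via `create_producer(topic)` and `subscribe(topic, subscription_name)`; the service URL, topic, and subscription name depend on the deployment.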

21. Case 2 – Job queuing in mail service
• Asynchronously execute heavy jobs such as mail indexing
• Producers register jobs to Pulsar
• Consumers take jobs from Pulsar at their own pace
[Diagram: on a mail BE server, the handler for indexing requests registers a job through a producer to a Pulsar topic. A consumer on a mail BE server takes and processes the job, and re-registers it through a producer if it fails.]
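
A minimal sketch of the consumer side of this job queue follows. It assumes that "re-register" means publishing the same payload to the topic again, and that `consumer` and `producer` expose the `receive`, `acknowledge`, and `send` calls of the pulsar-client Python library. Retry limits and backoff are not described in the talk and are left out.

```python
def process_one_job(consumer, producer, run_job) -> bool:
    """Take one job, run it, and re-register it if it fails."""
    msg = consumer.receive()
    try:
        run_job(msg.data())
        ok = True
    except Exception:
        producer.send(msg.data())    # re-register the job for a later attempt
        ok = False
    consumer.acknowledge(msg)        # the original message is finished either way
    return ok
```

Because the consumer pulls the next job only when it calls `receive`, a slow indexing job delays that worker alone and does not push back on the producers.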

22. Case 3 – Kafka alternative
We have an internal FaaS system using Apache OpenWhisk
Problem: the FaaS team had to maintain Apache Kafka
Solution: migrate from Kafka to our internal Pulsar
Pulsar Kafka Wrapper needs only a few configuration changes (.pom, topic name, etc.)
  <dependency>
-   <groupId>org.apache.kafka</groupId>
-   <artifactId>kafka-clients</artifactId>
-   <version>0.10.2.1</version>
+   <groupId>org.apache.pulsar</groupId>
+   <artifactId>pulsar-client-kafka</artifactId>
+   <version>2.4.0</version>
  </dependency>

23. Large scale log pipeline

24. Situation
[Diagram: service developers deploy to computing platforms (FaaS, PaaS, CaaS, …) and monitor the logs/metrics they produce.]

25. Yamas
• Metrics monitoring / alerting platform (SaaS)
• Originally developed at Verizon Media
• Will be open-sourced soon!

26. Scale
• Total log volume: 1.4~3.8 TB/h
• Peak traffic: 10+ Gbps
• The number of platforms (PFs) will keep increasing
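
The hourly volume and the peak bit rate can be put on the same scale with a short conversion. This sketch assumes decimal units (1 TB = 10^12 bytes) and yields averages only.

```python
def tb_per_hour_to_gbps(tb_per_hour: float) -> float:
    """Average bit rate in Gbps for a given hourly volume (1 TB = 10**12 bytes)."""
    return tb_per_hour * 1e12 * 8 / 3600 / 1e9


for volume in (1.4, 3.8):
    print(f"{volume} TB/h -> {tb_per_hour_to_gbps(volume):.1f} Gbps average")
```

Under that assumption the averages are roughly 3.1 and 8.4 Gbps, which is consistent with peaks above 10 Gbps.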

27. Legacy architecture
[Diagram: on each computing PF (PaaS, CaaS, …), apps send data through a dedicated Splunk agent and a dedicated Yamas agent directly to the monitoring PFs (Splunk, Yamas).]
Drawbacks:
• Need to install a dedicated "agent" for each monitoring PF
• Difficult to scale out
• Traffic spikes directly affect the monitoring PFs

28. Motivation
Remove the dedicated agent for each monitoring PF:
- No need for specific knowledge or extra components
- Easier troubleshooting
Decouple sender and receiver PFs by introducing a message queueing layer:
- Scalability
- Resiliency

29. New architecture
[Diagram: apps on each computing PF (PaaS, CaaS, …) publish through a Pulsar producer to per-destination topics (Splunk topic, Yamas topic); Pulsar consumers deliver the data to the monitoring PFs (Splunk, Yamas).]
Benefits:
• Single library
• Easy to scale out
• Traffic spikes are mitigated by the queueing layer
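
How a queueing layer shields the monitoring side from spikes can be shown with a small in-memory simulation. It is a stand-in for the idea only, not for Pulsar itself: the traffic numbers and the per-tick consumer capacity are hypothetical.

```python
def simulate_queue(arrivals, capacity):
    """Return (delivered per tick, backlog per tick) for a queue in front of a consumer.

    arrivals: messages produced in each tick; capacity: max messages consumed per tick.
    """
    backlog = 0
    delivered, backlogs = [], []
    for produced in arrivals:
        backlog += produced
        out = min(backlog, capacity)   # the consumer never takes more than it can handle
        backlog -= out
        delivered.append(out)
        backlogs.append(backlog)
    return delivered, backlogs


# Hypothetical traffic with a spike in the third tick.
delivered, backlogs = simulate_queue([100, 100, 500, 100, 100, 0, 0], capacity=200)
print(delivered)
print(backlogs)
```

The spike of 500 never reaches the consumer directly. It is held as backlog and drained at the consumer's own rate over the following ticks, which is the decoupling the new architecture relies on.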
