Lambda架构和Apache Pulsar
Orange Financial如何使用Lambda进行风险控制决策部署,以及如何利用Pulsar提高效率


1. How Orange Financial combat financial frauds over 50M transactions a day using Apache Pulsar Vincent Xie (Bestpay), Jia Zhai (StreamNative)

2.About us Vincent (Weisheng) Xie Jia Zhai ❏ Current Director @ Orange Financial ❏ Co-Founder of StreamNative ❏ Previous Tech lead of ML engineering ❏ Apache Pulsar PMC Member team @ Intel ❏ Apache BookKeeper PMC Member

3.Agenda ❏ Background ❏ Apache Pulsar ❏ Unified Data Processing ❏ Our Practices ❏ Q&A

4.Background Intro

5.Orange Financial Orange Financial Services Group (Chinese: 甜橙金融), formerly known as Bestpay, is an affiliate company of China Telecom. It reached 1.13 trillion CNY ($18.37 Billion) transaction volume in 2018, with 500 million registered users and 41.9 million active users. Subsidiaries: Bestpay - a mobile wallet and payment app Jieqian - a consumer loan service Orange Wealth Orange Insurance Orange Credit Orange Financial Cloud


7.Source: iiMedia Research Inc.

8.High Industry Penetration Rate Source: China Unionpay

9.Source: RSA

10.Challenges ❏ High concurrency ❏ > 50M transactions, 1 billion events a day (peek: 35K/s) ❏ Low latency demand ❏ response < 200ms ❏ Large number of batch jobs and streaming jobs

11.“A merchant’s total transaction volume ($) within the past month (30days) (current transaction included)” = sum($past_29days) + sum($today_upto_current) batch streaming

12.Architecture V1 API Gateway

13.Architecture V1 - Lambda Speed/Streaming Layer API Serving Gateway Layer Batch Layer

14.Drawbacks ❏ S/W stacks complexity ❏ Realtime / Offline / Serving stacks ❏ Multiple clusters to maintain (Kafka / Hive / Spark / Flink) ❏ Different skill sets to manipulate (Scala / Java / SQL) ❏ Segmented Logics ❏ Historical/Current ❏ Data redundancy ❏ Multiple duplications to move over

15.Introduce Apache Pulsar

16.What is Apache Pulsar?

17. “Flexible Pub/Sub Messaging Backed by durable log storage”

18.Pulsar - A cloud-native architecture Stateless Serving Durable Storage

19.Pulsar - Segment Centric Storage ❏ Topic Partition (Managed Ledger) ❏ The storage layer for a single topic partition ❏ Segment (Ledger) ❏ Single writer, append-only ❏ Replicated to multiple bookies

20.Pulsar - Pub/Sub

21.Pulsar - Topic Partitions

22.Pulsar - Segments

23.Pulsar - Stream

24.Pulsar - Stream as a unified view on data

25.Pulsar - Two levels of reading API ❏ Pub/Sub (Streaming) ❏ Read data from brokers ❏ Consume / Seek / Receive ❏ Subscription Mode - Failover, Shared, Key_Shared ❏ Reprocessing data by rewinding (seeking) the cursors ❏ Segment (Batch) ❏ Read data from storage (bookkeeper or tiered storage) ❏ Fine-grained Parallelism ❏ Predicate pushdown (publish timestamp)

26.Unified Data Processing on Pulsar

27.Architecture V2 Spark Structured Streaming Spark SQL API Gateway

28.Architecture V2 Spark Structured Streaming Spark SQL API Gateway ❏ Single Data Store (Pulsar) ❏ Single Computing Engine (Spark) ❏ Unified API

29.Pulsar-Spark ❏ Deeply integrated with Pulsar schema ❏ Pulsar topics as Structured Streams ❏ Pulsar Connectors for Spark Structured Streaming ❏ Pulsar Connectors for Spark SQL