Bestpay-Pulsar
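From this talk you will learn about:
The Lambda architecture and Apache Pulsar
How Orange Financial uses the Lambda architecture to deploy risk-control decisions, and how it uses Pulsar to improve efficiency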
1. How Orange Financial combats financial fraud over 50M transactions a day using Apache Pulsar. Speakers: Vincent Xie (Bestpay), Jia Zhai (StreamNative)
2.About us ❏ Vincent (Weisheng) Xie: Current Director @ Orange Financial; previously tech lead of the ML engineering team @ Intel ❏ Jia Zhai: Co-Founder of StreamNative; Apache Pulsar PMC Member; Apache BookKeeper PMC Member
3.Agenda ❏ Background ❏ Apache Pulsar ❏ Unified Data Processing ❏ Our Practices ❏ Q&A
4.Background Intro
5.Orange Financial: Orange Financial Services Group (Chinese: 甜橙金融), formerly known as Bestpay, is an affiliate of China Telecom. In 2018 it reached a transaction volume of 1.13 trillion CNY ($18.37 Billion), with 500 million registered users and 41.9 million active users. Subsidiaries: ❏ Bestpay: a mobile wallet and payment app ❏ Jieqian: a consumer loan service ❏ Orange Wealth ❏ Orange Insurance ❏ Orange Credit ❏ Orange Financial Cloud
7.Source: iiMedia Research Inc.
8.High Industry Penetration Rate (Source: China UnionPay)
9.Source: RSA
10.Challenges ❏ High concurrency: > 50M transactions and 1 billion events a day (peak: 35K/s) ❏ Low latency: response time < 200 ms ❏ A large number of batch jobs and streaming jobs
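A quick back-of-the-envelope check puts these load figures in perspective. This sketch assumes the 35K/s peak is counted in events (the slide does not say) and treats "> 50M" as a flat daily count even though the slide gives it as a lower bound.

```python
# Back-of-the-envelope load figures from the Challenges slide.
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

events_per_day = 1_000_000_000           # "1 billion events a day"
transactions_per_day = 50_000_000        # "> 50M transactions" (treated as exact here)
peak_events_per_sec = 35_000             # "peak: 35K/s" (assumed to be events/s)

avg_events_per_sec = events_per_day / SECONDS_PER_DAY
avg_tx_per_sec = transactions_per_day / SECONDS_PER_DAY
peak_to_avg = peak_events_per_sec / avg_events_per_sec
events_per_tx = events_per_day / transactions_per_day

print(f"average events/s:       {avg_events_per_sec:,.0f}")
print(f"average transactions/s: {avg_tx_per_sec:,.0f}")
print(f"peak / average:         {peak_to_avg:.1f}x")
print(f"events per transaction: {events_per_tx:.0f}")
```

Under these assumptions the average rate is roughly 11.6K events/s, so the peak is about three times the daily average, and each transaction generates around 20 events on average. Capacity has to be planned for the peak, all while staying inside the 200 ms response budget.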
11.“A merchant’s total transaction volume ($) within the past month (30 days, current transaction included)” = sum($past_29days) (batch) + sum($today_upto_current) (streaming)
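The formula splits one 30-day feature across two systems: a batch job precomputes the 29 complete past days, and a streaming job keeps today's running total. The sketch below shows that split in plain Python. The record layout `(merchant_id, timestamp, amount)` and the names `batch_view`, `SpeedView`, and `volume_30d` are hypothetical illustrations, not the team's actual code.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Hypothetical transaction record: (merchant_id, timestamp, amount).
Txn = tuple[str, datetime, float]


def batch_view(history: list[Txn], today: date) -> dict[str, float]:
    """Batch layer: per-merchant total over the 29 full days before `today`."""
    start = today - timedelta(days=29)
    totals: defaultdict[str, float] = defaultdict(float)
    for merchant, ts, amount in history:
        if start <= ts.date() < today:
            totals[merchant] += amount
    return dict(totals)


class SpeedView:
    """Streaming layer: per-merchant running total for the current day only."""

    def __init__(self, today: date) -> None:
        self.today = today
        self.totals: defaultdict[str, float] = defaultdict(float)

    def on_event(self, txn: Txn) -> None:
        merchant, ts, amount = txn
        if ts.date() == self.today:
            self.totals[merchant] += amount


def volume_30d(merchant: str, batch: dict[str, float], speed: SpeedView) -> float:
    """sum($past_29days) from the batch view + sum($today_upto_current) from the speed view."""
    return batch.get(merchant, 0.0) + speed.totals.get(merchant, 0.0)


today = date(2019, 6, 30)
history = [
    ("m1", datetime(2019, 6, 1, 10), 100.0),   # 29 days before today: inside the window
    ("m1", datetime(2019, 5, 31, 23), 999.0),  # 30 days before today: outside the window
    ("m1", datetime(2019, 6, 29, 12), 50.0),
]
batch = batch_view(history, today)

speed = SpeedView(today)
speed.on_event(("m1", datetime(2019, 6, 30, 9, 0), 20.0))
# The transaction being scored is fed to the speed view first, so it is included.
speed.on_event(("m1", datetime(2019, 6, 30, 9, 5), 5.0))

print(volume_30d("m1", batch, speed))  # 175.0
```

The query itself is trivial; the cost lies in keeping two separately deployed pipelines consistent about where "past 29 days" ends and "today" begins.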
12.Architecture V1 (diagram: API Gateway)
13.Architecture V1 - Lambda (diagram: API Gateway, Speed/Streaming Layer, Batch Layer, Serving Layer)
14.Drawbacks ❏ Software stack complexity: separate real-time, offline, and serving stacks; multiple clusters to maintain (Kafka / Hive / Spark / Flink); different skill sets required (Scala / Java / SQL) ❏ Segmented logic: historical and current data are handled separately ❏ Data redundancy: multiple copies of the data to move between systems
15.Introducing Apache Pulsar
16.What is Apache Pulsar?
17. “Flexible Pub/Sub Messaging Backed by durable log storage”
18.Pulsar - A cloud-native architecture (diagram: stateless serving layer, durable storage layer)
19.Pulsar - Segment-Centric Storage ❏ Topic partition (managed ledger): the storage layer for a single topic partition ❏ Segment (ledger): single writer, append-only, replicated to multiple bookies
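The segment-centric layout can be illustrated with a toy data structure. A partition is an ordered list of ledgers. Only the newest ledger accepts appends, and only from its single owner. When it fills up it is sealed and a new ledger is opened on a freshly chosen set of bookies. This is a simplified model under stated assumptions: the class names, the entry-count rollover rule, and the rotating bookie choice are illustrative. Real BookKeeper stripes entries across the ensemble using write and ack quorums, and the broker applies its own rollover policies. The returned `(ledger id, entry id)` pair mirrors how a Pulsar message ID identifies a position.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """One segment: append-only, single writer, placed on an ensemble of bookies."""
    ledger_id: int
    writer: str
    bookies: list[str]                 # bookies chosen to hold copies (copying not simulated)
    entries: list[bytes] = field(default_factory=list)
    closed: bool = False

    def append(self, writer: str, data: bytes) -> int:
        if self.closed:
            raise RuntimeError(f"ledger {self.ledger_id} is closed")
        if writer != self.writer:
            raise PermissionError("only the ledger's single writer may append")
        self.entries.append(data)
        return len(self.entries) - 1   # entry id within this ledger


class ManagedLedger:
    """Storage for one topic partition: an ordered list of ledgers."""

    def __init__(self, owner: str, bookie_pool: list[str],
                 ensemble_size: int = 3, max_entries_per_ledger: int = 4) -> None:
        if ensemble_size > len(bookie_pool):
            raise ValueError("not enough bookies for the ensemble")
        self.owner = owner             # e.g. the broker currently serving the partition
        self.bookie_pool = bookie_pool
        self.ensemble_size = ensemble_size
        self.max_entries = max_entries_per_ledger
        self.ledgers: list[Ledger] = []
        self._roll_over()

    def _roll_over(self) -> None:
        """Seal the current ledger (if any) and open a new one on a rotated ensemble."""
        if self.ledgers:
            self.ledgers[-1].closed = True
        ledger_id = len(self.ledgers)
        n = len(self.bookie_pool)
        ensemble = [self.bookie_pool[(ledger_id + i) % n] for i in range(self.ensemble_size)]
        self.ledgers.append(Ledger(ledger_id, self.owner, ensemble))

    def append(self, data: bytes) -> tuple[int, int]:
        if len(self.ledgers[-1].entries) >= self.max_entries:
            self._roll_over()
        current = self.ledgers[-1]
        return current.ledger_id, current.append(self.owner, data)  # (ledger id, entry id)

    def read_all(self):
        """Sequential read across segments in order: the partition as one log."""
        for ledger in self.ledgers:
            yield from ledger.entries


ml = ManagedLedger("broker-1", ["bookie-1", "bookie-2", "bookie-3", "bookie-4"],
                   ensemble_size=3, max_entries_per_ledger=2)
positions = [ml.append(f"msg-{i}".encode()) for i in range(5)]
print(positions)                        # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
print([led.bookies for led in ml.ledgers])
```

Consecutive segments land on different bookie sets. Because sealed segments are immutable, they can later be read or offloaded independently of the live tail.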
20.Pulsar - Pub/Sub
21.Pulsar - Topic Partitions
22.Pulsar - Segments
23.Pulsar - Stream
24.Pulsar - Stream as a unified view on data
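To see how one stream of segments can serve as both a streaming source and a batch source, consider the toy model below. A cursor walks the segments message by message and can be rewound with `seek`. A batch scan uses each segment's publish-time range to skip whole segments, then reads the remaining ones in parallel. Everything here is a hypothetical in-memory stand-in (`Message`, `Segment`, `Cursor`, `scan`), not the Pulsar client API. In Pulsar, streaming reads go through brokers, where consumers can seek by message ID or publish time, and segment reads go directly to BookKeeper or tiered storage.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    publish_time: int   # e.g. epoch millis, set when the message is published
    value: str


@dataclass
class Segment:
    messages: list[Message]

    def time_range(self) -> tuple[int, int]:
        times = [m.publish_time for m in self.messages]
        return min(times), max(times)


class Cursor:
    """Streaming view: a (segment, offset) position that advances on each receive."""

    def __init__(self, segments: list[Segment]) -> None:
        self.segments = segments
        self.position = (0, 0)

    def receive(self) -> Optional[Message]:
        seg, off = self.position
        while seg < len(self.segments):
            msgs = self.segments[seg].messages
            if off < len(msgs):
                self.position = (seg, off + 1)
                return msgs[off]
            seg, off = seg + 1, 0
        return None                     # caught up with the tail of the stream

    def seek(self, position: tuple[int, int]) -> None:
        """Rewind (or skip ahead) to reprocess from `position`."""
        self.position = position


def scan(segments: list[Segment], t_from: int, t_to: int, workers: int = 4):
    """Batch view: prune segments by publish-time range, read the rest in parallel."""
    def overlaps(seg: Segment) -> bool:
        if not seg.messages:
            return False
        lo, hi = seg.time_range()
        return hi >= t_from and lo <= t_to

    candidates = [s for s in segments if overlaps(s)]   # pushdown: skip whole segments

    def read(seg: Segment) -> list[Message]:
        return [m for m in seg.messages if t_from <= m.publish_time <= t_to]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(read, candidates))         # map keeps segment order
    return [m for part in parts for m in part], len(candidates)


segments = [
    Segment([Message(100, "a"), Message(110, "b")]),
    Segment([Message(200, "c"), Message(210, "d")]),
    Segment([Message(300, "e")]),
]
cur = Cursor(segments)
print([cur.receive().value for _ in range(5)])   # ['a', 'b', 'c', 'd', 'e']
cur.seek((1, 0))                                 # rewind to the start of segment 1
print([cur.receive().value for _ in range(3)])   # ['c', 'd', 'e']

hits, segments_read = scan(segments, 150, 250)
print([m.value for m in hits], segments_read)    # ['c', 'd'] 1
```

Both views read the same stored data, so no second copy of the history has to be kept for batch jobs.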
25.Pulsar - Two levels of reading API ❏ Pub/Sub (streaming): read data from brokers; consume / seek / receive; subscription modes Failover, Shared, and Key_Shared; reprocess data by rewinding (seeking) the cursors ❏ Segment (batch): read data directly from storage (BookKeeper or tiered storage); fine-grained parallelism; predicate pushdown on publish timestamp
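The three subscription modes differ in how a single subscription hands messages to its consumers. The sketch below is a toy dispatcher that shows each mode's ordering and scaling trade-off. It is not Pulsar's implementation. Treating the first consumer as the Failover active consumer is a simplification, and `crc32` modulo is a stand-in for Pulsar's own key hashing and hash-range assignment. The `(key, payload)` message shape and the `dispatch` name are hypothetical.

```python
import zlib
from collections import defaultdict

Msg = tuple[str, int]   # hypothetical (key, payload), e.g. key = merchant id


def dispatch(messages: list[Msg], consumers: list[str], mode: str) -> dict[str, list[Msg]]:
    """Toy model of how one subscription distributes messages among its consumers."""
    assigned: dict[str, list[Msg]] = defaultdict(list)
    if mode == "Failover":
        # One active consumer gets everything; the others are standbys.
        # (The first consumer is simply treated as active here.)
        for m in messages:
            assigned[consumers[0]].append(m)
    elif mode == "Shared":
        # Round-robin over all consumers: scales out, but per-key order is not kept.
        for i, m in enumerate(messages):
            assigned[consumers[i % len(consumers)]].append(m)
    elif mode == "Key_Shared":
        # All messages with the same key go to the same consumer, so per-key order holds.
        # crc32 modulo is a stand-in for Pulsar's key hashing and hash-range assignment.
        for m in messages:
            idx = zlib.crc32(m[0].encode()) % len(consumers)
            assigned[consumers[idx]].append(m)
    else:
        raise ValueError(f"unknown subscription mode: {mode}")
    return dict(assigned)


msgs = [("m1", 1), ("m2", 2), ("m1", 3), ("m3", 4)]
consumers = ["c1", "c2"]
for mode in ("Failover", "Shared", "Key_Shared"):
    print(mode, dispatch(msgs, consumers, mode))
```

For per-merchant features, Key_Shared is the interesting middle ground. If the key is the merchant id, several consumers can share the load while each merchant's events are still processed in order by a single consumer.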
26.Unified Data Processing on Pulsar
27.Architecture V2 (diagram: API Gateway, Spark Structured Streaming, Spark SQL)
28.Architecture V2 (diagram: API Gateway, Spark Structured Streaming, Spark SQL) ❏ Single data store (Pulsar) ❏ Single computing engine (Spark) ❏ Unified API
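A "unified API" means one piece of aggregation logic can serve both bounded (historical) input and unbounded (live) input. Spark Structured Streaming follows this idea by running the same DataFrame query in batch or streaming mode. The plain-Python sketch below only illustrates the one-code-path principle and does not use Spark's API. The function names and the `(merchant_id, amount)` row shape are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional

Row = tuple[str, float]   # hypothetical (merchant_id, amount)


def volume_by_merchant(rows: Iterable[Row], state: Optional[dict[str, float]] = None) -> dict[str, float]:
    """The single aggregation, shared by the batch and streaming paths."""
    totals: defaultdict[str, float] = defaultdict(float, state or {})   # copy; caller's state untouched
    for merchant, amount in rows:
        totals[merchant] += amount
    return dict(totals)


def run_batch(rows: list[Row]) -> dict[str, float]:
    """Bounded input: aggregate everything at once."""
    return volume_by_merchant(rows)


def run_streaming(micro_batches: Iterable[list[Row]]) -> dict[str, float]:
    """Unbounded input: apply the same aggregation per micro-batch, carrying state forward."""
    state: dict[str, float] = {}
    for batch in micro_batches:
        state = volume_by_merchant(batch, state)
    return state


rows = [("m1", 10.0), ("m2", 5.0), ("m1", 2.5), ("m3", 1.0)]
print(run_batch(rows))
print(run_streaming([rows[:1], rows[1:3], rows[3:]]))
```

Because both paths call the same function, historical and live results cannot drift apart through two separately maintained implementations, which is the "segmented logic" drawback of Architecture V1.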
29.Pulsar-Spark ❏ Deeply integrated with Pulsar schema ❏ Pulsar topics as Structured Streams ❏ Pulsar Connectors for Spark Structured Streaming ❏ Pulsar Connectors for Spark SQL https://github.com/streamnative/pulsar-spark