Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar
StreamNative

1 .Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar Vincent Xie (Bestpay), Jia Zhai (StreamNative)

2 .About us
Vincent (Weisheng) Xie ❏ Current Director @ Orange Financial ❏ Previous Tech lead of ML engineering team @ Intel
Jia Zhai ❏ Co-Founder of StreamNative ❏ Apache Pulsar PMC Member ❏ Apache BookKeeper PMC Member

3 .Agenda ❏ Background ❏ Apache Pulsar ❏ Unified Data Processing ❏ Our Practices ❏ Q&A

4 .Background Intro

5 .Orange Financial Orange Financial Services Group (Chinese: 甜橙金融), formerly known as Bestpay, is an affiliate company of China Telecom. It reached a transaction volume of 1.13 trillion CNY ($18.37 Billion) in 2018, with 500 million registered users and 41.9 million active users. Subsidiaries: ❏ Bestpay - a mobile wallet and payment app ❏ Jieqian - a consumer loan service ❏ Orange Wealth ❏ Orange Insurance ❏ Orange Credit ❏ Orange Financial Cloud

6 .

7 .Source: iiMedia Research Inc.

8 .High Industry Penetration Rate Source: China Unionpay

9 .Source: RSA

10 .Challenges ❏ High concurrency ❏ > 50M transactions, 1 billion events a day (peak: 35K/s) ❏ Low latency demand ❏ Response < 200ms ❏ Large number of batch jobs and streaming jobs

11 .“A merchant’s total transaction volume ($) within the past month (30 days), current transaction included” = sum($past_29days) [batch] + sum($today_upto_current) [streaming]
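
The split on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the speakers' implementation: the record layouts (`(merchant, day, amount)` for history, `(merchant, amount)` for today's events) and the function name `monthly_volume` are hypothetical.

```python
from datetime import date, timedelta

def monthly_volume(history, today_events, merchant, today):
    """30-day volume = batch sum over the previous 29 days + streaming sum over today."""
    window_start = today - timedelta(days=29)
    # Batch part: closed days in [today - 29, today - 1], typically computed offline.
    past_29_days = sum(
        amount
        for (m, day, amount) in history
        if m == merchant and window_start <= day < today
    )
    # Streaming part: events seen so far today, including the current transaction.
    today_up_to_current = sum(
        amount for (m, amount) in today_events if m == merchant
    )
    return past_29_days + today_up_to_current
```

In a Lambda-style setup the two sums come from two different systems, which is why a single metric ends up with two code paths to keep consistent.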

12 .Architecture V1 API Gateway

13 .Architecture V1 - Lambda ❏ Speed/Streaming Layer ❏ Batch Layer ❏ Serving Layer ❏ API Gateway

14 .Drawbacks ❏ Software stack complexity ❏ Separate realtime / offline / serving stacks ❏ Multiple clusters to maintain (Kafka / Hive / Spark / Flink) ❏ Different skill sets required (Scala / Java / SQL) ❏ Segmented logic ❏ Historical / current ❏ Data redundancy ❏ Multiple copies of the data to move between systems

15 .Introduce Apache Pulsar

16 .What is Apache Pulsar?

17 . “Flexible Pub/Sub Messaging Backed by durable log storage”

18 .Pulsar - A cloud-native architecture Stateless Serving Durable Storage

19 .Pulsar - Segment Centric Storage ❏ Topic Partition (Managed Ledger) ❏ The storage layer for a single topic partition ❏ Segment (Ledger) ❏ Single writer, append-only ❏ Replicated to multiple bookies
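
The relationship between a topic partition and its segments can be modelled with a small toy, shown here as a sketch only. It is not Pulsar or BookKeeper code: the class names mirror the slide's terms, while `max_entries_per_ledger`, `replicas`, and the round-robin bookie selection are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    """A segment: append-only, one writer, replicated to several bookies."""
    ledger_id: int
    bookies: tuple                      # bookies holding a replica of this segment
    entries: list = field(default_factory=list)
    closed: bool = False

    def append(self, payload):
        if self.closed:
            raise ValueError("ledger is sealed; closed segments are immutable")
        self.entries.append(payload)
        return len(self.entries) - 1    # entry id within this ledger

@dataclass
class ManagedLedger:
    """Storage for one topic partition: an ordered list of ledgers."""
    bookie_pool: list
    replicas: int = 2                   # hypothetical replication factor
    max_entries_per_ledger: int = 3     # hypothetical rollover threshold
    ledgers: list = field(default_factory=list)

    def _roll_over(self):
        # Seal the current segment and open a new one on another set of bookies.
        if self.ledgers:
            self.ledgers[-1].closed = True
        next_id = len(self.ledgers)
        chosen = tuple(
            self.bookie_pool[(next_id + i) % len(self.bookie_pool)]
            for i in range(self.replicas)
        )
        self.ledgers.append(Ledger(next_id, chosen))

    def append(self, payload):
        if not self.ledgers or len(self.ledgers[-1].entries) >= self.max_entries_per_ledger:
            self._roll_over()
        current = self.ledgers[-1]
        return current.ledger_id, current.append(payload)
```

Because only the newest segment accepts writes, older segments can be read, replicated, or moved without coordinating with the writer.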

20 .Pulsar - Pub/Sub

21 .Pulsar - Topic Partitions

22 .Pulsar - Segments

23 .Pulsar - Stream

24 .Pulsar - Stream as a unified view on data

25 .Pulsar - Two levels of reading API ❏ Pub/Sub (Streaming) ❏ Read data from brokers ❏ Consume / Seek / Receive ❏ Subscription Mode - Failover, Shared, Key_Shared ❏ Reprocessing data by rewinding (seeking) the cursors ❏ Segment (Batch) ❏ Read data from storage (BookKeeper or tiered storage) ❏ Fine-grained parallelism ❏ Predicate pushdown (publish timestamp)
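
The difference between the two reading levels can be sketched with an in-memory toy. This does not use the Pulsar client; `SEGMENTS`, `Cursor`, and `read_segments` are hypothetical names, and each segment is assumed to be a list of `(publish_ts, payload)` entries sorted by publish time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: three segments of (publish_ts, payload) entries.
SEGMENTS = [
    [(1, "a"), (2, "b"), (3, "c")],
    [(4, "d"), (5, "e"), (6, "f")],
    [(7, "g"), (8, "h")],
]

class Cursor:
    """Pub/Sub-style reader: ordered consumption with a rewindable position."""

    def __init__(self, segments):
        self._log = [entry for segment in segments for entry in segment]
        self._pos = 0

    def receive(self):
        entry = self._log[self._pos]
        self._pos += 1
        return entry

    def seek(self, publish_ts):
        # Move to the first entry published at or after publish_ts.
        self._pos = next(
            (i for i, (ts, _) in enumerate(self._log) if ts >= publish_ts),
            len(self._log),
        )

def read_segments(segments, min_ts, max_ts, workers=3):
    """Segment-style reader: scan segments in parallel, skipping those out of range."""

    def overlaps(segment):
        # Pushdown: use first/last publish time to decide whether to open the segment.
        return bool(segment) and segment[0][0] <= max_ts and segment[-1][0] >= min_ts

    def scan(segment):
        return [(ts, p) for ts, p in segment if min_ts <= ts <= max_ts]

    selected = [s for s in segments if overlaps(s)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(scan, selected))  # map() preserves segment order
    return [entry for part in parts for entry in part]
```

For example, `Cursor(SEGMENTS)` followed by `seek(5)` resumes ordered consumption at `(5, "e")`, while `read_segments(SEGMENTS, 4, 7)` never opens the first segment and scans the other two concurrently.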

26 .Unified Data Processing on Pulsar

27 .Architecture V2 Spark Structured Streaming Spark SQL API Gateway

28 .Architecture V2 Spark Structured Streaming Spark SQL API Gateway ❏ Single Data Store (Pulsar) ❏ Single Computing Engine (Spark) ❏ Unified API

29 .Pulsar-Spark ❏ Deeply integrated with Pulsar schema ❏ Pulsar topics as Structured Streams ❏ Pulsar Connectors for Spark Structured Streaming ❏ Pulsar Connectors for Spark SQL https://github.com/streamnative/pulsar-spark
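
A hedged sketch of how one topic might be read both as a stream and as a batch through the connector, written for PySpark. The option keys (`service.url`, `admin.url`, `topic`) follow the connector README as recalled here and may differ between connector versions, so treat them as assumptions; the helper names are hypothetical, and the connector jar is assumed to be on the Spark classpath.

```python
def pulsar_options(service_url, admin_url, topic):
    """Source options; the keys are assumptions to check against your connector version."""
    return {
        "service.url": service_url,  # Pulsar broker endpoint
        "admin.url": admin_url,      # Pulsar admin (HTTP) endpoint
        "topic": topic,
    }

def read_stream(spark, options):
    """Streaming read: the topic as an unbounded Structured Streaming DataFrame."""
    return spark.readStream.format("pulsar").options(**options).load()

def read_batch(spark, options):
    """Batch read: the same topic as a bounded DataFrame for Spark SQL."""
    return spark.read.format("pulsar").options(**options).load()
```

Here `spark` is an existing `SparkSession`. Both helpers take the same options, which is the point of the unified design: the choice between batch and streaming is reduced to `read` versus `readStream` over the same data store.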
