Pulsar Functions Deep Dive--Sanjeev Kulkarni

展开查看详情

1. © 2020 SPLUNK INC. Pulsar Functions A Deep Dive | Pulsar Summit 2020 Sanjeev Kulkarni sanjeevk@splunk.com

2. © 2020 SPLUNK INC. Pulsar Functions:- A Deep Dive Agenda Brief introduction to Pulsar Functions Deep Dive into internals • Submission workflow • Scheduling workflow • Execution workflow • Java Instance concepts Current/Future Work

3. © 2020 SPLUNK INC. Pulsar Functions:- A Deep Dive Agenda Brief introduction to Pulsar Functions Deep Dive into internals • Submission workflow • Scheduling workflow • Execution workflow • Java Instance concepts Current/Future Work

4. © 2020 SPLUNK INC. Pulsar Functions:- A Brief Introduction Core Concept Bringing Serverless concepts to the Abstract View streaming world. Execute processing logic per message on input topic Function output goes to an output topic • Optional

5. © 2020 SPLUNK INC. Pulsar Functions:- A Brief Introduction Simple API Emphasis on simplicity SDK-less API Great for 90% use-cases on streams • Filtering import java.util.function.Function; public class ExclamationFunction implements Function<String, String> { • Routing @Override • Enrichment public String apply(String input) { return input + "!"; } } Not meant to replace Spark/Flink

6. © 2020 SPLUNK INC. Pulsar Functions:- A Brief Introduction Function lifecycle Flexible execution environments • Pulsar managed – Thread – Process • Externally managed – Kubernetes CRUD based Rest API

7. © 2020 SPLUNK INC. Pulsar Functions:- A Deep Dive Agenda Brief introduction to Pulsar Functions Deep Dive into internals • Submission workflow • Scheduling workflow • Execution workflow • Java Instance concepts Current/Future Work

8. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function Representation Submit to any worker FunctionConfig public class FunctionConfig { Json repr of FunctionConfig private String tenant; private String namespace; • tenant/namespace/name private String name; • Input/Output private String className; private Collection<String> inputs; • configs private String output; • lot more knobs …. private ProcessingGuarantees processingGuarantees; private Map<String, Object> userConfig; private Map<String, Object> secrets; private Integer parallelism; Function Code private Resources resources; • jars/.py/zip/etc ... }

9. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Submission Checks AuthN/AuthZ checks FunctionMetaData FunctionConfig validation message FunctionMetaData { • missing parameters FunctionDetails functionDetails; • Incorrect parameters PackageLocationMetaData packageLocation; • Local Configs uint64 version; uint64 createTime; map<int32, FunctionState> instanceStates; Function Code Validation FunctionAuthenticationSpec functionAuthSpec; • class presence, etc } Copy Code to Bookeeper

10. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager Function MetaData Manager System of record Stores all Functions • map from <FQFN, FunctionMetaData> foo -> {functionDetails : {...}, MetaData Topic Tailer version: 2, …} FQFN:- Fully Qualified Function Name Backed by Pulsar Topic MetaData Topic • Function MetaData Topic Contains a MetaData Topic Tailer

11. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager:- Update State Machine Just before Function creation/update/delete foo -> {functionDetails : {...}, MetaData Topic Tailer version: 2, …} MetaData Topic

12. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager:- Update State Machine Make a copy of the current state foo -> {functionDetails : {...}, version: 2, …} foo -> {functionDetails : {...}, MetaData Topic Tailer version: 2, …} MetaData Topic

13. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager:- Update State Machine Make a copy of the current state foo -> {functionDetails : {......}, version: 2, …} Merge the updates foo -> {functionDetails : {...}, MetaData Topic Tailer version: 2, …} MetaData Topic

14. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager:- Update State Machine Make a copy of the current state foo -> {functionDetails : {......}, version: 3, …} Merge the updates foo -> {functionDetails : {...}, Increment the version MetaData Topic Tailer version: 2, …} MetaData Topic

15. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager:- Update State Machine Make a copy of the current state Merge the updates foo -> {functionDetails : {...}, Increment the version MetaData Topic Tailer version: 2, …} Write to MetaData Topic MetaData Topic

16. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager:- Update State Machine Make a copy of the current state Merge the updates foo -> {functionDetails : {...}, Increment the version MetaData Topic Tailer version: 2, …} Write to MetaData Topic Tailer reads and verifies MetaData Topic

17. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager:- Update State Machine Make a copy of the current state Merge the updates foo -> {functionDetails : {.....}, Increment the version MetaData Topic Tailer version: 3, …} Write to MetaData Topic Tailer reads and verifies MetaData Topic Upon no conflict, tailer updates

18. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager:- When do conflicts occur? Worker 2 Worker 1 Multiple Workers foo -> {functionDetails : {...}, foo -> {functionDetails : {...}, version: 2, version: 2, …} …} MetaData Topic Tailer MetaData Topic Tailer MetaData Topic

19. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager:- When do conflicts occur? Worker 2 Worker 1 Multiple Workers foo -> {functionDetails : {...}, foo -> {functionDetails : {...}, Concurrent updates to same function version: 2, …} version: 2, …} MetaData Topic Tailer MetaData Topic Tailer MetaData Topic

20. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager:- When do conflicts occur? Worker 2 Worker 1 Multiple Workers foo -> {functionDetails : {...}, foo -> {functionDetails : {...}, Concurrent updates to same function version: 3, …} version: 3, …} First Writer Wins MetaData Topic Tailer MetaData Topic Tailer MetaData Topic

21. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Function MetaData Manager:- When do conflicts occur? Worker 2 Worker 1 Multiple Workers foo -> {functionDetails : {...}, foo -> {functionDetails : {...}, Concurrent updates to same function version: 3, …} version: 3, …} First Writer Wins MetaData Topic Tailer MetaData Topic Tailer Others are rejected MetaData Topic

22. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Advantages Worker 2 Worker 1 Submit to any worker foo -> {functionDetails : {...}, foo -> {functionDetails : {...}, Validation load scales linearly version: 3, …} version: 3, …} Deterministic State Machine MetaData Topic Tailer MetaData Topic Tailer MetaData Topic is audit log MetaData Topic

23. © 2020 SPLUNK INC. Pulsar Functions:- Submission Workflow Pitfalls Worker 2 Worker 1 MetaData topic topic growth foo -> {functionDetails : {...}, foo -> {functionDetails : {...}, MetaData Topic compaction version: 3, version: 3, …} …} non-trivial Worker Start time MetaData Topic Tailer MetaData Topic Tailer All Workers know everything MetaData Topic

24. © 2020 SPLUNK INC. Pulsar Functions:- A Deep Dive Agenda Brief introduction to Pulsar Functions Deep Dive into internals • Submission workflow • Scheduling workflow • Execution workflow • Java Instance concepts Current/Future Work

25. © 2020 SPLUNK INC. Pulsar Functions:- Scheduling Workflow Pluggable Scheduler IScheduler Interface Abstracts out Scheduler Executed only on a Leader public interface IScheduler { List<Assignment> schedule(<List<Instance> unassigned, List<Instance> current, Invoked when Set<String> workers); • Function CRUD operations } – create/update – delete • Worker Changes – Unresponsive/dead workers – New workers – Periodic – Leadership changes

26. © 2020 SPLUNK INC. Pulsar Functions:- Scheduling Workflow Leader Election Leader Election Empty Coordination Topic Failover Subscription Worker 3 Worker 2 Worker 1 Active Consumer is the Leader Coordination Topic

27. © 2020 SPLUNK INC. Pulsar Functions:- Scheduling Workflow Function Assignments Worker 3 Worker 2 Worker 1 Assignment Topic {foo, 1} : worker-1, {foo, 1} : worker-1, {foo, 1} : worker-1, ... ... ... Written by the Leader Compacted based on key(FQFN + Assignment Tailer Assignment Tailer Assignment Tailer Instance Id) All workers know about all assignments Assignment Topic

28. © 2020 SPLUNK INC. Pulsar Functions:- Scheduling Workflow Assignment Topic Assignment Stores Assignment message Instance { Compacted FunctionMetaData functionMetaData = 1; int32 instanceId = 2; Key -> (FQFN + InstanceId) } message Assignment { Instance instance = 1; string workerId = 2; }

29. © 2020 SPLUNK INC. Pulsar Functions:- A Deep Dive Agenda Brief introduction to Pulsar Functions Deep Dive into internals • Submission workflow • Scheduling workflow • Execution workflow • Java Instance concepts Current/Future Work

StreamNative 是一家围绕 Apache Pulsar 和 Apache BookKeeper 打造下一代流数据平台的开源基础软件公司。秉承 Event Streaming 是大数据的未来基石、开源是基础软件的未来这两个理念,专注于开源生态和社区的构建,致力于前沿技术。
关注他