- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Easily Build a Smart Pulsar Stream Processor——Simon Crosby
展开查看详情
1 .Continuous Stream Processing with Pulsar and Swim Simon Crosby, CTO Swim swim.a i
2 .SwimOS is an Apache 2.0 licensed platform that makes it easy to build applications that deliver continuous intelligence from streaming data, at scale swimos.org
3 .PM25 Pollution NOX Pollution
4 . Streaming Platforms • Apache Pulsar • Apache Kafka • Apache Beam • Support pub/sub at scale • CNCF NATS • Buffer data between pubs & subs • Amazon Kinesis • Event-time ordered delivery • Google pub/sub • Azure Enterprise Data Bus • Events stored in arrival order • Salesforce Kafka • Don’t run applications • Confluent Cloud • … Ø SwimOS is a stream processor that delivers continuous intelligence from streaming data
5 . Stream Processors • Stream processors subscribe to a broker to analyze streaming event data • Their insights can be asynchronously consumed by publishing back to the broker • The broker offers a low-latency API that gives the stream processor events in real-time • Pulsar does not control execution of the stream processor
6 .SwimOS is a Stateful, Real-time Stream Processor Application: Distributed, stateful, concurrent graph of Web Agents & real-time UIs Infra: Distributed, p2p mesh of instances on k8s using WebSockets • Builds and auto-scales apps from real-world event data, creating a stateful graph that continuously computes – driven by data • Automates infrastructure operation • Load balances, secures, persists and auto-scales the application • Apps are easy to develop • Delivers unimaginable performance
7 .Major Mobile Provider • > 150M devices • > 10Gb/s of streaming data from Pulsar • Continuous analysis, aggregation & reduction • Millisecond latency • Pervasively real-time UI • Distributed across AZ 6
8 . Pulsar’s Many Pros • Event Processing – Filtering – Transformation – Counts / Windows – Alerts • Serverless is a great abstraction • SQL-style API • Storage tiering • Delivery guarantees • lll Multi-tenancy l • Replication • Scaling Database
9 . Challenges… l l • How ! many topics do you need? üü ü ü l l
10 . Challenges… l l ! l engine_temp: 290 fan_temp: 188 l coolant_vol: 25
11 . Challenges… Streaming analytics (#solved !) ☞ polling ® not real-time Client Application Client Client Client Client Continuous Intelligence •demands engine_temp: 290 Databases fan_temp:don’t 188drive computation! • Data driven computation (though in-memory is faster) • Analysis in context, everywhere, coolant_vol: concurrently 25 • What DB architecture do you need? • Stateful, in-memory, distributed • Scaling / clustering / consistency … • Pervasively real-time computation
12 .> Palo Alto, CA
13 . 60 ~600 TB/day TB/day There’s more data than your cloud could store ▶ (mostly ephemeral)
14 .Intelligence is driven by (a flow of) state changes - not raw data
15 . Users Want Stateful, Continuous, Contextual Analysis λ λ xn-1 Streams are a sequence of state changes They never stop… (so “store-then-analyze” is silly) Applications always have to have an answer “Meaning” depends on granular contextual relationships
16 . Introducing Swim Web Agents • SwimOS subscribes to event streams from real-world sources • It creates a stateful, concurrent web agent for each data source • Each web agent cleans, labels, analyzes data from its real-world twin • Agents dynamically link to related agents, creating a stateful in-memory graph • Containment, proximity… logical relationships eg: pod/cluster … • Computed relationships: correlated… • Linked web agents share their states in real-time • Web Agents are vertices in the graph • Each continually computes on its own state & state of its links, as data flows over the graph – and streams its results in real-time over its links • This is data-driven, stateful, continuous computation
17 .Web Agents Continuously Compute - Driven by Data Analyze data to determine state Real-world Stateful Web Agent Analytics Relational Analysis Relational Graph MapReduce Learning & Prediction
18 .I’m green • Web agents are stateful
19 .I’m red • Raw data can typically be discarded
20 .I’m still red • Noisy / redundant updates are discarded
21 .… … I’m green …… … … I’m still red … ……… …… …… …… … … … … No push … … … …No push … … … … Streaming data auto-scales the application – … … composed of concurrent web agents - at low cost, in … real time, as data arrives
22 .A Swim Application is an Active Graph
23 .A graph of linked active Web Agents 1000m
24 . Web Agents Link to Form a Computational Graph ① SwimOS creates a web agent for each source in streaming data Web agents continuously compute on their own state and the state of linked web agents enabling granular contextual analysis on-the-fly ③ Powerful operators for analysis, learning & prediction continuously compute on state ② Agents interlink to reflect & stream results real-world relationships
25 .Eg:Un-supervised Training & Prediction Observed Predicted D Training P ro B a c pa k ga tio n
26 .• A scaled application is a graph dynamically built from data • Objects are stateful and concurrent
27 . SwimOS Eliminates “the Stack” Developer defines entities & their relationships – as Java objects Swim builds a stateful, distributed, graph of concurrent web agents that statefully represent real-world sources, from streaming data * Web agents collaborate to analyze, learn, predict and respond on the fly = They continuously stream real- time insights to UIs & applications
28 . Pulsar and Swim: Better Together Application: Distributed, stateful, concurrent graph of Web Agents & real-time UIs Infra: Distributed, p2p mesh of instances on k8s using WebSockets • Builds and auto-scales apps from real-world event data, creating a stateful graph that continuously computes – driven by data • Automates infrastructure operation • Load balances, secures, persists and auto-scales the application • Apps are easy to develop • Delivers unimaginable performance
29 .Questions ? swim.a i