- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Unified Data Processing with Apache Flink and Apache Pulsar——Seth Wiesman
展开查看详情
1 .Unified Data Processing with Apache Flink and Apache Pulsar Seth Wiesman Senior Solutions Architect @ Ververica Committer Apache Flink © 2019 Ververica
2 .About Ververica (the company formerly known as “data Artisans”) Original Creators of Enterprise Stream Processing Subsidiary of Apache Flink® With Ververica Platform Alibaba Group 2 © 2019 Ververica
3 . Apache Flink 3 © 2019 Ververica
4 .Apache Flink 4 © 2019 Ververica
5 . Apache Flink at The "Singles Day" (11/11/2019) containers data size throughput latency state size Sub- 2M 985 PB 2.5 B Second 100TB events / sec 5 © 2019 Ververica
6 . Why Stream Processing? 6 © 2019 Ververica
7 . Stream Processing is real-time data processing and real-time data-driven actions 7 © 2019 Ververica
8 . Stream Processing is the unification of real-time and offline analytics 8 © 2019 Ververica
9 . Stream Processing is the intersection of data analytics and applications 9 © 2019 Ververica
10 . Stream Processing is to event-driven applications what the database is to request/response apps 10 © 2019 Ververica
11 . Stream Processing is a flexible and extensible architecture for data-driven applications 11 © 2019 Ververica
12 .Stream Processing changes how Applications and Data interact request/trigger result/response event stream event stream Stream Application / Processor Business Logic events are the data events act as triggers Application / application logic triggered Business Logic by events/changes (Datalake, Database) Batch Proc. or Req/resp. Stream Processing 12 © 2019 Ververica
13 . What is Stream Processing for? Ad-hoc queries, data exploration, Most business logic ML model training query/logic changes fast data changes fast data changes slowly query/logic changes slowly Batch Proc. or Req/resp. Continuous Streaming 13 © 2019 Ververica
14 . The Spectrum of Streaming Data Use Cases machine learning unified offline/ real-time behavior modeling real-time ML model model training real-time analytics (recommenders, pricing, ..) training/evaluation data warehousing continuous continuous monitoring real-time alerts distributed OLAP / BI / reporting ETL (position, risk, …) (fraud, security, …) OLTP-style apps more lag time more real time 14 © 2019 Ververica
15 .Stateful Single Record Processing 15 © 2019 Ververica
16 .Everything is a Stream Streams Of Records in a Log or MQ 16 © 2019 Ververica
17 .Everything is a Stream Stream of Requests/Responses to/from Services GET /a/b POST /b/c PUT /e/f Service 200 404 200 200 403 DB à event sourcing architecture 17 © 2019 Ververica
18 .Everything is a Stream Stream of Rows in a Table or in Files 2016-3-1 12:00 am 2016-3-1 1:00 am 2016-3-1 2:00 am … 2016-3-11 10:00pm 2016-3-11 11:00pm 2016-3-12 12:00am 2016-3-12 1:00am 2016-3-12 2:00am 2016-3-12 3:00am 18 © 2019 Ververica
19 .Everything is a Stream Stream of Rows in a Table or in Files 2016-3-1 12:00 am 2016-3-1 1:00 am 2016-3-1 2:00 am … 2016-3-11 10:00pm 2016-3-11 11:00pm 2016-3-12 12:00am 2016-3-12 1:00am 2016-3-12 2:00am 2016-3-12 3:00am a batch 19 © 2019 Ververica
20 .Everything is a Stream Streams may span storage systems Parquet files Avro records 2016-3-1 12:00 am 2016-3-1 1:00 am 2016-3-1 2:00 am … 2016-3-11 10:00pm 2016-3-11 11:00pm more distant past recent past (e.g., compressed files in DFS/Object Store) (e.g., events in MQ/Log) 20 © 2019 Ververica
21 .21 © 2019 Ververica
22 .Bounded and Unbounded Streams 22 © 2019 Ververica
23 .Components of a Streaming Data Architecture (Apache Flink) Results (Views) (K/V stores, databases) Stream Processing Stream Processing Stream Processing Log / Stream Storage Triggered (Pulsar) Applications Event producers (applications, servers, databases, sensors) 23 © 2019 Ververica
24 . Apache Flink: Analytics and Applications on Streaming Data Stateful Stream Processing Event-driven Streaming Analytics Applications Streams, State, Time SQL and Tables Stateful Functions Flink Runtime Stateful Computations over Data Streams 24 © 2019 Ververica
25 . Stateful Stream Processing 25 © 2019 Ververica
26 . Apache Flink: Analytics and Applications on Streaming Data Stateful Stream Processing Event-driven Streaming Analytics Applications Streams, State, Time SQL and Tables Stateful Functions Flink Runtime Stateful Computations over Data Streams 26 © 2019 Ververica
27 .Stateful Stream Processing Source (Stream) Source (Static) Computation State Computation State Transformation Computation Computation State Sink Sink 27 © 2019 Ververica
28 .Example Use Cases • Real time search and recommendation models (e.g., Alibaba) • Build a real-time session behavior profile of users (e.g., Netflix) • Real time trade settlement dashboard (e.g., UBS) • Real time revenue accounting (various AdTechs) • Machine Learning-based anomaly/fraud detection (e.g., ING, Microsoft) • Real-time data refinement and data pipelines (many) 28 © 2019 Ververica
29 .DataStream API val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer011(…)) Source val events: DataStream[Event] = lines.map((line) => parse(line)) Transformation val stats: DataStream[Statistic] = stream .keyBy("sensor") .timeWindow(Time.seconds(5)) Windowed Transformation .sum(new MyAggregationFunction()) stats.addSink(new RollingSink(path)) Sink Streaming Dataflow Source Transform Window Sink (state read/write) 29 © 2019 Ververica