Streaming SQL Fundamentals

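This article starts from the fundamental problems and principles of real-time data processing and explains the basic concepts of its computation model, along with the way it frames the problems it solves (What/Where/When/How).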

1. Foundations of streaming SQL, or: how I learned to love stream & table theory (slides: https://s.apache.org/streaming-sql-qcon-london). Tyler Akidau (@takidau), Apache Beam PMC, Software Engineer at Google. Covering ideas from across the Apache Beam, Apache Calcite, Apache Kafka, and Apache Flink communities, with thoughts and contributions from Julian Hyde, Fabian Hueske, Shaoxuan Wang, Kenn Knowles, Ben Chambers, Reuven Lax, Mingmin Xu, James Xu, Martin Kleppmann, Jay Kreps and many more, not to mention that whole database community thing... QCon London 2018

2. Table of Contents: 01 Stream & Table Theory (Chapter 7): A. Basics; B. The Beam Model. 02 Streaming SQL (Chapter 9): A. Time-varying relations; B. SQL language extensions.

3. 01 Stream & Table Theory (TFW you realize everything you do was invented by the database community decades ago...): A. Basics; B. The Beam Model

4. Stream & table basics: https://www.confluent.io/blog/making-sense-of-stream-processing/ and https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/

5. Special theory of stream & table relativity: streams → tables: the aggregation of a stream of updates over time yields a table. tables → streams: the observation of changes to a table over time yields a stream.
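Both directions of the stream/table duality can be sketched in a few lines of Python. This is a minimal illustration, not Kafka Streams or Beam code: it assumes a stream is a sequence of `(key, delta)` updates that are summed per key, a table is a plain `dict`, and the function names `stream_to_table`, `table_to_stream`, and `changelog_to_table` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Assumption for this sketch: an update is (key, delta) and deltas are summed per key.
Update = Tuple[str, int]
Change = Tuple[str, Optional[int], int]  # (key, old_value, new_value)


def stream_to_table(updates: Iterable[Update]) -> Dict[str, int]:
    # streams -> tables: aggregating a stream of updates over time yields a table.
    table: Dict[str, int] = {}
    for key, delta in updates:
        table[key] = table.get(key, 0) + delta
    return table


def table_to_stream(updates: Iterable[Update]) -> Iterator[Change]:
    # tables -> streams: observing each change to the table yields a changelog stream.
    table: Dict[str, int] = {}
    for key, delta in updates:
        old = table.get(key)
        new = (old or 0) + delta
        table[key] = new
        yield (key, old, new)


def changelog_to_table(changes: Iterable[Change]) -> Dict[str, int]:
    # Replaying the changelog (last new_value per key) rebuilds the same table.
    table: Dict[str, int] = {}
    for key, _old, new in changes:
        table[key] = new
    return table
```

With `updates = [("a", 1), ("b", 2), ("a", 3)]`, `stream_to_table` returns `{"a": 4, "b": 2}`, the changelog is `("a", None, 1), ("b", None, 2), ("a", 1, 4)`, and replaying that changelog gives back the same table, which is the round trip the slide describes.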

6. Non-relativistic stream & table definitions: Tables are data at rest. Streams are data in motion.

7. 01 Stream & Table Theory (TFW you realize everything you do was invented by the database community decades ago...): A. Basics; B. The Beam Model

8. The Beam Model: What results are calculated? Where in event time are results calculated? When in processing time are results materialized? How do refinements of results relate?
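The four questions can be made concrete with a toy per-window sum. This is a hypothetical Python sketch, not the Beam API: it assumes fixed event-time windows of length `WINDOW_SIZE = 2`, an on-time pane when the watermark passes the end of a window, an immediate late pane for every element that arrives after that, and window state that is kept forever (real Beam bounds this with allowed lateness). Inputs arrive in processing-time order as `("data", event_time, value)` or `("watermark", event_time, None)`.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Set, Tuple

Input = Tuple[str, int, Optional[int]]  # ("data", t, value) or ("watermark", t, None)
Pane = Tuple[int, int, str]  # (window_start, result, pane_timing)

WINDOW_SIZE = 2  # assumed fixed-window length, in event-time units


def window_start(event_time: int) -> int:
    # Where: assign each element to the event-time window [start, start + WINDOW_SIZE).
    return event_time - event_time % WINDOW_SIZE


def run(inputs: Sequence[Input], accumulating: bool = True) -> List[Pane]:
    sums: Dict[int, int] = defaultdict(int)  # What: a per-window sum
    fired: Set[int] = set()  # windows whose on-time pane was already emitted
    panes: List[Pane] = []
    for kind, t, value in inputs:
        if kind == "data":
            start = window_start(t)
            sums[start] += value
            if start in fired:
                # When: data arriving behind the watermark re-fires its window as a late pane.
                panes.append((start, sums[start], "late"))
                if not accumulating:
                    sums[start] = 0
        elif kind == "watermark":
            # When: a window fires on time once the watermark reaches its end.
            for start in sorted(sums):
                if start + WINDOW_SIZE <= t and start not in fired:
                    fired.add(start)
                    panes.append((start, sums[start], "on_time"))
                    # How: discarding mode resets state after each pane; accumulating
                    # mode keeps it, so later panes include earlier values.
                    if not accumulating:
                        sums[start] = 0
        else:
            raise ValueError(f"unknown input kind: {kind}")
    return panes
```

For inputs `[("data", 0, 5), ("data", 1, 3), ("data", 3, 4), ("watermark", 2, None), ("data", 1, 2), ("watermark", 4, None)]`, accumulating mode yields `[(0, 8, "on_time"), (0, 10, "late"), (2, 4, "on_time")]`, while discarding mode reports the late pane as the delta `(0, 2, "late")`. Only the "How" answer changed; What, Where, and When stayed the same.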

9. Reconciling streams & tables w/ the Beam Model: ● How does batch processing fit into all of this? ● What is the relationship of streams to bounded and unbounded datasets? ● How do the four what, where, when, how questions map onto a streams/tables world?

10. MapReduce: input → Map → Reduce → output

11. MapReduce: input → MapRead → Map → MapWrite → ReduceRead → Reduce → ReduceWrite → output

12. MapReduce: ? → MapRead → ? → Map → ? → MapWrite → ? → ReduceRead → ? → Reduce → ? → ReduceWrite → ?

13. MapReduce: table → MapRead → ? → Map → ? → MapWrite → ? → ReduceRead → ? → Reduce → ? → ReduceWrite → table

14. Map phase: table → MapRead → ? → Map → ? → MapWrite → ?

15. Map phase API: void map(K1 key, V1 value, Emit<K2, V2>);
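A minimal Python rendering of the map-phase signature helps show why MapRead produces a stream and Map keeps it a stream. This is a sketch under stated assumptions: `Emit` is modeled as a callback, the word-count map function and the names `map_read`, `word_count_map`, and `map_phase` are hypothetical, and the output stream is collected into a list only so it can be inspected.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Python stand-in for Emit<K2, V2>: a callback that receives one output record.
Emit = Callable[[str, int], None]
MapFn = Callable[[int, str, Emit], None]


def map_read(table: Dict[int, str]) -> Iterator[Tuple[int, str]]:
    # MapRead: the input table (data at rest) is read into a stream of (K1, V1) records.
    yield from table.items()


def word_count_map(key: int, value: str, emit: Emit) -> None:
    # Hypothetical map function: K1 = line number, V1 = line text, K2 = word, V2 = 1.
    for word in value.split():
        emit(word, 1)


def map_phase(records: Iterable[Tuple[int, str]], map_fn: MapFn) -> List[Tuple[str, int]]:
    # Map: each input record may emit zero or more (K2, V2) records, so the
    # result is still a stream; it is collected into a list here for inspection.
    out: List[Tuple[str, int]] = []
    for key, value in records:
        map_fn(key, value, lambda k2, v2: out.append((k2, v2)))
    return out
```

For example, `map_phase(map_read({1: "a b", 2: "b"}), word_count_map)` returns `[("a", 1), ("b", 1), ("b", 1)]`: records flow through one at a time with no grouping, which is what makes both edges streams.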


17. Map phase: table → MapRead → stream → Map → ? → MapWrite → ?

18. Map phase API: void map(K1 key, V1 value, Emit<K2, V2>);

19. Map phase: table → MapRead → stream → Map → stream → MapWrite → ?

20. Map phase API: void map(K1 key, V1 value, Emit<K2, V2>); void reduce(K2 key, Iterable<V2> value, Emit<V3>);

21. Map phase: table → MapRead → stream → Map → stream → MapWrite → table
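MapWrite is where the map phase stops being a stream: it groups the mapped records by key. The following is a minimal Python sketch of that grouping step only; `map_write` is a hypothetical name, and the partitioning and sorting a real MapReduce shuffle performs are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_write(stream: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # MapWrite: group the mapped stream by K2. Grouping is the step that turns
    # the stream (data in motion) into a table (data at rest) keyed by K2.
    table: Dict[str, List[int]] = defaultdict(list)
    for k2, v2 in stream:
        table[k2].append(v2)
    return dict(table)
```

For instance, `map_write([("a", 1), ("b", 1), ("b", 1)])` returns `{"a": [1], "b": [1, 1]}`. Unlike the per-record map step, nothing can be read from this table for a key until the records for that key have been collected.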

22. MapReduce: table → MapRead → stream → Map → stream → MapWrite → table → ReduceRead → ? → Reduce → ? → ReduceWrite → table

23. Map phase API: void map(K1 key, V1 value, Emit<K2, V2>); void reduce(K2 key, Iterable<V2> value, Emit<V3>);


25. MapReduce: table → MapRead → stream → Map → stream → MapWrite → table → ReduceRead → stream → Reduce → stream → ReduceWrite → table
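A minimal Python sketch of the reduce half of this diagram, starting from a grouped table like the one MapWrite produces. Assumptions: `Emit<V3>` is modeled as a callback that receives only the value, the sketch pairs each emitted `V3` with its `K2` itself, and the names `reduce_read`, `sum_reduce`, `reduce_phase`, and `reduce_write` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def reduce_read(grouped: Dict[str, List[int]]) -> Iterator[Tuple[str, List[int]]]:
    # ReduceRead: the grouped table is read back into a stream of (K2, Iterable<V2>).
    yield from grouped.items()


def sum_reduce(key: str, values: Iterable[int], emit: Callable[[int], None]) -> None:
    # Mirrors void reduce(K2 key, Iterable<V2> value, Emit<V3>): here V3 is a sum.
    emit(sum(values))


def reduce_phase(
    stream: Iterable[Tuple[str, List[int]]],
    reduce_fn: Callable[[str, Iterable[int], Callable[[int], None]], None],
) -> Iterator[Tuple[str, int]]:
    # Reduce: a stream of grouped records becomes a stream of (K2, V3) records.
    for key, values in stream:
        outputs: List[int] = []
        reduce_fn(key, values, outputs.append)
        for v3 in outputs:
            yield key, v3


def reduce_write(stream: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    # ReduceWrite: the output stream is written into the final table.
    # Assumption: at most one output per key, stored under K2.
    return dict(stream)
```

Chaining the steps, `reduce_write(reduce_phase(reduce_read({"a": [1], "b": [1, 1]}), sum_reduce))` returns `{"a": 1, "b": 2}`, so the reduce side follows the same table → stream → stream → table shape as the map side.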

26. Reconciling streams & tables w/ the Beam Model: ● How does batch processing fit into all of this? 1. Tables are read into streams. 2. Streams are processed into new streams until a grouping operation is hit. 3. Grouping turns the stream into a table. 4. Repeat steps 1-3 until you run out of operations. ● What is the relationship of streams to bounded and unbounded datasets? ● How do the four what, where, when, how questions map onto a streams/tables world?
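The four-step answer can be turned into a tiny generic pipeline runner. This is a hypothetical Python sketch, not a Beam or Flink API: a stage is either `("map", fn)`, where `fn` turns one record into zero or more records, or `("group", None)`, which groups by key; the names `read`, `transform`, `group`, and `run_pipeline` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Sequence, Tuple

Record = Tuple[Any, Any]
Stage = Tuple[str, Optional[Callable[[Record], Iterable[Record]]]]


def read(table: Dict[Any, Any]) -> Iterator[Record]:
    # Step 1: a table is read into a stream of (key, value) records.
    return iter(table.items())


def transform(stream: Iterable[Record], fn: Callable[[Record], Iterable[Record]]) -> Iterator[Record]:
    # Step 2: a non-grouping operation turns a stream into a new stream.
    for record in stream:
        yield from fn(record)


def group(stream: Iterable[Record]) -> Dict[Any, List[Any]]:
    # Step 3: a grouping operation turns the stream into a table.
    table: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in stream:
        table[key].append(value)
    return dict(table)


def run_pipeline(table: Dict[Any, Any], stages: Sequence[Stage]) -> List[Record]:
    stream: Iterator[Record] = read(table)
    for kind, fn in stages:
        if kind == "map" and fn is not None:
            stream = transform(stream, fn)
        elif kind == "group":
            # Step 4: the new table is read back into a stream, repeating steps 1-3.
            stream = read(group(stream))
        else:
            raise ValueError(f"bad stage: {kind}")
    return list(stream)
```

A word count is then just three stages: split lines into `(word, 1)`, group, and sum each group. Running them over `{1: "a b", 2: "b"}` yields `[("a", 1), ("b", 2)]`. Passing `fn` into `transform` as an argument, rather than closing over the loop variable in a generator expression, keeps each stage bound to its own function.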

27. Reconciling streams & tables w/ the Beam Model: ● How does batch processing fit into all of this? ● What is the relationship of streams to bounded and unbounded datasets? Streams are the in-motion form of data, both bounded and unbounded. ● How do the four what, where, when, how questions map onto a streams/tables world?
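One way to see that a stream is the same thing whether or not the data ends: an element-wise transformation written against a Python iterator runs unchanged over a finite list and over an infinite generator. The function `double_evens` is a hypothetical example; `itertools.count` and `itertools.islice` are standard library.

```python
import itertools
from typing import Iterable, Iterator


def double_evens(stream: Iterable[int]) -> Iterator[int]:
    # A non-grouping, element-wise transformation: it never needs to know
    # whether its input stream will end.
    for x in stream:
        if x % 2 == 0:
            yield x * 2


bounded = [1, 2, 3, 4]
print(list(double_evens(bounded)))  # [4, 8]

unbounded = itertools.count(1)  # 1, 2, 3, ... never ends
# Only a finite prefix of an unbounded stream can ever be observed.
print(list(itertools.islice(double_evens(unbounded), 3)))  # [4, 8, 12]
```

The difference between bounded and unbounded inputs only becomes visible at a grouping step, which has to decide when a group is complete enough to emit.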

28. Reconciling streams & tables w/ the Beam Model: ● How does batch processing fit into all of this? ● What is the relationship of streams to bounded and unbounded datasets? ● How do the four what, where, when, how questions map onto a streams/tables world?

29. The Beam Model: What results are calculated? Where in event time are results calculated? When in processing time are results materialized? How do refinements of results relate?