- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
基于streaming构建统一的数据处理引擎的挑战与实践 部分1
展开查看详情
1 .ݪلғᴨ᯾૬૬ $OLEDED ᘳ֖ғṛᕆದӫਹ ṛᕆݎૡᑕ ᄍᦖᘏғظᇙ շ嬬
2 . About me n ظᇙҁṼੰ҅Kurt҂ n Apache Flink Committer n ଙᏗॊླӱفےᴨ᯾҅݇Өग़ӻᦇᓒଘݣ໐ஞᶱፓጱ ᦡᦇᎸ҅ݎ۱ೡᔱක̵᧣ଶᔮᕹ̵ፊഴړຉᒵ̶ፓ ڹᨮᨱ%OLQN 64/ጱᎸݎս۸
3 . Agenda Why What How Achievement Future
4 .Why to Unify Batch and Stream ᴹવጱԟใᕚғӷӻක҅ӷղդᎱ Steep learning curve: two engines, two codes https://mapr.com/developercentral/lambda-architecture/
5 .Why to Unify Batch and Stream य़හഝ $,قว Ń BIG DATA & LANDSCAPE 2018 http://mattturck.com/bigdata2018/
6 . Why to Unify Batch and Stream ԟӞॺකٟ҅ӞղդᎱ҅ݶᬩᤈၞ॒ቘಢ॒ቘ҅ኜᛗๅग़ Learn One Engine, Write One Code, Run both Stream and Batch, Even More
7 . Agenda Why What How Achievement Future
8 . What is an Unified SQL Engine 01 አಁଶғӞղդᎱ҅Ӟጱᕮຎ User Perspective: One Query, Same Result 02 ݎଶғຝᕹӞ̵դᎱ॔አ̵ၞಢᣟݳ Dev Perspective: Unified Arch, Code Reuse
9 . User: One Query, Same Result CREATE TABLE USER_SCORES ( Name VARCHAR, Score DOUBLE, Time TIMESTAMP ) WITH ( ... ); SQL is exactly the same in CREATE TABLE TOTOAL_SCORES ( Name VARCHAR, Streaming and Batch TotalScore DOUBLE, MaxTime TIMESTAMP ) WITH ( ... ); INSERT INTO TOTAL_SOURCES SELECT Name, SUM(Score), MAX(Time) FROM USER_SCORES GROUP BY Name;
10 . User: One Query, Same Result Batch Mode: 12:07> SELECT Name, SUM(Score), MAX(Time) FROM USER_SCORES GROUP BY Name; ------------------------- ------------------------- | Name | Score | Time | | USER_SCORES | ------------------------- ------------------------- | Julie | 12 | 12:07 | | User | Score | Time | | Frank | 5 | 12:06 | ------------------------- ------------------------- | Julie | 7 | 12:01 | | Frank | 3 | 12:03 | Stream Mode: | Julie | 1 | 12:03 | 12:01> SELECT Name, SUM(Score), MAX(Time) FROM USER_SCORES GROUP BY Name; | Frank | 2 | 12:06 | ------------------------------------------------------------------------------------- | Julie | 4 | 12:07 | | [-inf, 12:01) | [12:01, 12:04) | [12:04, now) | ------------------------- | ------------------------- | ------------------------- | ------------------------- | USER_SCORES is a | | Name | Score | Time | | | Name | Score | Time | | | Name | Score | Time | | | ------------------------- | ------------------------- | ------------------------- | source table/stream. | | | | | | | Julie | 8 | 12:03 | | | Julie | 12 | 12:07 | | | | | | | | | Frank | 3 | 12:03 | | | Frank | 5 | 12:06 | | | ------------------------- | ------------------------- | ------------------------- | ------------------------------------------------------------------------------------- Reference: https://qconsf.com/sf2017/system/files/presentation-slides/foundations_of_streaming_sql_qcon_sf_2017.pdf
11 . Developer: Unified Architecture ? DataStream API DataSet API Stream Processing Batch Processing Table API & SQL Runtime Relational Operator Tree DataStream API DataSet API Transformation Stream Processing Batch Processing Batch Plan Runtime StreamGraph Distributed Streaming Dataflow Optimized Plan Local Cluster Cloud Single JVM Standalone, YARN GCE, EC2 Job Graph Stream Task & Operator Batch Task & Driver
12 . Developer: What about Code Reuse SELECT * FROM t1, t2 WHERE t1.a = t2.b where t1.a < 1 Streaming Batch DataStream[Row] t1; DataSet[Row] t1; DataStream[Row] t2; DataSet[Row] t2; t1.connect(t2) t1.join(t2) .keyBy(a, b) .where(a) .transform(KeyedCoProcessOperator) .equalTo(b) .with(joinFunc) // t1.a < 1 class CoProcessFunction<IN1, IN2, OUT>
13 . Developer: What about Runtime Push (stream) scanFile() process(elem) process(elem) process(elem) " # ! SELECT SUM(T.B) T execute() FROM T WHERE T.A < 10 scanFile() elem=next() elem=next() elem=next() Pull (batch) ഴګၞ (Control Flow) හഝၞ (Data Flow)
14 . Agenda Why What How Achievement Future
15 . About me n շ嬬ҁԯᮐ҅-DUN҂ n Apache Flink Committer l Contributing since Flink v1.0 l Focusing on Flink Table & SQL since 3 years n Software Engineer at Alibaba in Blink SQL team
16 . How to Unify Batch and Stream 01 ቘᦞचᏐғ ۖாᤒ 03 ս۸ጱᕹӞ Dynamic Table Unification of the Optimizer 02 ຝදᬰ 04 चᏐහഝᕮጱᕹӞ Improve Architecture Unification of Basic Data Structure 05 ᇔቘਫሿጱوՁ Sharing of the Runtime Implementation