申请试用
HOT
登录
注册
 
账号已存在
PingCAP-Infra-Meetup-105-Happy+Hacking+TiDB
TiDB
/
发布于
/
4131
人观看
本次杜川老师的分享主要分成三个部分: 1.首先通过对现有 Streaming 系统和 Batch 系统的分析,讨论了在数据处理领域 Streaming 和 Batch 的异同,明确了 Streaming 的核心本质,探讨了 Streaming 和 Batch 融合处理的可能性和必要性,并对现有类似系统进行了简单的分析。 2.简单回顾了 RDMS 中经典的 Volcano 模型的执行流程,探讨了在 RDMS 上支持 Streaming 处理的难点以及 Streaming SQL 设计的关键要素。 3.介绍了 TBSSQL 的设计思路,架构设计和若干关键技术点的方案选择,展示了 TBSSQL 的运行 Demo。并以 TBSSQL 为例,简单介绍了在 TiDB 上增加一个 Feature 的大致思路和入手点。
展开查看详情

1 . Streaming, SQL and TiDB Batch Streaming SQL Presented By Du Chuan

2 .About me • Infrastructure engineer - working on cloud database • TiDB committer - mainly in the TiDB SQL layer • Technophile - database & distribute systems

3 .Agenda • Streaming Overview • SQL and Streaming SQL • Introduction to TBSSQL

4 .Part I – Streaming Overview

5 .“Before it’s news, it’s data. After it‘s news, it’s data.”

6 .Batch systems VS Streaming systems

7 .Batch VS Streaming • Processing on data • Get data via processing • Finite data set • Infinite data streaming

8 .But wait… What’s streaming, indeed?

9 .Essentials of Streaming • Bind with time • Strictly ordered by time • Infinite

10 .Streaming is the way the world organizing data

11 . Batch …… …… Streaming

12 .Batch is a snapshot of some Streaming At some point of time In some scope

13 .So... One system to rule them both ?

14 .Batch && Streaming systems • Using a batch runtime to simulate • Using a streaming runtime to simulate streaming batch

15 .What about the RDMS?

16 . An example – Ads click analysis Click- User ID Click User 1 ADS Firehose 1 DML Click- User ID Click User 2 ADS 2 Click- User ID Click User 3 ADS 3 Click- User ID Click User 1 ADS 1 Click- User ID Click User 3 ADS 3 Click- User ID User Table Ads Table Click User 1 ADS 1

17 .Part II – SQL and Streaming SQL

18 .The volcano model… Again

19 .SQL consists various kinds of operators

20 . The SQL operators SELECT t1.a, count(*) FROM t1 JOIN t2 ON t1.b = t2.b WHERE t1.c > 10 GROUP BY t1.a; Projection Join Selection Aggregation

21 .The SQL operator interface • Open: Init operator SQL Operator • Next: Consume input and generate output • Close: Do cleanup job

22 .The volcano model input = child.Next() return proj(input, t.a + t.b) SELECT t.a + t.b (". $ + ". &) FROM t input = child.Next() WHERE t.c > 10; return filter(input, t.c > 10) (t.c > 10) t return scan(t)

23 .The SQL operator reordering >1 ( ) ) ) > 2 < 2 > ( ) ) ) 2 < >1 2

24 . Aggregation operator • SELECT a, COUNT(c) AS cnt FROM t GROUP BY a HAVING cnt > 1; 1. Grouping 2. Apply aggregators 3. Filter using“Having”clause

25 . Join operator • SELECT * FROM t1 (LEFT/RIGHT/FULL OUTER) JOIN)t2 ON t1.a = t2.a; 1. Get Cartesian product 2. Filter using “On” . clause

26 .When SQL operator meets Streaming…

27 .SELECT a, COUNT(c) FROM t GROUP BY a; 1. Grouping(Never End) 2. Apply aggregators Infinite Data Stream

28 .Solution: limit the scope of streaming using “window”

29 . SELECT a, COUNT(c) FROM t GROUP BY a; 1. Grouping 2. Apply Aggregators Window1 Infinite Data Stream Window2

0 点赞
0 收藏
1下载
确认
3秒后跳转登录页面
去登陆