- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
结构化流中的连续处理
展开查看详情
1 .Continuous Processing in Structured Streaming Jose Torres, Databricks #Dev4SAIS
2 .Continuous Processing Overview ● Unified Spark SQL API ● No microbatches ● Low (~1ms) latency Continuous Processing Microbatch #Dev4SAIS 2
3 .DStream API ● Non-declarative, similar to RDDs ● Scala/Java only ● Checkpoints only through complete snapshots ● No event time #Dev4SAIS 3
4 .Structured Streaming ● Data represented as a virtual append-only table ● Unified Spark SQL query API ● Batch and streaming queries return same results #Dev4SAIS 4
5 .Structured Streaming #Dev4SAIS 5
6 .Structured Streaming Features ● Dataframes and Datasets ● SQL, Python, and R language APIs ● Delta-based aggregation state #Dev4SAIS 6
7 .Microbatches #Dev4SAIS 7
8 .Continuous Processing #Dev4SAIS 8
9 .Chandy-Lamport Checkpoints ● Asynchronous Driver checkpoint complete ready for global commit ● Consistent epoch marker Data Stream Aggregation Reader Writer Task Task save checkpoint to state store partition level commit #Dev4SAIS 9
10 .Checkpointing - Detailed Driver Shuffle epoch markers Reader Processing Aggregation Writer Reader Processing Reader Processing Aggregation Writer #Dev4SAIS 10
11 .Checkpointing - Detailed Shuffle Reader Processing Aggregation Writer Reader Processing Reader Processing Aggregation Writer #Dev4SAIS 11
12 .Checkpointing - Detailed Shuffle Reader Processing Aggregation Writer Reader Processing Reader Processing Aggregation Writer #Dev4SAIS 12
13 .Checkpointing - Detailed Shuffle Reader Processing Aggregation Writer Reader Processing Reader Processing Aggregation Writer #Dev4SAIS 13
14 .Checkpointing - Detailed Shuffle Reader Processing Aggregation Writer Reader Processing Reader Processing Aggregation Writer #Dev4SAIS 14
15 .Checkpointing - Detailed Shuffle Reader Processing Aggregation Writer aggregation checkpoint Reader Processing Reader Processing Aggregation Writer aggregation checkpoint #Dev4SAIS 15
16 .Checkpointing - Detailed Driver Shuffle ready for global commit Reader Processing Aggregation Writer commit partition writer Reader Processing Reader Processing Aggregation Writer commit partition writer #Dev4SAIS 16 commit partition writer
17 .Continuous Processing API ● It’s just Structured Streaming ● Run the same queries in continuous mode #Dev4SAIS 17
18 .Continuous Processing in 2.3 ● Initial experimental release ● Supports ETL use cases #Dev4SAIS 18
19 .Ongoing And Future Work ● Shuffles (SPARK-24036) ● Event time (SPARK-24459) ● Metrics (SPARK-23887) ● Exactly-once semantics mode (SPARK-24460) ● Performance testing (TBD) ● Additional data sources (TBD) 19
20 .Q&A 20