Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Getting timely data into a data warehouse is a challenge many of us face, and there is often no straightforward solution.
By combining batch and streaming data pipelines, you can use the Delta Lake format to run an enterprise data warehouse at near real-time frequency. Delta Lake eases the ETL workload by bringing ACID transactions to a warehousing environment. Coupled with Structured Streaming, this lets you build a low-latency data warehouse. In this talk, we'll cover how to use Delta Lake to reduce the latency of ingesting and storing your data warehouse tables. We'll also discuss how you can use Spark streaming to build the aggregations and tables that drive your data warehouse.
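One common way to keep warehouse tables fresh from a stream is to upsert each micro-batch into the target table (in PySpark, Structured Streaming's `foreachBatch` combined with Delta Lake's `DeltaTable.merge`) and then build aggregates from the result. Running Spark is outside the scope of a short snippet, so the sketch below is a pure-Python, in-memory simulation of that pattern, not Delta Lake code. The table is a dict keyed by a hypothetical `order_id`. Each merge produces a new version while leaving the previous one untouched, a loose analogy to a Delta commit. `revenue_by_customer` stands in for an aggregate table. All record fields and function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (order_id, customer, amount, updated_at)
Event = Tuple[str, str, float, int]
Table = Dict[str, Event]


def merge_micro_batch(table: Table, batch: Iterable[Event]) -> Table:
    """Upsert one micro-batch into the table, keyed by order_id.

    Mimics MERGE semantics: update when matched (only if the incoming
    record is at least as new), insert when not matched. Returns a new
    dict so the previous version stays readable, loosely like a table
    version produced by an atomic commit.
    """
    new_version = dict(table)
    for event in batch:
        order_id, _, _, updated_at = event
        current = new_version.get(order_id)
        # Ignore late-arriving records that are older than what we hold.
        if current is None or updated_at >= current[3]:
            new_version[order_id] = event
    return new_version


def revenue_by_customer(table: Table) -> Dict[str, float]:
    """Aggregate the current table state, standing in for an aggregate table."""
    totals: Dict[str, float] = defaultdict(float)
    for _, customer, amount, _ in table.values():
        totals[customer] += amount
    return dict(totals)


def run_pipeline(micro_batches: List[List[Event]]) -> List[Table]:
    """Apply each micro-batch in order, keeping every table version."""
    versions: List[Table] = [{}]
    for batch in micro_batches:
        versions.append(merge_micro_batch(versions[-1], batch))
    return versions


if __name__ == "__main__":
    batches = [
        [("o1", "alice", 10.0, 1), ("o2", "bob", 5.0, 1)],
        [("o1", "alice", 12.0, 2), ("o3", "alice", 3.0, 2)],
    ]
    versions = run_pipeline(batches)
    print(revenue_by_customer(versions[-1]))  # {'alice': 15.0, 'bob': 5.0}
    print(revenue_by_customer(versions[1]))   # {'alice': 10.0, 'bob': 5.0}
```

Keeping the update-if-newer check inside the merge is what lets a late or replayed micro-batch be applied without corrupting the aggregates; a real Delta `MERGE` would express the same idea as a condition on the matched-update clause.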