申请试用
HOT
登录
注册
 
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta Lake

Near Real-Time Data Warehousing with Apache Spark and Delta Lake

Spark开源社区
/
发布于
/
3700
人观看

Timely data in a data warehouse is a challenge many of us face, often with there being no straightforward solution.
Using a combination of batch and streaming data pipelines you can leverage the Delta Lake format to provide an enterprise data warehouse at a near real-time frequency. Delta Lake eases the ETL workload by enabling ACID transactions in a warehousing environment. Coupling this with structured streaming, you can achieve a low latency data warehouse. In this talk, we’ll talk about how to use Delta Lake to improve the latency of ingestion and storage of your data warehouse tables. We’ll also talk about how you can use spark streaming to build the aggregations and tables that drive your data warehouse.

6 点赞
2 收藏
4下载
确认
3秒后跳转登录页面
去登陆