Building a Unified Batch and Streaming Data Warehouse with Delta Lake

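This talk starts from the common pain points of big data analytics today, explains the background and implementation details of Delta Lake, the project recently open-sourced by Databricks, and closes with a live demo of Delta Lake's basic usage. (The video is mainly a demonstration of Delta Lake features.)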

1. Delta Lake yuanjian.li@databricks.com 2019.11.30

3. Index • Delta Lake • Delta Lake • Live Demo

4. Data Lake Data Lake

5. Data Lake: Real-time Streaming, Data Science and ML • Recommendation Engines • Risk, Fraud, & Intrusion Detection • Customer Analytics • IoT & Predictive Maintenance • Genomics & DNA Sequencing

7. Data Lake. Diagram labels: Events; ?; Streaming Analytics; Data Lake; AI & Reporting.

8. Data Lake. Diagram labels: Events; Streaming Analytics; Data Lake; AI & Reporting.

9. Data Lake #1. Diagram labels: Events; Streaming Analytics; Data Lake; AI & Reporting. Callouts: (1) λ-arch.

10. Data Lake #2. Diagram labels: Events; Streaming Analytics; Data Lake; AI & Reporting. Callouts: (1) λ-arch; (2) Validation.

11. Data Lake #3. Diagram labels: Events; Streaming Analytics; Partitioned; Data Lake; AI & Reporting. Callouts: (1) λ-arch; (2) Validation; (3) Reprocessing.

12. Data Lake #4. Diagram labels: Events; Streaming Analytics; Partitioned; Scheduled to Avoid Modifications; UPDATE & MERGE; Data Lake; AI & Reporting. Callouts: (1) λ-arch; (2) Validation; (3) Reprocessing; (4) Updates.

13. Data Lake

14. Diagram labels: Kinesis; CSV, JSON, TXT…; Data Lake; ?; AI & Reporting.

15. Diagram labels: Kinesis; CSV, JSON, TXT…; Data Lake; ?; AI & Reporting. List: 1. Reader Writer

16. Diagram labels: Kinesis; CSV, JSON, TXT…; Data Lake; ?; AI & Reporting. List: 1. Reader Writer 2. Spark

17. Diagram labels: Kinesis; CSV, JSON, TXT…; Data Lake; ?; AI & Reporting. List: 1. Reader Writer 2. Spark 3. Time Travel

18. Diagram labels: Kinesis; CSV, JSON, TXT…; Data Lake; ?; AI & Reporting. List: 1. Reader Writer 2. Spark 3. Time Travel 4.

19. Diagram labels: Kinesis; CSV, JSON, TXT…; Data Lake; AI & Reporting. List: 1. Reader Writer 2. Spark 3. Time Travel 4. 5.

20. Delta Lake. Diagram labels: Kinesis; CSV, JSON, TXT…; Streaming Analytics; Data Lake; AI & Reporting.
