Building a Unified Batch and Streaming Data Warehouse with Delta Lake
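This talk starts from common pain points in today's big data analytics. It then explains the background and implementation details of Delta Lake, the project recently open-sourced by Databricks, and closes with a live demo of Delta Lake's basic usage. (The video mainly demonstrates Delta Lake features.)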
1. Delta Lake yuanjian.li@databricks.com 2019.11.30
3. Index • Delta Lake • Delta Lake • Live Demo
4. Data Lake
5. Data Lake • Real-time Streaming, Data Science and ML • Recommendation Engines • Risk, Fraud, & Intrusion Detection • Customer Analytics • IoT & Predictive Maintenance • Genomics & DNA Sequencing
7–8. Data Lake (diagram labels: Events, ?, Streaming Analytics, Data Lake, AI & Reporting; the "?" appears on slide 7 only)
9–12. Data Lake #1–#4, one step added per slide: 1. λ-arch • 2. Validation • 3. Reprocessing (Partitioned) • 4. Updates (UPDATE & MERGE, Scheduled to Avoid Modifications) (diagram labels: Events, Streaming, Analytics, Data Lake, AI & Reporting)
13. Data Lake
14–19. Points added one per slide: 1. Reader Writer • 2. Spark • 3. Time Travel • 4. • 5. (diagram labels: Kinesis; CSV, JSON, TXT…; ?; Data Lake; AI & Reporting; the "?" is dropped on slide 19)
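The "Reader Writer" and "Time Travel" points both rest on Delta Lake's transaction log. As a rough illustration only, the Python sketch below models that log as a directory of numbered JSON-lines commit files, in the spirit of Delta's `_delta_log` directory. A writer claims the next version with an atomic put-if-absent (here a temp file plus `os.link`, which fails if the target exists), so two writers cannot both commit the same version and readers never see a half-written commit. A reader reconstructs the table as of any version by replaying `add`/`remove` actions up to it. All helper names (`try_commit`, `append_files`, `snapshot`, ...) are hypothetical; this is not the delta-spark or delta-rs API, and the real protocol also carries metadata, checkpoints, and logical conflict detection.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional, Set

# Toy model of a Delta-style transaction log (hypothetical helpers, not the
# delta-spark or delta-rs API). Each commit is a JSON-lines file named with a
# zero-padded version number under <table>/_delta_log/.


def log_dir(table: str) -> str:
    return os.path.join(table, "_delta_log")


def commit_path(table: str, version: int) -> str:
    return os.path.join(log_dir(table), f"{version:020d}.json")


def latest_version(table: str) -> int:
    """Highest committed version, or -1 if nothing has been committed yet."""
    versions = [int(name[:-5]) for name in os.listdir(log_dir(table))
                if name.endswith(".json")]
    return max(versions, default=-1)


def try_commit(table: str, version: int, actions: List[Dict]) -> bool:
    """Write `actions` as commit `version`; return False if that version exists.

    The commit is written to a temp file first and then hard-linked into place.
    os.link raises FileExistsError if the target already exists, so at most one
    writer can claim a given version, and readers never see a partial commit.
    """
    fd, tmp = tempfile.mkstemp(dir=log_dir(table), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
        os.link(tmp, commit_path(table, version))
        return True
    except FileExistsError:
        return False
    finally:
        os.remove(tmp)


def append_files(table: str, paths: List[str]) -> int:
    """Optimistic blind append: retry at the next version until a commit succeeds."""
    while True:
        version = latest_version(table) + 1
        if try_commit(table, version, [{"add": {"path": p}} for p in paths]):
            return version


def snapshot(table: str, version: Optional[int] = None) -> Set[str]:
    """Replay commits 0..version and return the live data files (time travel)."""
    if version is None:
        version = latest_version(table)
    files: Set[str] = set()
    for v in range(version + 1):
        with open(commit_path(table, v)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


if __name__ == "__main__":
    table = tempfile.mkdtemp()
    os.makedirs(log_dir(table))
    append_files(table, ["part-0.parquet"])  # version 0
    append_files(table, ["part-1.parquet"])  # version 1
    # Version 2 replaces part-0 with part-2 (remove + add in one atomic commit).
    try_commit(table, 2, [{"remove": {"path": "part-0.parquet"}},
                          {"add": {"path": "part-2.parquet"}}])
    print(try_commit(table, 1, []))     # False: version 1 is already taken
    print(sorted(snapshot(table)))      # ['part-1.parquet', 'part-2.parquet']
    print(sorted(snapshot(table, 0)))   # ['part-0.parquet']
```

The retry loop in `append_files` is only safe because blind appends never logically conflict; a real writer that read the table before writing would have to check the intervening commits before retrying.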
20. Delta Lake (diagram labels: Kinesis; CSV, JSON, TXT…; Data Lake; Streaming Analytics; AI & Reporting)
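In this architecture one table sits between the sources and both consumers. That is the practical meaning of the batch/streaming unification (批流一体) in the title: batch jobs read a snapshot of the table, while a streaming job reads the same table incrementally. The in-memory toy below sketches the incremental side under that assumption. The stream keeps the next unprocessed version as its offset and, on each poll, returns the files added by newer commits. `ToyDeltaTable` and `ToyStreamReader` are hypothetical names. The real Delta streaming source tracks its offsets in Structured Streaming checkpoints and handles commits that remove data differently; this sketch simply ignores `remove` actions on the streaming side.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

Action = Dict[str, Dict[str, str]]  # e.g. {"add": {"path": "part-0.parquet"}}


@dataclass
class ToyDeltaTable:
    """In-memory stand-in for a Delta transaction log (hypothetical, not a real API)."""
    commits: List[List[Action]] = field(default_factory=list)  # commits[v] = actions of version v

    def commit(self, actions: List[Action]) -> int:
        """Append one commit and return its version number."""
        self.commits.append(actions)
        return len(self.commits) - 1

    def batch_read(self) -> Set[str]:
        """Batch side: replay every commit to get the current set of live files."""
        files: Set[str] = set()
        for actions in self.commits:
            for action in actions:
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
        return files


class ToyStreamReader:
    """Streaming side: incrementally consume files added since the last poll."""

    def __init__(self, table: ToyDeltaTable) -> None:
        self.table = table
        self.next_version = 0  # offset: first version not yet processed

    def poll(self) -> List[str]:
        """Return files added by all commits at or after the offset (one micro-batch)."""
        new_files: List[str] = []
        while self.next_version < len(self.table.commits):
            for action in self.table.commits[self.next_version]:
                if "add" in action:  # this toy ignores "remove" actions
                    new_files.append(action["add"]["path"])
            self.next_version += 1
        return new_files


if __name__ == "__main__":
    table = ToyDeltaTable()
    stream = ToyStreamReader(table)
    table.commit([{"add": {"path": "part-0.parquet"}}])  # e.g. a streaming micro-batch
    table.commit([{"add": {"path": "part-1.parquet"}}])  # e.g. a batch CSV load
    print(stream.poll())  # ['part-0.parquet', 'part-1.parquet']
    print(stream.poll())  # []
    table.commit([{"add": {"path": "part-2.parquet"}}])
    print(stream.poll())  # ['part-2.parquet']
    print(sorted(table.batch_read()))  # ['part-0.parquet', 'part-1.parquet', 'part-2.parquet']
```

Because both readers derive their view from the same ordered log, the batch snapshot and the stream never disagree about which commits exist; they differ only in how much of the log each has consumed.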