Real-Time Attribution with Structured Streaming and Databricks Delta
1. Real-Time Attribution with Structured Streaming and Databricks Delta
Caryl Yuhas, Databricks
#ExpSAIS13
2. Introduction
• Caryl: previously attribution at MediaMath; SA / SE / PM for Databricks
• Goal: provide tools and information that can help you build more real-time, lower-latency attribution pipelines
• Crawl, Walk, Run: Pull Model
3. Getting Started
• What is Attribution?
[Figure: attribution diagram; image source: www.mediamath.com]
4. Introduction
• What is Databricks Delta? Delta is a data management capability that brings data reliability and performance optimizations to the cloud data lake.
5. Stream-to-Sink BEFORE
[Diagram: Events → Streaming → partitioned Data Lake → Analytics / Reporting, with four pain points: (1) λ-architecture, (2) validation, (3) reprocessing (scheduled to avoid compaction), (4) compaction (compact small files)]
6. Stream-to-Sink AFTER
[Diagram: Events → Streaming → DELTA → Analytics / Reporting; the same four concerns, (1) λ-architecture, (2) validation, (3) reprocessing, (4) compaction, are handled on Delta, with compaction done by OPTIMIZE (compact small files) and ZORDER on partitioned data]
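Streaming writes tend to produce many small files, which is why compaction appears in both diagrams. Delta's OPTIMIZE rewrites small files into larger ones; the greedy bin-packing below is only a toy illustration of that idea, not Delta's actual algorithm. The `plan_compaction` helper, the file sizes, and the 1,000 MB target are all hypothetical.

```python
def plan_compaction(file_sizes_mb, target_mb=1000):
    """Toy bin-packing (hypothetical helper, not Delta's algorithm):
    start a new output file whenever adding the next input would exceed target_mb."""
    bins, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb):
        if current and current_size + size > target_mb:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins

# Assumed input: 40 small 50 MB files, as frequent streaming micro-batches might write.
small_files = [50] * 40
plan = plan_compaction(small_files)
print(len(plan), [sum(b) for b in plan])  # 2 [1000, 1000]
```

Forty reads collapse into two, which is the point of compaction: fewer, larger files mean less per-file overhead at query time.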
7. Attribution in Practice
impressions JOIN conversions → attributed impressions
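The join on this slide can be sketched in plain Python for the simple last-touch case. This assumes each conversion is credited to the same user's most recent impression inside a 7-day lookback window; the record fields (`user_id`, `ts`), the window length, and the `last_touch` helper are hypothetical, not taken from the talk. In Spark this would be a join between the impressions and conversions tables followed by a step that keeps the latest matching impression.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Impression:
    impression_id: str
    user_id: str
    ts: datetime

@dataclass(frozen=True)
class Conversion:
    conversion_id: str
    user_id: str
    ts: datetime

def last_touch(impressions, conversions, window=timedelta(days=7)):
    """Hypothetical last-touch join: credit each conversion to the user's
    latest impression within `window` before (or at) the conversion time."""
    by_user = {}
    for imp in impressions:
        by_user.setdefault(imp.user_id, []).append(imp)
    attributed = []
    for conv in conversions:
        candidates = [
            imp for imp in by_user.get(conv.user_id, [])
            if conv.ts - window <= imp.ts <= conv.ts
        ]
        if candidates:  # conversions with no eligible impression stay unattributed
            winner = max(candidates, key=lambda imp: imp.ts)
            attributed.append((conv.conversion_id, winner.impression_id))
    return attributed

t0 = datetime(2018, 10, 1)
impressions = [
    Impression("i1", "u1", t0),
    Impression("i2", "u1", t0 + timedelta(days=2)),
    Impression("i3", "u2", t0),
]
conversions = [
    Conversion("c1", "u1", t0 + timedelta(days=3)),   # i1 and i2 eligible; i2 is latest
    Conversion("c2", "u2", t0 + timedelta(days=10)),  # i3 is outside the 7-day window
]
print(last_touch(impressions, conversions))  # [('c1', 'i2')]
```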
8. Attribution Challenges
Scale
• Often dealing with millions to billions of data points per attribution window
Complexity
• The simple last-click model is still common
• Multi-touch attribution (MTA) and more sophisticated models are on the rise
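To make the last-click vs. multi-touch contrast concrete, here is a minimal sketch that splits credit for one conversion path. Linear weighting is only one MTA scheme (time-decay and position-based are others), and the talk does not say which one its pipeline uses; the channel names and helper functions are hypothetical.

```python
from collections import defaultdict

def last_click_credit(touchpoints):
    """Last-click: the final touchpoint before conversion gets all the credit."""
    return {touchpoints[-1]: 1.0} if touchpoints else {}

def linear_credit(touchpoints):
    """Linear multi-touch (one possible MTA scheme): equal credit per touchpoint."""
    if not touchpoints:
        return {}
    credit = defaultdict(float)
    share = 1.0 / len(touchpoints)
    for tp in touchpoints:
        credit[tp] += share
    return dict(credit)

path = ["display", "search", "display", "email"]  # hypothetical conversion path
print(last_click_credit(path))  # {'email': 1.0}
print(linear_credit(path))      # {'display': 0.5, 'search': 0.25, 'email': 0.25}
```

Both models hand out exactly one unit of credit per conversion; they differ only in how it is spread, which is why MTA multiplies the number of attributed rows the pipeline must write.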
9. High-Level Attribution Pipeline
10. Attribution in Practice
impressions JOIN conversions → attributed impressions
11. Data Architecture
[Diagram: impression stream → impressions table; conversion stream → conversions table; both tables feed the attributed tables (last touch, weighted), with attribution views (filters, logic, etc.) on top]
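The layering on this slide (raw tables, then attributed tables per model, then views) can be mimicked with plain Python lists and functions. This is a sketch under assumptions: the rows and field names are hypothetical, "weighted" is taken to mean a linear split, and `campaign_view` stands in for a SQL view whose filters and aggregation run at query time rather than being stored.

```python
# Raw tables as the two streams might land them (hypothetical rows).
impressions = [
    {"impression_id": "i1", "user_id": "u1", "campaign": "A", "ts": 1},
    {"impression_id": "i2", "user_id": "u1", "campaign": "B", "ts": 2},
    {"impression_id": "i3", "user_id": "u2", "campaign": "A", "ts": 1},
]
conversions = [
    {"conversion_id": "c1", "user_id": "u1", "ts": 3},
    {"conversion_id": "c2", "user_id": "u2", "ts": 4},
]

def path_for(conv):
    """Impressions the user saw at or before the conversion, oldest first."""
    return sorted(
        (i for i in impressions if i["user_id"] == conv["user_id"] and i["ts"] <= conv["ts"]),
        key=lambda i: i["ts"],
    )

def attributed_table(model):
    """One attributed table per model: a row per (conversion, credited impression)."""
    rows = []
    for conv in conversions:
        path = path_for(conv)
        if not path:
            continue
        if model == "last_touch":
            credited = [(path[-1], 1.0)]
        else:  # "weighted": assumed linear split across the path
            credited = [(imp, 1.0 / len(path)) for imp in path]
        for imp, weight in credited:
            rows.append({"conversion_id": conv["conversion_id"],
                         "campaign": imp["campaign"], "weight": weight})
    return rows

def campaign_view(rows, campaigns=None):
    """A 'view': optional campaign filter plus aggregation, computed on each read."""
    totals = {}
    for row in rows:
        if campaigns is None or row["campaign"] in campaigns:
            totals[row["campaign"]] = totals.get(row["campaign"], 0.0) + row["weight"]
    return totals

last_touch_rows = attributed_table("last_touch")
weighted_rows = attributed_table("weighted")
print(campaign_view(last_touch_rows))  # {'B': 1.0, 'A': 1.0}
print(campaign_view(weighted_rows))    # {'A': 1.5, 'B': 0.5}
```

Keeping filters and business logic in the view layer means a new question (say, one campaign only) is a new query, not a new stream.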
12. System Architecture
[Diagram: events from Amazon Kinesis processed with Structured Streaming]
13. Unification of Streaming + Batch (demo)
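Structured Streaming lets a streaming computation be expressed the same way as a batch computation on static data. The plain-Python sketch below (not the Spark API) shows the underlying idea: one aggregation function, run once over all events or incrementally over micro-batches with carried state, gives the same answer. The event rows and the micro-batch size of 2 are hypothetical.

```python
def count_by_campaign(rows, state=None):
    """Same logic for batch and streaming: fold rows into running per-campaign counts."""
    state = dict(state or {})  # copy so earlier results are not mutated
    for row in rows:
        state[row["campaign"]] = state.get(row["campaign"], 0) + 1
    return state

events = [{"campaign": c} for c in "ABABCA"]  # hypothetical event rows

# Batch: one pass over the full dataset.
batch_result = count_by_campaign(events)

# "Streaming": micro-batches of 2 events, carrying state between them.
stream_state = {}
for start in range(0, len(events), 2):
    stream_state = count_by_campaign(events[start:start + 2], stream_state)

print(batch_result)                  # {'A': 3, 'B': 2, 'C': 1}
print(stream_state == batch_result)  # True
```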
14. Managing Performance
• How can we optimize performance?
• Levers:
  – Delta tools
    • Optimize
    • ZOrder
    • Caching
    • Data skipping
  – Join on stream
  – Cluster size
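Z-ordering and data skipping work together: clustering rows along a Z-order curve keeps each file's min/max range narrow on several columns at once, so more files can be skipped by a filter on any of them. The toy model below illustrates that idea only; it is not Delta's implementation, and the columns `x`/`y`, the 8×8 grid of rows, and the 16-rows-per-file layout are hypothetical.

```python
from itertools import product

def z_key(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) key: interleave the bits of x and y (x in even bit positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def file_stats(rows, rows_per_file):
    """Split sorted rows into fixed-size 'files' and record per-file min/max per column."""
    stats = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        xs = [r[0] for r in chunk]
        ys = [r[1] for r in chunk]
        stats.append({"x": (min(xs), max(xs)), "y": (min(ys), max(ys))})
    return stats

def files_to_scan(stats, column, value):
    """Data skipping: a file is read only if value falls inside its [min, max] range."""
    return sum(1 for s in stats if s[column][0] <= value <= s[column][1])

rows = list(product(range(8), range(8)))                        # 64 hypothetical (x, y) rows
linear = file_stats(sorted(rows), 16)                           # sorted by x, then y
zorder = file_stats(sorted(rows, key=lambda r: z_key(*r)), 16)  # sorted by Z-order key

print(files_to_scan(linear, "x", 3), files_to_scan(linear, "y", 3))  # 1 4
print(files_to_scan(zorder, "x", 3), files_to_scan(zorder, "y", 3))  # 2 2
```

A plain sort is ideal for one column and useless for the other; Z-order gives up a little on the first column to make filters on both columns selective.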
15. Handling Complexity
• Flexibility with complex logic
  – Forking streams
  – Logic on query vs. in-stream
• Late or corrected data
  – Upserts
  – Views automatically update when raw data changes
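Delta Lake supports upserts through `MERGE INTO`: rows that match on a key are updated, and the rest are inserted. The sketch below mimics that update-or-insert behaviour in plain Python to show how late and corrected conversions land, and how a view computed on read reflects them without a separate rebuild. The `merge_upsert` and `revenue_view` helpers and the `conversion_id`/`revenue` fields are hypothetical.

```python
def merge_upsert(target, updates, key="conversion_id"):
    """Hypothetical MERGE-like upsert: update rows whose key matches, insert the rest."""
    merged = [dict(row) for row in target]  # leave the input table untouched
    index = {row[key]: i for i, row in enumerate(merged)}
    for upd in updates:
        if upd[key] in index:
            merged[index[upd[key]]].update(upd)  # matched: update in place
        else:
            index[upd[key]] = len(merged)
            merged.append(dict(upd))             # not matched: insert
    return merged

def revenue_view(conversions):
    """Computed on each read, so it reflects whatever the table holds now."""
    return sum(c["revenue"] for c in conversions)

conversions = [
    {"conversion_id": "c1", "revenue": 10.0},
    {"conversion_id": "c2", "revenue": 5.0},
]
print(revenue_view(conversions))  # 15.0

late_and_corrected = [
    {"conversion_id": "c2", "revenue": 7.0},  # correction to an existing row
    {"conversion_id": "c3", "revenue": 2.0},  # late-arriving row
]
conversions = merge_upsert(conversions, late_and_corrected)
print(revenue_view(conversions))  # 19.0
```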
16. Conclusion
• Unification of batch and streaming
• Easy APIs for managing performance
• Flexible and scalable analytics on near-real-time data