Building an AI-Powered Retail Experience with Delta Lake
1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
2. AI-Powered Retail Experience with Databricks
Akhil Dhingra, Zalando
Saurav Verma, Zalando
#UnifiedDataAnalytics #SparkAISummit
3. Zalando SE
● Founded in 2008 in Berlin.
● Europe's leading online fashion platform.
● Connects customers, brands and partners.
4. Zalando SE
5. Big-Data Stack @ Zalando
6. About Us
● Akhil Dhingra: Product Manager, Data Solutions @ Zalando. Experience: 7+ years; ex-Groupon, ex-Wingify; MBA.
● Saurav Verma: Senior Engineer, Data Lake @ Zalando. Experience: 9+ years; ex-Visa; Master's, NUS.
7. Data Platform [diagram: Data Sources]
8. Data Platform
● Data Lake on top of S3
[diagram: Data Sources]
9. Data Platform
● Multi-tenant, single compute: more ingestion pipelines
[diagram: Data Sources]
10. Many Use Cases [diagram: Data Sources; Team A]
11. Many Use Cases [diagram: Data Sources; Teams A, B]
12. Many Use Cases [diagram: Data Sources; Teams A, B, C]
13. Too Many Use Cases [diagram: Data Sources; Teams A, B, C, M, N]
14. Too Many … [diagram: Data Sources; Teams A, B, C, M, N; Compute, Stream, Batch, Training, Auto-Scale, Python / Scala]
15. Too Many …
● Cost control problem at scale
● More time to production
● No best practices
● Duplication of work / data
● Dependencies
● Inconsistent environment
● No community knowledge
● Accidental complexity
[diagram: Teams A, B, C, M, N; Compute, Stream, Batch, Training, Auto-Scale, Python / Scala]
16. Spark as a Service
● Foundational piece of Zalando's Big Data infrastructure
● GitOps management, decentralized clusters
● Security / compliance / CI-CD
● XX clusters/jobs
● ~20 teams in production
● Thriving #Databricks community at Zalando
[diagram: Teams A, B, C, M, N; Stream, Batch, Training, Auto-Scale, Python / Scala]
17. Spark as a Service: Migration Projects
● ETLs | Data preparation in Spark-S3
18. Spark as a Service: Others
● Structured Streaming | Traceability
19. Spectrum of Use Cases
20. GDPR and Antitrust
Compliance with GDPR and antitrust laws.
21. GDPR and Antitrust: Probe (pilot)
- Use a marker event to create a heat map of the data path.
- List all datasets within the heat map.
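A minimal sketch of the probe idea, assuming the marker is a unique string injected into one source event and that downstream datasets can be read as JSON lines. `MARKER_ID`, the dataset names, and both helper functions are hypothetical; the talk does not describe Zalando's actual implementation.

```python
import json
from collections import Counter

# Hypothetical unique value injected into one source event by the probe.
MARKER_ID = "probe-7f3a"


def contains_marker(value, marker):
    """Recursively search a decoded JSON value for the marker string."""
    if isinstance(value, dict):
        return any(contains_marker(v, marker) for v in value.values())
    if isinstance(value, list):
        return any(contains_marker(v, marker) for v in value)
    return value == marker


def build_heat_map(datasets, marker):
    """Count, per dataset, the JSON-lines records that carry the marker."""
    heat = Counter()
    for name, lines in datasets.items():
        for line in lines:
            if contains_marker(json.loads(line), marker):
                heat[name] += 1
    return heat


# Hypothetical datasets, each a list of JSON-lines records.
datasets = {
    "raw/orders": [
        '{"customer_id": "probe-7f3a", "total": 10}',
        '{"customer_id": "c-1", "total": 5}',
    ],
    "curated/orders_daily": ['{"rows": [{"customer_id": "probe-7f3a"}]}'],
    "marketing/segments": ['{"segment": "A", "members": ["c-1"]}'],
}

heat = build_heat_map(datasets, MARKER_ID)
datasets_on_path = sorted(heat)  # only datasets the marker reached
print(datasets_on_path)  # ['curated/orders_daily', 'raw/orders']
```

At lake scale the scan would presumably run as a distributed job rather than in memory; this version only shows the matching and counting logic behind the heat map and the resulting dataset list.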
22. GDPR and Antitrust: Pseudonymize/Remove
- Identifier-based, on-demand, in-place record updater with field-level precision
- Well suited to semi-structured formats such as JSON
- Uses S3 Inventory + Streaming
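A minimal in-memory sketch of an identifier-based, field-precise updater for JSON lines. Assumptions: records are keyed by a top-level identifier field, target fields are given as dotted paths, and pseudonymization uses a keyed HMAC-SHA256 (our choice; the slide does not say which technique is used). `SECRET_KEY`, `rewrite_json_lines`, and the sample records are hypothetical, and the S3 Inventory + Streaming part (finding which objects to rewrite) is not modeled.

```python
import hashlib
import hmac
import json

# Assumption: in practice this would come from a managed secret store.
SECRET_KEY = b"replace-with-managed-secret"


def pseudonymize(value, key=SECRET_KEY):
    """Replace a value with a deterministic HMAC-SHA256 hex token."""
    return hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()


def _apply(record, path, mode):
    """Pseudonymize or remove one dotted field path, e.g. 'contact.email'."""
    *parents, leaf = path.split(".")
    node = record
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return  # path absent in this record: nothing to do
    if leaf not in node:
        return
    if mode == "remove":
        del node[leaf]
    else:
        node[leaf] = pseudonymize(node[leaf])


def rewrite_json_lines(lines, id_field, targets, fields, mode="pseudonymize"):
    """Rewrite only records whose id is in `targets`; return (lines, changed_count)."""
    if mode not in ("pseudonymize", "remove"):
        raise ValueError(f"unknown mode: {mode}")
    out, changed = [], 0
    for line in lines:
        record = json.loads(line)
        if record.get(id_field) in targets:
            for path in fields:
                _apply(record, path, mode)
            out.append(json.dumps(record, separators=(",", ":")))
            changed += 1
        else:
            out.append(line)  # untouched records keep their exact bytes
    return out, changed


# Hypothetical sample records.
lines = [
    '{"customer_id":"c-42","contact":{"email":"a@example.com","phone":"123"},"total":10}',
    '{"customer_id":"c-7","contact":{"email":"b@example.com"},"total":5}',
]
removed, n = rewrite_json_lines(
    lines, "customer_id", {"c-42"}, ["contact.email", "contact.phone"], mode="remove"
)
print(n, removed[0])  # 1 {"customer_id":"c-42","contact":{},"total":10}
```

Only records whose identifier matches are re-serialized; every other line is written back unchanged, which is what keeps the update on-demand and in place at record level rather than rewriting whole datasets.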
23. Search & Ranking
Personalized article ranking for relevance and user engagement.
24. Search & Ranking
Using Spark in the ML training pipeline!
25. Search & Ranking
ML model for article scoring and personalization!
26. Others
• Sizing: reducing return rates caused by size and fit issues
• Experimentation @ Scale
• Merchant Analytics
• Marketing Services
27. First Impressions
• GitOps | Self-service
28. First Impressions
• Multi-tiered support system
• Delta adoption | But few readers outside the Databricks ecosystem
• Communicating pricing downstream
• Exploding usage is good
• One size fits all?
29. Thank you.
AI-Powered Retail Experience with Databricks
Akhil Dhingra, Saurav Verma
www.zalando.com | www.jobs.zalando.com/tech