Building an AI-Powered Retail Experience with Delta Lake

2. AI-Powered Retail Experience with Databricks. Akhil Dhingra, Zalando; Saurav Verma, Zalando. #UnifiedDataAnalytics #SparkAISummit

3. Zalando SE ● Founded in 2008 in Berlin ● Europe's leading online fashion platform ● Connects customers, brands and partners

4. Zalando SE

5. Big-Data Stack @ Zalando

6. About Us ● Akhil Dhingra: Product Manager, Data Solutions @Zalando. Exp: 7+ years, ex-Groupon, ex-Wingify | MBA ● Saurav Verma: Senior Engineer, Data Lake @Zalando. Exp: 9+ years, ex-Visa | Masters, NUS

7. Data Platform [Diagram: Data Sources]

8. Data Platform ● Data Lake on top of S3 [Diagram: Data Sources]

9. Data Platform ● Multi-tenant / single compute: more ingestion pipelines [Diagram: Data Sources]

10. Many Use Cases [Diagram: Team A; Data Sources]

11. Many Use Cases [Diagram: Teams A, B; Data Sources]

12. Many Use Cases [Diagram: Teams A, B, C; Data Sources]

13. Too Many Use Cases [Diagram: Teams A, B, C, M, N; Data Sources]

14. Too Many … [Diagram: Teams A, B, C, M, N; Compute, Stream, Batch, Training, Auto-Scale, Python / Scala; Data Sources]

15. Too Many … ● Cost control problem at scale ● More time to production ● No best practices ● Duplication of work / data ● Dependencies ● Inconsistent environment ● No community knowledge ● Accidental complexity [Diagram: Teams A, B, C, M, N; Compute, Stream, Batch, Training, Auto-Scale, Python / Scala]

16. Spark as a Service ● Foundational piece of Zalando’s Big Data Infrastructure ● GitOps Management, Decentralized Clusters ● Security / Compliance / CI-CD ● XX clusters/Jobs ● ~20 teams in production ● Thriving #Databricks community in Zalando [Diagram: Teams A, B, C, M, N; Stream, Batch, Training, Auto-Scale, Python / Scala]

17. Spark as a Service ● Migration Projects: ETLs | Data Preparation in Spark-S3

18. Spark as a Service ● Others: Structured Streams | Traceability

19. Spectrum of use cases

20. GDPR and Antitrust: Compliance with GDPR and antitrust laws

21. GDPR and Antitrust: Probe (pilot) - Use a marker event to create a heat map of the data path. - List all datasets within the heat map.
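
One way to picture the probe: inject a record carrying a known marker value at the source, then count where that value turns up downstream. The sketch below is only an illustration under assumptions, not the pipeline shown in the talk. The dataset names, the in-memory record lists, and the helpers `contains_marker` and `marker_heat_map` are all hypothetical; a real probe would scan datasets on S3, most likely with Spark.

```python
def contains_marker(value, marker):
    """Return True if the marker appears anywhere inside a decoded JSON value."""
    if isinstance(value, dict):
        return any(contains_marker(v, marker) for v in value.values())
    if isinstance(value, list):
        return any(contains_marker(v, marker) for v in value)
    return value == marker


def marker_heat_map(datasets, marker):
    """Map each dataset name to the number of records that contain the marker.

    `datasets` is a hypothetical mapping of dataset name -> iterable of records.
    Datasets the marker never reached are left out, so the keys of the result
    are the list of datasets on the data path.
    """
    heat = {}
    for name, records in datasets.items():
        hits = sum(1 for record in records if contains_marker(record, marker))
        if hits:
            heat[name] = hits
    return heat


if __name__ == "__main__":
    example = {
        "raw/orders": [{"id": "MARKER-1"}, {"id": "x"}],
        "derived/daily": [{"orders": [{"id": "MARKER-1"}]}],
        "raw/clicks": [{"id": "y"}],
    }
    print(marker_heat_map(example, "MARKER-1"))
```

The hit counts play the role of the heat map, and the sorted keys give the dataset list named in the second bullet.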

22. GDPR and Antitrust: Pseudonymize/Remove - Identifier-based, on-demand, in-place record updater with field precision - Great for semi-structured formats like JSON - Use S3 Inventory + Streaming
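
To make "identifier-based, with field precision" concrete, here is a minimal sketch that rewrites only the named fields of JSON records whose value equals a given identifier. It rests on assumptions the slides do not confirm: the keyed-hash pseudonym (HMAC-SHA256), the `SECRET_KEY` value, and the function names are hypothetical. It also returns updated copies rather than updating objects in place on S3, and it leaves out the S3 Inventory and streaming parts entirely.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration only; a real job would load it from a secrets store.
SECRET_KEY = b"example-key-not-for-production"


def pseudonym(value):
    """Return a stable keyed hash of a string (HMAC-SHA256, truncated hex)."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def update_record(record, identifier, fields, remove=False):
    """Pseudonymize (or remove) the named fields wherever their value equals identifier.

    Nested objects and arrays are walked recursively; all other fields are untouched.
    """
    if isinstance(record, dict):
        out = {}
        for key, value in record.items():
            if key in fields and value == identifier:
                if not remove:
                    out[key] = pseudonym(value)
                # When remove=True the matching field is dropped from the copy.
            else:
                out[key] = update_record(value, identifier, fields, remove)
        return out
    if isinstance(record, list):
        return [update_record(item, identifier, fields, remove) for item in record]
    return record


def update_json_lines(lines, identifier, fields, remove=False):
    """Apply update_record to each JSON-lines entry and re-serialize it."""
    return [
        json.dumps(update_record(json.loads(line), identifier, fields, remove), sort_keys=True)
        for line in lines
    ]
```

Calling `update_json_lines(lines, "c-1", {"customer_id"})` replaces every `customer_id` equal to `"c-1"` with the same pseudonym, so joins on that field still work; passing `remove=True` drops the field instead.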

23. Search & Ranking: Personalized article ranking for relevance and user engagement.

24. Search & Ranking: Using Spark in the ML training pipeline!

25. Search & Ranking: ML Model: Article scoring and personalization!

26. Others • Sizing: reducing return rates due to size and fit issues • Experimentation @Scale • Merchant Analytics • Marketing Services

27. First Impressions • GitOps | Self Service

28. First Impressions • Multi-tiered support system • Delta adoption, but few readers outside the Databricks ecosystem • Communicating pricing downstream • Exploding usage is good • Fits all sizes?

29. Thank you. AI-Powered Retail Experience with Databricks. Akhil Dhingra, Saurav Verma