Near Real-Time Netflix Recommendations Using Apache Spark Streaming
展开查看详情
1.Near Real-Time Recommendations - Spark Streaming Elliot Chow, Netflix Nitin Sharma, Netflix #ML7SAIS #ML7SAIS
2.Agenda ● Recommendations @ Netflix ● The Need for Near Real Time ● Use Cases ● Common Infrastructure ● Scaling Challenges #ML7SAIS
3.Recommendations at Netflix ● Personalize the Netflix experience for each member ○ Goal: Quickly help members find content they’d like to watch ○ Risk: Member may lose interest and abandon the service ○ Challenge: Recommending at scale #ML7SAIS
4.Scale @ Netflix ● 125M+ active members ● 190 countries ● 450B+ unique events/day ● 700+ Kafka topics #ML7SAIS
5.Typical Data Pipelines @ Netflix ● Data stored in Hive/S3 ● Batch ETLs using Spark/Hive ● Table partitioning by day or hour ● Job scheduling by both CRON or data availability ● Latency often is on the order of days #ML7SAIS
6.The Need for Near Real Time (NRT) ● Dynamic catalog ● Growing member base ● Time sensitivity ○ Content popularity changes ○ Member interests evolve #ML7SAIS
7.The Need for Near Real Time (NRT) ● Increasing amount of data ○ Process data as soon as possible to keep latencies low ○ Minimize amount of data to reprocess in case of failure ● Multi-Armed Bandits Adoption #ML7SAIS
8.Use Cases ● Video Insights ● ML Pipelines for Recommendations #ML7SAIS
9.NRT for Video Insights #ML7SAIS
10.Video Insights ● New title launches ● Early reads on title performance ● Slice by various dimensions #ML7SAIS
11.Video Insights Service #ML7SAIS
12.Video Insights Service #ML7SAIS
13.Video Insights Service #ML7SAIS
14.Video Insights Service #ML7SAIS
15.Video Insights Client Service #ML7SAIS
16.Video Insights - State ● Counts maintained in Spark ● Custom state management ○ Based on mapWithState implementation input.scan(initRDD)((currentRDD, batchRDD) => f(currentRDD, batchRDD)) ○ Easier to re-use the function f in batch mode ○ Lower-level control over state management ○ scanByKey alternative for keyed state #ML7SAIS
17.NRT for Recommendations #ML7SAIS
18.Billboard Recommendations ● Recommend a relevant title to each member ● Right time ● Respond quickly to member feedback #ML7SAIS
19. Artwork Personalization ● Personalized Image ● Visual Evidence ● Quickly adapting - Title launches, member tastes ● Rapid learning - Cold start #ML7SAIS 19
20. Traditional Recommendations DataStore/ Millions of ETL Training Online Play related Layers pipelines caches Signals Few days Rankings Models Precompute/Live Compute #ML7SAIS
21. NRT Recommendations DataStore/Online Millions of Streaming caches Play related layer Signals NRT Rankings Training pipelines Models Precompute/Live Compute #ML7SAIS
22.Required Data ● Impressions, Plays, etc. ● Attribution ● Explore/Exploit Metadata #ML7SAIS
23.Attribution Infrastructure Stream Processing Batch Processing Query API #ML7SAIS
24.Stream Processing - Zoomed in Impression Play Enrich Cache Video Metadata #ML7SAIS
25.Batch Processing - Zoomed in Impression Windowed Impression Dedupe Windowed Play Join Play Video Experimentation Metadata Data #ML7SAIS
26.Common Infrastructure #ML7SAIS
27. Netflix Spark Stack ● Jenkins ● Spinnaker ● ASG ● Runners: Marathon, Meson ● Resource Manager: Mesos ● Storage: HDFS, S3, EFS ● Multi-Region #ML7SAIS
28. Multi Region Challenges ● Geo routing ● Impression in one region; play in another ● Streaming - Multi Region ● Batch Reduce/Merge - One region #ML7SAIS
29.Can it withstand Chaos? • Chaos is a design principle • Instance Failovers => Region Failovers • Transparent to the consumers • Over provisioned • Long term - Autoscale #ML7SAIS