- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Near Real-Time Netflix Recommendations Using Apache Spark Streaming
展开查看详情
1 .Near Real-Time Recommendations - Spark Streaming Elliot Chow, Netflix Nitin Sharma, Netflix #ML7SAIS #ML7SAIS
2 .Agenda ● Recommendations @ Netflix ● The Need for Near Real Time ● Use Cases ● Common Infrastructure ● Scaling Challenges #ML7SAIS
3 .Recommendations at Netflix ● Personalize the Netflix experience for each member ○ Goal: Quickly help members find content they’d like to watch ○ Risk: Member may lose interest and abandon the service ○ Challenge: Recommending at scale #ML7SAIS
4 .Scale @ Netflix ● 125M+ active members ● 190 countries ● 450B+ unique events/day ● 700+ Kafka topics #ML7SAIS
5 .Typical Data Pipelines @ Netflix ● Data stored in Hive/S3 ● Batch ETLs using Spark/Hive ● Table partitioning by day or hour ● Job scheduling by both CRON or data availability ● Latency often is on the order of days #ML7SAIS
6 .The Need for Near Real Time (NRT) ● Dynamic catalog ● Growing member base ● Time sensitivity ○ Content popularity changes ○ Member interests evolve #ML7SAIS
7 .The Need for Near Real Time (NRT) ● Increasing amount of data ○ Process data as soon as possible to keep latencies low ○ Minimize amount of data to reprocess in case of failure ● Multi-Armed Bandits Adoption #ML7SAIS
8 .Use Cases ● Video Insights ● ML Pipelines for Recommendations #ML7SAIS
9 .NRT for Video Insights #ML7SAIS
10 .Video Insights ● New title launches ● Early reads on title performance ● Slice by various dimensions #ML7SAIS
11 .Video Insights Service #ML7SAIS
12 .Video Insights Service #ML7SAIS
13 .Video Insights Service #ML7SAIS
14 .Video Insights Service #ML7SAIS
15 .Video Insights Client Service #ML7SAIS
16 .Video Insights - State ● Counts maintained in Spark ● Custom state management ○ Based on mapWithState implementation input.scan(initRDD)((currentRDD, batchRDD) => f(currentRDD, batchRDD)) ○ Easier to re-use the function f in batch mode ○ Lower-level control over state management ○ scanByKey alternative for keyed state #ML7SAIS
17 .NRT for Recommendations #ML7SAIS
18 .Billboard Recommendations ● Recommend a relevant title to each member ● Right time ● Respond quickly to member feedback #ML7SAIS
19 . Artwork Personalization ● Personalized Image ● Visual Evidence ● Quickly adapting - Title launches, member tastes ● Rapid learning - Cold start #ML7SAIS 19
20 . Traditional Recommendations DataStore/ Millions of ETL Training Online Play related Layers pipelines caches Signals Few days Rankings Models Precompute/Live Compute #ML7SAIS
21 . NRT Recommendations DataStore/Online Millions of Streaming caches Play related layer Signals NRT Rankings Training pipelines Models Precompute/Live Compute #ML7SAIS
22 .Required Data ● Impressions, Plays, etc. ● Attribution ● Explore/Exploit Metadata #ML7SAIS
23 .Attribution Infrastructure Stream Processing Batch Processing Query API #ML7SAIS
24 .Stream Processing - Zoomed in Impression Play Enrich Cache Video Metadata #ML7SAIS
25 .Batch Processing - Zoomed in Impression Windowed Impression Dedupe Windowed Play Join Play Video Experimentation Metadata Data #ML7SAIS
26 .Common Infrastructure #ML7SAIS
27 . Netflix Spark Stack ● Jenkins ● Spinnaker ● ASG ● Runners: Marathon, Meson ● Resource Manager: Mesos ● Storage: HDFS, S3, EFS ● Multi-Region #ML7SAIS
28 . Multi Region Challenges ● Geo routing ● Impression in one region; play in another ● Streaming - Multi Region ● Batch Reduce/Merge - One region #ML7SAIS
29 .Can it withstand Chaos? • Chaos is a design principle • Instance Failovers => Region Failovers • Transparent to the consumers • Over provisioned • Long term - Autoscale #ML7SAIS