RTL Netherlands turns 30 in 2019, and video has always been our core business. AI gives us the opportunity to understand in depth what our consumers love. On our Spark platform in AWS, we apply several AI and ML methods to extract and analyze features.
A selection of our content intelligence pipelines:
- Object and person detection in videos.
- Multi-modal emotion detection.
- Speaker identification.
- Script and subtitle keyword extraction.
- Several other pipelines.
All of these features feed different data science products: new show and episode creation, talkshow subject selection, and the interpretation of viewing ratings, among others. Our future goal is to personalize TV on our video-on-demand platform: not only to recommend other series that you like, but also to create personalized talkshows and soap operas with the subjects, storylines, guests and characters that you like. Video is thus our basis, but digitally the opportunities are much more diverse. With this talk I want to inspire and share knowledge.
I learned a lot visiting the Spark Summit 2018, and some talks even helped to build this content intelligence project further. It would be amazing to give back to the Spark community, especially when it visits my hometown of Amsterdam. I want to surprise the attendees with the story of this unknown Dutch TV channel that is taking a leading role in content intelligence in the Netherlands and Europe. It will be an open, inspiring talk with technical details on the pipelines and technology we used, accompanied by the end use cases and the drawbacks and challenges we faced. Not a talk about ambitions, but about concrete results of the next level of TV innovation. RTL NL was the first broadcaster of Big Brother and The Voice, and I’m confident that the next break-out hit will be Spark driven.
2. How data is transforming the Dutch media industry | Maurits van der Goes | RTL Netherlands | #UnifiedDataAnalytics #SparkAISummit
5. RTL NL | Linear television: 8.9 million daily TV viewers | Online video: 779 million online views per month | Digital publishing: 2.3 million unique visitors daily
6. What | Where | Consumers | Content | When
9. Personalisation | Emotion Detection | Talkshow Analysis | Ratings Forecasting | Automated Trailers
11. Domain: News | Content: Articles | Model: Content
14. Similarity: Taxonomy | TF-IDF | Embeddings. Uplift: Embeddings 23.3% | TF.IDF 22.2% | Taxonomy 17.6% | Editor’s pick (baseline) | Random -19.3%
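As an illustration of the TF-IDF similarity approach named on this slide, here is a minimal pure-Python sketch that scores articles against each other by cosine similarity of TF-IDF vectors. It is not the pipeline used at RTL (which runs on Spark); the smoothed IDF formula, the function names and the toy articles are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption for this sketch) keeps weights positive
        # even for terms that occur in every document.
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(index, vectors):
    """Index of the article most similar to vectors[index], excluding itself."""
    scores = [(cosine(vectors[index], v), j) for j, v in enumerate(vectors) if j != index]
    return max(scores)[1]


# Hypothetical toy articles, already tokenized.
articles = [
    "king opens new museum in amsterdam".split(),
    "royal family visits museum in amsterdam".split(),
    "football club wins cup final".split(),
]
vectors = tfidf_vectors(articles)
recommended = most_similar(0, vectors)
```

In a recommender built this way, `most_similar` would pick the next article to show after the one the reader just opened; the uplift figures on the slide compare such strategies against the editor's pick.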
15. Domain: Films & Series | Content: Long Video | Model: Behavior
16. Explainability | Neural networks | Last watched | A/B testing
17. 30 minutes more viewing time per user per month
18. Emotion Detection
19. MEDIA = EMOTION | We tell stories that touch the mind & heart
20. Emotion detection | Face & Emotion Detection | Musical Genre & Mood | Speaker Emotion
21. oarriaga/face_classification
22. BERT: Bidirectional Encoder Representations from Transformers (Devlin et al., 2018) | google-research/bert | BERT-Base, Multilingual Cased • 104 languages • 12-layer • 768-hidden • 12-heads • 110M parameters | (Kaggle)
24. Talkshow Analysis
25. Scenario & subtitles matching | Scenario, Subtitles | FuzzyWuzzy, Levenshtein Distance | Items + TS
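To make the matching step on this slide concrete, the sketch below computes the Levenshtein distance directly and uses it to find the subtitle line closest to a scenario item. It does not use the FuzzyWuzzy API; the function names, the normalisation to a 0..1 score and the sample data are assumptions, and "TS" is read here as "timestamp", which the slide does not confirm.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalised similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(item, subtitles):
    """Return (timestamp, text, score) of the subtitle line closest to a scenario item."""
    return max(
        ((ts, text, similarity(item.lower(), text.lower())) for ts, text in subtitles),
        key=lambda match: match[2],
    )


# Hypothetical data: one scenario item and (timestamp in seconds, subtitle text) pairs.
scenario_item = "Interview with the mayor of Amsterdam"
subtitles = [
    (12.0, "Welcome to the show"),
    (95.5, "interview with the major of amsterdam"),
    (410.0, "And now the weather"),
]
match = best_match(scenario_item, subtitles)
```

The fuzzy score tolerates small transcription differences between the scenario and the subtitles, so an item can still be located in the broadcast even when the wording is not identical.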
26. Item classification
Item categories: • Crime • Entertainment • Lifestyle • Royalty
Model scores (MLlib):
• NaiveBayes: 0.89946
• Logistic Regression (Count Vector Features): 0.84533
• Logistic Regression (TF-IDF Features): 0.81268
• RandomForest: 0.56564
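Naive Bayes is the best-scoring model on this slide. As a rough illustration of how such a text classifier assigns a talkshow item to one of the four categories, here is a small multinomial Naive Bayes with Laplace smoothing in plain Python. It is a sketch, not the Spark MLlib model behind the reported scores, and the class name, training sentences and labels are invented for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.n_docs = len(labels)
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.vocab = {term for counts in self.term_counts.values() for term in counts}
        return self

    def predict(self, doc):
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total_terms = sum(self.term_counts[label].values())
            score = math.log(n_label / self.n_docs)  # log prior
            for term in doc:
                if term not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.term_counts[label][term]
                score += math.log((count + 1) / (total_terms + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training items for the four categories on the slide.
training = [
    ("police arrest suspect after robbery", "Crime"),
    ("court sentences suspect for fraud", "Crime"),
    ("singer releases new album", "Entertainment"),
    ("actor wins award for film", "Entertainment"),
    ("tips for healthy cooking at home", "Lifestyle"),
    ("new trends in home interior design", "Lifestyle"),
    ("king opens parliament session", "Royalty"),
    ("queen visits hospital with princess", "Royalty"),
]
model = NaiveBayesSketch().fit(
    [text.split() for text, _ in training],
    [label for _, label in training],
)
crime_item = model.predict("police search for robbery suspect".split())
royalty_item = model.predict("princess visits the king".split())
```

With only word counts as features, this mirrors the "Count Vector Features" setup mentioned for logistic regression; swapping in TF-IDF weights would be the other variant listed.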
28. Ratings Forecasting
29. Model components
Components: trend + seasonal + weekday + weather + event
• Trend: truncated piecewise-linear function (slopes shown on the slide: -0.12 pp/yr, -2.65 pp/yr, -1.95 pp/yr)
• Seasonal: Fourier series with n = 1, 2 terms, $f(t) = \sum_{n=1}^{2} \left[ a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right]$, $P = 1$ year
• Weekdays and events: dummy variables for weekdays and for various “events” (e.g. holidays, various sports events)
• Weather: four weather variables (temperature, wind, precipitation, sunshine), measured with respect to average weather conditions for the time of year
Interactions:
• n = 1, 2 Fourier terms
• saturday/sunday
• internal
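The seasonal component of this model is a Fourier series with period P = 1 year and terms n = 1, 2, that is, a sum of a_n cos(2πnt/P) + b_n sin(2πnt/P). The sketch below shows how those terms could be turned into regression features and how the seasonal curve is evaluated. The function names and the coefficient values are hypothetical; the slide gives no fitted coefficients.

```python
import math


def fourier_features(t, period=1.0, n_terms=2):
    """Return [cos(2*pi*1*t/P), sin(2*pi*1*t/P), ..., cos(2*pi*N*t/P), sin(2*pi*N*t/P)].

    t and period share the same unit (here: years).
    """
    features = []
    for n in range(1, n_terms + 1):
        angle = 2.0 * math.pi * n * t / period
        features.extend([math.cos(angle), math.sin(angle)])
    return features


def seasonal(t, a, b, period=1.0):
    """Evaluate f(t) = sum over n of a_n*cos(2*pi*n*t/P) + b_n*sin(2*pi*n*t/P)."""
    return sum(
        a_n * math.cos(2.0 * math.pi * n * t / period)
        + b_n * math.sin(2.0 * math.pi * n * t / period)
        for n, (a_n, b_n) in enumerate(zip(a, b), start=1)
    )


# Hypothetical coefficients, in percentage points; not taken from the talk.
a = [1.5, 0.3]
b = [-0.4, 0.1]

mid_year = seasonal(0.5, a, b)             # seasonal effect halfway through the year
same_day_next_year = seasonal(1.5, a, b)   # repeats because the period is one year
```

In a regression, the four values from `fourier_features` would sit next to the trend, weekday, event and weather columns, and the fitted weights play the role of a_n and b_n.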