A Machine Learning Approach to Time-Sensitive Data Analysis

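This talk discusses how to use Spark and its related machine learning tools to analyze time-sensitive data with both unsupervised and supervised machine learning.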
1. A Machine Learning Approach to Time-Sensitive Data Analysis
Anthony Kim, Staff Data Scientist, Samsung Research America
#Ent4SAIS

2. Acknowledgement
Machine Learning / Feature Engineering / Management
Jaemun Sim, Byeonjin Kim, Chanmuk Kim, Kwanghyun Koh, Sai Kiriti, Jinbeom Lee, Wonkeun Oh, VP Chul Lee

3. Big Picture
Turn Big Data into Smart Data via Audience Science to Drive Value Creation
• Provide executives with actionable business insights
• Provide engineering teams with models for customers’ better product experience
• Provide corporate citizens with guidance to increase efficiency
[Diagram labels: Devices, Users, Markets, Apps, Services, Location/Time]

4. Smart Devices, Smart Data, and Smart Services
Data-driven Business
Machine-oriented Organic Logs: Observe Customers!
• ACR logs
• Smart Hub logs
Turning MACHINE logs into HUMAN data
Human-centric Smart Data: Know Customers!
• Hierarchical Session Data
• User Profiles / Segments
• Household Profiles / Segments
Building HUMAN behavioral models
Outcome from Machine Learning & Data Mining Models: Serve Customers!
• Targeted Ads / Marketing:
• Personalized Recommendation on:
• Connected Life
• Home, Auto, Mobile, …
• Smart TV Services
• Contents
• Consumer Goods
Feeding HUMAN reaction back into observation

5. Sessionization of Viewing History
• Combine all repetitive log datapoints as one time block if the state remained the same
8:00pm Anthony Kim watched “XYZ-TV-SHOW Season 1 Episode 1” on Channel 1
8:01pm Anthony Kim watched “XYZ-TV-SHOW Season 1 Episode 1” on Channel 1
8:02pm Anthony Kim watched “XYZ-TV-SHOW Season 1 Episode 1” on Channel 1
…
8:30pm Anthony Kim watched “XYZ-TV-SHOW Season 1 Episode 1” on Channel 1
8:00pm – 8:30pm Anthony Kim watched “XYZ-TV-SHOW Season 1 Episode 1” on Channel 1
This is one program watch session
Apache Spark, Spark and the Spark logo are trademarks of The Apache Software Foundation
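The merging rule on this slide can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the `Session` class, the `sessionize` function, the one-minute `max_gap` threshold, and the calendar date in the example are all assumptions chosen to match the minute-by-minute log shown above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

State = Tuple[str, str, str]  # (viewer, program, channel); hypothetical layout


@dataclass
class Session:
    state: State
    start: datetime
    end: datetime


def sessionize(points: Iterable[Tuple[datetime, State]],
               max_gap: timedelta = timedelta(minutes=1)) -> List[Session]:
    """Merge consecutive datapoints with an unchanged state into one time block.

    max_gap is an assumed tolerance: a datapoint extends the current session
    only if it arrives within max_gap of the session's last datapoint.
    """
    sessions: List[Session] = []
    for ts, state in sorted(points, key=lambda p: p[0]):
        last = sessions[-1] if sessions else None
        if last is not None and last.state == state and ts - last.end <= max_gap:
            last.end = ts  # same state, no gap: extend the current block
        else:
            sessions.append(Session(state, ts, ts))  # state changed: new block
    return sessions


# Example mirroring the slide: one datapoint per minute from 8:00pm to 8:30pm.
# The date is arbitrary and only needed to build datetime values.
start = datetime(2019, 1, 1, 20, 0)
state = ("Anthony Kim", "XYZ-TV-SHOW Season 1 Episode 1", "Channel 1")
logs = [(start + timedelta(minutes=m), state) for m in range(31)]
sessions = sessionize(logs)
```

Here the 31 datapoints collapse into a single session running from 8:00pm to 8:30pm. At the data volumes the talk implies, the same rule would more likely be expressed as a distributed job, for example with window functions in Spark, but the slides do not show that code.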

6. Hierarchical Sessions
Top Layer (Layer 1): By Panel Display in Use (TV was turned on, TV was turned off)
Layer 2: By Content Input Source (HDMI 1, HDMI 2, HDMI 3, HDMI 1)
Layer 3: By Input Source Type (STB, Game Console, Media Player, STB)
Layer 4: By Connected Device (brands, device models, and consoles)
Layer 5: By Content Provider (channels, game providers, movie providers)
Layer 6: By Content (program episodes such as XYZ S01E01, ads, games, movies, sports)
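One way to picture the layered sessions is as a tree in which every session sits inside a session of the layer above it. The sketch below is an illustration under assumptions, not the structure used in the talk: `SessionNode`, `add`, and `flatten` are hypothetical names, times are expressed as minutes since midnight, and the example intervals are made up.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class SessionNode:
    layer: int   # 1 = panel display in use, 2 = input source, 3 = source type, ...
    label: str
    start: int   # minutes since midnight (assumed unit)
    end: int
    children: List["SessionNode"] = field(default_factory=list)

    def add(self, child: "SessionNode") -> "SessionNode":
        """Attach a session from the next layer down, enforcing time containment."""
        if child.layer != self.layer + 1:
            raise ValueError("child must belong to the next layer")
        if not (self.start <= child.start and child.end <= self.end):
            raise ValueError("child must fall within the parent's time span")
        self.children.append(child)
        return child


def flatten(node: SessionNode,
            path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], int, int]]:
    """Yield (path of labels, start, end) for every leaf session."""
    path = path + (node.label,)
    if not node.children:
        yield path, node.start, node.end
    for child in node.children:
        yield from flatten(child, path)


# Made-up example: the TV is on from 20:00 to 22:00 and switches input once.
tv = SessionNode(1, "TV on", 1200, 1320)
hdmi1 = tv.add(SessionNode(2, "HDMI 1", 1200, 1260))
hdmi2 = tv.add(SessionNode(2, "HDMI 2", 1260, 1320))
hdmi1.add(SessionNode(3, "STB", 1200, 1260))
hdmi2.add(SessionNode(3, "Game Console", 1260, 1320))
rows = list(flatten(tv))
```

Flattening the tree gives one row per lowest-layer session together with its full path through the upper layers, which is a convenient shape for downstream feature extraction.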

7. Samsung Taste Graph ®
[Diagram labels: Map Content; Tastes (T1, T2, ..., TN); Compute Relevance Scores; Devices (D1, D2, ..., DM)]
Samsung Taste Graph is a trademark of Samsung Electronics Co., Ltd.

8. Time Units
• Daily Time Slices (48 Time Slices/Day)
• Day of Week (7)
• Weekdays (5) and Weekend (2)
• Dayparts (8)
• Prime Time, Late News, Late Fringe, Post Late Fringe
• Morning, Daytime, Early Fringe, Prime Access
[Diagram: time slices TS01 to TS48 by Day 1 to Day N; Mon to Sun by Week 1 to Week N; Weekdays and Weekend by Week 1 to Week N; dayparts DP1 to DP8 by Day 1 to Day N]
Outputs: Proprietary Time-Sensitive Taste Graph, Proprietary Quality Score
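As a rough illustration of the first three time units, the following Python sketch maps a timestamp to a daily time slice, a day of week, and a weekday/weekend flag. It assumes the 48 slices are equal 30-minute blocks starting at midnight local time, which the slide does not state; the function names are hypothetical. Dayparts are left out because the slide names them without giving their boundaries.

```python
from datetime import datetime

SLICES_PER_DAY = 48
MINUTES_PER_SLICE = 24 * 60 // SLICES_PER_DAY  # 30 minutes, assuming equal slices
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def time_slice(ts: datetime) -> int:
    """Return the 1-based slice index: 1 for 00:00-00:29, 48 for 23:30-23:59."""
    return (ts.hour * 60 + ts.minute) // MINUTES_PER_SLICE + 1


def time_slice_label(ts: datetime) -> str:
    """Format the slice index in the TS01 ... TS48 style used on the slide."""
    return f"TS{time_slice(ts):02d}"


def day_of_week(ts: datetime) -> str:
    """Return one of the seven day names (Mon ... Sun)."""
    return DAY_NAMES[ts.weekday()]


def is_weekend(ts: datetime) -> bool:
    """Saturday and Sunday count as weekend; the other five days are weekdays."""
    return ts.weekday() >= 5


# Example timestamp: a Wednesday evening at 8:15pm.
ts = datetime(2019, 4, 24, 20, 15)
features = (time_slice_label(ts), day_of_week(ts), is_weekend(ts))
```

Viewing durations aggregated per slice, per day of week, or per weekday/weekend bucket could then serve as the kind of time-sensitive features the talk refers to.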

9. Household Demographics Prediction
• Combination of General Machine Learning and Deep Learning Algorithms
• 7,000+ features
• Time-sensitive features
• Keyword features
• Title features
• Duration features
[Diagram: one Feature Vector and one set of Labels per person in each household: HH 1 Person 1, HH 1 Person 2, HH 1 Person 3, HH 2 Person 1, HH 2 Person 2, HH 3 Person 1]

10. Training and Test Process
[Diagram labels: ACR, EPG; Training models using 3rd party data with labels; Cross Validation; Transfer Learning; Prediction on Proprietary Data]
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
Horovod is Uber’s Open Source Distributed Deep Learning Framework for TensorFlow
Databricks, the Databricks logo and any related marks are trademarks of Databricks Inc.

11. Why Databricks?
• Suitable for a wide range of user groups
• Cost-effectiveness
• End-to-end machine learning lifecycle
• Support for distributed deep learning model training

12. Summary
• Hierarchical Sessionization
• Samsung Taste Graph ®
• Time-Sensitive Features
– Proprietary Time-Sensitive Taste Graph / Quality Score
– Demographics Prediction
Smart Devices, Smart Data, and Smart Services

13. Future Work
360-degree View of Whole Processes (Observation, Profiling, Serving, Feedback, etc.)

14. Thank you for your attention! Questions?
hyunwoo.k@samsung.com
linkedin.com/in/anthony-sra