Large Scale Video Data Analysis at Tubi
Published by openLooKeng

沈达, Data Engineer at Tubi / Spark community contributor

2. Large Scale Video Data Analysis at Tubi, 2021/10/16. © Tubi, proprietary and confidential

3. About Me
沈达, Data Engineer @ Tubi
● GNU TeXmacs Developer
● Translator of 《Scala实用指南》
● Apache Spark Contributor

4. Tubi: Stream Freely. Democratize content and make premium content accessible to everyone.

5. Video Data Analysis: Why?
At Tubi:
● Ad Logo Detection
● Ad Point Finding
● Cue Point Finding
Other applications:
● Self-Driving Cars
● Security Cameras
Imagine how a superhero could find villains given video data from every security camera in Beijing.

6. Challenges of Video Data

           Structured Data                              Unstructured Data
Schema     Schema-on-write                              Schema-on-read
Format     Predefined, using alphanumeric characters    Binaries; can only be displayed as hex numbers
Storage    Well-formatted: Delta/Parquet/ORC            Binary column in NoSQL, or scattered in S3
Query      SQL                                          Special tools needed: applying info extractors and ML models

7. Rikai (理解, "understanding"): a Parquet-based ML data format built for working with unstructured data at scale
● Run ML models via SQL
  ○ PyTorch
  ○ Scikit-Learn
  ○ … (more to come)
● Native support for images and videos via PySpark
● Customized data format for images and videos (TODO)
https://github.com/eto-ai/rikai
Open source from the very beginning

8. Rikai vs. Redshift ML vs. BigQuery ML
1. The biggest difference is that Rikai introduces "strong typing" for annotations and labels, which lets you use SQL to understand your dataset.
2. Rikai also ships many UDFs for processing images and videos (e.g., computing IoU, generating spectrograms).
3. Looking narrowly at the ML-in-SQL feature itself: because Rikai is Spark-based, it is easy to extend and issues are easy to diagnose.
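IoU (intersection over union) is one of the image UDFs mentioned above, and it operates on the typed boxes Rikai exposes (Box2d). A minimal sketch of the computation, assuming boxes are given as (xmin, ymin, xmax, ymax) corners; the `Box2d` class below is a hypothetical stand-in, not Rikai's own class or UDF:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box2d:
    """Axis-aligned 2D box given by corners (xmin, ymin) and (xmax, ymax).

    Hypothetical stand-in for a typed bounding box; not Rikai's own class.
    """

    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        # Clamp at zero so an empty (inverted) box has area 0.
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)

    def iou(self, other: "Box2d") -> float:
        """Intersection over union: overlap area divided by combined area."""
        overlap = Box2d(
            max(self.xmin, other.xmin),
            max(self.ymin, other.ymin),
            min(self.xmax, other.xmax),
            min(self.ymax, other.ymax),
        ).area
        union = self.area + other.area - overlap
        return overlap / union if union > 0 else 0.0


if __name__ == "__main__":
    a = Box2d(0, 0, 2, 2)
    b = Box2d(1, 1, 3, 3)
    print(a.iou(b))  # overlap 1, union 4 + 4 - 1 = 7
```

With a typed box column, a score like this can be used directly in SQL filters (for example, matching predictions against labels), which is the point of the "strong typing" argument.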

9. Rikai-enhanced Spark SQL

10. Rikai on Images: the magic ML_PREDICT (labels in the example output: 0: person, 65: remote)

11. Rikai on Videos
● Split the 10-minute video into 14,400 images (14400 = 10 min × 60 s/min × 24 frames/s)
● Apply the YOLOv5 model to the images
● Find predictions with label 0 (person)
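The video pipeline above reduces to simple frame arithmetic plus a filter on the predicted label. A minimal sketch, assuming every frame is extracted at 24 fps and that detections arrive as hypothetical `(frame_index, label_id, score)` tuples; the detection values and the `min_score` threshold are made up for illustration:

```python
# Toy per-frame detections (frame_index, label_id, score). These values are made up;
# in the talk they would come from running YOLOv5 through ML_PREDICT.
LABELS = {0: "person", 65: "remote"}  # IDs as shown on the image slide


def frame_count(minutes: int, fps: int = 24) -> int:
    """Frames produced by extracting every frame of a video of the given length."""
    return minutes * 60 * fps


def frame_timestamp(frame_index: int, fps: int = 24) -> float:
    """Offset of a frame from the start of the video, in seconds."""
    return frame_index / fps


def frames_with_label(detections, label_id=0, min_score=0.5):
    """Sorted frame indices with at least one detection of label_id at or above min_score."""
    return sorted(
        {frame for frame, label, score in detections if label == label_id and score >= min_score}
    )


if __name__ == "__main__":
    detections = [(0, 0, 0.9), (0, 65, 0.8), (48, 0, 0.3), (240, 0, 0.7)]
    print(frame_count(10))  # 14400
    for frame in frames_with_label(detections):
        print(frame, LABELS[0], frame_timestamp(frame))
```

Mapping frame indices back to timestamps is what turns "frames containing a person" into positions in the video.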

12. Rikai Types: Image, Box2d

13. Rikai Types: Video

14. Run ML Models

15. Case Study: Linear Regression (train, then predict)
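The train-then-predict flow of this case study can be sketched without any ML library. This is a minimal sketch assuming one input feature; it swaps in a closed-form ordinary least squares fit in plain Python instead of the scikit-learn model the talk likely uses, and the toy data (y = x + 1) is made up:

```python
from typing import List, Sequence, Tuple

Model = Tuple[float, float]  # (slope, intercept)


def train(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Fit y = slope * x + intercept by ordinary least squares (closed form)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model: Model, xs: Sequence[float]) -> List[float]:
    """Apply the fitted line to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


if __name__ == "__main__":
    model = train([0, 1, 2, 3], [1, 2, 3, 4])  # toy data following y = x + 1
    print(model)
    print(predict(model, [10, 20]))
```

In this plain form, training and prediction share one process and one in-memory `model` object; the following slides are about separating the two.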

16. Case Study: Linear Regression with MLflow (train, then log_model; load_model, then predict)

17. MLflow: collaborate on versioned models stored in S3/HDFS
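The idea of a shared, versioned model store can be illustrated with a local directory. This is a hypothetical sketch, NOT the MLflow API (real MLflow code would use calls such as `mlflow.sklearn.log_model` and `mlflow.pyfunc.load_model`); it assumes each logged model gets the next integer version and that a local path stands in for S3/HDFS:

```python
import pickle
import tempfile
from pathlib import Path


class ModelStore:
    """Hypothetical versioned model store on a local directory (NOT the MLflow API).

    Layout: <root>/<name>/<version>.pkl, with versions numbered 1, 2, 3, ...
    A shared S3/HDFS path would play the role of `root` for a team.
    """

    def __init__(self, root):
        self.root = Path(root)

    def versions(self, name):
        """All stored versions of a model, oldest first."""
        model_dir = self.root / name
        if not model_dir.is_dir():
            return []
        return sorted(int(p.stem) for p in model_dir.glob("*.pkl"))

    def log_model(self, name, model):
        """Pickle `model` as the next version of `name` and return that version."""
        existing = self.versions(name)
        version = existing[-1] + 1 if existing else 1
        model_dir = self.root / name
        model_dir.mkdir(parents=True, exist_ok=True)
        with open(model_dir / f"{version}.pkl", "wb") as f:
            pickle.dump(model, f)
        return version

    def load_model(self, name, version=None):
        """Load one version of `name`; the latest version when `version` is None."""
        if version is None:
            existing = self.versions(name)
            if not existing:
                raise KeyError(f"no versions logged for {name!r}")
            version = existing[-1]
        with open(self.root / name / f"{version}.pkl", "rb") as f:
            return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ModelStore(tmp)
        store.log_model("linreg", (1.0, 1.0))  # version 1
        store.log_model("linreg", (0.9, 1.2))  # version 2
        print(store.versions("linreg"), store.load_model("linreg"), store.load_model("linreg", 1))
```

The trainer only calls `log_model` and the consumer only calls `load_model`, so the two roles never need to share a process. Since `pickle` can execute code on load, such a store should only be read from trusted locations.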

18. Case Study: Linear Regression with Rikai (train and log the model, register it with CREATE MODEL, then predict in SQL with ML_PREDICT, a generated UDF)
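The CREATE MODEL / ML_PREDICT split can be pictured as a name-to-UDF catalog. This is a hypothetical Python sketch, not Rikai's implementation or SQL syntax: `create_model` stands in for CREATE MODEL and generates a per-row UDF from a (slope, intercept) model, and `ml_predict` stands in for ML_PREDICT by looking that UDF up by name:

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical catalog: model name -> generated per-row UDF (stand-in for CREATE MODEL).
_CATALOG: Dict[str, Callable[[float], float]] = {}


def create_model(name: str, model: Tuple[float, float]) -> None:
    """Register a (slope, intercept) linear model under `name` and generate its UDF."""
    if name in _CATALOG:
        raise ValueError(f"model {name!r} already exists")
    slope, intercept = model

    def generated_udf(x: float) -> float:
        # The closure captures the model, so callers only need the model name.
        return slope * x + intercept

    _CATALOG[name] = generated_udf


def ml_predict(name: str, inputs: Sequence[float]) -> List[float]:
    """Look up the named model and apply its UDF to every input (stand-in for ML_PREDICT)."""
    if name not in _CATALOG:
        raise KeyError(f"model {name!r} not found; create it first")
    udf = _CATALOG[name]
    return [udf(x) for x in inputs]


if __name__ == "__main__":
    create_model("linreg", (1.0, 1.0))
    print(ml_predict("linreg", [0, 1, 99]))
```

The person who registers the model and the people who call predictions only share a model name, which mirrors the division of roles in the summary slide.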

19. Summary
● Train: ML engineers, who define and create the model
● Model management (CREATE MODEL): ML/data engineers, who manage the model
  ○ Versioning (full lifecycle)
  ○ Options tuning
● ML_PREDICT (inputs → predictions): ML/data engineers, data analysts/scientists, ...
Everyone can become a superhero!!!

20. ML_PREDICT Revealed

21. ML_PREDICT: Generated UDF. Models can be cached to reduce I/O latency.

22. ML_PREDICT: Vectorized UDF
[Figure: 100 input rows (0 to 99) split into 4 batches of 25 rows each (0 to 24, 25 to 49, 50 to 74, 75 to 99); each batch goes through one UDF invocation, which returns the predictions x + 1 (1 to 25, 26 to 50, 51 to 75, 76 to 100).]
Assuming each batch holds 25 rows, the UDF is invoked 4 times in total (once per batch) rather than once per row.
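The figure shows 100 rows handled in four batches of 25, so the UDF is called once per batch. A minimal plain-Python simulation of that batching, assuming a toy model y = x + 1; in PySpark the analogous mechanism is a pandas UDF (`pyspark.sql.functions.pandas_udf`), whose Arrow batches are capped by `spark.sql.execution.arrow.maxRecordsPerBatch`, but the helper names below are hypothetical:

```python
from typing import Callable, Iterator, List, Sequence, Tuple

BatchUDF = Callable[[Sequence[float]], List[float]]


def iter_batches(rows: Sequence[float], rows_per_batch: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most rows_per_batch rows."""
    if rows_per_batch < 1:
        raise ValueError("rows_per_batch must be positive")
    for start in range(0, len(rows), rows_per_batch):
        yield rows[start:start + rows_per_batch]


def run_vectorized(rows: Sequence[float], rows_per_batch: int, udf: BatchUDF) -> Tuple[List[float], int]:
    """Call `udf` once per batch; return all outputs and the number of invocations."""
    outputs: List[float] = []
    invocations = 0
    for batch in iter_batches(rows, rows_per_batch):
        invocations += 1
        outputs.extend(udf(batch))
    return outputs, invocations


def plus_one(batch: Sequence[float]) -> List[float]:
    # Toy model y = x + 1, applied to a whole batch in a single call.
    return [x + 1 for x in batch]


if __name__ == "__main__":
    preds, calls = run_vectorized(list(range(100)), 25, plus_one)
    print(calls, preds[:3], preds[-1])
```

With 25 rows per batch this gives 4 invocations; with 4 rows per batch the same 100 rows would need 25. Fewer, larger calls amortize per-call overhead such as Python/JVM data transfer and model setup.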

23. Call for Contributors!!! The NEXT AWESOME Big Data/ML Open Source Project

24. Thanks! Work with me (da@tubi.tv)!!! https://github.com/da-tubi/rikai-example
