Alibaba’s common algorithm platform on Flink

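Inside Alibaba there are many application scenarios for real-time stream processing and batch data processing. Many data analysts, however, are not software engineers, and they are more comfortable using graphical interfaces or scripting tools for business decision analysis. In this talk we share our experience building an algorithm library and a graphical interface on top of Apache Flink. This work helps data analysts do their jobs better and makes algorithm analysis, training, and prediction easier for them.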
1.Alibaba’s Common Algorithm Platform on Flink Xu Yang yangxu@alibaba-inc.com

2.Agenda •  Background •  Why based on Flink/Blink •  Platform Introduction •  Demos

3.Background •  Alibaba Group •  Alibaba Computing Platform •  PAI ( Platform of AI )

4.Why based on Flink/Blink •  More requirements on stream processing •  Advanced Flink architecture •  User –  Low learning curve –  Less coding –  More functions

5.Alibaba’s Common Algorithm Platform on Flink •  Code Name: Alink -  Common part of related words -  Alibaba, Algorithm, AI, Flink, Blink •  Currently supported algorithms -  Statistics, Machine Learning, Recommendation, Outlier

6.Alink Architecture (layered diagram, listed top to bottom) •  Alink SDK & Web UI & Client & Visualization •  Processing for Structural Data: Stream Operator, Batch Operator •  Libraries for Streaming: Alink Stat, Alink ML, Alink ......, Table (Relational), CEP (Event Processing) •  Libraries for Batch: Alink Stat, Alink ML, Alink ......, Flink ML (Machine Learning), Gelly (Graph Processing), Table (Relational) •  Common Libs •  DataStream API (Stream Processing), DataSet API (Batch Processing) •  Runtime: Distributed Streaming Dataflow •  Deployment: Local (Single JVM), Cluster (Standalone, YARN), Cloud (GCE, EC2)

7.Alink UI •  Web UI –  Drag-drop, easy to build workflow •  Client –  Local run –  Edit and run script •  Console –  Client without GUI

17.Alink UI •  Web UI –  Drag-drop, easy to build workflow •  Client –  Local run –  Edit and run script •  Console –  Client without GUI

19.Local Run! Cluster Run!

20.Alink UI •  Web UI –  Drag-drop, easy to build workflow •  Client –  Local run –  Edit and run script •  Console –  Client without GUI

21.Alink Functions (Part 1 of 3) •  Statistics and Visualization -  Current and History -  Basic Statistic •  Mean, Variance, StdVar, CV, StdErr, Moment, Central Moment, Skewness, Kurtosis •  Histogram, TopK, BottomK, Frequency, Percentile, Quantile, Median, Mode •  Covariance, Coef of Correlation, Cross Table, Ranking List -  Statistical Analysis •  PCA, Correspondence Analysis, Multi-collinearity •  T-Test, Chi2-Test, KS-Test, AD-Test
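
Several of the basic statistics listed on this slide can all be derived from central moments. The sketch below is a plain-Python illustration of those definitions, not Alink code: the function names `central_moment` and `basic_stats` are hypothetical, and it assumes population (divide-by-n) moments, non-excess kurtosis, and a non-zero mean for the CV.

```python
import math


def central_moment(xs, k):
    """k-th central moment, dividing by n (population convention)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def basic_stats(xs):
    """Return a dict of basic statistics for a non-empty list of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    var = central_moment(xs, 2)   # population variance
    std = math.sqrt(var)
    return {
        "mean": mean,
        "variance": var,
        "std": std,
        "cv": std / mean,                              # assumes mean != 0
        "stderr": std / math.sqrt(n),                  # standard error of the mean
        "skewness": central_moment(xs, 3) / std ** 3,
        "kurtosis": central_moment(xs, 4) / var ** 2,  # not excess kurtosis
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    for name, value in basic_stats([2, 4, 4, 4, 5, 5, 7, 9]).items():
        print(name, value)
```

Libraries differ on whether variance divides by n or n - 1 and on whether kurtosis is reported as excess kurtosis, so results should be compared with those conventions in mind.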

22.Demo for Statistics and Visualization •  IJCAI-17 Dataset -  https://tianchi.aliyun.com/datalab/index.htm -  Trading amounts and locations of Alipay users -  19.6 million users, 67 million trades

23.Stat Demo: Current and History •  AllStat for History, stat from start to now •  WindowStat for Current, stat over last 3 seconds •  Trading amounts •  Frequency of shop_level
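
The difference between a history statistic (from the start of the stream until now) and a current statistic (over the last 3 seconds) can be sketched in plain Python. This is an illustration of the two ideas only, not the Alink AllStat or WindowStat operators: the class names `RunningStat` and `SlidingWindowStat` are hypothetical, and the sketch assumes events arrive with non-decreasing timestamps in seconds.

```python
from collections import deque


class RunningStat:
    """History: count and mean over every value seen since the start."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        self.count += 1
        self.total += value

    def mean(self):
        return self.total / self.count if self.count else None


class SlidingWindowStat:
    """Current: mean over events whose timestamp is within the last
    `window_seconds` of the most recent event."""

    def __init__(self, window_seconds=3.0):
        self.window = window_seconds
        self.events = deque()   # (timestamp, value), oldest first
        self.total = 0.0

    def add(self, ts, value):
        self.events.append((ts, value))
        self.total += value
        # Evict events that have fallen out of the window.
        while self.events and self.events[0][0] <= ts - self.window:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def mean(self):
        return self.total / len(self.events) if self.events else None


if __name__ == "__main__":
    history, current = RunningStat(), SlidingWindowStat(3.0)
    # Made-up (timestamp, trading amount) pairs, for illustration only.
    for ts, amount in [(0, 10), (1, 20), (2, 30), (4, 40), (5, 50)]:
        history.add(amount)
        current.add(ts, amount)
        print(ts, history.mean(), current.mean())
```

The history statistic only needs constant state, while the windowed one must remember every event still inside the window so that it can subtract them when they expire.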

24.Data Frequent & Count Demo

25.Stat Demo: Distribution •  Get 2 stream data: shop_level=‘low’, shop_level=‘high’ •  Consider 2 Features : comment_cnt and pay •  Probabilistic Distribution

26.Stat Demo: Distribution

27.Stat Demo: Relationship of Features •  Numerical Features: pay, comment_cnt and shop_level_int -  Multicollinearity, Coef of Correlation •  Categorical Features: province and shop_level -  Correspondence Analysis, Cross Table
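
Two of the measures named on this slide are simple to state in code: the coefficient of correlation for a pair of numerical features, and the cross table for a pair of categorical features. The sketch below is a plain-Python illustration, not Alink code. The function names `pearson` and `cross_table` are hypothetical, the sample values are made up, and only the column names come from the slide.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.
    Assumes neither list is constant (otherwise the denominator is zero)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def cross_table(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(zip(rows, cols))


if __name__ == "__main__":
    # Made-up values, for illustration only.
    pay = [10.0, 25.0, 40.0, 55.0]
    comment_cnt = [1, 3, 4, 8]
    print(pearson(pay, comment_cnt))

    province = ["Zhejiang", "Zhejiang", "Jiangsu", "Jiangsu"]
    shop_level = ["low", "high", "low", "low"]
    for (prov, level), count in sorted(cross_table(province, shop_level).items()):
        print(prov, level, count)
```

A coefficient near 1 or -1 between two numerical features is one sign of the multicollinearity the slide mentions, while the cross table is the input a correspondence analysis starts from.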

28.Stat Demo: Relationship of Features

29.Stat Demo: Ranking List •  provinces for user counts •  provinces for sum of pay, shown on a map •  catalogs for sum of pay