Flink Forward China 2018

1. Flink Forward China 2018

2. In this AI era of continuously incoming data, we need new platforms and algorithmic approaches that can learn and make smart decisions in real time through continuous adaptation.

3. What is Online ML?

4. Business Scenarios for Online ML
• Anomaly Detection and Prediction for: IoT, Smart Factory, Smart Manufacturing, SafeCity. Forecast of values in real time by modeling the seasonality of time series samples.
  [Diagram labels: …dynamics, state machine, Estimator]
  [Figure: Example of Holt-Winters Seasonal Prediction for Time Series. Axes: Time, Value. Series: Demand, Forecast.]
• Distribution Detection and Prediction for: Smart Factory, Smart Manufacturing, Advertising… Forecast of values in real time by modeling the time series samples.
  [Figure: Example of AR Prediction following a Time Series. Axes: Time, Value. Series: Actual, Predicted.]
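The slide names Holt-Winters seasonal prediction but does not say which variant is used. As a rough illustration only, the sketch below assumes the additive variant, initialises the state from one full season of samples, and uses arbitrary smoothing parameters. The class name `AdditiveHoltWinters` and all parameter values are hypothetical and not taken from the talk.

```python
class AdditiveHoltWinters:
    """Online additive Holt-Winters forecaster (illustrative sketch, not a Flink API)."""

    def __init__(self, first_season, alpha=0.5, beta=0.1, gamma=0.1):
        # Assumption: the state is initialised from one complete season of data.
        self.m = len(first_season)
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.level = sum(first_season) / self.m
        self.trend = 0.0
        self.seasonal = [y - self.level for y in first_season]
        self.t = 0  # position of the next sample within the seasonal cycle

    def forecast(self):
        """One-step-ahead forecast for the next sample."""
        return self.level + self.trend + self.seasonal[self.t]

    def update(self, y):
        """Fold one new sample into the state; memory stays fixed at m + 2 numbers."""
        s_old = self.seasonal[self.t]
        prev_level = self.level
        self.level = self.alpha * (y - s_old) + (1 - self.alpha) * (prev_level + self.trend)
        self.trend = self.beta * (self.level - prev_level) + (1 - self.beta) * self.trend
        self.seasonal[self.t] = self.gamma * (y - self.level) + (1 - self.gamma) * s_old
        self.t = (self.t + 1) % self.m


# Usage: forecast each sample before it arrives, then update with the actual value.
season = [10.0, 20.0, 30.0, 20.0]
hw = AdditiveHoltWinters(season)
forecasts = []
for actual in season * 2:
    forecasts.append(hw.forecast())
    hw.update(actual)
```

Each update touches only the level, the trend, and one seasonal slot, so the cost per sample is constant regardless of how much data has been seen.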

5. Stream Platform for Online ML
Design Principles:
• Incremental computation
• Fixed-size memory
• Constant to sub-linear time complexity
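A running mean is perhaps the smallest example that satisfies all three principles. The sketch below is illustrative only: the class name `RunningMean` is hypothetical and is not part of Flink or of the platform described in the talk. Each event updates a two-number state in O(1) time, and no past samples are stored.

```python
class RunningMean:
    """Running mean kept in constant memory (illustrative name, not a Flink API)."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # mean of those samples

    def update(self, x):
        """Fold one new sample into the state in O(1) time."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


stream = [4.0, 8.0, 6.0, 2.0]
rm = RunningMean()
for value in stream:
    rm.update(value)

# Batch recomputation over the stored data set, shown for comparison only.
batch_mean = sum(stream) / len(stream)
```

The incremental result matches the batch result, but the batch version needs the whole data set in memory and costs O(n) every time it is recomputed.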

6. Stream (Redesigned) Algorithms for Online ML

Linear regression
The linear least squares fitting technique is the simplest and most commonly applied form of linear regression, and provides a solution to the problem of finding the best-fitting straight line through a set of points. In practice, it minimizes the sum of the squared deviations of a set of n data points:
$$R^2 = \sum_i \left[ y_i - f(x_i, a_1, a_2, a_3, \ldots, a_n) \right]^2$$
For the linear fit:
$$R^2 = \sum_i \left[ y_i - (a_1 + a_2 x_i) \right]^2$$
[Figure: straight-line fit through the data points, with a new point received at time T]

Move from batch computation over the whole data set to incremental computation based on previous values. Requirement: redo the math for the ML algorithm (hard task!).

Incremental Linear Regression Learner:
$$a_2 = \frac{\mathrm{cov}_{xy,n}}{\sigma^2_{x,n}}, \qquad a_1 = \bar{y}_n - a_2 \bar{x}_n, \qquad \mathrm{Regression}(x) = a_1 + a_2 x$$

Incremental functions used:
• Incremental mean:
$$\bar{x}_n = \bar{x}_{n-1} + \frac{1}{n}\,(x_n - \bar{x}_{n-1})$$
• Incremental variance (as written on the slide, this recurrence accumulates the sum of squared deviations, denoted $S_{x,n}$ here; the sample variance follows by dividing by $n-1$):
$$S_{x,n} = S_{x,n-1} + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n), \qquad \sigma^2_{x,n} = \frac{S_{x,n}}{n-1}$$
• Incremental covariance:
$$\mathrm{cov}_{xy,n} = \frac{n-2}{n-1}\,\mathrm{cov}_{xy,n-1} + \frac{1}{n}\,(x_n - \bar{x}_{n-1})(y_n - \bar{y}_{n-1})$$

Based on such formulas we can:
1) Implement the ML in Flink
2) Add the support for stateful processing
3) Embed the function in window processing
4) Extend the SQL to support it

Autoregressor model
• Batch: O(n) complexity. In general, coefficients are computed with an optimizer, which is costly and potentially unstable (not all systems have a solution). The procedure is applied for every sample of the time series data.
• Stream: O(1) complexity. Online estimation of coefficients with updates, using Kalman filters:
$$X_n = X_{n-1} \cdot \mathrm{Coefficient}_n, \qquad \mathrm{Coefficient}_n = \mathrm{Coefficient}_{n-1} + \Delta c$$
Coefficients are updated incrementally following an iterative process to predict and update.
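The incremental linear regression described on this slide can be sketched as follows. This is a minimal illustration, not the implementation used in the talk: the class name `IncrementalLinearRegression` and its methods are hypothetical. The slope is the covariance of x and y divided by the variance of x, and the intercept is the mean of y minus the slope times the mean of x. As a simplifying assumption, the sketch keeps unnormalised sums of deviations instead of the variance and covariance themselves, because the common factor 1/(n-1) cancels in the ratio.

```python
class IncrementalLinearRegression:
    """Simple linear regression updated one point at a time (illustrative sketch)."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.s_xx = 0.0  # running sum of squared deviations of x
        self.c_xy = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        """Fold one (x, y) point into the state in O(1) time and memory."""
        self.n += 1
        dx = x - self.mean_x               # deviation from the previous mean of x
        self.mean_x += dx / self.n         # incremental mean of x
        self.mean_y += (y - self.mean_y) / self.n  # incremental mean of y
        self.s_xx += dx * (x - self.mean_x)        # uses previous and updated mean
        self.c_xy += dx * (y - self.mean_y)        # previous mean of x, updated mean of y

    def coefficients(self):
        """Return (intercept, slope), or None until the fit is defined."""
        if self.n < 2 or self.s_xx == 0.0:
            return None
        slope = self.c_xy / self.s_xx
        intercept = self.mean_y - slope * self.mean_x
        return intercept, slope

    def predict(self, x):
        intercept, slope = self.coefficients()
        return intercept + slope * x


# Usage: points generated from y = 2x + 1 arrive one at a time.
model = IncrementalLinearRegression()
for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]:
    model.update(x, y)
intercept, slope = model.coefficients()
```

In a Flink job, the five numbers held by such an object would be the natural candidate for keyed state, which is what makes the stateful-processing step in the list above necessary.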

7. Online ML Enabled in (Stream) SQL: 40 new Online ML SQL functions have already been added.

8. Online Deep Learning with SQL

Enabled stream DL serving via SQL, based on multiple engines:
• DL4J (also importing models from TensorFlow using Keras)
• TensorFlow native support

Application scenarios supported:
• Text classification (using Word2Vec)
• Image classification
• Handwritten digit prediction

DL Online Serving for Text:

    SELECT headlines
    FROM MyTable
    WHERE DL_TEXT_MAX_PREDICTION_INDEX(
        headlines,            /* Text input data */
        word2VecModelFile,    /* Word2Vec neural network file path */
        predictionModelFile,  /* Pre-trained prediction model path */
        dl4jModel             /* True if DL4J model, false otherwise */
    ) = 2                     /* Bollywood category */

[Diagram labels: Word2Vec neural network; DL offline trained classification model; DL4J Engine; Flink CloudStream; example headline "Nestle enters deal to sell Starbucks' products"; category "Business and Development"]

DL Online Serving for Image:

    SELECT DL_IMAGE_MAX_PREDICTION_INDEX(
        image,                /* input image in byte[] format */
        modelConfigFilePath,
        weightsHdf5File
    ) AS prediction
    FROM MyTable

[Diagram labels: Images input stream as byte[]; DL offline trained prediction model; DL4J Engine; Flink CloudStream; Predicted output as integer]
