1. AI as a Service: Build Shared AI Service Platforms Based on Deep Learning Technologies
Suqiang Song, Director, Chapter Leader of Data Engineering & AI, Mastercard
#AI1SAIS

2. Mastercard Big Data & AI Expertise
Differentiation starts with consumer insights from a massive worldwide payments network and our experience in data cleansing, analytics and modeling.
What can 2.4 BILLION Global Cards and 56 BILLION Transactions/Year mean to you?
MULTI-SOURCED
• 38MM+ merchant locations
• 22,000 issuers
CLEANSED, AGGREGATED, ANONYMOUS, AUGMENTED
• 1.5MM automated rules
• Continuously tested
WAREHOUSED
• 10 petabytes
• 5+ year historic global view
• Rapid retrieval
• Above-and-beyond privacy protection and security
TRANSFORMED INTO ACTIONABLE INSIGHTS
• Reports, indexes, benchmarks
• Behavioral variables
• Models, scores, forecasting
• Econometrics
Mastercard enhanced its Artificial Intelligence capability with the acquisitions of Applied Predictive Technologies (2015) and Brighterion (2017).

3. What is AI as a Service?

4. Three modes of AI as a Service
Machine learning frameworks
[Stack: AI Applications / Machine learning frameworks / On / Off Premise Advanced Infrastructure]
• Machine learning frameworks: provide stable and secure environments and consolidate integrated wrappers on top of various technologies for regular machine learning work
• Applications build silos from scratch
Fully managed machine learning services
[Stack: AI Applications / Fully managed machine learning services / Machine learning frameworks / On / Off Premise Advanced Infrastructure]
• Fully managed machine learning services use templates, pre-built models and drag-and-drop development tools to simplify and expedite the process of using a machine learning framework
• Applications share templates and pre-built models, and assemble and infer them into pipelines or business contexts
Automation Services
[Stack: AI Applications / Automation Services / Fully managed machine learning services / Machine learning frameworks / On / Off Premise Advanced Infrastructure]
• Automation Services: tasks like exploratory data analysis, pre-processing of data, hyper-parameter tuning, model selection and putting models into production can be automated
• “God's Return to God, Satan's Return to Satan, Math’s Return to AI, Business’s Return to Biz”
©2018 Mastercard. Proprietary and Confidential.

5. Regular Mode: Machine learning frameworks
Example: Machine Learning Sandbox
[Chart: cost vs. time, from (0,0) to $100,000 and 6 weeks. Stages: Data Exploration & Harmonization; Features Engineering; Modeling; Evaluation & Benchmarking; Model Deployment & Serving]

6. Plus Mode: Fully managed machine learning services
Example: Data Science Workbench
[Chart: cost vs. time, from (0,0) to $50,000 and 2 weeks. Stages: Data Exploration & Harmonization; Features Engineering; Modeling; Evaluation & Benchmarking; Model Deployment & Serving]

7. Premium Mode: Automation Services
Example: Amazon SageMaker?
[Chart: cost vs. time, from (0,0) to $10,000 and 2 days. Stages: Data Exploration & Harmonization; Features Engineering; Modeling; Evaluation & Benchmarking; Model Deployment & Serving]
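
The cost and time figures on the three mode slides can be compared directly. The following is a small sketch of that arithmetic; the only assumption beyond the slides is that a week is counted as 7 days.

```python
# Cost (USD) and elapsed time as stated on the Regular, Plus and Premium slides.
# Assumption: weeks are converted to days using 7-day weeks.
MODES = {
    "Regular": {"cost_usd": 100_000, "days": 6 * 7},
    "Plus": {"cost_usd": 50_000, "days": 2 * 7},
    "Premium": {"cost_usd": 10_000, "days": 2},
}


def improvement_over_regular(name):
    """Return (cost ratio, time ratio) of Regular mode relative to the named mode."""
    base, mode = MODES["Regular"], MODES[name]
    return base["cost_usd"] / mode["cost_usd"], base["days"] / mode["days"]


for name in MODES:
    cost_x, time_x = improvement_over_regular(name)
    print(f"{name}: {cost_x:.1f}x lower cost, {time_x:.1f}x less time than Regular")
```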

8. Challenges to achieve Premium Automation AI Service
Learning Automation
1 Feature engineering bottlenecks: pre-calculating hundreds or thousands of Long Term Variables takes lots of resources and time
2 Model scalability limitations: trade-off between automation in parallel and scaling machine learning to ever larger datasets and ever more complicated models
3 Heavily relies on human machine learning experts: relies on humans to perform most of the tasks
Serving Automation
4 Less integration with end-to-end data pipelines, fill in the loop: gap to bring the machine learning process into the existing enterprise data pipelines, including batch, streaming and real-time
5 Model Serving to multiple contexts: gap to connect to existing business pipelines, offline, streaming and real-time
6 API enablement and automated deployment: low productivity when creating more models with low level raw APIs; isolated promotions and operational readiness with automated deployment

9. What can Deep Learning help with?

10. Challenges with Traditional ML: Feature engineering bottlenecks
[Diagram labels, in extraction order: LTV DATA from last week; AUTH DETAIL from last week; MERCHANT CATEGORY; FILTERED TRANSACTIONS; GEO; SUMMED BY USER; ITEM LEVEL DATA; AGED LTV DATA; AGED BY USER; LTV DATA FOR THIS WEEK]
Bottlenecks
• Need to pre-calculate hundreds or thousands of Long Term Variables (LTV) for each user, such as total spend/visits per merchant list and category list, divided by week, month and year
• The computation time for LTV features took > 70% of the data processing time for the whole lifecycle and occupied lots of resources, which had a huge impact on other critical workloads.
• Misses the feature selection optimizations, which could save a lot of data engineering effort
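
To make the bottleneck concrete, here is a minimal sketch of what pre-calculating Long Term Variables looks like: every transaction is aggregated into a cell per user, category and time window. The transaction tuples, the category names and the `long_term_variables` helper are all hypothetical; the real pipelines are not shown in the slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (user, merchant category, date, amount).
transactions = [
    ("u1", "grocery", date(2018, 1, 2), 40.0),
    ("u1", "grocery", date(2018, 1, 9), 25.0),
    ("u1", "fuel", date(2018, 1, 9), 30.0),
    ("u2", "grocery", date(2018, 1, 3), 12.5),
]


def long_term_variables(txns):
    """Total spend and visit count per (user, category, window kind, window label)."""
    ltv = defaultdict(lambda: {"spend": 0.0, "visits": 0})
    for user, category, day, amount in txns:
        iso_year, iso_week, _ = day.isocalendar()
        windows = (
            ("week", f"{iso_year}-W{iso_week:02d}"),
            ("month", f"{day.year}-{day.month:02d}"),
            ("year", str(day.year)),
        )
        # Each transaction updates one cell per window granularity.
        for kind, label in windows:
            cell = ltv[(user, category, kind, label)]
            cell["spend"] += amount
            cell["visits"] += 1
    return dict(ltv)


features = long_term_variables(transactions)
print(len(features), "pre-calculated cells for", len(transactions), "transactions")
```

The number of cells grows with users × categories × windows, which is why this stage can dominate processing time once the variable list reaches the hundreds or thousands.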

11. With Deep Learning: Remove lots of LTV workloads and simplify the feature engineering
Improvements
• When building a model, focus only on a few pre-defined sliding features and custom overlap features (users only need to identify the column names from the data source)
• Removes most of the LTV pre-calculation work, saving hours of time and lots of resources
• The deep learning algorithm generates exponential growth of hidden embedding features and does the internal feature selection and optimization automatically when it does cross validation at the training stage

12. Challenges with Traditional ML: Model scalability
[Diagram: for each item 1 … n, an "Item × Users" pipeline runs (1) feature engineering, (2) training and (3) evaluation and produces Model 1 … Model n; the models are merged using a prebuilt correlation matrix; (4) all the prediction results are merged]
Limitations
• All the pipelines are separated by item and generate one model for each item
• Have to pre-calculate the correlation matrix between items
• Lots of redundant duplication and computation in the feature engineering, training and testing processes
• Items run in parallel and occupy most of the cluster resources when executed
• Bad metrics for items with few transactions
• It is very hard to scale to more items, from hundreds to millions?

13. With Deep Learning: Scale models deeper and wider without decreasing metrics
• NCF
  • Scenario: Neural Collaborative Filtering, recommend products to customers (priority is to recommend to active users) according to customers’ past history activities.
  • ngnan/papers/ncf.pdf
• Wide & Deep learning
  • Scenario: jointly trained wide linear models and deep neural networks, to combine the benefits of memorization and generalization for recommender systems.
  • 9d/39e938c84a867ddf2a8cabc575ffba27b721.pdf
[Diagram: NCF network. A User Item Pair is split (Select) into a user index and an item index, which feed four LookupTable embedding layers (MF User, MF Item, MLP User, MLP Item). MF branch: CMul of the MF user and item embeddings. MLP branch: Concat of the MLP user and item embeddings, then Linear 1, ReLU, Linear 2, ReLU. The MF and MLP outputs are concatenated, followed by Linear 3 and Sigmoid.]
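
The NCF layer stack named on this slide can be sketched as a plain forward pass. This is an illustrative pure-Python version with random, untrained weights; it is not BigDL code, and the toy sizes, initial weight range and helper names are assumptions.

```python
import math
import random

random.seed(0)

# Toy sizes (assumptions, unrelated to the benchmarked models).
N_USERS, N_ITEMS, MF_DIM, MLP_DIM = 5, 7, 4, 8


def embedding(n_rows, dim):
    """Lookup table: one small random vector per index."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]


def linear(n_in, n_out):
    """Weights and zero bias for a fully connected layer."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def apply_linear(layer, x):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


mf_user, mf_item = embedding(N_USERS, MF_DIM), embedding(N_ITEMS, MF_DIM)
mlp_user, mlp_item = embedding(N_USERS, MLP_DIM), embedding(N_ITEMS, MLP_DIM)
linear1 = linear(2 * MLP_DIM, 8)
linear2 = linear(8, 4)
linear3 = linear(MF_DIM + 4, 1)


def ncf_score(user, item):
    """Probability-like score in (0, 1) for a (user index, item index) pair."""
    # MF branch: element-wise product of the two MF embeddings (CMul).
    mf = [u * i for u, i in zip(mf_user[user], mf_item[item])]
    # MLP branch: concatenate the MLP embeddings, then two Linear + ReLU layers.
    mlp = mlp_user[user] + mlp_item[item]
    mlp = relu(apply_linear(linear1, mlp))
    mlp = relu(apply_linear(linear2, mlp))
    # Concatenate both branches, then Linear 3 and Sigmoid.
    logit = apply_linear(linear3, mf + mlp)[0]
    return 1.0 / (1.0 + math.exp(-logit))


print(ncf_score(user=0, item=3))
```

Adding users or items only adds rows to the lookup tables, which is the sense in which one shared model replaces the per-item models of the traditional approach.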

14. Challenges with Traditional ML: Heavily relies on human machine learning experts
[Diagram: Data Source → Partitioning into Training, Testing and Validation Data Sets → Model 1, Model 2, … Model n → Validate Model Metrics → Choose Best Model]
Relies on humans to perform the following tasks:
• Select and construct appropriate features.
• Select an appropriate model family.
• Optimize model hyperparameters.
• Post-process machine learning models.
• Critically analyze the results obtained.
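
The partition, validate and choose-best loop in the diagram can be sketched in a few lines. Everything here is hypothetical: the data, the threshold "models" and the accuracy metric stand in for real feature sets, model families and metrics, and a deterministic split by row index stands in for random partitioning.

```python
def partition(rows):
    """Split rows 60/20/20 into training, validation and test sets by row index."""
    train = [r for i, r in enumerate(rows) if i % 5 in (0, 1, 2)]
    validation = [r for i, r in enumerate(rows) if i % 5 == 3]
    test = [r for i, r in enumerate(rows) if i % 5 == 4]
    return train, validation, test


def accuracy(model, rows):
    """Fraction of rows where the model's prediction equals the label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Hypothetical labelled data: the label is 1 when x is above 50.
data = [(x, int(x > 50)) for x in range(100)]
train_set, validation_set, test_set = partition(data)

# Candidate "models": fixed threshold classifiers, so no fitting step is needed here.
candidates = {f"threshold_{t}": (lambda x, t=t: int(x > t)) for t in (30, 50, 70)}

# Choose the best candidate on the validation set, then report it on the test set.
best_name = max(candidates, key=lambda name: accuracy(candidates[name], validation_set))
test_accuracy = accuracy(candidates[best_name], test_set)
print(best_name, test_accuracy)
```

Each step that is hard-coded here (which candidates to try, which metric to use, how to split) is one of the decisions the slide attributes to a human expert.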

15. With Deep Learning: Gives more options for finding an optimally performing, robust configuration
Improvements
• Common neural network "tricks", including initialization, L2 and dropout regularization, batch normalization, gradient checking
• A variety of optimization algorithms, such as mini-batch gradient descent, Momentum, RMSprop and Adam
• Provides optimization-as-a-service using an ensemble of optimization strategies, allowing practitioners to efficiently optimize models faster and cheaper than standard approaches.
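
As an illustration of one of the optimizers listed above, here is a minimal sketch of the standard Adam update rule applied to a one-dimensional quadratic. The objective, starting point and step count are assumptions chosen for the example; the hyperparameter defaults other than `lr` are the commonly used ones.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function given its gradient, using the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first moment (momentum-like average)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (RMSprop-like average)
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

The first-moment term is what Momentum contributes and the second-moment scaling is what RMSprop contributes, so this one routine covers the family of optimizers named on the slide.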

16. Our Exploration & Evaluation Journey

17. Enterprise requirements for Deep Learning
Collocated with mass data storage
• Analyze a large amount of data on the same Big Data clusters where the data are stored (HDFS, HBase, Hive, etc.) rather than move or duplicate data
Seamless integration with Products, Internal & External
• Add deep learning capabilities to existing Analytic Applications and/or machine learning workflows rather than rebuild all of them
Data governance with restricted Processing
• Follow data privacy, regulation and compliance (such as PCI/PII compliance and GDPR) rather than operate data in unsecured zones
Shared infrastructure with Multi-tenant isolated resources
• Leverage existing Big Data clusters; deep learning workloads should be managed and monitored with other workloads (ETL, data warehouse, traditional ML, etc.) rather than run standalone in separate clusters
©2016 Mastercard. Proprietary and Confidential.

18. Challenges and limitations to Production considering some “Super Stars”…
• Claims that GPU computing is better than CPU, which requires new hardware infrastructure (normally a very long timeline)
• Success requires many engineer-hours (impossible to install a TensorFlow cluster at STAGE ...)
• Low level APIs with a steep learning curve (Where is your PhD degree?)
• Not well integrated with other enterprise tools and needs data movement (couldn't leverage the existing ETL, data warehousing and other analytics-relevant data pipelines, technologies and tool sets. Duplicating data pipelines and copying data is also a big challenge for capacity and performance.)
• Tedious and fragile to distribute computations (less monitoring)
• Concerns about Enterprise Maturity and InfoSec (use a GPU cluster with TensorFlow from Google Cloud)
…
Maybe not your story, but we have ....

19. What does Spark offer?
Integrations with existing DL libraries
• Deep Learning Pipelines (from Databricks)
• Caffe (CaffeOnSpark)
• Keras (Elephas)
• mxnet
• Paddle
• TensorFlow (TensorFlow on Spark, TensorFrames)
• CNTK (mmlspark)
Implementations of DL on Spark
• BigDL
• DeepDist
• DeepLearning4J
• SparkCL
• SparkNet

20. Need more breakdown…
Project                           Programming interface   Contributors   Commits
BigDL                             Scala & Python          50             2221
TensorflowOnSpark                 Python                  9              257
Databricks/tensor                 Python                  9              185
Databricks/spark-deep-learning    Python                  8              51
Statistics collected on Mar 5th, 2018
TensorFlow-on-Spark (or Caffe-on-Spark) uses Spark executors (tasks) to launch TensorFlow/Caffe instances in the cluster; however, the distributed deep learning (e.g., training, tuning and prediction) is performed outside of Spark (across multiple TensorFlow or Caffe instances).
(1) As a result, TensorFlow/Caffe still runs on specialized HW (such as GPU servers interconnected by InfiniBand), and the OpenMP implementations in TensorFlow/Caffe conflict with the JVM threading in Spark (resulting in lower performance).
(2) In addition, in this case TensorFlow/Caffe can only interact with the rest of the analytics pipelines in a very coarse-grained fashion (running as standalone jobs outside of the pipeline, and using HDFS files as job input and output).

21. POC: Benchmark BigDL & Spark MLlib
[Diagram: Neural Recommender using BigDL NCF / Wide and Deep, benchmarked against Spark MLlib. Parquet files are loaded into Spark DataFrames. Raw transactions (10~12 months) form the sampled training partition and 1~2 months form the test data. A Spark ML Pipeline (Estimator / Transformer / Model stages) runs pre-processing, feature selection and simple feature engineering (User-Merchant, User-Category, User-Geo, User-Merchant-Geo features), and adds negative samples. Multiple models are trained on sampled partitions: NCF model (BigDL), Wide and Deep model (BigDL) and ALS model (MLlib). Candidate models then go through model evaluation & fine tuning, model ensemble, post processing, inference and predictions.]
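
The "negative samples" step in the pipeline can be illustrated with a small sketch. Transaction data only contains positive user-merchant pairs, so unobserved pairs are sampled and labelled 0 before training a classifier-style recommender. The sampling scheme, the `add_negative_samples` helper and the identifiers below are assumptions; the slides do not describe how negatives were drawn in the POC.

```python
import random


def add_negative_samples(positives, all_items, per_positive=1, seed=42):
    """Label observed (user, item) pairs 1 and add sampled unobserved pairs labelled 0."""
    rng = random.Random(seed)
    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    items = sorted(all_items)
    labelled = [(user, item, 1) for user, item in positives]
    for user, _ in positives:
        # Only items the user has never transacted with are eligible as negatives.
        candidates = [i for i in items if i not in seen[user]]
        for item in rng.sample(candidates, min(per_positive, len(candidates))):
            labelled.append((user, item, 0))
    return labelled


# Hypothetical user-merchant pairs observed in transactions.
positives = [("u1", "m1"), ("u1", "m2"), ("u2", "m3")]
samples = add_negative_samples(positives, {"m1", "m2", "m3", "m4", "m5"})
print(samples)
```

In this simple version the same negative pair can be drawn more than once for a user with several positives; a production pipeline would likely deduplicate or weight them.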

22. Benchmark results (> 100 rounds)
Metric          MLlib ALS   BigDL NCF   BigDL WAD
AUROC           A           A+23%       A+20% (3% down)
AUPRCs          B           B+31%       B+30% (1% down)
recall          C           C+18%       C+12% (4% down)
precision       D           D+47%       D+49% (2% up)
20 precision    E           E+51%       E+54% (3% up)
Parameters:
• MLlib ALS: MaxIter(100), RegParam(0.01), Rank(200), Alpha(0.01)
• BigDL NCF: MaxEpoch(10), learningRate(3e-2), learningRateDecay(3e-7), uOutput(100), mOutput(200), batchSize(1.6 M)
• BigDL WAD: MaxEpoch(10), learningRate(1e-2), learningRateDecay(1e-7), uOutput(100), mOutput(200), batchSize(0.6 M)

23. Beyond the Deep Learning library, we need more automated platform capabilities to close PROD adoption gaps

24. Gap 1: Incremental Tuning
Periodic Incremental Tuning
• Incremental tuning (only re-run the whole pipeline with incrementally changed datasets, such as daily changed transactions, and benchmark the models)
• Refresh the dimensional datasets (such as adding new users, items …)
• Load the history model into the context and update the incremental parts of the model based on the incremental data sets
• Periodic re-training with a batch algorithm and time-series prediction
• Benchmark the history model and the updated model, and on-board the better one.
[Diagram labels: Incremental Set; Incremental Fact Ingest; Incremental Dimensional Lookups Refresher; Incremental Fine Tuner; Incremental Fine Tuning & Benchmark; History Model; Model Loader; Models; Benchmark]
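
The load, update, benchmark and on-board loop described above can be sketched with a deliberately trivial model. Here the "model" is a popularity counter over merchants and the benchmark is a top-k hit rate on a holdout list; both are stand-ins chosen so the example runs without any ML library, and all names and data are hypothetical.

```python
from collections import Counter


def top_k(model, k=2):
    """The k most popular items according to the model."""
    return [item for item, _ in model.most_common(k)]


def hit_rate(model, holdout, k=2):
    """Share of holdout items that appear in the model's top-k recommendations."""
    recommended = set(top_k(model, k))
    return sum(item in recommended for item in holdout) / len(holdout)


def incremental_update(history_model, new_transactions):
    """Update a copy of the history model with only the incremental data."""
    updated = Counter(history_model)
    updated.update(new_transactions)
    return updated


# History model loaded from a previous run, plus one day of new transactions.
history_model = Counter({"m1": 5, "m2": 3, "m3": 1})
daily_transactions = ["m3", "m3", "m3", "m3", "m3", "m4"]
holdout = ["m3", "m1", "m3", "m2"]

candidate = incremental_update(history_model, daily_transactions)

# Benchmark both models and on-board whichever scores better on the holdout set.
serving_model = max((history_model, candidate), key=lambda m: hit_rate(m, holdout))
print(top_k(serving_model))
```

Keeping the history model untouched until the benchmark finishes means a worse update can simply be discarded.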

25. Gap 2: Model Serving to multiple contexts
Model Serving (connect to existing business pipelines: offline, streaming and real-time)
• Build the model serving capability by exporting models to scoring/prediction/recommendation services and integration points
• Integrate the model serving services inside the business pipelines, such as embedding them into Spark jobs for offline, Spark Streaming jobs for streaming, and the real-time “dialogue” with Kafka messaging …
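
One way to read "serving to multiple contexts" is that a single scoring function is wrapped three times, once per context. The sketch below shows that shape in plain Python; the weighted-sum "model" and the function names are hypothetical, and in the setting described here the wrappers would live inside Spark jobs, Spark Streaming jobs and a Kafka consumer rather than plain functions.

```python
def score(record, weights):
    """Hypothetical exported model: a weighted sum of numeric features."""
    return sum(weights.get(name, 0.0) * value for name, value in record.items())


def serve_batch(records, weights):
    """Offline context: score a whole dataset in one pass."""
    return [score(r, weights) for r in records]


def serve_stream(record_iter, weights):
    """Streaming context: lazily score records as they arrive."""
    for record in record_iter:
        yield score(record, weights)


def serve_request(record, weights):
    """Real-time context: score one request and return a response payload."""
    return {"score": score(record, weights)}


weights = {"spend": 0.5, "visits": 2.0}
records = [{"spend": 10.0, "visits": 1.0}, {"spend": 4.0, "visits": 3.0}]

print(serve_batch(records, weights))
print(list(serve_stream(iter(records), weights)))
print(serve_request(records[0], weights))
```

Because all three wrappers call the same `score` function, a model update only has to be exported once to reach every context.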

26. Gap 3: Build user-friendly high level pipeline APIs
High level pipeline APIs
• Abstract and refine high level data and learning pipeline APIs on top of the BigDL library to simplify the deep learning model assembly process and increase productivity
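
A high level pipeline API of this kind typically lets users chain named stages instead of calling low level library functions directly. The following is a hypothetical sketch of that idea only: the `Pipeline` class and the stage helpers are invented for illustration and are not the BigDL or Analytics Zoo API.

```python
class Pipeline:
    """Hypothetical fluent wrapper that runs named stages in order."""

    def __init__(self):
        self._stages = []

    def stage(self, name, fn):
        self._stages.append((name, fn))
        return self  # returning self allows chaining

    def run(self, data):
        for _, fn in self._stages:
            data = fn(data)
        return data


def select_columns(columns):
    """Stage factory: keep only the named columns of each row."""
    return lambda rows: [{c: row[c] for c in columns} for row in rows]


def index_column(column):
    """Stage factory: replace string ids with dense integer indices for embedding lookups."""
    def apply(rows):
        ids, out = {}, []
        for row in rows:
            idx = ids.setdefault(row[column], len(ids))
            out.append({**row, column: idx})
        return out
    return apply


rows = [
    {"user": "u1", "merchant": "m9", "amount": 12.0},
    {"user": "u2", "merchant": "m3", "amount": 7.5},
    {"user": "u1", "merchant": "m3", "amount": 3.0},
]

prepared = (
    Pipeline()
    .stage("select", select_columns(["user", "merchant"]))
    .stage("index_user", index_column("user"))
    .stage("index_merchant", index_column("merchant"))
    .run(rows)
)
print(prepared)
```

The user only names columns and stages, which matches the earlier point that users should only need to identify column names from the data source.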

27. Gap 4: Integrate with end-to-end data pipelines, fill in the loop
Embed the deep learning process into existing enterprise data pipelines
• Build pre-defined templates and customized processors to bring the deep learning process into the existing enterprise data pipelines, including batch, streaming and real-time

28. Gap 5: AI Pipelines promotion with automated CI/CD deployment
• Design and implement pipelines in a visualized workbench
• Automate deployment with CI/CD pipelines
[Diagram labels: Pipeline Designer; Generate AI Pipelines; AI Pipelines and Flows; Pipelines Promotion; Configuration Management (Tag / Branches); Pipeline Registry; Continuous integration; Deployment sequences (Parameter, template); environments Local Dev, Sandbox, Dev, Stage, Prod(s); business lines Biz. A to Biz. F]
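
The "Deployment sequences (Parameter, template)" label suggests that one pipeline definition is promoted through environments by substituting per-environment parameters into a template. A minimal sketch of that idea follows. The promotion order, parameter names, cluster names and descriptor format are all assumptions made for illustration; the slide does not specify them.

```python
from string import Template

# Assumed promotion order; the slide lists these environments but not a strict sequence.
PROMOTION_SEQUENCE = ["dev", "stage", "prod"]

# Hypothetical per-environment parameters.
PARAMETERS = {
    "dev": {"cluster": "dev-cluster", "batch_size": "1000"},
    "stage": {"cluster": "stage-cluster", "batch_size": "100000"},
    "prod": {"cluster": "prod-cluster", "batch_size": "600000"},
}

# Hypothetical deployment descriptor, not a real CLI or file format.
DEPLOYMENT_TEMPLATE = Template(
    "pipeline=$pipeline env=$env cluster=$cluster batch_size=$batch_size"
)


def next_environment(current):
    """Return the environment a pipeline is promoted to next, or None after the last one."""
    position = PROMOTION_SEQUENCE.index(current)
    if position == len(PROMOTION_SEQUENCE) - 1:
        return None
    return PROMOTION_SEQUENCE[position + 1]


def render_deployment(pipeline, env):
    """Fill the shared template with the parameters of one environment."""
    return DEPLOYMENT_TEMPLATE.substitute(pipeline=pipeline, env=env, **PARAMETERS[env])


env = "dev"
while env is not None:
    print(render_deployment("ncf-recommender", env))
    env = next_environment(env)
```

Keeping the template shared and only the parameters per environment is what lets a CI/CD job promote the same pipeline without manual edits.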

29. Community improvements: Analytics Zoo -> Unified Analytics + AI Platform for Spark and BigDL
Easier to build end-to-end analytics + AI applications
• Reference use cases
  • Anomaly detection, sentiment analysis, fraud detection, chatbot, sequence prediction, etc.
• Predefined models
  • Object detection, image classification, text classification, recommendations, GAN, etc.
• Feature engineering & transformations
  • Image, text, speech, 3D imaging, time-series, etc.
• High level pipeline APIs
  • Dataframes, ML Pipelines, autograd, transfer learning, Keras/Keras2, etc.