A “Real-Time” Architecture for Machine Learning Execution with MLeap
1. A “Real-Time” Architecture for Machine Learning Execution with MLeap
Noah Pritikin, Site Reliability Engineer
Spark+AI Summit 2019 | April 24, 2019
2. Machine Learning Applications
Not “Real-Time”: Agriculture, Automated medical diagnosis, Computer vision, Insurance, Marketing, Sentiment analysis, User behavior analytics, Weather forecasting, …
“Real-Time”: Detecting credit-card fraud, Financial markets, Online advertising, Recommender systems, Robotics, …
I am defining “Real-Time” as <100ms for the context of this presentation.
3. Agenda
What is Kount?
Data Pipeline Context
“Real-Time” Architecture / Model Governance
Statistical Metrics and Monitoring
Q&A
4. What is Kount?
5. Fighting Fraud, Boosting Revenue
Industry-Leading Technology & Experience
• Developing fraud-fighting technology since 1999
• AI/Machine Learning Implemented in 2007
• Dozens of Patented Technologies
• Continuous Innovation
A SaaS-Based, All-in-One Fraud Mitigation Platform
Safeguard Some of the World’s Largest
• Merchants
• Payment Service Providers
• Ecommerce Platforms
$80M Investment from CVC Growth Partners
6. Data Pipeline Context
7. Data Pipeline Context
Diagram labels: Highly-available Client-facing Infrastructure / Services; Machine Learning Execution Platform; MLeap API Servers; Data Science Magical Fairy Dust!; Machine Learning Model (MLeap Pipeline); Kount Data Lake
8. “Real-Time” Architecture / Model Governance
9. We were faced with a technical problem to solve…
Kount Boost Technology™ was released to production in October 2017.
First iteration of the architecture, based on Python3 / Scikit-learn, worked, but…
• Lacked portability
• Challenging to scale into the future
• Lacked multiple model support
• Limited model governance
Built an in-house Apache Spark cluster in January 2018.
• Began iterating on Boost Technology™ model improvements (e.g. feature engineering, tuning model hyperparameters, etc.).
Spark ML-generated models depend on a SparkContext, but “real-time” predictions were required!
First iteration was our baseline for improvement.
10. “Real-Time” Architecture Overview
Feature Extraction separated from Transaction Prediction
Hosting multiple models allows for blue-green deployments
Centralized model governance
Load balancer deployed in a “sidecar proxy” implementation, allowing for a simpler Feature Extraction instance design
• Backend health checks make a prediction on a test transaction
MLeap API instances run a GC-optimized Java 8 configuration
JVM metrics (e.g. Jolokia)
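The backend health check on this slide can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the talk: `predict`, `TEST_TRANSACTION`, its field names, and the 100 ms budget applied to the check are all hypothetical, and a real check would call an MLeap API instance over the network rather than a local function.

```python
import time
from typing import Callable, Mapping

# Hypothetical test transaction; the field names are illustrative only.
TEST_TRANSACTION = {"amount": 42.0, "country": "US"}


def health_check(predict: Callable[[Mapping], float],
                 budget_ms: float = 100.0) -> bool:
    """Return True if the backend scores the test transaction within budget."""
    start = time.perf_counter()
    try:
        score = predict(TEST_TRANSACTION)
    except Exception:
        # Any failure while predicting marks the backend as unhealthy.
        return False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Healthy only if a numeric score came back within the latency budget.
    return isinstance(score, (int, float)) and elapsed_ms <= budget_ms
```

A check of this kind exercises the whole prediction path, so a backend that accepts connections but cannot score a transaction would still be taken out of rotation.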
11. Dark Production Infrastructure
12. Dark Production Infrastructure
An entirely separate, parallel infrastructure in production
NO customer impact
NO “real-time” requirements
Parallelization is implemented via a message bus (e.g. Kafka, Kinesis, ZeroMQ, etc.)
Optimize cost by processing only a fraction of production traffic (e.g. 1/3)
Only raw predictions returned from MLeap are logged, for later analysis
Dark production infrastructure enables model governance / validation.
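One way to process only a fraction of production traffic is to sample deterministically on a transaction identifier before publishing to the message bus. The sketch below is an assumption about how such sampling could work, not a description of Kount's system; `mirror_to_dark` and the `transaction_id` key are hypothetical names.

```python
import hashlib


def mirror_to_dark(transaction_id: str,
                   numerator: int = 1,
                   denominator: int = 3) -> bool:
    """Decide deterministically whether to copy a transaction to dark production.

    Hashing the identifier spreads transactions evenly over `denominator`
    buckets; the first `numerator` buckets are mirrored (1/3 by default).
    """
    digest = hashlib.sha256(transaction_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % denominator
    return bucket < numerator
```

A producer would publish a transaction to the dark production topic only when `mirror_to_dark(transaction_id)` returns `True`. Because the decision depends only on the identifier, the same transaction is always either mirrored or skipped, which keeps later analysis reproducible.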
13. Tools Enabling Model Governance
Flowchart:
• Train model & verify quality
• Add model to governance data store
• Deploy model to dark production infrastructure MLeap API instances
• Dark production infrastructure test? Bad: End. Good: continue.
• Deploy to available production MLeap API instances
• Migrate production traffic to MLeap API instances hosting new model
• Replaced model? Yes: Unload retired model from MLeap API instances. No: End.
Centrally track state of machine learning models – end-to-end!
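Tracking model state centrally can be pictured as a small state machine over the governance workflow on this slide. The sketch below is illustrative only: the state names, the `ALLOWED` transition table, and the `ModelRecord` class are hypothetical and are not taken from the talk or from MLeap.

```python
from enum import Enum


class State(Enum):
    TRAINED = "trained"              # trained and quality verified
    REGISTERED = "registered"        # added to the governance data store
    DARK = "dark_production"         # deployed to dark production
    PRODUCTION = "production"        # serving production traffic
    RETIRED = "retired"              # unloaded from MLeap API instances
    REJECTED = "rejected"            # failed the dark production test


# Allowed transitions, following the workflow on the slide.
ALLOWED = {
    State.TRAINED: {State.REGISTERED},
    State.REGISTERED: {State.DARK},
    State.DARK: {State.PRODUCTION, State.REJECTED},
    State.PRODUCTION: {State.RETIRED},
    State.RETIRED: set(),
    State.REJECTED: set(),
}


class ModelRecord:
    """Tracks one model's state and its full transition history."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.TRAINED
        self.history = [State.TRAINED]

    def advance(self, new_state: State) -> None:
        """Move to new_state, refusing transitions the workflow does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.value} -> {new_state.value} not allowed"
            )
        self.state = new_state
        self.history.append(new_state)
```

Rejecting invalid transitions in one place means a model cannot reach production traffic without first passing through dark production.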
14. Statistical Metrics and Monitoring
15. “Real-Time” Architecture Performance – Transforming LEAP frames
This is NOT machine learning model performance (e.g. TOC curve, ROC curve, PR curve, etc.).
A “real-time” system requires metrics that measure systemic performance.
16. Averages + Distributions!
Due to “real-time” requirements, averages don’t cut it (by themselves…)
Distributions provide critical visibility in monitoring low latency systems.
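To illustrate why averages alone are not enough, the sketch below summarizes a set of latency samples with an average, tail percentiles, and a standard deviation. It assumes the nearest-rank percentile definition and a population standard deviation; the talk does not say which definitions its monitoring used, and the function names and sample values are hypothetical.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the average plus distribution metrics for latency samples (ms)."""
    return {
        "average": statistics.fmean(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "stdev": statistics.pstdev(latencies_ms),
    }


# Made-up samples: 98 fast requests and two slow outliers.
latencies = [7.0] * 98 + [40.0, 90.0]
summary = summarize(latencies)
```

With these made-up samples the average stays close to the typical request, while the 99th percentile exposes the slow tail that a “real-time” budget has to account for.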
17. Applied Statistics – Improvement with MLeap!
Boost without MLeap (previous): Average 19.27ms, 95th Percentile 24ms, 99th Percentile 37ms, Standard Deviation 5.31ms
Boost with MLeap (current): Average 7.00ms, 95th Percentile 9ms, 99th Percentile 16ms, Standard Deviation 2.41ms
99th percentile saw a ~56% improvement!
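The improvement figure can be checked directly from the numbers on this slide, taking improvement as the relative reduction (before - after) / before. The helper name `improvement_pct` is hypothetical; the latency values are the ones reported above.

```python
# Latency figures from the slide, in milliseconds.
previous = {"average": 19.27, "p95": 24.0, "p99": 37.0, "stdev": 5.31}
current = {"average": 7.00, "p95": 9.0, "p99": 16.0, "stdev": 2.41}


def improvement_pct(before: float, after: float) -> float:
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100.0


improvements = {
    metric: improvement_pct(previous[metric], current[metric])
    for metric in previous
}
```

For the 99th percentile this gives (37 - 16) / 37, or about 56.8%, in line with the slide's figure; the same formula gives 62.5% for the 95th percentile.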
18. Consider Improvements to Your “Real-Time” Architecture!
MLeap…
Model governance…
Dark Production Infrastructure (assisting with model testing)…
Latency Metrics (emphasize the use of distributions)…
Further reading…
• “Deploying Apache Spark Supervised Machine Learning Models to Production with MLeap” - https://medium.com/@combust/9e0fb57f79db
• MLeap GitHub repo - https://github.com/combust/mleap
• MLeap documentation - http://mleap-docs.combust.ml/
19. Thank you! … and Q&A?