A “Real-Time” Architecture for Machine Learning Execution with MLeap

This talk describes a production environment that hosts a large random forest model on a cluster of MLeap runtimes. A microservice architecture with a Postgres database backend manages configuration. The architecture provides full traceability and model governance through the entire lifecycle while cutting execution time by nearly two-thirds.

Kount provides certainty in digital interactions like online credit card transactions. Our production environment has extreme availability requirements: we process hundreds of transactions per second, have no scheduled downtime, and achieve 99.99% annual uptime. One of our scores uses a random forest classifier with 250 trees and 100,000 nodes per tree. Our original implementation serialized a scikit-learn model, which by itself takes 1 GB in memory. It required exactly identical environments in training, where the model was serialized, and in production, where it was deserialized and evaluated. This is risky when maintaining high uptime with no planned downtime. The improved solution load-balances across a cluster of API servers hosting MLeap runtimes. These model execution runtimes scale separately from the data pre-processing pipeline, which is the more expensive step in our application. Each pre-processing application is connected to multiple MLeap runtimes to provide complete redundancy and independent scaling.

We extend model governance into the production environment using a set of services wrapped around a Postgres backend. These services manage each model's promotion and role across several production, QA, and integration environments. Finally, we describe a “shadow” pipeline in production that can replace any or all portions of transaction evaluation with alternative models and software. A Kafka message bus provides copies of live production transactions to the shadow servers, where results are logged for analysis. Because this shadow environment is managed through the same services, code and models can be promoted or retired directly after being test-run on live data streams.

1. A “Real-Time” Architecture for Machine Learning Execution with MLeap Noah Pritikin, Site Reliability Engineer Spark+AI Summit 2019 | April 24, 2019

2. Machine Learning Applications
   Not “Real-Time”: Agriculture, Automated medical diagnosis, Computer vision, Insurance, Marketing, Sentiment analysis, User behavior analytics, Weather forecasting, …
   “Real-Time”: Detecting credit-card fraud, Financial markets, Online advertising, Recommender systems, Robotics, …
   I am defining “Real-Time” as <100ms for the context of this presentation.

3. Agenda
   • What is Kount?
   • Data Pipeline Context
   • “Real-Time” Architecture / Model Governance
   • Statistical Metrics and Monitoring
   • Q&A

4. What is Kount?

5. Fighting Fraud, Boosting Revenue
   Industry-Leading Technology & Experience:
   • Developing fraud-fighting technology since 1999
   • AI/machine learning implemented in 2007
   • Dozens of patented technologies
   • Continuous innovation
   A SaaS-based, all-in-one fraud mitigation platform safeguarding some of the world’s largest merchants, payment service providers, and ecommerce platforms
   $80M investment from CVC Growth Partners

6. Data Pipeline Context

7. Data Pipeline Context
   [Architecture diagram: highly-available, client-facing infrastructure and services call the machine learning execution platform (MLeap API servers); the data science team (“magical fairy dust!”) produces the machine learning model (an MLeap pipeline) from the Kount Data Lake.]

8. “Real-Time” Architecture / Model Governance

9. We were faced with a technical problem to solve…
   Kount Boost Technology™ was released to production in October 2017. The first iteration of the architecture, based on Python 3 / scikit-learn, worked, but it:
   • Lacked portability
   • Was challenging to scale into the future
   • Lacked multiple-model support
   • Provided limited model governance
   We built an in-house Apache Spark cluster in January 2018 and began iterating on Boost Technology™ model improvements (e.g. feature engineering, tuning model hyperparameters, etc.).
   Spark ML-generated models depend on a SparkContext, but “real-time” predictions are required! The first iteration was our baseline for improvement.
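The slide does not show code, but the step that removes the SparkContext dependency is exporting the fitted Spark ML pipeline to an MLeap bundle. A minimal sketch following the mleap-spark documentation pattern; the model, DataFrame, and output path names are hypothetical:

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.bundle.SparkBundleContext
import org.apache.spark.sql.DataFrame
import resource._

/** Serialize a fitted Spark ML pipeline (e.g. the random forest pipeline) to an
  * MLeap bundle so it can later be scored without any SparkContext.
  * `scoredDf` is a DataFrame the pipeline has already transformed; MLeap uses it
  * to capture the input/output schema of each stage. */
def exportBundle(model: PipelineModel, scoredDf: DataFrame, path: String): Unit = {
  val sbc = SparkBundleContext().withDataset(scoredDf)
  for (bundle <- managed(BundleFile(s"jar:file:$path"))) {
    model.writeBundle.save(bundle)(sbc).get // fails fast if a stage is not serializable
  }
}

// Hypothetical usage after training:
// exportBundle(randomForestPipelineModel, transformedDf, "/tmp/boost-rf.zip")
```

The resulting .zip bundle is what the MLeap API instances described on the next slide load at startup.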

10. “Real-Time” Architecture Overview
   • Feature extraction is separated from transaction prediction
   • Hosting multiple models allows for blue-green deployments
   • Centralized model governance
   • The load balancer is deployed as a “sidecar proxy”, allowing for a simpler feature-extraction instance design
     • Backend health checks make a prediction on a test transaction
   • MLeap API instances run a GC-optimized Java 8 configuration
   • JVM metrics are collected (e.g. via Jolokia, etc.)
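For illustration, a minimal sketch of what an MLeap API instance (and its health check) does: load the serialized bundle into the lightweight MLeap runtime, with no SparkContext, and score a single test LeapFrame. The bundle path and the two-field schema are hypothetical stand-ins for the real transaction features:

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.core.types._
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}
import resource._

// Load the serialized pipeline into the MLeap runtime (no Spark required).
val pipeline = (for (bf <- managed(BundleFile("jar:file:/tmp/boost-rf.zip")))
  yield bf.loadMleapBundle().get.root).opt.get

// Hypothetical two-feature schema; the real pipeline expects the full feature set.
val schema = StructType(
  StructField("amount", ScalarType.Double),
  StructField("merchant_id", ScalarType.String)
).get

// Score one synthetic "health check" transaction and read back the prediction column.
val frame = DefaultLeapFrame(schema, Seq(Row(42.50, "test-merchant")))
val prediction = pipeline.transform(frame).get
  .select("prediction").get
  .dataset.head.getDouble(0)
println(s"health-check prediction: $prediction")
```

Because the runtime is just a JVM library, these instances can be scaled independently of the feature-extraction services, as the slide describes.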

11. Dark Production Infrastructure

12. Dark Production Infrastructure
   • An entirely separate, parallel infrastructure in production
   • NO customer impact
   • NO “real-time” requirements
   • Parallelization is implemented via a message bus (e.g. Kafka, Kinesis, ZeroMQ, etc.)
   • Cost is optimized by processing only a fraction of production traffic (e.g. 1/3)
   • Only the raw predictions returned from MLeap are logged, for later analysis
   Dark production infrastructure enables model governance and validation.
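A minimal sketch of the dark-production consumer side, assuming Kafka as the message bus; the topic name, group id, and the shadowScore helper are hypothetical, and a real service would build a LeapFrame and call an MLeap API instance instead:

```scala
import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._
import scala.util.Random
import org.apache.kafka.clients.consumer.KafkaConsumer

// Hypothetical stand-in for "build a LeapFrame from the raw transaction and score it".
def shadowScore(rawTransaction: String): Double = 0.0

val props = new Properties()
props.put("bootstrap.servers", "kafka:9092")
props.put("group.id", "dark-production-shadow")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(java.util.Collections.singletonList("transaction-copies"))

while (true) {
  // Copies of live production transactions arrive here; customers are never impacted.
  for (record <- consumer.poll(Duration.ofMillis(500)).asScala) {
    if (Random.nextDouble() < 1.0 / 3.0) { // process only ~1/3 of traffic to control cost
      val prediction = shadowScore(record.value())
      println(s"txn=${record.key()} shadow_prediction=$prediction") // log raw prediction only
    }
  }
}
```

The 1/3 sampling is shown on the consumer for simplicity; it could just as easily happen on the producer side so that only a fraction of the copies are ever published. The slide does not specify which.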

13. Tools Enabling Model Governance
   [Model promotion flowchart, approximately: Train model & verify quality → Add model to governance data store → Deploy model to dark production infrastructure → Dark production test? (Good/Bad; only a Good result proceeds) → Deploy to available production MLeap API instances → Migrate production traffic to the MLeap API instances hosting the new model → Replaced a model? (Yes/No) → if Yes, unload the retired model from the MLeap API instances → End.]
   Centrally track the state of machine learning models – end-to-end!
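Purely as an illustration of “centrally tracking state end-to-end”, here is a hypothetical sketch of the lifecycle states such a governance store might record per model and the transitions the flow above implies; the actual services and Postgres schema are not described in the talk:

```scala
// Hypothetical model-lifecycle states mirroring the promotion flow above.
sealed trait ModelState
case object Trained        extends ModelState // trained and quality-verified
case object Registered     extends ModelState // added to the governance data store
case object DarkProduction extends ModelState // deployed to dark production for live-data testing
case object Live           extends ModelState // serving production traffic on MLeap API instances
case object Retired        extends ModelState // unloaded from the MLeap API instances

// Allowed transitions. Sending a model with a bad dark-production test straight to
// Retired is an assumption; the slide only shows the "good" path explicitly.
val transitions: Map[ModelState, Set[ModelState]] = Map(
  Trained        -> Set(Registered),
  Registered     -> Set(DarkProduction),
  DarkProduction -> Set(Live, Retired),
  Live           -> Set(Retired)
)

def canTransition(from: ModelState, to: ModelState): Boolean =
  transitions.getOrElse(from, Set.empty).contains(to)

// e.g. canTransition(DarkProduction, Live) == true; canTransition(Trained, Live) == false
```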

14. Statistical Metrics and Monitoring

15. “Real-Time” Architecture Performance – Transforming LeapFrames
   This is NOT machine learning model performance (e.g. TOC curve, ROC curve, PR curve, etc.).
   A “real-time” system requires metrics that measure the performance of the system itself.
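A sketch of the kind of systemic metric meant here: timing the LeapFrame transform itself, independent of model quality. The histogram/export side is omitted; only the measurement is shown, and the function name is hypothetical:

```scala
import scala.util.Try
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Transformer}

// Wrap the transform with a wall-clock timer; the elapsed milliseconds are what
// would be fed into a latency histogram (see the percentile note on the next slide).
def timedTransform(pipeline: Transformer,
                   frame: DefaultLeapFrame): (Try[DefaultLeapFrame], Double) = {
  val start = System.nanoTime()
  val result = pipeline.transform(frame)
  val elapsedMs = (System.nanoTime() - start) / 1e6
  (result, elapsedMs)
}
```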

16. Averages + Distributions!
   Due to “real-time” requirements, averages don’t cut it (by themselves…).
   Distributions provide critical visibility when monitoring low-latency systems.
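To make the point concrete, a toy sketch with made-up latencies: a couple of slow outliers among 100 requests barely move the mean but dominate the 99th percentile. A production system would use a metrics-library histogram rather than this nearest-rank helper:

```scala
// Nearest-rank percentile over a sorted sample (fine for a sketch).
def percentile(sortedMs: Vector[Double], p: Double): Double = {
  val idx = math.ceil(p / 100.0 * sortedMs.size).toInt - 1
  sortedMs(idx.max(0).min(sortedMs.size - 1))
}

// 98 fast requests plus two slow outliers (made-up numbers).
val latenciesMs = (Vector.fill(98)(7.0) ++ Vector(60.0, 95.0)).sorted
val mean = latenciesMs.sum / latenciesMs.size
println(f"mean=$mean%.2f ms  p95=${percentile(latenciesMs, 95)}%.1f ms  p99=${percentile(latenciesMs, 99)}%.1f ms")
// mean stays near 8 ms while p99 jumps to 60 ms -- the average alone hides the tail.
```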

17. Applied Statistics – Improvement with MLeap!

                                      Average    95th Percentile   99th Percentile   Standard Deviation
   Boost without MLeap (previous)     19.27 ms   24 ms             37 ms             5.31 ms
   Boost with MLeap (current)         7.00 ms    9 ms              16 ms             2.41 ms

   The 99th percentile saw a ~56% improvement!
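As a quick check on those figures: the 99th percentile dropped from 37 ms to 16 ms, i.e. (37 - 16) / 37 ≈ 0.57, in line with the quoted ~56% improvement, and the average dropped from 19.27 ms to 7.00 ms, a reduction of roughly 64%, which is the “nearly two-thirds” cut in execution time mentioned in the abstract.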

18. Consider Improvements to Your “Real-Time” Architecture!
   • MLeap
   • Model governance
   • Dark production infrastructure (assisting with model testing)
   • Latency metrics (emphasizing the use of distributions)
   Further reading:
   • “Deploying Apache Spark Supervised Machine Learning Models to Production with MLeap” - https://medium.com/@combust/9e0fb57f79db
   • MLeap GitHub repo - https://github.com/combust/mleap
   • MLeap documentation - http://mleap-docs.combust.ml/

19. Thank you! … and Q&A?