- 微博 QQ QQ空间 贴吧
A “Real-Time” Architecture for Machine Learning Execution with MLeap
收藏 1下载 1
This talk describes a production environment that hosts a large random forest model on a cluster of MLeap runtimes. A microservice architecture with a Postgres database backend manages configuration. The architecture provides full traceability and model governance through the entire lifecycle while cutting execution time by nearly 2/3rds. Kount provides certainty in digital interactions like online credit card transactions. Our production environment has extreme requirements for availability: we can process hundreds of transactions per second, have no scheduled downtime, and achieve 99.99% annual uptime. One of our scores uses a random forest classifier with 250 trees and 100,000 nodes per tree. Our original implementation serialized a scikit-learn model, which itself takes 1 GB in memory. It required exactly identical environments in training, where the model was serialized, and production, where it was deserialized and evaluated. This is risky when maintaining high uptime and no planned downtime. The improved solution load balances across a cluster of API servers hosting MLeap runtimes. These model execution runtimes scale separately from the data pre-processing pipeline, which is the more expensive step in our application. Each pre-processing application is connected to multiple MLeap runtimes to provide complete redundancy and independent scaling. We extend model governance into the production environment using a set of services wrapped around a Postgres backend. These services manage model promotion and role across several production, QA, and integration environments. Finally, we describe a “shadow” pipeline in production that can replace any or all portions of transaction evaluation with alternative models and software. A Kafka message bus provides copies of live production transactions to the shadow servers where results are logged for analysis. Since this shadow environment is managed through the same services, code and models can be directly promoted or retired after being test run on live data streams.