Accelerating the Machine Learning Lifecycle with MLflow



1. Accelerating Machine Learning Development. Matei Zaharia (@matei_zaharia)

2. ML development is harder than traditional software development

3. Traditional Software vs. Machine Learning

Traditional Software:
• Goal: meet a functional specification
• Quality depends only on code
• Typically pick one software stack

Machine Learning:
• Goal: optimize a metric (e.g., CTR); constantly experiment to improve it
• Quality depends on input data, training method, tuning params
• Compare many libraries, models & algorithms for the same task

4. Production ML is Even Harder

Pipeline: Raw Data → Data Prep → Training → Deployment
• ML apps must be fed new data to keep working (DATA ENGINEER)
• Design, retraining & inference done by different people (ML ENGINEER)
• Software must work across many environments (MOBILE DEVELOPER, WEB DEVELOPER)

5.Example “I build 100s of models/day to lift revenue, using any library: MLlib, PyTorch, R, etc. There’s no easy way to see what data went in a model from a week ago and rebuild it.” -- Chief scientist at ad tech firm

6.Example “Our company has 100 teams using ML worldwide. We can’t share work across them: when a new team tries to run some code, it doesn’t even give the same result.” -- Large consumer electronics firm

7. Custom ML Platforms

Facebook FBLearner, Uber Michelangelo, Google TFX
• Standardize the data prep / training / deploy cycle: if you work within the platform, you get these benefits!
• But limited to a few algorithms or frameworks
• And tied to one company's infrastructure

Can we provide similar benefits in an open manner?

8. Open Source Machine Learning Platform

• Works with any ML library, algorithm, language, etc.
• Key idea: open interface design (use with any code you already have)
• Growing community with >80 contributors!

Tackles three key problems:
• Experiment tracking: MLflow Tracking
• Reusable workflows: MLflow Projects
• Model packaging: MLflow Models

9. Experiment Tracking without MLflow

data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)
print("For n=%d, lr=%f: accuracy=%f" % (n, lr, score))
pickle.dump(model, open("model.pkl", "wb"))

What version of my code was this result from? What if I tune this other parameter? What if I upgrade my ML library?

10. Experiment Tracking with MLflow

data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)

mlflow.log_param("data_file", file)
mlflow.log_param("n", n)
mlflow.log_param("learning_rate", lr)
mlflow.log_metric("score", score)
mlflow.sklearn.log_model(model)

Track parameters, metrics, output files & code version; browse them with $ mlflow ui
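The calls above follow a simple pattern: each run is a durable record of its parameters, metrics, and output files, so runs can be compared and reproduced later. A minimal pure-Python sketch of that record (a toy stand-in for illustration, not the MLflow API; `ToyRun` is a hypothetical name):

```python
import json
import os
import tempfile

class ToyRun:
    """Toy stand-in (not the MLflow API) for what one tracked run records."""

    def __init__(self):
        self.params = {}   # inputs chosen before training (n, learning_rate, ...)
        self.metrics = {}  # results measured after training (score, ...)

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics[key] = value

    def save(self, path):
        # Persist the run so it can be inspected and compared later.
        with open(path, "w") as f:
            json.dump({"params": self.params, "metrics": self.metrics}, f)

run = ToyRun()
run.log_param("n", 2)
run.log_param("learning_rate", 0.1)
run.log_metric("score", 0.93)

path = os.path.join(tempfile.mkdtemp(), "run.json")
run.save(path)
with open(path) as f:
    print(json.load(f)["metrics"]["score"])  # 0.93
```

MLflow Tracking adds code version and artifact storage on top of this idea, and the UI renders the saved records side by side.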

11.MLflow UI: Inspecting Runs

12.MLflow UI: Comparing Runs

13. MLflow Tracking: Extensibility

• Using a notebook? Log its final state as HTML
• Using TensorBoard? Record the logs for each run
• Etc.

14. MLflow Projects: Reusable Workflows

“How can I split my workflow into modular steps?”
“How do I run this workflow that someone else wrote?”

15. MLflow Projects

Simple packaging format for code + dependencies:

my_project/
├── MLproject
├── conda.yaml
└── ...

MLproject file:
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_data: path
      lr: {type: float, default: 0.1}
    command: python {training_data} {lr}

Run from the CLI or the Python API:
$ mlflow run git://<my_project>
mlflow.run("git://<my_project>", ...)
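An entry point's command is a template: declared defaults are applied, user-supplied parameters overlaid, and the result substituted into the command string before launch. A sketch of that substitution using Python's `str.format` (an illustration of the mechanism, with a hypothetical `build_command` helper):

```python
# Entry point as declared in the MLproject file above.
entry_point = {
    "parameters": {
        "training_data": {"type": "path"},
        "lr": {"type": "float", "default": 0.1},
    },
    "command": "python {training_data} {lr}",
}

def build_command(entry_point, user_params):
    """Hypothetical helper: fill a command template with parameter values."""
    # Start from declared defaults...
    values = {
        name: spec["default"]
        for name, spec in entry_point["parameters"].items()
        if isinstance(spec, dict) and "default" in spec
    }
    # ...then overlay whatever the caller passed.
    values.update(user_params)
    return entry_point["command"].format(**values)

print(build_command(entry_point, {"training_data": "data.txt"}))
# python data.txt 0.1
```

Because `lr` has a default, the caller only needs to supply `training_data`; passing `lr` explicitly would override the default.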

16. Composing Projects

r1 = mlflow.run("ProjectA", params)
if r1 > 0:
    r2 = mlflow.run("ProjectB", ...)
else:
    r2 = mlflow.run("ProjectC", ...)
r3 = mlflow.run("ProjectD", r2)
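The point of the pseudocode above is that a driver program can branch on one project's result to decide what runs next. A runnable sketch with stub "projects" standing in for the `mlflow.run` calls (the project names are the slide's placeholders; `run_project` is a hypothetical local function):

```python
def run_project(name, value):
    """Hypothetical stand-in for mlflow.run(name, ...): each 'project'
    is a function whose return value the driver can branch on."""
    steps = {
        "ProjectA": lambda x: x - 1,    # e.g., produces a score
        "ProjectB": lambda x: x * 2,    # branch taken when r1 > 0
        "ProjectC": lambda x: -x,       # fallback branch
        "ProjectD": lambda x: x + 100,  # final step consumes r2
    }
    return steps[name](value)

r1 = run_project("ProjectA", 3)
r2 = run_project("ProjectB" if r1 > 0 else "ProjectC", r1)
r3 = run_project("ProjectD", r2)
print(r3)  # 104
```

With real projects the same control flow applies, except each step is a packaged, reproducible run rather than a local function.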

17.MLflow Models: Packaging Models “How can I reliably pass my model to production apps?”

18. MLflow Models: Packaging Models

Model Format = packaging format + model logic; packages arbitrary code (not just model weights)

A saved model can carry several “flavors” (Python Flavor, ONNX Flavor, ...), so one model works with many downstream tools:
• REST serving
• Batch & stream scoring
• Evaluation & debug tools (LIME, TCAV)

19. Example MLflow Model

my_model/
├── MLmodel
└── estimator/
    ├── saved_model.pb
    └── variables/

MLmodel file:
run_id: 769915006efd4c4bbd662461
time_created: 2018-06-28T12:34
flavors:
  tensorflow:              # usable by tools that understand TensorFlow's saved model format
    saved_model_dir: estimator
    signature_def_key: predict
  python_function:         # usable by any tool that can run Python (Docker, Spark, etc!)
    loader_module: mlflow.tensorflow

$ mlflow pyfunc serve -r <run_id> ...
spark_udf = pyfunc.spark_udf(<run_id>)
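The flavors section is what lets very different tools consume the same saved model: each tool reads the MLmodel metadata and picks the first flavor it knows how to load. A toy sketch of that lookup (not MLflow's implementation; `pick_flavor` is a hypothetical helper):

```python
# MLmodel metadata from the example above, as a Python dict.
mlmodel = {
    "run_id": "769915006efd4c4bbd662461",
    "flavors": {
        "tensorflow": {
            "saved_model_dir": "estimator",
            "signature_def_key": "predict",
        },
        "python_function": {"loader_module": "mlflow.tensorflow"},
    },
}

def pick_flavor(mlmodel, supported):
    """Hypothetical: return the first flavor (in the tool's preference
    order) that this model provides."""
    for name in supported:
        if name in mlmodel["flavors"]:
            return name, mlmodel["flavors"][name]
    raise ValueError("no supported flavor found")

# A TensorFlow-aware tool prefers the native flavor; a generic Python
# tool (Docker image, Spark UDF, ...) falls back to python_function.
name, conf = pick_flavor(mlmodel, ["python_function"])
print(name, conf["loader_module"])  # python_function mlflow.tensorflow
```

This is why adding a generic flavor like python_function alongside a framework-specific one makes a model deployable by tools that know nothing about the training library.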

20. Model Deployment without MLflow

DATA SCIENTIST hands code & models to PRODUCTION ENGINEER: “Please deploy this TensorFlow model!” “This Spark model!” “This SciKit model!” “This R model!” “This ArXiv paper!”

21. Model Deployment with MLflow

DATA SCIENTIST: “Please deploy this MLflow Model!” “Please run this MLflow Project nightly for updates!”
PRODUCTION ENGINEER: “OK, it’s up in our REST server & Spark!” “Don’t even tell me what ArXiv paper that’s from...”

22. Combining These APIs

Diagram: a driver program (e.g., a hyperparameter tuner) launches parallel runs via project calls, logs tracking info to the tracking server, gathers the results, and hands the resulting models to downstream consumers.

23. MLflow Community

81 contributors from >40 companies since June 2018

Major external contributions:
• Database storage
• Docker projects
• R API
• Integrations with PyTorch, H2O, HDFS, GCS, ...
• Plugin system

24. Example Use Cases

• Energy company: build and track 100s of models for power plants, energy consumers, etc.
• Online marketplace: package and deploy DL pipelines with Keras + PyTorch in the cloud
• Online retailer: package business logic and models for rapid experimentation & deployment

25.Upcoming Talks at Spark+AI Summit

26.Meetup Group

27. Conclusion

ML development cycle tools can simplify development for both model designers and production engineers. “Open interface” design enables broad collaboration.

Learn more about MLflow, or try it with pip install mlflow