Intro to MLflow (2018-11-13)

MLflow introduction

1. MLflow: Platform for Complete Machine Learning Lifecycle
Tomas Nykodym, Nov 13, 2018

2. Outline
• Overview of ML development challenges
• MLflow components
• Demo
• How to get started

3. Machine Learning Development is Complex

4. ML Lifecycle (diagram). Stages: Raw Data, Data Prep, Training, Deploy. Labels on the diagram: Delta, Tuning (parameters μ, λ, θ), Scale, Governance.

5. Example
“I build 100s of models/day to lift revenue, using any library: MLlib, PyTorch, R, etc. There’s no easy way to see what data went in a model from a week ago, tune it and rebuild it.”
-- Chief scientist at ad tech firm

6. Example
“Our company has 100 teams using ML worldwide. We can’t share work across them: when a new team tries to run some code, it often doesn’t even give the same result.”
-- Large consumer electronics firm

7. Introducing MLflow
Open machine learning platform
• Works with any ML library & language
• Runs the same way anywhere (e.g. any cloud)
• Designed to be useful for 1 or 1000+ person orgs

8. MLflow Design Philosophy
1. “API-first”, open platform
• Allow submitting runs, models, etc. from any library & language
• Example: a “model” can just be a lambda function that MLflow can then deploy in many places (Docker, Azure ML, Spark UDF, …)
Key enabler: built around REST APIs and CLI
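Since the platform is built around REST APIs, any language that can send HTTP can log to a tracking server. The sketch below only builds (and does not send) a request that logs one metric. The server address and the metric value are placeholders, and the endpoint path and field names follow recent MLflow REST API documentation as an assumption; older releases may use a different path or field names.

```python
import json
import time
import urllib.request

# Hypothetical tracking server address; replace with a real one before sending.
TRACKING_URI = "http://localhost:5000"


def build_log_metric_request(run_id, key, value, base_url=TRACKING_URI):
    """Build (but do not send) a POST request that logs one metric."""
    payload = {
        "run_id": run_id,
        "key": key,
        "value": float(value),
        "timestamp": int(time.time() * 1000),  # milliseconds since epoch
    }
    return urllib.request.Request(
        url=base_url + "/api/2.0/mlflow/runs/log-metric",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder run id and metric value, used only to show the request shape.
request = build_log_metric_request("769915006efd4c4bbd662461", "mse", 0.42)
body = json.loads(request.data)
```

Passing `request` to `urllib.request.urlopen` would send it to a running server; the same payload could equally be posted from R, Java, or a shell script.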

9. MLflow Design Philosophy
2. Modular design
• Let people use different components individually (e.g., use MLflow’s project format but not its deployment tools)
• Easy to integrate into existing ML platforms & workflows
Key enabler: distinct components (Tracking/Projects/Models)

10. MLflow Components
• Tracking: record and query experiments (code, configs, results, etc.)
• Projects: packaging format for reproducible runs on any platform
• Models: general model format that supports diverse deployment tools

11. MLflow Tracking (diagram). Notebooks, Local Apps, and Cloud Jobs connect to the Tracking Server through the Python or REST API; the Tracking Server is reached through a UI and an API.

12. MLflow Tracking Example

    import mlflow

    with mlflow.start_run():
        mlflow.log_param("layers", layers)
        mlflow.log_param("alpha", alpha)
        # train model
        mlflow.log_metric("mse", model.mse())
        mlflow.log_artifact("plot", model.plot(test_df))
        mlflow.tensorflow.log_model(model)

13. MLflow Projects (diagram). A Project Spec covers Code, Config, and Data, and can be run through Local Execution or Remote Execution.

14. Example MLflow Project

    my_project/
    ├── MLproject
    │     conda_env: conda.yaml
    │     entry_points:
    │       main:
    │         parameters:
    │           training_data: path
    │           lambda: {type: float, default: 0.1}
    │         command: python main.py {training_data} {lambda}
    ├── conda.yaml
    ├── main.py
    └── model.py

    $ mlflow run git://<my_project> ...

    mlflow.run("git://<my_project>", ...)
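To make the role of `parameters` and `command` in the MLproject file concrete, here is a toy sketch of how declared parameters could be resolved and substituted into the command template. It is an illustration only, not MLflow's implementation: `render_command` and the `data/train.csv` path are hypothetical, and real execution also handles path resolution and shell quoting.

```python
# Toy illustration of how an entry point's parameters could fill its command.
# This is NOT MLflow's implementation; it only mirrors the MLproject on the slide.

ENTRY_POINT = {
    "parameters": {
        "training_data": {"type": "path"},
        "lambda": {"type": "float", "default": 0.1},
    },
    "command": "python main.py {training_data} {lambda}",
}


def render_command(entry_point, user_params):
    """Apply defaults, check required parameters, and fill the command template."""
    resolved = {}
    for name, spec in entry_point["parameters"].items():
        if name in user_params:
            resolved[name] = user_params[name]
        elif "default" in spec:
            resolved[name] = spec["default"]
        else:
            raise ValueError(f"missing required parameter: {name}")
        if spec["type"] == "float":
            resolved[name] = float(resolved[name])  # coerce "0.5" -> 0.5
    return entry_point["command"].format_map(resolved)


# Hypothetical data path; `lambda` falls back to its declared default.
command = render_command(ENTRY_POINT, {"training_data": "data/train.csv"})
print(command)
```

Omitting `training_data` raises an error because it has no default, which mirrors why a project spec lists its parameters explicitly.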

15. MLflow Models (diagram). Run Sources and Inference Code feed a Model Format that holds several flavors (Flavor 1, Flavor 2). Simple model flavors are usable by many tools, such as Batch & Stream Scoring and Cloud Serving Tools.

16. Example MLflow Model

    my_model/
    ├── MLmodel
    │     run_id: 769915006efd4c4bbd662461
    │     time_created: 2018-06-28T12:34
    │     flavors:
    │       tensorflow:          # usable by tools that understand TensorFlow model format
    │         saved_model_dir: estimator
    │         signature_def_key: predict
    │       python_function:     # usable by any tool that can run Python (Docker, Spark, etc!)
    │         loader_module: mlflow.tensorflow
    └── estimator/
        ├── saved_model.pb
        └── variables/
        ...

    >>> mlflow.tensorflow.log_model(...)
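The point of flavors is that each tool picks the most specific format it understands and falls back to a generic one otherwise. A minimal sketch of that selection, with the slide's MLmodel metadata written as a plain dict; `pick_flavor` is a hypothetical helper, not an MLflow function.

```python
# The MLmodel metadata from the slide, written as a plain dict for illustration.
MLMODEL = {
    "run_id": "769915006efd4c4bbd662461",
    "time_created": "2018-06-28T12:34",
    "flavors": {
        "tensorflow": {
            "saved_model_dir": "estimator",
            "signature_def_key": "predict",
        },
        "python_function": {"loader_module": "mlflow.tensorflow"},
    },
}


def pick_flavor(mlmodel, supported):
    """Return the first flavor in `supported` (preference order) that the model provides."""
    for name in supported:
        if name in mlmodel["flavors"]:
            return name, mlmodel["flavors"][name]
    raise LookupError("model offers none of: " + ", ".join(supported))


# A TensorFlow-aware tool prefers the native flavor.
tf_tool = pick_flavor(MLMODEL, ["tensorflow", "python_function"])
# A tool that knows nothing about TensorFlow falls back to the generic Python flavor.
generic_tool = pick_flavor(MLMODEL, ["sklearn", "python_function"])
```

Here the second tool asks for a flavor the model does not carry and still gets a usable result through `python_function`.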

17. Demo

18. Goal: Predict Price of Airbnb Listings
Listing attributes: bathrooms: 1, bedrooms: 2, accommodates: 4, total_reviews: 45, cleanliness_rating: 9, location_rating: 10, checkin_rating: 10, zip_code: 94105
Model: f(x) → price: 150
Based on data from insideairbnb.com

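19. Advanced MLflow - Hyperparameters (diagram). A HyperParam Search Run uses Projects (mlflow run ...) to launch Train Model runs. Each Train Model Run reports to Tracking with mlflow.log_metric() and stores a Logged Model (Models, mlflow.log_artifact). The search run reads results back with mlflow.get_metric().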
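The search pattern on this slide can be sketched without any MLflow calls: a search run launches one training run per candidate, each training run logs a metric, and the search reads the metrics back to pick a winner. Everything below is a stand-in: `train_model`, its synthetic objective, and the in-memory `tracking` list are hypothetical.

```python
# Toy stand-in for the workflow on the slide: a search run launches training
# runs, each training run logs a metric, and the search reads the metrics back.
# `train_model` and its objective are hypothetical; no MLflow calls are made.

tracking = []  # in-memory stand-in for the tracking server


def train_model(alpha):
    """Pretend to train a model and log its validation error."""
    mse = (alpha - 0.3) ** 2 + 1.0  # synthetic objective, minimized at alpha = 0.3
    tracking.append({"params": {"alpha": alpha}, "metrics": {"mse": mse}})


def grid_search(candidates):
    """Launch one training run per candidate and return the best logged run."""
    for alpha in candidates:
        train_model(alpha)
    return min(tracking, key=lambda run: run["metrics"]["mse"])


best = grid_search([0.1, 0.2, 0.3, 0.4, 0.5])
```

Because every trial is recorded, the losing runs stay available for later comparison instead of being discarded.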

20. Advanced MLflow - Multistep Workflow (diagram). Stages: Data Collection, ETL, Model Training. Labels on the diagram: Streaming, SQL, CPU, GPU.

21. Ongoing MLflow Roadmap
• TensorFlow, Keras, PyTorch, H2O, MLlib integrations ✔
• R and Java language APIs ✔
• Multi-step workflows
• Hyperparameter tuning
• Data source API based on Spark data sources
• Model metadata & management

22. Get started with MLflow
install.packages("mlflow") to get started
Find docs & examples at mlflow.org
tinyurl.com/mlflow-slack

23. Thank you!

24. Custom ML Platforms
Facebook FBLearner, Uber Michelangelo, Google TFX
+ Standardize the data prep / training / deploy loop: if you work with the platform, you get these!
Can we provide similar benefits in an open manner?

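25. MLflow Tracking (diagram, R version). Notebooks, Local Apps, and Cloud Jobs connect to the Tracking Server through the R or REST API; the Tracking Server is reached through a UI and an API.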

26. Key Concepts in Tracking
• Parameters: key-value inputs to your code
• Metrics: numeric values (can update over time)
• Artifacts: arbitrary files, including models
• Source: what code ran?
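One way to picture these four concepts together is as the fields of a single run record. The class below is a toy container written for illustration, not MLflow's own data model; `RunRecord`, its method names, and the sample values are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Toy container for what one tracked run holds (not MLflow's own class)."""
    source: str                                    # what code ran
    params: dict = field(default_factory=dict)     # key-value inputs
    metrics: dict = field(default_factory=dict)    # key -> list of (step, value)
    artifacts: list = field(default_factory=list)  # paths of output files

    def log_param(self, key, value):
        self.params[key] = str(value)  # stored as strings in this sketch

    def log_metric(self, key, value):
        history = self.metrics.setdefault(key, [])
        history.append((len(history), float(value)))

    def log_artifact(self, path):
        self.artifacts.append(path)

    def latest(self, key):
        """Return the most recently logged value of a metric."""
        return self.metrics[key][-1][1]


run = RunRecord(source="main.py")
run.log_param("alpha", 0.1)
for mse in (0.9, 0.5, 0.4):  # a metric can be updated over time
    run.log_metric("mse", mse)
run.log_artifact("plot.png")  # hypothetical output file
```

Parameters are written once, while each metric keeps a history, which is what lets a UI plot a metric over the course of training.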

27. Takeaway
Workflow tools can greatly simplify the ML lifecycle
• Improve usability for both data scientists and engineers
• Same way software dev lifecycle tools simplify development

28. Example MLflow Project (R version)

    my_project/
    ├── MLproject
    │     conda_env: conda.yaml
    │     entry_points:
    │       main:
    │         parameters:
    │           training_data: path
    │           lambda: {type: float, default: 0.1}
    │         command: python main.py {training_data} {lambda}
    ├── conda.yaml
    ├── main.py
    └── model.py

    $ mlflow run git://<my_project> ...

    mlflow_run("git://<my_project>", ...)

29. Example MLflow Model

    my_model/
    ├── MLmodel
    │     run_id: 769915006efd4c4bbd662461
    │     time_created: 2018-06-28T12:34
    │     flavors:
    │       tensorflow:          # usable by tools that understand TensorFlow model format
    │         saved_model_dir: estimator
    │         signature_def_key: predict
    │       python_function:     # usable by any tool that can run Python (Docker, Spark, etc!)
    │         loader_module: mlflow.tensorflow
    └── estimator/
        ├── saved_model.pb
        └── variables/
        ...