MLflow with R

This talk will present R as a programming language suited for solving data analysis and modeling problems, MLflow as an open source project that helps organizations manage their machine learning lifecycle, and the intersection of both: adding support for R in MLflow. It will be highly interactive and touch on some of the technical implementation choices made while making R available in MLflow. It will also demonstrate using MLflow tracking, projects, and models directly from R, as well as reusing R models in MLflow to interoperate with other programming languages and technologies.

1. MLflow with R Javier Luraschi September 2018

2. Overview What is MLflow? What is R? MLflow with R

3. What is MLflow?

4. Background Spark Summit, from Andrej Karpathy at Tesla: "The toolchain for the (software) 2.0 stack does not exist."

5. MLflow “Helps teams manage their machine learning lifecycle.”
Tracking: Track experiments to record and compare params and results.
Projects: Reuse and reproduce code to share or transfer to production.
Models: Manage and deploy models from across libraries and platforms.

6. What is R?

7. R Language R is a programming language and free software environment for statistical computing and graphics. Interface language diagram by John Chambers - Rick Becker useR 2016.

8. R Community Provides a rich package archive in CRAN and Bioconductor: dplyr to manipulate data, cluster to analyze clusters, ggplot2 to visualize data, etc. Daily downloads of CRAN packages.

9. R Language Language features I would highlight:
2.1.1 Vectors
2.1.4 Expression objects
2.1.8 Promise objects
2.1.9 Dot-dot-dot
3.1.4 Operators
cran.r-project.org/doc/manuals/R-lang.html

10. Use Case
Select the cyl and hp columns and add 2 and 20:
    mtcars
    ##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    ## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    ## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    ## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    ## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    ## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    ## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    ## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    ## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    ## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    ## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    ## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    ## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    ## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    ## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    ## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    ## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    ## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    ## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    ## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    ## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    ## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    ## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    ## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2

11. How to NOT write R code
This is how I would have written R code as a software engineer before knowing R:
    # Select columns subset
    data = data.frame(mtcars$cyl, mtcars$hp)
    colnames(data) = c("cyl", "hp")
    # Transform each row
    for (idx in 1:nrow(data)) {
      data$cyl[idx] = data$cyl[idx] + 2
    }
    # One column at a time to use the CPU cache efficiently
    for (idx in 1:nrow(data)) {
      data$hp[idx] = data$hp[idx] + 20
    }

12. 2.1.1 Vectors
Everything is a vector in R:
    # Select columns subset
    data = data.frame(mtcars$cyl, mtcars$hp)
    colnames(data) = c("cyl", "hp")
    # Transform each row
    data$cyl = data$cyl + 2
    # One column at a time to use the CPU cache efficiently
    data$hp = data$hp + 20
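As a minimal sketch of why the explicit loops are unnecessary, the snippet below checks that vectorized arithmetic on a column gives the same result as looping over its elements. It uses only base R and the built-in mtcars dataset; the variable names `looped` and `vectorized` are illustrative, not from the talk.

```r
# Loop version: update one element at a time
looped <- mtcars$cyl
for (idx in seq_along(looped)) {
  looped[idx] <- looped[idx] + 2
}

# Vectorized version: `+` is applied to every element at once
vectorized <- mtcars$cyl + 2

# Compare the two results element by element
same_result <- identical(looped, vectorized)
```

The vectorized form is shorter and leaves the iteration to R's internals, which is the idiom most R code relies on.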

13. 2.1.9 Dot-dot-dot
Dynamic parameters using the ... parameter:
    # Select columns subset
    data = data.frame(cyl = mtcars$cyl, hp = mtcars$hp)
    # Transform each row
    data$cyl = data$cyl + 2
    # One column at a time to use the CPU cache efficiently
    data$hp = data$hp + 20
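To illustrate how a function such as data.frame() can accept arbitrary named arguments, here is a small sketch of a function that receives them through `...`. The helper `pick_columns` is hypothetical and exists only for this example.

```r
# Hypothetical helper: collects any named arguments passed through `...`
pick_columns <- function(...) {
  columns <- list(...)     # capture the dynamic arguments as a named list
  as.data.frame(columns)   # the argument names become the column names
}

data <- pick_columns(cyl = mtcars$cyl, hp = mtcars$hp)
column_names <- names(data)
```

Because the argument names travel with `...`, the caller decides the column names without the function declaring them in advance.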

14. 2.1.4/2.1.8 Expression and Promise objects
One can lazily evaluate operations and operate over expressions:
    library(dplyr)  # provides select() and mutate()
    # Select columns subset
    data = select(mtcars, cyl, hp)
    # Transform each row
    data = mutate(data, cyl = cyl + 2)
    # One column at a time to use the CPU cache efficiently
    data = mutate(data, hp = hp + 20)
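The following base R sketch shows one way a function can accept bare column names: it captures the unevaluated expression behind the argument's promise and evaluates it with the data frame's columns in scope. The helper `compute` is hypothetical; dplyr's actual implementation is more elaborate, so treat this only as an illustration of the language feature.

```r
# Hypothetical helper: evaluates an expression using the columns
# of `data` as variables
compute <- function(data, expr) {
  captured <- substitute(expr)           # the expression, not its value
  eval(captured, data, parent.frame())   # look names up in `data` first
}

# `cyl` is not a variable in this scope; it is resolved inside mtcars
result <- compute(mtcars, cyl + 2)
```

Since arguments are promises, `cyl + 2` is never evaluated in the caller's scope, where it would fail.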

15. 3.1.4 Operators
Use <- for assignment, or the newer %>% pipe:
    library(dplyr)  # provides select(), mutate(), and %>%
    data <- mtcars %>%
      select(cyl, hp) %>%
      mutate(cyl = cyl + 2) %>%
      mutate(hp = hp + 20)
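R lets users define their own infix operators by naming a function between percent signs, which is the mechanism a pipe operator builds on. The sketch below defines a deliberately simplified operator; `%then%` and `add_two` are hypothetical names, and this is not how the %>% pipe is actually implemented.

```r
# Hypothetical operator: passes the left-hand value to the right-hand function
`%then%` <- function(lhs, rhs) rhs(lhs)

add_two <- function(x) x + 2

# Custom operators are left-associative, so this reads left to right
piped <- mtcars$cyl %then% add_two %then% mean
```

The chained form computes the same value as the nested call `mean(add_two(mtcars$cyl))`.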

16. Linear Models lm(mpg ~ cyl + hp, mtcars) %>% plot()
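Besides plotting, a fitted model can be inspected and used for prediction. This sketch fits the same formula with base R and predicts mpg for two made-up cars; the `new_cars` values are illustrative assumptions, not data from the talk.

```r
# Fit the same linear model as on the slide
fit <- lm(mpg ~ cyl + hp, data = mtcars)

# Estimated coefficients: intercept, cyl, and hp
coefficients <- coef(fit)

# Predict mpg for two hypothetical cars
new_cars <- data.frame(cyl = c(4, 8), hp = c(90, 200))
predicted <- predict(fit, newdata = new_cars)
```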

17. MLflow with R

18. Principles Parity with Python API. Designed for the R user.

19. Installing
Install Anaconda or miniconda.
Today…
    git clone https://github.com/mlflow/mlflow
    devtools::install_github("mlflow/mlflow", subdir = "R/mlflow")
    mlflow::mlflow_install()
    reticulate::conda_install("r-mlflow", "<local github repo>", pip = TRUE)
Soon…
    install.packages("mlflow")
    mlflow::mlflow_install()

20. Tracking - Implicit
Implicit MLflow run:
    library(mlflow)
    # Log a parameter (key-value pair)
    mlflow_log_param("param1", 5)
    # Log a metric; metrics can be updated throughout the run
    mlflow_log_metric("foo", 1)
    mlflow_log_metric("foo", 2)
    mlflow_log_metric("foo", 3)
    # Log an artifact (output file)
    writeLines("Hello world!", "output.txt")
    mlflow_log_artifact("output.txt")
Run terminates when the R session finishes or by running:
    mlflow_end_run()
Useful when sourcing files.

21. Tracking - Explicit
Explicit MLflow run:
    library(mlflow)
    with(mlflow_start_run(), {
      # Log a parameter (key-value pair)
      mlflow_log_param("param1", 5)
      # Log a metric; metrics can be updated throughout the run
      mlflow_log_metric("foo", 1)
      mlflow_log_metric("foo", 2)
      mlflow_log_metric("foo", 3)
      # Log an artifact (output file)
      writeLines("Hello world!", "output.txt")
      mlflow_log_artifact("output.txt")
    })
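The with() pattern works because with() is an S3 generic, so a run object can supply its own method that evaluates the block and then closes the run. The sketch below imitates that idea in base R with a toy run object; `start_toy_run`, the `toy_run` class, and the `state` environment are hypothetical, and this is an assumption about the general technique rather than MLflow's actual code.

```r
# Hypothetical run tracker, kept in an environment so it can be inspected
state <- new.env()
state$active <- FALSE

start_toy_run <- function() {
  state$active <- TRUE
  structure(list(id = "toy-run"), class = "toy_run")
}

# S3 method for with(): evaluate the block, then always end the run
with.toy_run <- function(data, expr, ...) {
  on.exit(state$active <- FALSE)  # runs even if `expr` raises an error
  force(expr)                     # `expr` is a promise; evaluate it now
}

# Inside the block the run is active; afterwards it has been ended
was_active <- with(start_toy_run(), state$active)
is_active_after <- state$active
```

Tying the end of the run to on.exit() means the run is closed even when the block fails, which is the main advantage of the explicit form.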

22. Tracking - Sources
    mlflow_run("R/tracking.R")
Or add the following to tracking.R in RStudio 1.2:
    # !source mlflow::mlflow_run(entry_point = .file)

23. Tracking - UI mlflow_ui()

24. Projects - Snapshots
Create dependencies snapshot:
    mlflow_snapshot()
Then restore snapshot:
    mlflow_restore_snapshot()

25. Projects - Consuming
    mlflow_run(
      "train.R",
      "https://github.com/rstudio/mlflow-example",
      param_list = list(alpha = 0.2)
    )
    Elasticnet model (alpha=0.2, lambda=0.5):
      RMSE: 0.827574750159859
      MAE: 0.632070002076146
      R2: 0.227227498131926
Or from bash:
    mlflow run --entry-point train.R https://github.com/rstudio/mlflow-example

26. Models - Saving
    mlflow_save_model(model)
Generic functions are serialized with crate:
    column <- mlflow_log_param("column", 1)
    model <- lm(
      Sepal.Width ~ x,
      data.frame(Sepal.Width = iris$Sepal.Width, x = iris[,column])
    )
    mlflow_save_model(
      crate(~ stats::predict(model, .x), model)
    )
However, mlflow_save_model() can be extended by packages:
    #' @export
    mlflow_save_flavor.tensorflow <- function( ) {}
    mlflow_load_flavor.tensorflow <- function( ) {}
    mlflow_predict_flavor.tensorflow <- function( ) {}
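The extension mechanism named on this slide relies on S3 dispatch: a generic picks a method based on the class of its argument, so another package can add support for its own model class by defining a method. The following base R sketch shows the idea with a hypothetical generic `save_flavor`; it stands in for the pattern and is not the MLflow API.

```r
# Hypothetical generic, standing in for a save function that packages extend
save_flavor <- function(model, ...) UseMethod("save_flavor")

# Fallback used when no more specific method exists
save_flavor.default <- function(model, ...) "generic"

# A method that an extension package could provide for its own model class
save_flavor.lm <- function(model, ...) "lm-specific"

fit <- lm(mpg ~ cyl, data = mtcars)

# The lm object dispatches to the lm method; a plain number falls back
flavors <- c(save_flavor(fit), save_flavor(42))
```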

27. Models - Predictions
    mlflow_rfunc_predict(
      "model",
      data = data.frame(x = c(0.3, 0.2))
    )
           1        2
    3.400381 3.406570
Or from bash:
    mlflow rfunc predict --model-path model --input-path data.csv

28. Models - Serving
    mlflow_rfunc_serve("model")
Or from bash:
    mlflow rfunc serve --model-path model
    curl -X POST "http://127.0.0.1:8090/predict/" -H "accept: application/json" -H "Content-Type: application/json" -d '[{"x": [0.3, 0.2]}]'

29. Future Work Currently merged, various github issues pending: