- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 视频嵌入链接 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Briefing on the Modern ML Stack with R
We provide an update on developments in the intersection of the R and the broader machine learning ecosystems. These collections of packages enable R users to leverage the latest technologies for big data analytics and deep learning in their existing workflows, and also facilitate collaboration within multidisciplinary data science teams. Topics covered include – MLflow: managing the ML lifecycle with improved dependency management and more deployment targets – TensorFlow: TF 2.0 update and probabilistic (deep) machine learning with TensorFlow Probability – Spark: latest improvements and extensions, including text processing at scale with SparkNLP
展开查看详情
1 .WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
2 .Briefing on the Modern ML Stack with R Javier Luraschi, RStudio #UnifiedDataAnalytics #SparkAISummit
3 .Intro to R “R is a programming language and free software environment for statistical computing and graphics." #UnifiedDataAnalytics #SparkAISummit 3
4 .Modern R The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures. library(tidyverse) flights %>% group_by(month, day) %>% summarise(count = n(), avg_delay = mean(dep_delay, na.rm = TRUE)) %>% filter(count > 1000) #UnifiedDataAnalytics #SparkAISummit 4
5 .About RStudio #UnifiedDataAnalytics #SparkAISummit 5
6 .RStudio Multiverse Team Authors of R packages to support Apache Spark, TensorFlow and MLflow. Contributors to tidyverse and Apache Arrow. #UnifiedDataAnalytics #SparkAISummit 6
7 .The Modern ML Stack with R 2016 2018 2015 2017 2019 #UnifiedDataAnalytics #SparkAISummit 7
8 .#UnifiedDataAnalytics #SparkAISummit 8
9 .Spark with R - Motivation library(sparklyr) sc <- spark_connect(master = “local|yarn|mesos|spark|livy”) flights <- copy_to(sc, flights) library(tidyverse) flights %>% group_by(month, day) %>% summarise(count = n(), avg_delay = mean(dep_delay, na.rm = TRUE)) %>% filter(count > 1000) #UnifiedDataAnalytics #SparkAISummit 9
10 .Spark with R - Timeline sparklyr 0.7 Spark sparklyr 0.5 sparklyr 0.9 Pipelines and Livy and dplyr Streams and Machine improvements. Kubernetes. Learning. Sep 2016 Jul 2017 May 2018 Mar 2019 Jan 2017 Jan 2018 Oct 2018 Oct 2019 sparklyr 0.4 sparklyr 0.6 sparklyr 0.8 sparklyr 1.0 R interface for Distributed R and Production Arrow, Apache Spark. external sources. pipelines and XGBoost, graphs. Broom and TFRecords #UnifiedDataAnalytics #SparkAISummit 10
11 .Spark - What’s new? library(sparklyr) library(arrow) #UnifiedDataAnalytics #SparkAISummit 11
12 .Spark - What’s new? - XGBoost library(sparkxgb) iris <- copy_to(sc, iris) xgb_model <- xgboost_classifier(iris, Species ~ ., num_class =3, num_round = 50, max_depth = 4) xgb_model %>% ml_predict(iris) %>% select(Species, predicted_label, starts_with( "probability_")) %>% glimpse() #UnifiedDataAnalytics #SparkAISummit 12
13 .Spark - What’s new? - Broom #UnifiedDataAnalytics #SparkAISummit 13
14 .Spark - New? - TF Records #UnifiedDataAnalytics #SparkAISummit 14
15 . Spark - What’s next? - Genomics github.com/r-spark/variantspark by Samuel Macêdo library(sparklyr) library(variantspark) sc <- spark_connect(master = "local") vsc <- vs_connect(sc) hipster_vcf <- vs_read_vcf(vsc, "inst/extdata/hipster.vcf.bz2") hipster_labels <- vs_read_csv(vsc, "inst/extdata/hipster_labels.txt") labels <- vs_read_labels(vsc, "inst/extdata/hipster_labels.txt") vs_importance_analysis(vsc, hipster_vcf, labels, n_trees = 100) #UnifiedDataAnalytics #SparkAISummit 15
16 . Spark - What’s next? - Genomics github.com/r-spark/sparkhail by Samuel Macêdo library(sparkhail) sc <- spark_connect(master = "local", version = "2.4", config = hail_config()) hl <- hail_context(sc) mt <- hail_read_matrix(hl, system.file( "extdata/1kg.mt", package = "sparkhail")) hail_dataframe(mt) #UnifiedDataAnalytics #SparkAISummit 16
17 .Spark - What’s next? - Genomics github.com/lawremi/hailr by Michael Lawrence #UnifiedDataAnalytics #SparkAISummit 17
18 .Spark - What’s next? - NLP github.com/r-spark/sparknlp by Kevin Kuo #UnifiedDataAnalytics #SparkAISummit 18
19 .Spark - What’s next? - GitHub sparklyr moving to github.com/r-spark and more... #UnifiedDataAnalytics #SparkAISummit 19
20 .#UnifiedDataAnalytics #SparkAISummit 20
21 .TensorFlow with R - Timeline tfdatasets 1.5 Initial Release tensorflow keras 2.0.5 Eager Execution Initial Release Mar 2017 Dec 2017 Jun 2018 Oct 2018 Jul 2017 Jan 2018 Aug 2018 tensorflow 0.7 tfestimators 1.4.2 cloudml 0.5 tfprobability Initial Release Initial Release Initial Release Initial Release #UnifiedDataAnalytics #SparkAISummit 21
22 .TensorFlow - New? - tfdatasets Feature specs ft_spec <- training %>% select(-id) %>% feature_spec(target ~ .) %>% step_numeric_column(ends_with("bin")) %>% step_numeric_column(-ends_with("bin"), -ends_with("cat"), normalizer_fn = scaler_standard()) %>% step_categorical_column_with_vocabulary_list(ends_with("cat")) %>% step_embedding_column(ends_with("cat"), dimension = function(vocab_size) as.integer(sqrt(vocab_size) + 1)) %>% fit() #UnifiedDataAnalytics #SparkAISummit 22
23 .TensorFlow - New? - tfprobability Combine probabilistic models and deep learning on modern hardware # create a binomial distribution with n = 7 and p = 0.3 d <- tfd_binomial(total_count = 7, probs = 0.3) # compute mean d %>% tfd_mean() # compute variance d %>% tfd_variance() # compute probability d %>% tfd_prob(2.3) github.com/rstudio/tfprobability #UnifiedDataAnalytics #SparkAISummit 23
24 .TensorFlow - What’s next? TF 2.0 #UnifiedDataAnalytics #SparkAISummit 24
25 .TensorFlow - Next? - Distributed #UnifiedDataAnalytics #SparkAISummit 25
26 .#UnifiedDataAnalytics #SparkAISummit 26
27 .MLflow - Timeline Available in CRAN since v0.7.0 #UnifiedDataAnalytics #SparkAISummit 27
28 .MLflow - Timeline Docs site at a par with Python! #UnifiedDataAnalytics #SparkAISummit 28
29 .MLflow - What’s next? ● renv (packrat successor) ● Cloud Deployment Targets ● Keras Autolog #UnifiedDataAnalytics #SparkAISummit 29