Accelerating Machine Learning on Databricks Runtime
1. WIFI SSID: SparkAISummit | Password: UnifiedAnalytics
2. Accelerating Machine Learning on Databricks Runtime
Hossein Falaki & Yifan Cao, Databricks Inc.
#UnifiedAnalytics #SparkAISummit
3. Outline
• Databricks Runtime for ML
• Use Case Examples
• Under the Hood
• Demo
• What is Next
4. Broad Adoption of ML
Disruptive innovations are affecting most enterprises on the planet:
• Healthcare and Genomics
• Fraud Prevention
• Digital Personalization
• Internet of Things
…and many more customers in different industries and segments.
5. Hidden Tech Debt in ML Systems
“Hidden Technical Debt in Machine Learning Systems,” Google, NIPS 2015
[Figure: components of a real-world ML system: Configuration, Data Collection, Data Verification, Feature Extraction, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, Monitoring, with a small ML Code box in the middle]
Only a small fraction of a real-world ML system is composed of ML code, as shown by the small green box in the middle. The required surrounding infrastructure is vast and complex.
7. ML Runtime: Job To Be Done
As an ML practitioner:
1. I want to start my ML project quickly.
   • Today I have to spend many hours setting up environments.
2. I want a single runtime for all steps of my work.
   • I don't want to move data and code around.
8. ML Project Stages
Prepare Quality Data → Build Models → Productionize
Databricks Runtime for ML spans all three stages.
9. What is Databricks Runtime for ML?
• A ready-to-use environment for machine learning and data science
• Built on top of, and updated with, every Databricks Runtime release
• APIs for distributed deep learning on Spark (HorovodRunner)
• Performance improvements for popular distributed algorithms in Spark (GraphFrames, logistic regression, and tree classifiers)
10. What is Databricks Runtime for ML?
The ML environment is set up on all cluster nodes with a single click.
11. Stage 1: Prepare Data
• Easily access, explore, and visualize data in collaborative notebooks
• Prepare data sets at scale with:
  – Scala / Python / R / SQL
  – Optimized Apache Spark
  – Structured Streaming
  – Delta Lake
  – Persisted data metastore
• Quickly automate notebooks with jobs
12. Stage 2: Build Models
Support for popular open source ML frameworks:
• TensorFlow and TensorBoard
• PyTorch
• Keras
• Horovod for distributed DL
• XGBoost
• GraphFrames
• Popular single-node tools in Python and R
13. Stage 3: Productionize ML Models
Model Deployment
• MLflow API for inference on third-party services such as Docker containers, Azure ML on Azure, and SageMaker on AWS
• Databricks Runtime for ML includes MLeap for model serialization.
14. Use Case Examples
15. Vision
Challenge
• 325,000 listed hotels and a massive volume of image files
• Apply ML to improve the match between travelers and hotels with a personalized viewing experience
Solution
• Leveraged Databricks to train DL models on 100% of the image data
• Increased processing power by 20x and enabled real-time scoring
Result
• Hotels.com significantly improved customer engagement and conversions by improving its personalization models
• Customer Case Study: databricks.com/customers/hotels-com
16. NLP
Challenge
• More than 100 million gamers every month
• 2% of all games affected by serious toxicity
Solution
• Leveraged Databricks to apply NLP & ML to proactively identify abusive language
• Scaled training and hyperparameter tuning to a much larger dataset
Result
• Riot Games increased customer satisfaction, retention, and lifetime value by detecting abusive language in real time
• Customer Case Study: databricks.com/customers/riot-games
17. IoT
Challenge
• Offer insights into what consumers buy and watch
• Scale from single-machine data science to large datasets to improve product offerings
Solution
• Leveraged Databricks to ensure collaboration across teams
• Reduced annual cost by 40% and improved model performance by one third
Result
• Nielsen improved its competitive offering by applying ML to batch and live-stream data from IoT devices
• Customer Case Study: databricks.com/customers/nielsen
18. Under the Hood
19. High-level Engineering Goals
• Reproducible environments
  – Package & dependency management
• Testability
  – Testing & QA infrastructure and process
• Cross-compatibility
  – Careful configuration of all packages to be compatible
• Performance optimization
  – High-performance I/O
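One way to make "reproducible environments" checkable is to reduce a fully pinned package set to a single fingerprint, so two builds can be compared at a glance. The sketch below illustrates that idea in Python under stated assumptions; it is not the Databricks build process, and the function name `environment_fingerprint` is hypothetical.

```python
import hashlib
from typing import Mapping


def environment_fingerprint(pinned: Mapping[str, str]) -> str:
    """Return a SHA-256 hex digest identifying an exactly pinned package set.

    `pinned` maps package name -> exact version (hypothetical input format).
    """
    # Normalize names and sort entries so the fingerprint does not depend on
    # dict insertion order or on how a package name happens to be capitalized.
    lines = sorted(
        f"{name.strip().lower()}=={version.strip()}"
        for name, version in pinned.items()
    )
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
```

Two environments built from the same pinned versions produce the same digest regardless of ordering, while any single upgrade changes it. That makes the digest a cheap signal for "did this image change?"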
20. Package Management
• Package management
• Environment management
  – Python 2.x & Python 3.x environments
  – Environment is selected during cluster setup
• Latest stable versions from the Anaconda distribution
21. Python Environments
• ML Runtime vs. Databricks Runtime
  – Upgraded packages
  – Conda vs. pip
  – Additional ML packages
• MKL for CPU acceleration
• CUDA & cuDNN for GPU acceleration
22. Dependency Management
• Bazel as the build system
• Audit files for change detection
  – Python: Conda
  – JAR: Maven
  – R: MRAN
  – Native: Ubuntu APT and Docker
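Audit files turn change detection into a diff between the package list recorded for the previous build and the list for the new build. The sketch below assumes a pip-freeze-style `name==version` line format. The slide does not describe the real audit-file format, and `parse_audit` and `diff_audit` are hypothetical helpers.

```python
from typing import Dict, Iterable, List


def parse_audit(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'name==version' lines (assumed format), skipping blanks and '#' comments."""
    packages: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"expected 'name==version', got {line!r}")
        packages[name.strip().lower()] = version.strip()
    return packages


def diff_audit(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report which packages were added, removed, or changed version between builds."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(name for name in set(old) & set(new) if old[name] != new[name])
    return {"added": added, "removed": removed, "changed": changed}
```

For example, with illustrative versions, `diff_audit(parse_audit(["numpy==1.16.2", "scipy==1.2.1"]), parse_audit(["numpy==1.16.3", "torch==1.0.1"]))` returns `{"added": ["torch"], "removed": ["scipy"], "changed": ["numpy"]}`. A non-empty result is the cue to review the new build before release.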
23. Docker Containers
• We internally use Docker to build Databricks Runtime images
  – Full control over content
  – Reproducible and automated
• Runtime for ML is a layer on top of Databricks Runtime (DBR)
  – ML Runtime (MLR) benefits from all existing DBR tests and QA
  – MLR gets every hotfix and patch that goes into DBR
24. Extensive Integration Testing
• Extensive tests for top-tier packages
• Each commit runs unit and integration tests
• Nightly tests on master and release branches
• All CPU and GPU instance types on Azure & AWS
• Integration tests:
  – Launch a Docker container and run code
  – Launch a cluster and execute notebooks
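The first integration-test pattern, running code in a fresh, isolated environment and checking that it succeeds, can be approximated locally. In the sketch below, a subprocess stands in for the Docker container. This is an analogy, not the Databricks test harness, and `run_snippet` and `smoke_test_imports` are hypothetical names.

```python
import subprocess
import sys
from typing import Dict, Iterable


def run_snippet(code: str, *args: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python interpreter, passing `args` as sys.argv[1:]."""
    return subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# The module name is passed as a command-line argument rather than
# interpolated into the code string, so the snippet itself never changes.
_IMPORT_SNIPPET = "import importlib, sys; importlib.import_module(sys.argv[1])"


def smoke_test_imports(modules: Iterable[str]) -> Dict[str, bool]:
    """Import each module in its own interpreter; True means the import succeeded."""
    return {m: run_snippet(_IMPORT_SNIPPET, m).returncode == 0 for m in modules}
```

Running each check in a separate process keeps one broken package from masking or polluting the result for another. The container-based tests on the slide apply the same isolation at the level of the whole image.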
25. High-Performance FUSE
• Why Filesystem in Userspace (FUSE)?
• We use high-throughput FUSE clients for ML/DL
  – Azure Storage FUSE on Azure
  – Goofys on AWS
• The mount points are pre-configured on ML Runtime at dbfs:/ml
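FUSE matters for ML/DL because most training libraries read data through ordinary local file paths rather than storage-specific clients. On Databricks, DBFS paths are conventionally exposed to local file APIs under `/dbfs`, so `dbfs:/ml/...` would be read as `/dbfs/ml/...`. Treat that root as an assumption to verify on your runtime. The helper `dbfs_to_local` below is a hypothetical sketch of the mapping.

```python
import posixpath

DBFS_SCHEME = "dbfs:"
# Assumed local root of the DBFS FUSE mount; verify on your cluster.
FUSE_ROOT = "/dbfs"


def dbfs_to_local(uri: str) -> str:
    """Map a dbfs:/ URI to the corresponding local FUSE path."""
    if not uri.startswith(DBFS_SCHEME):
        raise ValueError(f"not a DBFS URI: {uri!r}")
    # Collapse any leading slashes to one, then resolve '.' and '..' segments.
    # Because the path is rooted at '/', '..' can never climb above FUSE_ROOT.
    path = posixpath.normpath("/" + uri[len(DBFS_SCHEME):].lstrip("/"))
    return FUSE_ROOT if path == "/" else FUSE_ROOT + path
```

For example, `dbfs_to_local("dbfs:/ml/images/0001.jpg")` returns `"/dbfs/ml/images/0001.jpg"`, which a data loader can pass straight to `open()`. The file name is illustrative.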
26. Demo
27. What is Next?
28. GA of Runtime for ML
• Release history:
  – 4.1 Beta: June 2018
  – …
  – 5.3 GA: April 2019
  – 5.4: May 2019
  – 6.0: second half of 2019
29. Roadmap for Environment
• DBR with Conda (Beta)
  – Enable customizable environments
  – Databricks Runtime & Databricks Runtime for ML will continue to be supported
• 6.0
  – Unify all into a single runtime
  – Considering removing Python 2.x