Navigating the ML Pipeline Jungle with MLflow
1. Navigating the ML Pipeline Jungle with MLflow: Notes from the Field
Thunder Shiviah, thunder@databricks.com, #SAISDS11
2. Who am I
● Databricks Solutions Architect focused on machine learning and deep learning
● Previously a McKinsey Data Scientist and QuantumBlack Machine Learning Engineer, designing and building ML pipelines for Fortune 100 companies
● Developed and deployed models across diverse verticals such as healthcare, telecom, finance, and renewable energy
3.
● Overview of challenges with AI in production
● How we’re solving these challenges
● Demos
● A final word on where AI in production is heading
● Q&A
4. AI is a Game-Changing Opportunity
● Lots of new data: customer data, click streams, sensor data (IoT), video/speech, …
● Business opportunity: fraud detection, genome sequencing, recommendation engine, predictive maintenance, …
● Roles involved: data engineer, data scientist
Machine learning requires collaborative experimentation on big data.
5. Hardest Part of AI isn’t AI, it’s plumbing
Source: “Hidden Technical Debt in Machine Learning Systems,” Google, NIPS 2015
[Figure components: Configuration, Data Collection, Feature Extraction, Data Verification, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, Monitoring, ML Code]
Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small green box in the middle. The required surrounding infrastructure is vast and complex.
6. ML Lifecycle is Manual, Inconsistent and Disconnected
Data Prep
● Low-level integrations for data and ML
● Difficult to track data used for a model
Build Model
● Ad hoc approach to track experiments
● Very hard to reproduce experiments
Deploy Model
● Multiple tightly coupled deployment options
● Different monitoring approach for each framework
7. How we’re making AI in production simple
8. Simplifying the AI pipeline
● ML Runtime: pre-configured ML libraries for CPU and GPU
● Pandas vectorized UDFs
● Distributed transfer learning with Deep Learning Pipelines
● MLflow
9. New: Databricks Runtime for ML
● Ready-to-use clusters with built-in ML frameworks
● GPU support
10. Run your native Python code with PySpark, fast, with Vectorized Pandas UDFs
● Use Pandas UDFs to convert existing pandas code into performant Spark UDFs
● Convert PySpark DataFrames to pandas DataFrames quickly
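To make the first bullet concrete, here is a minimal sketch of how an existing pandas function might be reused as a Spark UDF. The function `fahrenheit_to_celsius` and the helper `as_spark_udf` are hypothetical names chosen for illustration and do not come from the talk; the sketch assumes `pandas` is installed, and that `pyspark` (with `pyarrow`) is available when the wrapper is actually called.

```python
import pandas as pd


def fahrenheit_to_celsius(temps: pd.Series) -> pd.Series:
    """Plain pandas code: converts a whole Series in one vectorized step."""
    return (temps - 32.0) * 5.0 / 9.0


def as_spark_udf():
    """Wrap the pandas function as a scalar Pandas UDF.

    pyspark is imported lazily (an assumption of this sketch) so the
    pandas function above stays usable without a Spark installation.
    """
    from pyspark.sql.functions import pandas_udf

    return pandas_udf(fahrenheit_to_celsius, returnType="double")


if __name__ == "__main__":
    # The same function runs unchanged on a local pandas Series.
    print(fahrenheit_to_celsius(pd.Series([32.0, 212.0])).tolist())
```

On a cluster, the wrapped function could then be applied with something like `df.withColumn("temp_c", as_spark_udf()(df["temp_f"]))`, where `df`, `temp_f`, and `temp_c` are placeholder names. For the second bullet, `DataFrame.toPandas()` is typically sped up by enabling Arrow (`spark.sql.execution.arrow.enabled` in Spark 2.x, `spark.sql.execution.arrow.pyspark.enabled` in Spark 3.x).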
11. Transfer learning with DL pipelines
● Use pre-trained neural networks to harness the power of neural nets on smaller data
● Model inference using Spark SQL UDFs
12. New: Databricks MLflow standardizes ML Lifecycle
Data Prep (Databricks Delta)
● Feed data to models
● Enrich data in experiments
Build Model (Databricks Runtime for ML, MLflow Project & Tracker)
● Track experiments
● Reproduce experiments
Deploy Model (MLflow Serving)
● Integrate with multiple clouds
● Manage and monitor models
13. MLflow Components
● Tracking: record and query experiments (code, data, config, results)
● Projects: packaging format for reproducible runs on any platform
● Models: general model format that supports diverse deployment tools
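As an illustration of the Tracking component, the following is a minimal sketch rather than the demo from the talk. The toy model, its data, and the names `fit_and_score` and `log_run` are assumptions made for the example; only `mlflow.start_run`, `mlflow.log_param`, and `mlflow.log_metric` are MLflow calls, and `mlflow` is assumed to be installed when `log_run` is invoked.

```python
def fit_and_score(alpha, xs=(1.0, 2.0, 3.0, 4.0), ys=(2.0, 4.0, 6.0, 8.0)):
    """Toy one-weight ridge regression (closed form) on made-up data.

    Returns the fitted weight and the mean squared error on the same data.
    """
    weight = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)
    mse = sum((y - weight * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return weight, mse


def log_run(alpha):
    """Record one experiment run: the config (alpha) and its results."""
    import mlflow  # imported lazily; assumed to be installed

    weight, mse = fit_and_score(alpha)
    with mlflow.start_run():
        mlflow.log_param("alpha", alpha)    # config
        mlflow.log_metric("weight", weight)  # results
        mlflow.log_metric("mse", mse)
    return mse
```

Calling `log_run` for several values of `alpha` would leave one recorded run per value, which can then be compared in the MLflow UI (started with `mlflow ui`). By default the runs are written to a local `mlruns` directory.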
14. Demo
15. A word about where AI in production is going
16. Q&A
17. Thank you! thunder@databricks.com #SAISDS11