MLFlow和spark在机器学习方面的进展、Project Hydrogen和spark在深度学习方面的进展

6与19日【MLFlow和spark在机器学习方面的进展、Project Hydrogen和spark在深度学习方面的进展 】

讲师:江宇,阿里云EMR技术专家。从事Hadoop内核开发,目前专注于机器学习、深度学习大数据平台的建设。

内容介绍:mlflow为企业提供一套开源的机器学习端到端工具,同时,project hydrogen项目旨在将AI框架与Spark更好的结合。本次直播介绍mlflow的场景和使用方式,project hydrogen的进展以及我们如何通过project hydrogen提供的能力更好的将Spark与AI结合。


阿里巴巴开源大数据EMR技术团队成立Apache Spark中国技术社区,定期打造国内Spark线上线下交流活动。请持续关注。
钉钉群号:21784001
团队群号:HPRX8117
微信公众号:Apache Spark技术交流社区

展开查看详情

1.

2.MLflow实践与Project Hydrogen最新进展 阿⾥里里云-E-MapReduce 江宇 2019.6.19

3.内容 1 2 MLflow Project Hydrogen

4.MLflow Part I

5.MLflow 机器器学习流程和挑战 MLflow解决⽅方式 MLflow组件构成 MLflow系统演示

6.MLflow 机器器学习整体流程⾮非常复杂 Data Prepare Row Data Trainging Deploy

7.MLflow 机器器学习整体流程⾮非常复杂 Data Prepare Aliyun EMR Row Data Trainging Deploy

8.MLflow Scale 机器器学习整体流程⾮非常复杂 Data Prepare Aliyun EMR Model Row Data Trainging Exchange Deploy Scale Scale Governance Scale

9.MLflow 机器器学习整体流程⾮非常复杂 hidden technical debt in machine learning systems

10.MLflow 成熟的ML平台 Facebook FBLearner, Uber Michelangelo, Google TFX + 成熟的流程、包含了了从数据处理理、训练、部署的全流程 - 与公司的基础架构紧密结合,有限的算法 - 闭源

11.MLflow MLflow介绍 开源的机器器学习平台 ‣ 兼容各种ML library和各种编程语⾔言 Python, Java, R ‣ 统⼀一的运⾏行行环境,可以运⾏行行任何地⽅方(⽐比如阿⾥里里云中) ‣ 设计时考虑到各种规模⽤用户,1或者1000+⼈人公司 发展历史 ‣ 2018年年6⽉月开源 ‣ 2019年年4⽉月 Spark AI⼤大会发布 1.0

12.MLflow MLflow介绍

13.MLflow MLflow — Tracking MLflow import mlflow with mlflow.start_run(): Tracking mlflow.log_param("x", 1) mlflow.log_metric("y", 2) with mlflow.start_run(): Record and Query for epoch in range(0, 3): experiments:code, configs, mlflow.log_metric(key="quality", value=2*epoch, step=epoch) results, parameters …etc mlflow.log_artifact('/root/mlflow_tracking_example.py')

14.MLflow MLflow — Tracking 基本概念 • Code Version • Start Time • Source • Parameters • Metrics • Artifacts

15.MLflow MLflow — Tracking DEMO

16.MLflow MLflow — Tracking 存储 Backend Store • File Store • Database-backend Store Artifact Store • Aws S3 • Azure Blob Storage • Google Cloud Storage • FTP Server • SFTP Server • NFS • HDFS

17.MLflow MLflow — Project Local Execution MLflow Project Packaging format Aliyun EMR for reproducible runs Remote Execution on any platform Code Config Data

18.MLflow MLflow — Project 定义ml项⽬目的⽬目录结构 • 本地⽬目录或者Github仓库 • 可选的MLproject⽂文件⽤用于整个项⽬目配置 定义项⽬目的环境依赖 • Conda(Docker..)等依赖 • 可以在⼏几乎所有环境进⾏行行重复运⾏行行 通过API在本地或者远程运⾏行行 • mlflow run (cli) • mlflow.projects.run() (API)

19. MLproject MLflow name: tutorial MLflow — Project conda_env: conda.yaml User project/ entry_points: main: parameters: MLproject alpha: float l1_ratio: {type: float, default: 0.1} conda.yaml command: "python train.py {alpha} {l1_ratio}" train.py conda.yaml tuil.py name: tutorial channels: …… - defaults dependencies: - python=3.6 - scikit-learn=0.19.1 - pip: - mlflow>=1.0

20.MLflow MLflow — Project DEMO

21.MLflow MLflow — Models Inference Batch & Stream Scoring Cloud Serving Tools ML framework

22.MLflow MLflow — Models Inference Mlflow Model Format Flavor 1 Flavor 1 Batch & Stream Scoring sklearn TF Cloud Serving Tools ML framework

23.MLflow MLflow — Models 定义模型存储格式 • 根据MLmodel⽂文件来定义各种 • 不不同框架具有不不同flavors(格式) 定义项⽬目的环境依赖 • Conda(Docker..)等依赖 • 可以在⼏几乎所有环境进⾏行行重复运⾏行行 通过API在本地或者远程运⾏行行 • mlflow server(cli) • mlflow.sklearn.load() (API)

24.MLflow MLmodel flavors: MLflow — Model python_function: data: model.pkl env: conda.yaml loader_module: mlflow.sklearn User_model / python_version: 3.6.5 sklearn: pickled_model: model.pkl serialization_format: cloudpickle MLmodel sklearn_version: 0.19.1 utc_time_created: '2019-06-17 02:46:05.185724' conda.yaml model.pkl conda.yaml channels: …… - defaults dependencies: - python=3.6.5 - scikit-learn=0.19.1 - pip: - mlflow - cloudpickle==0.5.3 name: mlflow-env

25.MLflow MLflow — Model • Python Function (generic) • R Function • H20 • Keras • MLeap • PyTorch • SKlearn • Spark MLlib • TensorFlow • ONNX

26.MLflow MLflow — Model DEMO

27.Project Hydrogen Part II

28.Project Hydrogen Project Hydrogen aims at enabling first-class support for all distributed machine learning frameworks on Apache SparkTM, by substantially improving the performance and fault-recovery of distributed deep learning and machine learning frameworks on Spark Barrier Optimized Accelerator Execution Data Aware Mode Exchange Scheduling

29.Project Hydrogen Data Prepare Aliyun EMR Row Data Trainging Deploy