Apache Submarine: State of the Union

Apache Submarine is the ONE PLATFORM that lets data scientists create end-to-end machine learning workflows. ONE PLATFORM means data scientists can finish their jobs on the same platform without frequently switching toolsets. Dataset exploration, data pipeline creation, model training (experiments), and pushing models to production (model serving and monitoring) can all be completed within the ONE PLATFORM.
In this talk, we’ll start with the current status of Apache Submarine – how it is used today in deployments large and small. We’ll then move on to the exciting present & future of Submarine – features that are further strengthening Submarine as the ONE PLATFORM for data scientists to train/manage machine learning models.

We’ll discuss highlights of the newly released 0.4.0 version and new features in the upcoming 0.5.0 release.

1.Apache Submarine Unified Machine Learning Platform - State of the Union

2.Agenda

3.Machine Learning In Production

4.Machine Learning in tutorial

5.What is included in an ML training lifecycle

6.Data Pipeline For Machine Learning
    - ETL
    - Data Exploration
    - Join / Sampling / Feature Extraction
    - Split train, test data set, etc.
    - Model Training
    - Model Saving, Versioning, etc.
    - Model Deployment (Online Serving)

7.Data Scientist

8.What Do Data Scientists Expect? (Cont.)

9.What Do Data Scientists NOT Expect to Know?

10.What is Apache Submarine?

11.Apache Submarine (TLP)

12.

13.Features of Submarine

14.Available Since 0.4.0 (Released)

15.Under development, Target: 0.5.0 (Oct, 2020)

16.Experiment (Training) Support - Python SDK
    import submarine
    experiment = create_experiment {
        Environment = "team_ds_env",
        ExperimentConfig = {
            type = "adhoc",
            localize_artifacts = [ "s3://bucket/training-job.tar.gz" ],
            name = "abc",
            parameter = "python training.py --iteration 10 --input=s3://bucket/input --output=s3://bucket/output",
        }
    }
    experiment.run()
Available Since 0.4.0 (Released)
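
The snippet on this slide is pseudocode rather than literal Python. As a minimal sketch of the same idea, the experiment settings can be gathered into a plain dictionary before being handed to whatever client submits them. The helper name `build_experiment_spec` and its arguments are hypothetical and are not part of the Submarine SDK; the sketch also assumes the slide's `output=` was meant as an `--output` flag.

```python
import shlex


def build_experiment_spec(name, environment, artifacts, script,
                          iterations, input_uri, output_uri):
    """Hypothetical helper: collect the slide's fields into a plain dict."""
    # Build the training command as a list, then join it safely into a
    # single shell string (shlex.join quotes arguments only when needed).
    command = [
        "python", script,
        "--iteration", str(iterations),
        f"--input={input_uri}",
        f"--output={output_uri}",  # assumption: --output flag
    ]
    return {
        "Environment": environment,
        "ExperimentConfig": {
            "type": "adhoc",
            "localize_artifacts": list(artifacts),
            "name": name,
            "parameter": shlex.join(command),
        },
    }


spec = build_experiment_spec(
    name="abc",
    environment="team_ds_env",
    artifacts=["s3://bucket/training-job.tar.gz"],
    script="training.py",
    iterations=10,
    input_uri="s3://bucket/input",
    output_uri="s3://bucket/output",
)
print(spec["ExperimentConfig"]["parameter"])
```

Building the command from a list avoids the nested-quote problems that appear when the whole command line is typed as one string.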

17.Experiment (Training) Support - Predefined Experiment Template
    {
      "input": {
        "train_data": ["s3://data/tr.libsvm"],
        "valid_data": ["s3://data/va.libsvm"],
        "test_data": ["s3://data/te.libsvm"],
        "type": "libsvm"
      },
      "output": {
        "save_model_dir": "hdfs:///user/submarine/deepfm",
        "metric": "auc"
      },
      "training": {
        "batch_size" : 512,
        "field_size": 39,
        "num_epochs": 3,
        "feature_size": 117581,
        ...
      }
    }
Under development, Target: 0.5.0 (Oct, 2020)
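
A predefined template like the one on this slide is ordinary JSON, so it can be checked before a job is submitted. The following is a rough sketch only: the functions `load_template` and `with_overrides` are hypothetical, the required section names are taken from the slide, and the fields elided on the slide are left out rather than guessed.

```python
import copy
import json

# Section names as they appear on the slide.
REQUIRED_SECTIONS = ("input", "output", "training")

EXAMPLE = """
{
  "input": {
    "train_data": ["s3://data/tr.libsvm"],
    "valid_data": ["s3://data/va.libsvm"],
    "test_data": ["s3://data/te.libsvm"],
    "type": "libsvm"
  },
  "output": {
    "save_model_dir": "hdfs:///user/submarine/deepfm",
    "metric": "auc"
  },
  "training": {
    "batch_size": 512,
    "field_size": 39,
    "num_epochs": 3,
    "feature_size": 117581
  }
}
"""


def load_template(text):
    """Parse a template and fail early if a top-level section is missing."""
    template = json.loads(text)
    missing = [s for s in REQUIRED_SECTIONS if s not in template]
    if missing:
        raise ValueError(f"template is missing sections: {missing}")
    return template


def with_overrides(template, section, **values):
    """Return a copy of the template with some values in one section replaced."""
    updated = copy.deepcopy(template)
    updated[section].update(values)
    return updated


template = load_template(EXAMPLE)
longer_run = with_overrides(template, "training", num_epochs=10)
print(longer_run["training"]["num_epochs"], template["training"]["num_epochs"])
```

Copying before overriding keeps the shared template unchanged, so several experiments can start from the same defaults.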

18.Experiment (Training) Support - UI Screenshots Under development, Target: 0.5.0 (Oct, 2020)

19.Experiment (Training) Support - Git repo integration
    import submarine
    experiment = create_experiment {
        Environment = "team_ds_env",
        code:
            sync_mode: git
            url: "https://foo.com/training-job.git"
        # This Git repo will be cloned to the /code directory
        # before the job gets started.
        ExperimentConfig = {
            <...>
            parameter = "python /code/training-job/training.py --iteration 10 --input=s3://bucket/input --output=s3://bucket/output",
        }
    }
    experiment.run()
Feature Merged: Target: 0.5.0 (Oct, 2020)
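
The slide states that the repository is cloned under /code, and its command then refers to /code/training-job/training.py. A small sketch of that path mapping follows. It assumes the checkout directory is named after the repository, as a default `git clone` would do; the function `checkout_dir` is hypothetical and not a Submarine API.

```python
import posixpath
from urllib.parse import urlparse


def checkout_dir(repo_url, root="/code"):
    """Guess where a repo lands when cloned under `root` (assumed layout)."""
    # Take the last path component of the URL, e.g. "training-job.git".
    repo_name = posixpath.basename(urlparse(repo_url).path.rstrip("/"))
    # Drop the conventional ".git" suffix, as a default `git clone` does.
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    return posixpath.join(root, repo_name)


repo_dir = checkout_dir("https://foo.com/training-job.git")
script_path = posixpath.join(repo_dir, "training.py")
print(script_path)
```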

20.Environment profile (Docker/Conda) Support
    name: "my_submarine_env",
    docker-image: "...",
    kernel:
        name: team_default_python_3.7
        channels:
            - defaults
        dependencies:
            - _ipyw_jlab_nb_ext_conf=0.1.0=py37_0
            - alabaster=0.7.12=py37_0
            - anaconda=2020.02=py37_0

    import submarine
    experiment = create_experiment {
        Environment = "my_submarine_env",
        ExperimentConfig = {
            parameter = "python /code/training.py --iteration 10 --input=s3://bucket/input --output=s3://bucket/output",
        }...
Feature Merged: Target: 0.5.0 (Oct, 2020)
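
The dependencies in this profile use Conda's `name=version=build` pin form. As an illustration, here is a minimal parser for that one form. The names `CondaPin` and `parse_pin` are hypothetical, and the sketch deliberately rejects other Conda specifiers such as `==` or version ranges instead of trying to interpret them.

```python
from typing import NamedTuple, Optional


class CondaPin(NamedTuple):
    name: str
    version: Optional[str]
    build: Optional[str]


def parse_pin(spec):
    """Split a 'name[=version[=build]]' pin into its parts."""
    parts = spec.strip().split("=")
    # Reject empty fields (e.g. from "==") and anything with extra "=" signs.
    if len(parts) > 3 or not all(parts):
        raise ValueError(f"unsupported pin format: {spec!r}")
    # Pad missing version/build with None so the tuple always has 3 fields.
    parts += [None] * (3 - len(parts))
    return CondaPin(*parts)


pins = [
    parse_pin(s)
    for s in (
        "_ipyw_jlab_nb_ext_conf=0.1.0=py37_0",
        "alabaster=0.7.12=py37_0",
        "anaconda=2020.02=py37_0",
    )
]
for pin in pins:
    print(pin.name, pin.version, pin.build)
```

Pinning the build string as well as the version is what makes a profile like this reproducible across team members.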

21.Environment profile (Docker/Conda) Support - UI Screenshots

22.Relationship with other Open-Source projects?
    Module                            Open Source Project
    Environment Profile Management    Docker, Conda
    Notebook                          Jupyter
    Experiment on YARN                TonY (from LinkedIn)
    Experiment on K8s                 TensorFlow (Kubeflow TFJob), PyTorch (Kubeflow PyTorchJob)

23.Submarine Community

24.Community Developers / Release Plan
    Release                          Highlights
    0.1.0 (Jan, 2019)                Initial release, with training on YARN support.
    0.2.0 (Jul, 2019)                PyTorch (on YARN) support, integrated with TonY.
    0.3.0 (Feb, 2020)                Mini Submarine support, basic K8s support.
    0.4.0 (Jul, 2020)                Deploy Submarine Server on K8s. Run TensorFlow / PyTorch on K8s.
    ONGOING: 0.5.0 (Sep-Oct, 2020)   Notebook support, Environment profile, UIs (for Notebook, Experiment, Environment, etc.)
    0.6.0                            Security, development-related enhancements.

25.Thanks to our contributors!

26.Community Use Cases

27.Thank you!