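In the training phase, users face an array of challenges, including handling various deep learning frameworks, hardware requirements and configurations, not to mention code quality, consistency, and packaging. In the deployment phase, they face another set of challenges, ranging from custom requirements for data pre- and post-processing, to inconsistencies across frameworks, to a lack of standardization in serving APIs. The goal of the IBM Code Model Asset eXchange (MAX) is to remove these barriers to entry so that developers can obtain, train, and deploy open-source deep learning models for their enterprise applications. In building the exchange, we encountered all of these challenges and more.

For the training phase, we aim to leverage Fabric for Deep Learning (FfDL: https://github.com/IBM/FfDL), an open-source project that provides framework-independent training of deep learning models on Kubernetes. For the deployment phase, MAX provides container-based, fully self-contained model artifacts that encompass the end-to-end deep learning prediction pipeline and expose a standardized REST API.

This talk explores the process of building MAX, the challenges and problems encountered, the solutions developed, the lessons learned along the way, and future directions and best practices for cross-framework, standardized deep learning model training and deployment.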

1. IBM Developer Model Asset eXchange
 Nick Pentreath, Principal Engineer
 @MLnick
 #SAISDL6
 DBG / Oct 4, 2018 / © 2018 IBM Corporation

2. About
 • @MLnick on Twitter & GitHub
 • Principal Engineer, IBM CODAIT - Center for Open-Source Data & AI Technologies
 • Machine Learning & AI
 • Apache Spark committer & PMC
 • Author of Machine Learning with Spark
 • Various conferences & meetups

3. Center for Open Source Data and AI Technologies (CODAIT)
 codait.org
 • CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise
 • Improving Enterprise AI Lifecycle in Open Source
 • Relaunch of the Spark Technology Center (STC) to reflect expanded mission

4. Applying Deep Learning: Perception
 Training – Data Scientist: Data, Train model, ???, ???, $$$
 Consumption – App Developer, Domain Expert: Get model, Deploy model, ???, ???, $$$

5. Applying Deep Learning: Reality
 Find model, Get code, Test / verify / fix, Train / Deploy, Use model, $$$ maybe?

6. Step 1: Find a model
 … that does what you need
 … that is free to use
 … that is performant enough

7. Step 2: Get the code
 Is there a good implementation available?
 … that does what you need
 … that is free to use
 … that is performant enough
 (Image: TensorFlow code to build ResNet50 neural network graph)

8. Or… Step 2: Get the pre-trained weights
 Is there a good pre-trained model available?
 … that does what you need
 … that is free to use
 … that is performant enough
 (Image: Caffe2 ResNet50 model weights)

9. Step 3: Verify the model you found
 Check …
 … that it does what you need
 … that it is free to use
 … that it is performant enough

10. Step 4(a): Train the model

11. Step 4(a): Train the model
 * Logos trademarks of their respective projects

12. Step 4(b): Figure out how to deploy the model
 … adjust inference code (or write from scratch)
 … package your inference code, model code, and pre-trained weights together
 … deploy your package

13. Step 5: Consume the model
 … plug in to your application
 … which does not know (or care) about tensors

14. Step 6: Profit
 … hopefully

15. Applying Deep Learning: Reality
 Phases: Discovery, Execution, Consumability
 Find model, Get code, Test / verify / fix, Train / Deploy, Use model, $$$ maybe?

16. Model Zoos (in theory)

17. Model Zoos (in practice)

18. IBM Developer http://ibm.biz/model-exchange

19. Fabric for Deep Learning
 https://github.com/IBM/FfDL
 FfDL provides a scalable, resilient, and fault tolerant deep-learning framework
 • Fabric for Deep Learning or FfDL (pronounced as ‘fiddle’) is an open source project which aims at making Deep Learning easily accessible to the people it matters the most i.e. Data Scientists, and AI developers.
 • FfDL provides a consistent way to deploy, train and visualize Deep Learning jobs across multiple frameworks like TensorFlow, Caffe, PyTorch, Keras etc.
 • FfDL is being developed in close collaboration with IBM Research and IBM Watson. It forms the core of Watson's Deep Learning service in open source.
 Links:
 • FfDL Github Page: https://github.com/IBM/FfDL
 • FfDL dwOpen Page: https://developer.ibm.com/code/open/projects/fabric-for-deep-learning-ffdl/
 • FfDL Announcement Blog: http://developer.ibm.com/code/2018/03/20/fabric-for-deep-learning
 • FfDL Technical Architecture Blog: http://developer.ibm.com/code/2018/03/20/democratize-ai-with-fabric-for-deep-learning
 • Deep Learning as a Service within Watson Studio: https://www.ibm.com/cloud/deep-learning
 • Research paper: “Scalable Multi-Framework Management of Deep Learning Training Jobs” http://learningsys.org/nips17/assets/papers/paper_29.pdf

20. Fabric for Deep Learning
 https://github.com/IBM/FfDL
 FfDL is built using a microservices architecture on Kubernetes
 • FfDL platform uses a microservices architecture to offer resilience, scalability, multi-tenancy, and security without modifying the deep learning frameworks, and with no or minimal changes to model code.
 • FfDL control plane microservices are deployed as pods on Kubernetes to manage this cluster of GPU- and CPU-enabled machines effectively
 • Tested Platforms: Minikube, IBM Cloud Public, IBM Cloud Private, GPUs using both Kubernetes feature gate Accelerators and NVidia device plugins
 July 27 2018 / © 2018 IBM Corporation

21. Fabric for Deep Learning
 https://github.com/IBM/FfDL
 • Just announced: Support for PyTorch 1.0 – including distributed training and ONNX!
 • Supports distributed training via Horovod
 Links:
 • FfDL Github Page: https://github.com/IBM/FfDL
 • FfDL / PyTorch 1.0 Blog Post: https://developer.ibm.com/blogs/2018/10/01/announcing-pytorch-1-support-in-fabric-for-deep-learning/
 • FfDL / Horovod Blog Post: https://developer.ibm.com/code/2018/07/18/scalable-distributed-training-using-horovod-in-ffdl/

22. Trainable Models
 Training Data, Training Code, Training Definition, Standardized Script

23. Deployable Models
 Requirements: Data, Model, Expertise, Compute resources
 Deep-Learning asset on Model Asset Exchange: Input/output processing, Pre-trained model, REST API
 ibm.biz/model-exchange
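
The three pieces named on this slide (input/output processing, a pre-trained model, and an API-facing entry point) can be pictured as one function chain. The following is only a minimal sketch under stated assumptions: `LABELS`, `toy_model`, and the scoring rule are hypothetical stand-ins, not code from any MAX model, and a real asset would load actual pre-trained weights.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label set; a real asset would ship its own labels.
LABELS = ["cat", "dog", "bird"]


def preprocess(pixels: Sequence[int]) -> List[float]:
    """Input processing: scale raw 0-255 values to the 0-1 range."""
    return [p / 255.0 for p in pixels]


def toy_model(features: Sequence[float]) -> List[float]:
    """Stand-in for a pre-trained network: one raw score per label."""
    total = sum(features)
    return [total, total / 2.0, 0.0]


def postprocess(scores: Sequence[float]) -> List[Dict[str, object]]:
    """Output processing: softmax the scores and attach readable labels."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    ranked = sorted(zip(LABELS, exps), key=lambda pair: pair[1], reverse=True)
    return [{"label": name, "probability": e / norm} for name, e in ranked]


def predict(pixels: Sequence[int]) -> List[Dict[str, object]]:
    """End-to-end pipeline: the caller passes plain data and gets plain data."""
    return postprocess(toy_model(preprocess(pixels)))
```

Calling `predict([255, 255])` returns a list of label/probability dictionaries sorted by probability, so the calling application never handles tensors directly.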

24. Deployable Models
 Deep-Learning asset on Model Asset Exchange, Deploy, Microservice
 Microservice exposes: Swagger specification, Inference endpoint, Metadata endpoint
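
A microservice with a metadata endpoint and an inference endpoint could look roughly like the sketch below. It uses only the Python standard library (a plain WSGI callable) rather than the Flask RESTPlus framework mentioned in the talk, and the paths `/model/metadata` and `/model/predict`, the metadata fields, and `run_inference` are assumptions made for illustration. The Swagger specification is omitted.

```python
import json
from typing import Callable, Dict, Iterable

# Hypothetical metadata; the field names are illustrative only.
METADATA = {"id": "toy-classifier", "name": "Toy Classifier", "license": "Apache-2.0"}


def run_inference(payload: Dict[str, object]) -> Dict[str, object]:
    """Placeholder for the wrapped model: reports the input text length."""
    text = str(payload.get("text", ""))
    return {"status": "ok", "predictions": [{"length": len(text)}]}


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app with a metadata endpoint and an inference endpoint."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")
    if method == "GET" and path == "/model/metadata":
        status, body = "200 OK", METADATA
    elif method == "POST" and path == "/model/predict":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        raw = environ["wsgi.input"].read(size) if size else b"{}"
        try:
            status, body = "200 OK", run_inference(json.loads(raw))
        except (ValueError, AttributeError):
            # Malformed JSON, or JSON that is not an object.
            status, body = "400 Bad Request", {"status": "error"}
    else:
        status, body = "404 Not Found", {"status": "error"}
    data = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

For a local try-out, the callable can be served with `wsgiref.simple_server.make_server("", 5000, app).serve_forever()`; the port number is arbitrary.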

25. Deployable Models: Highlights
 • Image, audio, text, healthcare, time-series and more
 • Pre- / post-processing & inference wrapped up in Docker container
 • Generic API framework code - Flask RESTPlus
 • Swagger specification for API
 • One-line deployment locally and on a Kubernetes cluster
 • Code Patterns demonstrating how to easily consume MAX models
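
From the application side, consuming such a containerized model comes down to one JSON request and one JSON response. The sketch below is an assumption-laden illustration using only the standard library: the `/model/predict` path, the `predictions` / `label` / `probability` response fields, and the helper names are hypothetical and should be checked against the Swagger specification of the model actually deployed.

```python
import json
import urllib.request
from typing import Dict, List


def build_request(base_url: str, payload: Dict[str, object]) -> urllib.request.Request:
    """Build a JSON POST request for an assumed /model/predict endpoint."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/model/predict",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def top_labels(response_body: bytes, limit: int = 3) -> List[str]:
    """Extract the highest-probability labels from a JSON response body."""
    doc = json.loads(response_body)
    preds = sorted(doc.get("predictions", []),
                   key=lambda p: p.get("probability", 0.0), reverse=True)
    return [p["label"] for p in preds[:limit]]


def classify(base_url: str, payload: Dict[str, object]) -> List[str]:
    """Send the request and return plain labels (needs a running service)."""
    request = build_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return top_labels(resp.read())
```

With a service listening locally, `classify("http://localhost:5000", {"text": "hello"})` would return a list of label strings; the host and port here are placeholders.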

26. Summary and Possible Future Directions
 Current status
 • 22 models (4 trainable)
 • Image, audio, text, healthcare, time-series and more
 • 3 Code Patterns demonstrating how to consume MAX models in a web app
 • Code Pattern on training an audio classifier using Watson Machine Learning
 • One-line deployment via Docker and on a Kubernetes cluster
 Potential Future
 • More deployable models – breadth and depth
 • More trainable models - transfer learning in particular
 • New MAX web portal launching soon
 • More MAX-related content: Code Patterns; Conference talks, meetups; Workshops
 • Enhance production-readiness of MAX models
 • Improve MAX API framework

27. IBM Developer Model Asset eXchange
 • Free, open-source deep learning models.
 • Wide variety of domains.
 • Multiple deep learning frameworks.
 • Vetted and tested code and IP.
 http://ibm.biz/model-exchange

28. Thank you!
 codait.org
 twitter.com/MLnick
 github.com/MLnick
 developer.ibm.com
 MAX
 FfDL
 Sign up for IBM Cloud and try Watson Studio!
 https://ibm.biz/BdYbTY
 https://datascience.ibm.com/
