A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learning Pipelines

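This talk covers three of the most popular deep learning frameworks: TensorFlow, Keras, and Deep Learning Pipelines, and when, where, and how to use them. We will also discuss their integration with distributed computing engines such as Apache Spark, which can process large volumes of data.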

1.A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learning Pipelines • Brooke Wenig, Jules S. Damji • Spark + AI Summit, SF, 6/5/2018

2. About Us
Jules S. Damji • Apache Spark Developer & Community Advocate @Databricks • Program Chair Spark + AI Summit • Software engineering @ Sun Microsystems, Netscape, @Home, VeriSign, Scalix, Centrify, LoudCloud/Opsware, ProQuest • https://www.linkedin.com/in/dmatrix • @2twitme
Brooke Wenig • Databricks Machine Learning Instructor • Data Science Solution Consultant @ Databricks • Software Engineering @ Splunk & MyFitnessPal • MS Machine Learning (UCLA) • Fluent in Chinese • https://www.linkedin.com/in/brookewenig/

3.Agenda for Today’s Talk • Impact of Big Data • Why Apache Spark? • Short Survey of 3 DL Frameworks • TensorFlow • Keras • Deep Learning Pipelines • Demo • Q&A

4.What has Big Data Done to Us? • Permeated our lives (Source: MIT)

5.Hardest Part of AI isn’t AI, it’s Data • Source: “Hidden Technical Debt in Machine Learning Systems,” Google, NIPS 2015 • Figure 1: Only a small fraction of real-world ML systems is composed of the ML code. The required surrounding infrastructure is vast and complex. • Figure boxes: Configuration, Data Collection, Feature Extraction, Data Verification, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, Monitoring, ML Code

6.What’s Apache Spark & Why

7.Apache Spark: The First Unified Analytics Engine • Uniquely combines Data & AI technologies • Big Data Processing: ETL + SQL + Streaming • Machine Learning: MLlib + SparkR • Spark Core Engine • Runtime Delta

8.Survey of Three Deep Learning Frameworks

9.What’s TensorFlow? • Open source from Google, 2015 • Current v1.8 API • Fast: C/C++ backend • Data flow graphs • Nodes are functions/operators • Edges are inputs or data (tensors) • Lazy execution by default • Eager execution (since v1.7)
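
The graph idea on this slide can be sketched in plain Python. This is a toy illustration only: the `Node` class and the `constant`, `add`, and `multiply` helpers are hypothetical names made up for this sketch and are not TensorFlow APIs. It shows nodes as operators, edges as the values flowing between them, and the difference between building a graph first (lazy) and computing right away (eager).

```python
# Toy illustration only: these names are hypothetical, NOT TensorFlow APIs.

class Node:
    """A graph node: an operator plus the input edges that feed it."""

    def __init__(self, op, *inputs):
        self.op = op          # function applied at this node
        self.inputs = inputs  # upstream nodes (the incoming edges)

    def run(self):
        # Evaluate upstream nodes first, then apply this node's operator.
        args = [node.run() for node in self.inputs]
        return self.op(*args)


def constant(value):
    return Node(lambda: value)


def add(a, b):
    return Node(lambda x, y: x + y, a, b)


def multiply(a, b):
    return Node(lambda x, y: x * y, a, b)


# Lazy execution: building the graph only describes the computation.
a = constant(3.0)
b = constant(4.0)
graph = multiply(add(a, b), b)  # describes (3 + 4) * 4, computes nothing yet

# The work happens only when the graph is explicitly run.
lazy_result = graph.run()

# Eager execution: each operation is evaluated immediately, like ordinary Python.
eager_result = (3.0 + 4.0) * 4.0

print(lazy_result, eager_result)
```

Both styles give the same number here. The lazy form keeps the whole computation available as a data structure before anything runs, which is what lets a framework optimize or distribute it.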

10.TensorFlow Programming Stack (diagram) • Use canned estimators • Build models • Keras Models • Targets: CPU, GPU, TPU, Android, iOS, …

11.Why TensorFlow: Community • 100K+ stars! • 11M downloads • Popular open-source code • TensorFlow Hub & Blog ○ Code Examples & Tutorials! ○ Learn + share from others

12.Why TensorFlow: Tools • TensorBoard • Deploy + Serve Models • Visualize how tensors flow

13.TensorFlow: We Get it … So What? • Steep learning curve, but powerful!! • Low-level APIs, but offers control!! • Expert in Machine Learning, just learn!! • Yet, high-level Estimators help, you bet!! • Better, Keras integration helps, indeed!!

14.What’s Keras? • Open source Python Library APIs for Deep Learning • Current v2.1.6 APIs • Created by François Chollet (Google) • API spec with TensorFlow, CNTK and Theano backends • Easy to Use High-Level Declarative APIs! • Build layers – Great for Neural Network Applications • Fast Experimentation, Modular & Extensible!

15.Keras Programming Stack (diagram) • Keras API Specification • Specific Impl: TF-Keras, Theano-Keras, CNTK, ..... • TensorFlow Workflow: use canned estimators, models • Targets: CPU, GPU, TPU, Android, iOS, …

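16.Why Keras? • Focuses on Developer Experience • Popular & Broader Community • Supports multiple backends • Modularity • Sequential Layers • Multi-layer input networks
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()                        # layers are stacked in order
model.add(Dense(32, input_dim=784))         # first layer declares the input size
model.add(Activation('relu'))
model.add(Dense(32, activation='softmax'))  # activation can also be passed inline
...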

17. Transfer Learning & Deep Learning Pipelines

18.What’s Transfer Learning? • Training from scratch requires • Enormous amounts of data • A lot of compute resources & time • IDEA: Intermediate representations learned for one task may be useful for other related tasks

19.Trained Model • SoftMax output: GIANT PANDA 0.9, RACCOON 0.05, RED PANDA 0.01, …
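
As a rough sketch of what the SoftMax stage of a trained model does, the snippet below turns raw class scores into probabilities that sum to 1. The label names come from the slide, but the logit values are invented for illustration and do not reproduce the slide's numbers.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["GIANT PANDA", "RACCOON", "RED PANDA"]
logits = [5.0, 2.0, 0.5]  # hypothetical scores from the network's last layer

probs = softmax(logits)

# Rank classes from most to least likely.
ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
for label, p in ranked:
    print(f"{label}: {p:.3f}")
```

The class with the largest raw score always receives the largest probability, so the top-ranked label is the model's prediction.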

20.Transfer Learning as a Pipeline • Classifier: Dog/Cat?
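
The pipeline view can be sketched in plain Python as two stages: a frozen featurizer followed by a small trainable classifier. Everything here is an assumption made for illustration. `frozen_featurizer` is a hand-written stand-in for a pretrained network with its final layer removed, the four-value "images" and their labels are made up, and the classifier is a basic logistic regression rather than anything from Deep Learning Pipelines.

```python
import math


def frozen_featurizer(pixels):
    """Stand-in for a pretrained network without its final layer.

    Its behavior is fixed; nothing in here is updated during training.
    """
    half = len(pixels) // 2
    return [sum(pixels[:half]) / half,
            sum(pixels[half:]) / (len(pixels) - half)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(features, labels, epochs=500, lr=0.5):
    """Train only the new classifier (logistic regression) on the features."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(pixels, w, b):
    # Pipeline: frozen featurizer first, then the trained classifier.
    x = frozen_featurizer(pixels)
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(score) >= 0.5 else 0


# Tiny made-up dataset: label 1 = "dog", label 0 = "cat".
images = [
    [0.9, 0.8, 0.1, 0.2],
    [0.8, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.9, 0.8],
    [0.2, 0.1, 0.8, 0.9],
]
labels = [1, 1, 0, 0]

features = [frozen_featurizer(img) for img in images]
w, b = train_classifier(features, labels)
predictions = [predict(img, w, b) for img in images]
print(predictions)
```

Only the two weights and the bias of the classifier are learned, which is why this style of training needs far less data and compute than training the whole network from scratch.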

21. When to use Transfer Learning? • Dataset is small & similar • Dataset is large & similar • Dataset is small but different • Dataset is large and different Source: Andrej Karpathy’s Transfer Learning

22.What & Why Deep Learning Pipelines (DLP)? • Open source from Databricks, 2017 • Current v1.0 APIs w/ Apache Spark 2.3 • Primarily in Python • Ease of Use & Integration • Spark MLlib Pipelines & DataFrames • TensorFlow & Keras • SQL – Deploying & Evaluating • Distributed Hyperparameter Tuning • Easy for Transfer Learning

23. DEMO https://dbricks.co/dlf_sai_2018

24.Takeaways: Which One & What Language?

25.Takeaways: When to Use TF, Keras or DLP
TensorFlow: • Low-level APIs & Control • Visualize with TensorBoard • Train Models or Transfer Learning • Model Serving
Keras: • High-level APIs • TensorFlow Backend • Love Python • Train models or transfer learning
Deep Learning Pipelines: • Integration with Spark MLlib Pipelines & DataFrames • Integrated with TF & Keras • Transfer Learning

26.Resources
Blog posts, talks, & webinars (http://databricks.com/blog) • Deep Learning Pipelines • GPU acceleration in Databricks • Deep Learning and Apache Spark • Build Scalable Deep Learning Pipelines • Deep Learning course: fast.ai • TensorFlow Tutorials • TensorFlow Dev Summit • Keras/TensorFlow Tutorials • MLFlow.org
Docs for Deep Learning on Databricks (http://docs.databricks.com) • Deep Learning Pipelines Example • Apache Spark integration

27. Thank You! Questions? brooke@databricks.com jules@databricks.com (@2twitme)