Simplify Distributed TensorFlow Training for Fast Image Categorization at Starbucks

“In addition to the many data engineering initiatives at Starbucks, we are also working on many interesting data science initiatives. The business scenarios involved in our deep learning initiatives include (but are not limited to) planogram analysis (the layout of our stores for efficient partner and customer flow) and predicting product pairings from product components using graph convolutional networks (e.g., if you purchase a caramel macchiato, perhaps you would like a caramel brownie). For this session, we will focus on how to run distributed Keras (TensorFlow backend) training to perform image analytics. This will be combined with MLflow to showcase the data science lifecycle and how Databricks + MLflow simplifies it.”

1. STARBUCKS TECHNOLOGY Simplifying Deep Learning with HorovodRunner at Starbucks

2. About the presenters
Denny Lee is a Technology Evangelist with Databricks; he is a hands-on data science engineer with more than 15 years of experience developing internet-scale infrastructure, data platforms, and distributed systems for both on-premises and cloud. His key focus is solving complex large-scale data problems: providing not only architectural direction but also the hands-on implementation of these systems.
Vishwanath Subramanian is a Director of Data and Analytics Engineering at Starbucks. Vishwanath has over 15 years of experience with a background in distributed systems, product management, software engineering, and analytics. At Starbucks, his key focus is on providing next-generation analytics platforms and enabling large-scale data processing and machine learning for Business Intelligence and Data Services across Starbucks.

3. Scenarios
• Smarter checkout experiences
• Predicting customer traffic
• Planogram analysis
• On-demand, one-click provisioning of a seamlessly integrated infrastructure bill of materials for data science and intelligent apps
• Secured connectivity to the enterprise data platform, completely abstracted from analytics teams
• A solution template that organizes deployments to enable ad hoc experiments, shared data engineering, and intelligent app development
• And more…

4. Current State
• Solving complex or streaming image and video analytics is hard
• It also typically involves distributing the problem across multiple nodes
• But how do I run Keras + TensorFlow in a distributed environment?

5. Convolutional Neural Networks

6. Convolutional Neural Networks
[Figure: CNN for 28 x 28 input images with output classes 0 to 9. Feature extraction: convolution (32 filters), convolution (64 filters), subsampling with stride (2,2) down to 14 x 14. Classification: dropout, then a fully connected layer.]
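The spatial sizes on this slide (28 x 28 down to 14 x 14) follow from the standard output-size formula for convolution and pooling layers, out = floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, s the stride, and p the padding. A minimal sketch, assuming 3 x 3 convolution kernels with padding 1 and a 2 x 2 subsampling window with stride (2,2); the slide does not state kernel sizes or padding, so these values are assumptions:

```python
def output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer applied to an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1

# Assumed layer stack: (description, kernel, stride, padding)
layers = [
    ("conv, 32 filters", 3, 1, 1),
    ("conv, 64 filters", 3, 1, 1),
    ("subsampling, stride (2,2)", 2, 2, 0),
]

size = 28
shapes = [size]
for name, kernel, stride, padding in layers:
    size = output_size(size, kernel, stride, padding)
    shapes.append(size)
    print(f"{name}: {size} x {size}")
```

With no padding (the default `'valid'` padding of Keras `Conv2D`), the same two convolutions would shrink the input to 26 and then 24, and the subsampling step would give 12 rather than 14.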

7. DEMO Running Keras CNNs Standalone Keras, TensorFlow, HorovodRunner, and MLflow: https://dbricks.co/2D58PDw

8. Introducing HorovodRunner
• HorovodRunner is a general API for running distributed deep learning workloads on Databricks using Uber's Horovod framework
• Combining Horovod with Apache Spark's barrier mode provides greater stability for long-running deep learning training jobs
• A Horovod MPI job is embedded as a Spark job using barrier execution mode
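Barrier execution mode gang-schedules every task in a stage: all tasks start together, no task proceeds past a barrier until every task has reached it, and Spark retries the whole stage if any task fails. That all-or-nothing start is what an MPI job needs. As a local analogy only (this is not Spark's implementation), Python's `threading.Barrier` shows the same rendezvous; the task addresses below are made up:

```python
import threading

NUM_TASKS = 4
registered = []                      # addresses announced by each task
seen_by_task = {}                    # what each task observes after the barrier
lock = threading.Lock()
barrier = threading.Barrier(NUM_TASKS)

def task(rank):
    # Each task announces its (hypothetical) address before the barrier
    with lock:
        registered.append(f"10.0.0.{rank + 1}")
    # No task continues until all NUM_TASKS tasks have arrived here
    barrier.wait()
    # After the barrier, every task sees the complete set of addresses
    with lock:
        seen_by_task[rank] = sorted(registered)

threads = [threading.Thread(target=task, args=(r,)) for r in range(NUM_TASKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen_by_task[0])
```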

9. HorovodRunner
• HorovodRunner takes a Python function that contains deep learning training code with Horovod hooks and pickles it on the driver
• The first executor collects the IP addresses of all the task executors using BarrierTaskContext
• It then triggers a Horovod job using mpirun
• Each Python MPI process loads the pickled program, deserializes it, and runs it
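The launch flow above can be imitated in plain Python. A minimal sketch, assuming a hypothetical `train` function, made-up worker IPs, and a hypothetical worker script name: the driver pickles the function with its arguments, the collected addresses become an `mpirun` command line in the style of Open MPI's `-np` and `-H host:slots` options, and each process unpickles and calls the function. On Spark, the real addresses would come from `BarrierTaskContext.get().getTaskInfos()`; the exact command HorovodRunner builds is not reproduced here. Stdlib `pickle` serializes functions by reference to their module-level name, which is enough only because this sketch runs in a single process.

```python
import pickle

def train(learning_rate, epochs):
    # Hypothetical stand-in for a Horovod-enabled training function
    return f"trained for {epochs} epochs at lr={learning_rate}"

def build_mpirun_command(worker_ips, slots_per_host, script):
    # One "ip:slots" entry per host; total processes = hosts x slots
    hosts = ",".join(f"{ip}:{slots_per_host}" for ip in worker_ips)
    num_procs = len(worker_ips) * slots_per_host
    return ["mpirun", "-np", str(num_procs), "-H", hosts, "python", script]

# Driver: serialize the training function together with its arguments
payload = pickle.dumps((train, {"learning_rate": 1.0, "epochs": 12}))

# First executor: turn the collected (hypothetical) addresses into a launch command
worker_ips = ["10.0.0.1", "10.0.0.2"]
command = build_mpirun_command(worker_ips, 2, "run_pickled.py")
print(" ".join(command))

# Each MPI process: load the pickled program back and run it
fn, kwargs = pickle.loads(payload)
result = fn(**kwargs)
print(result)
```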

10. HorovodRunner
[Diagram: driver and workers]

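11. HorovodRunner
[Diagram: driver and workers]
    import keras
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Dense

    def runCNN():
        model = Sequential()
        model.add(Conv2D(32, …))
        model.add(Conv2D(64, …))
        model.add(MaxPooling2D(…))
        model.add(Dense(128, …))
        model.add(Dense(10, 'softmax'))
        optimizer = keras.optimizers.Adadelta(1.0)
In standalone or HorovodRunner local mode, the code runs on the driver.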

12. HorovodRunner
[Diagram: variables pushed from the driver to the workers]
    import horovod.keras as hvd
    import tensorflow as tf

    def runCNN_hvd():
        hvd.init()
        config = tf.ConfigProto()
        # Original code
        runCNN()
        callbacks = []
With HorovodRunner, we wrap the original code, and the code and variables are pushed to the workers.

13. HorovodRunner
[Diagram: driver and workers]
With HorovodRunner, we wrap the original code, and the code and variables are pushed to the workers.

16. HorovodRunner
[Diagram: driver and workers]
Variables are transferred from the driver to the workers, and the code is executed on the workers.

17. Migrate to HorovodRunner
    # Primary code differences are noted below
    + hvd.init()
    + config = tf.ConfigProto()
    + config.gpu_options.allow_growth = True
    + config.gpu_options.visible_device_list = str(hvd.local_rank())
    + epochs = int(math.ceil(12.0 / hvd.size()))
    + callbacks = [
    +     hvd.callbacks.BroadcastGlobalVariablesCallback(0),
    + ]
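The arithmetic behind these changes can be checked without a cluster. A minimal sketch that substitutes made-up `size` and `local_rank` values for `hvd.size()` and `hvd.local_rank()`: the original 12 epochs are divided across workers and rounded up, each process on a host is pinned to the GPU matching its local rank, and rank 0's initial weights are copied to every worker, which is the effect `BroadcastGlobalVariablesCallback(0)` is documented to have (consistent initialization across workers). The weight values are hypothetical:

```python
import math

def epochs_per_worker(total_epochs, size):
    # Mirrors: epochs = int(math.ceil(12.0 / hvd.size()))
    return int(math.ceil(total_epochs / size))

def visible_device(local_rank):
    # Mirrors: config.gpu_options.visible_device_list = str(hvd.local_rank())
    return str(local_rank)

def broadcast_from_root(worker_weights, root=0):
    # Every worker ends up with a copy of the root worker's initial weights
    return [list(worker_weights[root]) for _ in worker_weights]

for size in (1, 2, 4, 8):
    print(f"{size} worker(s): {epochs_per_worker(12, size)} epoch(s) each")

# Two hosts with two GPUs each: local ranks restart at 0 on every host
local_ranks = [0, 1, 0, 1]
print([visible_device(r) for r in local_ranks])

# Hypothetical, randomly initialized weights that differ per worker
initial = [[0.1, 0.2], [0.5, 0.4], [0.9, 0.3], [0.7, 0.6]]
synced = broadcast_from_root(initial)
print(synced)
```

Rounding up with `math.ceil` guarantees every worker still trains for at least one epoch, even with more than 12 workers.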

18. Comparing the runs using MLflow

19. DEMO Object Detection Keras, TensorFlow, HorovodRunner, and MLflow

20. Object Detection Approaches
R-CNN (2014)
• Region proposal algorithms produce a set of regions in the image that are likely to contain objects.
• The image region inside each proposed bounding box is passed through a pre-trained AlexNet to compute the features for that box.
• A support vector machine classifies the object in each region.
• A linear regression model refines each box into tighter coordinates.
• R-CNN -> Fast R-CNN -> Faster R-CNN
Rich feature hierarchies for accurate object detection and semantic segmentation - Girshick, Donahue, Darrell, Malik
Fast R-CNN - Girshick
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks - Ren, He, Girshick, Sun
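The final regression step in the R-CNN paper predicts four offsets (d_x, d_y, d_w, d_h) relative to a proposal box P with center (P_x, P_y), width P_w, and height P_h; the refined box is G_x = P_w d_x + P_x, G_y = P_h d_y + P_y, G_w = P_w exp(d_w), G_h = P_h exp(d_h). A minimal sketch of that transform and of the regression targets it is trained on, using made-up box values:

```python
import math

def apply_deltas(proposal, deltas):
    """Refine a (cx, cy, w, h) proposal with predicted offsets (dx, dy, dw, dh)."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (pw * dx + px, ph * dy + py, pw * math.exp(dw), ph * math.exp(dh))

def regression_targets(proposal, ground_truth):
    """Offsets a box regressor would be trained to predict for this pair."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

proposal = (50.0, 40.0, 20.0, 10.0)       # hypothetical region proposal
ground_truth = (52.0, 41.0, 25.0, 12.0)   # hypothetical labeled box

targets = regression_targets(proposal, ground_truth)
refined = apply_deltas(proposal, targets)
print(targets)
print(refined)
```

Predicting offsets relative to the proposal, with log-scaled width and height, keeps the regression targets small and independent of the box's absolute size.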

21. Object Detection Approaches (contd.)
YOLO: detection as a regression problem
• Not a traditional classifier-based detection pipeline
• The image is divided into a grid; each cell is responsible for predicting n bounding boxes
• Each predicted bounding box has a confidence score
• Each cell also gives a probability distribution over all the classes it was trained on
• The confidence score and class prediction are combined into a score for object classification
• A threshold on this score determines which boxes are relevant
• All boxes are predicted at once, in a single pass of the neural network over the image
You Only Look Once: Unified, Real-Time Object Detection - Redmon, Divvala, Girshick, Farhadi
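The scoring steps above can be sketched in a few lines. A minimal sketch, assuming an S x S grid with S = 7 (the value the YOLO paper uses for PASCAL VOC), hypothetical class names, made-up predictions, and an arbitrary threshold of 0.25: the cell containing an object's center is the one responsible for it, each box's confidence is multiplied by its cell's class probabilities, and a box is kept if its best class-specific score clears the threshold.

```python
S = 7  # grid size; the YOLO paper uses 7 x 7 for PASCAL VOC

def responsible_cell(cx, cy, grid=S):
    """Grid cell (row, col) containing a box center given in [0, 1] image coordinates."""
    col = min(int(cx * grid), grid - 1)
    row = min(int(cy * grid), grid - 1)
    return row, col

def class_scores(confidence, class_probs):
    """Class-specific score: box confidence x Pr(class | object) for each class."""
    return {name: confidence * p for name, p in class_probs.items()}

def keep_boxes(predictions, threshold):
    """Keep boxes whose best class-specific score meets the threshold."""
    kept = []
    for box_id, confidence, class_probs in predictions:
        scores = class_scores(confidence, class_probs)
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            kept.append((box_id, best, scores[best]))
    return kept

# Hypothetical predictions: (box id, box confidence, its cell's class probabilities)
predictions = [
    ("box_a", 0.9, {"cup": 0.8, "pastry": 0.2}),
    ("box_b", 0.3, {"cup": 0.5, "pastry": 0.5}),
    ("box_c", 0.6, {"cup": 0.1, "pastry": 0.9}),
]

print(responsible_cell(0.5, 0.5))
print(keep_boxes(predictions, threshold=0.25))
```

Here `box_b` is dropped even though its cell is unsure between classes, because its low box confidence pulls every class-specific score below the threshold.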

22. https://www.starbucks.com/careers/ TALENTED TECHNOLOGISTS DELIVERING TODAY & LEADING INTO THE FUTURE