SparkWeaver: Full-Stack Solution to Accelerate Real-Time DNN Applications on FPG

In recent years, Deep Neural Networks (DNNs) have advanced rapidly and matured enough to be adopted in real-world applications. In particular, as DNNs solve difficult problems in computer vision (e.g., image classification and object detection), the community has started exploring their use in real-time or near-real-time vision applications. Video content analysis (VCA) is one such application: it often uses DNNs as its core engine and serves a wide range of domains, including safety and security, flame and smoke detection, automotive, health care, home automation, and retail. Apache Spark Streaming has been the de facto standard platform on which real-time big data applications such as VCA run at hyper-scale. While the integration of DNNs into real-time vision applications promises ample opportunities for the Spark Streaming community, the massive compute demand needed to accommodate (1) the ever-increasing DNN model size and (2) the growing scale of data (e.g., billions of high-resolution videos) significantly limits its practicality. In this work, we address this challenge by leveraging FPGA acceleration to meet this compute demand. We develop SparkWeaver, a full-stack solution that automatically offloads the heavy DNN computations of DNN-based real-time vision applications (e.g., VCA) to our FPGA accelerators without developer intervention. We use FPGAs as our DNN acceleration platform because they not only offer the low inference latency and high power efficiency often required by real-time vision applications, but also provide a programmable substrate for accelerating the non-DNN components of these applications.
To demonstrate the solution's ease of use, we will give a live demo of SparkWeaver's automated workflow, which takes a DNN-based VCA application written with Spark Streaming APIs and runs it on a Spark cluster while offloading DNN computations to FPGAs, without imposing additio

1.WIFI SSID: SparkAISummit | Password: UnifiedAnalytics

2.SparkWeaver: Accelerating Real-time DNN Applications with Spark and DNNWEAVER Behnam Robatmili, Jongse Park, and Blake Skinner Bigstream Solutions #UnifiedAnalytics #SparkAISummit

3.A little about Bigstream [Diagram labels: big data platforms; Dataflow Adaptation Layer; zero code change; cross platform; Bigstream Dataflow; intelligent, automatic computation slicing; hyper-acceleration; Bigstream Hypervisor; cross acceleration hardware; 2X to 30X acceleration; many-cores, GPU, FPGA]

4.Ingest Bottleneck in Big Data

5.Applications with Ingest Bottleneck ● Many big data applications ○ Lots of raw data ○ Video surveillance ■ Industrial camera market is projected to increase 2.3x by 2024 [1] ■ A 4K camera at 60 fps produces 5.2 TB of data per hour (1 TB at 10 fps) ○ Voice recognition ○ Fraud detection 1. https://www.gminsights.com/industry-analysis/ip-camera-market
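
The per-hour figure on this slide can be sanity-checked with a short calculation. The sketch below is not from the talk: it assumes uncompressed 24-bit frames at DCI 4K resolution (4096x2160) and binary terabytes (TiB), which is one set of assumptions that lands near 5.2 at 60 fps and a little under 1 at 10 fps. Other resolutions or pixel formats would give somewhat different numbers.

```python
TIB = 2 ** 40  # bytes in one binary terabyte (assumed unit)


def raw_video_bytes_per_hour(width, height, fps, bytes_per_pixel=3):
    """Bytes produced in one hour by an uncompressed video stream."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * fps * 3600


# Assumed camera parameters: DCI 4K, 24-bit colour, no compression.
per_hour_60 = raw_video_bytes_per_hour(4096, 2160, fps=60) / TIB
per_hour_10 = raw_video_bytes_per_hour(4096, 2160, fps=10) / TIB

print(f"60 fps: {per_hour_60:.2f} TiB/hour")
print(f"10 fps: {per_hour_10:.2f} TiB/hour")
```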

6.Traditional Architecture does not Scale [Diagram of a four-stage pipeline: 1) Ingest stage, 2) Data streaming, 3) Online analytics, 4) Offline analytics. Labels: Kafka; raw image frames; Spark; DNN; Spark Streaming (online analytics, cross camera); image features; batch processing (offline training, offline analytics); HDFS]

7.Use Cases ● How many people went from the shoe department to the jewelry department? ● How many people were observed walking around the entire building on a given day? Requires cross-camera online and offline analytics
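
As an illustration of the first question, here is a minimal sketch of the counting step. It assumes that cross-camera re-identification has already assigned a stable `person_id` to every sighting; the tuple layout, zone names, and the `count_transitions` helper are hypothetical and not part of SparkWeaver.

```python
from collections import defaultdict


def count_transitions(sightings, src, dst):
    """Count distinct people seen in `src` and later seen in `dst`.

    `sightings` is an iterable of (person_id, zone, timestamp) tuples.
    """
    by_person = defaultdict(list)
    for person_id, zone, ts in sightings:
        by_person[person_id].append((ts, zone))

    count = 0
    for events in by_person.values():
        events.sort()  # order each person's sightings by time
        first_src = next((ts for ts, zone in events if zone == src), None)
        if first_src is None:
            continue  # never seen in the source zone
        if any(zone == dst and ts > first_src for ts, zone in events):
            count += 1
    return count


# Hypothetical sightings produced by re-identification across cameras.
sightings = [
    ("p1", "shoes", 10), ("p1", "jewelry", 25),
    ("p2", "jewelry", 5), ("p2", "shoes", 30),
    ("p3", "shoes", 12), ("p3", "toys", 20), ("p3", "jewelry", 40),
    ("p4", "shoes", 15),
]
print(count_transitions(sightings, "shoes", "jewelry"))
```

Only p1 and p3 are counted here: p2 visited the jewelry zone before the shoe zone, and p4 never reached it.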

8.Traditional Architecture does not Scale ● Detection, tracking, anomaly detection, cross-camera re-identification, … [Diagram labels: Kafka; raw image frames; Spark; DNN; Spark Streaming (online analytics); image features; batch processing (offline analytics); HDFS]

9.Semantic Compression with DNNs ● DNNs can be used for compression ○ Converting raw data into condensed, semantic data ○ For video analytics, we observed a ~5x compression rate

10.Large Scale Image Processing ● Deep learning on traditional big data clusters presents many challenges ○ Computationally intensive ■ Adds pressure to the entire ETL toolchain ○ Traditional CPUs are not ideal for evaluating DNN models ○ Doing many levels of DL processing on every input frame requires ■ Storing a lot of raw data ■ Storing and managing all interim data

11.DNN Optimized Ingest [Diagram labels: Kafka server; Spark; Spark Streaming (online analytics); image features; batch processing (offline analytics); HDFS; datacenter boundary]

12.Challenges with DNNs ● Computationally expensive ● Require a lot of data and energy

13.DNNs with FPGA ● FPGAs are a good candidate ○ Faster than CPUs ○ More power efficient than GPUs ○ More programmable than ASICs ● Programmability ○ Need HDL

14.Solution: DNN+FPGA for Ingest ● Ingest only the data you need ○ Run DNNs on the edge ○ Condensed, meaningful features instead of raw, largely meaningless data ● Accelerate with FPGAs ○ Power efficient ○ Can be deployed with minimal infrastructure ○ Using DNNWEAVER technology for programmability ■ Compiler and full stack for automatic DNN acceleration

15.DNNWEAVER

16.DNNWEAVER ● Ease DNN Deployment to FPGAs ○ TensorFlow and ONNX ○ No code changes ○ No hardware expertise needed ● Open source implementation based on original paper[1] ● Enterprise version under development by Bigstream 1: https://github.com/hsharma35/dnnweaver2

17.End-to-end DNN acceleration [Diagram labels: TensorFlow / ONNX; Translator; Macro Dataflow Graph; Design Planner; Resource Allocation; Execution Schedule; Compiler Modules; Memory Interface; Hand Optimized Templates; Design Weaver; Integrator; FPGA Binary; inputs; internal outputs; final output]

18.DNNWEAVER Compute Stack

19.SparkWeaver

20.SparkWeaver Architecture [Diagram labels: FPGA cluster; Kafka server; Spark; Spark Streaming (online analytics); image features; batch processing (offline analytics); HDFS]

21.Detection and Tracking with YOLO[1] and Deep SORT[3] [Diagram: YOLO [2] produces bounding boxes; Deep SORT [3] produces tagged bounding boxes] Image sources: [1] Redmon et al. “You Only Look Once: Unified, Real-Time Object Detection” [2] https://github.com/thtrieu/darkflow [3] Wojke et al. “Simple Online and Realtime Tracking with a Deep Association Metric”

22.SparkWeaver Architecture [Diagram labels: Detection: YOLOv2; image features; Kafka; Spark; Tracking: Deep_SORT, ...; HDFS]

23.SparkWeaver Architecture ● Multiple cameras stream video to an FPGA cluster ● FPGA clusters implement YOLOv2 with DNNWEAVER ● YOLO’s image features are streamed to a Kafka server ● Features are aggregated by Spark and written to HDFS ● Detection is performed on the fog layer, tracking on the cluster [Diagram labels: Detection: YOLOv2; image features; Kafka; Spark; Tracking: Deep_SORT, ...; HDFS]
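
The data flow in these bullets can be mimicked in a few lines of plain Python. This is only a stand-in for the shape of the pipeline, not SparkWeaver code: the JSON message schema, the camera names, and the `encode_detection` and `aggregate` helpers are assumptions, and a real deployment would publish the messages to Kafka and aggregate them with Spark rather than in a local loop.

```python
import json
from collections import Counter


def encode_detection(camera_id, frame_no, boxes):
    """Serialise one frame's detections as a JSON message (hypothetical schema)."""
    return json.dumps({"camera": camera_id, "frame": frame_no, "boxes": boxes})


def aggregate(messages):
    """Count detected boxes per camera; stands in for the Spark aggregation step."""
    totals = Counter()
    for raw in messages:
        msg = json.loads(raw)
        totals[msg["camera"]] += len(msg["boxes"])
    return dict(totals)


# Edge side: each box is [x, y, width, height] in pixels (assumed layout).
stream = [
    encode_detection("cam-1", 0, [[10, 20, 50, 120]]),
    encode_detection("cam-2", 0, []),
    encode_detection("cam-1", 1, [[12, 22, 50, 120], [300, 40, 60, 130]]),
]
print(aggregate(stream))
```

The point of the sketch is that only compact detection records cross the boundary between the edge and the cluster, never the raw frames.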

24.Single Node Max FPS (benchmark: FPS) ● Traditional Architecture: 7.3 ● Detection only: 10 ● Tracking only: 12.8 ● YOLO on DNNWEAVER: 13.2 ● SparkWeaver: 12.8 *Dependent on the number of people in a frame

25.Next Release Max FPS (projected) (benchmark: FPS) ● SparkWeaver: 46.1 *Dependent on the number of people in a frame
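
For readers who want the FPS figures as relative speedups and per-frame time budgets, the arithmetic is sketched below. It uses only numbers quoted on the benchmark slides (7.3 FPS for the traditional architecture, 12.8 FPS for SparkWeaver, 46.1 FPS projected) and assumes these rows are directly comparable, which the slides do not state explicitly.

```python
def speedup(new_fps, baseline_fps):
    """Throughput ratio of a new configuration over a baseline."""
    return new_fps / baseline_fps


def frame_budget_ms(fps):
    """Time available to process one frame at a given frame rate."""
    return 1000.0 / fps


BASELINE_FPS = 7.3  # "Traditional Architecture" row
for name, fps in [("SparkWeaver", 12.8), ("SparkWeaver (projected)", 46.1)]:
    print(f"{name}: {speedup(fps, BASELINE_FPS):.2f}x, "
          f"{frame_budget_ms(fps):.1f} ms per frame")
```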

26.Single Node Compression Rate ● 5.5x (82% reduction): Deep_SORT tracking only needs the pixels within the bounding boxes and their locations
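
The two numbers on this slide are related by saving = 1 - 1/ratio, so a 5.5x ratio corresponds to roughly 82% less data. The sketch below shows that conversion and a rough way to estimate the ratio for a single frame when only the pixels inside the bounding boxes and the box locations are kept. The frame size, box sizes, and 16 bytes per box location are illustrative assumptions, overlapping boxes are counted twice, and this is not the measurement method used in the talk.

```python
def space_saving(ratio):
    """Fraction of data removed for a given compression ratio."""
    return 1.0 - 1.0 / ratio


def crop_compression_ratio(frame_w, frame_h, boxes,
                           bytes_per_pixel=3, bytes_per_location=16):
    """Raw frame size divided by the size of the cropped boxes plus locations.

    `boxes` holds (x, y, width, height) tuples in pixels.
    """
    raw = frame_w * frame_h * bytes_per_pixel
    kept = sum(w * h * bytes_per_pixel + bytes_per_location
               for _x, _y, w, h in boxes)
    return raw / kept


print(f"{space_saving(5.5):.0%}")

# Hypothetical 1080p frame with two detected people.
boxes = [(100, 200, 200, 400), (900, 150, 200, 400)]
print(f"{crop_compression_ratio(1920, 1080, boxes):.1f}x")
```

Because the ratio depends on how much of the frame the boxes cover, it varies with the number and size of people in view.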

27.Demo

28.Streaming and Batch Analytic Operations

29.Streaming Analysis ● Person Re-Identification[1] ○ Multiple solutions (hot area of research) ○ Some solutions use pre-trained DNNs[2] ■ Generate a feature vector and apply a similarity check on the vectors ● Anomaly Detection[3] ○ Suspicious events/threats 1. Zheng et al. “Person Re-identification: Past, Present and Future” 2. Hermans et al. “In Defense of the Triplet Loss for Person Re-Identification” 3. https://databricks.com/blog/2018/09/13/identify-suspicious-behavior-in-video-with-databricks-runtime-for-machine-learning.html
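
The "feature vector plus similarity check" step can be sketched as follows. The slide does not say which similarity measure is used, so cosine similarity and the 0.8 threshold are assumptions chosen for illustration, and the short vectors stand in for the much longer embeddings a pre-trained DNN would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_match(query, gallery, threshold=0.8):
    """Return the gallery id most similar to `query`, or None below threshold."""
    best_id, best_score = None, threshold
    for person_id, features in gallery.items():
        score = cosine_similarity(query, features)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id


# Hypothetical embeddings of people already seen on other cameras.
gallery = {"p1": [0.9, 0.1, 0.3], "p2": [0.1, 0.8, 0.5]}
print(best_match([0.85, 0.15, 0.35], gallery))
print(best_match([0.0, 0.0, 1.0], gallery))
```

A query that matches nobody above the threshold returns `None`, which a caller could treat as a newly observed person.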