SparkWeaver: Full-Stack Solution to Accelerate Real-Time DNN Applications on FPGAs
1. Wi-Fi SSID: SparkAISummit | Password: UnifiedAnalytics
2. SparkWeaver: Accelerating Real-Time DNN Applications with Spark and DNNWEAVER
Behnam Robatmili, Jongse Park, and Blake Skinner
Bigstream Solutions
#UnifiedAnalytics #SparkAISummit
3. A Little About Bigstream
Diagram: the Bigstream hyper-acceleration stack
● Layers, top to bottom: big data platforms; Dataflow Adaptation Layer; Bigstream Dataflow; Bigstream Hypervisor; many-cores, GPU, FPGA
● Zero code change, cross platform
● Intelligent, automatic computation slicing
● Cross acceleration hardware
● 2X to 30X acceleration
4. Ingest Bottleneck in Big Data
5. Applications with Ingest Bottleneck
● Many big data applications
  ○ Lots of raw data
  ○ Video surveillance
    ■ The industrial camera market is projected to grow 2.3x by 2024 [1]
    ■ A single 4K camera at 60 fps produces about 5.2 TB of data per hour (about 1 TB at 10 fps)
  ○ Voice recognition
  ○ Fraud detection
[1] https://www.gminsights.com/industry-analysis/ip-camera-market
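The per-hour figures follow from simple arithmetic on uncompressed frames. This sketch assumes UHD 4K (3840×2160), 3 bytes per pixel (8-bit RGB), and decimal terabytes; under those assumptions a 60 fps stream comes to about 5.4 TB per hour and a 10 fps stream to about 0.9 TB, the same order as the slide's 5.2 TB and 1 TB. The exact values shift with resolution, pixel format, and whether "TB" means 10^12 or 2^40 bytes.

```python
def raw_tb_per_hour(width: int, height: int, fps: float, bytes_per_pixel: int = 3) -> float:
    """Uncompressed video volume per hour in decimal terabytes (10**12 bytes).

    Assumes every pixel of every frame is stored raw (no codec), e.g. 8-bit RGB = 3 bytes/pixel.
    """
    bytes_per_frame = width * height * bytes_per_pixel
    bytes_per_hour = bytes_per_frame * fps * 3600  # 3600 seconds in an hour
    return bytes_per_hour / 1e12


if __name__ == "__main__":
    for fps in (60, 10):
        print(f"4K at {fps} fps: {raw_tb_per_hour(3840, 2160, fps):.2f} TB/hour")
```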
6. Traditional Architecture Does Not Scale
Diagram: a four-stage pipeline
1) Ingest stage: raw image frames into Kafka
2) Data streaming: Spark
3) Online analytics: a DNN and Spark Streaming turn frames into image features for online and cross-camera analytics
4) Offline analytics: batch processing over HDFS for offline training and offline analytics
7. Use Cases
● How many people went from the shoe department to the jewelry department?
● How many people were observed walking around the entire building on a given day?
Both questions require cross-camera online and offline analytics.
8. Traditional Architecture Does Not Scale
Diagram: the same pipeline (raw image frames into Kafka; Spark with a DNN and Spark Streaming for online analytics on image features; batch processing over HDFS for offline analytics), where the DNN stage must handle detection, tracking, anomaly detection, cross-camera re-identification, …
9. Semantic Compression with DNNs
● DNNs can be used for compression
  ○ Converting raw data into condensed, semantic data
  ○ For video analytics, we observed a ~5x compression rate
10. Large-Scale Image Processing
● Deep learning on traditional big data clusters presents many challenges
  ○ Computationally intensive
    ■ Adds pressure to the entire ETL toolchain
  ○ Traditional CPUs are not ideal for evaluating DNN models
  ○ Doing many levels of DL processing on every input frame requires
    ■ Storing a lot of raw data
    ■ Storing and managing all interim data
11. DNN-Optimized Ingest
Diagram: image features cross the datacenter boundary into a Kafka server; inside the datacenter, Spark runs Spark Streaming for online analytics and batch processing over HDFS for offline analytics.
12. Challenges with DNNs
● Computationally expensive
● Require a lot of data and energy
13. DNNs with FPGAs
● FPGAs are a good candidate
  ○ Faster than CPUs
  ○ More power efficient than GPUs
  ○ More programmable than ASICs
● Programmability
  ○ Requires an HDL (hardware description language)
14. Solution: DNN + FPGA for Ingest
● Ingest only the data you need
  ○ Run DNNs on the edge
  ○ Condensed, meaningful features instead of raw, largely meaningless data
● Accelerate with FPGAs
  ○ Power efficient
  ○ Can be deployed with minimal infrastructure
  ○ Using DNNWEAVER technology for programmability
    ■ Compiler and full stack for automatic DNN acceleration
15. DNNWEAVER
16. DNNWEAVER
● Eases DNN deployment to FPGAs
  ○ TensorFlow and ONNX
  ○ No code changes
  ○ No hardware expertise needed
● Open-source implementation based on the original paper [1]
● Enterprise version under development by Bigstream
[1] https://github.com/hsharma35/dnnweaver2
17. End-to-End DNN Acceleration
Diagram: the DNNWEAVER compilation flow
● Inputs: TensorFlow / ONNX model
● Components: Translator, Design Planner, Compiler, Weaver, Integrator
● Building blocks: hand-optimized templates, memory interface
● Internal outputs: macro dataflow graph, resource allocation, execution schedule, modules
● Final output: FPGA binary
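Among the flow's internal outputs are a dataflow graph and an execution schedule. One basic ingredient of turning the former into the latter is ordering operations so each runs only after its inputs are ready. The sketch below does that with Kahn's topological sort over a hypothetical layer graph; it is not DNNWEAVER's compiler, and it ignores resource allocation, tiling, and FPGA timing entirely.

```python
from collections import deque
from typing import Dict, List


def execution_schedule(graph: Dict[str, List[str]]) -> List[str]:
    """Order nodes of a dataflow graph so every producer precedes its consumers.

    `graph` maps each node to the nodes that consume its output (hypothetical format).
    Raises ValueError if the graph has a cycle and therefore no valid schedule.
    """
    # Count how many inputs each node is still waiting on.
    indegree = {node: 0 for node in graph}
    for consumers in graph.values():
        for c in consumers:
            indegree[c] = indegree.get(c, 0) + 1

    ready = deque(n for n, d in indegree.items() if d == 0)  # nodes with all inputs available
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for c in graph.get(node, []):
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)

    if len(order) != len(indegree):
        raise ValueError("dataflow graph has a cycle")
    return order


if __name__ == "__main__":
    # Hypothetical detector graph with a skip connection.
    layers = {
        "input": ["conv1"],
        "conv1": ["pool1", "skip"],
        "pool1": ["concat"],
        "skip": ["concat"],
        "concat": ["detect"],
        "detect": [],
    }
    print(execution_schedule(layers))
```

A breadth-first queue keeps independent layers (here `pool1` and `skip`) adjacent in the schedule, which is where a real planner could choose to run them in parallel.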
18. DNNWEAVER Compute Stack
19. SparkWeaver
20. SparkWeaver Architecture
Diagram: an FPGA cluster sends image features to a Kafka server; Spark runs Spark Streaming for online analytics and batch processing over HDFS for offline analytics.
21. Detection and Tracking with YOLO [1] and Deep SORT [3]
Diagram: YOLO → bounding boxes [2] → Deep SORT → tagged bounding boxes [3]
Image sources:
[1] Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection"
[2] https://github.com/thtrieu/darkflow
[3] Wojke et al., "Simple Online and Realtime Tracking with a Deep Association Metric"
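Deep SORT tags YOLO's boxes with persistent IDs by combining a Kalman-filter motion model, a CNN appearance descriptor, and Hungarian matching. The sketch below deliberately swaps all of that for a much simpler technique, greedy matching on intersection-over-union (IoU) between consecutive frames, only to illustrate what "tagged bounding boxes" means. `GreedyIoUTracker` and its 0.3 threshold are hypothetical and belong to neither Deep SORT nor SparkWeaver.

```python
from itertools import count
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Hypothetical, simplified tracker: NOT Deep SORT (no motion model, no appearance features)."""

    def __init__(self, iou_threshold: float = 0.3):
        self.iou_threshold = iou_threshold
        self.tracks: Dict[int, Box] = {}  # track id -> box from the most recent frame
        self._next_id = count(1)

    def update(self, detections: List[Box]) -> List[Tuple[int, Box]]:
        """Tag each detection in a new frame with a track id."""
        # Score every (track, detection) pair, best overlaps first.
        pairs = sorted(
            ((iou(box, det), tid, j)
             for tid, box in self.tracks.items()
             for j, det in enumerate(detections)),
            reverse=True,
        )
        assigned: Dict[int, int] = {}  # detection index -> track id
        used_tracks = set()
        for score, tid, j in pairs:
            if score < self.iou_threshold:
                break  # all remaining pairs overlap even less
            if tid in used_tracks or j in assigned:
                continue
            assigned[j] = tid
            used_tracks.add(tid)

        tagged = []
        for j, det in enumerate(detections):
            # Unmatched detections start new tracks; stale tracks are never deleted in this sketch.
            tid = assigned[j] if j in assigned else next(self._next_id)
            self.tracks[tid] = det
            tagged.append((tid, det))
        return tagged
```

Calling `update` once per frame returns `(track_id, box)` pairs whose IDs persist as long as a person's box keeps overlapping its previous position.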
22. SparkWeaver Architecture
Diagram: detection (YOLOv2) produces image features that flow through Kafka to Spark and HDFS; tracking (Deep_SORT, ...) runs on the Spark side.
23. SparkWeaver Architecture
● Multiple cameras stream video to an FPGA cluster
● FPGA clusters implement YOLOv2 with DNNWEAVER
● YOLO's image features are streamed to a Kafka server
● Features are aggregated by Spark and written to HDFS
● Detection (YOLOv2) is performed on the fog layer, tracking (Deep_SORT, ...) on the cluster
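The data path in these bullets implies some per-frame feature record travelling from the FPGA cluster through Kafka to Spark. The slides do not specify its format, so the schema below (`FeatureRecord`, `Detection`, `camera_id`, `frame_index`) is entirely hypothetical, and the Kafka and Spark steps are replaced by plain-Python encode, decode, and aggregation functions so the sketch runs standalone. Using the camera id as the message key mirrors the common Kafka practice of keying messages so one source's records land in the same partition.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Detection:
    label: str
    box: List[int]  # [x1, y1, x2, y2] in pixels
    score: float


@dataclass
class FeatureRecord:
    camera_id: str
    frame_index: int
    detections: List[Detection] = field(default_factory=list)


def encode(record: FeatureRecord) -> Tuple[bytes, bytes]:
    """Serialize a record as a (key, value) pair, as it might be handed to a Kafka producer."""
    key = record.camera_id.encode("utf-8")
    value = json.dumps(asdict(record)).encode("utf-8")  # asdict also converts nested Detections
    return key, value


def decode(value: bytes) -> FeatureRecord:
    raw = json.loads(value.decode("utf-8"))
    detections = [Detection(**d) for d in raw["detections"]]
    return FeatureRecord(raw["camera_id"], raw["frame_index"], detections)


def people_per_camera(values: Iterable[bytes]) -> Counter:
    """Stand-in for the Spark aggregation step: count 'person' detections per camera."""
    counts: Counter = Counter()
    for v in values:
        rec = decode(v)
        counts[rec.camera_id] += sum(1 for d in rec.detections if d.label == "person")
    return counts
```

In an actual deployment the encoded pairs would go to a Kafka producer, and a Spark Structured Streaming job reading the topic would replace `people_per_camera`.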
24. Single-Node Max FPS
Benchmark                   FPS
Traditional architecture    7.3
Detection only              10
Tracking only               12.8
YOLO on DNNWEAVER           13.2
SparkWeaver                 12.8
*Dependent on the number of people in a frame
25. Next Release Max FPS (Projected)
Benchmark                   FPS
SparkWeaver                 46.1
*Dependent on the number of people in a frame
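Converting the two benchmark tables into per-frame time and relative speedup makes them easier to compare. SparkWeaver's 12.8 FPS is about 1.75x the traditional architecture's 7.3 FPS (roughly 78 ms versus 137 ms per frame), and the projected 46.1 FPS would be about 3.6x the current release. Like the tables, these ratios depend on how many people are in each frame; the helper names are illustrative only.

```python
# FPS values copied from the single-node benchmark slides.
FPS = {
    "Traditional architecture": 7.3,
    "SparkWeaver": 12.8,
    "SparkWeaver, next release (projected)": 46.1,
}


def ms_per_frame(fps: float) -> float:
    """Average time per frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


def speedup(new_fps: float, baseline_fps: float) -> float:
    """Throughput ratio of a configuration relative to a baseline."""
    return new_fps / baseline_fps


if __name__ == "__main__":
    base = FPS["Traditional architecture"]
    for name, fps in FPS.items():
        print(f"{name}: {ms_per_frame(fps):.1f} ms/frame, {speedup(fps, base):.2f}x vs traditional")
```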
26. Single-Node Compression Rate
● 5.5x (82%): Deep_SORT tracking only needs the pixels within the bounding boxes and their locations
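The 5.5x and 82% figures describe the same measurement: a compression ratio r removes a fraction 1 - 1/r of the bytes, and 1 - 1/5.5 ≈ 0.82. The sketch estimates r for one frame by counting only what Deep_SORT needs, the pixels inside each box plus the box location. The 8-bit RGB pixels and 4-byte coordinates are assumptions, the helpers are hypothetical rather than SparkWeaver code, and the real ratio depends on how many people appear in each frame.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def raw_frame_bytes(width: int, height: int, channels: int = 3) -> int:
    """Bytes in one uncompressed frame, assuming 8 bits per channel."""
    return width * height * channels


def condensed_bytes(boxes: Iterable[Box], channels: int = 3, bytes_per_coord: int = 4) -> int:
    """Bytes for the pixels inside each box plus its four coordinates.

    Overlapping boxes are counted twice, as if each crop were shipped separately.
    """
    total = 0
    for x1, y1, x2, y2 in boxes:
        total += (x2 - x1) * (y2 - y1) * channels  # cropped pixels
        total += 4 * bytes_per_coord               # box location
    return total


def compression_ratio(raw: int, condensed: int) -> float:
    return raw / condensed


def space_saving(ratio: float) -> float:
    """Fraction of bytes removed: 1 - condensed/raw = 1 - 1/ratio."""
    return 1.0 - 1.0 / ratio
```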
27. Demo
28. Streaming and Batch Analytic Operations
29. Streaming Analysis
● Person re-identification [1]
  ○ Multiple solutions (a hot area of research)
  ○ Some solutions use pre-trained DNNs [2]
    ■ Generate a feature vector and apply a similarity check on the vectors
● Anomaly detection [3]
  ○ Suspicious events/threats
[1] Zheng et al., "Person Re-identification: Past, Present, and Future"
[2] Hermans et al., "In Defense of the Triplet Loss for Person Re-Identification"
[3] https://databricks.com/blog/2018/09/13/identify-suspicious-behavior-in-video-with-databricks-runtime-for-machine-learning.html
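The re-identification step above (embed each person, then compare vectors) can be sketched for the comparison half only, assuming a pre-trained DNN has already produced the embeddings. This sketch uses cosine similarity with a hypothetical 0.8 threshold; Hermans et al. train their embeddings with a triplet loss on Euclidean distances, so the metric here is a common alternative, not their exact method. The gallery IDs are invented.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query: Sequence[float],
               gallery: Dict[str, Sequence[float]],
               threshold: float = 0.8) -> Optional[str]:
    """Return the gallery identity most similar to `query`, or None if none reaches `threshold`."""
    best_id: Optional[str] = None
    best_score = threshold
    for identity, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id


if __name__ == "__main__":
    # Hypothetical embeddings of people already tracked on two cameras.
    gallery = {"cam1-track7": [0.9, 0.1, 0.0], "cam2-track3": [0.0, 1.0, 0.0]}
    print(best_match([1.0, 0.0, 0.0], gallery))
```

A match links a track on one camera to an identity seen on another, which is the building block for the cross-camera questions in the use cases.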