Accelerating Inference in the Data Center

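To realize the vision of autonomous vehicles, research and development platforms need to process multiple petabytes of drive data every day. The sheer volume of data calls for automated processing to find regions of significance, such as lane changes, brake application, approaching an intersection, snow/rain, glare, and so on. These richer segments are then analyzed more carefully and may even be used to improve machine learning models or to test actuation plans.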

1. Accelerating Inference in the Data Center
Dr. Malini Bhandaru & Karol Zalewski
Contributors: Santiago Mok, Konrad Kurdej, Sundar Nadathur, Alexander Kanevskiy, Ismo Puustinen
Intel
#HWCSAIS11

2. Autonomous Vehicles – R&D Data Pressure
• 1-20 TB/car/hour
• Driven by the number of cameras, resolution, and other sensor arrays
Image credit: https://clepa.eu/mediaroom/autonomous-vehicles-will-drive-change-auto-manufacturing-insurance/ https://ia.acs.org.au/article/2017/who-should-the-driverless-car-kill-.html
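As a rough back-of-the-envelope check on the data rates above, the sketch below converts the 1-20 TB/car/hour range into the number of car-hours of recording needed to accumulate one petabyte. The decimal convention (1 PB = 1000 TB) and the helper name `car_hours_per_petabyte` are assumptions made for illustration and do not come from the talk.

```python
# Back-of-the-envelope sketch: car-hours of driving needed to produce 1 PB.
# The 1-20 TB/car/hour range is from the slide; the decimal unit convention
# (1 PB = 1000 TB) and the helper name are assumptions for illustration.

TB_PER_PB = 1000


def car_hours_per_petabyte(tb_per_car_hour: float) -> float:
    """Return the car-hours of recording needed to reach one petabyte."""
    if tb_per_car_hour <= 0:
        raise ValueError("data rate must be positive")
    return TB_PER_PB / tb_per_car_hour


if __name__ == "__main__":
    for rate in (1, 20):
        hours = car_hours_per_petabyte(rate)
        print(f"{rate} TB/car/hour -> {hours:.0f} car-hours per PB")
```

At the upper end of the range, a small test fleet can reach a petabyte within a day, which is consistent with the multi-petabyte daily volumes the abstract mentions.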

3. Inference Everywhere: Faster Please!
• Speed ground truth generation – Human improves upon automated
• Speed privacy transformations – Face/license plate obscuring
• Speed simulation – Detect (edge-ish), Plan, Act
https://medium.com/@xslittlegrass/self-driving-car-in-a-simulator-with-a-tiny-neural-network-13d33b871234

4. Compute Continuum
• CPUs, GPUs, FPGAs, ASICs, Movidius
• The continuum runs from flexible but slower to fixed but faster
• Can Spark leverage them? Easily?

5. FPGA vs. Movidius Chip
FPGA
• Logic blocks, memory, security, variable sizes
• Programmable, OpenCL
• Fast but expensive
• Applications: Networking, Telecommunication, Research, Machine Learning
Movidius Chip
• Programmable, SDK
• Low power
• Tuned for image processing
• Fast, inexpensive
• Applications: Drones, Cameras, Augmented Reality

6. Data Center Platform
Goals
• Fungible
• Dynamic
• Resilient
• Easy to Use
• Fast
Components
• Storage: HDFS, Ceph, MySQL, S3
• Orchestrator stacks: Spark, Hadoop, YARN, Kubernetes
• Hetero hardware (with drivers): CPUs, GPUs, FPGAs, programmable inference chips
• AI frameworks: TensorFlow, Caffe2, ..
• Glue technologies: Kafka, Containers, Oozie, Argo
#ML9SAIS

7. Environment
• Kubernetes – resilient, auto scaling, easy to use
• Spark – big data in-memory processing, possible data locality

8. Kubernetes Device Plugin
Enables use of new resources
[Diagram labels: Master, Node, API Server (Authentication, Authorization, Admission Control), etcd, Controller Manager, Scheduler, Kubelet, CRI shim, CRI Container Runtime, Device Plugin API, Vendor Device Plugin, Device Driver. Legend: Core Components, Extensions/Plugins, Device-Specific Software, New Work for Device Plugins]
https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/
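Once a device plugin advertises a resource to the kubelet, a workload asks for it like any other resource, under `resources.limits` in the container spec. The sketch below builds such a Pod manifest as a plain Python dictionary and prints it as JSON. The resource name `example.com/fpga`, the image name, and the helper `pod_manifest` are hypothetical placeholders, not names used in the talk; a real vendor plugin defines its own resource name.

```python
# Minimal sketch: a Pod manifest that requests one unit of an extended
# resource exposed by a device plugin. The resource name, image name, and
# helper function are hypothetical placeholders for illustration.
import json


def pod_manifest(name: str, image: str, resource_name: str, count: int = 1) -> dict:
    """Build a Pod manifest whose single container requests `count` devices."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are requested under "limits".
                    "resources": {"limits": {resource_name: count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = pod_manifest(
        name="inference-demo",
        image="example/inference:latest",   # hypothetical image
        resource_name="example.com/fpga",   # hypothetical resource name
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.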

9. Experiment
• SqueezeNet 1.1
• gRPC calls 3-4 ms
• Data pre-processing 16-30 ms

10. FPGA Inference
Supported Deep Learning Topologies
• AlexNet
• GoogleNet v1
• VGG-16 & VGG-19
• SqueezeNet 1.0 & 1.1
• ResNet-18
• SqueezeNet-based variant of SSD
• GoogleNet-based variant of SSD
• VGG-based variant of SSD
Considerations
• Model size
• FPGA size
• Trade-off
– Model accuracy, speed
– Compile to target hardware

11. Movidius USB Learnings & Workarounds
• No Python support - loss of data locality
• Model-as-a-service
• Visibility into Device Manager events in Docker environment
• Movidius NCSDK2 – resolves some issues
• Feedback to Movidius team
• Service running on bare metal
• Movidius PCIe device coming soon! USB related issues moot
Common paradigm: TensorFlow Serving
• Access to host network (isolation loss): `--net=host`
• Device Manager events: `libusb`
• Privilege escalation (insecure): `--privileged`
• Access to Virtual File System to access USB device from within container: `-v /dev:/dev`
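The three Docker flags listed above can be combined into one `docker run` invocation. The sketch below only assembles the argument list and prints it; it does not launch anything. The image name and the helper `movidius_docker_command` are hypothetical, and, as the slide notes, `--privileged` and `--net=host` trade away isolation, so this is a workaround rather than a recommended setup.

```python
# Sketch: assemble a `docker run` command line using the workaround flags
# from the slide. The image name and helper function are hypothetical.
import shlex
from typing import List, Sequence


def movidius_docker_command(image: str, command: Sequence[str] = ()) -> List[str]:
    """Return the docker argv for a container that can reach the USB device."""
    args = [
        "docker", "run",
        "--net=host",        # host network access (loses network isolation)
        "--privileged",      # privilege escalation (insecure)
        "-v", "/dev:/dev",   # expose host device nodes inside the container
        image,
    ]
    args.extend(command)
    return args


if __name__ == "__main__":
    argv = movidius_docker_command("example/ncs-inference:latest")  # hypothetical image
    print(shlex.join(argv))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.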

12. Movidius Next
• SDK2 just released
– Up to 10 models may co-exist on one device
– FIFO queue
– 32 bit floating point
• Chip-2 coming soon
– At least an order of magnitude faster
https://developer.movidius.com/start
https://github.com/movidius/ncsdk
https://github.com/kzzalews/sparkaisummit_movidius

13. Results (CPU / FPGA / Movidius)
• Software tools: CentOS 7.4; Intel Acceleration Stack 1.0 and Intel OpenVINO Toolkit with FPGA Support (FPGA); SDK 1 (Movidius)
• Hardware: CPU: Intel Xeon CPU E5-1650 v2 @ 3.50GHz; FPGA: Arria 10 GX (1150K logic elements, 8GB DDR4, PCIe Gen3); Movidius
• Inference time/image: CPU 7.5 ms; FPGA 3.2 ms; Movidius 34 ms
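To put the inference times in context, the sketch below adds them to the gRPC (3-4 ms) and data pre-processing (16-30 ms) figures reported on the Experiment slide. It assumes, purely for illustration, that the three stages run one after another for each image and that the same overheads apply to every platform; the talk does not state either point, and the function name is hypothetical.

```python
# Sketch: rough per-image latency budget, assuming gRPC, pre-processing, and
# inference run sequentially and that the overheads are identical on every
# platform. Both assumptions are for illustration and are not from the talk.
from typing import Tuple

GRPC_MS = (3.0, 4.0)           # from the Experiment slide
PREPROCESS_MS = (16.0, 30.0)   # from the Experiment slide
INFERENCE_MS = {"CPU": 7.5, "FPGA": 3.2, "Movidius": 34.0}  # from the Results slide


def end_to_end_ms(inference_ms: float) -> Tuple[float, float]:
    """Return the (best case, worst case) per-image latency in milliseconds."""
    low = GRPC_MS[0] + PREPROCESS_MS[0] + inference_ms
    high = GRPC_MS[1] + PREPROCESS_MS[1] + inference_ms
    return low, high


if __name__ == "__main__":
    for platform, infer_ms in INFERENCE_MS.items():
        low, high = end_to_end_ms(infer_ms)
        print(f"{platform}: {low:.1f}-{high:.1f} ms per image")
```

Under these assumptions, pre-processing is larger than the CPU and FPGA inference times, so accelerating inference alone would narrow the end-to-end gap between those two platforms.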

14. Demo
https://videoportal.intel.com/media/0_selfn06l

15. Future Work
• Kubernetes Device Manager support for Movidius
• Explore native Spark support for Movidius
• Kubernetes/Spark scheduler enhancements
– Wait for HW or launch anywhere?
– Speed, power, and latency implications
– Targeted models
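The "wait for HW or launch anywhere?" question can be framed as a simple completion-time comparison. The toy heuristic below is not the authors' scheduler design: it considers only time, ignores the power and latency implications listed above, and its function name and parameters are hypothetical.

```python
# Toy heuristic: wait for an accelerator only if the expected queueing delay
# plus accelerated run time beats running on the CPU right away.
# Considers completion time only; names and parameters are hypothetical.


def should_wait_for_accelerator(
    expected_wait_ms: float,
    accel_ms_per_image: float,
    cpu_ms_per_image: float,
    n_images: int,
) -> bool:
    """Return True when waiting for the accelerator finishes the batch sooner."""
    accel_total = expected_wait_ms + accel_ms_per_image * n_images
    cpu_total = cpu_ms_per_image * n_images
    return accel_total < cpu_total


if __name__ == "__main__":
    # Per-image times taken from the Results slide (FPGA 3.2 ms, CPU 7.5 ms).
    for wait_ms in (1000, 5000):
        decision = should_wait_for_accelerator(wait_ms, 3.2, 7.5, n_images=1000)
        print(f"expected wait {wait_ms} ms -> wait for FPGA: {decision}")
```

A real scheduler would also need an estimate of the queueing delay and a way to weigh power and per-request latency, which this sketch leaves out.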

16. Conclusion
• FPGA support more mature
• Give Movidius a try, delightful at its price point!!
https://developer.movidius.com/start
https://github.com/movidius/ncsdk
https://github.com/kzzalews/sparkaisummit_movidius

17. References
Kubernetes Device Plugin:
• https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
• https://kubernetes.io/docs/concepts/cluster-administration/device-plugins/
• https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md
FPGAs and the Movidius Chip:
• https://venturebeat.com/2018/02/27/intel-makes-it-easier-to-bring-movidius-ai-accelerator-chip-into-production/
• https://newsroom.intel.com/editorials/introducing-myriad-x-unleashing-ai-at-the-edge/
• https://www.altera.com/products/fpga/stratix-series/stratix-10/overview.html
• https://medium.com/@xslittlegrass/self-driving-car-in-a-simulator-with-a-tiny-neural-network-13d33b871234
SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters:
• https://arxiv.org/ftp/arxiv/papers/1505/1505.01120.pdf

18. Thank You!
Karol.Zalewski@intel.com
Malini.K.Bhandaru@intel.com