Intel Innovation: Hardware Acceleration for Big Data Analytics and AI

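Big data analytics poses new computational challenges in both the volume of data that must be moved and the parallelization of operations at scale. In this talk, we present some new hardware features, such as Intel Optane DC persistent memory, which enables new forms of faster storage, and new instructions such as VNNI, which allow common operations to be parallelized more widely. We show how these features, when properly enabled, deliver significant performance improvements on widely used workloads.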

1. Paolo Narvaez, Sr. Principal Engineer; Engineering Director for Analytics and AI Solutions, Enterprise and Government Organization. XLDB, April 3, 2019

2. Cloud service providers | Communication service providers | Enterprise and government
Drive the best experience & SLA; preparing for 5G; siloed applications & data; packets; protect data privacy; network transformation; slow deployments of new services; drive higher revenue; demands at the network edge; security exploits growing; reduce downtime; new service opportunities / RPU; data movement & network bottlenecks; drive operational efficiency; converged workflows & infrastructure.
Intel Data Center Group: Move | Store | Process

3. Move faster | Store more | Process everything
• Move: Ethernet; silicon photonics; Omni-Path Fabric
• Store: DC series SSD; QLC 3D NAND drive
• Process: Intel® Xeon® Scalable processors; Intel® Xeon® D processors; Intel Atom® C processors; Intel® FPGAs; Intel® Nervana™ NNP; Intel® Movidius™ technology
Software and system-level optimized | Faster time to value | Hardware-enhanced security

4. Intel® Select Solutions, 1H'19: solution refresh for existing workloads and solutions for new workloads
Categories: Analytics | Artificial intelligence | Hybrid cloud | Network transformation | HPC
Solutions: Microsoft SQL Server* Business Operations; Microsoft SQL Server* Enterprise Data Warehouse (Windows Server*); Microsoft SQL Server* Enterprise Data Warehouse (Linux*); SAP HANA*; Blockchain: Hyperledger Fabric*; BigDL on Apache Spark*; AI Inferencing; Microsoft Azure Stack*; Red Hat OpenShift* Container Platform; Microsoft Windows Server Software Defined*; VMware vSAN*; Huawei FusionStorage*; Universal Customer Premises Equipment; NFVi: Red Hat*; NFVi: Ubuntu*; NFVi: FusionSphere*; Visual Cloud Delivery Network; Simulation & Modeling; Professional Visualization; Genomics Analytics; HPC AI Converged
INTEL.COM/SELECTSOLUTIONS

5. Built-in value | Uninterrupted leadership workload performance | Groundbreaking memory innovation | Embedded artificial intelligence acceleration | Hardware-enhanced security | Enhanced agility & utilization
INTEL.COM/XEONSCALABLE

6. Available processor options

Model   Cores  Turbo (GHz)  Base (GHz)  Cache (MB)  TDP (W)
9242    48     3.8          2.3         71.5        350
9222    32     3.7          2.3         71.5        250
9221    32     3.7          2.1         71.5        250
8280    28     4.0          2.7         38.5        205
8276    28     4.0          2.2         38.5        165
8270    26     4.0          2.7         35.75       205
8268    24     3.9          2.9         35.75       205
8260    24     3.9          2.4         35.7        165
8260Y   24     3.9          2.4         35.75       165
8256    4      3.9          3.8         16.5        105
8253    16     3.0          2.2         35.7        165
6262V   24     3.6          1.9         33          135
6254    18     4.0          3.1         24.75       200
6252    24     3.7          2.1         35.75       150
6252N   24     3.6          2.3         35.75       150
6248    20     3.9          2.5         27.5        150
6246    12     4.2          3.3         24.75       165
6244    8      4.4          3.6         24.75       150
6242    16     3.9          2.8         22          150
6240    18     3.9          2.6         24.75       150
6240Y   18     3.9          2.6         24.75       150
6238    22     3.7          2.1         30.25       140
6238T   22     3.7          1.9         30.25       125
6234    8      4.0          3.3         24.75       130
6230    20     3.9          2.1         27.5        125
6230N   20     3.5          2.3         27.5        125
6230T   20     3.9          2.1         27.5        125
6226    12     3.7          2.7         19.25       125
6222V   20     3.6          1.8         27.5        115
5222    4      3.9          3.8         16.5        105
5220    18     3.9          2.2         24.75       125
5220S   18     3.9          2.7         24.75       125
5220T   18     3.9          1.9         24.75       105
5218    16     3.9          2.3         22          125
5218N   16     3.9          2.3         22          105
5218T   16     3.8          2.1         22          105
5217    8      3.7          3.0         16.5        115
5215    10     3.4          2.5         16.5        85
4216    16     3.2          2.1         16.5        100
4215    8      3.5          2.5         16.5        85
4214    12     3.2          2.2         16.5        85
4214Y   12     3.2          2.2         16.5        85
4210    10     3.2          2.2         13.75       85
4209T   8      3.2          2.2         11          70
4208    8      3.2          2.1         11          85
3204    6      1.9          1.9         8.25        85

Segments shown on the slide: advanced performance; scalable performance; optimized for highest per-core scalable performance; featuring Intel® Speed Select Technology (3 in 1) (Y SKUs); networking/NFV specialized (N SKUs); VM density value specialized (V SKUs); long-life cycle and NEBS-thermal friendly (T SKUs); search application value specialized (S SKU). SKUs supporting 2.0 TB and 4.5 TB DDR4 memory capacity are available (medium DDR memory tier: up to 2 TB; large DDR memory tier: up to 4.5 TB).
Legend: Turbo = maximum Intel® Turbo Boost Technology 2.0 frequency (GHz); Base = base frequency (GHz); Cache = processor cache (MB); TDP = thermal design power (watts); RCP = recommended customer pricing (US dollars); NFV = network function virtualization; VM = virtual machine; NEBS = network equipment-building system.
All information provided is subject to change without notice. Intel may make changes to specifications and product descriptions at any time, without notice. Please visit intel.com/xeon or contact your Intel representative to obtain the latest Intel product specifications. © Copyright 2019, Intel Corporation.

7. Leadership Xeon: analytics & artificial intelligence | high performance computing | high density infrastructure
• Up to 112 cores (2S system)
• Up to 2X more compute density
• Up to 3.8 GHz Intel® Turbo Boost Technology 2.0
• Up to 3 TB DDR4-2933 MT/s (2S system)

8. World records and counting…
• 8+ sockets
• Up to 4.4 GHz Intel® Turbo Boost Technology 2.0
• Up to 36 TB system memory in an eight-socket system using Intel® Optane™ DC persistent memory and DRAM
• Up to 2X system memory capacity with DDR4-2933 MT/s and 16 Gb DIMMs, compared to the Intel® Xeon® Platinum 8180 processor
95 world records featuring Intel® Xeon® processors as of September 14, 2018: https://newsroom.intel.com/news/intel-xeon-scalable-processors-set-95-new-performance-world-records/

9. Business resilience with hardware-enhanced security
• Mitigations for side-channel methods: enhanced performance over software-only mitigations
• Encryption + accelerators: available with Intel® QuickAssist Technology; integrated Intel® Advanced Vector Extensions 512; Intel® Optane™ DC persistent memory on-module data encryption
• Intel® Security Libraries: new Intel® Threat Detection Technology; Intel® Trusted Execution Technology; Intel® Cloud Integrity Technology
Agile service delivery with enhanced efficiency
• Intel® Deep Learning Boost: new integrated AI acceleration for inference
• Intel® Speed Select Technology: configurable core/frequency processor attributes; prioritize workload performance; supports platform TCO optimizations
• Intel® infrastructure management technologies: industry-leading Intel® Virtualization Technology; seamless VM migration for over 5 generations; enhanced Intel® Resource Director Technology; new Intel resource orchestration software
• New Intel® Ethernet 800 series with Application Device Queues (ADQ) and Dynamic Device Personalization (DDP)
• New Intel® Optane™ DC D4800X (dual port) SSDs

10. Intel® Ethernet: capability richness increases with each series (left to right on the schedule)
• 500 Series (Niantic), 10GbE: fixed pipeline; SR-IOV and VMDq
• 700 Series (Fortville), 40GbE: partially programmable pipeline; table definition modifications with a Dynamic Device Personalization (DDP) profile package; Intel® Ethernet Adaptive Virtual Functions (Intel® AVF)
• 800 Series (Columbiaville¹), 100GbE: fully programmable pipeline; table definition with DDP profile packages; Application Device Queues (ADQ); more queue and steering hardware assists; storage: RDMA (iWARP* & RoCE*v2)
¹ Features are subject to change. All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
INTEL.COM/ETHERNET

11. Intel innovation for ultimate agility and flexibility
• High-performance compute: cache-coherent Intel® Xeon® processor acceleration; massive bandwidth
• Any-to-any integration: memory, analog, logic for any node, supplier, IP; rapid eASIC-based optimization
• Any developer: Intel® Quartus® Prime design software for hardware developers; oneAPI for software developers
INTEL.COM/FPGA

12. Introducing: fast memory with the size and data persistence of storage
Enhance data insights by redefining the memory & storage hierarchy. Supported on future Intel® Xeon® Scalable processors (Platinum and Gold SKUs).
Intel Confidential – NDA Use Only

13. Memory and storage hierarchy
• Memory: DRAM (hot tier); persistent memory
• Storage performance gap: improving SSD performance
• Storage: 3D NAND SSD (warm tier)
• Cost/performance gap: Intel® QLC 3D NAND SSD, delivering efficient storage
• HDD / tape (cold tier)
INTEL.COM/OPTANE

14. Two ways to use Optane persistent memory
• Affordable memory capacity for many applications: the application sees a volatile memory pool of Optane persistent memory, with DRAM acting as a cache (Memory Mode)
• Persistent performance & maximum capacity: the application uses DRAM memory and Optane persistent memory side by side (App Direct Mode)
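In the second configuration on this slide, where DRAM and Optane persistent memory are both exposed directly to the application (what Intel calls App Direct Mode), software typically maps a file from a DAX-capable file system and updates it with ordinary loads and stores. The Python sketch below illustrates that pattern under stated assumptions: the file path, record layout and helper names are hypothetical, and Python's `mmap.flush()` (which issues `msync`) stands in for the cache-line flush calls, such as `pmem_persist()`, that PMDK's libpmem offers in C. On an ordinary disk the code still runs, just without memory-speed persistence.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: 8-byte little-endian counter + 16-byte tag.
RECORD = struct.Struct("<Q16s")


def open_region(path, size):
    """Create (or reopen) a file of `size` bytes and map it into memory.

    On real Optane PMem in App Direct Mode, `path` would live on a DAX mount
    (an assumption here); any regular file works for this sketch.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        return mmap.mmap(fd, size)  # the mmap object keeps its own duplicate descriptor
    finally:
        os.close(fd)


def store_record(region, offset, counter, tag):
    """Write a record with plain memory stores, then ask for durability."""
    RECORD.pack_into(region, offset, counter, tag)
    region.flush()  # msync(); PMDK would use pmem_persist() on real PMem


def load_record(region, offset):
    """Read a record straight from the mapping, with no read() call."""
    counter, tag = RECORD.unpack_from(region, offset)
    return counter, tag.rstrip(b"\0")


with tempfile.TemporaryDirectory() as workdir:
    pool = os.path.join(workdir, "pool.bin")
    region = open_region(pool, 4096)
    store_record(region, 0, 42, b"orders")
    region.close()

    # Simulate a restart: map the same file again and read the record back.
    region = open_region(pool, 4096)
    restored = load_record(region, 0)
    region.close()

print(restored)  # (42, b'orders')
```

The point of the pattern is that the data never has to be reloaded into DRAM after a restart: remapping the region makes it addressable again immediately.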

15. Intel persistent memory: the fast performance of memory with the availability and capacity of storage
• User/application layer: app / workload
• Kernel/OS layer: file system & driver
• Memory/storage type, from small capacity and fast performance to large capacity and slow performance: DRAM; Intel® Optane™ DC persistent memory (new); Intel® 3D NAND NVMe SSDs; SAS & SATA SSDs
• Processing: future Intel® Xeon® Scalable processor; I/O controller
It's memory. It's storage. It's both. Flexible and scalable to accelerate your workload's demands and data insights.

16. Memory and storage tiers
• Memory: DRAM (hot tier); persistent memory
• Intel® Optane™ SSD DC D4800X dual-port: performance + resiliency for critical enterprise IT apps; dual-port connections enable 24x7 data availability with redundant, hot-swappable data paths
• Storage: Intel® 3D NAND SSDs, including the Intel® SSD D5-P4326 E1.L: cost-optimized, enables greater warm storage; new E1.L form factor scalable to ~1 PB (in 1U)
• HDD / tape (cold tier)

17. Intel® Optane™ use cases (best fit is shown; both products may be viable)
Segments: storage | infrastructure | database | AI / analytics | HPC | comms
Workloads: RDMA/replication; memory caching/persistence; real-time analytics; content delivery network (CDN); VMware ESXi*; MSFT Hyper-V*; KVM*; SAP HANA*; Oracle Exadata*; MS-SQL*; SAS*; Comms SP custom; SDS; Redis Labs*; Aerospike*; machine learning; analytics; Memcached*; Redis*; Apache Spark*; Ceph*; VDI; RocksDB*; hyper-converged (HCI): VMware vSAN, Microsoft S2D, Nutanix*, Cisco HyperFlex*; scratch & IO nodes; HPC flex memory

18. Three solution examples: 2nd Gen Intel® Xeon® Scalable + Intel® Optane™ DC persistent memory (SQL Server, Database, TimesTen)

19. Deliver more for less with Intel® Optane™ DC persistent memory + SAP HANA*
• More capacity: 3 TB DRAM + 6 TB Intel® Optane™ DC persistent memory = 9 TB total
• Go faster and minimize downtime: SAP HANA restart time drops from 20 minutes to 90 seconds, a 13x faster restart¹
• Save more, cost per DB terabyte:
  ~$62,495 USD with 4 x Intel® Xeon® Platinum 8280 processors and 48 x 128 GB DDR4 for a 6 TB system
  ~$38,357 USD with 4 x Intel® Xeon® Platinum 8280M processors and 24 x 128 GB DDR4 + 24 x 256 GB Intel Optane DC PMEM for a 9 TB system
  39% cost savings²
Performance results are based on testing as of Jan 30, 2019 and may not reflect all publicly available security updates. No product or component can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks.
1. For detailed configs and pricing see slide 13. A complete columnar-store reload into DRAM for a 1.3 TB data set takes 20 minutes. A full system restart previously took 32 minutes; with DCPMM it takes 13.5 minutes (12 minutes for the OS + 1.5 minutes).
Pricing guidance as of March 1, 2019. Intel does not guarantee any costs or cost reduction. You should consult other information and performance tests to assist you in your purchase decision.
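As a quick consistency check, the headline SAP HANA ratios can be recomputed from the figures quoted on that slide. The Python snippet below is plain arithmetic on those numbers; the only assumption is that 1 TB = 1024 GB, which is what makes the DIMM counts add up to 6 TB and 9 TB.

```python
# Figures quoted on the SAP HANA slide.
dram_only_gb = 48 * 128              # 48 x 128 GB DDR4
mixed_gb = 24 * 128 + 24 * 256       # 24 x 128 GB DDR4 + 24 x 256 GB Optane DC PMem
restart_before_s = 20 * 60           # 20-minute restart
restart_after_s = 90                 # 90-second restart
cost_dram_only = 62_495              # ~USD, 4 x Platinum 8280 configuration
cost_with_pmem = 38_357              # ~USD, 4 x Platinum 8280M configuration

capacity_tb = (dram_only_gb / 1024, mixed_gb / 1024)
restart_speedup = restart_before_s / restart_after_s
cost_savings = 1 - cost_with_pmem / cost_dram_only

print(f"capacity: {capacity_tb[0]:.0f} TB -> {capacity_tb[1]:.0f} TB")  # 6 TB -> 9 TB
print(f"restart: {restart_speedup:.1f}x faster")                      # 13.3x faster
print(f"cost savings: {cost_savings:.0%}")                            # 39%
```

The computed 13.3x restart ratio rounds to the 13x on the slide, and the two cost figures yield the quoted 39% savings.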

20. Faster query performance with more users with 2nd Gen Intel® Xeon® processors + SQL Server*
Faster data warehouse query performance with the latest SW & HW:
• Past customer experience: 33,681 queries per hour at 1 TB scale factor with a 2S Intel® Xeon® E5-2699 v3 (4-year-old system)
• Customer experience today, with 2S Intel® Xeon® Platinum 8280 processors: 903,302 queries per hour at 1 TB scale factor, 26.8X better¹
Increase VMs per node for multi-tenant virtualized databases:
• Past customer experience: 22 Microsoft SQL VM instances with DRAM memory at ~$1,588 USD per VM
• Customer experience today, with 2S Intel® Xeon® Platinum 8280 processors: 30 Microsoft SQL VM instances at ~$1,108 USD per VM, 36% more VMs per node² and 30% lower estimated HW cost per VM³
Performance results are based on testing by Intel as of 2/8/2019 and may not reflect all publicly available security updates. See configuration disclosures for details. 1-2 Configuration: see slides 13, 14. Pricing guidance as of March 1, 2019.
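The three headline claims on the SQL Server slide are simple ratios of its past and current figures. This Python snippet recomputes them from the quoted numbers only; nothing beyond the slide's data is assumed.

```python
past = {"queries_per_hour": 33_681, "vms": 22, "usd_per_vm": 1_588}
today = {"queries_per_hour": 903_302, "vms": 30, "usd_per_vm": 1_108}

query_speedup = today["queries_per_hour"] / past["queries_per_hour"]
extra_vms = today["vms"] / past["vms"] - 1
cost_drop = 1 - today["usd_per_vm"] / past["usd_per_vm"]

# 26.8X queries/hour, 36% more VMs, 30% lower cost per VM
print(f"{query_speedup:.1f}X queries/hour, {extra_vms:.0%} more VMs, "
      f"{cost_drop:.0%} lower cost per VM")
```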

21. Minimize downtime & improve insights with Oracle* Database and TimesTen
• Improved operational efficiency with Oracle Database. Past customer experience: a wholesaler's supply chain managing warehouse-to-store inventory with a 2S, 5-year-old system, with efficiency measured in transactions per minute (TPM). Today: the wholesaler can now manage 3.7x more TPMs (3.77X better¹).
• Faster in-memory DB start-up with Oracle TimesTen. Past: the database image is copied from persistent storage into volatile DRAM; a 1.35 TB database has a > 10 min restart time. Today: the database is persistent; a 2.7 TB database has a < 1 second restart time (~1 second²).
• Better transaction throughput & durability with Oracle TimesTen. Past: the log buffer sits in volatile DRAM; transactions commit to the buffer, then the buffer is written synchronously to storage; average throughput = 176K TPS. Today: the log buffer sits in persistent memory, giving immediate persistence with no write to storage; average throughput = 1.16M TPS (6.49X better³).
"Today" refers to 2nd Gen Intel® Xeon® Scalable processors.
Performance results are based on testing by Intel as of October 2018 and may not reflect all publicly available security updates. See configuration disclosures for details. 1-3 Configuration: see slides 16, 17.

22. Intel Optimization for Caffe: ResNet-50 inference throughput (images/sec)
Chart: relative inference throughput over time (Jul'17, Dec'18, Apr'19) for Intel® Xeon® Platinum 8100 processors, Intel® Xeon® Platinum 8200 processors with Intel® DL Boost, and Intel® Xeon® Platinum 9200 processors with Intel® DL Boost; values shown: 1.0 (baseline) and 5.7X.
• New AI acceleration: Vector Neural Network Instruction for built-in inference acceleration
• Optimizations for developers & data scientists: optimized frameworks, toolkit, data type
Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. Configurations and benchmark details can be found on slide/page 52.
AI.INTEL.COM

23. Making inference go faster: vector instructions
• AVX-512 (Advanced Vector Extensions 512, x86): first announced in 2013 (!), first shipped on Xeon Phi
• Vector arithmetic => perfect for deep neural networks
• AVX-512 VNNI: multiplies 64 pairs of 8-bit values and accumulates them, in groups of four, into sixteen 32-bit results with a single instruction
• 4x more operations per instruction than with 32-bit values
• ¼ the memory vs. 32-bit
• In practice: roughly 2x speedup
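Where the ¼-memory and 4x-operations figures come from can be seen in a small Python sketch of INT8 inference arithmetic. It assumes a simple symmetric, per-tensor quantization scheme (an illustrative choice, not necessarily what any given framework uses): floats are mapped to int8 codes with one scale per tensor, the dot product runs entirely in integers with a wide accumulator, and the result is rescaled once at the end. Real VNNI kernels multiply unsigned 8-bit activations by signed 8-bit weights; the sketch uses signed codes on both sides for simplicity, and the input values are hypothetical.

```python
from array import array


def quantize_symmetric(values, qmax=127):
    """Map floats to int8 codes sharing one per-tensor scale (illustrative scheme)."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    codes = array("b", (max(-qmax, min(qmax, round(v / scale))) for v in values))
    return codes, scale


def int8_dot(qa, scale_a, qb, scale_b):
    """Integer multiply-accumulate, rescaled to float once at the end."""
    acc = 0  # stands in for the 32-bit accumulator that VNNI writes into
    for x, y in zip(qa, qb):
        acc += x * y
    return acc * scale_a * scale_b


# Hypothetical activations and weights: 64 values fill one 512-bit register as int8.
a = array("f", [0.5, -1.25, 2.0, 0.75] * 16)
b = array("f", [1.0, 0.5, -0.25, 2.0] * 16)

qa, sa = quantize_symmetric(a)
qb, sb = quantize_symmetric(b)

exact = sum(x * y for x, y in zip(a, b))
approx = int8_dot(qa, sa, qb, sb)

print(f"fp32 bytes={a.itemsize * len(a)}, int8 bytes={qa.itemsize * len(qa)}")  # 256 vs 64
print(f"fp32 dot={exact:.4f}, int8 dot={approx:.4f}")
```

With these inputs the int8 result stays within a few percent of the FP32 value, which is the kind of accuracy trade-off INT8 conversion accepts in exchange for the memory and throughput gains.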

24. Accelerated inference: 2nd Gen Xeon Scalable processor
Intel has a new Vector Neural Network Instruction (VNNI) that extends Intel® AVX-512. Available on all 2nd Generation Intel® Xeon® Scalable processors, VNNI speeds up dense computations characteristic of convolutional neural networks (CNNs) and deep neural networks (DNNs).
• 1st generation Intel® AVX-512, without Intel® Deep Learning Boost: three instructions. vpmaddubsw multiplies two INT8 inputs and produces INT16 outputs; vpmaddwd combines those with an INT16 constant to produce INT32 outputs; vpaddd adds an INT32 constant to produce the INT32 output.
• 2nd generation Intel® AVX-512, with Intel® Deep Learning Boost: a single instruction, vpdpbusd, takes two INT8 inputs and an INT32 constant and produces the INT32 output; up to 2x faster with the INT8 instruction².
Accelerating AI/DL inference for image classification, speech recognition, language translation, object detection, and more!
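The two data paths in this diagram can be emulated lane by lane. The Python sketch below models one 512-bit step (64 unsigned bytes times 64 signed bytes feeding sixteen 32-bit accumulators) using the documented semantics of vpmaddubsw, vpmaddwd, vpaddd and vpdpbusd. It is a behavioral model for illustration only; real code reaches these instructions through compiler intrinsics such as `_mm512_dpbusd_epi32` or through libraries like MKL-DNN, and the helper names here are hypothetical. Besides saving two instructions, the single-instruction path avoids the int16 saturation step in vpmaddubsw.

```python
def sat16(x):
    """Saturate to the signed 16-bit range, as vpmaddubsw does."""
    return max(-32768, min(32767, x))


def wrap32(x):
    """Wrap to a signed 32-bit integer (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def vpmaddubsw(u8, s8):
    """u8 x s8 products; adjacent pairs summed into saturated int16 lanes."""
    return [sat16(u8[i] * s8[i] + u8[i + 1] * s8[i + 1]) for i in range(0, len(u8), 2)]


def vpmaddwd(a16, b16):
    """int16 x int16 products; adjacent pairs summed into int32 lanes."""
    return [wrap32(a16[i] * b16[i] + a16[i + 1] * b16[i + 1]) for i in range(0, len(a16), 2)]


def vpaddd(a, b):
    """Lane-wise int32 addition."""
    return [wrap32(x + y) for x, y in zip(a, b)]


def legacy_int8_fma(acc, u8, s8):
    """1st-gen AVX-512 path: three instructions per 512-bit step."""
    pairs = vpmaddubsw(u8, s8)                 # 64 bytes -> 32 int16 lanes
    quads = vpmaddwd(pairs, [1] * len(pairs))  # times constant 1 -> 16 int32 lanes
    return vpaddd(acc, quads)                  # add into the accumulator


def vpdpbusd(acc, u8, s8):
    """AVX-512 VNNI path: four u8 x s8 products per lane added straight into int32."""
    return [
        wrap32(acc[j] + sum(u8[4 * j + k] * s8[4 * j + k] for k in range(4)))
        for j in range(len(acc))
    ]


acc = [0] * 16
small_u8 = [i % 16 for i in range(64)]
small_s8 = [(i % 11) - 5 for i in range(64)]
same = legacy_int8_fma(acc, small_u8, small_s8) == vpdpbusd(acc, small_u8, small_s8)

big_u8, big_s8 = [255] * 64, [127] * 64
print(same)                                     # True
print(legacy_int8_fma(acc, big_u8, big_s8)[0])  # 65534 (int16 step saturated)
print(vpdpbusd(acc, big_u8, big_s8)[0])         # 129540
```

For small values both paths agree exactly; with large byte values the three-instruction path clips each pair sum at 32767, while vpdpbusd accumulates the full 4 x 255 x 127 = 129540.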

25. Optimized deep learning frameworks and toolkits: gen-on-gen performance gains for ResNet-50 with Intel® DL Boost
2S Intel® Xeon® Platinum 8280 processor (2nd Gen Intel® Xeon® Scalable processor) vs. 2S Intel® Xeon® Platinum 8180 processor (Intel® Xeon® Scalable processor), five framework/toolkit results:
• INT8 with Intel® DL Boost vs. FP32: 3.0x, 3.7x, 3.9x, 4.0x, 3.9x
• INT8 with Intel® DL Boost vs. INT8: 1.8x, 2.1x, 1.8x, 2.3x, 1.9x
See configuration details 5. Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. No product can be absolutely secure. See configuration disclosure for details.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. For more complete information visit: http://www.intel.com/performance

26. Intel® Select Solution for AI Inferencing: jump-start your AI strategy with low-latency, high-throughput inferencing
• High inference performance with 2nd generation Intel® Xeon® Scalable processors featuring Intel® Deep Learning Boost
• Low latency from Intel® Optane™ SSDs and Intel® Ethernet network adapters
• Accelerate time to inference with the Intel® Distribution of OpenVINO™ toolkit
Learn more at intel.com/selectsolutions

27. Configuration: Intel® Select Solution for AI Inferencing
Hardware, listed as Base (single node) | Plus option 1 (single node) | Plus option 2 (single node):
• CPU: 2 x Intel® Xeon® Scalable 6248 (CLX Gold) | 2 x 8268 (CLX Platinum) | 2 x 6248 (CLX Gold)
• Memory (min): 192 GB (12 x 16 GB 2666 MHz DDR4 ECC RDIMM) | 384 GB (12 x 32 GB 2666 MHz DDR4 ECC RDIMM) | 384 GB (12 x 32 GB 2666 MHz DDR4 ECC RDIMM)
• Boot drive, all three: 1 x 256 GB Intel® SSD DC P4101 (M.2 80 mm PCIe 3.0 x4, 3D2, TLC) or higher
• Storage: Base uses a 3.2 TB data drive (NVMe Intel SSD P4610) and a 375 GB cache drive (NVMe P4800 [Optane]); the Plus configuration options use an Intel DC P4610 Series 1.6 TB 15 mm U.2 NVMe SSD data drive and an Intel DC P4800X Series 375 GB U.2 NVMe SSD cache drive
• Accelerator: Intel® Programmable Acceleration Card with Intel Arria® 10 GX FPGA (Plus option 2, coming Q2)
• Data network, all three: 25 GbE, dual port
Plus option 1: Xeon Platinum, higher memory. Plus option 2: higher memory, FPGA.
Software versions: OpenVINO Toolkit 2018 R5 Runtime; OpenVINO Model Server 0.4; TensorFlow 1.12; PyTorch 1.0.1; MXNet 1.3.1; Intel Python 2019 Update 1; MKL-DNN 0.17 (implied by OpenVINO)
Minimum performance benchmark standards, ResNet-50 with ImageNet: with the OpenVINO toolkit, at least 2000 images per second with Top-5 accuracy of 91%; with the TensorFlow framework, at least 1300 images per second with Top-5 accuracy of 91%.
Complete configuration details to be documented in Reference Design – ETA March 2019.

28. Intel® Select Solution for AI Inferencing on new 2nd Generation Intel® Xeon® Scalable processors: minimum throughput
• At least 2000* images per second¹ during inference using ResNet-50 with Top-5 accuracy of 91%; includes a 3.75x improvement using Intel® Deep Learning Boost with model conversion to INT8
• At least 1300* images per second² during inference using ResNet-50 with Top-5 accuracy of 91%; includes a 3.77x improvement using Intel® Deep Learning Boost with model conversion to INT8
*All verified solution providers will meet or exceed these values.
Learn more at intel.com/selectsolutions

29. Intel software tools
• Optimized libraries: Intel® Math Kernel Library for Deep Neural Networks; Intel® Data Analytics Acceleration Library; Intel® Machine Learning Scaling Library; Intel® Integrated Performance Primitives
• Optimized AI frameworks: TensorFlow*, PyTorch*, Caffe*, PaddlePaddle*, MXNet*, …
• Parallel programming frameworks: Threading Building Blocks; Intel® MPI Library; OpenMP*
• Compilers: Intel® Compilers (LLVM, GCC, Fortran); nGraph
• Profiling tools: Intel® VTune™ Amplifier; Intel® Advisor; Intel® Inspector
• Toolkits / tool suites: Intel® Parallel Studio XE; Intel® Distribution for Python*; Intel® System Studio; Data Plane Development Kit; Intel® Distribution of OpenVINO™ toolkit; Persistent Memory Development Kit; Storage Performance Development Kit; Intel® Resource Director Technology (OWCA & Intel® PRM)
OWCA: Orchestration-Aware Workload Collocation Agent. Intel® PRM: Intel® Platform Resource Manager.
SOFTWARE.INTEL.COM
