Accelerating Apache Spark with Intel QuickAssist Technology

Enterprise and cloud data centers are under pressure to continuously expand revenue-generating and value-added services, such as compute-intensive and I/O-demanding Big Data solutions, which move large amounts of data into and out of storage and send it across networked clusters.

A significant amount of time and network bandwidth can be saved when data is compressed before it is passed between servers, as long as the compression and decompression operations are efficient and require negligible CPU cycles. Intel QuickAssist Technology (QAT) allows compute-intensive workloads, specifically compression, to be offloaded from the CPU cores onto dedicated hardware accelerators. It enables developers to create software solutions that leverage compression/decompression acceleration, accessing the technology through the APIs in the Intel QuickAssist software.
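The trade-off described above (compression only pays off when it is cheap relative to the link) can be illustrated with a back-of-the-envelope model. This is a minimal sketch, not a measurement: the function names are made up for this example, the three stages are assumed to run back to back with no overlap, and every number is hypothetical.

```python
def transfer_seconds(size_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to send size_bytes over a link with the given bandwidth."""
    return size_bytes / bandwidth_bytes_per_s


def compressed_transfer_seconds(
    size_bytes: float,
    bandwidth_bytes_per_s: float,
    ratio: float,
    compress_bytes_per_s: float,
    decompress_bytes_per_s: float,
) -> float:
    """Compress, send, then decompress, assuming the stages do not overlap."""
    compress = size_bytes / compress_bytes_per_s
    send = transfer_seconds(size_bytes / ratio, bandwidth_bytes_per_s)
    decompress = size_bytes / decompress_bytes_per_s
    return compress + send + decompress


if __name__ == "__main__":
    # All values are hypothetical and chosen only to show the break-even effect.
    size = 1e9    # 1 GB of data to move between servers
    link = 125e6  # 1 Gbit/s link, in bytes per second
    ratio = 2.0   # uncompressed size / compressed size

    plain = transfer_seconds(size, link)
    fast = compressed_transfer_seconds(size, link, ratio, 1e9, 2e9)
    slow = compressed_transfer_seconds(size, link, ratio, 100e6, 2e9)

    print(f"uncompressed:        {plain:.1f} s")
    print(f"fast compressor:     {fast:.1f} s")
    print(f"slow compressor:     {slow:.1f} s")
```

With these made-up inputs the fast compressor beats the uncompressed transfer while the slow one loses to it. In this simplified model, that is the gap a hardware offload is meant to close.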

This talk introduces developers to Intel QuickAssist Technology and presents key use cases that show how to take advantage of its hardware-based compression acceleration, and the resulting performance improvements, in Spark applications.

2. Accelerating Apache Spark with Intel QuickAssist Technology Xie Qi, Intel #UnifiedDataAnalytics #SparkAISummit

3. LEGAL NOTICES No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm. Intel, the Intel logo, Intel® are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright © 2018 Intel Corporation.

4. Agenda
• QAT Overview
• QAT Acceleration Opportunity in Big Data
• High Level Architecture for QATCodec in Big Data
• Performance Evaluation
• QAT Impact Analysis on TPCx-BB Queries

5. Intel QuickAssist Technology Overview
• QAT provides hardware acceleration for security (encryption) and for compression
• QAT uses a set of APIs to abstract the hardware, so the same application can run on multiple generations of QAT hardware
• Customers can also use the patches we have provided for popular open source software, minimizing or eliminating the effort of learning the API
• QAT resources are available at https://01.org/intel-quickassist-technology
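For Spark users, "no need to learn the API" would plausibly mean that a codec is selected through configuration rather than code. The sketch below assumes that model. The property names (`spark.io.compression.codec`, `spark.shuffle.compress`, `spark.shuffle.spill.compress`) and the `--conf` flag of `spark-submit` are standard Spark. The codec class name is a placeholder invented for this example, not the real QATCodec class, and the helper functions are hypothetical.

```python
# Placeholder only: substitute the class name documented by the codec you install.
HYPOTHETICAL_QAT_CODEC = "com.example.qat.PlaceholderQatCompressionCodec"


def spark_compression_conf(codec: str) -> dict:
    """Spark properties that route shuffle and spill compression through one codec.

    codec may be a built-in short name such as "snappy" or a fully
    qualified class name of a codec available on the classpath.
    """
    return {
        "spark.io.compression.codec": codec,
        "spark.shuffle.compress": "true",
        "spark.shuffle.spill.compress": "true",
    }


def to_submit_args(conf: dict) -> list:
    """Render the properties as spark-submit command-line arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Baseline run with a built-in software codec.
    print(" ".join(to_submit_args(spark_compression_conf("snappy"))))
    # Same job, different codec: only the configuration value changes.
    print(" ".join(to_submit_args(spark_compression_conf(HYPOTHETICAL_QAT_CODEC))))
```

The application code is identical in both runs, which is the property the slide attributes to the provided patches.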

6. Why QAT Is Important to Big Data

7. Data Compression Pipeline in Big Data Framework

8. Data Compression Pipeline in Big Data - I/O Characteristics

9. QAT Acceleration Opportunity

10. About QATCodec

11. Benchmark Configuration

12. Performance Evaluation – Spark Sort
Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks. Tests performed by Intel. Configurations: see slide 11.

13. Performance Evaluation – MapReduce
Configurations: see slide 11.

14. Performance Evaluation – Hive on MapReduce
Configurations: see slide 11.

15. QAT vs. Snappy – Compression Ratio (1 TB data scale)
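The figures from this slide did not survive extraction, but the metric itself is easy to pin down: compression ratio is taken here as uncompressed size divided by compressed size. The sketch below shows that calculation. Software zlib (DEFLATE) from the Python standard library stands in for both codecs, since neither QAT nor Snappy ships with it. The function names and the sample data are made up for illustration and say nothing about the ratios reported in the talk.

```python
import zlib


def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Uncompressed size divided by compressed size (higher is better)."""
    return len(raw) / len(compressed)


def deflate_ratio(raw: bytes, level: int = 6) -> float:
    """Compression ratio of software zlib at the given level (1 = fastest, 9 = smallest)."""
    return compression_ratio(raw, zlib.compress(raw, level))


if __name__ == "__main__":
    # Highly repetitive, made-up sample; real table data compresses far less.
    sample = b"customer_id,item_id,quantity\n" * 10_000
    for level in (1, 6, 9):
        print(f"zlib level {level}: ratio {deflate_ratio(sample, level):.1f}")
```

A higher ratio means less data written to disk and sent over the network, which is why the codec choice can change how a query behaves as well as how long compression takes.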

16. Impact Analysis on Queries - Compression Ratio

17. Improved Query - Q22 (Map Join Conversion)

18. Degraded Query - Q12 (GC Issue Caused by Map Join)

19. Degraded Query - Q12 (GC Issue Caused by Map Join) – Cont’d

20. Micro-view Query Comparison – I/O Wait

21. Micro-view Query Comparison – No I/O Wait
