Intel MLlib: Building Platform-Optimized Machine Learning for Spark

Topic:
Intel MLlib: Building Platform-Optimized Machine Learning for Spark

Time:
October 15

Speaker:
Xiaochang Wu
Senior software engineer in the Big Data division of Intel Asia-Pacific R&D Ltd. His main research areas are parallel computing, big data systems, machine learning, and CPU/GPU performance optimization. He currently focuses on system performance optimization for Spark and machine learning.

Abstract:
Intel MLlib is a software package optimized for Apache Spark MLlib. While remaining compatible with Spark MLlib, it uses native algorithm libraries under the hood to provide optimized algorithms on CPUs and GPUs, and uses collective communication for more efficient inter-node communication. Our preliminary results show that the package can substantially improve the performance of MLlib algorithms while requiring minimal application changes.


1. Intel MLlib: Platform Optimized Machine Learning At Scale (Xiaochang Wu)

2. Legal Disclaimer No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm. Intel, the Intel logo, Atom, Core, Iris, VTune, Xeon, and Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries. * Other names and brands may be claimed as the property of others © 2017 Intel Corporation. IAGS Intel Architecture, Graphics, and Software 2

3. Optimization Notice Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.

4. Agenda ▪ Spark MLlib Overview ▪ Intel MLlib Introduction ▪ Optimized K-means in Intel MLlib ▪ Conclusion

5. Spark MLlib Overview

6. Spark MLlib Overview
Spark ML Pipelines (Scala, Java, Python, R)
▪ ML Algorithms: common learning algorithms such as classification, regression, clustering
▪ Featurization: feature extraction, transformation, dimensionality reduction, and selection
▪ Utilities: linear algebra, statistics, data handling, etc.
▪ Persistence: saving and loading algorithms, models, and pipelines
Spark Core

7. Spark MLlib Ecosystem & Users ▪ Seamless integration of Spark ML pipelines with XGBoost, LightGBM, Horovod (TensorFlow, Keras, PyTorch, MXNet) and Analytics Zoo. (Stack: Spark ML Pipelines (Scala, Java, Python, R) sit above ML Algorithms (common learning algorithms in MLlib), other MLlib utilities, XGBoost on Spark, LightGBM on Spark, and Horovod on Spark / Analytics Zoo (TensorFlow, Keras, PyTorch, MXNet).) Users: ▪ Uber's ML platform "Michelangelo" built most of its ML models on Spark MLlib.

8. Spark MLlib Advantages (Ingest → Store → Prepare → Train) ▪ With Spark as a unified platform, MLlib integrates seamlessly with SQL, Streaming, GraphX and other ML/DL frameworks. ▪ No additional glue code for the entire pipeline.

9. Spark MLlib: The Missing Pieces For High Performance
Problems:
▪ MLlib can only utilize L1/L2 BLAS in oneMKL; there is no implementation using L3 BLAS or oneDAL for acceleration.
▪ Spark shuffle is slow for communication during distributed ML training.
Optimization Opportunities:
▪ Optimize MLlib computation using oneDAL and L3 BLAS in oneMKL.
▪ Optimize MLlib communication using oneCCL.
(Diagram: MLlib computation layer with L1/L2 BLAS-based, L3 BLAS-based, and oneDAL-based implementations on top of oneMKL and oneDAL; MLlib communication layer with Spark Shuffle and oneCCL.)
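The L3 BLAS opportunity above can be illustrated with the standard trick for vectorizing K-means distance computation: the pairwise squared Euclidean distances expand as ||x - c||² = ||x||² - 2·x·c + ||c||², so the dominant cost becomes one matrix-matrix product (a GEMM, i.e. an L3 BLAS call) plus cheap per-vector norms. A minimal pure-Python sketch of the idea (an optimized implementation would hand the inner products to oneMKL as a single GEMM):

```python
def pairwise_sq_dists(points, centroids):
    """Squared Euclidean distance from every point to every centroid.

    Uses the expansion ||x - c||^2 = ||x||^2 - 2*x.c + ||c||^2; the x.c
    inner products form a points-by-centroids matrix product, which maps
    onto a single L3 BLAS GEMM in an optimized implementation.
    """
    p_norms = [sum(v * v for v in p) for p in points]      # ||x||^2 per point
    c_norms = [sum(v * v for v in c) for c in centroids]   # ||c||^2 per centroid
    dists = []
    for i, p in enumerate(points):
        row = []
        for j, c in enumerate(centroids):
            dot = sum(a * b for a, b in zip(p, c))         # the GEMM part
            row.append(p_norms[i] - 2.0 * dot + c_norms[j])
        dists.append(row)
    return dists
```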

10. Intel MLlib Introduction

11. Introducing Intel MLlib ▪ Performance: achieves >2x speedup. ▪ Compatibility: 100% API compatible with Spark MLlib; same level of accuracy. ▪ Ease-of-use: minimal setup and configuration. ▪ CPU/GPU: enables both Intel CPUs and GPUs.

12. Intel MLlib Overview
▪ Intel MLlib is a drop-in replacement for vanilla Spark MLlib, released and maintained by Intel.
▪ Exposes unchanged APIs: NO application change required.
▪ Focuses on optimizing the most in-demand algorithms, with input from our partners and customers.
▪ Achieves better performance on CPU/GPU with integrated Intel performance libraries such as oneMKL, oneDAL and oneCCL.
(Stack: Optimized ML Algorithms (Classification, Regression, Clustering, Collaborative Filtering), Featurization, Pipeline, Utilities, Persistence, on top of the Intel® oneAPI acceleration libraries (oneMKL, oneTBB, oneDAL, oneCCL), on top of Apache Spark Core.)

13. Intel® oneAPI Acceleration Libraries oneMKL ▪ Accelerate math processing routines, including matrix algebra, fast Fourier transforms (FFT), and vector math. oneTBB ▪ Simplify parallelism with this advanced threading and memory-management template library. oneDAL ▪ Boost machine learning and data analytics performance. oneCCL ▪ Implement optimized communication patterns to distribute deep learning model training across multiple nodes.

14. Using Intel MLlib ▪ Develop your Spark application using standard Spark ML APIs. ▪ Download the Intel MLlib jar and submit the application using the following command. $SPARK_HOME/bin/spark-submit --conf "spark.files=/path/to/intel-mllib.jar" --conf "spark.driver.extraClassPath=/path/to/intel-mllib.jar" --conf "spark.executor.extraClassPath=./intel-mllib.jar" ... # other options

15. Intel MLlib Implementation
▪ Initialize oneCCL in each task.
▪ Convert data format between Spark and oneDAL.
▪ Start distributed oneDAL algorithms.
▪ Communication during distributed training is based on oneCCL.
(Stack: MLlib Python API (PySpark) → MLlib Scala API → Spark MLlib, Distributed oneDAL Java Wrapper, and oneCCL Java (Scala) Wrapper → Native Distributed oneDAL based on oneCCL → Native oneCCL and Native oneDAL → Spark Core. Components are split between native Spark components, Intel MLlib native components, and Intel MLlib Java/Scala components.)

16. Distributed Training in Intel MLlib
▪ Spark MLlib: multiple Spark map/reduce stages (Stage1 → Stage2 → Stage3 → Stage4) for distributed training, using Spark shuffle for communication between executors.
▪ Intel MLlib: a single Spark stage (Stage1 → Stage2) for distributed training, using oneCCL for communication between executors.
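The difference between the two communication patterns can be sketched with a toy collective: instead of routing partial results through extra shuffle stages, every worker contributes its partial sum and receives the identical global result in one step. A hypothetical pure-Python stand-in for the collective that oneCCL performs natively:

```python
def allreduce_sum(partials):
    """Toy allreduce: element-wise sum of each worker's partial vector,
    with the identical global result handed back to every worker.

    oneCCL runs this as one collective operation inside a single Spark
    stage, replacing communication via multi-stage Spark shuffle.
    """
    global_sum = [sum(col) for col in zip(*partials)]
    return [list(global_sum) for _ in partials]
```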

17. Optimized K-means in Intel MLlib

18. K-means Overview • K-means is one of the most widely used clustering algorithms; it partitions data points into a predefined number of clusters.

19. Distributed K-means Training
• K-means Input: Initial Centroids, Points
• K-means Output: Trained Centroids, Cost, Assignments
(Training loop: assign data points to K clusters → recalculate centroids for each cluster → ccl_allgatherv gathers the partial results → ccl_bcast broadcasts the updated centroids → repeat.)
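One iteration of the training loop above can be sketched in plain Python: each simulated worker assigns its local points to the nearest centroid and produces partial per-cluster sums and counts; combining those partials (standing in for the ccl_allgatherv step) lets every worker recompute the same updated centroids (standing in for the ccl_bcast step). The names here are illustrative, not the actual oneDAL API:

```python
def kmeans_iteration(worker_points, centroids):
    """One distributed K-means iteration over a list of per-worker partitions."""
    k, dim = len(centroids), len(centroids[0])
    partials = []
    for points in worker_points:                        # each worker, locally
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in points:
            # assign the point to the nearest centroid (squared Euclidean)
            j = min(range(k), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(p, centroids[c])))
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        partials.append((sums, counts))
    # "ccl_allgatherv": combine partial sums and counts from every worker
    total_counts = [sum(pc[1][j] for pc in partials) for j in range(k)]
    new_centroids = []
    for j in range(k):
        if total_counts[j] == 0:
            new_centroids.append(list(centroids[j]))    # keep empty cluster as-is
        else:
            new_centroids.append(
                [sum(pc[0][j][d] for pc in partials) / total_counts[j]
                 for d in range(dim)])
    # "ccl_bcast": every worker now holds the same updated centroids
    return new_centroids
```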

20. Fallback Strategy in Intel MLlib
▪ Fall back to Spark MLlib if features or parameters are not supported by oneDAL.
▪ In K-means, oneDAL doesn't support cosine distance or weighted points. If these parameters are used, Intel MLlib falls back to Spark MLlib.
class pyspark.ml.clustering.KMeans(featuresCol='features', predictionCol='prediction', k=2, initMode='k-means||', initSteps=2, tol=0.0001, maxIter=20, seed=None, distanceMeasure='euclidean', weightCol=None)
(Decision: if distanceMeasure == "cosine" || weightCol.nonEmpty, call Spark MLlib K-means; otherwise call Intel MLlib K-means.)
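The decision diagram above amounts to a single predicate. A hedged sketch (the function name is mine, not part of Intel MLlib):

```python
def use_intel_mllib_kmeans(distance_measure="euclidean", weight_col=None):
    """Mirror the slide's fallback rule: oneDAL K-means supports neither
    cosine distance nor weighted points, so those cases fall back to
    Spark MLlib."""
    if distance_measure == "cosine" or weight_col is not None:
        return False  # fall back to Spark MLlib K-means
    return True       # take the oneDAL-accelerated path
```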

21. Intel MLlib K-means Performance
Achieved >10x better performance on training and >5x over the full pipeline (end-to-end: >5x; training: >10x).
Configuration: data size 250 GB; num_of_samples = 62,000,000; dimensions = 1000; K = 200; max_iteration = 20; initMode = Random.
Cluster: 1 master + 3 workers with Intel Xeon.
(Chart: duration of Spark MLlib vs. Intel MLlib, broken down into Initialization, Data Conversion, Training, and Summary.)

22. Conclusion • Intel MLlib is an optimized package that accelerates machine learning algorithms in Apache Spark MLlib. • It enables both Intel CPUs and GPUs, focusing on performance, compatibility, and ease of use. • K-means results are very promising. Intel MLlib site: https://github.com/Intel-bigdata/OAP/tree/master/oap-mllib


The Alibaba open-source big data EMR technology team has founded the Apache Spark China technology community, which regularly organizes online and offline Spark events in China. Stay tuned. Team group number: HPRX8117. WeChat official account: Apache Spark技术交流社区